A heuristic is a practical problem-solving approach that trades optimality, completeness, or precision for speed. Derived from the Greek word heuriskein, meaning "to discover," heuristics are rules of thumb, educated guesses, or intuitive strategies that produce good-enough solutions when exact methods are too slow, too expensive, or simply unavailable. In artificial intelligence and machine learning, heuristics guide everything from search algorithms and game-playing engines to hyperparameter tuning and model architecture design.
Imagine you lose a toy somewhere in your house. You could search every single room, open every drawer, and look under every piece of furniture until you find it. That would take a very long time, but you would definitely find it. A heuristic is like thinking, "I usually play with that toy in the living room, so I'll look there first." You might not find it on the first try, but most of the time you will, and it saves you a lot of effort. In computer science, heuristics work the same way: they help computers make smart guesses so they can solve problems much faster, even if the answer is not always perfect.
The terms "heuristic" and "algorithm" are sometimes used interchangeably, but they differ in important ways.
| Property | Algorithm | Heuristic |
|---|---|---|
| Guarantee | Produces a correct or optimal solution | Produces a satisfactory (possibly suboptimal) solution |
| Completeness | Explores the full solution space if needed | Explores a promising subset of the solution space |
| Speed | Can be slow on large or complex problems | Designed to run quickly, even on hard problems |
| Use case | Sorting, shortest path in small graphs, arithmetic | NP-hard optimization, real-time decision making, large search spaces |
| Example | Dijkstra's shortest-path algorithm | Greedy nearest-neighbor tour for the Traveling Salesman Problem |
An algorithm is a step-by-step procedure that guarantees a correct result when executed properly. A heuristic sacrifices that guarantee in exchange for computational efficiency. In practice, many systems combine both: an algorithm provides the framework, while heuristics steer it toward promising regions of the search space.
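The contrast in the table can be made concrete. Below is a minimal sketch of the greedy nearest-neighbor heuristic for the Traveling Salesman Problem mentioned above; the city coordinates are invented for illustration. The tour is built in O(n²) time, but nothing guarantees it is the shortest one, which is exactly the trade an exact algorithm refuses to make.

```python
import math

def nearest_neighbor_tour(cities, start=0):
    """Build a TSP tour greedily: always visit the closest unvisited city."""
    unvisited = set(range(len(cities))) - {start}
    tour = [start]
    while unvisited:
        current = cities[tour[-1]]
        # Heuristic choice: pick the nearest remaining city and never look back.
        nxt = min(unvisited, key=lambda i: math.dist(current, cities[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

cities = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3)]  # invented coordinates
print(nearest_neighbor_tour(cities))  # [0, 1, 2, 4, 3]
```

A brute-force algorithm would instead evaluate all (n-1)!/2 tours, which is exact but already impractical for a few dozen cities.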
Heuristic search is one of the foundational topics in artificial intelligence. Instead of blindly exploring every possible state (as in uninformed search), informed search algorithms use a heuristic function h(n) to estimate how close a given state n is to the goal.
Greedy best-first search expands the node that appears closest to the goal according to the heuristic function h(n). It ignores the cost already incurred to reach the current node, which makes it fast but prone to returning suboptimal paths. The tree-search version is not complete, because it can cycle among states that merely look close to the goal without ever reaching it.
The A* search algorithm combines the actual path cost g(n) with the heuristic estimate h(n) into a single evaluation function:
f(n) = g(n) + h(n)
A* is both complete and optimal when h(n) is an admissible heuristic. It is widely used in pathfinding (video games, robotics, GPS navigation) and planning problems.
A heuristic h(n) is admissible if it never overestimates the true cost to reach the goal from node n. Formally, for every node n:

h(n) ≤ h*(n)

where h*(n) is the actual cost of the optimal path from n to the goal. Admissibility is what guarantees A* will find the shortest path. A stronger property, consistency (also called monotonicity), requires that the heuristic estimate decrease by no more than the actual step cost along any edge. Consistent heuristics are always admissible, and they prevent A* from needing to re-expand nodes.
A classic example is pathfinding on a grid: the straight-line (Euclidean) distance to the goal is admissible because no path can be shorter than a straight line.
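Putting f(n) = g(n) + h(n) and admissibility together, here is a minimal A* sketch on a 4-connected grid, a close relative of the straight-line example: when movement is restricted to the four cardinal directions with unit step cost, the Manhattan distance is admissible (and consistent). The grid layout is invented for illustration.

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid; cells equal to 1 are walls. Returns path cost."""
    def h(p):  # Manhattan distance: admissible for unit-cost cardinal moves
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    open_heap = [(h(start), 0, start)]  # entries are (f, g, node)
    g_cost = {start: 0}
    while open_heap:
        f, g, node = heapq.heappop(open_heap)
        if node == goal:
            return g  # cost of an optimal path, guaranteed by admissibility
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = node[0] + dr, node[1] + dc
            if 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0:
                ng = g + 1
                if ng < g_cost.get((r, c), float("inf")):
                    g_cost[(r, c)] = ng
                    heapq.heappush(open_heap, (ng + h((r, c)), ng, (r, c)))
    return None  # goal unreachable

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))  # 6: the wall forces a detour
```

Setting h(n) = 0 everywhere reduces this to Dijkstra's algorithm; a stronger (but still admissible) heuristic expands fewer nodes.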
In adversarial search (for example, chess or Go), the minimax algorithm with alpha-beta pruning explores the game tree to choose moves. Because it is impossible to search the entire tree for complex games, the search is cut off at a certain depth, and a heuristic evaluation function estimates the value of the resulting board position.
In chess, a classic evaluation function might look like:
eval = c1 * material + c2 * mobility + c3 * king_safety + c4 * center_control + c5 * pawn_structure
Each term is a weighted difference between the two players. The weights c1, c2, ... are tuned through experience or automated search. More modern engines use neural networks trained on millions of games to produce evaluation scores, but the underlying idea is the same: assign a numeric estimate of how "good" a position is without playing the game to completion.
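As a sketch of that idea, the toy evaluation below scores a position from material and a couple of feature differences. The piece values are the conventional 1/3/3/5/9 scale, but the weights and feature inputs are illustrative inventions, not constants from any real engine.

```python
# Toy linear evaluation in the spirit of eval = c1*material + c2*mobility + ...
# Weights and feature values are illustrative, not taken from a real engine.
PIECE_VALUE = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material(pieces):
    """Sum standard piece values; uppercase = White, lowercase = Black."""
    score = 0
    for p in pieces:
        value = PIECE_VALUE.get(p.upper(), 0)
        score += value if p.isupper() else -value
    return score

def evaluate(pieces, mobility_diff, king_safety_diff, weights=(1.0, 0.1, 0.5)):
    """Weighted sum of feature differences (White minus Black)."""
    c1, c2, c3 = weights
    return c1 * material(pieces) + c2 * mobility_diff + c3 * king_safety_diff

# White is up a knight for a pawn but slightly less mobile.
print(evaluate(["N", "p"], mobility_diff=-4, king_safety_diff=0))
```

A positive score favors White, a negative score favors Black; minimax propagates these leaf estimates back up the game tree.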
Machine learning relies heavily on heuristics at every stage of the modeling pipeline, from data preparation to deployment.
The 80/20 train-test split is one of the most widely used rules of thumb. It allocates 80% of the data for training and 20% for testing, loosely inspired by the Pareto principle. For smaller datasets (under 100,000 samples), practitioners often prefer k-fold cross-validation to make better use of limited data.
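The split itself is only a few lines. This sketch shuffles before splitting so that any ordering in the data (by time, class, or source) does not leak into the test set; the fixed seed is there for reproducibility.

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle, then hold out the last test_fraction of samples for testing."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = data[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(100)))
print(len(train), len(test))  # 80 20
```

Libraries such as scikit-learn provide an equivalent `train_test_split` with stratification options, but the heuristic choice of 0.2 is the same.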
Selecting hyperparameters is often guided by heuristics before any automated search begins.
| Hyperparameter | Common Heuristic |
|---|---|
| Learning rate | Start with 0.001 for Adam; 0.01 for SGD. The learning rate is considered the single most important hyperparameter to tune. |
| Batch size | Start at 32 and increase in powers of 2 (64, 128, 256). Larger batches use GPU memory more efficiently; smaller batches add a regularizing effect. |
| Number of hidden layers | Start with 1 or 2 layers; add more if underfitting. Deeper networks can represent more complex functions but are harder to optimize. |
| k in k-nearest neighbors | Use k = sqrt(N), where N is the number of training samples. Larger k reduces noise sensitivity; smaller k captures local patterns. |
| Number of trees in random forest | Start with 100 or 500 trees. Increasing beyond a certain point yields diminishing returns. |
| Dropout rate | 0.5 for hidden layers, 0.2 for input layers. Less effective on very small or very large datasets. |
| Momentum | Start with 0.9. Values of 0.5, 0.9, and 0.99 are common checkpoints to try. |
These starting points save time by narrowing the search space before more systematic methods like grid search, random search, or Bayesian optimization take over.
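The table's starting points can be collected into a single function of the dataset, as in the sketch below; every number is one of the rules of thumb above, not a tuned value, and the function itself is a hypothetical convenience, not a library API.

```python
import math

def heuristic_defaults(n_samples, optimizer="adam"):
    """Rule-of-thumb starting hyperparameters, before any systematic search."""
    return {
        "learning_rate": 0.001 if optimizer == "adam" else 0.01,
        "batch_size": 32,            # then try 64, 128, 256
        "hidden_layers": 2,          # add more only if underfitting
        "knn_k": max(1, round(math.sqrt(n_samples))),
        "n_trees": 100,              # diminishing returns beyond some point
        "dropout": 0.5,              # 0.2 for input layers
        "momentum": 0.9,             # 0.5 and 0.99 are other common checkpoints
    }

print(heuristic_defaults(10_000)["knn_k"])  # sqrt(10000) = 100
```

Grid, random, or Bayesian search would then explore a neighborhood around these values rather than the full hyperparameter space.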
Feature engineering is the process of creating or transforming input variables to improve model performance. Common heuristics include log-transforming highly skewed features, one-hot encoding low-cardinality categorical variables, standardizing inputs for distance-based and gradient-based models, and adding interaction terms when domain knowledge suggests two variables combine.
Deep learning practitioners follow several rules of thumb when designing neural network architectures: start from an established architecture for the problem type rather than designing from scratch, use ReLU-family activations as a default, increase capacity until the model can overfit the training data, and then add regularization (dropout, weight decay, early stopping) when validation error diverges from training error.
Metaheuristics are higher-level strategies that guide subordinate heuristic procedures to explore large and complex search spaces more effectively. Unlike problem-specific heuristics, metaheuristics are general-purpose frameworks that can be applied to a wide range of optimization problems. They draw inspiration from natural processes in physics, biology, and social behavior.
Simulated annealing mimics the metallurgical process of slowly cooling a material to reduce defects. The algorithm starts with a high "temperature" that allows it to accept worse solutions with significant probability, helping it escape local optima. As the temperature decreases over time, the algorithm becomes increasingly selective, eventually converging to a near-optimal solution. Simulated annealing is effective for combinatorial problems like the Traveling Salesman Problem and circuit design.
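The annealing loop described above is short enough to sketch directly. The objective function below is an invented one-dimensional example, and the temperature, cooling rate, and step size are illustrative choices rather than recommended settings.

```python
import math
import random

def simulated_annealing(f, x0, temp=10.0, cooling=0.95, steps=2000, seed=0):
    """Minimize f, accepting worse moves with probability exp(-delta / temp)."""
    rng = random.Random(seed)
    x, best = x0, x0
    for _ in range(steps):
        candidate = x + rng.uniform(-1, 1)  # random local move
        delta = f(candidate) - f(x)
        # Always accept improvements; accept worse moves with prob exp(-delta/T).
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x = candidate
            if f(x) < f(best):
                best = x
        temp *= cooling  # cool down: become increasingly selective
    return best

# Invented multimodal objective: x^2 plus a sinusoidal ripple of local minima.
f = lambda x: x * x + 3 * math.sin(3 * x)
best = simulated_annealing(f, x0=5.0)
print(round(best, 2), round(f(best), 2))
```

At high temperature the acceptance test behaves almost like a random walk; as temp shrinks, exp(-delta/temp) vanishes for any worsening move and the loop degenerates into pure hill climbing.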
A genetic algorithm maintains a population of candidate solutions ("chromosomes") that evolve over successive generations through three operators inspired by biological evolution: selection (fitter candidates are more likely to reproduce), crossover (pairs of parents exchange segments to produce offspring), and mutation (random changes preserve diversity and help avoid premature convergence).
Genetic algorithms are used in feature selection, hyperparameter tuning, scheduling, and engineering design optimization.
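The three operators can be sketched on the classic OneMax benchmark, where fitness is simply the number of 1 bits in a chromosome. Population size, tournament size, mutation rate, and generation count below are illustrative choices, not tuned values.

```python
import random

def genetic_algorithm(n_bits=20, pop_size=30, generations=60,
                      mutation_rate=0.02, seed=0):
    """Maximize the number of 1 bits via selection, crossover, and mutation."""
    rng = random.Random(seed)

    def fitness(bits):
        return sum(bits)

    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def select():
            # Selection: tournament of 3; fitter chromosomes reproduce more often.
            return max(rng.sample(pop, 3), key=fitness)
        next_pop = []
        while len(next_pop) < pop_size:
            p1, p2 = select(), select()
            # Crossover: splice the two parents at a random cut point.
            cut = rng.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with small probability.
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

best = genetic_algorithm()
print(sum(best), "of 20 bits set")
```

Real applications swap in a domain-specific fitness function (cross-validation score for feature selection, makespan for scheduling) while keeping this same loop.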
| Metaheuristic | Inspiration | Key Idea |
|---|---|---|
| Tabu search | Memory-based | Maintains a list of recently visited solutions to prevent cycling and force exploration of new regions |
| Particle swarm optimization | Bird flocking | Candidate solutions ("particles") move through the search space, attracted toward the best positions found by themselves and their neighbors |
| Ant colony optimization | Ant foraging | Artificial ants deposit "pheromone" on promising paths, reinforcing routes that lead to good solutions |
| Differential evolution | Population-based | Creates new candidates by combining differences between existing population members |
The concept of heuristics extends beyond computer science into cognitive psychology. In their landmark 1974 paper "Judgment Under Uncertainty: Heuristics and Biases," psychologists Daniel Kahneman and Amos Tversky identified three major cognitive heuristics that people rely on when making decisions under uncertainty: representativeness (judging probability by similarity to a stereotype), availability (judging frequency by how easily examples come to mind), and anchoring and adjustment (estimating by adjusting, often insufficiently, from an initial value).
These cognitive heuristics are "highly economical and usually effective," as Kahneman and Tversky noted, but they lead to systematic and predictable errors (cognitive biases). Kahneman received the 2002 Nobel Memorial Prize in Economic Sciences for this work, which has influenced fields from behavioral economics to the design of AI systems. Understanding human heuristic biases has also informed research on bias in machine learning models.
Heuristics are used across virtually every domain of computer science and artificial intelligence: