Heuristic
Last reviewed
Sources
10 citations
Review status
Source-backed
Revision
v4 ยท 3,190 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Sources
10 citations
Review status
Source-backed
Revision
v4 ยท 3,190 words
Add missing citations, update stale details, or suggest a clearer explanation.
A heuristic is a practical problem-solving approach that trades optimality, completeness, accuracy, or precision for speed, producing a good-enough answer when an exact method would be too slow, too expensive, or unavailable [1][9]. In artificial intelligence, the most common technical use is the heuristic function h(n), which Russell and Norvig define as the "estimated cost of the cheapest path from node n to a goal node" [2]. A heuristic function is what turns a blind, uninformed search into an informed (heuristic) search: it lets algorithms such as A* search and greedy best-first search head toward a goal instead of expanding every state. Derived from the Greek word heuriskein, meaning "to discover," heuristics are rules of thumb, educated guesses, or intuitive strategies that guide search, game-playing engines, hyperparameter tuning, and architecture design.
Imagine you lose a toy somewhere in your house. You could search every single room, open every drawer, and look under every piece of furniture until you find it. That would take a very long time, but you would definitely find it. A heuristic is like thinking, "I usually play with that toy in the living room, so I'll look there first." You might not find it on the first try, but most of the time you will, and it saves you a lot of effort. In computer science, heuristics work the same way: they help computers make smart guesses so they can solve problems much faster, even if the answer is not always perfect.
The word predates computers. In his 1945 book How to Solve It, the mathematician George Polya revived the term and wrote that "the aim of heuristic is to study the methods and rules of discovery and invention" [10]. In modern computer science the definition narrows to a technique "designed for solving a problem more quickly when classic methods are too slow for finding an exact or approximate solution, or when classic methods fail to find any exact solution in a search space," which "is achieved by trading optimality, completeness, accuracy, or precision for speed" [9].
The single most important property of a heuristic is therefore the one it gives up: a guarantee. A heuristic may return a suboptimal answer, and there is often no efficient way to know how far from optimal it is. In exchange, it scales to problems that exact algorithms cannot touch within a reasonable time.
The terms "heuristic" and "algorithm" are sometimes used interchangeably, but they differ in important ways.
| Property | Algorithm | Heuristic |
|---|---|---|
| Guarantee | Produces a correct or optimal solution | Produces a satisfactory (possibly suboptimal) solution |
| Completeness | Explores the full solution space if needed | Explores a promising subset of the solution space |
| Speed | Can be slow on large or complex problems | Designed to run quickly, even on hard problems |
| Use case | Sorting, shortest path in small graphs, arithmetic | NP-hard optimization, real-time decision making, large search spaces |
| Example | Dijkstra's shortest-path algorithm | Greedy nearest-neighbor tour for the Traveling Salesman Problem |
An algorithm is a step-by-step procedure that guarantees a correct result when executed properly. A heuristic sacrifices that guarantee in exchange for computational efficiency. In practice, many systems combine both: an algorithm provides the framework, while heuristics steer it toward promising regions of the search space.
Heuristic search is one of the foundational topics in artificial intelligence. Instead of blindly exploring every possible state (as in uninformed search), informed (heuristic) search algorithms use a heuristic function h(n) to estimate how close a given state n is to the goal. Russell and Norvig call heuristic functions "the most common form in which additional knowledge of the problem is imparted to the search algorithm," subject to one constraint: if n is a goal node, then h(n) = 0 [2].
Greedy best-first search expands the node that appears closest to the goal according to the heuristic function h(n), evaluating nodes by f(n) = h(n) alone [2]. It ignores the cost already incurred to reach the current node, which makes it fast but prone to finding suboptimal paths. The algorithm is not complete in the general case, because it can follow a path into a dead end without recovering. Its worst-case time and space complexity is O(b^m), where b is the branching factor and m is the maximum depth of the search space; a good heuristic can reduce this substantially [2].
The A* search algorithm combines the actual path cost g(n) with the heuristic estimate h(n) into a single evaluation function:
f(n) = g(n) + h(n)
Here f(n) is the estimated cost of the cheapest solution through n. A* is both complete and optimal when h(n) is an admissible heuristic [2]. It was introduced in 1968 by Peter Hart, Nils Nilsson, and Bertram Raphael of the Stanford Research Institute in the paper "A Formal Basis for the Heuristic Determination of Minimum Cost Paths," developed as part of the Shakey mobile-robot project [3]. A* is widely used in pathfinding (video games, robotics, GPS navigation) and planning problems.
A heuristic h(n) is admissible if it never overestimates the true cost to reach the goal from node n [2]. Formally, for every node n:
h(n) <= h(n)*
where h(n)* is the actual cost of the optimal path from n to the goal. Admissibility is what guarantees A* (with tree search) will find the shortest path, because an admissible heuristic is by nature optimistic: it never thinks the cost of solving the problem is more than it actually is [2].
A stronger property, consistency (also called monotonicity), requires that the heuristic estimate decrease by no more than the actual step cost along any edge. Formally, for every node n and every successor n' reached by action a:
h(n) <= c(n, a, n') + h(n')
This is a form of the triangle inequality [2]. Russell and Norvig note that "every consistent heuristic is also admissible," and that consistency is what makes A* optimal when it uses graph search rather than tree search, by guaranteeing the f-values along any path are nondecreasing [2]. Consistent heuristics therefore prevent A* from having to re-expand nodes.
A classic example is pathfinding on a grid: the straight-line (Euclidean) distance to the goal is admissible because no path can be shorter than a straight line, and Russell and Norvig point out it is also consistent [2].
The sliding-tile 8-puzzle is the canonical teaching example for search heuristics [2]. A randomly generated instance has an average solution cost of about 22 moves and a branching factor of roughly 3 (two to four legal moves depending on where the blank tile sits), so a brute-force search to depth 22 would examine on the order of 3.1 x 10^10 states. Because only 9!/2 = 181,440 of those states are actually reachable, repeated-state checking cuts the work dramatically, but a good heuristic still matters [2]. Two admissible candidates dominate the literature:
| Heuristic | Definition | Value for the AIMA example | Admissible? |
|---|---|---|---|
| h1 (misplaced tiles) | Number of tiles not in their goal position | 8 (all eight tiles out of place) | Yes [2] |
| h2 (Manhattan / city-block distance) | Sum of the horizontal and vertical distances of each tile from its goal | 18 | Yes [2] |
Neither overestimates the example's true solution cost of 26 [2]. Crucially, h2(n) >= h1(n) for every node, so h2 is said to dominate h1, and "domination translates directly into efficiency": A* using h2 never expands more nodes than A* using h1 [2]. The payoff is large. Russell and Norvig report that on randomly generated 8-puzzles with solution length 14, "A* with h2 is 30,000 times more efficient than uninformed iterative deepening search" [2].
This example also reveals a general recipe for inventing admissible heuristics: relax the problem. The exact solution cost of a relaxed problem (one with fewer restrictions on the moves) is always an admissible heuristic for the original. Misplaced-tile count is the exact cost when a tile may teleport anywhere; Manhattan distance is the exact cost when a tile may slide onto an occupied square [2].
In adversarial search (for example, chess or Go), the minimax algorithm with alpha-beta pruning explores the game tree to choose moves. Because it is impossible to search the entire tree for complex games, the search is cut off at a certain depth, and a heuristic evaluation function estimates the value of the resulting board position.
In chess, a classic evaluation function might look like:
eval = c1 * material + c2 * mobility + c3 * king_safety + c4 * center_control + c5 * pawn_structure
Each term is a weighted difference between the two players. The weights c1, c2, ... are tuned through experience or automated search. More modern engines use neural networks trained on millions of games to produce evaluation scores, and programs such as AlphaZero pair a learned value function with Monte Carlo tree search rather than alpha-beta, but the underlying idea is the same: assign a numeric estimate of how "good" a position is without playing the game to completion.
Machine learning relies heavily on heuristics at every stage of the modeling pipeline, from data preparation to deployment.
The 80/20 train-test split is one of the most widely used rules of thumb. It allocates 80% of the data for training and 20% for testing, loosely inspired by the Pareto principle. For smaller datasets (under 100,000 samples), practitioners often prefer k-fold cross-validation to make better use of limited data.
Selecting hyperparameters is often guided by heuristics before any automated search begins.
| Hyperparameter | Common Heuristic |
|---|---|
| Learning rate | Start with 0.001 for Adam; 0.01 for SGD. The learning rate is considered the single most important hyperparameter to tune. |
| Batch size | Start at 32 and increase in powers of 2 (64, 128, 256). Larger batches use GPU memory more efficiently; smaller batches add a regularizing effect. |
| Number of hidden layers | Start with 1 or 2 layers; add more if underfitting. Deeper networks generalize better but are harder to optimize. |
| k in k-nearest neighbors | Use k = sqrt(N), where N is the number of training samples. Larger k reduces noise sensitivity; smaller k captures local patterns. |
| Number of trees in random forest | Start with 100 or 500 trees. Increasing beyond a certain point yields diminishing returns. |
| Dropout rate | 0.5 for hidden layers, 0.2 for input layers. Less effective on very small or very large datasets. |
| Momentum | Start with 0.9. Values of 0.5, 0.9, and 0.99 are common checkpoints to try. |
These starting points save time by narrowing the search space before more systematic methods like grid search, random search, or Bayesian optimization take over. Bergstra and Bengio showed that even random search over hyperparameters is more efficient than grid search, because for most datasets only a few hyperparameters really matter [7].
Feature engineering is the process of creating or transforming input variables to improve model performance. Common heuristics include:
Deep learning practitioners follow several rules of thumb when designing neural network architectures [8]:
Metaheuristics are higher-level strategies that guide subordinate heuristic procedures to explore large and complex search spaces more effectively. Unlike problem-specific heuristics, metaheuristics are general-purpose frameworks that can be applied to a wide range of optimization problems. They draw inspiration from natural processes in physics, biology, and social behavior.
Simulated annealing mimics the metallurgical process of slowly cooling a material to reduce defects. It was introduced by Kirkpatrick, Gelatt, and Vecchi in a 1983 Science paper that drew an explicit analogy between statistical mechanics and combinatorial optimization [4]. The algorithm starts with a high "temperature" that allows it to accept worse solutions with significant probability, helping it escape local optima. As the temperature decreases over time, the algorithm becomes increasingly selective, eventually converging to a near-optimal solution. Simulated annealing is effective for combinatorial problems like the Traveling Salesman Problem and circuit design.
A genetic algorithm maintains a population of candidate solutions ("chromosomes") that evolve over successive generations through three operators inspired by biological evolution. The approach was formalized by John Holland in his 1975 book Adaptation in Natural and Artificial Systems [5]:
Genetic algorithms are used in feature selection, hyperparameter tuning, scheduling, and engineering design optimization.
| Metaheuristic | Inspiration | Key Idea |
|---|---|---|
| Tabu search | Memory-based | Maintains a list of recently visited solutions to prevent cycling and force exploration of new regions; introduced by Fred Glover in 1986, the paper that also coined the term "metaheuristic" [6] |
| Particle swarm optimization | Bird flocking | Candidate solutions ("particles") move through the search space, attracted toward the best positions found by themselves and their neighbors |
| Ant colony optimization | Ant foraging | Artificial ants deposit "pheromone" on promising paths, reinforcing routes that lead to good solutions |
| Differential evolution | Population-based | Creates new candidates by combining differences between existing population members |
The concept of heuristics extends beyond computer science into cognitive psychology. In the 1970s, psychologists Daniel Kahneman and Amos Tversky published their landmark paper "Judgment Under Uncertainty: Heuristics and Biases" (1974), identifying three major cognitive heuristics that people rely on when making decisions under uncertainty [1]:
As Tversky and Kahneman put it, "these heuristics are highly economical and usually effective, but they lead to systematic and predictable errors" (cognitive biases) [1]. Kahneman received the 2002 Nobel Memorial Prize in Economic Sciences for this work, which has influenced fields from behavioral economics to the design of AI systems. Understanding human heuristic biases has also informed research on bias in machine learning models.
Heuristics are used across virtually every domain of computer science and artificial intelligence: