# Heuristic

> Source: https://aiwiki.ai/wiki/heuristic
> Updated: 2026-06-27
> Categories: Artificial Intelligence, Machine Learning
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

A **heuristic** is a practical problem-solving approach that trades optimality, completeness, accuracy, or precision for speed, producing a good-enough answer when an exact method would be too slow, too expensive, or unavailable [1][9]. In [artificial intelligence](/wiki/artificial_intelligence), the most common technical use is the **heuristic function** *h(n)*, which Russell and Norvig define as the "estimated cost of the cheapest path from node n to a goal node" [2]. A heuristic function is what turns a blind, uninformed search into an informed (heuristic) search: it lets algorithms such as [A* search](/wiki/a_star_search) and greedy best-first search head toward a goal instead of expanding every state. Derived from the Greek word *heuriskein*, meaning "to discover," heuristics are rules of thumb, educated guesses, or intuitive strategies that guide search, game-playing engines, [hyperparameter](/wiki/hyperparameter) tuning, and architecture design.

## ELI5 (Explain Like I'm 5)

Imagine you lose a toy somewhere in your house. You could search every single room, open every drawer, and look under every piece of furniture until you find it. That would take a very long time, but you would definitely find it. A heuristic is like thinking, "I usually play with that toy in the living room, so I'll look there first." You might not find it on the first try, but most of the time you will, and it saves you a lot of effort. In computer science, heuristics work the same way: they help computers make smart guesses so they can solve problems much faster, even if the answer is not always perfect.

## What is a heuristic?

The word predates computers. In his 1945 book *How to Solve It*, the mathematician George Polya revived the term and wrote that "the aim of heuristic is to study the methods and rules of discovery and invention" [10]. In modern computer science the definition narrows to a technique "designed for solving a problem more quickly when classic methods are too slow for finding an exact or approximate solution, or when classic methods fail to find any exact solution in a search space," which "is achieved by trading optimality, completeness, accuracy, or precision for speed" [9].

The single most important property of a heuristic is therefore the one it gives up: a guarantee. A heuristic may return a suboptimal answer, and there is often no efficient way to know how far from optimal it is. In exchange, it scales to problems that exact algorithms cannot touch within a reasonable time.

## How does a heuristic differ from an algorithm?

The terms "heuristic" and "algorithm" are sometimes used interchangeably, but they differ in important ways.

| Property | Algorithm | Heuristic |
|---|---|---|
| **Guarantee** | Produces a correct or optimal solution | Produces a satisfactory (possibly suboptimal) solution |
| **Completeness** | Explores the full solution space if needed | Explores a promising subset of the solution space |
| **Speed** | Can be slow on large or complex problems | Designed to run quickly, even on hard problems |
| **Use case** | Sorting, shortest path in small graphs, arithmetic | NP-hard optimization, real-time decision making, large search spaces |
| **Example** | Dijkstra's shortest-path algorithm | Greedy nearest-neighbor tour for the Traveling Salesman Problem |

An algorithm is a step-by-step procedure that guarantees a correct result when executed properly. A heuristic sacrifices that guarantee in exchange for computational efficiency. In practice, many systems combine both: an algorithm provides the framework, while heuristics steer it toward promising regions of the search space.

## What is a heuristic function in search?

Heuristic search is one of the foundational topics in [artificial intelligence](/wiki/artificial_intelligence). Instead of blindly exploring every possible state (as in uninformed search), informed (heuristic) search algorithms use a heuristic function *h(n)* to estimate how close a given state *n* is to the goal. Russell and Norvig call heuristic functions "the most common form in which additional knowledge of the problem is imparted to the search algorithm," subject to one constraint: if *n* is a goal node, then *h(n) = 0* [2].

### How does greedy best-first search work?

Greedy best-first search expands the node that appears closest to the goal according to the heuristic function *h(n)*, evaluating nodes by *f(n) = h(n)* alone [2]. It ignores the cost already incurred to reach the current node, which makes it fast but prone to finding suboptimal paths. The algorithm is not complete in the general case, because it can follow a path into a dead end without recovering. Its worst-case time and space complexity is *O(b^m)*, where *b* is the branching factor and *m* is the maximum depth of the search space; a good heuristic can reduce this substantially [2].

### How does A* search use a heuristic?

The [A* search algorithm](/wiki/a_star_search) combines the actual path cost *g(n)* with the heuristic estimate *h(n)* into a single evaluation function:

**f(n) = g(n) + h(n)**

Here *f(n)* is the estimated cost of the cheapest solution through *n*. A* is both complete and optimal when *h(n)* is an admissible heuristic [2]. It was introduced in 1968 by Peter Hart, Nils Nilsson, and Bertram Raphael of the Stanford Research Institute in the paper "A Formal Basis for the Heuristic Determination of Minimum Cost Paths," developed as part of the Shakey mobile-robot project [3]. A* is widely used in pathfinding (video games, robotics, GPS navigation) and planning problems.

## What makes a heuristic admissible?

A heuristic *h(n)* is **admissible** if it never overestimates the true cost to reach the goal from node *n* [2]. Formally, for every node *n*:

**h(n) <= h*(n)**

where *h*(n)* is the actual cost of the optimal path from *n* to the goal. Admissibility is what guarantees A* (with tree search) will find the shortest path, because an admissible heuristic is by nature optimistic: it never thinks the cost of solving the problem is more than it actually is [2].

A stronger property, **consistency** (also called monotonicity), requires that the heuristic estimate decrease by no more than the actual step cost along any edge. Formally, for every node *n* and every successor *n'* reached by action *a*:

**h(n) <= c(n, a, n') + h(n')**

This is a form of the triangle inequality [2]. Russell and Norvig note that "every consistent heuristic is also admissible," and that consistency is what makes A* optimal when it uses graph search rather than tree search, by guaranteeing the *f*-values along any path are nondecreasing [2]. Consistent heuristics therefore prevent A* from having to re-expand nodes.

A classic example is pathfinding on a grid: the straight-line (Euclidean) distance to the goal is admissible because no path can be shorter than a straight line, and Russell and Norvig point out it is also consistent [2].

## What are good heuristics for the 8-puzzle?

The sliding-tile 8-puzzle is the canonical teaching example for search heuristics [2]. A randomly generated instance has an average solution cost of about 22 moves and a branching factor of roughly 3 (two to four legal moves depending on where the blank tile sits), so a brute-force search to depth 22 would examine on the order of 3.1 x 10^10 states. Because only 9!/2 = 181,440 of those states are actually reachable, repeated-state checking cuts the work dramatically, but a good heuristic still matters [2]. Two admissible candidates dominate the literature:

| Heuristic | Definition | Value for the AIMA example | Admissible? |
|---|---|---|---|
| **h1 (misplaced tiles)** | Number of tiles not in their goal position | 8 (all eight tiles out of place) | Yes [2] |
| **h2 (Manhattan / city-block distance)** | Sum of the horizontal and vertical distances of each tile from its goal | 18 | Yes [2] |

Neither overestimates the example's true solution cost of 26 [2]. Crucially, *h2(n) >= h1(n)* for every node, so *h2* is said to **dominate** *h1*, and "domination translates directly into efficiency": A* using *h2* never expands more nodes than A* using *h1* [2]. The payoff is large. Russell and Norvig report that on randomly generated 8-puzzles with solution length 14, "A* with h2 is 30,000 times more efficient than uninformed iterative deepening search" [2].

This example also reveals a general recipe for *inventing* admissible heuristics: relax the problem. The exact solution cost of a **relaxed problem** (one with fewer restrictions on the moves) is always an admissible heuristic for the original. Misplaced-tile count is the exact cost when a tile may teleport anywhere; Manhattan distance is the exact cost when a tile may slide onto an occupied square [2].

## How are heuristic evaluation functions used in game playing?

In adversarial search (for example, chess or Go), the [minimax algorithm](/wiki/minimax) with [alpha-beta pruning](/wiki/alpha_beta_pruning) explores the game tree to choose moves. Because it is impossible to search the entire tree for complex games, the search is cut off at a certain depth, and a **heuristic evaluation function** estimates the value of the resulting board position.

In chess, a classic evaluation function might look like:

**eval = c1 * material + c2 * mobility + c3 * king_safety + c4 * center_control + c5 * pawn_structure**

Each term is a weighted difference between the two players. The weights *c1, c2, ...* are tuned through experience or automated search. More modern engines use [neural networks](/wiki/neural_network) trained on millions of games to produce evaluation scores, and programs such as [AlphaZero](/wiki/alphazero) pair a learned value function with [Monte Carlo tree search](/wiki/monte_carlo_tree_search) rather than alpha-beta, but the underlying idea is the same: assign a numeric estimate of how "good" a position is without playing the game to completion.

## What heuristics are used in machine learning?

[Machine learning](/wiki/machine_learning) relies heavily on heuristics at every stage of the modeling pipeline, from data preparation to deployment.

### Data Splitting Heuristics

The **80/20 train-test split** is one of the most widely used rules of thumb. It allocates 80% of the data for training and 20% for testing, loosely inspired by the Pareto principle. For smaller datasets (under 100,000 samples), practitioners often prefer **k-fold cross-validation** to make better use of limited data.

### Hyperparameter Tuning Heuristics

Selecting [hyperparameters](/wiki/hyperparameter) is often guided by heuristics before any automated search begins.

| Hyperparameter | Common Heuristic |
|---|---|
| [Learning rate](/wiki/learning_rate) | Start with 0.001 for Adam; 0.01 for SGD. The learning rate is considered the single most important hyperparameter to tune. |
| [Batch size](/wiki/batch_size) | Start at 32 and increase in powers of 2 (64, 128, 256). Larger batches use GPU memory more efficiently; smaller batches add a regularizing effect. |
| Number of hidden layers | Start with 1 or 2 layers; add more if underfitting. Deeper networks generalize better but are harder to optimize. |
| k in [k-nearest neighbors](/wiki/k_nearest_neighbors) | Use k = sqrt(N), where N is the number of training samples. Larger k reduces noise sensitivity; smaller k captures local patterns. |
| Number of trees in [random forest](/wiki/random_forest) | Start with 100 or 500 trees. Increasing beyond a certain point yields diminishing returns. |
| [Dropout](/wiki/dropout_regularization) rate | 0.5 for hidden layers, 0.2 for input layers. Less effective on very small or very large datasets. |
| Momentum | Start with 0.9. Values of 0.5, 0.9, and 0.99 are common checkpoints to try. |

These starting points save time by narrowing the search space before more systematic methods like grid search, random search, or [Bayesian optimization](/wiki/bayesian_optimization) take over. Bergstra and Bengio showed that even random search over hyperparameters is more efficient than grid search, because for most datasets only a few hyperparameters really matter [7].

### Feature Engineering Heuristics

[Feature engineering](/wiki/feature_engineering) is the process of creating or transforming input variables to improve model performance. Common heuristics include:

- **Remove features with near-zero variance**, because they carry almost no predictive information.
- **Log-transform** heavily skewed numerical features to reduce the influence of outliers.
- **One-hot encode** categorical variables with fewer than 10 to 15 unique values; use target encoding or embeddings for higher-cardinality features.
- **Normalize or standardize** features before distance-based algorithms (k-NN, SVM, [gradient descent](/wiki/gradient_descent)-based optimizers) to ensure all features contribute equally.

### Architecture Design Heuristics for Deep Learning

Deep learning practitioners follow several rules of thumb when designing [neural network](/wiki/neural_network) architectures [8]:

- Use **ReLU** (Rectified Linear Unit) as the default [activation function](/wiki/activation_function) for hidden layers. It reduces the vanishing gradient problem and is computationally cheap.
- Prefer **cross-entropy** over mean squared error as the [loss function](/wiki/loss_function) for classification tasks, because it produces stronger gradients when predictions are wrong.
- Always use **[early stopping](/wiki/early_stopping)**: monitor validation loss and stop training when it begins to increase. This is one of the most efficient regularization techniques.
- Apply **[batch normalization](/wiki/batch_normalization)** to stabilize and accelerate training. When using batch normalization, dropout may be redundant.
- For a new problem similar to a well-studied one, start by copying a proven architecture and fine-tuning it rather than designing from scratch ([transfer learning](/wiki/transfer_learning)).

## What is a metaheuristic?

Metaheuristics are higher-level strategies that guide subordinate heuristic procedures to explore large and complex search spaces more effectively. Unlike problem-specific heuristics, metaheuristics are general-purpose frameworks that can be applied to a wide range of optimization problems. They draw inspiration from natural processes in physics, biology, and social behavior.

### Simulated Annealing

[Simulated annealing](/wiki/simulated_annealing) mimics the metallurgical process of slowly cooling a material to reduce defects. It was introduced by Kirkpatrick, Gelatt, and Vecchi in a 1983 *Science* paper that drew an explicit analogy between statistical mechanics and combinatorial optimization [4]. The algorithm starts with a high "temperature" that allows it to accept worse solutions with significant probability, helping it escape local optima. As the temperature decreases over time, the algorithm becomes increasingly selective, eventually converging to a near-optimal solution. Simulated annealing is effective for combinatorial problems like the Traveling Salesman Problem and circuit design.

### Genetic Algorithms

A [genetic algorithm](/wiki/genetic_algorithm) maintains a population of candidate solutions ("chromosomes") that evolve over successive generations through three operators inspired by biological evolution. The approach was formalized by John Holland in his 1975 book *Adaptation in Natural and Artificial Systems* [5]:

1. **Selection**: fitter individuals are more likely to be chosen as parents.
2. **Crossover**: pairs of parents exchange portions of their solutions to produce offspring.
3. **Mutation**: random changes are introduced to maintain diversity and prevent premature convergence.

Genetic algorithms are used in feature selection, hyperparameter tuning, scheduling, and engineering design optimization.

### Other Metaheuristics

| Metaheuristic | Inspiration | Key Idea |
|---|---|---|
| **Tabu search** | Memory-based | Maintains a list of recently visited solutions to prevent cycling and force exploration of new regions; introduced by Fred Glover in 1986, the paper that also coined the term "metaheuristic" [6] |
| **Particle swarm optimization** | Bird flocking | Candidate solutions ("particles") move through the search space, attracted toward the best positions found by themselves and their neighbors |
| **Ant colony optimization** | Ant foraging | Artificial ants deposit "pheromone" on promising paths, reinforcing routes that lead to good solutions |
| **Differential evolution** | Population-based | Creates new candidates by combining differences between existing population members |

## How do heuristics work in human decision making?

The concept of heuristics extends beyond computer science into cognitive psychology. In the 1970s, psychologists Daniel Kahneman and Amos Tversky published their landmark paper "Judgment Under Uncertainty: Heuristics and Biases" (1974), identifying three major cognitive heuristics that people rely on when making decisions under uncertainty [1]:

- **Representativeness**: judging the probability that something belongs to a category based on how similar it looks to a typical member, often ignoring base rates.
- **Availability**: estimating the frequency or likelihood of an event based on how easily examples come to mind. Dramatic events (plane crashes) are overestimated because they are more memorable.
- **Anchoring and adjustment**: starting from an initial value (the "anchor") and adjusting insufficiently toward the correct answer.

As Tversky and Kahneman put it, "these heuristics are highly economical and usually effective, but they lead to systematic and predictable errors" (cognitive biases) [1]. Kahneman received the 2002 Nobel Memorial Prize in Economic Sciences for this work, which has influenced fields from behavioral economics to the design of AI systems. Understanding human heuristic biases has also informed research on [bias](/wiki/bias) in machine learning models.

## Advantages and Limitations

### Advantages

- **Speed**: heuristics produce solutions in seconds or minutes where exact methods might take hours, days, or longer.
- **Scalability**: they handle large problem instances that are computationally intractable for exact algorithms.
- **Simplicity**: many heuristics are easy to implement and understand.
- **Flexibility**: metaheuristics can be adapted to new problem domains with minimal modification.

### Limitations

- **No optimality guarantee**: the solution may be far from optimal, and there is often no way to know how far.
- **Parameter sensitivity**: many heuristics (and metaheuristics) have their own parameters that require tuning.
- **Inconsistency**: a heuristic that works well on one dataset or problem instance may perform poorly on another.
- **Lack of theoretical grounding**: some heuristics are based purely on empirical observation, making it difficult to predict when they will fail.

## What are heuristics used for?

Heuristics are used across virtually every domain of computer science and [artificial intelligence](/wiki/artificial_intelligence):

- **Pathfinding and navigation**: A* and its variants power GPS routing, video game AI, and robotic motion planning.
- **Scheduling and logistics**: metaheuristics solve vehicle routing, job-shop scheduling, and airline crew assignment problems.
- **Cybersecurity**: antivirus software uses heuristic scanning to detect malware by identifying suspicious behavioral patterns, even for previously unknown threats.
- **Compiler optimization**: compilers use heuristics to decide register allocation, instruction scheduling, and loop unrolling strategies.
- **Machine learning pipelines**: heuristics guide data splitting, feature selection, hyperparameter initialization, and model selection at every stage of the ML workflow.
- **Game AI**: evaluation functions allow game-playing programs to assess positions without exhaustively searching the game tree.

## See Also

- [Artificial Intelligence](/wiki/artificial_intelligence)
- [Machine Learning](/wiki/machine_learning)
- [Reinforcement Learning](/wiki/reinforcement_learning)
- [Hyperparameter](/wiki/hyperparameter)
- [Gradient Descent](/wiki/gradient_descent)
- [Neural Network](/wiki/neural_network)
- [Beam Search](/wiki/beam_search)
- [Monte Carlo Tree Search](/wiki/monte_carlo_tree_search)
- [A* Search](/wiki/a_star_search)
- [Genetic Algorithm](/wiki/genetic_algorithm)
- [Simulated Annealing](/wiki/simulated_annealing)
- [Bayesian Optimization](/wiki/bayesian_optimization)

## References

1. Tversky, A., & Kahneman, D. (1974). "Judgment Under Uncertainty: Heuristics and Biases." *Science*, 185(4157), 1124-1131. https://www.science.org/doi/10.1126/science.185.4157.1124
2. Russell, S., & Norvig, P. (2010). *Artificial Intelligence: A Modern Approach* (3rd ed.). Pearson. Chapter 4, "Informed Search and Exploration," and Chapter 4.2, "Heuristic Functions." http://aima.cs.berkeley.edu/
3. Hart, P. E., Nilsson, N. J., & Raphael, B. (1968). "A Formal Basis for the Heuristic Determination of Minimum Cost Paths." *IEEE Transactions on Systems Science and Cybernetics*, 4(2), 100-107.
4. Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). "Optimization by Simulated Annealing." *Science*, 220(4598), 671-680.
5. Holland, J. H. (1975). *Adaptation in Natural and Artificial Systems*. University of Michigan Press.
6. Glover, F. (1986). "Future Paths for Integer Programming and Links to Artificial Intelligence." *Computers & Operations Research*, 13(5), 533-549.
7. Bergstra, J., & Bengio, Y. (2012). "Random Search for Hyper-Parameter Optimization." *Journal of Machine Learning Research*, 13, 281-305.
8. Goodfellow, I., Bengio, Y., & Courville, A. (2016). *Deep Learning*. MIT Press. Chapter 11: Practical Methodology. https://www.deeplearningbook.org/
9. "Heuristic (computer science)." *Wikipedia*. https://en.wikipedia.org/wiki/Heuristic_(computer_science)
10. Polya, G. (1945). *How to Solve It: A New Aspect of Mathematical Method*. Princeton University Press (glossary entry "Heuristic").
