Heuristic

A heuristic is a practical problem-solving approach that trades optimality, completeness, or precision for speed. Derived from the Greek word heuriskein, meaning "to discover," heuristics are rules of thumb, educated guesses, or intuitive strategies that produce good-enough solutions when exact methods are too slow, too expensive, or simply unavailable. In artificial intelligence and machine learning, heuristics guide everything from search algorithms and game-playing engines to hyperparameter tuning and model architecture design.

ELI5 (Explain Like I'm 5)

Imagine you lose a toy somewhere in your house. You could search every single room, open every drawer, and look under every piece of furniture until you find it. That would take a very long time, but you would definitely find it. A heuristic is like thinking, "I usually play with that toy in the living room, so I'll look there first." You might not find it on the first try, but most of the time you will, and it saves you a lot of effort. In computer science, heuristics work the same way: they help computers make smart guesses so they can solve problems much faster, even if the answer is not always perfect.

Heuristic vs. Algorithm

The terms "heuristic" and "algorithm" are sometimes used interchangeably, but they differ in important ways.

Property	Algorithm	Heuristic
Guarantee	Produces a correct or optimal solution	Produces a satisfactory (possibly suboptimal) solution
Completeness	Explores the full solution space if needed	Explores a promising subset of the solution space
Speed	Can be slow on large or complex problems	Designed to run quickly, even on hard problems
Use case	Sorting, shortest path in small graphs, arithmetic	NP-hard optimization, real-time decision making, large search spaces
Example	Dijkstra's shortest-path algorithm	Greedy nearest-neighbor tour for the Traveling Salesman Problem

In this comparison, "algorithm" means an exact algorithm: a step-by-step procedure that guarantees a correct result when executed properly. A heuristic sacrifices that guarantee in exchange for computational efficiency, even though it is usually implemented as a step-by-step procedure too. In practice, many systems combine both: an algorithm provides the framework, while heuristics steer it toward promising regions of the search space.
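The contrast in the table can be sketched on a tiny Traveling Salesman instance. The example below is a minimal illustration, not a reference implementation: the function names and the six city coordinates are made up for this sketch. The exact method tries every ordering, while the nearest-neighbor heuristic always travels to the closest unvisited city.

```python
import itertools
import math

def tour_length(points, order):
    """Total length of the closed tour that visits points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )

def brute_force_tour(points):
    """Exact method: evaluate every ordering that starts at city 0."""
    rest = range(1, len(points))
    best = min(
        ((0,) + perm for perm in itertools.permutations(rest)),
        key=lambda order: tour_length(points, order),
    )
    return list(best)

def nearest_neighbor_tour(points, start=0):
    """Heuristic: always travel to the closest unvisited city."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        here = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(here, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

# Hypothetical city coordinates, chosen only for illustration.
cities = [(0, 0), (1, 5), (5, 1), (6, 6), (2, 2), (7, 3)]
exact = brute_force_tour(cities)
greedy = nearest_neighbor_tour(cities)
print("exact :", round(tour_length(cities, exact), 3))
print("greedy:", round(tour_length(cities, greedy), 3))
```

The brute-force search examines (n - 1)! orderings, so it stops being practical after a dozen or so cities, whereas the greedy tour needs only about n^2 distance evaluations. The greedy tour can never be shorter than the exact one, and it may be noticeably longer.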

Heuristics in Search Algorithms

Heuristic search is one of the foundational topics in artificial intelligence. Instead of blindly exploring every possible state (as in uninformed search), informed search algorithms use a heuristic function h(n) to estimate how close a given state n is to the goal.

Greedy Best-First Search

Greedy best-first search expands the node that appears closest to the goal according to the heuristic function h(n). It ignores the cost already incurred to reach the current node, which makes it fast but prone to finding suboptimal paths. It is also not complete in all cases: without a record of visited states it can cycle between states or follow an infinitely long path and never reach the goal. Tracking visited states makes it complete on finite state spaces, though still not optimal.
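A minimal sketch of greedy best-first search follows. The four-node graph and its heuristic values are hypothetical, and the version shown keeps a record of visited nodes (an assumption of this sketch) so that it cannot loop on a finite graph. The example is arranged so that the greedy choice leads to an expensive path.

```python
import heapq

def greedy_best_first(start, goal, neighbors, h):
    """Always expand the frontier node with the smallest h(n); path cost is ignored."""
    frontier = [(h(start), start)]
    came_from = {start: None}  # also serves as the visited set
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for nxt in neighbors(node):
            if nxt not in came_from:
                came_from[nxt] = node
                heapq.heappush(frontier, (h(nxt), nxt))
    return None  # goal unreachable

# Hypothetical weighted graph: edge costs are stored but never consulted by the search.
graph = {"S": {"A": 1, "B": 2}, "A": {"G": 10}, "B": {"G": 2}, "G": {}}
h_table = {"S": 2, "A": 1, "B": 2, "G": 0}

path = greedy_best_first("S", "G", lambda n: graph[n], h_table.get)
cost = sum(graph[a][b] for a, b in zip(path, path[1:]))
print(path, cost)
```

Node A looks closer to the goal than B (h = 1 versus h = 2), so the search commits to S, A, G at a cost of 11, even though S, B, G costs only 4.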

A* Search

The A* search algorithm combines the actual path cost g(n) with the heuristic estimate h(n) into a single evaluation function:

f(n) = g(n) + h(n)

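A* is complete and optimal when h(n) is an admissible heuristic, provided the branching factor is finite and every step cost exceeds some fixed positive amount. Graph-search implementations that never re-expand a node additionally need h(n) to be consistent, a property described in the next section. A* is widely used in pathfinding (video games, robotics, GPS navigation) and planning problems.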
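The evaluation function f(n) = g(n) + h(n) can be sketched with a priority queue. This is a minimal illustration under stated assumptions: the graph, its edge costs, and the heuristic table are hypothetical, and the heuristic values were chosen to be admissible for this graph.

```python
import heapq

def a_star(start, goal, graph, h):
    """Expand nodes in order of f(n) = g(n) + h(n); return (path, cost)."""
    frontier = [(h(start), 0, start)]  # entries are (f, g, node)
    best_g = {start: 0}
    came_from = {start: None}
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if g > best_g[node]:
            continue  # stale entry: a cheaper route to this node was found later
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1], g
        for nxt, step_cost in graph[node].items():
            new_g = g + step_cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                came_from[nxt] = node
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt))
    return None, float("inf")

# Hypothetical graph and heuristic values, for illustration only.
graph = {"S": {"A": 1, "B": 2}, "A": {"G": 10}, "B": {"G": 2}, "G": {}}
h_table = {"S": 2, "A": 1, "B": 2, "G": 0}

path, cost = a_star("S", "G", graph, h_table.get)
print(path, cost)
```

Because g(n) is part of the ranking, the expensive edge from A to G pushes that route to the back of the queue, and the search returns S, B, G with cost 4.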

Admissible Heuristics

A heuristic h(n) is admissible if it never overestimates the true cost to reach the goal from node n. Formally, for every node n:

h(n) ≤ h*(n)

where h*(n) is the actual cost of the optimal path from n to the goal. Admissibility is what guarantees that A* finds a lowest-cost path. A stronger property, consistency (also called monotonicity), requires that the heuristic estimate decrease by no more than the actual step cost along any edge: h(n) ≤ c(n, n') + h(n') for every successor n' of n, where c(n, n') is the cost of that edge. Consistent heuristics are always admissible (given h = 0 at the goal), and they prevent A* from needing to re-expand nodes.

A classic example is pathfinding on a grid: the straight-line (Euclidean) distance to the goal is admissible because no path can be shorter than a straight line.
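Admissibility can be checked empirically on a small grid by comparing h(n) with the true cost h*(n). In the sketch below, the grid layout, the goal cell, and the function names are hypothetical; moves are assumed to be 4-directional with unit cost, so a breadth-first search outward from the goal yields h*(n) for every reachable cell.

```python
import math
from collections import deque

# Hypothetical grid: "." is free, "#" is a wall. Cells are (row, col).
GRID = [
    "....",
    ".##.",
    "....",
]
GOAL = (2, 3)

def true_costs(grid, goal):
    """Breadth-first search from the goal gives h*(n) for unit-cost moves."""
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def is_admissible(h, costs):
    """True if h never overestimates the true cost on any reachable cell."""
    return all(h(n) <= cost + 1e-9 for n, cost in costs.items())

def euclidean(n):
    return math.dist(n, GOAL)

def doubled_manhattan(n):
    # Deliberately inflated estimate, included to show a failing case.
    return 2 * (abs(n[0] - GOAL[0]) + abs(n[1] - GOAL[1]))

costs = true_costs(GRID, GOAL)
print(is_admissible(euclidean, costs))          # True
print(is_admissible(doubled_manhattan, costs))  # False
```

The doubled Manhattan distance fails immediately: the cell next to the goal has a true cost of 1 but an estimate of 2.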

Heuristic Evaluation Functions in Game Playing

In adversarial search (for example, chess or Go), the minimax algorithm with alpha-beta pruning explores the game tree to choose moves. Because it is impossible to search the entire tree for complex games, the search is cut off at a certain depth, and a heuristic evaluation function estimates the value of the resulting board position.

In chess, a classic evaluation function might look like:

eval = c1 * material + c2 * mobility + c3 * king_safety + c4 * center_control + c5 * pawn_structure

Each term is a weighted difference between the two players. The weights c1, c2, ... are tuned through experience or automated search. More modern engines use neural networks trained on millions of games to produce evaluation scores, but the underlying idea is the same: assign a numeric estimate of how "good" a position is without playing the game to completion.
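The weighted sum above might be coded roughly as follows. The weight values and the feature numbers are invented for this sketch and are not taken from any real engine; extracting the features from an actual board is outside its scope.

```python
# Hypothetical weights c1..c5; real engines tune these values.
WEIGHTS = {
    "material": 1.0,
    "mobility": 0.1,
    "king_safety": 0.5,
    "center_control": 0.3,
    "pawn_structure": 0.2,
}

def evaluate(white, black, weights=WEIGHTS):
    """Weighted sum of feature differences; positive values favor White."""
    return sum(w * (white[name] - black[name]) for name, w in weights.items())

# Made-up feature values for one position.
white = {"material": 39, "mobility": 30, "king_safety": 3,
         "center_control": 2, "pawn_structure": 1}
black = {"material": 38, "mobility": 25, "king_safety": 2,
         "center_control": 2, "pawn_structure": 0}

print(evaluate(white, black))
```

Writing each term as a difference makes the function symmetric: swapping the two sides flips the sign of the score, which is the property minimax relies on.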

Heuristics in Machine Learning

Machine learning relies heavily on heuristics at every stage of the modeling pipeline, from data preparation to deployment.

Data Splitting Heuristics

The 80/20 train-test split is one of the most widely used rules of thumb. It allocates 80% of the data for training and 20% for testing, a ratio loosely associated with the Pareto principle. For smaller datasets (roughly under 100,000 samples), practitioners often prefer k-fold cross-validation, which rotates the held-out portion so that every sample is used for both training and validation.
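Both splitting strategies can be sketched with the standard library alone. The helper names `holdout_split` and `k_fold` are hypothetical; in practice a library routine would normally be used instead.

```python
import random

def holdout_split(items, test_fraction=0.2, seed=0):
    """Shuffle once, then hold out test_fraction of the data for testing."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold(items, k=5):
    """Yield (train, validation) pairs; each item is held out exactly once."""
    items = list(items)
    for i in range(k):
        validation = items[i::k]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, validation

data = list(range(100))
train, test = holdout_split(data)
folds = list(k_fold(data, k=5))
print(len(train), len(test), len(folds))
```

With k = 5, each fold trains on 80% of the data and validates on the remaining 20%, so the ratio matches the single split while every sample still gets a turn in the validation set.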

Hyperparameter Tuning Heuristics

Selecting hyperparameters is often guided by heuristics before any automated search begins.

Hyperparameter	Common Heuristic
Learning rate	Start with 0.001 for Adam; 0.01 for SGD. The learning rate is often considered the single most important hyperparameter to tune.
Batch size	Start at 32 and increase in powers of 2 (64, 128, 256). Larger batches use GPU memory more efficiently; smaller batches add a regularizing effect.
Number of hidden layers	Start with 1 or 2 layers; add more if underfitting. Deeper networks often generalize better but are harder to optimize.
k in k-nearest neighbors	Use k = sqrt(N), where N is the number of training samples. Larger k reduces noise sensitivity; smaller k captures local patterns.
Number of trees in random forest	Start with 100 or 500 trees. Increasing beyond a certain point yields diminishing returns.
Dropout rate	0.5 for hidden layers, 0.2 for input layers. Less effective on very small or very large datasets.
Momentum	Start with 0.9. Values of 0.5, 0.9, and 0.99 are common checkpoints to try.

These starting points save time by narrowing the search space before more systematic methods like grid search, random search, or Bayesian optimization take over.
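One way to picture this narrowing is a random search centered on the heuristic defaults. The sketch below makes several assumptions: `toy_score` is a hypothetical stand-in for a real validation metric, and the sampling ranges (one order of magnitude either side of the default learning rate) are an illustrative choice.

```python
import math
import random

# Starting points taken from the table above.
DEFAULTS = {"learning_rate": 1e-3, "batch_size": 32, "momentum": 0.9}

def k_for_knn(n_samples):
    """k = sqrt(N) rule of thumb, rounded to an integer of at least 1."""
    return max(1, round(math.sqrt(n_samples)))

def toy_score(cfg):
    """Hypothetical validation score that peaks at a learning rate of 3e-3."""
    return -(math.log10(cfg["learning_rate"]) - math.log10(3e-3)) ** 2

def random_search(score, n_trials=20, seed=0):
    """Begin at the heuristic defaults, then sample configurations near them."""
    rng = random.Random(seed)
    best_score, best_cfg = score(DEFAULTS), dict(DEFAULTS)
    for _ in range(n_trials):
        trial = {
            # Log-uniform sample within one decade of the default.
            "learning_rate": DEFAULTS["learning_rate"] * 10 ** rng.uniform(-1, 1),
            "batch_size": rng.choice([32, 64, 128, 256]),
            "momentum": rng.choice([0.5, 0.9, 0.99]),
        }
        trial_score = score(trial)
        if trial_score > best_score:
            best_score, best_cfg = trial_score, trial
    return best_score, best_cfg

best_score, best_cfg = random_search(toy_score)
print(k_for_knn(10_000), best_cfg)
```

Seeding the search with the defaults means the result can never be worse than the rule of thumb on the chosen metric.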

Feature Engineering Heuristics

Feature engineering is the process of creating or transforming input variables to improve model performance. Common heuristics include:

Remove features with near-zero variance, because they carry almost no predictive information.
Log-transform heavily skewed numerical features to reduce the influence of outliers.
One-hot encode categorical variables with fewer than 10 to 15 unique values; use target encoding or embeddings for higher-cardinality features.
Normalize or standardize features before using distance-based algorithms (k-NN, SVM) or models trained by gradient descent, so that features measured on large scales do not dominate the others.
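Three of these rules (variance filtering, log transformation, and standardization) can be sketched with the standard library. The column names, the sample values, and the variance threshold are hypothetical.

```python
import math
import statistics

def drop_near_zero_variance(columns, threshold=1e-8):
    """Keep only the columns whose population variance exceeds the threshold."""
    return {
        name: values
        for name, values in columns.items()
        if statistics.pvariance(values) > threshold
    }

def log_transform(values):
    """Compress a right-skewed feature; assumes every value is >= 0."""
    return [math.log1p(v) for v in values]

def standardize(values):
    """Rescale to zero mean and unit variance; assumes the values are not all equal."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Made-up columns: one constant, one heavily skewed.
columns = {"constant": [1, 1, 1, 1], "income": [10, 100, 1000, 100000]}
kept = drop_near_zero_variance(columns)
z = standardize(log_transform(kept["income"]))
print(list(kept), [round(v, 2) for v in z])
```

Applying the log transform before standardizing keeps the single very large income from dominating the mean and standard deviation.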

Architecture Design Heuristics for Deep Learning

Deep learning practitioners follow several rules of thumb when designing neural network architectures:

Use ReLU (Rectified Linear Unit) as the default activation function for hidden layers. It reduces the vanishing gradient problem and is computationally cheap.
Prefer cross-entropy over mean squared error as the loss function for classification tasks, because it produces stronger gradients when predictions are wrong.
Always use early stopping: monitor validation loss and stop training when it begins to increase. This is one of the most efficient regularization techniques.
Apply batch normalization to stabilize and accelerate training. When using batch normalization, dropout may be redundant.
For a new problem similar to a well-studied one, start by copying a proven architecture and fine-tuning it rather than designing from scratch (transfer learning).
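The early-stopping rule in the list above is easy to express in isolation. This sketch assumes the validation losses are already available as a list and adds a `patience` parameter (a common refinement, not something specified above) so that a single noisy epoch does not end training.

```python
def early_stopping(val_losses, patience=2):
    """Return (best_epoch, stopped_at) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = None
    stopped_at = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        stopped_at = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss has stopped improving
    return best_epoch, stopped_at

# Made-up loss curve: improves, then rises for two epochs.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.50]
best_epoch, stopped_at = early_stopping(losses)
print(best_epoch, stopped_at)  # 2 4
```

Training halts at epoch 4 and the weights from epoch 2 would be restored. The lower loss at epoch 5 is never seen, which shows the trade-off a small patience value makes.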

Metaheuristics

Metaheuristics are higher-level strategies that guide subordinate heuristic procedures to explore large and complex search spaces more effectively. Unlike problem-specific heuristics, metaheuristics are general-purpose frameworks that can be applied to a wide range of optimization problems. Many draw inspiration from natural processes in physics, biology, and social behavior.

Simulated Annealing

Simulated annealing mimics the metallurgical process of slowly cooling a material to reduce defects. The algorithm starts with a high "temperature" that allows it to accept worse solutions with significant probability, helping it escape local optima. As the temperature decreases over time, the algorithm becomes increasingly selective and typically settles on a near-optimal solution, although practical cooling schedules do not guarantee this. Simulated annealing is effective for combinatorial problems like the Traveling Salesman Problem and circuit design.
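A minimal sketch of the acceptance rule follows. It assumes the usual Metropolis criterion (accept a worse candidate with probability exp(-delta / T)) and a geometric cooling schedule; the one-dimensional cost function, the step size, and all parameter values are hypothetical choices made for illustration.

```python
import math
import random

def simulated_annealing(cost, neighbor, x0, t0=10.0, cooling=0.99, steps=2000, seed=0):
    """Minimize cost, accepting uphill moves with probability exp(-delta / T)."""
    rng = random.Random(seed)
    current, current_cost = x0, cost(x0)
    best, best_cost = current, current_cost
    temperature = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= cooling  # geometric cooling schedule
    return best, best_cost

def bumpy(x):
    """Hypothetical cost function with several local minima."""
    return x * x + 10 * math.sin(3 * x)

def step(x, rng):
    """Propose a nearby point."""
    return x + rng.uniform(-0.5, 0.5)

x0 = 4.0
best_x, best_value = simulated_annealing(bumpy, step, x0)
print(round(best_x, 3), round(best_value, 3))
```

Early on, the high temperature makes exp(-delta / T) close to 1, so the search wanders freely across the bumps. Late in the run the same expression is close to 0 and only improvements are accepted.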

Genetic Algorithms

A genetic algorithm maintains a population of candidate solutions ("chromosomes") that evolve over successive generations through three operators inspired by biological evolution:

Selection: fitter individuals are more likely to be chosen as parents.
Crossover: pairs of parents exchange portions of their solutions to produce offspring.
Mutation: random changes are introduced to maintain diversity and prevent premature convergence.

Genetic algorithms are used in feature selection, hyperparameter tuning, scheduling, and engineering design optimization.
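The three operators can be sketched on the OneMax toy problem (maximize the number of 1 bits in a bit string). Everything here is an illustrative assumption: tournament selection of size 2, one-point crossover, bit-flip mutation, the parameter values, and the elitism step that copies the best individual unchanged into the next generation.

```python
import random

def genetic_algorithm(fitness, n_bits=20, pop_size=30, generations=40,
                      mutation_rate=0.02, seed=0):
    """Evolve bit strings; return the best individual and the best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(map(fitness, pop))]

    def select():
        # Selection: the fitter of two random individuals becomes a parent.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        elite = max(pop, key=fitness)
        children = [elite[:]]  # elitism: keep the current best unchanged
        while len(children) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]  # crossover: one-point exchange
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = children
        history.append(max(map(fitness, pop)))
    return max(pop, key=fitness), history

best, history = genetic_algorithm(sum)
print(sum(best), history[0], history[-1])
```

Because of the elitism step, the best fitness recorded in `history` can never decrease from one generation to the next.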

Other Metaheuristics

Metaheuristic	Inspiration	Key Idea
Tabu search	Memory-based	Maintains a list of recently visited solutions to prevent cycling and force exploration of new regions
Particle swarm optimization	Bird flocking	Candidate solutions ("particles") move through the search space, attracted toward the best positions found by themselves and their neighbors
Ant colony optimization	Ant foraging	Artificial ants deposit "pheromone" on promising paths, reinforcing routes that lead to good solutions
Differential evolution	Population-based	Creates new candidates by combining differences between existing population members
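As one example from the table, particle swarm optimization can be sketched in a few lines. This is the global-best variant, in which the whole swarm counts as every particle's neighborhood. The inertia and attraction coefficients, the search bounds, and the sphere test function are hypothetical choices.

```python
import random

def particle_swarm(cost, dim=2, n_particles=15, iters=100,
                   w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize cost; return (best_position, best_cost, initial_best_cost)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]            # best position seen by each particle
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]  # best position seen by the swarm
    initial_cost = gbest_cost
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost, initial_cost

def sphere(x):
    """Simple test function with its minimum at the origin."""
    return sum(v * v for v in x)

gbest, gbest_cost, initial_cost = particle_swarm(sphere)
print(gbest_cost, initial_cost)
```

Each velocity update blends three pulls: the particle's previous direction, its own best position, and the swarm's best position.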

Heuristics in Human Decision Making

The concept of heuristics extends beyond computer science into cognitive psychology. In 1974, psychologists Amos Tversky and Daniel Kahneman published their landmark paper "Judgment Under Uncertainty: Heuristics and Biases," identifying three major cognitive heuristics that people rely on when making decisions under uncertainty:

Representativeness: judging the probability that something belongs to a category based on how similar it looks to a typical member, often ignoring base rates.
Availability: estimating the frequency or likelihood of an event based on how easily examples come to mind. Dramatic events (plane crashes) are overestimated because they are more memorable.
Anchoring and adjustment: starting from an initial value (the "anchor") and adjusting insufficiently toward the correct answer.

These cognitive heuristics are "highly economical and usually effective," as Kahneman and Tversky noted, but they lead to systematic and predictable errors (cognitive biases). Kahneman received the 2002 Nobel Memorial Prize in Economic Sciences for this work, which has influenced fields from behavioral economics to the design of AI systems. Understanding human heuristic biases has also informed research on bias in machine learning models.

Advantages and Limitations

Advantages

Speed: heuristics produce solutions in seconds or minutes where exact methods might take hours, days, or longer.
Scalability: they handle large problem instances that are computationally intractable for exact algorithms.
Simplicity: many heuristics are easy to implement and understand.
Flexibility: metaheuristics can be adapted to new problem domains with minimal modification.

Limitations

No optimality guarantee: the solution may be far from optimal, and there is often no way to know how far.
Parameter sensitivity: many heuristics (and metaheuristics) have their own parameters that require tuning.
Inconsistency: a heuristic that works well on one dataset or problem instance may perform poorly on another.
Lack of theoretical grounding: some heuristics are based purely on empirical observation, making it difficult to predict when they will fail.

Applications

Heuristics are used across virtually every domain of computer science and artificial intelligence:

Pathfinding and navigation: A* and its variants power GPS routing, video game AI, and robotic motion planning.
Scheduling and logistics: metaheuristics solve vehicle routing, job-shop scheduling, and airline crew assignment problems.
Cybersecurity: antivirus software uses heuristic scanning to detect malware by identifying suspicious behavioral patterns, even for previously unknown threats.
Compiler optimization: compilers use heuristics to decide register allocation, instruction scheduling, and loop unrolling strategies.
Machine learning pipelines: heuristics guide data splitting, feature selection, hyperparameter initialization, and model selection at every stage of the ML workflow.
Game AI: evaluation functions allow game-playing programs to assess positions without exhaustively searching the game tree.

References

Tversky, A., & Kahneman, D. (1974). "Judgment Under Uncertainty: Heuristics and Biases." *Science*, 185(4157), 1124-1131.
Russell, S., & Norvig, P. (2021). *Artificial Intelligence: A Modern Approach* (4th ed.). Pearson. Chapters on informed search and game playing.
Hart, P. E., Nilsson, N. J., & Raphael, B. (1968). "A Formal Basis for the Heuristic Determination of Minimum Cost Paths." *IEEE Transactions on Systems Science and Cybernetics*, 4(2), 100-107.
Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). "Optimization by Simulated Annealing." *Science*, 220(4598), 671-680.
Holland, J. H. (1975). *Adaptation in Natural and Artificial Systems*. University of Michigan Press.
Glover, F. (1986). "Future Paths for Integer Programming and Links to Artificial Intelligence." *Computers & Operations Research*, 13(5), 533-549.
Bergstra, J., & Bengio, Y. (2012). "Random Search for Hyper-Parameter Optimization." *Journal of Machine Learning Research*, 13, 281-305.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). *Deep Learning*. MIT Press. Chapter 11: Practical Methodology.
"Heuristic (computer science)." *Wikipedia*. https://en.wikipedia.org/wiki/Heuristic_(computer_science)
Cornell University Computational Optimization Open Textbook. "Heuristic Algorithms." https://optimization.cbe.cornell.edu/index.php?title=Heuristic_algorithms
