# Pattern Recognition

> Source: https://aiwiki.ai/wiki/pattern_recognition
> Updated: 2026-06-24
> Categories: Artificial Intelligence, Computer Science, Machine Learning, Statistics
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Pattern recognition** is the automatic discovery of regularities in data through the use of computer algorithms, and the use of those regularities to take actions such as classifying data into categories, segmenting images, transcribing speech, or flagging anomalies in measured signals. Christopher Bishop opens the standard 2006 textbook by defining the field as "concerned with the automatic discovery of regularities in data through the use of computer algorithms and with the use of these regularities to take actions such as classifying the data into different categories." [12] It is closely related to [machine learning](/wiki/machine_learning) and statistical learning, and historically grew out of work in statistics, signal processing, and engineering aimed at building systems that could read printed characters, recognize spoken words, identify fingerprints, and classify radar returns.

A pattern recognition system typically takes raw measurements (pixels, audio samples, sensor readings, text), transforms them into a representation suitable for analysis through [feature extraction](/wiki/feature_extraction) or learned encodings, and then applies a decision procedure that maps representations to outputs such as discrete labels, real-valued predictions, or structured objects like sequences and graphs. The core task is [classification](/wiki/classification): assigning each observation a label drawn from a finite set. The decision procedure may be designed by hand, derived from probabilistic models, or learned from examples. Modern pattern recognition systems are dominated by learned representations produced by deep neural networks, but classical statistical and structural methods remain important in domains where data are scarce, latency budgets are tight, or interpretability is required.
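
As a rough illustration of the pipeline described above, the sketch below maps raw signals to a small feature vector and then applies a decision procedure learned from labeled examples. The feature choices (mean level and peak-to-peak range), the nearest-centroid rule, the function names, and the toy signals are all assumptions made for this example; real systems use far richer representations.

```python
from statistics import mean

def extract_features(signal):
    """Map a raw 1-D signal to a feature vector: (mean level, peak-to-peak range)."""
    return (mean(signal), max(signal) - min(signal))

def fit_centroids(signals, labels):
    """Learn one prototype (the feature-space mean) per class from labeled examples."""
    by_class = {}
    for signal, label in zip(signals, labels):
        by_class.setdefault(label, []).append(extract_features(signal))
    return {
        label: tuple(mean(dim) for dim in zip(*feats))
        for label, feats in by_class.items()
    }

def classify(signal, centroids):
    """Decision procedure: choose the class whose centroid is nearest in feature space."""
    x = extract_features(signal)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Hypothetical training data: two low-amplitude and two high-amplitude signals.
train_signals = [
    [0.0, 0.1, -0.1, 0.0],
    [0.1, 0.0, 0.0, -0.1],
    [0.0, 2.0, -2.0, 1.5],
    [1.8, -1.9, 0.2, -2.1],
]
train_labels = ["quiet", "quiet", "active", "active"]

centroids = fit_centroids(train_signals, train_labels)
prediction = classify([0.05, -0.05, 0.1, 0.0], centroids)
```

Keeping `extract_features` separate from `classify` mirrors the classical split between representation and decision; in an end-to-end learned system the two stages would be trained jointly instead.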

Pattern recognition is applied across a wide range of areas, including [computer vision](/wiki/computer_vision), [speech recognition](/wiki/speech_recognition), [optical character recognition](/wiki/optical_character_recognition), bioinformatics, medical diagnostics, financial fraud detection, industrial inspection, and document analysis. Its central problems include classification, regression, [clustering](/wiki/clustering), sequence labeling, detection, and density estimation. The field has its own conferences and journals, such as the International Conference on Pattern Recognition (ICPR) and the journal Pattern Recognition, and it overlaps strongly with the broader machine learning, computer vision, and pattern analysis communities.

## What is pattern recognition? Definition and scope

The term pattern recognition refers to a family of computational problems in which a system must assign meaningful structure to observations. In the most common formulation, the system observes an input vector or sequence and produces a label drawn from a finite set, a real-valued estimate, a structured prediction such as a parse tree or segmentation mask, or a probability distribution over candidate outputs. The defining characteristic is that the mapping is learned or derived from data and statistical assumptions, rather than specified by an explicit set of hand-coded rules.

Pattern recognition shares its core tools and theory with [supervised learning](/wiki/supervised_learning) and [unsupervised learning](/wiki/unsupervised_learning) in machine learning, with statistical inference, and with signal processing. The field also covers structural and syntactic methods that operate on graphs, strings, and grammars rather than fixed-length feature vectors. Closely related areas include [computer vision](/wiki/computer_vision), where the inputs are images or video; [natural language processing](/wiki/nlp), where the inputs are text or speech transcripts; bioinformatics, where the inputs are biological sequences; and time series analysis, where the inputs are signals indexed by time.

In engineering practice, pattern recognition systems are usually evaluated on quantitative metrics such as classification accuracy, error rate, precision and recall, F1 score, area under the receiver operating characteristic curve, mean average precision, word error rate, or domain-specific scores. The combination of a problem definition, a dataset, and an evaluation protocol is sometimes called a benchmark, and progress in the field has often been driven by competitive evaluation on shared benchmarks.

## History

### Statistical and engineering roots

The statistical foundations of pattern recognition were laid in the first half of the 20th century. In 1936, the British statistician Ronald A. Fisher published "The Use of Multiple Measurements in Taxonomic Problems" in the Annals of Eugenics (volume 7, pages 179 to 188), introducing what is now called Fisher's linear discriminant. [1] Fisher used measurements of iris flowers collected by Edgar Anderson to derive a linear combination of features that maximally separated species, establishing one of the earliest formal classifiers and a key technique in [linear discriminant analysis](/wiki/lda). The resulting iris dataset remains one of the most widely used teaching datasets in statistics and machine learning.

In the late 1950s, work on machine perception began to emerge as a distinct activity. Frank Rosenblatt, a research psychologist at the Cornell Aeronautical Laboratory in Buffalo, New York, described his [perceptron](/wiki/perceptron) in a 1957 technical report titled "The Perceptron: A Perceiving and Recognizing Automaton" and in a 1958 article in Psychological Review, "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain." [2] The perceptron was both a theoretical model and a hardware system, and it demonstrated that simple [neural network](/wiki/neural_network) models could be trained from examples to classify visual stimuli. The same period saw the construction of early optical character recognition machines for sorting mail and reading financial documents.

### Establishment of the field

During the 1960s and early 1970s, pattern recognition consolidated into a recognizable research community. Statistical decision theory, Bayes classifiers, nearest neighbor rules, and feature selection methods were developed and applied to character recognition, speech, and remote sensing. In 1973, Richard O. Duda and Peter E. Hart published "Pattern Classification and Scene Analysis" with John Wiley and Sons. [3] The 512-page book provided a unified treatment of Bayesian decision theory, parametric and nonparametric density estimation, clustering, and scene analysis, and it became the standard graduate textbook in the field for the next two decades.

The community organized itself institutionally during the 1970s. The First International Joint Conference on Pattern Recognition was held from October 30 to November 1, 1973, at the Mayflower Hotel in Washington, D.C., and the conference series later became the International Conference on Pattern Recognition (ICPR). [5] The International Association for Pattern Recognition (IAPR) was formally organized around the 3rd International Joint Conference on Pattern Recognition in Coronado in 1976, under the leadership of Purdue University computer scientist King-Sun Fu, and came into official existence in January 1978. [4]

### Neural networks, kernels, and ensembles

The 1980s and 1990s saw a revival of neural network methods alongside the rise of statistical learning theory and ensemble approaches. Hidden Markov models became the dominant framework for speech recognition, supporting both acoustic modeling and word-level decoding. In 1989, Yann LeCun and colleagues published "Backpropagation Applied to Handwritten Zip Code Recognition" in Neural Computation, describing a [convolutional neural network](/wiki/convolutional_neural_network) trained end to end with backpropagation on handwritten digits from the United States Postal Service. [6] Trained on 7,291 images and tested on 2,007, the network reported a test error of 5.0 percent and a training error of 0.14 percent, an early real-world success for trainable convolutional architectures. [6]

In 1995, Corinna Cortes and Vladimir Vapnik published "Support-Vector Networks" in Machine Learning (volume 20, pages 273 to 297), presenting the modern soft-margin [support vector machine](/wiki/support_vector_machine) and demonstrating its effectiveness on optical character recognition benchmarks. [7] They described a learning machine in which "input vectors are non-linearly mapped to a very high-dimension feature space," where "a linear decision surface is constructed" with properties that ensure high generalization. [7] The same year, Yoav Freund and Robert Schapire introduced AdaBoost, with an extended journal version, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting," appearing in the Journal of Computer and System Sciences in 1997. [8] Boosting and the closely related [random forest](/wiki/random_forest) algorithm, introduced by Leo Breiman in 2001 in Machine Learning (volume 45, pages 5 to 32), made ensemble methods a standard part of the practical pattern recognition toolkit. [8]

### Computer vision benchmarks and the deep learning era

In 2001, Paul Viola and Michael Jones published "Rapid Object Detection Using a Boosted Cascade of Simple Features" at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). [10] Their detector combined integral images, Haar-like features, and a cascade of AdaBoost classifiers to achieve real-time face detection on consumer hardware, and it became one of the most widely deployed computer vision algorithms of the 2000s. Hand-crafted features such as SIFT (David Lowe, 1999 and 2004) [9] and HOG (Navneet Dalal and Bill Triggs, 2005) [11] provided robust descriptors for object recognition, image matching, and pedestrian detection, often paired with linear classifiers or support vector machines.

In 2006, Christopher Bishop published "Pattern Recognition and Machine Learning" with Springer. [12] The book gave a Bayesian-flavored treatment of probabilistic graphical models, kernel methods, mixture models, and approximate inference, and it became a standard reference at the graduate level alongside Duda and Hart.

The modern deep learning era is usually dated to 2012, when Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton's AlexNet won the ImageNet Large Scale Visual Recognition Challenge by a large margin, achieving a top-5 error rate of about 15.3 percent compared with 26.2 percent for the runner-up. [14] Subsequent years saw the rapid adoption of deep convolutional networks across computer vision, recurrent and convolutional models for speech recognition that surpassed Gaussian mixture and hidden Markov baselines, and end-to-end deep learning for many traditional pattern recognition tasks. From 2017 onward, transformer architectures and large pretrained foundation models extended these gains to language, multimodal data, and increasingly to vision, blurring the boundary between [deep learning](/wiki/deep_learning) and the rest of pattern recognition.

## What are the main types of pattern recognition tasks?

Pattern recognition problems are usually grouped by the structure of their outputs and the form of the available supervision.

| Task | Output type | Typical examples |
|------|-------------|------------------|
| Classification | Discrete label from a finite set | Digit recognition, spam detection, image categorization |
| Regression | Real-valued scalar or vector | Age estimation from images, dose prediction, signal regression |
| Clustering | Partition or hierarchy of data points | Customer segmentation, gene expression grouping |
| Density estimation | Probability distribution | Anomaly detection, generative modeling |
| Sequence labeling | Label per element in a sequence | Phoneme recognition, named entity recognition, part-of-speech tagging |
| Structured prediction | Parse trees, alignments, graphs | Syntactic parsing, optical music recognition |
| Detection | Set of objects with locations | Pedestrian detection, face detection, sound event detection |
| Segmentation | Per-pixel or per-token label | Medical image segmentation, semantic segmentation |
| Anomaly and novelty detection | Score or binary flag | Fraud detection, defect detection, intrusion detection |

Classification is the canonical pattern recognition task and the focus of much of the classical literature. Regression and density estimation extend the framework to real-valued and probabilistic outputs. Clustering provides unsupervised structure when labels are absent. Sequence and structured prediction generalize classification to outputs with internal structure, while detection and segmentation are central to computer vision. Anomaly and novelty detection address situations in which one class is represented only by a small or unrepresentative sample, or in which no labeled examples are available at all.

## What approaches are used in pattern recognition?

Pattern recognition methods can be grouped into broad families that differ in their assumptions, representations, and computational tools. Most modern systems combine ideas from several of these families.

| Approach | Key idea | Representative methods |
|----------|----------|------------------------|
| Statistical pattern recognition | Treat features as samples from probability distributions and apply decision theory | Bayes classifier, naive Bayes, Gaussian mixture models, linear and quadratic discriminant analysis, logistic regression |
| Structural and syntactic | Represent patterns as structured objects and use grammars or graph matching | String grammars, attributed graphs, edit distance, graph kernels |
| Template matching | Compare inputs to stored prototypes using a similarity measure | Cross-correlation, dynamic time warping, nearest prototype classifiers |
| Instance-based learning | Predict by looking up similar training examples | k-nearest neighbor, locally weighted regression |
| Kernel methods | Implicitly map inputs into high-dimensional spaces via kernel functions | Support vector machines, Gaussian processes, kernel ridge regression |
| Neural and deep methods | Learn layered representations directly from data | Perceptron, multilayer perceptron, convolutional neural network, recurrent neural network, transformer |
| Ensemble methods | Combine many classifiers to reduce variance or bias | Bagging, boosting (AdaBoost, gradient boosting), random forests, stacking |
| Decision tree methods | Recursively partition the input space | CART, ID3, C4.5, decision stumps |

Statistical pattern recognition emphasizes probability models, including [Bayesian inference](/wiki/bayesian_inference), [naive Bayes](/wiki/naive_bayes) classifiers, and Gaussian mixture models. Structural and syntactic methods are useful for chemical structures, document layouts, and other domains where patterns are naturally graphs or strings. Template matching and the [k-nearest neighbors](/wiki/k_nearest_neighbors) classifier illustrate how very simple ideas can be effective in low-dimensional, well-curated problems. [Decision tree](/wiki/decision_tree) methods and their ensembles, including boosting and bagging, often serve as strong baselines on tabular data. Kernel methods and support vector machines dominated many benchmarks in the 1990s and 2000s, while neural and deep methods are now the default in vision, speech, and language.
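
To make the instance-based idea concrete, here is a minimal sketch of a k-nearest neighbor classifier using Euclidean distance and a simple majority vote. The function name, the choice of `k=3`, and the toy two-dimensional points are assumptions for illustration only; practical implementations add feature scaling, tie handling, and spatial indexing.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, well-separated two-class data in two dimensions.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
labels = ["a", "a", "a", "b", "b", "b"]

near_a = knn_predict(points, labels, (1.1, 1.0))
near_b = knn_predict(points, labels, (3.9, 4.1))
```

There is no training step beyond storing the examples; all the work happens at query time, which is why the method suits small, low-dimensional problems better than large ones.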

## Feature extraction and representation

The choice of representation has long been recognized as central to pattern recognition. Two broad strategies have shaped the field: hand-crafted feature extraction and learned representations.

### Hand-crafted features

In classical pipelines, raw measurements were transformed into feature vectors using domain knowledge. For images, edge maps, color histograms, Gabor filters, Haar-like features, and gradient-based descriptors were all common. SIFT, introduced by David Lowe in 1999 and developed in detail in his 2004 paper "Distinctive Image Features from Scale-Invariant Keypoints" in the International Journal of Computer Vision, provided keypoints and descriptors invariant to scale and rotation and robust to moderate viewpoint and illumination changes. [9] HOG, introduced by Navneet Dalal and Bill Triggs in their 2005 CVPR paper "Histograms of Oriented Gradients for Human Detection," computed local histograms of gradient orientations on a dense grid and proved especially effective for pedestrian detection. [11]
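
The core operation behind gradient-orientation descriptors can be sketched in a few lines. The example below computes a magnitude-weighted histogram of gradient orientations for a single image cell. It is a deliberately simplified illustration and not the full HOG pipeline: the 9-bin unsigned (0 to 180 degree) setting, central-difference gradients, and hard bin assignment are assumptions chosen for brevity, and steps such as block normalization and interpolated voting are omitted.

```python
import math

def orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    `cell` is a 2-D list of grayscale intensities. Gradients use central
    differences on interior pixels only; border pixels are skipped.
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    rows, cols = len(cell), len(cell[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]  # horizontal intensity change
            gy = cell[r + 1][c] - cell[r - 1][c]  # vertical intensity change
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Unsigned orientation in [0, 180): opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

# Hypothetical cell containing a vertical edge (dark left half, bright right half).
vertical_edge = [[0, 0, 0, 10, 10, 10] for _ in range(4)]
hist = orientation_histogram(vertical_edge)
```

For the vertical edge all gradient energy lands in the first bin (orientation near 0 degrees), while a horizontal edge would concentrate it in the middle bin (orientation near 90 degrees). That contrast is what makes such histograms useful as shape descriptors.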

For audio, mel-frequency cepstral coefficients (MFCCs) became the standard front end for speech recognition and speaker identification, summarizing the short-term spectrum on a perceptual frequency scale. For text, hand-crafted features included bag-of-words representations, n-gram statistics, term frequency inverse document frequency weights, and part-of-speech tags. Other domains contributed their own feature engineering traditions, ranging from k-mer profiles in bioinformatics to wavelet coefficients in time-series analysis. The construction of such features is often called [feature engineering](/wiki/feature_engineering) and remains important in domains where labeled data are limited.
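
As an example of a hand-crafted text representation, the sketch below computes term frequency inverse document frequency weights for a tiny tokenized corpus. The specific weighting (term count divided by document length, times the natural log of the number of documents over the document frequency) is one common variant chosen for illustration; other variants add smoothing or normalization. The function name and sample sentences are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / document length and idf = log(N / document frequency).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "the cat sat".split(),
    "the dog barked".split(),
    "the cat purred".split(),
]
vectors = tf_idf(docs)
```

A term that appears in every document, such as "the" here, receives weight zero, while a term unique to one document, such as "sat", receives the largest weight in that document.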

### Dimensionality reduction

High-dimensional feature vectors are often projected into lower-dimensional spaces to reduce noise, computation, and the risk of overfitting. [Principal component analysis](/wiki/pca) finds orthogonal linear directions of maximum variance and underlies many classical pipelines. Linear discriminant analysis seeks projections that maximize between-class variance relative to within-class variance and is often used as a supervised counterpart to PCA. Nonlinear methods include kernel PCA, Isomap, locally linear embedding, t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP). The nonlinear methods are especially useful for visualization and exploratory analysis.
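
The idea of a direction of maximum variance can be illustrated with a minimal sketch that estimates the first principal component by power iteration on the sample covariance matrix. This is only one of several ways to compute PCA (library implementations typically use a singular value decomposition), and the sketch assumes a unique largest eigenvalue and a start vector that is not orthogonal to the leading direction. The function name and the toy data are hypothetical.

```python
import math

def first_principal_component(points, n_iter=200):
    """Estimate the unit direction of maximum variance with power iteration."""
    n, d = len(points), len(points[0])
    # Center the data so variance is measured about the mean.
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Repeatedly apply the covariance matrix and renormalize.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical 2-D points scattered around the line y = 2x.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
component = first_principal_component(data)

# Project each centered point onto the component to get a 1-D representation.
mean_x = sum(p[0] for p in data) / len(data)
mean_y = sum(p[1] for p in data) / len(data)
projected = [
    (p[0] - mean_x) * component[0] + (p[1] - mean_y) * component[1] for p in data
]
```

Because the points lie close to a line of slope 2, the recovered direction has a second coordinate roughly twice the first, and the one-dimensional projections preserve the ordering of the points along that line.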

### Learned representations

A recurring theme in modern pattern recognition is the replacement of hand-crafted features by representations learned from data. Multilayer neural networks already learn intermediate features as a byproduct of training. Autoencoders explicitly learn compact codes that reconstruct the input, and denoising and variational variants impose additional structure on the latent space. Self-supervised pretraining, including masked language modeling, masked image modeling, and contrastive objectives such as those used in SimCLR and CLIP, learns reusable representations from unlabeled data that can then be adapted to many downstream tasks. In contemporary systems, the boundary between feature extraction and classification has largely disappeared: a single deep network is trained end to end to map inputs directly to outputs.

## How are pattern recognition systems evaluated?

Evaluating a pattern recognition system requires both a clear definition of the problem and a careful experimental protocol. Evaluation typically takes place on a held-out test set, separate from the data used to fit the model and to choose hyperparameters.

| Metric | Definition | Used for |
|--------|------------|----------|
| Accuracy | Fraction of correctly labeled examples | Balanced classification problems |
| Precision | True positives divided by predicted positives | Information retrieval, detection |
| Recall (sensitivity) | True positives divided by actual positives | Medical screening, search |
| F1 score | Harmonic mean of precision and recall | Imbalanced classification |
| Specificity | True negatives divided by actual negatives | Diagnostic testing |
| ROC curve and AUC | True positive rate versus false positive rate over thresholds | Threshold-independent evaluation |
| Precision-recall curve | Precision versus recall over thresholds | Imbalanced or rare-event classification |
| Confusion matrix | Counts of true and predicted labels | Multi-class diagnostic analysis |
| Log loss | Negative log probability assigned to true labels | Probabilistic classifiers |
| Word error rate | Edit distance per reference word | Speech recognition |
| Mean average precision | Average area under precision-recall curves | Object detection, retrieval |

The confusion matrix is a fundamental diagnostic tool, tabulating how often examples of each true class are assigned to each predicted class, so that correct predictions fall on the diagonal and confusions fall off it. Receiver operating characteristic curves plot true positive rate against false positive rate as the decision threshold is varied, with area under the curve (AUC) providing a single threshold-independent summary.
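
Several of the metrics in the table can be computed directly from binary confusion counts. The sketch below is a minimal illustration assuming 0/1 labels. AUC is computed through its rank interpretation, the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The function names, the 0.5 decision threshold, and the example scores are assumptions for this illustration.

```python
def binary_confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean; 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for eight examples.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

counts = binary_confusion(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(*counts[:3])
auc = roc_auc(y_true, scores)
```

Precision, recall, and F1 depend on the chosen threshold, whereas `roc_auc` uses only the ordering of the scores, which is why AUC serves as a threshold-independent summary.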

Resampling techniques are used to make efficient use of finite data. In k-fold cross-validation, the data are split into k disjoint folds; the model is trained on k − 1 folds and evaluated on the remaining fold, and the procedure is repeated k times so that each fold serves exactly once as the test fold, with the k scores usually averaged. Leave-one-out cross-validation is the limiting case in which k equals the number of examples, so each example serves as its own test fold. Stratified cross-validation preserves class proportions across folds. A train, validation, and test split is the canonical setup in modern practice, with the validation set used to tune hyperparameters and the test set reserved for a final unbiased evaluation.
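
The fold construction behind k-fold cross-validation can be sketched as an index generator. This minimal version assigns contiguous index ranges to folds without shuffling or stratification, which keeps the example deterministic; in practice the indices are usually shuffled first. The function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k disjoint folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Ten examples split into three folds of sizes 4, 3, and 3.
splits = list(k_fold_indices(10, 3))

# Leave-one-out is the special case k == n_samples.
loo_splits = list(k_fold_indices(5, 5))
```

Each example appears in exactly one test fold, so averaging the per-fold scores uses every example for evaluation once while never testing a model on data it was trained on.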

The bias-variance tradeoff describes how the expected generalization error of a model decomposes into bias from systematic mismatch with the true function, variance from sensitivity to the training sample, and irreducible noise. Methods such as regularization, ensemble averaging, and early stopping aim to navigate this tradeoff. Statistical significance tests, such as McNemar's test for paired classifier comparisons, are used to assess whether observed performance differences are likely to be real.
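
For squared-error loss the decomposition can be written explicitly. The notation below is an assumption introduced for this illustration: the target is modeled as y = f(x) + ε with zero-mean noise of variance σ² that is independent of the training sample D, and \hat{f}_D denotes the model fitted to D.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\bigl(y - \hat{f}_D(x)\bigr)^2\right]
= \underbrace{\bigl(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\bigr)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\!\left[\bigl(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\bigr)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The first term measures how far the average fitted model is from the true function, the second how much the fitted model fluctuates from one training sample to another, and the third is the noise floor that no model can remove.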

## What is pattern recognition used for? Applications

Pattern recognition powers a wide range of practical systems. The table below lists representative application areas and characteristic methods.

| Application | Description | Common methods |
|-------------|-------------|----------------|
| Optical character recognition | Conversion of printed or handwritten text in images to machine-readable form | CNNs, sequence models, hidden Markov models |
| Speech recognition | Transcription of spoken language into text | Hidden Markov models, deep neural acoustic models, end-to-end transformers |
| Face recognition | Identification or verification of individuals from facial images | CNN embeddings, metric learning, Viola-Jones for detection |
| Fingerprint recognition | Matching fingerprint impressions for identification | Minutiae extraction, ridge orientation analysis, neural matchers |
| Iris recognition | Biometric identification from iris texture | Gabor filtering, Hamming distance on iris codes |
| Handwriting recognition | Recognition of handwritten characters and words, online or offline | CNNs, recurrent networks, connectionist temporal classification |
| Medical image analysis | Detection, segmentation, and diagnosis from radiology, pathology, and other images | CNNs, U-Net, vision transformers, classical filters |
| Document analysis | Layout understanding, table recognition, form processing | Graph neural networks, sequence models, classical OCR pipelines |
| Bioinformatics | Sequence motif discovery, structure prediction, omics analysis | Hidden Markov models, profile HMMs, neural language models |
| Financial fraud detection | Identification of anomalous transactions or accounts | Gradient boosting, autoencoders, graph methods, rule-based systems |
| Industrial inspection | Detection of manufacturing defects in images or sensor data | CNNs, anomaly detection, classical machine vision |
| Remote sensing | Land cover classification and change detection from satellite imagery | Random forests, CNNs, time-series models |
| Radar and sonar | Target detection and classification | Statistical detection, matched filters, deep learning |
| Electronic health records | Phenotype identification, risk prediction | Logistic regression, gradient boosting, sequence models |

Many of these applications combine several techniques. A modern face recognition system, for example, may use a deep convolutional or transformer-based detector to localize faces in an image, an alignment step based on learned keypoints, and a learned embedding network whose outputs are compared using cosine similarity. A speech recognition system may combine an acoustic model, a pronunciation lexicon, and a language model in a single end-to-end network or in a hybrid pipeline.
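
The final comparison step in such a face recognition system can be sketched as a cosine similarity test between two embedding vectors. The four-dimensional vectors and the 0.8 acceptance threshold below are placeholders chosen for illustration: real embedding networks produce much longer vectors, and thresholds are tuned on validation data for a target error rate. The function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def same_identity(embedding_a, embedding_b, threshold=0.8):
    """Accept the pair as the same person if similarity reaches the threshold."""
    return cosine_similarity(embedding_a, embedding_b) >= threshold

# Hypothetical embeddings: an enrolled face, a new image of the same person,
# and an image of a different person.
enrolled = [0.2, 0.9, 0.1, 0.4]
probe_same = [0.25, 0.85, 0.05, 0.45]
probe_other = [0.9, -0.1, 0.4, 0.0]

match = same_identity(enrolled, probe_same)
non_match = same_identity(enrolled, probe_other)
```

Cosine similarity ignores vector length and compares direction only, so two embeddings that differ only in overall magnitude are treated as identical.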

## How does pattern recognition differ from machine learning?

Pattern recognition and machine learning are now closely intertwined, but they emerged from somewhat different intellectual traditions. Pattern recognition has its roots in engineering, signal processing, statistics, and the perceptual sciences, with early work motivated by concrete tasks such as character recognition, fingerprint matching, and radar target classification. Machine learning grew from artificial intelligence, computational learning theory, and the study of adaptive systems, with greater emphasis on inductive inference, search, and computational complexity.

The two communities have converged over time. Statistical learning theory, developed by Vladimir Vapnik and others, provided shared foundations such as VC dimension, structural risk minimization, and uniform convergence bounds. Common methods, including support vector machines, boosting, decision trees, kernel methods, and neural networks, are studied in both communities. Books and courses increasingly treat the subjects together: Bishop's 2006 textbook is titled "Pattern Recognition and Machine Learning," [12] and Kevin Murphy's 2012 book "Machine Learning: A Probabilistic Perspective" covers much of the same ground. [15]

Institutional structures reflect this convergence. The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is one of the leading venues for both communities. The Conference on Computer Vision and Pattern Recognition (CVPR) and the International Conference on Pattern Recognition (ICPR) have always combined pattern recognition with image analysis and machine learning, while NeurIPS and ICML, originally machine learning conferences, now publish substantial amounts of work on traditional pattern recognition problems. In contemporary usage, the phrase "pattern recognition" is often a synonym or near-synonym for "applied machine learning," especially in vision, speech, and biometrics.

## Notable conferences and journals

The pattern recognition community supports a number of long-running conferences and journals, many of which are sponsored by the IAPR or the IEEE Computer Society.

| Venue | Type | Focus |
|-------|------|-------|
| ICPR (International Conference on Pattern Recognition) | Conference | General pattern recognition, biennial since 1973 |
| CVPR (Computer Vision and Pattern Recognition) | Conference | Computer vision and pattern analysis, annual |
| ICCV (International Conference on Computer Vision) | Conference | Computer vision, biennial |
| ECCV (European Conference on Computer Vision) | Conference | Computer vision, biennial |
| NeurIPS (Conference on Neural Information Processing Systems) | Conference | Machine learning, neural computation |
| ICML (International Conference on Machine Learning) | Conference | Machine learning theory and methods |
| ACCV (Asian Conference on Computer Vision) | Conference | Computer vision, IAPR sponsored |
| ICDAR (International Conference on Document Analysis and Recognition) | Conference | Document analysis, IAPR sponsored |
| ICASSP (International Conference on Acoustics, Speech, and Signal Processing) | Conference | Speech and signal processing |
| IEEE TPAMI | Journal | Pattern analysis and machine intelligence |
| Pattern Recognition (Elsevier) | Journal | General pattern recognition |
| Pattern Recognition Letters (Elsevier) | Journal | Short papers in pattern recognition |
| International Journal of Computer Vision (Springer) | Journal | Computer vision |
| Journal of Machine Learning Research | Journal | Machine learning |
| International Journal of Pattern Recognition and Artificial Intelligence | Journal | Pattern recognition and AI |

ICPR has been held roughly every two years since its first meeting in Washington, D.C. in 1973. [5] Recent editions include ICPR 2024 in Kolkata, India, and ICPR 2026 in Lyon, France.

## Notable books

A small number of textbooks have shaped how pattern recognition is taught and practiced. The 1973 book by Duda and Hart, [3] its 2000 second edition with David Stork (retitled "Pattern Classification"), and Bishop's 2006 textbook [12] are widely used at the graduate level. Theodoridis and Koutroumbas provide an alternative treatment with detailed coverage of statistical methods, [13] and Murphy's book offers a unified probabilistic perspective on machine learning that overlaps heavily with classical pattern recognition. [15]

| Book | Authors | Edition and year | Publisher |
|------|---------|------------------|-----------|
| Pattern Classification and Scene Analysis | Richard O. Duda, Peter E. Hart | 1st edition, 1973 | Wiley |
| Pattern Classification | Richard O. Duda, Peter E. Hart, David G. Stork | 2nd edition, 2000 | Wiley |
| Pattern Recognition and Machine Learning | Christopher M. Bishop | 1st edition, 2006 | Springer |
| Pattern Recognition | Sergios Theodoridis, Konstantinos Koutroumbas | 4th edition, 2008 | Academic Press (Elsevier) |
| Machine Learning: A Probabilistic Perspective | Kevin P. Murphy | 1st edition, 2012 | MIT Press |
| The Elements of Statistical Learning | Trevor Hastie, Robert Tibshirani, Jerome Friedman | 2nd edition, 2009 | Springer |

The authors of these works, including [Christopher Bishop](/wiki/christopher_bishop), [Richard Duda](/wiki/richard_duda), and [Peter Hart](/wiki/peter_hart), are among the most widely cited figures in pattern recognition, and the broader literature includes major contributions by [Vladimir Vapnik](/wiki/vladimir_vapnik), [Frank Rosenblatt](/wiki/frank_rosenblatt), Yann LeCun, Geoffrey Hinton, Robert Schapire, Yoav Freund, Leo Breiman, and many others.

## References

1. Fisher, R. A. (1936). "The Use of Multiple Measurements in Taxonomic Problems." Annals of Eugenics, 7(2), 179-188. https://onlinelibrary.wiley.com/doi/10.1111/j.1469-1809.1936.tb02137.x
2. Rosenblatt, F. (1958). "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain." Psychological Review, 65(6), 386-408. https://www.ling.upenn.edu/courses/cogs501/Rosenblatt1958.pdf
3. Duda, R. O., and Hart, P. E. (1973). Pattern Classification and Scene Analysis. New York: John Wiley and Sons. https://archive.org/details/patternclassific0000duda
4. International Association for Pattern Recognition. "History of IAPR." https://iapr.org/about-us/history-of-iapr/
5. International Association for Pattern Recognition. "International Conference on Pattern Recognition." https://iapr.org/conferences/international-conference-on-pattern-recognition/
6. LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. (1989). "Backpropagation Applied to Handwritten Zip Code Recognition." Neural Computation, 1(4), 541-551. https://ieeexplore.ieee.org/document/6795724
7. Cortes, C., and Vapnik, V. (1995). "Support-Vector Networks." Machine Learning, 20(3), 273-297. https://link.springer.com/article/10.1007/BF00994018
8. Freund, Y., and Schapire, R. E. (1997). "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting." Journal of Computer and System Sciences, 55(1), 119-139. https://www.sciencedirect.com/science/article/pii/S002200009791504X
9. Lowe, D. G. (2004). "Distinctive Image Features from Scale-Invariant Keypoints." International Journal of Computer Vision, 60(2), 91-110. https://link.springer.com/article/10.1023/B:VISI.0000029664.99615.94
10. Viola, P., and Jones, M. (2001). "Rapid Object Detection Using a Boosted Cascade of Simple Features." Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), I-511 to I-518. https://ieeexplore.ieee.org/document/990517
11. Dalal, N., and Triggs, B. (2005). "Histograms of Oriented Gradients for Human Detection." Proceedings of the 2005 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2005). https://ieeexplore.ieee.org/document/1467360
12. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. New York: Springer. https://link.springer.com/book/9780387310732
13. Theodoridis, S., and Koutroumbas, K. (2008). Pattern Recognition (4th ed.). Burlington, MA: Academic Press. https://shop.elsevier.com/books/pattern-recognition/koutroumbas/978-1-59749-272-0
14. Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). "ImageNet Classification with Deep Convolutional Neural Networks." Advances in Neural Information Processing Systems 25, 1097-1105. https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
15. Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. Cambridge, MA: MIT Press. https://mitpress.mit.edu/9780262018029/machine-learning/

