Pattern Recognition

Pattern recognition is the automatic discovery of regularities in data through the use of computer algorithms, with applications including the assignment of class labels to observations, the segmentation of images, the transcription of speech, and the detection of anomalies in measured signals. It is closely related to machine learning and statistical learning, and historically grew out of work in statistics, signal processing, and engineering aimed at building systems that could read printed characters, recognize spoken words, identify fingerprints, and classify radar returns.

A pattern recognition system typically takes raw measurements (pixels, audio samples, sensor readings, text), transforms them into a representation suitable for analysis through feature extraction or learned encodings, and then applies a decision procedure that maps representations to outputs such as discrete labels, real-valued predictions, or structured objects like sequences and graphs. The decision procedure may be designed by hand, derived from probabilistic models, or learned from examples. Modern pattern recognition systems are dominated by learned representations produced by deep neural networks, but classical statistical and structural methods remain important in domains where data are scarce, latency budgets are tight, or interpretability is required.

Pattern recognition is applied across a wide range of areas, including computer vision, speech recognition, optical character recognition, bioinformatics, medical diagnostics, financial fraud detection, industrial inspection, and document analysis. Its central problems include classification, regression, clustering, sequence labeling, detection, and density estimation. The field has its own conferences and journals, such as the International Conference on Pattern Recognition (ICPR) and the journal Pattern Recognition, and it overlaps strongly with the broader machine learning, computer vision, and pattern analysis communities.

Definition and scope

The term pattern recognition refers to a family of computational problems in which a system must assign meaningful structure to observations. In the most common formulation, the system observes an input vector or sequence and produces a label drawn from a finite set, a real-valued estimate, a structured prediction such as a parse tree or segmentation mask, or a probability distribution over candidate outputs. The defining characteristic is that the mapping is learned or derived from data and statistical assumptions, rather than specified by an explicit set of hand-coded rules.

Pattern recognition shares its core tools and theory with supervised learning and unsupervised learning in machine learning, with statistical inference, and with signal processing. The field also covers structural and syntactic methods that operate on graphs, strings, and grammars rather than fixed-length feature vectors. Closely related areas include computer vision, where the inputs are images or video; natural language processing, where the inputs are text or speech transcripts; bioinformatics, where the inputs are biological sequences; and time series analysis, where the inputs are signals indexed by time.

In engineering practice, pattern recognition systems are usually evaluated on quantitative metrics such as classification accuracy, error rate, precision and recall, F1 score, area under the receiver operating characteristic curve, mean average precision, word error rate, or domain-specific scores. The combination of a problem definition, a dataset, and an evaluation protocol is sometimes called a benchmark, and progress in the field has often been driven by competitive evaluation on shared benchmarks.

History

Statistical and engineering roots

The statistical foundations of pattern recognition were laid in the first half of the 20th century. In 1936, the British statistician Ronald A. Fisher published "The Use of Multiple Measurements in Taxonomic Problems" in the Annals of Eugenics, introducing what is now called Fisher's linear discriminant. Fisher used measurements of iris flowers collected by Edgar Anderson to derive a linear combination of features that maximally separated two species, establishing one of the earliest formal classifiers and a key technique in linear discriminant analysis.

In the late 1950s, work on machine perception began to emerge as a distinct activity. Frank Rosenblatt, a research psychologist at the Cornell Aeronautical Laboratory in Buffalo, New York, described his perceptron in a 1957 technical report titled "The Perceptron: A Perceiving and Recognizing Automaton" and in a 1958 article in Psychological Review, "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain." The perceptron was both a theoretical model and a hardware system, and it demonstrated that simple neural network models could be trained from examples to classify visual stimuli. The same period saw the construction of early optical character recognition machines for sorting mail and reading financial documents.

Establishment of the field

During the 1960s and early 1970s, pattern recognition consolidated into a recognizable research community. Statistical decision theory, Bayes classifiers, nearest neighbor rules, and feature selection methods were developed and applied to character recognition, speech, and remote sensing. In 1973, Richard O. Duda and Peter E. Hart published "Pattern Classification and Scene Analysis" with John Wiley and Sons. The book provided a unified treatment of statistical classification, parametric and nonparametric density estimation, clustering, and image analysis, and it became the standard graduate textbook in the field for the next two decades.

The community organized itself institutionally during the 1970s. The First International Joint Conference on Pattern Recognition was held October 30 to November 1, 1973 at the Mayflower Hotel in Washington, D.C., and the conference series later became the International Conference on Pattern Recognition (ICPR). The International Association for Pattern Recognition (IAPR) was organized at the 3rd International Joint Conference on Pattern Recognition in Coronado in 1976 and formally came into existence in 1978, under the leadership of Purdue University computer scientist King-Sun Fu.

Neural networks, kernels, and ensembles

The 1980s and 1990s saw a revival of neural network methods alongside the rise of statistical learning theory and ensemble approaches. Hidden Markov models became the dominant framework for speech recognition, supporting both acoustic modeling and word-level decoding. In 1989, Yann LeCun and colleagues published "Backpropagation Applied to Handwritten Zip Code Recognition" in Neural Computation, describing a convolutional neural network trained end to end with backpropagation on handwritten digits from the United States Postal Service database. A companion paper, "Handwritten Digit Recognition with a Back-Propagation Network," reported roughly 1 percent error at a rejection rate of about 9 percent on this task, an early real-world success for trainable convolutional architectures.

In 1995, Corinna Cortes and Vladimir Vapnik published "Support-Vector Networks" in Machine Learning, presenting the modern soft-margin support vector machine and demonstrating its effectiveness on optical character recognition benchmarks. The same year, Yoav Freund and Robert Schapire introduced AdaBoost in their EuroCOLT paper "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting," with an extended journal version appearing in the Journal of Computer and System Sciences in 1997. Boosting and the closely related random forest algorithm, introduced by Leo Breiman in 2001, made ensemble methods a standard part of the practical pattern recognition toolkit.

Computer vision benchmarks and the deep learning era

In 2001, Paul Viola and Michael Jones published "Rapid Object Detection Using a Boosted Cascade of Simple Features" at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Their detector combined integral images, Haar-like features, and a cascade of AdaBoost classifiers to achieve real-time face detection on consumer hardware, and it became one of the most widely deployed computer vision algorithms of the 2000s. Hand-crafted features such as SIFT (David Lowe, 1999 and 2004) and HOG (Navneet Dalal and Bill Triggs, 2005) provided robust descriptors for object recognition, image matching, and pedestrian detection, often paired with linear classifiers or support vector machines.

In 2006, Christopher Bishop published "Pattern Recognition and Machine Learning" with Springer. The book gave a Bayesian-flavored treatment of probabilistic graphical models, kernel methods, mixture models, and approximate inference, and it became a standard reference at the graduate level alongside Duda and Hart.

The modern deep learning era is usually dated to 2012, when Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton's AlexNet won the ImageNet Large Scale Visual Recognition Challenge by a large margin, achieving a top-5 error rate of about 15.3 percent compared with 26.2 percent for the runner-up. Subsequent years saw the rapid adoption of deep convolutional networks across computer vision, recurrent and convolutional models for speech recognition that surpassed Gaussian mixture and hidden Markov baselines, and end-to-end deep learning for many traditional pattern recognition tasks. From 2017 onward, transformer architectures and large pretrained foundation models extended these gains to language, multimodal data, and increasingly to vision, blurring the boundary between deep learning and the rest of pattern recognition.

Tasks and types

Pattern recognition problems are usually grouped by the structure of their outputs and the form of the available supervision.

Task	Output type	Typical examples
Classification	Discrete label from a finite set	Digit recognition, spam detection, image categorization
Regression	Real-valued scalar or vector	Age estimation from images, dose prediction, signal regression
Clustering	Partition or hierarchy of data points	Customer segmentation, gene expression grouping
Density estimation	Probability distribution	Anomaly detection, generative modeling
Sequence labeling	Label per element in a sequence	Phoneme recognition, named entity recognition, part-of-speech tagging
Structured prediction	Parse trees, alignments, graphs	Syntactic parsing, optical music recognition
Detection	Set of objects with locations	Pedestrian detection, face detection, sound event detection
Segmentation	Per-pixel or per-token label	Medical image segmentation, semantic segmentation
Anomaly and novelty detection	Score or binary flag	Fraud detection, defect detection, intrusion detection

Classification is the canonical pattern recognition task and the focus of much of the classical literature. Regression and density estimation extend the framework to real-valued and probabilistic outputs. Clustering provides unsupervised structure when labels are absent. Sequence and structured prediction generalize classification to outputs with internal structure, while detection and segmentation are central to computer vision. Anomaly and novelty detection address situations in which only a few examples of one class, or no labeled examples at all, are available.

Approaches

Pattern recognition methods can be grouped into broad families that differ in their assumptions, representations, and computational tools. Most modern systems combine ideas from several of these families.

Approach	Key idea	Representative methods
Statistical pattern recognition	Treat features as samples from probability distributions and apply decision theory	Bayes classifier, naive Bayes, Gaussian mixture models, linear and quadratic discriminant analysis, logistic regression
Structural and syntactic	Represent patterns as structured objects and use grammars or graph matching	String grammars, attributed graphs, edit distance, graph kernels
Template matching	Compare inputs to stored prototypes using a similarity measure	Cross-correlation, dynamic time warping, nearest prototype classifiers
Instance-based learning	Predict by looking up similar training examples	k-nearest neighbor, locally weighted regression
Kernel methods	Implicitly map inputs into high-dimensional spaces via kernel functions	Support vector machines, Gaussian processes, kernel ridge regression
Neural and deep methods	Learn layered representations directly from data	Perceptron, multilayer perceptron, convolutional neural network, recurrent neural network, transformer
Ensemble methods	Combine many classifiers to reduce variance or bias	Bagging, boosting (AdaBoost, gradient boosting), random forests, stacking
Decision tree methods	Recursively partition the input space	CART, ID3, C4.5, decision stumps

Statistical pattern recognition emphasizes probability models, including Bayesian inference, naive Bayes classifiers, and Gaussian mixture models. Structural and syntactic methods are useful for chemical structures, document layouts, and other domains where patterns are naturally graphs or strings. Template matching and the k-nearest neighbors classifier illustrate how very simple ideas can be effective in low-dimensional, well-curated problems. Decision tree methods and their ensembles, including boosting and bagging, often serve as strong baselines on tabular data. Kernel methods and support vector machines dominated many benchmarks in the 1990s and 2000s, while neural and deep methods are now the default in vision, speech, and language.
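As an illustration of how simple an instance-based method can be, the following is a minimal sketch of a k-nearest neighbor classifier. The function name knn_predict, the choice of Euclidean distance, and the toy data are assumptions made for this example; practical systems usually add feature scaling and an indexing structure for fast lookup.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data (made up for illustration): two well-separated 2-D clusters.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (0.05, 0.05)))
print(knn_predict(points, labels, (1.0, 0.95)))
```

The classifier stores the training set unchanged and defers all computation to prediction time, which is why such methods work well on small, low-dimensional problems and become costly as the training set grows.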

Feature extraction and representation

The choice of representation has long been recognized as central to pattern recognition. Two broad strategies have shaped the field: hand-crafted feature extraction and learned representations.

Hand-crafted features

In classical pipelines, raw measurements were transformed into feature vectors using domain knowledge. For images, edge maps, color histograms, Gabor filters, Haar-like features, and gradient-based descriptors were all common. SIFT, introduced by David Lowe in 1999 and developed in detail in his 2004 paper "Distinctive Image Features from Scale-Invariant Keypoints" in the International Journal of Computer Vision, provided keypoints and descriptors invariant to scale and rotation and robust to moderate viewpoint and illumination changes. HOG, introduced by Navneet Dalal and Bill Triggs in their 2005 CVPR paper "Histograms of Oriented Gradients for Human Detection," computed local histograms of gradient orientations on a dense grid and proved especially effective for pedestrian detection.

For audio, mel-frequency cepstral coefficients (MFCCs) became the standard front end for speech recognition and speaker identification, summarizing the short-term spectrum on a perceptual frequency scale. For text, hand-crafted features included bag-of-words representations, n-gram statistics, term frequency-inverse document frequency (TF-IDF) weights, and part-of-speech tags. Other domains contributed their own feature engineering traditions, ranging from k-mer profiles in bioinformatics to wavelet coefficients in time-series analysis. The construction of such features is often called feature engineering and remains important in domains where labeled data are limited.
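A hand-crafted text representation of this kind can be sketched in a few lines. The example below assumes one common weighting variant, relative term frequency multiplied by log(N / document frequency), with whitespace tokenization; the function name tfidf and the sample documents are illustrative only, and other variants (smoothing, sublinear term frequency, length normalization) are equally common.

```python
import math
from collections import Counter

def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Relative term frequency times inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = ["the cat sat", "the dog sat", "the bird flew"]
weights = tfidf(docs)
print(weights[0])
```

A term that occurs in every document, such as "the" here, receives weight zero, while terms confined to a single document receive the largest weights.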

Dimensionality reduction

High-dimensional feature vectors are often projected into lower-dimensional spaces to reduce noise, computation, and the risk of overfitting. Principal component analysis (PCA) finds orthogonal linear directions of maximum variance and underlies many classical pipelines. Linear discriminant analysis seeks projections that maximize between-class variance relative to within-class variance and is often used as a supervised counterpart to PCA. Nonlinear methods include kernel PCA, Isomap, locally linear embedding, t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP). These methods are especially useful for visualization and exploratory analysis.
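The idea of a direction of maximum variance can be made concrete with a small sketch. The example below approximates the first principal component by power iteration on the sample covariance matrix; this is one of several ways to compute it (library implementations typically use a singular value decomposition instead), and the function name and toy data are assumptions for illustration.

```python
import math

def first_principal_component(rows, iterations=200):
    """Approximate the direction of maximum variance by power iteration."""
    n, d = len(rows), len(rows[0])
    # Centre the data so that variance is measured about the mean.
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Repeatedly multiply a start vector by the covariance matrix and
    # renormalise; it converges to the leading eigenvector provided the
    # start vector is not orthogonal to it.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy data lying close to the line y = x.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
direction = first_principal_component(data)
print(direction)
```

For these points the recovered unit vector lies close to the diagonal, so projecting each centred point onto it yields a one-dimensional representation that keeps most of the variance.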

Learned representations

A recurring theme in modern pattern recognition is the replacement of hand-crafted features by representations learned from data. Multilayer neural networks already learn intermediate features as a byproduct of training. Autoencoders explicitly learn compact codes that reconstruct the input, and denoising and variational variants impose additional structure on the latent space. Self-supervised pretraining, including masked language modeling, masked image modeling, and contrastive objectives such as those used in SimCLR and CLIP, learns reusable representations from unlabeled data that can then be adapted to many downstream tasks. In contemporary systems, the boundary between feature extraction and classification has largely disappeared: a single deep network is trained end to end to map inputs directly to outputs.

Classifier evaluation

Evaluating a pattern recognition system requires both a clear definition of the problem and a careful experimental protocol. Evaluation typically takes place on a held-out test set, separate from the data used to fit the model and to choose hyperparameters.

Metric	Definition	Used for
Accuracy	Fraction of correctly labeled examples	Balanced classification problems
Precision	True positives divided by predicted positives	Information retrieval, detection
Recall (sensitivity)	True positives divided by actual positives	Medical screening, search
F1 score	Harmonic mean of precision and recall	Imbalanced classification
Specificity	True negatives divided by actual negatives	Diagnostic testing
ROC curve and AUC	True positive rate versus false positive rate over thresholds	Threshold-independent evaluation
Precision-recall curve	Precision versus recall over thresholds	Imbalanced or rare-event classification
Confusion matrix	Counts of true and predicted labels	Multi-class diagnostic analysis
Log loss	Negative log probability assigned to true labels	Probabilistic classifiers
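Word error rate	Word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words	Speech recognition
Mean average precision	Mean over classes or queries of average precision (area under the precision-recall curve)	Object detection, retrieval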

The confusion matrix is a fundamental diagnostic tool, summarizing how often each true class is predicted as each other class. Receiver operating characteristic curves plot true positive rate against false positive rate as the decision threshold is varied, with area under the curve (AUC) providing a single threshold-independent summary.
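For the binary case, several of the metrics in the table can be derived directly from the four confusion matrix counts. The sketch below is illustrative: the function names, the 0.5 threshold, and the toy labels and scores are assumptions, and the AUC is computed through its rank interpretation (the probability that a randomly chosen positive is scored above a randomly chosen negative) rather than by integrating the curve.

```python
def binary_metrics(y_true, y_pred):
    """Compute confusion matrix counts and derived metrics for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and classifier scores, made up for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Precision, recall, and F1 depend on the chosen threshold, whereas the AUC depends only on how the scores rank the examples.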

Resampling techniques are used to make efficient use of finite data. In k-fold cross-validation, the data are split into k disjoint folds; the model is trained on k - 1 folds and evaluated on the held-out fold, and the procedure is repeated k times so that each fold serves once as the test set. Leave-one-out cross-validation is the limiting case in which each example serves as its own test fold. Stratified cross-validation preserves class proportions across folds. A train, validation, and test split is the canonical setup in modern practice, with the validation set used to tune hyperparameters and the test set reserved for a final unbiased evaluation.
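The fold construction in k-fold cross-validation can be sketched as an index generator. The function name k_fold_indices is hypothetical, and the sketch assumes the examples have already been shuffled; it does not stratify by class.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k disjoint, near-equal folds."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Setting k equal to the number of examples gives leave-one-out cross-validation, since every fold then contains exactly one test example.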

The bias-variance tradeoff describes how the expected generalization error of a model decomposes into bias from systematic mismatch with the true function, variance from sensitivity to the training sample, and irreducible noise. Methods such as regularization, ensemble averaging, and early stopping aim to navigate this tradeoff. Statistical significance tests, such as McNemar's test for paired classifier comparisons, are used to assess whether observed performance differences are likely to be real.
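For squared-error loss the decomposition can be written out explicitly. The notation below is an assumption introduced for this illustration: observations follow y = f(x) + \varepsilon with zero-mean noise of variance \sigma^2, and \hat{f}_D denotes the model fitted to a random training sample D.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\bigl(y - \hat{f}_D(x)\bigr)^2\right]
  = \underbrace{\bigl(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\bigl(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The cross terms vanish because the noise has zero mean and is independent of the training sample. Other loss functions, such as zero-one loss, do not decompose in this simple additive form.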

Applications

Pattern recognition powers a wide range of practical systems. The table below lists representative application areas and characteristic methods.

Application	Description	Common methods
Optical character recognition	Conversion of printed or handwritten text in images to machine-readable form	CNNs, sequence models, hidden Markov models
Speech recognition	Transcription of spoken language into text	Hidden Markov models, deep neural acoustic models, end-to-end transformers
Face recognition	Identification or verification of individuals from facial images	CNN embeddings, metric learning, Viola-Jones for detection
Fingerprint recognition	Matching fingerprint impressions for identification	Minutiae extraction, ridge orientation analysis, neural matchers
Iris recognition	Biometric identification from iris texture	Gabor filtering, Hamming distance on iris codes
Handwriting recognition	Recognition of handwritten characters and words, online or offline	CNNs, recurrent networks, connectionist temporal classification
Medical image analysis	Detection, segmentation, and diagnosis from radiology, pathology, and other images	CNNs, U-Net, vision transformers, classical filters
Document analysis	Layout understanding, table recognition, form processing	Graph neural networks, sequence models, classical OCR pipelines
Bioinformatics	Sequence motif discovery, structure prediction, omics analysis	Hidden Markov models, profile HMMs, neural language models
Financial fraud detection	Identification of anomalous transactions or accounts	Gradient boosting, autoencoders, graph methods, rule-based systems
Industrial inspection	Detection of manufacturing defects in images or sensor data	CNNs, anomaly detection, classical machine vision
Remote sensing	Land cover classification and change detection from satellite imagery	Random forests, CNNs, time-series models
Radar and sonar	Target detection and classification	Statistical detection, matched filters, deep learning
Electronic health records	Phenotype identification, risk prediction	Logistic regression, gradient boosting, sequence models

Many of these applications combine several techniques. A modern face recognition system, for example, may use a deep convolutional or transformer-based detector to localize faces in an image, an alignment step based on learned keypoints, and a learned embedding network whose outputs are compared using cosine similarity. A speech recognition system may combine an acoustic model, a pronunciation lexicon, and a language model in a single end-to-end network or in a hybrid pipeline.
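The final comparison step in such a face recognition pipeline can be sketched as follows. The embedding vectors, the function names, and the 0.8 decision threshold are all hypothetical; in a deployed system the embeddings come from a trained network and the threshold is tuned on validation data.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def same_identity(embedding_a, embedding_b, threshold=0.8):
    """Verification decision: accept when the embeddings are similar enough."""
    return cosine_similarity(embedding_a, embedding_b) >= threshold

# Made-up 4-dimensional embeddings; real embeddings have hundreds of dimensions.
enrolled = (0.9, 0.1, 0.3, 0.2)
probe_same = (0.8, 0.2, 0.3, 0.1)
probe_other = (0.1, 0.9, 0.2, 0.7)

print(same_identity(enrolled, probe_same))
print(same_identity(enrolled, probe_other))
```

Because cosine similarity ignores vector length, it compares the direction of the embeddings only, which is one reason it is commonly paired with networks trained to produce normalized embeddings.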

Pattern recognition versus machine learning

Pattern recognition and machine learning are now closely intertwined, but they emerged from somewhat different intellectual traditions. Pattern recognition has its roots in engineering, signal processing, statistics, and the perceptual sciences, with early work motivated by concrete tasks such as character recognition, fingerprint matching, and radar target classification. Machine learning grew from artificial intelligence, computational learning theory, and the study of adaptive systems, with greater emphasis on inductive inference, search, and computational complexity.

The two communities have converged over time. Statistical learning theory, developed by Vladimir Vapnik and others, provided shared foundations such as VC dimension, structural risk minimization, and uniform convergence bounds. Common methods, including support vector machines, boosting, decision trees, kernel methods, and neural networks, are studied in both communities. Books and courses increasingly treat the subjects together; Bishop's 2006 textbook is titled "Pattern Recognition and Machine Learning," and Kevin Murphy's 2012 book "Machine Learning: A Probabilistic Perspective" covers much of the same ground.

Institutional structures reflect this convergence. The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is one of the leading venues for both communities. The Conference on Computer Vision and Pattern Recognition (CVPR) and the International Conference on Pattern Recognition (ICPR) have always combined pattern recognition with image analysis and machine learning, while NeurIPS and ICML, originally machine learning conferences, now publish substantial amounts of work on traditional pattern recognition problems. In contemporary usage, the phrase "pattern recognition" is often a synonym or near-synonym for "applied machine learning," especially in vision, speech, and biometrics.

Notable conferences and journals

The pattern recognition community supports a number of long-running conferences and journals, many of which are sponsored by the IAPR or the IEEE Computer Society.

Venue	Type	Focus
ICPR (International Conference on Pattern Recognition)	Conference	General pattern recognition, biennial since 1973
CVPR (Computer Vision and Pattern Recognition)	Conference	Computer vision and pattern analysis, annual
ICCV (International Conference on Computer Vision)	Conference	Computer vision, biennial
ECCV (European Conference on Computer Vision)	Conference	Computer vision, biennial
NeurIPS (Conference on Neural Information Processing Systems)	Conference	Machine learning, neural computation
ICML (International Conference on Machine Learning)	Conference	Machine learning theory and methods
ACCV (Asian Conference on Computer Vision)	Conference	Computer vision, IAPR sponsored
ICDAR (International Conference on Document Analysis and Recognition)	Conference	Document analysis, IAPR sponsored
ICASSP (International Conference on Acoustics, Speech, and Signal Processing)	Conference	Speech and signal processing
IEEE TPAMI	Journal	Pattern analysis and machine intelligence
Pattern Recognition (Elsevier)	Journal	General pattern recognition
Pattern Recognition Letters (Elsevier)	Journal	Short papers in pattern recognition
International Journal of Computer Vision (Springer)	Journal	Computer vision
Journal of Machine Learning Research	Journal	Machine learning
International Journal of Pattern Recognition and Artificial Intelligence	Journal	Pattern recognition and AI

ICPR has been held biennially since its first meeting in Washington, D.C. in 1973. Recent editions include ICPR 2024 in Kolkata, India and ICPR 2026 in Lyon, France.

Notable books

A small number of textbooks have shaped how pattern recognition is taught and practiced. The 1973 book by Duda and Hart, the second edition with David Stork in 2000, and Bishop's 2006 textbook are widely used at the graduate level. Theodoridis and Koutroumbas provide an alternative treatment with detailed coverage of statistical methods, and Murphy's book offers a unified probabilistic perspective on machine learning that overlaps heavily with classical pattern recognition.

Book	Authors	Edition and year	Publisher
Pattern Classification and Scene Analysis	Richard O. Duda, Peter E. Hart	1st edition, 1973	Wiley
Pattern Classification	Richard O. Duda, Peter E. Hart, David G. Stork	2nd edition, 2000	Wiley
Pattern Recognition and Machine Learning	Christopher M. Bishop	1st edition, 2006	Springer
Pattern Recognition	Sergios Theodoridis, Konstantinos Koutroumbas	4th edition, 2008	Academic Press (Elsevier)
Machine Learning: A Probabilistic Perspective	Kevin P. Murphy	1st edition, 2012	MIT Press
The Elements of Statistical Learning	Trevor Hastie, Robert Tibshirani, Jerome Friedman	2nd edition, 2009	Springer

The authors of these works, including Christopher Bishop, Richard Duda, and Peter Hart, are among the most widely cited figures in pattern recognition, and the broader literature includes major contributions by Vladimir Vapnik, Frank Rosenblatt, Yann LeCun, Geoffrey Hinton, Robert Schapire, Yoav Freund, Leo Breiman, and many others.

References

Fisher, R. A. (1936). "The Use of Multiple Measurements in Taxonomic Problems." Annals of Eugenics, 7(2), 179-188. https://onlinelibrary.wiley.com/doi/10.1111/j.1469-1809.1936.tb02137.x
Rosenblatt, F. (1958). "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain." Psychological Review, 65(6), 386-408. https://www.ling.upenn.edu/courses/cogs501/Rosenblatt1958.pdf
Duda, R. O., and Hart, P. E. (1973). Pattern Classification and Scene Analysis. New York: John Wiley and Sons. https://archive.org/details/patternclassific0000duda
International Association for Pattern Recognition. "History of IAPR." https://iapr.org/about-us/history-of-iapr/
International Association for Pattern Recognition. "International Conference on Pattern Recognition." https://iapr.org/conferences/international-conference-on-pattern-recognition/
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. (1989). "Handwritten Digit Recognition with a Back-Propagation Network." Advances in Neural Information Processing Systems 2. https://proceedings.neurips.cc/paper/1989/file/53c3bce66e43be4f209556518c2fcb54-Paper.pdf
Cortes, C., and Vapnik, V. (1995). "Support-Vector Networks." Machine Learning, 20(3), 273-297. https://link.springer.com/article/10.1007/BF00994018
Freund, Y., and Schapire, R. E. (1997). "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting." Journal of Computer and System Sciences, 55(1), 119-139. https://www.sciencedirect.com/science/article/pii/S002200009791504X
Lowe, D. G. (2004). "Distinctive Image Features from Scale-Invariant Keypoints." International Journal of Computer Vision, 60(2), 91-110. https://link.springer.com/article/10.1023/B:VISI.0000029664.99615.94
Viola, P., and Jones, M. (2001). "Rapid Object Detection Using a Boosted Cascade of Simple Features." Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), I-511 to I-518. https://ieeexplore.ieee.org/document/990517
Dalal, N., and Triggs, B. (2005). "Histograms of Oriented Gradients for Human Detection." Proceedings of the 2005 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2005). https://ieeexplore.ieee.org/document/1467360
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. New York: Springer. https://link.springer.com/book/9780387310732
Theodoridis, S., and Koutroumbas, K. (2008). Pattern Recognition (4th ed.). Burlington, MA: Academic Press. https://shop.elsevier.com/books/pattern-recognition/koutroumbas/978-1-59749-272-0
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). "ImageNet Classification with Deep Convolutional Neural Networks." Advances in Neural Information Processing Systems 25, 1097-1105. https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. Cambridge, MA: MIT Press. https://mitpress.mit.edu/9780262018029/machine-learning/
