See also: Terms and Machine learning
This glossary collects the core vocabulary used across machine learning research, engineering, and applied data science. Entries are organized first into subcategory hubs that link to dedicated topic glossaries, then into a curated set of bolded terms with concise one-line definitions. Definitions follow the conventions used by the Google ML Glossary, the Wikipedia machine learning category, and standard textbooks such as Goodfellow et al.'s Deep Learning and Bishop's Pattern Recognition and Machine Learning. For the complete alphabetical index of every term, see the Machine learning terms/All page.
Subcategory hubs
The following hub pages organize ML vocabulary by topic area. Each hub contains a focused glossary for that domain along with related concepts and methods.
Fundamentals
Fundamentals covers the building blocks of supervised and unsupervised learning, including features, labels, examples, training, loss, regularization, gradient descent, hyperparameters, and evaluation metrics such as accuracy, precision, and recall. This is the recommended starting point for newcomers to ML.
Natural language processing
Natural Language Processing and Language Evaluation cover terms for working with text: tokenization, embeddings, attention, transformers, language models, sequence-to-sequence tasks, and quality metrics like BLEU and perplexity.
Fairness
Fairness addresses bias, demographic disparities, and ethical considerations in ML systems. It includes fairness metrics, sensitive attributes, disparate impact, and approaches to mitigating bias through pre-processing, in-processing, and post-processing techniques.
Decision forests
Decision Forests covers tree-based models such as decision trees, random forests, and gradient boosted trees, along with their splitting rules, ensemble methods, feature importance measures, and out-of-bag evaluation.
Reinforcement learning
Reinforcement Learning covers agents that learn by interacting with environments to maximize cumulative reward. Terms include policy, state, action, value function, Q-learning, Markov decision process, and Deep Q-Networks.
Computer vision
Computer Vision and Image Models cover the processing of pixels and visual scenes: convolutional networks, pooling, bounding boxes, intersection over union, image augmentation, and architectures for classification, detection, and segmentation.
Sequence models
Sequence Models covers architectures for ordered data such as recurrent neural networks, LSTMs, attention mechanisms, transformers, and time series methods.
Clustering
Clustering covers unsupervised grouping algorithms including k-means, k-median, agglomerative, divisive, hierarchical, and centroid-based clustering, along with similarity measures and centroid concepts.
Recommendation systems
Recommendation Systems covers techniques for predicting user preferences, including collaborative filtering, matrix factorization, candidate generation, scoring, re-ranking, and the user and item matrices used by recommender models.
TensorFlow and Google Cloud
TensorFlow covers the open-source ML framework's vocabulary including tensors, graphs, sessions, estimators, and the Keras and Layers APIs. Google Cloud covers Cloud TPU, TPU Pods, and other managed infrastructure for training and serving models.
Key terms with definitions
A curated set of the most frequently referenced ML terms, organized by topic. For the complete alphabetical list, see Machine learning terms/All.
Fundamentals and training
- accuracy: Correct predictions divided by total predictions; can mislead on class-imbalanced datasets.
- backpropagation: Algorithm computing gradients of the loss with respect to each weight by chain-rule traversal of the network.
- batch: Set of examples processed in a single training iteration.
- batch size: Number of examples in one batch.
- cross-entropy: Loss measuring the difference between two probability distributions; generalizes log loss to multiple classes.
- cross-validation: Resampling procedure that estimates model performance by repeatedly splitting data into training and validation folds.
- early stopping: Regularization technique that halts training when validation loss stops improving.
- epoch: One full pass over the training set during training.
- example: Single row of data, comprising features and optionally a label.
- feature: Input variable used by a model to make predictions.
- feature engineering: Process of selecting and transforming raw data into features suitable for modeling.
- generalization: Model's ability to perform well on examples not seen during training.
- gradient descent: Optimization algorithm that iteratively updates parameters by stepping in the direction of the negative gradient of the loss.
- hyperparameter: Configuration value set by the practitioner before training, such as learning rate or batch size.
- label: Target value associated with an example in supervised learning.
- learning rate: Scalar that multiplies the gradient when updating parameters, controlling the size of each step.
- loss: Number expressing how far a model's predictions are from the labels.
- mini-batch: Small subset of training examples used in a single gradient update.
- model: Function learned from data that maps inputs to outputs.
- optimizer: Algorithm that updates model parameters to reduce loss, such as SGD, Adam, or AdaGrad.
- overfitting: Model fitting training data so closely that it fails to generalize to new data.
- regularization: Technique that penalizes model complexity to reduce overfitting.
- semi-supervised learning: Training using a mix of labeled and unlabeled data.
- stochastic gradient descent (SGD): Gradient descent that estimates the gradient from a single randomly chosen example or a mini-batch rather than from the full training set.
- supervised machine learning: Training using labeled examples to learn an input-to-output mapping.
- test set: Held-out data used to evaluate the final model.
- training set: Data used to fit model parameters.
- transfer learning: Reusing knowledge learned on one task to improve learning on another.
- underfitting: Model too simple to capture the relationship between features and labels.
- unsupervised machine learning: Training without labels, discovering patterns or structure in data.
- validation set: Held-out data used during training to tune hyperparameters.
- weight: Coefficient that multiplies an input in a model.
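Several of the terms above (loss, gradient descent, learning rate, epoch, weight) come together in a single training loop. The following is a minimal sketch, assuming a one-feature linear model trained with mean squared error and full-batch updates. The function names, the toy data, and the hyperparameter values are illustrative assumptions, not taken from any particular library.

```python
def mse_loss(w, b, xs, ys):
    """Mean squared error of the linear model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit a weight w and bias b by full-batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):  # one epoch = one full pass over the training set
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy training set (an assumption for illustration): labels follow y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
final_loss = mse_loss(w, b, xs, ys)
```

On this noise-free data the learned weight and bias approach 2 and 1. A learning rate that is too large for the data would make the updates diverge instead, which is why it is treated as a hyperparameter to tune.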
Neural networks and deep learning
- activation function: Nonlinear function applied at each neuron that lets a network learn nonlinear relationships.
- attention: Mechanism that weights different parts of an input when producing each part of an output.
- batch normalization: Layer that normalizes activations across a mini-batch, often speeding training and acting as regularization.
- convolutional neural network: Neural network built from convolutional layers, commonly used for images.
- deep neural network: Neural network with two or more hidden layers.
- dropout regularization: Regularization that randomly zeroes out a fraction of neurons during training.
- embedding vector: Dense vector representation of a discrete input such as a word.
- exploding gradient problem: Training instability caused by gradients growing unboundedly large.
- fine tuning: Continuing training of a pretrained model on a task-specific dataset.
- hidden layer: Neural network layer between the input and output layers.
- Long Short-Term Memory (LSTM): Recurrent neural network cell with gating designed to capture long-range dependencies.
- neural network: Model composed of layers of interconnected units that compute nonlinear functions of their inputs.
- Rectified Linear Unit (ReLU): Activation function that outputs max(0, x).
- recurrent neural network: Neural network that processes sequences by maintaining a hidden state across time steps.
- self-attention: Attention mechanism in which queries, keys, and values all come from the same sequence.
- softmax: Function that converts a vector of logits into a probability distribution over classes.
- Transformer: Neural network architecture built on self-attention, the basis for most large language models.
- vanishing gradient problem: Training difficulty in which gradients become very small in early layers, slowing or stopping learning.
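The definitions of ReLU, softmax, and self-attention above can be illustrated in a few lines. This is a simplified sketch: the self-attention function assumes queries, keys, and values are all equal to the raw input vectors, whereas a Transformer layer would first apply learned projection matrices. All function names are illustrative.

```python
import math


def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return max(0.0, x)


def softmax(logits):
    """Convert logits to probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention with Q = K = V = seq (no projections)."""
    d = len(seq[0])
    outputs = []
    for query in seq:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in seq
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * value[j] for w, value in zip(weights, seq)) for j in range(d)]
        )
    return outputs


probs = softmax([2.0, 1.0, 0.1])
attended = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Each row of `attended` is a weighted mix of all input vectors, with larger weights on positions whose vectors have a larger dot product with the query.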
Classification and metrics
- AUC (Area under the ROC curve): Probability that the model scores a randomly chosen positive example higher than a randomly chosen negative example.
- binary classification: Classification task with two possible label values.
- confusion matrix: N×N table of counts of predicted versus actual class for a classifier with N classes.
- F1 score: Harmonic mean of precision and recall, balancing the two metrics in a single number.
- logistic regression: Classification model that applies a sigmoid to a linear combination of features.
- multi-class classification: Classification task with more than two possible classes.
- precision: Fraction of positive predictions that are correct.
- recall: Fraction of actual positives the model correctly predicted as positive.
- ROC curve: Plot of true positive rate against false positive rate across classification thresholds.
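The metrics above can all be computed from the four cells of a binary confusion matrix, and AUC can be computed directly from its pairwise-ranking definition. The sketch below assumes labels encoded as 0 and 1; the helper names and the example data are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a metric is undefined."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return safe_div(wins, len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
```

The pairwise AUC loop is quadratic in the number of examples, which is fine for illustration; production code typically sorts the scores once instead.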
NLP and language models
Reinforcement learning
- Bellman equation: Recursive equation expressing the value of a state in terms of the expected reward and the value of successor states.
- Deep Q-Network (DQN): Reinforcement learning algorithm using a deep neural network to approximate the Q-function.
- Markov decision process (MDP): Mathematical framework for sequential decision making under uncertainty.
- policy: Mapping from states to actions or to a distribution over actions.
- Q-learning: Reinforcement learning algorithm that estimates the optimal action-value function via temporal difference updates.
- reinforcement learning (RL): Paradigm in which an agent learns to act in an environment to maximize cumulative reward.
- reward: Scalar signal that a reinforcement learning agent tries to maximize.
- state: Description of the environment at a given time in reinforcement learning.
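The relationship between the Bellman equation, Q-learning, and a policy can be shown on a very small problem. The sketch below assumes a hypothetical four-state chain in which moving right from state 2 reaches a terminal goal and earns a reward of 1. To keep the result deterministic, it sweeps over every state-action pair instead of exploring at random; a real agent would sample transitions by acting in the environment.

```python
GOAL = 3          # terminal state of the hypothetical chain 0 - 1 - 2 - 3
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic transition: reward 1.0 only when the goal is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """One temporal difference update toward the Bellman target."""
    if next_state == GOAL:
        target = reward  # no future value after a terminal state
    else:
        target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


q = [[0.0, 0.0] for _ in range(GOAL + 1)]
for _ in range(200):
    for state in range(GOAL):
        for action in ACTIONS:
            next_state, reward = step(state, action)
            q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9)

# Greedy policy: in each non-terminal state pick the action with the highest Q.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
```

With a discount factor of 0.9, the learned values for moving right shrink by a factor of 0.9 for each extra step from the goal, and the greedy policy moves right in every state.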
Trees, ensembles, and clustering
- bagging: Ensemble method training each model on a bootstrap sample of the data, used in random forests.
- boosting: Sequential ensemble method that combines weak learners, upweighting examples earlier models got wrong.
- decision tree: Supervised model routing examples through a tree of conditions to reach a leaf prediction.
- ensemble: Collection of models whose predictions are combined for a final output.
- gradient boosting: Boosting technique in which each new model is fit to the negative gradient of the loss (the pseudo-residuals) of the current ensemble.
- k-means: Clustering algorithm partitioning data into k clusters around mean centroids.
- random forest: Ensemble of decision trees, each trained on a bootstrap sample, with a random subset of features (attribute sampling) considered at each split.
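The k-means entry above describes an alternation between assigning points to the nearest centroid and recomputing each centroid as the mean of its cluster. A minimal sketch follows, assuming squared Euclidean distance, a fixed number of iterations, and initial centroids taken from the first k points; practical implementations usually use a randomized initialization and a convergence check instead.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Alternate between assignment and centroid update steps."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            mean_point(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(points, centroids=points[:2])
```

Even though both initial centroids start inside the lower-left group, the update step pulls one of them toward the upper-right group within two iterations on this data.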
Computer vision
- bounding box: Rectangle described by image coordinates that encloses an object of interest.
- convolution: Mathematical operation that slides a filter over an input to produce a feature map.
- data augmentation: Artificially enlarging the training set by transforming existing examples, such as rotating images.
- image recognition: Task of identifying objects, scenes, or other content in images.
- intersection over union (IoU): Overlap metric for two regions equal to area of intersection divided by area of union.
- pooling: Downsampling layer that aggregates spatial regions, such as max or average pooling.
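Intersection over union and max pooling are both short computations. The sketch below assumes bounding boxes are given as (x_min, y_min, x_max, y_max) tuples and that the pooled image is a list of rows with even height and width; both conventions are assumptions for illustration, since coordinate formats differ between tools.

```python
def box_area(box):
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def iou(box_a, box_b):
    """Area of intersection divided by area of union of two boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    union = box_area(box_a) + box_area(box_b) - intersection
    return intersection / union if union > 0 else 0.0


def max_pool_2x2(image):
    """Downsample by taking the maximum of each non-overlapping 2x2 block."""
    return [
        [
            max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, len(image[0]) - 1, 2)
        ]
        for r in range(0, len(image) - 1, 2)
    ]


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
pooled = max_pool_2x2(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
)
```

The two example boxes each have area 4 and share a 1×1 square, so their IoU is 1/7. Object detection evaluations commonly count a prediction as correct when IoU exceeds a chosen threshold.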
Fairness
- demographic parity: Fairness criterion requiring equal positive prediction rates across demographic groups.
- disparate impact: Adverse effect of a decision system on a protected group, even without explicit discriminatory intent.
- equality of opportunity: Fairness criterion requiring equal true positive rates across protected groups.
- equalized odds: Fairness criterion requiring equal true positive and false positive rates across groups.
- fairness metric: Quantitative measure of how a model's outcomes differ across groups.
- sensitive attribute: Human attribute such as race, gender, or age that may be given special legal, ethical, or social consideration when used in decisions.
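The fairness criteria above compare simple per-group rates. As a minimal sketch, assuming binary labels and predictions and a single sensitive attribute with two hypothetical groups "a" and "b", the rates can be computed as follows. The function names and the data are illustrative.

```python
def rate(numerator_flags, denominator_flags):
    """Share of denominator cases that are also numerator cases."""
    denominator = sum(denominator_flags)
    hits = sum(1 for n, d in zip(numerator_flags, denominator_flags) if n and d)
    return hits / denominator if denominator else 0.0


def group_rates(y_true, y_pred, groups, group):
    """Positive prediction rate, true positive rate, and false positive rate."""
    in_group = [g == group for g in groups]
    predicted_pos = [p == 1 for p in y_pred]
    actual_pos = [m and t == 1 for m, t in zip(in_group, y_true)]
    actual_neg = [m and t == 0 for m, t in zip(in_group, y_true)]
    return {
        "positive_rate": rate(predicted_pos, in_group),  # demographic parity
        "tpr": rate(predicted_pos, actual_pos),          # equality of opportunity
        "fpr": rate(predicted_pos, actual_neg),          # with tpr: equalized odds
    }


groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
rates_a = group_rates(y_true, y_pred, groups, "a")
rates_b = group_rates(y_true, y_pred, groups, "b")
parity_gap = abs(rates_a["positive_rate"] - rates_b["positive_rate"])
```

In this example group "a" receives positive predictions at a rate of 0.75 and group "b" at 0.25, so demographic parity is violated. The true positive rates (1.0 versus 0.5) and false positive rates (0.5 versus 0.0) also differ, so equality of opportunity and equalized odds are violated as well.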
Recommendation and frameworks
- collaborative filtering: Recommendation technique predicting user preferences based on the preferences of similar users.
- Keras: High-level neural network API integrated into TensorFlow.
- matrix factorization: Decomposing a matrix into a product of two lower-rank matrices, used in recommendation.
- NumPy: Python library for numerical arrays and linear algebra.
- pandas: Python library for tabular data manipulation built on top of NumPy.
- recommendation system: System that suggests items to users based on their preferences or behaviors.
- scikit-learn: Python library offering classical machine learning algorithms and preprocessing tools.
- Tensor: Multidimensional array, the basic data structure in TensorFlow.
- TensorFlow: Open-source platform for building and deploying machine learning models.
- Tensor Processing Unit (TPU): Google's custom ASIC accelerator for machine learning workloads.
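The collaborative filtering entry above can be made concrete with a user-based variant. This is a rough sketch, assuming ratings stored as nested dictionaries, cosine similarity computed over co-rated items only, and a similarity-weighted average as the prediction. The user names, item names, and ratings are hypothetical, and real systems usually mean-center ratings or learn embeddings through matrix factorization instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += similarity * other_ratings[item]
        weight_total += abs(similarity)
    return weighted_sum / weight_total if weight_total else None


ratings = {
    "ann": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 5, "B": 3, "C": 4, "D": 5},
    "cat": {"A": 1, "B": 5, "D": 1},
}
prediction = predict_rating(ratings, "ann", "D")
```

Because "ann" agrees with "bob" on every shared item, the prediction for item "D" is pulled toward bob's rating of 5 rather than cat's rating of 1. Uncentered cosine similarity still assigns "cat" a fairly high weight, which is one reason mean-centering is common in practice.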
Generative models
- diffusion model: Generative model that learns to reverse a gradual noising process to produce data.
- discriminator: Network in a GAN that learns to distinguish real examples from those produced by the generator.
- generative adversarial network (GAN): Two-network system where a generator produces samples and a discriminator tries to detect them.
- generative model: Model that learns to produce new examples resembling the training distribution.
- generator: Network in a GAN that produces synthetic examples intended to look real.
- multimodal model: Model that ingests or produces more than one modality, such as text and images.
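The "gradual noising process" in the diffusion model entry can be sketched for the forward direction only. The example assumes the variance-preserving formulation used by DDPM-style models, in which a noised sample is sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise, together with a linear beta schedule. The schedule endpoints and function names are illustrative assumptions, and the learned reverse (denoising) model is not shown.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    Assumes num_steps >= 2.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, noise, alpha_bar):
    """Forward process: blend the clean sample with Gaussian noise."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * x + noise_scale * e for x, e in zip(x0, noise)]


alpha_bars = alpha_bar_schedule(num_steps=1000)
rng = random.Random(0)
x0 = [1.0, -0.5, 0.25]
noise = [rng.gauss(0.0, 1.0) for _ in x0]
slightly_noised = add_noise(x0, noise, alpha_bars[0])
heavily_noised = add_noise(x0, noise, alpha_bars[-1])
```

Early steps keep almost all of the original signal, while by the final step the sample is close to pure Gaussian noise. A diffusion model is trained to undo this corruption step by step.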
Additional commonly used terms
A short list of widely used ML terms not in the historic glossary, included for completeness.
See also
References