# Machine learning terms/All

> Source: https://aiwiki.ai/wiki/machine_learning_terms_all
> Updated: 2026-07-16
> Categories: Machine Learning
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

*See also: [Machine learning terms](/wiki/machine_learning_terms)*

This alphabetical glossary collects core terminology used across machine learning, deep learning, reinforcement learning, large language models, and TensorFlow tooling. Each entry includes a short definition drawn from references such as the Google Machine Learning Glossary and Wikipedia.[1][2] Terms are grouped by first letter for easier scanning.

## A

- **[A/B testing](/wiki/a_b_testing)**: Comparing two variants on randomly assigned users.
- **[accuracy](/wiki/accuracy)**: Fraction of correct predictions.
- **[action](/wiki/action)**: Choice an RL agent makes that can change the environment's state.
- **[activation function](/wiki/activation_function)**: Nonlinearity applied to a neuron's weighted sum.
- **[active learning](/wiki/active_learning)**: Training where the model picks which examples to label.
- **[AdaGrad](/wiki/adagrad)**: Adaptive per-parameter learning rate optimizer.
- **[agent](/wiki/agent)**: RL entity acting in an environment for reward.
- **[agglomerative clustering](/wiki/agglomerative_clustering)**: Bottom-up hierarchical clustering.
- **[anomaly detection](/wiki/anomaly_detection)**: Spotting data points unlike typical patterns.
- **[AR](/wiki/ar)**: Augmented reality.
- **[area under the PR curve](/wiki/area_under_the_pr_curve)**: Summary of precision-recall tradeoff.
- **[area under the ROC curve](/wiki/area_under_the_roc_curve)**: Probability that a classifier ranks a random positive above a random negative.
- **[artificial general intelligence](/wiki/artificial_general_intelligence)**: Hypothetical human-level general AI.
- **[artificial intelligence](/wiki/artificial_intelligence)**: Field building machines that reason or perceive.
- **[attention](/wiki/attention)**: Mechanism weighting input parts when producing output.
- **[attention mechanism](/wiki/attention_mechanism)**: Query-key-value weighted sum used in transformers.
- **[attribute](/wiki/attribute)**: Synonym for feature.
- **[attribute sampling](/wiki/attribute_sampling)**: Random feature subset chosen at each tree split.
- **[AUC (Area under the ROC curve)](/wiki/auc_area_under_the_roc_curve)**: Threshold-independent classifier metric.
- **[augmented reality](/wiki/augmented_reality)**: Computer imagery overlaid on a real-world view.
- **[automation bias](/wiki/automation_bias)**: Favoring automated suggestions over other sources.
- **[autoencoder](/wiki/autoencoder)**: Network learning to reconstruct input through a bottleneck.
- **[average precision](/wiki/average_precision)**: Single-number precision-recall curve summary.
- **[axis-aligned condition](/wiki/axis-aligned_condition)**: Tree split on one feature against a threshold.
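
The ranking interpretation of the area under the ROC curve, and the definition of accuracy, can be illustrated with a minimal sketch. The function names `roc_auc` and `accuracy` are illustrative and do not come from any library; the 0.5 cutoff and the toy scores are assumptions.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]

auc = roc_auc(labels, scores)                    # 3 of 4 pairs ranked correctly
predictions = [1 if s >= 0.5 else 0 for s in scores]
acc = accuracy(labels, predictions)              # depends on the 0.5 cutoff
```

The pairwise loop is quadratic in the number of examples, so it only suits small illustrations; AUC itself does not depend on the chosen cutoff, while accuracy does.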

## B

- **[backpropagation](/wiki/backpropagation)**: Chain-rule gradient computation through a network.
- **[bagging](/wiki/bagging)**: Ensemble averaging models trained on bootstrapped subsets.
- **[bag of words](/wiki/bag_of_words)**: Text representation counting words, ignoring order.
- **[baseline](/wiki/baseline)**: Simple reference model for comparison.
- **[batch](/wiki/batch)**: Group of examples processed together.
- **[batch normalization](/wiki/batch_normalization)**: Normalizing activations per batch to stabilize training.
- **[batch size](/wiki/batch_size)**: Examples per gradient update.
- **[Bayesian neural network](/wiki/bayesian_neural_network)**: Network with distributions over weights.
- **[Bayesian optimization](/wiki/bayesian_optimization)**: Hyperparameter tuning with a probabilistic surrogate.
- **[Bellman equation](/wiki/bellman_equation)**: Recursive equation for optimal state value.
- **[BERT (Bidirectional Encoder Representations from Transformers)](/wiki/bert_bidirectional_encoder_representations_from_transformers)**: Transformer pretrained with masked language modeling.
- **[bias (ethics/fairness)](/wiki/bias_ethics_fairness)**: Unfair model preference across groups.
- **[bias (math) or bias term](/wiki/bias_math_or_bias_term)**: Learned constant added to a weighted sum.
- **[bidirectional](/wiki/bidirectional)**: Processing sequences from both directions.
- **[bidirectional language model](/wiki/bidirectional_language_model)**: Model using context on both sides of each token.
- **[bigram](/wiki/bigram)**: Pair of adjacent tokens.
- **[binary classification](/wiki/binary_classification)**: Classification with two possible classes.
- **[binary condition](/wiki/binary_condition)**: Tree test with two outcomes.
- **[binning](/wiki/binning)**: Mapping continuous values into discrete buckets.
- **[BLEU (Bilingual Evaluation Understudy)](/wiki/bleu_bilingual_evaluation_understudy)**: Translation metric on n-gram overlap.
- **[boosting](/wiki/boosting)**: Sequential ensemble correcting prior errors.
- **[bounding box](/wiki/bounding_box)**: Rectangle around a detected object.
- **[broadcasting](/wiki/broadcasting)**: Aligning tensor shapes for element-wise math.
- **[bucketing](/wiki/bucketing)**: Synonym for binning.
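
The bag-of-words and bigram entries can be made concrete with a short sketch. Whitespace tokenization and lowercasing are simplifying assumptions, and the helper names are illustrative only.

```python
from collections import Counter


def bag_of_words(text):
    """Count each word; word order is discarded."""
    return Counter(text.lower().split())


def bigrams(tokens):
    """Return every pair of adjacent tokens, in order."""
    return list(zip(tokens, tokens[1:]))


sentence = "the dog chased the cat"
bow = bag_of_words(sentence)
pairs = bigrams(sentence.split())

# Reordering the words leaves the bag of words unchanged.
same_bag = bag_of_words("cat the chased dog the") == bow
```

Bigrams keep local word order that the bag of words throws away, which is why the two representations are often combined.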

## C

- **[calibration layer](/wiki/calibration_layer)**: Adjusts predicted probabilities to match observed frequencies.
- **[candidate generation](/wiki/candidate_generation)**: First recommender stage selecting an item subset.
- **[candidate sampling](/wiki/candidate_sampling)**: Computing loss over the positive classes and only a sample of the negative classes.
- **[categorical data](/wiki/categorical_data)**: Features with a small set of discrete values.
- **[causal language model](/wiki/causal_language_model)**: Predicts next token from past context only.
- **[centroid](/wiki/centroid)**: Center point of a cluster.
- **[centroid-based clustering](/wiki/centroid-based_clustering)**: Clustering represented by central points.
- **[chain of thought](/wiki/chain_of_thought)**: Prompting that elicits intermediate reasoning steps.
- **[checkpoint](/wiki/checkpoint)**: Saved snapshot of model parameters.
- **[class](/wiki/class)**: One discrete output category.
- **[classification model](/wiki/classification_model)**: Model predicting class labels.
- **[classification threshold](/wiki/classification_threshold)**: Probability cutoff for the positive class.
- **[class-imbalanced dataset](/wiki/class-imbalanced_dataset)**: Dataset with very unequal class frequencies.
- **[clipping](/wiki/clipping)**: Bounding values within a fixed range.
- **[Cloud TPU](/wiki/cloud_tpu)**: Google's cloud-hosted TPU service.[1]
- **[clustering](/wiki/clustering)**: Unsupervised grouping of similar examples.
- **[co-adaptation](/wiki/co-adaptation)**: Neurons depending on specific neighbors.
- **[collaborative filtering](/wiki/collaborative_filtering)**: Recommendations from patterns of many users.
- **[condition](/wiki/condition)**: Tree feature test choosing the next branch.
- **[confirmation bias](/wiki/confirmation_bias)**: Interpreting evidence to confirm existing beliefs.
- **[confusion matrix](/wiki/confusion_matrix)**: Table of predicted versus actual labels.
- **[continuous feature](/wiki/continuous_feature)**: Feature with any real value in an interval.
- **[convenience sampling](/wiki/convenience_sampling)**: Sampling from easily accessible sources.
- **[convergence](/wiki/convergence)**: State where further training barely changes loss.
- **[convex function](/wiki/convex_function)**: Function whose graph lies below any chord.
- **[convex optimization](/wiki/convex_optimization)**: Minimizing convex functions over convex sets.
- **[convex set](/wiki/convex_set)**: Set containing every segment between its points.
- **[convolution](/wiki/convolution)**: Sliding a filter over data to produce feature maps.
- **[convolutional filter](/wiki/convolutional_filter)**: Small weight matrix detecting a feature.
- **[convolutional layer](/wiki/convolutional_layer)**: Layer applying convolutional filters.
- **[convolutional neural network](/wiki/convolutional_neural_network)**: Network using convolutional layers, common for images.
- **[convolutional operation](/wiki/convolutional_operation)**: Multiply-and-sum as a filter slides over input.
- **[cost](/wiki/cost)**: Synonym for loss.
- **[co-training](/wiki/co-training)**: Two models on different views label data for each other.
- **[counterfactual fairness](/wiki/counterfactual_fairness)**: Predictions unchanged under altered sensitive attributes.
- **[coverage bias](/wiki/coverage_bias)**: Bias when sampling omits relevant subgroups.
- **[crash blossom](/wiki/crash_blossom)**: Ambiguous headline illustrating language parsing challenges.
- **[critic](/wiki/critic)**: Actor-critic value estimator guiding the actor.
- **[cross-entropy](/wiki/cross-entropy)**: Loss comparing predicted and true distributions.
- **[cross-validation](/wiki/cross-validation)**: Rotating which data fold is held out.
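
As a rough illustration of the confusion matrix and cross-entropy entries, the sketch below counts (actual, predicted) label pairs and computes cross-entropy between two discrete distributions. The function names and toy data are assumptions made for the example.

```python
import math
from collections import Counter


def confusion_matrix(actual, predicted):
    """Map each (actual, predicted) label pair to how often it occurs."""
    return Counter(zip(actual, predicted))


def cross_entropy(true_dist, predicted_dist):
    """Cross-entropy in nats; terms with zero true probability contribute nothing."""
    return -sum(t * math.log(p) for t, p in zip(true_dist, predicted_dist) if t > 0)


cm = confusion_matrix(["cat", "dog", "cat"], ["cat", "cat", "cat"])

# One-hot true distribution: the loss reduces to -log of the probability
# the model assigned to the correct class.
loss = cross_entropy([0.0, 1.0, 0.0], [0.2, 0.5, 0.3])
```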

## D

- **[data analysis](/wiki/data_analysis)**: Inspecting data to guide modeling.
- **[data augmentation](/wiki/data_augmentation)**: Generating extra examples by transformation.
- **[DataFrame](/wiki/dataframe)**: Tabular labeled data structure used in pandas.
- **[data parallelism](/wiki/data_parallelism)**: Replicating a model across devices, each processing a slice of the batch.
- **[data set or dataset](/wiki/data_set_or_dataset)**: Collection of examples used for training or evaluation.
- **[Dataset API (tf.data)](/wiki/dataset_api_tf_data)**: TensorFlow API for input pipelines.[1]
- **[decision boundary](/wiki/decision_boundary)**: Surface separating predicted classes.
- **[decision forest](/wiki/decision_forest)**: Ensemble of decision trees.
- **[decision threshold](/wiki/decision_threshold)**: Synonym for classification threshold.
- **[decision tree](/wiki/decision_tree)**: Model with branching feature tests.
- **[decoder](/wiki/decoder)**: Encoder-decoder part generating outputs from a representation.
- **[deep model](/wiki/deep_model)**: Network with many hidden layers.
- **[deep neural network](/wiki/deep_neural_network)**: Network with many stacked hidden layers.[4]
- **[Deep Q-Network (DQN)](/wiki/deep_q-network_dqn)**: Deep network approximating the Q-function.
- **[demographic parity](/wiki/demographic_parity)**: Equal positive prediction rates across groups.
- **[denoising](/wiki/denoising)**: Removing noise or reconstructing clean signal.
- **[dense feature](/wiki/dense_feature)**: Feature with mostly nonzero values.
- **[dense layer](/wiki/dense_layer)**: Layer connecting every input to every output.
- **[depth](/wiki/depth)**: Number of layers in a network.
- **[depthwise separable convolutional neural network (sepCNN)](/wiki/depthwise_separable_convolutional_neural_network_sepcnn)**: Efficient CNN factorizing convolutions.
- **[derived label](/wiki/derived_label)**: Label computed from other data.
- **[device](/wiki/device)**: CPU, GPU, or TPU running operations.
- **[diffusion model](/wiki/diffusion_model)**: Generative model reversing a noising process.
- **[dimension reduction](/wiki/dimension_reduction)**: Mapping data to fewer dimensions while preserving structure.
- **[dimensions](/wiki/dimensions)**: Independent axes of a tensor.
- **[discrete feature](/wiki/discrete_feature)**: Feature with countable possible values.
- **[discriminative model](/wiki/discriminative_model)**: Model learning conditional probability of labels.
- **[discriminator](/wiki/discriminator)**: GAN network distinguishing real from fake.
- **[disparate impact](/wiki/disparate_impact)**: Decisions disproportionately harming a protected group.
- **[disparate treatment](/wiki/disparate_treatment)**: Direct use of sensitive attributes in decisions.
- **[divisive clustering](/wiki/divisive_clustering)**: Top-down hierarchical clustering.
- **[downsampling](/wiki/downsampling)**: Reducing data rate or majority-class count.
- **[DQN](/wiki/dqn)**: Abbreviation for Deep Q-Network.
- **[dropout regularization](/wiki/dropout_regularization)**: Randomly zeroing neurons during training.
- **[dynamic](/wiki/dynamic)**: Continuously updated rather than static.
- **[dynamic model](/wiki/dynamic_model)**: Model retrained continuously as data arrives.
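
Demographic parity compares positive prediction rates across groups. A minimal sketch follows, assuming binary predictions and a single group label per example; the helper name `demographic_parity_gap` is hypothetical.

```python
def demographic_parity_gap(predictions, groups):
    """Return (largest difference in positive rate between groups, per-group rates)."""
    by_group = {}
    for pred, group in zip(predictions, groups):
        by_group.setdefault(group, []).append(pred)
    rates = {g: sum(p) / len(p) for g, p in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates


predictions = [1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "b", "b", "b"]
gap, rates = demographic_parity_gap(predictions, groups)
```

A gap of zero would mean the criterion holds exactly on this sample; how small a gap counts as acceptable is a policy choice the sketch does not address.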

## E

- **[eager execution](/wiki/eager_execution)**: TensorFlow mode running ops immediately.[1]
- **[early stopping](/wiki/early_stopping)**: Stopping when validation loss stops improving.
- **[earth mover's distance (EMD)](/wiki/earth_mover_s_distance_emd)**: Distance based on minimum transport cost between distributions.
- **[embedding](/wiki/embedding)**: Learned dense vector for a discrete object.
- **[embedding layer](/wiki/embedding_layer)**: Maps discrete tokens to dense vectors.
- **[embedding space](/wiki/embedding_space)**: Vector space where geometry reflects similarity.
- **[embedding vector](/wiki/embedding_vector)**: Dense vector for one item.
- **[empirical risk minimization (ERM)](/wiki/empirical_risk_minimization_erm)**: Choosing a model that minimizes average training loss.
- **[encoder](/wiki/encoder)**: Maps inputs to a fixed-size representation.
- **[ensemble](/wiki/ensemble)**: Combination of several models' predictions.
- **[entropy](/wiki/entropy)**: Uncertainty measure of a distribution.
- **[environment](/wiki/environment)**: System an RL agent acts within.
- **[episode](/wiki/episode)**: One full RL run from an initial state to a terminal state.
- **[epoch](/wiki/epoch)**: One full pass through the training data.
- **[epsilon greedy policy](/wiki/epsilon_greedy_policy)**: Policy that acts greedily most of the time but picks a random action with probability epsilon.
- **[equality of opportunity](/wiki/equality_of_opportunity)**: Equal true positive rates across groups.
- **[equalized odds](/wiki/equalized_odds)**: Equal true and false positive rates across groups.
- **[Estimator](/wiki/estimator)**: TensorFlow high-level model API.[1]
- **[example](/wiki/example)**: One data instance fed to a model.
- **[experience replay](/wiki/experience_replay)**: Sampling stored RL transitions for training.
- **[experimenter's bias](/wiki/experimenter_s_bias)**: Researcher expectations influencing data or analysis.
- **[exploding gradient problem](/wiki/exploding_gradient_problem)**: Gradients growing uncontrollably during backprop.
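
The epsilon greedy policy entry can be sketched in a few lines. The function name, the list-of-values representation of action estimates, and the optional `rng` parameter are assumptions made for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit


q_values = [0.1, 0.9, 0.3]

greedy_action = epsilon_greedy(q_values, epsilon=0.0)   # never explores
rng = random.Random(0)
explored = [epsilon_greedy(q_values, epsilon=1.0, rng=rng) for _ in range(10)]
```

With `epsilon=0.0` the policy reduces to a greedy policy, and with `epsilon=1.0` it behaves as a uniformly random policy.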

## F

- **[fairness constraint](/wiki/fairness_constraint)**: Formal restriction enforcing a fairness criterion.
- **[fairness metric](/wiki/fairness_metric)**: Measure of behavior across protected groups.
- **[false negative (FN)](/wiki/false_negative_fn)**: Negative prediction with positive truth.
- **[false negative rate](/wiki/false_negative_rate)**: Fraction of actual positives missed.
- **[false positive (FP)](/wiki/false_positive_fp)**: Positive prediction with negative truth.
- **[false positive rate (FPR)](/wiki/false_positive_rate_fpr)**: Fraction of negatives misclassified positive.
- **[feature](/wiki/feature)**: Measurable input variable used by a model.
- **[feature cross](/wiki/feature_cross)**: Synthetic feature from combining features.
- **[feature engineering](/wiki/feature_engineering)**: Designing features to improve performance.
- **[feature extraction](/wiki/feature_extraction)**: Computing informative variables from raw data.
- **[feature importances](/wiki/feature_importances)**: Scores of each feature's contribution.
- **[feature set](/wiki/feature_set)**: Features used by a particular model.
- **[feature spec](/wiki/feature_spec)**: Description of feature names and types.
- **[feature vector](/wiki/feature_vector)**: Ordered list of feature values for one example.
- **[federated learning](/wiki/federated_learning)**: Training across devices that keep data local.
- **[feedback loop](/wiki/feedback_loop)**: When predictions influence future training data.
- **[feedforward neural network (FFN)](/wiki/feedforward_neural_network_ffn)**: Network whose connections run only forward, with no cycles.
- **[few-shot learning](/wiki/few-shot_learning)**: Task performance from a few examples.
- **[fine tuning](/wiki/fine_tuning)**: Continuing training of a pretrained model.
- **[forget gate](/wiki/forget_gate)**: LSTM gate discarding cell-state information.
- **[foundation model](/wiki/foundation_model)**: Large pretrained model adaptable to many tasks.
- **[full softmax](/wiki/full_softmax)**: Softmax over the entire vocabulary.
- **[fully connected layer](/wiki/fully_connected_layer)**: Synonym for dense layer.
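
The false positive and false negative entries combine into two rates. The sketch below assumes binary labels encoded as 0 and 1 and at least one example of each class; `error_rates` is an illustrative name.

```python
def error_rates(actual, predicted):
    """Compute false positive rate and false negative rate for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return {
        "fpr": fp / (fp + tn),   # negatives wrongly flagged positive
        "fnr": fn / (fn + tp),   # positives that were missed
    }


actual    = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
rates = error_rates(actual, predicted)
```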

## G

- **[GAN](/wiki/gan)**: Abbreviation for generative adversarial network.
- **[generalization](/wiki/generalization)**: Performance on unseen data.
- **[generalization curve](/wiki/generalization_curve)**: Plot of training versus validation loss.
- **[generalized linear model](/wiki/generalized_linear_model)**: Linear model combined with a link function.
- **[generative adversarial network (GAN)](/wiki/generative_adversarial_network_gan)**: Generator trained against a discriminator.
- **[generative AI](/wiki/generative_ai)**: AI producing new content like text or images.
- **[generative model](/wiki/generative_model)**: Model of the joint distribution able to synthesize samples.
- **[generator](/wiki/generator)**: GAN network producing synthetic data.
- **[gini impurity](/wiki/gini_impurity)**: Misclassification probability at a tree node.
- **[GPT (Generative Pre-trained Transformer)](/wiki/gpt_generative_pre-trained_transformer)**: Autoregressive transformer family from OpenAI.
- **[gradient](/wiki/gradient)**: Vector of partial derivatives indicating steepest ascent.
- **[gradient boosting](/wiki/gradient_boosting)**: Boosting that fits each new model to the negative gradient of the loss.
- **[gradient boosted (decision) trees (GBT)](/wiki/gradient_boosted_decision_trees_gbt)**: Gradient boosting using shallow trees.
- **[gradient clipping](/wiki/gradient_clipping)**: Capping gradient magnitudes during training.
- **[gradient descent](/wiki/gradient_descent)**: Optimization stepping opposite the loss gradient.
- **[graph](/wiki/graph)**: TensorFlow computation representation of nodes and edges.[1]
- **[graph execution](/wiki/graph_execution)**: Running a precompiled TensorFlow graph.[1]
- **[greedy policy](/wiki/greedy_policy)**: Always picking the highest-valued action.
- **[ground truth](/wiki/ground_truth)**: Correct label for an example.
- **[group attribution bias](/wiki/group_attribution_bias)**: Assuming individuals match their group's traits.
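
Gradient descent steps opposite the gradient because the gradient points toward steepest ascent. A minimal sketch on a one-parameter loss follows; the quadratic loss, learning rate, and step count are assumptions chosen so the behavior is easy to follow.

```python
def loss(w):
    """Toy convex loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w, learning_rate=0.1, steps=100):
    """Repeatedly move w a small step against the gradient."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


w_final = gradient_descent(w=0.0)
```

On this loss each step shrinks the distance to the minimum by a constant factor, so the iterate converges; on non-convex loss surfaces no such guarantee holds.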

## H

- **[hallucination](/wiki/hallucination)**: Fluent but factually incorrect generative output.
- **[hashing](/wiki/hashing)**: Mapping inputs to fixed-size integers.
- **[heuristic](/wiki/heuristic)**: Rule of thumb without optimality guarantee.
- **[hidden layer](/wiki/hidden_layer)**: Layer between input and output.
- **[hierarchical clustering](/wiki/hierarchical_clustering)**: Clustering forming a tree of nested groups.
- **[hinge loss](/wiki/hinge_loss)**: SVM margin-based loss.
- **[holdout data](/wiki/holdout_data)**: Examples reserved for final evaluation.
- **[hyperparameter](/wiki/hyperparameter)**: Training setting chosen before optimization.
- **[hyperplane](/wiki/hyperplane)**: Flat decision surface in feature space.
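
Hinge loss can be written in one line. The sketch assumes labels encoded as +1 and -1 and a raw (unthresholded) model score; the function name is illustrative.

```python
def hinge_loss(label, score):
    """Zero when the example is on the correct side with margin at least 1."""
    return max(0.0, 1.0 - label * score)


confident_correct = hinge_loss(+1, 2.5)   # outside the margin: no loss
inside_margin = hinge_loss(+1, 0.3)       # correct side, but within the margin
wrong_side = hinge_loss(-1, 0.3)          # misclassified: loss grows linearly
```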

## I

- **[i.i.d.](/wiki/i_i_d)**: Independent and identically distributed.
- **[image recognition](/wiki/image_recognition)**: Identifying content within images.
- **[imbalanced dataset](/wiki/imbalanced_dataset)**: Dataset with very unequal class counts.
- **[implicit bias](/wiki/implicit_bias)**: Unconscious attitudes affecting data work.
- **[incompatibility of fairness metrics](/wiki/incompatibility_of_fairness_metrics)**: Theorem that some fairness criteria cannot coexist.
- **[independently and identically distributed (i.i.d)](/wiki/independently_and_identically_distributed_i_i_d)**: Examples drawn independently from one distribution.
- **[individual fairness](/wiki/individual_fairness)**: Similar individuals get similar predictions.
- **[inference](/wiki/inference)**: Using a trained model to predict.
- **[inference path](/wiki/inference_path)**: Tree conditions an example follows to a leaf.
- **[information gain](/wiki/information_gain)**: Entropy reduction from splitting on a feature.
- **[in-group bias](/wiki/in-group_bias)**: Favoring members of one's own group.
- **[input layer](/wiki/input_layer)**: First network layer receiving features.
- **[in-set condition](/wiki/in-set_condition)**: Tree test on set membership.
- **[instance](/wiki/instance)**: Synonym for example.
- **[interpretability](/wiki/interpretability)**: How understandable model reasoning is.
- **[inter-rater agreement](/wiki/inter-rater_agreement)**: How often labelers agree on labels.
- **[intersection over union (IoU)](/wiki/intersection_over_union_iou)**: Overlap-to-union ratio between regions.
- **[IoU](/wiki/iou)**: Abbreviation for intersection over union.
- **[item matrix](/wiki/item_matrix)**: Factor matrix for items in matrix factorization.
- **[items](/wiki/items)**: Objects suggested by a recommender.
- **[iteration](/wiki/iteration)**: One parameter update from one batch.
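
Intersection over union for two axis-aligned bounding boxes can be sketched as below. The `(x_min, y_min, x_max, y_max)` box layout is an assumption; other conventions exist.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


overlapping = iou((0, 0, 2, 2), (1, 1, 3, 3))   # intersection 1, union 7
disjoint = iou((0, 0, 1, 1), (2, 2, 3, 3))
identical = iou((0, 0, 2, 2), (0, 0, 2, 2))
```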

## K

- **[Keras](/wiki/keras)**: High-level neural network Python API.[1]
- **[keypoints](/wiki/keypoints)**: Distinctive image points used for pose or matching.
- **[Kernel Support Vector Machines (KSVMs)](/wiki/kernel_support_vector_machines_ksvms)**: SVMs using kernels in implicit feature spaces.
- **[k-means](/wiki/k-means)**: Clustering by alternating assignment and centroid update.
- **[k-median](/wiki/k-median)**: Clustering using medians instead of means.
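
The alternation between assignment and centroid update in k-means can be sketched on one-dimensional data. The fixed initial centroids and iteration count are assumptions; practical implementations usually randomize initialization and stop on convergence.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate nearest-centroid assignment and mean update on 1-D points."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [1, 2, 3, 10, 11, 12]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
```

Replacing the mean in the update step with the median would give a k-median variant of the same loop.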

## L

- **[L0 regularization](/wiki/l0_regularization)**: Penalty on nonzero weight count.
- **[L1 loss](/wiki/l1_loss)**: Sum of absolute prediction errors.
- **[L1 regularization](/wiki/l1_regularization)**: Penalty on absolute weights producing sparsity.
- **[L2 loss](/wiki/l2_loss)**: Sum of squared prediction errors.
- **[L2 regularization](/wiki/l2_regularization)**: Penalty on squared weights encouraging small values.
- **[label](/wiki/label)**: Correct output for an example.
- **[labeled example](/wiki/labeled_example)**: Example with features and a known label.
- **[LaMDA (Language Model for Dialogue Applications)](/wiki/lamda_language_model_for_dialogue_applications)**: Google conversational language model.[1]
- **[lambda](/wiki/lambda)**: Hyperparameter controlling regularization strength.
- **[landmarks](/wiki/landmarks)**: Named coordinate points annotated on images.
- **[language model](/wiki/language_model)**: Assigns probability to token sequences.
- **[large language model](/wiki/large_language_model)**: Language model with billions of parameters.
- **[layer](/wiki/layer)**: Group of neurons or ops at a network stage.
- **[Layers API (tf.layers)](/wiki/layers_api_tf_layers)**: TensorFlow reusable building blocks.[1]
- **[leaf](/wiki/leaf)**: Terminal tree node producing a prediction.
- **[learning rate](/wiki/learning_rate)**: Step size for gradient updates.
- **[least squares regression](/wiki/least_squares_regression)**: Regression minimizing squared residuals.
- **[linear](/wiki/linear)**: Expressible as a weighted sum.
- **[linear model](/wiki/linear_model)**: Predictions as a weighted sum plus bias.
- **[linear regression](/wiki/linear_regression)**: Linear function predicting a continuous value.
- **[logistic regression](/wiki/logistic_regression)**: Classifier using a logistic function on a linear sum.
- **[logits](/wiki/logits)**: Raw scores before softmax or sigmoid.
- **[Log Loss](/wiki/log_loss)**: Logistic regression loss, equivalent to binary cross-entropy.
- **[log-odds](/wiki/log-odds)**: Log ratio of two outcome probabilities.
- **[Long Short-Term Memory (LSTM)](/wiki/long_short-term_memory_lstm)**: Recurrent architecture with gated memory cells for long-range dependencies.
- **[LoRA](/wiki/lora)**: Low-rank adapter for parameter-efficient fine-tuning.
- **[loss](/wiki/loss)**: Quantity measuring prediction error.
- **[loss curve](/wiki/loss_curve)**: Plot of loss over training steps.
- **[loss function](/wiki/loss_function)**: Formula aggregating prediction errors into a loss.
- **[loss surface](/wiki/loss_surface)**: Landscape of loss across parameter values.
- **[LSTM](/wiki/lstm)**: Abbreviation for Long Short-Term Memory.
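
The logits, log-odds, logistic regression, and Log Loss entries are closely linked: the sigmoid maps a logit to a probability, log-odds inverts that mapping, and Log Loss scores the probability against a 0/1 label. A minimal sketch, with illustrative function names:

```python
import math


def sigmoid(logit):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def log_odds(probability):
    """Inverse of the sigmoid: log of p / (1 - p)."""
    return math.log(probability / (1.0 - probability))


def log_loss(label, probability):
    """Binary cross-entropy for one example with a 0/1 label."""
    return -(label * math.log(probability) + (1 - label) * math.log(1.0 - probability))


p = sigmoid(2.0)
recovered_logit = log_odds(p)        # round-trips back to about 2.0
uncertain_loss = log_loss(1, 0.5)    # a coin-flip prediction costs log(2)
```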

## M

- **[machine learning](/wiki/machine_learning)**: Building systems that learn patterns from data.[3][5]
- **[majority class](/wiki/majority_class)**: Most frequent class in an imbalanced dataset.
- **[Markov decision process (MDP)](/wiki/markov_decision_process_mdp)**: RL model of states, actions, transitions, rewards.
- **[Markov property](/wiki/markov_property)**: Future depends only on current state.
- **[masked language model](/wiki/masked_language_model)**: Predicts hidden tokens given context.
- **[matplotlib](/wiki/matplotlib)**: Python plotting library.
- **[matrix factorization](/wiki/matrix_factorization)**: Decomposing a matrix into smaller factor matrices.
- **[Mean Absolute Error (MAE)](/wiki/mean_absolute_error_mae)**: Average of absolute prediction errors.
- **[Mean Squared Error (MSE)](/wiki/mean_squared_error_mse)**: Average of squared prediction errors.
- **[meta-learning](/wiki/meta-learning)**: Learning algorithms that learn new tasks quickly.
- **[metric](/wiki/metric)**: Scalar summarizing model performance.
- **[Metrics API (tf.metrics)](/wiki/metrics_api_tf_metrics)**: TensorFlow standard metrics module.[1]
- **[mini-batch](/wiki/mini-batch)**: Small group of examples for one update.
- **[mini-batch stochastic gradient descent](/wiki/mini-batch_stochastic_gradient_descent)**: Gradient descent using mini-batches.
- **[minimax loss](/wiki/minimax_loss)**: Two-player adversarial GAN objective.
- **[minority class](/wiki/minority_class)**: Less frequent class in imbalanced data.
- **[mixture of experts](/wiki/mixture_of_experts)**: Architecture routing inputs to specialized sub-networks.
- **[ML](/wiki/ml)**: Abbreviation for machine learning.
- **[MNIST](/wiki/mnist)**: Classic handwritten digit benchmark dataset.
- **[modality](/wiki/modality)**: A data type such as text, image, or audio.
- **[model](/wiki/model)**: Object mapping inputs to predictions via learned parameters.
- **[model capacity](/wiki/model_capacity)**: Range of functions a model can represent.
- **[model parallelism](/wiki/model_parallelism)**: Splitting a model across devices.
- **[model training](/wiki/model_training)**: Fitting model parameters to data.
- **[Momentum](/wiki/momentum)**: Optimizer using moving average of gradients.
- **[multi-class classification](/wiki/multi-class_classification)**: Classification with more than two classes.
- **[multi-class logistic regression](/wiki/multi-class_logistic_regression)**: Logistic regression with softmax across classes.
- **[multi-head self-attention](/wiki/multi-head_self-attention)**: Several self-attention heads run in parallel, with their outputs concatenated.
- **[multimodal model](/wiki/multimodal_model)**: Model handling multiple data modalities.
- **[multinomial classification](/wiki/multinomial_classification)**: Synonym for multi-class classification.
- **[multinomial regression](/wiki/multinomial_regression)**: Regression for probabilities across categories.
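
Mean Absolute Error and Mean Squared Error differ only in how each error is transformed before averaging. A short sketch with toy values (the data and function names are assumptions):

```python
def mean_absolute_error(labels, predictions):
    """Average of |label - prediction|."""
    return sum(abs(y - p) for y, p in zip(labels, predictions)) / len(labels)


def mean_squared_error(labels, predictions):
    """Average of (label - prediction) squared."""
    return sum((y - p) ** 2 for y, p in zip(labels, predictions)) / len(labels)


labels = [3.0, 5.0, 2.0]
predictions = [2.0, 5.0, 4.0]

mae = mean_absolute_error(labels, predictions)   # errors 1, 0, 2
mse = mean_squared_error(labels, predictions)    # squared errors 1, 0, 4
```

Squaring makes MSE weight the largest error more heavily than MAE does, which is why MSE is more sensitive to outliers.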

## N

- **[NaN trap](/wiki/nan_trap)**: Activations becoming NaN and spreading.
- **[natural language understanding](/wiki/natural_language_understanding)**: NLP subfield extracting meaning from text.
- **[negative class](/wiki/negative_class)**: Class without the tested condition.
- **[neural network](/wiki/neural_network)**: Connected layers of weighted units.
- **[neuron](/wiki/neuron)**: Single unit computing weighted sum plus activation.
- **[N-gram](/wiki/n-gram)**: Sequence of n adjacent tokens.
- **[NLU](/wiki/nlu)**: Abbreviation for natural language understanding.
- **[node (neural network)](/wiki/node_neural_network)**: Synonym for a neuron.
- **[node (TensorFlow graph)](/wiki/node_tensorflow_graph)**: Single op in a TensorFlow graph.[1]
- **[node (decision tree)](/wiki/node_decision_tree)**: Point in a tree with a test or leaf.
- **[noise](/wiki/noise)**: Random variation not reflecting the underlying signal.
- **[non-binary condition](/wiki/non-binary_condition)**: Tree test with more than two outcomes.
- **[nonlinear](/wiki/nonlinear)**: Not expressible as a weighted sum.
- **[non-response bias](/wiki/non-response_bias)**: Bias from groups responding to surveys less often.
- **[nonstationarity](/wiki/nonstationarity)**: Data statistics changing over time.
- **[normalization](/wiki/normalization)**: Rescaling values to a standard range.
- **[novelty detection](/wiki/novelty_detection)**: Identifying observations unlike training cases.
- **[numerical data](/wiki/numerical_data)**: Features represented as numbers.
- **[NumPy](/wiki/numpy)**: Python array computing library.
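
One common form of normalization is min-max scaling to the range 0 to 1. The sketch below is one possible reading of the entry, not the only normalization scheme; mapping a constant feature to all zeros is an assumption made to avoid division by zero.

```python
def min_max_normalize(values):
    """Linearly rescale values so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        # Constant feature: no spread to rescale (assumed convention).
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


scaled = min_max_normalize([10, 20, 30])
constant = min_max_normalize([7, 7, 7])
```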

## O

- **[objective](/wiki/objective)**: Quantity an algorithm optimizes.
- **[objective function](/wiki/objective_function)**: Function optimized during training.
- **[oblique condition](/wiki/oblique_condition)**: Tree split on a linear feature combination.
- **[offline](/wiki/offline)**: Performed in batch, not in real time.
- **[offline inference](/wiki/offline_inference)**: Predicting in advance and storing results.
- **[one-hot encoding](/wiki/one-hot_encoding)**: Category as a vector with a single 1.
- **[one-shot learning](/wiki/one-shot_learning)**: Learning from a single labeled example.
- **[one-vs.-all](/wiki/one-vs_-all)**: Multi-class via one binary classifier per class.
- **[online](/wiki/online)**: Processing data as it arrives.
- **[online inference](/wiki/online_inference)**: Predicting in real time on requests.
- **[operation (op)](/wiki/operation_op)**: Named computation in a TensorFlow graph.[1]
- **[optimizer](/wiki/optimizer)**: Algorithm updating parameters from gradients.
- **[out-group homogeneity bias](/wiki/out-group_homogeneity_bias)**: Seeing outside groups as more uniform.
- **[out-of-bag evaluation (OOB evaluation)](/wiki/out-of-bag_evaluation_oob_evaluation)**: Evaluating each tree in an ensemble on the training examples left out of its bootstrap sample.
- **[outlier detection](/wiki/outlier_detection)**: Identifying examples far from typical data.
- **[outliers](/wiki/outliers)**: Examples with very atypical values.
- **[output layer](/wiki/output_layer)**: Final network layer producing predictions.
- **[overfitting](/wiki/overfitting)**: Fitting training data but failing to generalize.
- **[oversampling](/wiki/oversampling)**: Adding minority examples to balance classes.
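
One-hot encoding can be sketched as follows, assuming a fixed, known vocabulary of categories; the names `one_hot` and `vocabulary` are illustrative.

```python
def one_hot(category, vocabulary):
    """Return a vector with a single 1 at the category's index."""
    if category not in vocabulary:
        raise ValueError(f"unknown category: {category!r}")
    return [1 if item == category else 0 for item in vocabulary]


vocabulary = ["red", "green", "blue"]
encoded = one_hot("green", vocabulary)
```

The vector length equals the vocabulary size, so very large vocabularies are usually handled with sparse representations or embeddings instead.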

## P

- **[pandas](/wiki/pandas)**: Python tabular data analysis library.
- **[parameter](/wiki/parameter)**: Value learned during training.
- **[Parameter Server (PS)](/wiki/parameter_server_ps)**: Distributed training architecture with shared servers.
- **[parameter update](/wiki/parameter_update)**: One optimizer step changing parameters.
- **[partial derivative](/wiki/partial_derivative)**: Derivative with respect to one variable.
- **[participation bias](/wiki/participation_bias)**: Bias when individuals choose to be included.
- **[partitioning strategy](/wiki/partitioning_strategy)**: Method splitting variables across machines.
- **[perceptron](/wiki/perceptron)**: Single-layer binary classifier.
- **[performance](/wiki/performance)**: Model quality or running speed.
- **[permutation variable importances](/wiki/permutation_variable_importances)**: Importance from shuffling a feature.
- **[perplexity](/wiki/perplexity)**: Language model surprise metric on held-out text.
- **[pipeline](/wiki/pipeline)**: Sequence of processing and modeling steps.
- **[pipelining](/wiki/pipelining)**: Overlapping execution stages for throughput.
- **[policy](/wiki/policy)**: RL mapping from states to actions.
- **[pooling](/wiki/pooling)**: CNN downsampling over local regions.
- **[positive class](/wiki/positive_class)**: Class with the tested condition present.
- **[post-processing](/wiki/post-processing)**: Steps applied to model output.
- **[PR AUC (area under the PR curve)](/wiki/pr_auc_area_under_the_pr_curve)**: Summary of precision-recall curve.
- **[precision](/wiki/precision)**: Fraction of predicted positives that are correct.
- **[precision-recall curve](/wiki/precision-recall_curve)**: Plot of precision versus recall tradeoff.
- **[prediction](/wiki/prediction)**: Output a model produces for an input.
- **[prediction bias](/wiki/prediction_bias)**: Average prediction minus average label.
- **[predictive parity](/wiki/predictive_parity)**: Equal precision across groups.
- **[predictive rate parity](/wiki/predictive_rate_parity)**: Equal positive predictive values across groups.
- **[preprocessing](/wiki/preprocessing)**: Transforming raw data for training.
- **[pre-trained model](/wiki/pre-trained_model)**: Model previously trained on a large dataset.
- **[prior belief](/wiki/prior_belief)**: Initial assumption about parameters.
- **[probabilistic regression model](/wiki/probabilistic_regression_model)**: Regression outputting a target distribution.
- **[prompt engineering](/wiki/prompt_engineering)**: Designing inputs to guide language model output.
- **[proxy (sensitive attributes)](/wiki/proxy_sensitive_attributes)**: Feature correlating with a sensitive attribute.
- **[proxy labels](/wiki/proxy_labels)**: Substitute targets when true labels are unavailable.
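
Three of the metrics defined above (precision, prediction bias, and perplexity) can be sketched in a few lines. This is an illustrative sketch, not a library API: the function names are hypothetical, labels are assumed to be 0 or 1, and perplexity is computed from the probabilities a model assigned to each held-out token.

```python
import math

def precision(y_true, y_pred):
    """Fraction of predicted positives that are actually positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    return tp / (tp + fp) if (tp + fp) else 0.0

def prediction_bias(labels, predictions):
    """Average prediction minus average label."""
    return sum(predictions) / len(predictions) - sum(labels) / len(labels)

def perplexity(token_probs):
    """Exponential of the mean negative log-probability per token."""
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]
print(precision(y_true, y_pred))                            # 2 true positives, 1 false positive
print(prediction_bias(y_true, [0.9, 0.6, 0.8, 0.4, 0.1]))   # slightly negative: under-predicting
print(perplexity([0.25, 0.25, 0.25, 0.25]))                 # uniform over 4 choices gives about 4
```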

## Q

- **[Q-function](/wiki/q-function)**: Expected return from taking an action in a state, then following a policy.
- **[Q-learning](/wiki/q-learning)**: Value-based RL learning the Q-function.
- **[quantile](/wiki/quantile)**: Value below which a fraction of observations fall.
- **[quantile bucketing](/wiki/quantile_bucketing)**: Binning with equal counts per bucket.
- **[quantization](/wiki/quantization)**: Reducing numerical precision of weights or activations.
- **[queue](/wiki/queue)**: Structure holding inputs awaiting processing.
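
The Q-function and Q-learning entries above can be made concrete with the standard one-step update, Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)]. The sketch below stores Q-values in a dictionary; the function name, the state and action labels, and the default values of `alpha` (learning rate) and `gamma` (discount factor) are illustrative assumptions.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(state, action)."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

q = defaultdict(float)            # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]
q[("s1", "right")] = 2.0          # pretend this value was learned earlier
new_value = q_learning_update(q, "s0", "right", reward=1.0,
                              next_state="s1", actions=actions)
print(new_value)                  # 0 + 0.5 * (1.0 + 0.9 * 2.0 - 0), about 1.4
```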

## R

- **[random forest](/wiki/random_forest)**: Ensemble of decision trees trained on random data and feature subsets.
- **[random policy](/wiki/random_policy)**: Policy choosing actions uniformly at random.
- **[ranking](/wiki/ranking)**: Ordering items by relevance.
- **[rank (ordinality)](/wiki/rank_ordinality)**: Position of an item in a sorted list.
- **[rank (Tensor)](/wiki/rank_tensor)**: Number of tensor dimensions.
- **[rater](/wiki/rater)**: Person labeling examples for training data.
- **[recall](/wiki/recall)**: Fraction of actual positives correctly identified.
- **[recommendation system](/wiki/recommender_system)**: System suggesting items of interest to users.
- **[Rectified Linear Unit (ReLU)](/wiki/rectified_linear_unit_relu)**: Activation returning input if positive, else zero.
- **[recurrent neural network](/wiki/recurrent_neural_network)**: Sequence network with hidden state.
- **[regression model](/wiki/regression_model)**: Model predicting continuous values.
- **[regularization](/wiki/regularization)**: Penalizing complexity to reduce overfitting.
- **[regularization rate](/wiki/regularization_rate)**: Hyperparameter for regularization strength.
- **[reinforcement learning (RL)](/wiki/reinforcement_learning_rl)**: Learning by interaction to maximize reward.
- **[reinforcement learning from human feedback](/wiki/reinforcement_learning_from_human_feedback)**: Fine-tuning models from human preference data.
- **[ReLU](/wiki/relu)**: Abbreviation for Rectified Linear Unit.
- **[replay buffer](/wiki/replay_buffer)**: Memory of past RL transitions.
- **[reporting bias](/wiki/reporting_bias)**: Bias from unrepresentative event reporting.
- **[representation](/wiki/representation)**: How examples are encoded as numbers.
- **[re-ranking](/wiki/re-ranking)**: Second pass refining candidate order.
- **[retrieval augmented generation](/wiki/retrieval_augmented_generation)**: Combining language models with document retrieval.
- **[return](/wiki/return)**: Cumulative discounted RL reward.
- **[reward](/wiki/reward)**: RL feedback signal after an action.
- **[ridge regularization](/wiki/ridge_regularization)**: Synonym for L2 regularization.
- **[RLHF](/wiki/rlhf)**: Abbreviation for reinforcement learning from human feedback.
- **[RNN](/wiki/rnn)**: Abbreviation for recurrent neural network.
- **[ROC (receiver operating characteristic) Curve](/wiki/roc_receiver_operating_characteristic_curve)**: Plot of TPR versus FPR across thresholds.
- **[root](/wiki/root)**: Top decision tree node.
- **[root directory](/wiki/root_directory)**: Base folder for files like checkpoints.
- **[Root Mean Squared Error (RMSE)](/wiki/root_mean_squared_error_rmse)**: Square root of mean squared error.
- **[rotational invariance](/wiki/rotational_invariance)**: Predictions unchanged under input rotation.
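
Several entries in this section reduce to short formulas: recall, ReLU, return, and RMSE. The following sketch shows one way to compute each in plain Python. The function names and the default discount factor `gamma` are hypothetical, and labels are assumed to be 0 or 1.

```python
import math

def recall(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

def relu(x):
    """Return the input if positive, else zero."""
    return x if x > 0 else 0.0

def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step k is weighted by gamma**k."""
    total = 0.0
    for r in reversed(rewards):   # accumulate from the last reward backward
        total = r + gamma * total
    return total

def rmse(targets, predictions):
    """Square root of the mean squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)
    return math.sqrt(mse)

print(recall([1, 0, 1, 1, 0], [1, 1, 1, 0, 0]))   # 2 of 3 actual positives found
print(relu(-3.0), relu(2.5))
print(discounted_return([1, 1, 1], gamma=0.5))    # 1 + 0.5 + 0.25
print(rmse([3, 5], [1, 5]))                       # sqrt((4 + 0) / 2)
```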

## S

- **[sampling bias](/wiki/sampling_bias)**: Sampled data unrepresentative of population.
- **[sampling with replacement](/wiki/sampling_with_replacement)**: Sampling where examples can repeat.
- **[SavedModel](/wiki/savedmodel)**: TensorFlow serialization format for complete models.[1]
- **[Saver](/wiki/saver)**: TensorFlow class for variable checkpoints.[1]
- **[scalar](/wiki/scalar)**: Tensor of rank zero holding a single number.
- **[scaling](/wiki/scaling)**: Adjusting feature value ranges.
- **[scikit-learn](/wiki/scikit-learn)**: Python classical machine learning library.[6]
- **[scoring](/wiki/scoring)**: Producing numeric scores for candidates.
- **[selection bias](/wiki/selection_bias)**: Bias from a non-random process choosing which examples are sampled.
- **[self-attention (also called self-attention layer)](/wiki/self-attention_also_called_self-attention_layer)**: Attention within one sequence.
- **[self-supervised learning](/wiki/self-supervised_learning)**: Generating labels from the input itself.
- **[self-training](/wiki/self-training)**: Using confident predictions as new labels.
- **[semi-supervised learning](/wiki/semi-supervised_learning)**: Mixing labeled and unlabeled data.
- **[sensitive attribute](/wiki/sensitive_attribute)**: Feature treated specially for fairness.
- **[sentiment analysis](/wiki/sentiment_analysis)**: Classifying text by emotional attitude.
- **[sequence model](/wiki/sequence_model)**: Model for ordered sequences.
- **[sequence-to-sequence task](/wiki/sequence-to-sequence_task)**: Task with sequence input and output.
- **[serving](/wiki/serving)**: Deploying a model to answer requests.
- **[shape (Tensor)](/wiki/shape_tensor)**: Sizes along each tensor dimension.
- **[shrinkage](/wiki/shrinkage)**: Reducing boosting step contributions.
- **[sigmoid function](/wiki/sigmoid_function)**: Activation mapping reals to between zero and one.
- **[similarity measure](/wiki/similarity_measure)**: Function quantifying example likeness.
- **[size invariance](/wiki/size_invariance)**: Predictions unchanged under input rescaling.
- **[sketching](/wiki/sketching)**: Approximate summaries of large datasets.
- **[softmax](/wiki/softmax)**: Turning a vector into a probability distribution.
- **[sparse feature](/wiki/sparse_feature)**: Feature with mostly zero values.
- **[sparse representation](/wiki/sparse_representation)**: Storing only the positions and values of nonzero entries.
- **[sparse vector](/wiki/sparse_vector)**: Vector whose values are mostly zero.
- **[sparsity](/wiki/sparsity)**: Proportion of zero entries.
- **[spatial pooling](/wiki/spatial_pooling)**: Pooling across spatial feature map dimensions.
- **[split](/wiki/split)**: Subset of data such as train or test.
- **[splitter](/wiki/splitter)**: Tree component selecting best node conditions.
- **[squared hinge loss](/wiki/squared_hinge_loss)**: Hinge loss with squared margin violation.
- **[squared loss](/wiki/squared_loss)**: Squared prediction-target difference.
- **[stability](/wiki/stability)**: Consistency under small changes.
- **[staged training](/wiki/staged_training)**: Training in increasing-complexity phases.
- **[state](/wiki/state)**: Description of RL environment at a moment.
- **[state-action value function](/wiki/state-action_value_function)**: Synonym for Q-function.
- **[static](/wiki/static)**: Trained once and not updated.
- **[static inference](/wiki/static_inference)**: Synonym for offline inference.
- **[stationarity](/wiki/stationarity)**: Data statistics unchanged over time.
- **[step](/wiki/step)**: One training iteration on one batch.
- **[step size](/wiki/step_size)**: Synonym for learning rate.
- **[stochastic gradient descent (SGD)](/wiki/stochastic_gradient_descent_sgd)**: Gradient descent on random examples or batches.
- **[stride](/wiki/stride)**: Convolutional filter step size.
- **[structural risk minimization (SRM)](/wiki/structural_risk_minimization_srm)**: Balancing training error and complexity.
- **[subsampling](/wiki/subsampling)**: Selecting a smaller data subset.
- **[summary](/wiki/summary)**: TensorFlow value recorded for TensorBoard.[1]
- **[supervised machine learning](/wiki/supervised_machine_learning)**: Learning from labeled input-output pairs.
- **[synthetic feature](/wiki/synthetic_feature)**: Feature derived from existing ones.
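
The sigmoid function, softmax, squared loss, and sparsity entries above can each be written as a one-line formula. The sketch below is a plain-Python illustration with hypothetical function names. The softmax subtracts the maximum logit before exponentiating, a common trick to avoid overflow, while the sigmoid is left unhardened for very large negative inputs.

```python
import math

def sigmoid(x):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    """Turn a list of real scores into a probability distribution."""
    shift = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def squared_loss(target, prediction):
    """Squared difference between target and prediction."""
    return (target - prediction) ** 2

def sparsity(vector):
    """Proportion of entries that are exactly zero."""
    return sum(1 for v in vector if v == 0) / len(vector)

print(sigmoid(0.0))                  # 0.5
print(softmax([1.0, 1.0, 1.0]))      # equal logits give equal probabilities
print(squared_loss(3.0, 1.0))        # 4.0
print(sparsity([0, 0, 3, 0]))        # 0.75
```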

## T

- **[tabular Q-learning](/wiki/tabular_q-learning)**: Q-learning storing values in a table.
- **[target](/wiki/target)**: Synonym for label.
- **[target network](/wiki/target_network)**: Delayed Q-network copy for stability.
- **[temporal data](/wiki/temporal_data)**: Data indexed by time.
- **[Tensor](/wiki/tensor)**: Multidimensional array, the deep learning data structure.
- **TensorBoard**: TensorFlow visualization toolkit for training runs.[1]
- **[TensorFlow](/wiki/tensorflow)**: Open-source machine learning platform from Google.[1]
- **[TensorFlow Playground](/wiki/tensorflow_playground)**: Interactive browser tool for small networks.[1]
- **[TensorFlow Serving](/wiki/tensorflow_serving)**: Production serving system for TensorFlow models.[1]
- **[Tensor Processing Unit (TPU)](/wiki/tensor_processing_unit_tpu)**: Google's neural network hardware accelerator.[1]
- **[Tensor rank](/wiki/tensor_rank)**: Number of tensor dimensions.
- **[Tensor shape](/wiki/tensor_shape)**: Sizes along each tensor dimension.
- **[Tensor size](/wiki/tensor_size)**: Total number of tensor elements.
- **[termination condition](/wiki/termination_condition)**: Criterion ending an iterative process.
- **[test](/wiki/test)**: A condition in a decision tree, or the held-out evaluation set.
- **[test loss](/wiki/test_loss)**: Loss on the test set.
- **[test set](/wiki/test_set)**: Held-out data for final evaluation.
- **[tf.Example](/wiki/tf_example)**: TensorFlow protocol buffer for one example.[1]
- **[tf.keras](/wiki/tf_keras)**: Keras API bundled with TensorFlow.[1]
- **[threshold (for decision trees)](/wiki/threshold_for_decision_trees)**: Cutoff value in a tree split.
- **[time series analysis](/wiki/time_series_analysis)**: Studying data sequenced in time.
- **[timestep](/wiki/timestep)**: One position in a sequence or RL tick.
- **[token](/wiki/token)**: Discrete unit such as a word or subword.
- **[tower](/wiki/tower)**: Model replica in distributed training.
- **[TPU](/wiki/tpu)**: Abbreviation for Tensor Processing Unit.[1]
- **[TPU chip](/wiki/tpu_chip)**: Single integrated circuit performing TPU computation.[1]
- **[TPU device](/wiki/tpu_device)**: Board with one or more TPU chips.[1]
- **[TPU master](/wiki/tpu_master)**: Coordinator dispatching to TPU workers.[1]
- **[TPU node](/wiki/tpu_node)**: Cloud TPU resource exposed to a VM.[1]
- **[TPU Pod](/wiki/tpu_pod)**: Cluster of many TPU chips on fast networking.[1]
- **[TPU resource](/wiki/tpu_resource)**: Allocation of TPU compute in Google Cloud.[1]
- **[TPU slice](/wiki/tpu_slice)**: Subset of TPU chips from a Pod.[1]
- **[TPU type](/wiki/tpu_type)**: TPU configuration label like v3 or v4.[1]
- **[TPU worker](/wiki/tpu_worker)**: Process running TPU computation.[1]
- **[training](/wiki/training)**: Fitting model parameters to data.
- **[training loss](/wiki/training_loss)**: Loss on training data.
- **[training-serving skew](/wiki/training-serving_skew)**: Difference between a model's performance during training and during serving.
- **[training set](/wiki/training_set)**: Data used to fit model parameters.
- **[trajectory](/wiki/trajectory)**: Sequence of RL states, actions, rewards.
- **[transfer learning](/wiki/transfer_learning)**: Reusing knowledge across related tasks.
- **[Transformer](/wiki/transformer)**: Self-attention-based architecture dominant in language modeling.
- **[translational invariance](/wiki/translational_invariance)**: Predictions unchanged under input shifts.
- **[trigram](/wiki/trigram)**: Sequence of three adjacent tokens.
- **[true negative (TN)](/wiki/true_negative_tn)**: Correct negative prediction.
- **[true positive (TP)](/wiki/true_positive_tp)**: Correct positive prediction.
- **[true positive rate (TPR)](/wiki/true_positive_rate_tpr)**: Synonym for recall.
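
To make the Tensor rank, Tensor shape, Tensor size, and trigram entries above concrete, the following sketch treats a regular nested Python list as a stand-in for a tensor. The helper names are hypothetical, and the shape helper assumes every sublist at a given depth has the same length. It is not TensorFlow code.

```python
import math

def tensor_shape(nested):
    """Sizes along each dimension of a regular nested list; () for a scalar."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None   # descend into the first element
    return tuple(shape)

def tensor_rank(nested):
    """Number of dimensions."""
    return len(tensor_shape(nested))

def tensor_size(nested):
    """Total number of elements (product of the shape; 1 for a scalar)."""
    return math.prod(tensor_shape(nested))

def trigrams(tokens):
    """All sequences of three adjacent tokens."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]

matrix = [[1, 2, 3], [4, 5, 6]]
print(tensor_shape(matrix), tensor_rank(matrix), tensor_size(matrix))
print(tensor_shape(7), tensor_rank(7))           # a scalar has rank zero
print(trigrams("the cat sat down".split()))
```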

## U

- **[unawareness (to a sensitive attribute)](/wiki/unawareness_to_a_sensitive_attribute)**: Fairness approach hiding sensitive attributes.
- **[underfitting](/wiki/underfitting)**: Model too simple to capture data patterns.
- **[undersampling](/wiki/undersampling)**: Reducing majority class examples.
- **[unidirectional](/wiki/unidirectional)**: Sequence processing in one direction only.
- **[unidirectional language model](/wiki/unidirectional_language_model)**: Language model conditioning only on past tokens.
- **[unlabeled example](/wiki/unlabeled_example)**: Example without a known label.
- **[unsupervised machine learning](/wiki/unsupervised_machine_learning)**: Learning structure without labels.
- **[uplift modeling](/wiki/uplift_modeling)**: Causal modeling of an action's incremental effect.
- **[upweighting](/wiki/upweighting)**: Increasing example influence in loss.
- **[user matrix](/wiki/user_matrix)**: Factor matrix for users in matrix factorization.

## V

- **[validation](/wiki/validation)**: Evaluating on a held-out set for tuning.
- **[validation loss](/wiki/validation_loss)**: Loss on the validation set.
- **[validation set](/wiki/validation_set)**: Data for model selection during development.
- **[vanishing gradient problem](/wiki/vanishing_gradient_problem)**: Gradients shrinking toward zero in the early layers of deep networks.
- **[variable importances](/wiki/variable_importances)**: Scores estimating feature influence.
- **[vector database](/wiki/vector_database)**: Storage for similarity search over embeddings.

## W

- **[Wasserstein loss](/wiki/wasserstein_loss)**: GAN loss based on earth mover's distance.
- **[weight](/wiki/weight)**: Learned parameter scaling an input or activation.
- **[Weighted Alternating Least Squares (WALS)](/wiki/weighted_alternating_least_squares_wals)**: Matrix factorization by alternating least squares.
- **[weighted sum](/wiki/weighted_sum)**: Linear combination of inputs and weights.
- **[wide model](/wiki/wide_model)**: Linear model with many sparse input features and feature crosses.
- **[width](/wiki/width)**: Number of units in a layer.
- **[wisdom of the crowd](/wiki/wisdom_of_the_crowd)**: Aggregated independent estimates often beat single ones.
- **[word embedding](/wiki/word_embedding)**: Dense vector representation of a word.

## Z

- **[zero-shot learning](/wiki/zero-shot_learning)**: Performing a task with no task-specific labeled examples.
- **[Z-score normalization](/wiki/z-score_normalization)**: Rescaling to zero mean and unit variance.
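
Z-score normalization, as defined above, subtracts the mean and divides by the standard deviation. The sketch below uses Python's `statistics` module. The function name is hypothetical, and using the population standard deviation (`pstdev`) rather than the sample standard deviation is an assumption of this example.

```python
import statistics

def z_score_normalize(values):
    """Rescale values to zero mean and unit (population) variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)          # population standard deviation
    if std == 0:
        raise ValueError("cannot normalize a constant feature")
    return [(v - mean) / std for v in values]

raw = [2, 4, 4, 4, 5, 5, 7, 9]               # mean 5, population std 2
scaled = z_score_normalize(raw)
print(scaled)
print(statistics.fmean(scaled), statistics.pstdev(scaled))
```

At serving time, the mean and standard deviation computed on the training set would normally be reused rather than recomputed on new data.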

## References

[1] Google Developers: Machine Learning Glossary, https://developers.google.com/machine-learning/glossary
[2] Wikipedia: Glossary of artificial intelligence, https://en.wikipedia.org/wiki/Glossary_of_artificial_intelligence
[3] Wikipedia: Machine learning, https://en.wikipedia.org/wiki/Machine_learning
[4] Wikipedia: Deep learning, https://en.wikipedia.org/wiki/Deep_learning
[5] IBM: What is machine learning?, https://www.ibm.com/topics/machine-learning
[6] scikit-learn glossary, https://scikit-learn.org/stable/glossary.html