Machine learning terms/All
Updated Mar 19, 2026

See also: Machine learning terms

A/B testing
accuracy
action
activation function
active learning
AdaGrad
agent
agglomerative clustering
anomaly detection
AR
area under the PR curve
area under the ROC curve
artificial general intelligence
artificial intelligence
attention
attribute
attribute sampling
AUC (Area under the ROC curve)
augmented reality
automation bias
average precision
axis-aligned condition
backpropagation
bagging
bag of words
baseline
batch
batch normalization
batch size
Bayesian neural network
Bayesian optimization
Bellman equation
BERT (Bidirectional Encoder Representations from Transformers)
bias (ethics/fairness)
bias (math) or bias term
bigram
bidirectional
bidirectional language model
binary classification
binary condition
binning
BLEU (Bilingual Evaluation Understudy)
boosting
bounding box
broadcasting
bucketing
calibration layer
candidate generation
candidate sampling
categorical data
causal language model
centroid
centroid-based clustering
checkpoint
class
classification model
classification threshold
class-imbalanced dataset
clipping
Cloud TPU
clustering
co-adaptation
collaborative filtering
condition
confirmation bias
confusion matrix
continuous feature
convenience sampling
convergence
convex function
convex optimization
convex set
convolution
convolutional filter
convolutional layer
convolutional neural network
convolutional operation
cost
co-training
counterfactual fairness
coverage bias
crash blossom
critic
cross-entropy
cross-validation
data analysis
data augmentation
DataFrame
data parallelism
data set or dataset
Dataset API (tf.data)
decision boundary
decision forest
decision threshold
decision tree
deep model
decoder
deep neural network
Deep Q-Network (DQN)
demographic parity
denoising
dense feature
dense layer
depth
depthwise separable convolutional neural network (sepCNN)
derived label
device
dimension reduction
dimensions
discrete feature
discriminative model
discriminator
disparate impact
disparate treatment
divisive clustering
downsampling
DQN
dropout regularization
dynamic
dynamic model
eager execution
early stopping
earth mover's distance (EMD)
embedding layer
embedding space
embedding vector
empirical risk minimization (ERM)
encoder
ensemble
entropy
environment
episode
epoch
epsilon greedy policy
equality of opportunity
equalized odds
Estimator
example
experience replay
experimenter's bias
exploding gradient problem
fairness constraint
fairness metric
false negative (FN)
false negative rate
false positive (FP)
false positive rate (FPR)
feature
feature cross
feature engineering
feature extraction
feature importances
feature set
feature spec
feature vector
federated learning
feedback loop
feedforward neural network (FFN)
few-shot learning
fine tuning
forget gate
full softmax
fully connected layer
GAN
generalization
generalization curve
generalized linear model
generative adversarial network (GAN)
generative model
generator
GPT (Generative Pre-trained Transformer)
gini impurity
gradient
gradient boosting
gradient boosted (decision) trees (GBT)
gradient clipping
gradient descent
graph
graph execution
greedy policy
ground truth
group attribution bias
hallucination
hashing
heuristic
hidden layer
hierarchical clustering
hinge loss
holdout data
hyperparameter
hyperplane
i.i.d.
image recognition
imbalanced dataset
implicit bias
incompatibility of fairness metrics
independently and identically distributed (i.i.d)
individual fairness
inference
inference path
information gain
in-group bias
input layer
in-set condition
instance
interpretability
inter-rater agreement
intersection over union (IoU)
IoU
item matrix
items
iteration
Keras
keypoints
Kernel Support Vector Machines (KSVMs)
k-means
k-median
L0 regularization
L1 loss
L1 regularization
L2 loss
L2 regularization
label
labeled example
LaMDA (Language Model for Dialogue Applications)
lambda
landmarks
language model
large language model
layer
Layers API (tf.layers)
leaf
learning rate
least squares regression
linear model
linear
linear regression
logistic regression
logits
Log Loss
log-odds
Long Short-Term Memory (LSTM)
loss
loss curve
loss function
loss surface
LSTM
machine learning
majority class
Markov decision process (MDP)
Markov property
masked language model
matplotlib
matrix factorization
Mean Absolute Error (MAE)
Mean Squared Error (MSE)
metric
meta-learning
Metrics API (tf.metrics)
mini-batch
mini-batch stochastic gradient descent
minimax loss
minority class
ML
MNIST
modality
model
model capacity
model parallelism
model training
Momentum
multi-class classification
multi-class logistic regression
multi-head self-attention
multimodal model
multinomial classification
multinomial regression
NaN trap
natural language understanding
negative class
neural network
neuron
N-gram
NLU
node (neural network)
node (TensorFlow graph)
node (decision tree)
noise
non-binary condition
nonlinear
non-response bias
nonstationarity
normalization
novelty detection
numerical data
NumPy
objective
objective function
oblique condition
offline
offline inference
one-hot encoding
one-shot learning
one-vs.-all
online
online inference
operation (op)
out-of-bag evaluation (OOB evaluation)
optimizer
out-group homogeneity bias
outlier detection
outliers
output layer
overfitting
oversampling
pandas
parameter
Parameter Server (PS)
parameter update
partial derivative
participation bias
partitioning strategy
perceptron
performance
permutation variable importances
perplexity
pipeline
pipelining
policy
pooling
positive class
post-processing
PR AUC (area under the PR curve)
precision
precision-recall curve
prediction
prediction bias
predictive parity
predictive rate parity
preprocessing
pre-trained model
prior belief
probabilistic regression model
proxy (sensitive attributes)
proxy labels
Q-function
Q-learning
quantile
quantile bucketing
quantization
queue
random forest
random policy
ranking
rank (ordinality)
rank (Tensor)
rater
recall
recommendation system
Rectified Linear Unit (ReLU)
recurrent neural network
regression model
regularization
regularization rate
reinforcement learning (RL)
ReLU
replay buffer
reporting bias
representation
re-ranking
return
reward
ridge regularization
RNN
ROC (receiver operating characteristic) Curve
root
root directory
Root Mean Squared Error (RMSE)
rotational invariance
sampling bias
sampling with replacement
SavedModel
Saver
scalar
scaling
scikit-learn
scoring
selection bias
self-attention (also called self-attention layer)
self-supervised learning
self-training
semi-supervised learning
sensitive attribute
sentiment analysis
sequence model
sequence-to-sequence task
serving
shape (Tensor)
shrinkage
sigmoid function
similarity measure
size invariance
sketching
softmax
sparse feature
sparse representation
sparse vector
sparsity
spatial pooling
split
splitter
squared hinge loss
squared loss
stability
staged training
state
state-action value function
static
static inference
stationarity
step
step size
stochastic gradient descent (SGD)
stride
structural risk minimization (SRM)
subsampling
summary
supervised machine learning
synthetic feature
tabular Q-learning
target
target network
temporal data
Tensor
TensorBoard
TensorFlow
TensorFlow Playground
TensorFlow Serving
Tensor Processing Unit (TPU)
Tensor rank
Tensor shape
Tensor size
termination condition
test
test loss
test set
tf.Example
tf.keras
threshold (for decision trees)
time series analysis
timestep
token
tower
TPU
TPU chip
TPU device
TPU master
TPU node
TPU Pod
TPU resource
TPU slice
TPU type
TPU worker
training
training loss
training-serving skew
training set
trajectory
transfer learning
Transformer
translational invariance
trigram
true negative (TN)
true positive (TP)
true positive rate (TPR)
unawareness (to a sensitive attribute)
underfitting
undersampling
unidirectional
unidirectional language model
unlabeled example
unsupervised machine learning
uplift modeling
upweighting
user matrix
validation
validation loss
validation set
vanishing gradient problem
variable importances
Wasserstein loss
weight
Weighted Alternating Least Squares (WALS)
weighted sum
wide model
width
wisdom of the crowd
word embedding
Z-score normalization