This page is the top-level glossary aggregator for the AI Wiki. It combines artificial intelligence vocabulary, machine learning concepts, and AI tool or product names into a single index. For deeper drill-downs see the specialized glossaries: artificial intelligence terms, machine learning terms, and the comprehensive machine learning terms/All. Use this index when you want to scan many topics at a glance; use the specialized glossaries when you need richer context.
See also: Models, Applications and Guides
Artificial intelligence terms
This section gathers high-level AI concepts, model families, companies, products, and operational vocabulary. It is the broadest bucket in the index and covers everything from foundational ideas like AGI and inference to specific labs and product names.
See also: artificial intelligence terms
Agentic Context Engineering: Methods for assembling, pruning, and routing context that an autonomous agent uses across multi-step tasks.
AI Agents: Software systems that perceive inputs, plan, and take actions toward goals, often invoking tools or APIs.
AI bubble: The view that current AI investment and valuations have outpaced near-term revenue and may be headed for a correction.
AI Monarchy
AI Project Management: The practice of planning, scoping, and shipping AI-powered products, including data, model, and evaluation work.
Alibaba Cloud: The cloud computing arm of Alibaba Group, which hosts AI services and the Qwen model family.
Anthropic: An AI safety company founded in 2021 that builds the Claude family of large language models.
Art: AI-generated images, illustrations, and visual works produced by generative image models.
Artificial general intelligence (AGI): A hypothetical AI with broad cognitive abilities matching or exceeding humans across most tasks.
Artificial Superintelligence (ASI): A hypothetical AI that decisively surpasses the best human minds in every economically valuable domain.
AudioCraft: Meta's open-source research codebase for audio generation, including MusicGen and AudioGen.
Backdooring LLMs: Inserting hidden triggers during training so that specific inputs cause attacker-chosen outputs at inference.
Baidu AI: Baidu's portfolio of AI products and the ERNIE family of foundation models.
Bard: Google's earlier conversational AI service, later rebranded as Gemini.
Benchmarks: Standardized test sets and evaluation suites used to compare model capabilities and quality.
BERT: Bidirectional Encoder Representations from Transformers, a 2018 Google encoder model for language understanding tasks.
Browser-use agent (BUA): An agent that controls a web browser to complete tasks by clicking, typing, and reading pages.
Burstiness: Variation in sentence length and rhythm, used as a signal in some AI text detectors.
ByteDance Seed: ByteDance's foundational research group developing large language and multimodal models.
Claude: Anthropic's family of large language models, including the Opus, Sonnet, and Haiku tiers.
Claude Code: Anthropic's command-line coding assistant that runs Claude as an agent in the terminal.
Claude Skills: Reusable, packaged capabilities that extend Claude with custom instructions, scripts, and data.
Claude --dangerously-skip-permissions: A Claude Code flag that disables interactive permission prompts so the agent runs commands without confirmation.
Completion: The new text (output) that an LLM generates when it runs inference on a prompt.
Context window: The maximum number of tokens a model can attend to in a single request, including prompt and output.
contexts optical compression: A technique used in DeepSeek-OCR that encodes long text contexts as compressed vision tokens.
Controversies: Notable disputes and incidents involving AI labs, products, or research practices.
CodeGen: A family of open-source code language models from Salesforce Research.
Computer-use agent (CUA): An agent that controls a desktop environment by reading the screen and operating mouse and keyboard.
Computer-use model: A model trained or fine-tuned to drive a computer interface as an agent.
ControlNet: A neural network architecture that adds spatial conditioning to diffusion models, controlling image generation with maps like edges or poses.
Copyright: Legal issues around training data, model outputs, and authorship of AI-generated works.
CUDA: NVIDIA's parallel computing platform and programming model for general-purpose GPU computation.
cuDNN: NVIDIA's GPU-accelerated library of primitives for deep neural networks.
Cursor Rules: Per-project configuration files that steer the Cursor AI editor with custom instructions.
Custom GPTs: OpenAI's mechanism for building tailored versions of ChatGPT with custom instructions, knowledge, and tools.
Dark Factory: A fully automated factory that runs without on-site human workers and can therefore operate with the lights off (also called a lights-out factory).
Dead internet theory: The claim that much online content and engagement is now generated by bots and AI rather than humans.
DeepSeek: A Chinese AI lab known for open-weight reasoning and code models such as DeepSeek-V3 and DeepSeek-R1.
DeepSeek-OCR: A DeepSeek model that uses optical context compression to handle very long documents.
Deep Research: Agent products from OpenAI, Perplexity, and others that browse, read, and synthesize long-form reports.
Diffusion Models: Generative models that learn to reverse a gradual noising process to produce images, audio, or video.
DreamStudio: Stability AI's hosted web app for generating images with Stable Diffusion.
Edge AI: Running AI models on devices like phones or embedded hardware rather than in the cloud.
Emergent Abilities: Capabilities that appear in large language models past a certain scale and are not present in smaller versions.
Ethics: The study of moral implications of AI systems, including bias, autonomy, and societal impact.
Few shot: A prompting style that gives a model a small number of input-output examples to demonstrate a task.
Foundation models: Large models pretrained on broad data that can be adapted to many downstream tasks.
Frontier labs: The leading AI research organizations developing the most capable models, such as OpenAI, Anthropic, and Google DeepMind.
Frontier models: The most capable AI models at any given time, often subject to enhanced safety review.
GLM: A family of bilingual open-source language models developed by Zhipu AI.
GPT: Generative Pre-trained Transformer, OpenAI's family of decoder-only language models.
GPT-1: OpenAI's first-generation GPT model, released in 2018 with 117 million parameters.
GPT-2: OpenAI's 2019 language model with 1.5 billion parameters, released in stages because of misuse concerns.
GPT-3: OpenAI's 2020 175 billion parameter autoregressive language model.
GPT-4: OpenAI's multimodal large language model released in March 2023.
Google DeepMind: Google's combined AI research division formed in 2023 from DeepMind and Google Brain.
Grok: xAI's conversational large language model, integrated with the X social platform.
Grok 3 Jailbreak: A method that bypassed Grok 3's safety guardrails shortly after launch.
hallucination: Model output that is fluent but factually incorrect or unsupported by the input.
HuggingGPT: A research system that uses ChatGPT to plan and orchestrate calls to specialist models on Hugging Face.
image generator: A generative model that produces images from text prompts or other inputs.
inclusionAI: An open-source organization releasing Ant Group's Ling and Ring model families.
inference: Running a trained model on new inputs to produce predictions or completions.
Isaac GR00T: NVIDIA's foundation model and platform for general-purpose humanoid robots.
Knowledge distillation: Training a smaller student model to match the outputs of a larger teacher model.
LaMDA: Google's Language Model for Dialogue Applications, the model that originally powered Bard.
LangChain: An open-source framework for composing LLM applications with tools, memory, and chains.
Large language models (LLMs): Neural language models trained on massive text corpora, typically transformer-based with billions of parameters.
Large language models ranking: Comparative ordering of LLMs by benchmark scores, capabilities, or user preferences.
Leaderboards: Public ranked lists of model performance on standardized evaluations.
LLM Anxiety: Psychological unease prompted by uncertainty about AI's effects on work, identity, or society.
LongCat-Video: A long-form video generation model from Meituan's LongCat lab.
LoRA (low-rank adaptation): A parameter-efficient fine-tuning method that injects small trainable rank-decomposition matrices into a frozen base model.
Manipulation problem: Concerns that advanced AI systems could influence human beliefs or behavior at scale.
Manus AI: A general AI agent from Chinese startup Butterfly Effect that autonomously executes multi-step tasks.
Meta AI: Meta's AI assistant product integrated across Facebook, Instagram, WhatsApp, and Messenger.
Meta AI (Company): The artificial intelligence research division of Meta Platforms, formerly Facebook AI Research.
Meta Prompting: Using prompts that instruct a model to generate or refine other prompts.
MiniMax: A Chinese AI startup known for the MiniMax-M series of language models and the Hailuo video generation models.
Minimum Viable Agent: The smallest agent loop that demonstrates value on a target task, used as a starting point for iteration.
Mistral AI: A French AI lab known for open-weight models including Mistral 7B and Mixtral.
Mixture of Experts (MoE): An architecture that routes each token to a small subset of expert subnetworks, scaling capacity without proportional compute.
Model collapse: Degradation of model quality when later generations are trained primarily on prior models' synthetic outputs.
Model hubs: Repositories such as Hugging Face that host and distribute model weights and metadata.
Moonshot AI: A Chinese AI startup behind the Kimi chatbot and the Kimi K2 family of models.
Multi-head Latent Attention (MLA): An attention variant from DeepSeek that compresses key-value caches via low-rank projections.
Muon optimizer: An optimizer for 2D weight matrices that orthogonalizes momentum-based updates with Newton-Schulz iterations. It is reported to train some networks faster than AdamW.
Neural network: A model of stacked layers of weighted units that learns from data via gradient-based optimization.
NVIDIA: The American GPU and accelerated computing company whose chips power most modern AI training.
NVIDIA DGX Spark: NVIDIA's compact desktop AI workstation built on the GB10 Grace Blackwell platform.
NVIDIA Omniverse: NVIDIA's platform for building and operating 3D simulation and digital twin applications.
OCR Models: Optical character recognition models that extract text from images and documents.
Ollama: An open-source tool for running and managing local large language models on personal hardware.
One shot: A prompting style with a single example provided in the prompt.
OpenAI: The AI lab founded in 2015 that produces GPT, DALL-E, and ChatGPT.
OpenAI Gym: A retired OpenAI toolkit of reinforcement learning environments, now maintained as Gymnasium.
OpenAI Gym Retro: An RL platform that turned classic console games into reinforcement learning environments.
OpenAI Universe: An early OpenAI platform for measuring agent intelligence across many games and applications.
OpenMule
OpenRouter: A unified API and marketplace that routes requests across many hosted large language models.
Paper2Video: A pipeline that turns academic papers into narrated video presentations using generative models.
Perplexity: An AI-native answer engine that combines web search with large language model responses.
PoisonGPT: A proof-of-concept demonstrating how a model on Hugging Face could be modified to spread targeted misinformation.
Post-training: All training applied to a pretrained model, including supervised fine-tuning and reinforcement learning from feedback.
Pre-training: The large-scale self-supervised stage where a foundation model learns from broad data before fine-tuning.
Presentations: Slide decks and visual material produced or assisted by AI.
Prompt: The text input given to a language model to elicit a completion.
Prompt engineering: The craft of designing prompts to reliably steer model behavior.
Prompt engineering for image generation: Techniques for writing text prompts that produce desired images from generative models.
Prompt engineering for text generation: Techniques and patterns for writing text prompts that produce desired language model outputs.
Prompt extraction: An attack that tricks a model into revealing its hidden system prompt or instructions.
Prompt injection: An attack that smuggles instructions into model inputs to override the developer's intended behavior.
Pruning: Removing weights, neurons, or attention heads from a network to reduce size and compute.
Purple Llama: Meta's umbrella project of trust and safety tools for open AI development.
PyTorch: An open-source deep learning framework originally developed by Meta and widely used in research.
Q* OpenAI: A rumored internal OpenAI research project associated with stronger reasoning capabilities.
QPU: Quantum processing unit, the hardware that performs quantum computations.
Quantization: Reducing numerical precision of model weights and activations to shrink memory and speed inference.
Qwen: Alibaba's open-source family of large language and multimodal models.
ReasoningBank: A memory framework for agents that distills reusable reasoning strategies from past successful and failed experiences.
Reinforcement learning: Learning a policy by interacting with an environment and optimizing a cumulative reward signal.
Reflection AI: A startup founded by former DeepMind researchers focused on autonomous coding agents.
RLHF (Reinforcement learning from human feedback): A post-training method that uses human preference labels to fit a reward model and fine-tune a policy.
RSI (Recursive self-improvement): A scenario in which an AI improves its own design, accelerating its capability gains.
Segment Anything: Meta's promptable segmentation model that can cut out any object in an image.
SEO for ChatGPT: Optimization practices aimed at making content more likely to be cited by AI answer engines.
SmolVLA: Hugging Face's small open-source vision-language-action model for robotics.
Stability AI: The UK-based company behind the Stable Diffusion family of generative image models.
Supervised fine-tuning (SFT): Fine-tuning a pretrained model on labeled input-output pairs, often used to teach instruction following.
System prompt: A high-priority instruction passed to a model that sets persona, tools, and constraints for the conversation.
Text-to-3D: Generating 3D meshes or scenes from text prompts.
Text-to-audio: Generating speech, music, or sound effects from text prompts.
Text-to-image: Generating images from text prompts, popularized by DALL-E and Stable Diffusion.
Text-to-video: Generating video clips from text prompts using generative video models.
The Pile: An 825 GiB open-source text dataset assembled by EleutherAI for training language models.
Tokens: The discrete units, typically subword pieces, that a language model reads and produces.
Transformers: The neural network architecture introduced in 2017 that uses self-attention as its core building block.
Unstable Diffusion: A community fork and Discord effort focused on uncensored Stable Diffusion variants.
Vector database: A database optimized for storing and searching high-dimensional embedding vectors by similarity.
Vector embeddings: Dense numerical representations of text, images, or other data in a continuous space.
Vibe coding: A coding style coined by Andrej Karpathy where the developer describes intent and lets an AI write most of the code.
Vision encoder: The model component that converts an image into feature representations for downstream tasks.
Vision Language Model (VLM): A model that jointly understands images and text for tasks like captioning, VQA, and grounding.
Vision tokens: Token representations produced by encoding an image so a transformer can process visual input.
Vision Transformer (ViT): An image classification model that applies the transformer architecture to sequences of image patches.
VLA: Vision-language-action models that take images and language and output robot actions.
Wan AI: An open video generation model series released by Alibaba's Wan team.
World models: Internal predictive models of an environment that an agent can plan against without real interaction.
xAI: Elon Musk's AI company that develops the Grok line of large language models.
XPeng IRON: XPeng's humanoid robot designed for manufacturing and service tasks.
Z.ai: Zhipu AI's consumer chatbot and developer platform built around the GLM model family.
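The Quantization entry in the list above can be made concrete with a minimal sketch of symmetric, per-tensor int8 quantization. The function names and the use of a single shared scale are illustrative assumptions. Production toolkits typically add per-channel scales, zero points, and calibration data.

```python
# Minimal sketch: symmetric per-tensor int8 quantization (illustrative only).

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, which would otherwise give a zero scale.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    """Recover approximate floats by multiplying each code by the scale."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
# Each restored value is within scale / 2 of the original (rounding error).
```

Storing one byte per weight plus a single float scale is what shrinks memory. Small values such as 0.003 collapse to 0, which shows the precision that is lost.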
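For the Mixture of Experts (MoE) entry, here is a minimal sketch of top-k routing. A softmax over gate logits picks the k best experts, their weights are renormalized to sum to 1, and only those experts run. The toy scalar experts and the fixed gate logits are hypothetical. In a real MoE layer each expert is a feed-forward network and the gate logits come from a learned projection of each token.

```python
import math
from typing import Callable


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x: float, experts: list[Callable[[float], float]],
                gate_logits: list[float], k: int = 2) -> float:
    """Run only the selected experts and return their gate-weighted sum."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))


# Hypothetical toy experts standing in for per-expert feed-forward networks.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
y = moe_forward(3.0, experts, gate_logits=[2.0, 1.0, 0.1, -1.0], k=2)
```

Only two of the four experts are evaluated here. This is how MoE grows total parameter count without a proportional increase in per-token compute.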
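The Knowledge distillation entry can be illustrated with one common soft-target loss. It is the KL divergence from the teacher's temperature-softened distribution to the student's, scaled by the squared temperature. This is a sketch under that assumption. In practice it is usually mixed with an ordinary cross-entropy loss on the true labels, and the temperature of 2.0 is an arbitrary example value.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Softmax over logits divided by a temperature; higher T gives softer probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: list[float], teacher_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T**2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's softened predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Scaling by T**2 keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher's logits and positive otherwise, so minimizing it pulls the smaller model toward the larger one's output distribution.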
Machine learning terms
This section links to subject-area glossaries adapted from the Google Machine Learning Glossary, grouped by topic so readers can jump straight to the area they care about.
See also: Machine learning terms
Fundamentals: Core ML vocabulary covering training, evaluation, loss, regularization, and basic model types.
Natural Language Processing: Terms for tokenization, language modeling, transformers, and text-specific architectures.
Language Evaluation: Metrics and methods used to measure language model output quality.
Fairness: Concepts around bias, equality of opportunity, and disparate impact in ML systems.
Decision Forests: Vocabulary for decision trees, random forests, and gradient-boosted trees.
Reinforcement Learning: Terms for agents, environments, policies, rewards, and value functions.
Computer Vision: Vocabulary for image recognition, object detection, and visual representation learning.
Image Models: Terms for convolutional networks, vision transformers, and image-specific training tricks.
Sequence Models: Vocabulary for RNNs, LSTMs, and sequence-to-sequence architectures.
Clustering: Terms for k-means, hierarchical clustering, and similarity measures.
Recommendation Systems: Vocabulary for candidate generation, matrix factorization, and re-ranking.
TensorFlow: Framework-specific vocabulary for the TensorFlow ecosystem.
Google Cloud: ML vocabulary specific to Google Cloud Platform services.
All ML terms
This section is the flat alphabetical index of every machine learning term in the wiki. For grouped views see the subject pages above; for the long-form list see Machine learning terms/All.
See also: Machine learning terms/All
- A/B testing: A statistical method comparing two variants on a metric to decide which performs better.
- accuracy: The fraction of predictions a classification model got right.
- action: In reinforcement learning, the choice an agent makes from its action space in a state.
- activation function: A nonlinear function such as ReLU applied to a neuron's weighted sum.
- active learning: A training approach where the model selects which unlabeled examples to have humans label next.
- AdaGrad: An adaptive gradient algorithm that scales the learning rate per parameter using accumulated squared gradients.
- agent: In reinforcement learning, the entity that selects actions to maximize cumulative reward.
- agglomerative clustering: A bottom-up hierarchical clustering method that successively merges the closest clusters.
- anomaly detection: Identifying rare items, events, or observations that differ significantly from the majority.
- AR: Augmented reality, overlaying computer-generated content on the real world.
- area under the PR curve: A scalar summary of a precision-recall curve, useful for imbalanced classification.
- area under the ROC curve: A scalar summary of an ROC curve giving overall ranking quality.
- artificial general intelligence: A hypothetical AI with general human-level cognitive abilities across most tasks.
- artificial intelligence: The field of building systems that perform tasks typically requiring human intelligence.
- attention: A mechanism that computes weighted combinations of inputs based on learned relevance scores.
- attribute: A property or feature of an example used by a model.
- attribute sampling: Randomly selecting a subset of features when training each tree in a forest.
- AUC (Area under the ROC curve): A value between 0 and 1 summarizing a classifier's ranking quality across all classification thresholds.
- augmented reality: Overlaying computer-generated content on the user's view of the real world.
- automation bias: The tendency to favor suggestions from automated systems over equally valid manual judgment.
- average precision: A summary of a precision-recall curve: the mean of the precision values at each threshold, weighted by the increase in recall from the previous threshold.
- axis-aligned condition: A decision-tree split based on a single feature's value, parallel to a coordinate axis.
- backpropagation: The algorithm for computing gradients of a loss function with respect to weights in a neural network.
- bagging: Bootstrap aggregating, an ensemble method that trains models on random samples with replacement.
- bag of words: A text representation that counts word occurrences and ignores order.
- baseline: A simple reference model used as a comparison point for more complex models.
- batch: The set of examples used in one iteration of model training.
- batch normalization: Normalizing the inputs of a layer across each mini-batch to stabilize and speed up training.
- batch size: The number of training examples in one batch.
- Bayesian neural network: A neural network whose weights have probability distributions rather than fixed values.
- Bayesian optimization: A method for tuning expensive black-box functions using a surrogate probabilistic model.
- Bellman equation: A recursive equation relating the value of a state to the values of its successor states.
- BERT (Bidirectional Encoder Representations from Transformers): A 2018 Google encoder model trained with masked language modeling for NLU tasks.
- bias (ethics/fairness): Systematic unfairness in a model or dataset toward certain groups.
- bias (math) or bias term: An additional learned parameter added to a linear combination of inputs.
- bigram: An N-gram of length two.
- bidirectional: Processing a sequence using context from both earlier and later tokens.
- bidirectional language model: A language model that conditions on both left and right context, such as BERT.
- binary classification: A task with exactly two possible output classes.
- binary condition: A decision-tree condition with only two outcomes.
- binning: Grouping continuous values into discrete intervals.
- BLEU (Bilingual Evaluation Understudy): A precision-based metric for machine translation quality based on N-gram overlap.
- boosting: An ensemble method that combines weak learners sequentially, each correcting the prior's errors.
- bounding box: A rectangle around an object in an image used in object detection.
- broadcasting: Automatically expanding tensors of different shapes to compatible shapes for element-wise operations.
- bucketing: Converting a continuous feature into a categorical feature using value ranges.
- calibration layer: A post-prediction adjustment that aligns predicted probabilities with observed frequencies.
- candidate generation: The first stage of a recommender that produces a coarse list of items to score.
- candidate sampling: A training-time technique that computes loss using a subset of negative classes.
- categorical data: Features with a discrete set of possible values.
- causal language model: A language model that predicts the next token given only previous tokens.
- centroid: The center of a cluster, typically computed as the mean of its members.
- centroid-based clustering: Clustering algorithms like k-means that assign points to the nearest cluster center.
- checkpoint: A saved snapshot of a model's weights and optimizer state during training.
- class: One of the discrete output categories for a classification model.
- classification model: A model that predicts a discrete class label.
- classification threshold: The probability cutoff used to convert a score into a binary class prediction.
- class-imbalanced dataset: A dataset where some classes have far more examples than others.
- clipping: Limiting feature or gradient values to a maximum or minimum range.
- Cloud TPU: Google Cloud's hosted Tensor Processing Unit service for ML workloads.
- clustering: Grouping unlabeled examples by similarity.
- co-adaptation: A failure mode where neurons rely too heavily on each other, often mitigated by dropout.
- collaborative filtering: A recommendation technique that uses user-item interaction patterns to make predictions.
- condition: In a decision tree, a test on a feature that splits examples into branches.
- confirmation bias: The tendency to favor information that confirms existing beliefs.
- confusion matrix: A table comparing predicted versus actual class counts.
- continuous feature: A feature with an infinite range of floating-point values.
- convenience sampling: Drawing examples that are easy to gather rather than statistically representative.
- convergence: The state where additional training produces little change in loss.
- convex function: A function whose graph lies on or below the line segment (chord) joining any two points on the graph.
- convex optimization: Minimization of a convex function over a convex set, where any local minimum is also a global minimum.
- convex set: A set where the line segment between any two points stays inside the set.
- convolution: A mathematical operation that slides a filter across input data to produce feature maps.
- convolutional filter: A small matrix of weights that is convolved across an input.
- convolutional layer: A neural network layer that applies a set of convolutional filters.
- convolutional neural network: A neural network using convolutional layers, especially for image data.
- convolutional operation: The element-wise multiply-and-sum that produces a single output value of a convolution.
- cost: Synonym for loss.
- co-training: A semi-supervised method where two models trained on different views label data for each other.
- counterfactual fairness: A fairness criterion requiring identical predictions on counterfactual versions of an example.
- coverage bias: Bias arising when the sampling frame excludes parts of the target population.
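- crash blossom: A sentence or phrase, often a headline, whose meaning is ambiguous (for example, "Red Tape Holds Up Skyscraper"), making it hard for natural language understanding systems to interpret.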
- critic: In actor-critic RL, the component that estimates the value of states or actions.
- cross-entropy: A loss function measuring the difference between two probability distributions.
- cross-validation: Estimating generalization by training and evaluating on multiple data splits.
- data analysis: Inspecting and summarizing data to understand its properties.
- data augmentation: Artificially expanding a training set with label-preserving transformations.
- DataFrame: A pandas data structure representing tabular data with labeled columns.
- data parallelism: A distributed training strategy where each device holds a model replica and processes different data.
- data set or dataset: A collection of examples used to train or evaluate a model.
- Dataset API (tf.data): TensorFlow's API for building input pipelines.
- decision boundary: The surface in feature space that separates predicted classes.
- decision forest: An ensemble model composed of multiple decision trees.
- decision threshold: The cutoff used to map model scores to discrete predictions.
- decision tree: A model that makes predictions through a hierarchy of feature-based conditions.
- deep model: A neural network with many hidden layers.
- decoder: The component of an encoder-decoder model that generates an output sequence.
- deep neural network: A neural network with multiple hidden layers.
- Deep Q-Network (DQN): A reinforcement learning algorithm that uses a deep neural network to approximate Q-values.
- demographic parity: A fairness criterion requiring equal positive prediction rates across groups.
- denoising: Removing noise from data, a core idea in diffusion models and autoencoders.
- dense feature: A feature whose values are mostly nonzero.
- dense layer: A fully connected neural network layer.
- depth: The number of layers in a neural network or the level of a node in a tree.
- depthwise separable convolutional neural network (sepCNN): A CNN that factors standard convolutions into depthwise and pointwise steps for efficiency.
- derived label: A label computed from other features rather than directly observed.
- device: The hardware accelerator, such as CPU, GPU, or TPU, used for computation.
- dimension reduction: Mapping high-dimensional data to a lower-dimensional space while preserving structure.
- dimensions: The number of axes or features in a vector or tensor.
- discrete feature: A feature with a finite set of possible values.
- discriminative model: A model that directly estimates the conditional probability of labels given inputs.
- discriminator: In a GAN, the network that distinguishes real from generated examples.
- disparate impact: A decision policy that disproportionately affects a protected group, even without intent.
- disparate treatment: A decision policy that explicitly treats individuals differently based on a protected attribute.
- divisive clustering: A top-down hierarchical clustering method that recursively splits clusters.
- downsampling: Reducing the resolution or sampling frequency of data, or reducing the number of majority-class examples used in training.
- DQN: Deep Q-Network, a deep reinforcement learning algorithm.
- dropout regularization: Randomly dropping units during training to reduce co-adaptation and overfitting.
- dynamic: A model or feature that updates frequently online.
- dynamic model: A model that is continuously retrained on new data.
- eager execution: A TensorFlow mode where operations run immediately rather than building a graph.
- early stopping: Halting training when validation loss stops improving to prevent overfitting.
- earth mover's distance (EMD): A measure of distance between two probability distributions, equivalent to Wasserstein-1.
- embedding layer: A neural network layer that maps discrete tokens to dense vectors.
- embedding space: The continuous space in which embedding vectors live.
- embedding vector: A dense numeric vector that represents a discrete item.
- empirical risk minimization (ERM): Choosing the model that minimizes loss on the training set.
- encoder: The component of an encoder-decoder model that produces a representation of the input.
- ensemble: A model formed by combining the predictions of several models.
- entropy: An information-theoretic measure of uncertainty in a distribution.
- environment: In reinforcement learning, the system the agent interacts with.
- episode: One run of an RL agent in an environment, from start to terminal state.
- epoch: One full pass through the training dataset.
- epsilon greedy policy: A policy that picks a random action with probability epsilon and the greedy action otherwise.
- equality of opportunity: A fairness criterion requiring equal true-positive rates across groups.
- equalized odds: A fairness criterion requiring equal true-positive and false-positive rates across groups.
- Estimator: A high-level TensorFlow API for training and evaluating models.
- example: One row of a dataset, comprising features and optionally a label.
- experience replay: A memory of past transitions sampled to train RL agents off-policy.
- experimenter's bias: Bias caused by experimenter expectations influencing results.
- exploding gradient problem: Gradients growing very large during backpropagation, destabilizing training.
- fairness constraint: A constraint added to optimization to enforce a fairness criterion.
- fairness metric: A measurement of how well a model satisfies a fairness criterion.
- false negative (FN): A positive example that the model predicted as negative.
- false negative rate: The proportion of actual positives that were predicted as negative.
- false positive (FP): A negative example that the model predicted as positive.
- false positive rate (FPR): The proportion of actual negatives that were predicted as positive.
- feature: An input variable used by a model.
- feature cross: A synthetic feature formed by combining two or more categorical features.
- feature engineering: The process of designing, selecting, and transforming features to improve model quality.
- feature extraction: Producing features from raw data, often using a learned representation.
- feature importances: Scores indicating how much each feature contributes to a model's predictions.
- feature set: The collection of features used by a model.
- feature spec: A schema describing the types and shapes of input features.
- feature vector: The numeric vector of features representing one example.
- federated learning: Training a shared model across many devices that keep their data local.
- feedback loop: A situation where a model's predictions influence future training data.
- feedforward neural network (FFN): A neural network where connections do not form cycles.
- few-shot learning: Learning from very few labeled examples per class.
- fine tuning: Continued training of a pretrained model on a downstream task.
- forget gate: The LSTM component that decides which information to discard from cell state.
- full softmax: A softmax computed over the entire output vocabulary.
- fully connected layer: A layer where every input is connected to every output unit.
- GAN: Generative adversarial network, a pair of competing generator and discriminator models.
- generalization: How well a model performs on previously unseen data.
- generalization curve: A plot of training and validation loss across training iterations.
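- generalized linear model: A generalization of linear regression in which a link function relates the expected value of the label to a linear combination of the features; logistic regression is one example.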
- generative adversarial network (GAN): A generator trained against a discriminator to produce realistic samples.
- generative model: A model that learns the distribution of the data (for labeled data, the joint distribution of inputs and labels) and can generate new samples from it.
- generator: In a GAN, the network that produces synthetic examples.
- GPT (Generative Pre-trained Transformer): A decoder-only transformer language model family from OpenAI.
- gini impurity: A measure of class mixture in a tree node used for splitting decisions.
- gradient: The vector of partial derivatives of a loss with respect to model parameters.
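- gradient boosting: An ensemble method that adds weak models (typically decision trees) one at a time, each fit to the negative gradient of the loss with respect to the current predictions; for squared loss, this equals the residuals.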
- gradient boosted (decision) trees (GBT): A decision forest trained via gradient boosting.
- gradient clipping: Capping gradient norms to prevent exploding updates.
- gradient descent: Updating parameters in the direction of the negative gradient to reduce loss.
- graph: A symbolic representation of computation as nodes and edges.
- graph execution: Running a model by first building and then executing a computation graph.
- greedy policy: An RL policy that always picks the action with the highest estimated value.
- ground truth: The correct label or value used as the target during training and evaluation.
- group attribution bias: Generalizing what is true for an individual to a whole group, or vice versa.
- hallucination: Model output that sounds plausible but is unsupported or factually wrong.
- hashing: Mapping categorical values to fixed-size buckets using a hash function.
- heuristic: A practical rule of thumb used in place of a guaranteed-optimal method.
- hidden layer: A layer of a neural network between input and output.
- hierarchical clustering: A clustering method that produces a tree of nested clusters.
- hinge loss: A loss function used for margin-based classifiers such as SVMs.
- holdout data: Examples set aside from training and used only for evaluation.
- hyperparameter: A configuration value set before training, such as learning rate or depth.
- hyperplane: A flat affine subspace whose dimension is one less than that of its ambient space, such as a line in 2D or a plane in 3D.
- i.i.d.: Independent and identically distributed, a common assumption about training samples.
- image recognition: The task of identifying objects, scenes, or attributes in images.
- imbalanced dataset: A dataset where one class is much more frequent than others.
- implicit bias: Unconscious associations that may influence labels or model behavior.
- incompatibility of fairness metrics: The result that several fairness criteria cannot be satisfied simultaneously except in trivial cases.
- independently and identically distributed (i.i.d.): A statistical assumption that samples are drawn independently from the same distribution.
- individual fairness: A fairness criterion requiring similar individuals to receive similar predictions.
- inference: Producing predictions from a trained model on new inputs.
- inference path: The sequence of nodes traversed by an example in a decision tree.
- information gain: The expected reduction in entropy from splitting on a feature.
- in-group bias: A preference for one's own group that can bias data or labels.
- input layer: The first layer of a neural network that receives raw features.
- in-set condition: A decision-tree condition that checks membership in a set of categorical values.
- instance: A single example fed to a model.
- interpretability: The degree to which a human can understand a model's decisions.
- inter-rater agreement: A measure of how often human labelers agree on the same examples.
- intersection over union (IoU): The area of overlap between two regions divided by the area of their union.
- IoU: Intersection over union, a common evaluation metric for object detection and segmentation.
- item matrix: A matrix of item embeddings used in collaborative filtering.
- items: The objects being recommended or ranked.
- iteration: One update of a model's parameters during training.
- Keras: A high-level neural network API now bundled with TensorFlow.
- keypoints: Distinctive points in an image used for matching and pose estimation.
- Kernel Support Vector Machines (KSVMs): SVMs that use kernel functions to implicitly map inputs into higher-dimensional spaces.
- k-means: A centroid-based clustering algorithm that partitions data into k clusters.
- k-median: A clustering algorithm similar to k-means but using medians rather than means.
- L0 regularization: A regularizer that penalizes the count of nonzero parameters.
- L1 loss: A loss function equal to the absolute difference between predicted and target values.
- L1 regularization: A regularizer that penalizes the sum of absolute parameter values, encouraging sparsity.
- L2 loss: A loss function equal to the squared difference between predicted and target values.
- L2 regularization: A regularizer that penalizes the sum of squared parameter values.
- label: The target value associated with an example in supervised learning.
- labeled example: An example that includes both features and a label.
- LaMDA (Language Model for Dialogue Applications): Google's dialogue-focused language model, a lightweight version of which initially powered Bard.
- lambda: A scalar that weights a regularization term in a loss function.
- landmarks: Specific anatomical or feature points used for pose or face analysis.
- language model: A model that assigns probabilities to sequences of tokens or predicts the next token.
- large language model: A language model with very many parameters trained on broad text data.
- layer: A set of neurons in a neural network that processes input and produces output.
- Layers API (tf.layers): A deprecated TensorFlow API for building neural network layers.
- leaf: A terminal node in a decision tree containing a prediction.
- learning rate: A scalar that controls the size of parameter updates during gradient descent.
- least squares regression: A regression method that minimizes the sum of squared residuals.
- linear model: A model whose output is a linear function of its features.
- linear: A relationship that can be expressed as a weighted sum plus a bias.
- linear regression: A model that fits a line or hyperplane to numerical data via least squares.
- logistic regression: A classification model that maps a linear combination of features through a sigmoid.
- logits: The raw, unnormalized scores output by a classifier before softmax.
- Log Loss: The negative log-likelihood loss for binary or multiclass classification.
- log-odds: The logarithm of the ratio of probabilities of an event versus its complement.
- Long Short-Term Memory (LSTM): A recurrent network architecture with gates designed to retain long-range dependencies.
- loss: A measure of how far model predictions are from targets.
- loss curve: A plot of loss values across training iterations.
- loss function: The function being minimized during training.
- loss surface: The shape of the loss as a function of model parameters.
- LSTM: Long Short-Term Memory, a gated recurrent neural network architecture.
- machine learning: The field of building systems that improve at a task with experience.
- majority class: The most common class in an imbalanced dataset.
- Markov decision process (MDP): A formal model of sequential decision making with states, actions, and rewards.
- Markov property: The property that the next state depends only on the current state and action.
- masked language model: A language model trained to predict masked tokens given surrounding context.
- matplotlib: A popular Python library for plotting and visualization.
- matrix factorization: A recommender technique that decomposes the user-item matrix into latent factors.
- Mean Absolute Error (MAE): The average of absolute differences between predicted and actual values.
- Mean Squared Error (MSE): The average of squared differences between predicted and actual values.
- metric: A scalar value used to evaluate model performance.
- meta-learning: Learning algorithms that improve their own learning processes across tasks.
- Metrics API (tf.metrics): TensorFlow's API for computing evaluation metrics.
- mini-batch: A small subset of the training set used in one optimization step.
- mini-batch stochastic gradient descent: Gradient descent that estimates gradients on mini-batches.
- minimax loss: The loss used in the original GAN formulation, where generator and discriminator play a minimax game.
- minority class: The less common class in an imbalanced dataset.
- ML: Abbreviation for machine learning.
- MNIST: A classic dataset of 70,000 handwritten digit images used to benchmark image classifiers.
- modality: A type of input or output such as text, image, audio, or video.
- model: A function with learned parameters that maps inputs to outputs.
- model capacity: The variety of functions a model can represent.
- model parallelism: Splitting a single model across multiple devices.
- model training: The process of fitting model parameters to data.
- Momentum: An optimizer modification that accelerates gradient descent using a velocity term.
- multi-class classification: A task with more than two output classes.
- multi-class logistic regression: Logistic regression generalized to multi-class problems via softmax.
- multi-head self-attention: The attention mechanism in transformers that runs several self-attention heads in parallel.
- multimodal model: A model that handles more than one input or output modality.
- multinomial classification: Synonym for multi-class classification.
- multinomial regression: A regression model for categorical outcomes with more than two classes.
- NaN trap: A training failure mode where a NaN value propagates and corrupts all subsequent computations.
- natural language understanding: The subfield of NLP focused on extracting meaning from text.
- negative class: In binary classification, the class labeled as not the target.
- neural network: A model of stacked layers of weighted units that learns from data.
- neuron: A unit in a neural network that computes a weighted sum and activation.
- N-gram: A contiguous sequence of N tokens from a piece of text.
- NLU: Natural language understanding.
- node (neural network): A neuron in a neural network.
- node (TensorFlow graph): An operation in a TensorFlow computation graph.
- node (decision tree): A condition or leaf in a decision tree.
- noise: Random variation in data unrelated to the underlying signal.
- non-binary condition: A decision-tree condition with more than two possible outcomes.
- nonlinear: A relationship that cannot be expressed as a weighted sum of inputs plus a bias.
- non-response bias: Bias from systematic differences between respondents and non-respondents.
- nonstationarity: A property of data whose distribution changes over time.
- normalization: Rescaling data to a standard range or distribution.
- novelty detection: Detecting whether new examples come from a different distribution than training data.
- numerical data: Features represented as integers or floating-point numbers.
- NumPy: A foundational Python library for numerical array computation.
- objective: A quantity the model is trying to optimize.
- objective function: The function optimized during training, typically a loss.
- oblique condition: A decision-tree split based on a linear combination of multiple features.
- offline: Computation done in batch ahead of serving rather than in real time.
- offline inference: Generating predictions in batch and storing them for later lookup.
- one-hot encoding: Representing a categorical value as a vector with a single 1 and the rest 0.
- one-shot learning: Learning a concept from a single labeled example.
- one-vs.-all: A multi-class strategy training one binary classifier per class.
- online: Computation done in real time as requests arrive.
- online inference: Generating predictions on demand for individual requests.
- operation (op): A node in a TensorFlow graph that performs computation.
- out-of-bag evaluation (OOB evaluation): Evaluating a bagged ensemble using examples not in each tree's bootstrap sample.
- optimizer: An algorithm that updates parameters to minimize a loss.
- out-group homogeneity bias: The tendency to see members of other groups as more similar to each other than they really are.
- outlier detection: Identifying examples that deviate strongly from the majority.
- outliers: Examples that lie far from the bulk of the data distribution.
- output layer: The final layer of a neural network that produces predictions.
- overfitting: When a model learns training data too closely and generalizes poorly.
- oversampling: Replicating minority-class examples to balance a dataset.
- pandas: A Python library for tabular data analysis built on NumPy.
- parameter: A learnable variable, such as a weight or bias, fit during training.
- Parameter Server (PS): A distributed training architecture with workers and central parameter servers.
- parameter update: One adjustment of model parameters during training.
- partial derivative: The derivative of a function with respect to one of its variables.
- participation bias: Bias from individuals self-selecting into or out of participation.
- partitioning strategy: The scheme for splitting data or computation across devices.
- perceptron: A simple linear binary classifier that updates weights on misclassified examples.
- performance: Either a model's predictive quality on a task or its computational speed, depending on context.
- permutation variable importances: Feature importance scores measured by shuffling each feature and observing the drop in performance.
- perplexity: A measure of how well a language model predicts a sample, computed as the exponential of the average negative log-likelihood per token; lower is better.
- pipeline: A sequence of data and model processing steps.
- pipelining: Overlapping execution stages to improve throughput.
- policy: In reinforcement learning, the rule mapping states to actions.
- pooling: A downsampling operation that aggregates values within a region.
- positive class: In binary classification, the class labeled as the target.
- post-processing: Adjustments applied to model outputs after prediction.
- PR AUC (area under the PR curve): A scalar summary of a precision-recall curve.
- precision: The fraction of predicted positives that are actually positive.
- precision-recall curve: A plot of precision against recall at various thresholds.
- prediction: The model's output for a given input.
- prediction bias: A systematic offset between average model prediction and average label.
- predictive parity: A fairness criterion requiring equal positive predictive value across groups.
- predictive rate parity: Synonym for predictive parity.
- preprocessing: Transforming raw data into a form suitable for a model.
- pre-trained model: A model whose weights come from prior training on a related task.
- prior belief: Initial assumptions about parameters before observing data.
- probabilistic regression model: A regression model that outputs a probability distribution over the target.
- proxy (sensitive attributes): A feature that stands in for a sensitive attribute and can carry its bias.
- proxy labels: Labels used as a stand-in for the true target that cannot be directly observed.
- Q-function: The expected return of taking an action in a state and following a policy.
- Q-learning: A model-free reinforcement learning algorithm that learns the Q-function.
- quantile: One of the cut points that divide a distribution into intervals of equal probability.
- quantile bucketing: Binning continuous features so each bucket contains a similar number of examples.
- quantization: Reducing numerical precision of weights or activations to save memory and compute.
- queue: A data structure used to feed examples into a training pipeline.
- random forest: An ensemble of decision trees trained on bootstrap samples with random feature subsets.
- random policy: An RL policy that picks each action uniformly at random.
- ranking: The task of ordering items by relevance to a query.
- rank (ordinality): The position of an item in an ordered list.
- rank (Tensor): The number of dimensions of a tensor.
- rater: A human labeler who annotates examples for training or evaluation.
- recall: The fraction of actual positives that were correctly predicted as positive.
- recommendation system: A system that suggests items to users based on past behavior.
- Rectified Linear Unit (ReLU): An activation function that outputs the input if positive and zero otherwise.
- recurrent neural network: A neural network with cycles that processes sequences.
- regression model: A model that predicts continuous numerical values.
- regularization: Techniques that penalize complexity to improve generalization.
- regularization rate: A hyperparameter that controls the strength of regularization.
- reinforcement learning (RL): A paradigm where agents learn policies by interacting with environments and receiving rewards.
- ReLU: Rectified Linear Unit activation function.
- replay buffer: A memory of past transitions used in off-policy RL.
- reporting bias: Bias arising when people report events selectively rather than fully.
- representation: A numeric encoding of inputs learned by a model.
- re-ranking: The second stage of a recommender that refines an initial candidate list.
- return: In reinforcement learning, the cumulative discounted reward from a state.
- reward: A scalar signal the environment gives an agent for an action.
- ridge regularization: Synonym for L2 regularization.
- RNN: Recurrent neural network.
- ROC (receiver operating characteristic) Curve: A plot of true positive rate against false positive rate at various thresholds.
- root: The top node of a decision tree.
- root directory: The base directory used to organize training output files.
- Root Mean Squared Error (RMSE): The square root of the mean squared error, in the units of the target.
- rotational invariance: The property that a model's output does not change when the input is rotated.
- sampling bias: Bias caused by selecting samples in a non-random way.
- sampling with replacement: Drawing items where each pick can be repeated.
- SavedModel: TensorFlow's standard format for serialized models.
- Saver: A TensorFlow 1.x object for saving and restoring model variables.
- scalar: A tensor with zero dimensions, that is, a single number.
- scaling: Adjusting the magnitude of feature values, often to a common range.
- scikit-learn: A widely used Python library for classical machine learning.
- scoring: Computing a model's prediction or score for an example.
- selection bias: Bias arising when the way examples are selected for analysis affects results.
- self-attention (also called self-attention layer): An attention mechanism where queries, keys, and values all come from the same sequence.
- self-supervised learning: Training using labels derived automatically from the input itself.
- self-training: A semi-supervised method where a model labels unlabeled data and retrains on its own predictions.
- semi-supervised learning: Training that combines a small labeled set with a larger unlabeled set.
- sensitive attribute: A feature like race or gender that fairness analyses must consider.
- sentiment analysis: The task of classifying text by expressed opinion or emotion.
- sequence model: A model whose inputs or outputs are ordered sequences.
- sequence-to-sequence task: A task that maps an input sequence to an output sequence, such as translation.
- serving: Deploying a trained model to handle inference requests.
- shape (Tensor): The number of elements along each dimension of a tensor.
- shrinkage: A regularization-like factor in gradient boosting that scales each tree's contribution.
- sigmoid function: The S-shaped function that maps any real number to the range 0 to 1.
- similarity measure: A function that quantifies how alike two examples are.
- size invariance: The property that a model's output does not change when the input is rescaled.
- sketching: Producing a compact summary of a large dataset for fast approximate queries.
- softmax: A function that converts a vector of logits into a probability distribution.
- sparse feature: A feature whose values are mostly zero.
- sparse representation: An encoding where most entries are zero.
- sparse vector: A vector dominated by zero entries.
- sparsity: The fraction of zero entries in a tensor.
- spatial pooling: Pooling applied across spatial dimensions in a CNN.
- split: A partition of a dataset into training, validation, and test subsets.
- splitter: A component that determines how to split a decision-tree node.
- squared hinge loss: The square of hinge loss, used to penalize larger margin violations more heavily.
- squared loss: Another name for L2 loss.
- stability: The degree to which small data changes cause small model changes.
- staged training: Training a model in distinct phases, often with growing data or complexity.
- state: In reinforcement learning, the current configuration of the environment.
- state-action value function: The expected return of taking an action in a state and then following a policy.
- static: A model or feature that is trained offline and not updated continuously.
- static inference: Synonym for offline inference.
- stationarity: A property of data whose distribution does not change over time.
- step: One parameter update during training.
- step size: Synonym for learning rate.
- stochastic gradient descent (SGD): Gradient descent using gradient estimates from one example or a mini-batch at a time.
- stride: The step size used when sliding a convolutional filter across an input.
- structural risk minimization (SRM): A principle that balances training loss with model complexity.
- subsampling: Randomly drawing a smaller set of examples or features from a larger one.
- summary: A TensorBoard record of a scalar or tensor value during training.
- supervised machine learning: Learning a mapping from inputs to outputs using labeled examples.
- synthetic feature: A feature derived from one or more existing features.
- tabular Q-learning: Q-learning that stores Q-values in a table indexed by state-action pairs.
- target: The value a supervised model tries to predict.
- target network: In DQN, a periodically updated copy of the Q-network used to stabilize training.
- temporal data: Data indexed by time, such as time series.
- Tensor: A multi-dimensional array used in deep learning frameworks.
- TensorBoard: TensorFlow's visualization toolkit for inspecting metrics, graphs, and embeddings.
- TensorFlow: An open-source machine learning framework developed by Google.
- TensorFlow Playground: A browser-based visualization for experimenting with small neural networks.
- TensorFlow Serving: A flexible system for deploying TensorFlow models in production.
- Tensor Processing Unit (TPU): Google's custom ASIC designed to accelerate machine learning workloads.
- Tensor rank: The number of dimensions of a tensor.
- Tensor shape: The size of a tensor along each dimension.
- Tensor size: The total number of elements in a tensor.
- termination condition: The criterion used to decide when to stop training or splitting.
- test: The phase of evaluating a final model on held-out data.
- test loss: The loss computed on a test set.
- test set: A partition of a dataset used only for final evaluation.
- tf.Example: A TensorFlow protobuf message format for storing examples.
- tf.keras: The Keras implementation packaged with TensorFlow.
- threshold (for decision trees): The numeric cutoff used in a decision-tree split.
- time series analysis: The study of data points indexed in time order.
- timestep: One step of a sequence or simulation.
- token: A discrete unit of text such as a word, subword, or character.
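- tower: A subnetwork that processes one input independently, such as the query or candidate tower in a two-tower recommendation model, before its output is combined with that of other towers.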
- TPU: Tensor Processing Unit.
- TPU chip: A single TPU ASIC.
- TPU device: A physical machine containing one or more TPU chips.
- TPU master: The coordinator process in a TPU training cluster.
- TPU node: A TPU resource consisting of TPU devices and a master.
- TPU Pod: A cluster of many TPU devices connected with high-speed interconnect.
- TPU resource: An allocation of TPU hardware in Google Cloud.
- TPU slice: A subset of TPU devices within a TPU Pod.
- TPU type: The generation and configuration of a TPU.
- TPU worker: A process that performs computation on TPU devices.
- training: The process of optimizing model parameters to fit data.
- training loss: The loss measured on the training set.
- training-serving skew: A mismatch between data processing during training and during serving.
- training set: The partition of a dataset used to fit model parameters.
- trajectory: A sequence of states, actions, and rewards in a reinforcement learning episode.
- transfer learning: Reusing a model trained on one task as a starting point for a related task.
- Transformer: A neural network architecture that uses self-attention as its core operator.
- translational invariance: The property that a model's output does not change when the input is shifted.
- trigram: An N-gram of length three.
- true negative (TN): A negative example that the model correctly predicted as negative.
- true positive (TP): A positive example that the model correctly predicted as positive.
- true positive rate (TPR): The fraction of actual positives correctly predicted, also called recall.
- unawareness (to a sensitive attribute): A fairness approach that simply omits a sensitive attribute as an input feature.
- underfitting: When a model is too simple to capture patterns in the data.
- undersampling: Removing majority-class examples to balance a dataset.
- unidirectional: Processing a sequence using context from only one direction.
- unidirectional language model: A language model that conditions only on past tokens.
- unlabeled example: An example that contains features but no label.
- unsupervised machine learning: Learning patterns from data without labels.
- uplift modeling: Modeling the incremental effect of a treatment on an outcome.
- upweighting: Increasing the loss weight on minority-class examples.
- user matrix: A matrix of user embeddings used in collaborative filtering.
- validation: The phase of evaluating a model on held-out data during model selection.
- validation loss: The loss computed on a validation set.
- validation set: A partition of a dataset used to tune hyperparameters.
- vanishing gradient problem: Gradients shrinking to near zero during backpropagation, preventing learning.
- variable importances: Scores indicating each feature's contribution to a model's predictions.
- Wasserstein loss: A GAN training loss based on Wasserstein distance, which often stabilizes training.
- weight: A learnable coefficient that scales an input in a neural network.
- Weighted Alternating Least Squares (WALS): A matrix factorization algorithm that alternates updates between user and item factors.
- weighted sum: The sum of inputs multiplied by their corresponding weights.
- wide model: A linear model that handles large sparse feature spaces.
- width: The number of units in a neural network layer.
- wisdom of the crowd: The idea that the aggregate of many independent estimates is often more accurate than any one.
- word embedding: A learned dense vector representation of a word.
- Z-score normalization: Rescaling values by subtracting the mean and dividing by the standard deviation.
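Several of the formulas defined in this glossary (softmax, the sigmoid function, log loss, entropy, Gini impurity, intersection over union, and Z-score normalization) can be made concrete in a few lines of code. The following is a minimal, illustrative sketch using only the Python standard library; the function names are hypothetical and are not taken from any particular framework, and boxes for IoU are assumed to be axis-aligned tuples of the form (x1, y1, x2, y2).

```python
import math
from collections import Counter


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max logit so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y_true, p_pred, eps=1e-12):
    """Mean binary log loss (negative log-likelihood) over the examples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip probabilities to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def entropy(labels):
    """Shannon entropy, in bits, of the class distribution in labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini_impurity(labels):
    """Probability that two labels drawn with replacement belong to different classes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def z_score(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    print(sigmoid(0.0))                           # 0.5
    print(gini_impurity(["a", "a", "b", "b"]))    # 0.5
    print(entropy(["a", "a", "b", "b"]))          # 1.0
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))        # 0.14285714285714285 (that is, 1/7)
```

Subtracting the maximum logit before exponentiating leaves the softmax output unchanged but keeps the computation stable for large logits. For a perfectly balanced two-class node, entropy is 1 bit and Gini impurity is 0.5, which is why both are commonly used to score decision-tree splits.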
References