Abbreviations
Last reviewed
May 13, 2026
Sources
15 citations
Review status
Source-backed
Revision
v2 · 6,836 words
See also: Acronyms, Terms and Guides
This page is a glossary of abbreviations, acronyms, and initialisms used throughout artificial intelligence and machine learning. Entries are grouped by category and each row gives the abbreviation, its expansion, a short description, and a wikilink target where one exists on this site. The list is not exhaustive, but it covers most of the shorthand that appears in modern AI research papers, blog posts, model cards, and infrastructure documentation.
AI as a field collects acronyms at an unusual rate. Some come from the underlying mathematics (PCA, SVM, KL), some from architectures named after their authors' favorite expansions (BERT, GPT, T5), some from training tricks (LoRA, DPO, RLHF), and some from infrastructure (CUDA, HBM, FSDP). The same letters can mean different things in different contexts. LDA is Linear Discriminant Analysis in classical statistics and Latent Dirichlet Allocation in topic modeling. SAE is a Sparse Autoencoder in interpretability work and Society of Automotive Engineers elsewhere. Where ambiguity exists, this glossary lists each meaning separately.
The page's original short list is preserved below for continuity, and the longer tables follow. Sources include original paper titles, the Hugging Face model and dataset hubs, NIST and EU regulatory documents, and standard machine learning textbooks. When an abbreviation is contested or only loosely standardized, the most common expansion is given.
| Abbreviation | Expansion |
|---|---|
| ACC | Accuracy |
| ADA | AdaBoosted Decision Trees |
| AdaBoost | Adaptive Boosting |
| AdR | AdaBoostRegressor |
| GenAI | Generative AI |
The broadest umbrella terms in the field. Most of these appear in the first paragraph of any AI book or news article.
| Abbreviation | Expansion | Description | Link |
|---|---|---|---|
| AI | Artificial Intelligence | The general field of building machines that perform tasks normally associated with human intelligence. | /wiki/artificial_intelligence |
| ML | Machine Learning | Subfield of AI focused on algorithms that learn from data rather than being explicitly programmed. | /wiki/machine_learning |
| DL | Deep Learning | Machine learning using deep neural networks with many layers. | /wiki/deep_learning |
| AGI | Artificial General Intelligence | Hypothetical AI that matches or exceeds human capability across most cognitive tasks. | /wiki/agi |
| ASI | Artificial Superintelligence | Hypothetical AI that surpasses the best human minds in essentially every domain. | /wiki/asi |
| ANI | Artificial Narrow Intelligence | AI that performs well on a single task or narrow domain, the kind we have today. | /wiki/ani |
| GenAI | Generative AI | Models that produce new content such as text, images, audio, or code. | /wiki/generative_ai |
| NLP | Natural Language Processing | Subfield concerned with computational processing of human language. | /wiki/natural_language_processing |
| NLG | Natural Language Generation | The task of producing fluent text from data or representations. | /wiki/natural_language_generation |
| NLU | Natural Language Understanding | The task of interpreting meaning from human language. | /wiki/natural_language_understanding |
| CV | Computer Vision | Subfield concerned with image and video understanding. | /wiki/computer_vision |
| RL | Reinforcement Learning | Learning by trial and error from rewards delivered by an environment. | /wiki/reinforcement_learning |
| RLHF | Reinforcement Learning from Human Feedback | RL in which the reward comes from a reward model trained on human preference comparisons. | /wiki/rlhf |
| RLAIF | Reinforcement Learning from AI Feedback | Variant of RLHF where another model produces the preference labels instead of humans. | /wiki/rlaif |
| SL | Supervised Learning | Learning a mapping from labeled input-output pairs. | /wiki/supervised_learning |
| USL | Unsupervised Learning | Learning structure from unlabeled data. | /wiki/unsupervised_learning |
| SSL | Self-Supervised Learning | Learning from labels that are derived automatically from the data itself. | /wiki/self_supervised_learning |
| TL | Transfer Learning | Using knowledge from one task to improve learning on another. | /wiki/transfer_learning |
| IL | Imitation Learning | Learning a policy by mimicking demonstrations rather than from a reward function. | /wiki/imitation_learning |
| MLOps | Machine Learning Operations | Practices and tooling for deploying and maintaining ML systems in production. | /wiki/mlops |
| LLMOps | Large Language Model Operations | MLOps specialized for the lifecycle of large language models. | /wiki/llmops |
| AIOps | AI for IT Operations | Use of ML to monitor, diagnose, and automate IT infrastructure. | /wiki/aiops |
| HCI | Human-Computer Interaction | Field studying how people interact with computing systems, increasingly relevant for AI products. | /wiki/hci |
| HITL | Human in the Loop | Workflows that combine automated decisions with human oversight. | /wiki/human_in_the_loop |
The core inventory of model families and architectural building blocks. Architectures often share letters with optimizers and benchmarks, so context matters.
| Abbreviation | Expansion | Description | Link |
|---|---|---|---|
| ANN | Artificial Neural Network | A model loosely inspired by biological neurons, composed of layers of weighted units. | /wiki/neural_network |
| CNN | Convolutional Neural Network | Network using convolutional filters, dominant in image recognition since 2012. | /wiki/cnn |
| RNN | Recurrent Neural Network | Network with loops that maintain state across sequence steps. | /wiki/rnn |
| LSTM | Long Short-Term Memory | RNN variant with gating that can learn long-range dependencies. | /wiki/lstm |
| GRU | Gated Recurrent Unit | Simpler gated RNN cell introduced by Cho et al. in 2014. | /wiki/gru |
| DNN | Deep Neural Network | A neural network with multiple hidden layers. | /wiki/dnn |
| MLP | Multi-Layer Perceptron | Feedforward network of fully connected layers, the workhorse architecture. | /wiki/mlp |
| GAN | Generative Adversarial Network | Two networks trained in opposition: a generator and a discriminator. | /wiki/gan |
| VAE | Variational Autoencoder | Probabilistic autoencoder that learns a continuous latent space. | /wiki/vae |
| AE | Autoencoder | Network that compresses and reconstructs its input. | /wiki/autoencoder |
| SAE | Sparse Autoencoder | Autoencoder with sparsity constraints, now widely used in mechanistic interpretability. | /wiki/sparse_autoencoder |
| DAE | Denoising Autoencoder | Autoencoder trained to reconstruct clean inputs from corrupted versions. | /wiki/denoising_autoencoder |
| BERT | Bidirectional Encoder Representations from Transformers | 2018 Google encoder model that started the transformer wave in NLP. | /wiki/bert |
| GPT | Generative Pre-trained Transformer | OpenAI decoder model family, basis of ChatGPT. | /wiki/gpt |
| LLM | Large Language Model | A language model with billions of parameters trained on broad text corpora. | /wiki/large_language_model |
| SLM | Small Language Model | A smaller language model designed for edge devices or efficient inference. | /wiki/slm |
| MoE | Mixture of Experts | Architecture that routes each input through a small subset of expert subnetworks. | /wiki/mixture_of_experts |
| T5 | Text-to-Text Transfer Transformer | 2019 Google model that frames every NLP task as text in, text out. | /wiki/t5 |
| ViT | Vision Transformer | Transformer applied directly to image patches, introduced by Google in 2020. | /wiki/vit |
| DiT | Diffusion Transformer | Transformer backbone for diffusion image and video models. | /wiki/dit |
| MM-DiT | Multimodal Diffusion Transformer | DiT variant that jointly processes text and image streams, used in Stable Diffusion 3 and Flux. | /wiki/mm_dit |
| VLM | Vision-Language Model | Model that takes both images and text as input. | /wiki/vlm |
| VLA | Vision-Language-Action | Robotics model that maps images and instructions to motor actions. | /wiki/vla |
| DDPM | Denoising Diffusion Probabilistic Model | The Ho et al. 2020 formulation that revived diffusion for image generation. | /wiki/ddpm |
| DDIM | Denoising Diffusion Implicit Model | Non-Markovian diffusion sampling that allows fewer steps and deterministic outputs. | /wiki/ddim |
| NeRF | Neural Radiance Field | Method that reconstructs 3D scenes by overfitting a small MLP to view-dependent radiance. | /wiki/nerf |
| GS | Gaussian Splatting | 3D reconstruction using a cloud of explicit Gaussian primitives, faster than NeRF. | /wiki/gaussian_splatting |
| KAN | Kolmogorov-Arnold Network | 2024 architecture that places learnable activation functions on edges rather than nodes. | /wiki/kan |
| HMM | Hidden Markov Model | Probabilistic model with hidden states emitting observable symbols. | /wiki/hmm |
| GMM | Gaussian Mixture Model | Weighted sum of Gaussian distributions, often fit with EM. | /wiki/gmm |
| SVM | Support Vector Machine | Maximum margin classifier, dominant before deep learning for many tasks. | /wiki/svm |
| SVR | Support Vector Regression | Regression variant of SVM. | /wiki/svr |
| RF | Random Forest | Ensemble of decision trees with feature and sample bagging. | /wiki/random_forest |
| GBM | Gradient Boosting Machine | Sequential tree ensemble that fits each tree to the residuals of the previous ones. | /wiki/gbm |
| GBDT | Gradient Boosted Decision Trees | Synonym for GBM, common in industrial pipelines. | /wiki/gbdt |
| XGBoost | eXtreme Gradient Boosting | Popular high-performance GBDT library by Tianqi Chen. | /wiki/xgboost |
| LightGBM | Light Gradient Boosting Machine | Microsoft GBDT library that uses histogram binning and leaf-wise growth. | /wiki/lightgbm |
| CART | Classification and Regression Trees | Foundational decision tree algorithm by Breiman et al. | /wiki/cart |
| KNN | k-Nearest Neighbors | Non-parametric method that classifies by majority vote among nearest training points. | /wiki/knn |
| PCA | Principal Component Analysis | Linear dimensionality reduction that finds directions of maximum variance. | /wiki/pca |
| LDA | Linear Discriminant Analysis | Supervised projection that maximizes class separation. Also stands for Latent Dirichlet Allocation. | /wiki/lda |
| LDA | Latent Dirichlet Allocation | Probabilistic topic model by Blei et al. that finds latent topics in documents. | /wiki/latent_dirichlet_allocation |
| t-SNE | t-distributed Stochastic Neighbor Embedding | Nonlinear embedding for visualizing high-dimensional data. | /wiki/t_sne |
| UMAP | Uniform Manifold Approximation and Projection | Nonlinear embedding method, typically faster than t-SNE and often better at preserving global structure. | /wiki/umap |
| NMF | Non-negative Matrix Factorization | Matrix factorization with non-negativity constraints, often used for parts-based decompositions. | /wiki/nmf |
| RBM | Restricted Boltzmann Machine | Bipartite generative network, historically important for pretraining. | /wiki/rbm |
| GNN | Graph Neural Network | Network that operates on graph-structured data via message passing. | /wiki/gnn |
| GCN | Graph Convolutional Network | A specific GNN variant by Kipf and Welling. | /wiki/gcn |
| GAT | Graph Attention Network | GNN variant that uses attention weights between neighbors. | /wiki/gat |
| SSM | State Space Model | Sequence model based on linear dynamical systems, basis of S4 and Mamba. | /wiki/ssm |
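The MoE row above describes routing each input through a small subset of experts. A minimal sketch of top-k routing follows, assuming a scalar input, precomputed router scores, and toy lambda functions standing in for expert subnetworks; the names `moe_forward` and `gate_logits` are hypothetical, and real systems add load balancing and batching that are left out here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_logits, experts, k=2):
    """Route scalar input x through the top-k experts and mix their outputs.

    gate_logits: one router score per expert (assumed given here; normally
    produced by a learned layer applied to x).
    experts: list of callables standing in for expert subnetworks.
    """
    probs = softmax(gate_logits)
    # Indices of the k experts with the highest router probability.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize so the selected experts' weights sum to 1.
    norm = sum(probs[i] for i in top)
    output = sum((probs[i] / norm) * experts[i](x) for i in top)
    return output, top


# Four toy experts; only two of them run for this input.
experts = [lambda v: v + 1.0, lambda v: 2.0 * v, lambda v: -v, lambda v: v * v]
y, chosen = moe_forward(3.0, [0.1, 2.0, -1.0, 2.0], experts, k=2)
```

Only the selected experts are evaluated, which is why MoE models can hold many parameters while keeping per-input compute modest.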
Methods, optimizers, and parallelism strategies used to train modern models.
| Abbreviation | Expansion | Description | Link |
|---|---|---|---|
| SGD | Stochastic Gradient Descent | Workhorse optimizer that updates parameters using gradients on minibatches. | /wiki/sgd |
| Adam | Adaptive Moment Estimation | Optimizer that combines momentum with per-parameter adaptive learning rates. | /wiki/adam |
| AdamW | Adam with decoupled Weight decay | Modified Adam by Loshchilov and Hutter that decouples weight decay from gradient updates. | /wiki/adamw |
| LR | Learning Rate | The step size scalar applied to gradient updates. | /wiki/learning_rate |
| LRS | Learning Rate Scheduler | A schedule that varies the learning rate during training. | /wiki/learning_rate_scheduler |
| BN | Batch Normalization | Normalizes activations across the minibatch dimension. | /wiki/batch_normalization |
| LN | Layer Normalization | Normalizes activations across the feature dimension within a single sample, used in transformers. | /wiki/layer_normalization |
| GN | Group Normalization | Normalizes across groups of channels, useful for small batches. | /wiki/group_normalization |
| RMSNorm | Root Mean Square Normalization | Layer norm variant without mean centering, used in LLaMA and many other LLMs. | /wiki/rmsnorm |
| BP | Backpropagation | Algorithm that computes gradients of a loss with respect to network parameters. | /wiki/backpropagation |
| BPTT | Backpropagation Through Time | Backpropagation applied to unrolled recurrent networks. | /wiki/bptt |
| AD | Automatic Differentiation | The general technique of computing exact derivatives from a numerical program. | /wiki/automatic_differentiation |
| AMP | Automatic Mixed Precision | Training framework that uses lower precision where safe to speed up training. | /wiki/amp |
| FP32 | 32-bit Floating Point | IEEE 754 single precision, the historical default for ML training. | /wiki/fp32 |
| FP16 | 16-bit Floating Point | IEEE 754 half precision, smaller exponent range than BF16. | /wiki/fp16 |
| BF16 | Brain Floating Point 16 | Google's 16-bit format with 8 exponent bits, matching FP32 dynamic range. | /wiki/bf16 |
| FP8 | 8-bit Floating Point | New low precision format with E4M3 and E5M2 variants on Hopper and Blackwell GPUs. | /wiki/fp8 |
| DDP | Distributed Data Parallel | Replicates the model on each device and synchronizes gradients across them. | /wiki/ddp |
| FSDP | Fully Sharded Data Parallel | PyTorch parallelism that shards parameters, gradients, and optimizer state across devices. | /wiki/fsdp |
| ZeRO | Zero Redundancy Optimizer | DeepSpeed memory optimization that partitions optimizer state, gradients, and parameters. | /wiki/zero |
| TP | Tensor Parallelism | Splits individual tensors across devices, often used inside a node. | /wiki/tensor_parallelism |
| PP | Pipeline Parallelism | Splits the model into stages, each placed on different devices. | /wiki/pipeline_parallelism |
| DP | Data Parallelism | Splits the batch across devices that each hold a full model copy. | /wiki/data_parallelism |
| EP | Expert Parallelism | Distributes experts of a MoE model across devices. | /wiki/expert_parallelism |
| PEFT | Parameter-Efficient Fine-Tuning | Family of fine-tuning methods that update only a small fraction of parameters. | /wiki/peft |
| LoRA | Low-Rank Adaptation | PEFT method that injects trainable low-rank matrices into frozen weights. | /wiki/lora |
| QLoRA | Quantized LoRA | LoRA on top of a 4-bit quantized base model, enabling fine-tuning on a single GPU. | /wiki/qlora |
| DoRA | Weight-Decomposed Low-Rank Adaptation | LoRA variant that decomposes weights into magnitude and direction. | /wiki/dora |
| SFT | Supervised Fine-Tuning | Fine-tuning a pretrained model on labeled prompt-response pairs. | /wiki/sft |
| DPO | Direct Preference Optimization | Preference learning method that skips the explicit reward model used in RLHF. | /wiki/dpo |
| IPO | Identity Preference Optimization | DPO variant that does not assume the Bradley-Terry preference model. | /wiki/ipo |
| KTO | Kahneman-Tversky Optimization | Preference method using a prospect-theory-style loss on binary feedback. | /wiki/kto |
| ORPO | Odds Ratio Preference Optimization | Combines SFT and preference optimization in a single loss. | /wiki/orpo |
| GRPO | Group Relative Policy Optimization | DeepSeek RL method that scores groups of samples relative to each other. | /wiki/grpo |
| PPO | Proximal Policy Optimization | Standard RL algorithm by Schulman et al., long used in RLHF. | /wiki/ppo |
| TRPO | Trust Region Policy Optimization | Predecessor to PPO with explicit trust region constraint. | /wiki/trpo |
| SAC | Soft Actor-Critic | Off-policy continuous-control RL method with entropy regularization. | /wiki/sac |
| DDPG | Deep Deterministic Policy Gradient | Off-policy actor-critic for continuous action spaces. | /wiki/ddpg |
| A2C | Advantage Actor-Critic | Synchronous actor-critic method using an advantage estimate. | /wiki/a2c |
| A3C | Asynchronous Advantage Actor-Critic | The asynchronous variant from DeepMind. | /wiki/a3c |
| EMA | Exponential Moving Average | Average of weights over training steps, often used for inference stability. | /wiki/ema |
| EM | Expectation-Maximization | Algorithm for fitting latent variable models such as GMMs. | /wiki/em |
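The LoRA row above says the method injects trainable low-rank matrices into frozen weights. The sketch below illustrates that idea in plain Python, assuming the common convention of a frozen `d x k` weight `W`, trainable factors `B` (`d x r`) and `A` (`r x k`), a scale of `alpha / r`, and `B` initialized to zero. Those conventions and the helper names are assumptions of this example, not details given in the table.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def lora_merge(w, b, a, alpha):
    """Return W + (alpha / r) * B @ A, the effective weight with the adapter."""
    r = len(a)  # rank = number of rows of A
    scale = alpha / r
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


d, k, r = 8, 6, 2
w = [[float(i * k + j) for j in range(k)] for i in range(d)]  # frozen weights
a = [[0.1] * k for _ in range(r)]   # r x k, trainable
b = [[0.0] * r for _ in range(d)]   # d x r, trainable, zero at the start

# With B at zero the adapter contributes nothing, so the model starts unchanged.
merged_at_init = lora_merge(w, b, a, alpha=4.0)

full_params = d * k          # parameters updated by full fine-tuning
lora_params = r * (d + k)    # parameters updated by the adapter
```

For this toy shape the adapter trains 28 values instead of 48; the gap widens quickly as `d` and `k` grow while `r` stays small.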
Numbers people report in tables. Many overlap across NLP, CV, and speech.
| Abbreviation | Expansion | Description | Link |
|---|---|---|---|
| ACC | Accuracy | Fraction of predictions that are correct. | /wiki/accuracy |
| F1 | F1 score | Harmonic mean of precision and recall. | /wiki/f1_score |
| AUC | Area Under the Curve | Usually the area under a ROC or PR curve, summarizing ranking quality. | /wiki/auc |
| ROC | Receiver Operating Characteristic | Plot of true positive rate against false positive rate across thresholds. | /wiki/roc |
| PR | Precision-Recall | The curve and its summary metrics for imbalanced classification. | /wiki/precision_recall |
| MAP | Mean Average Precision | Average of per-query average precision, standard in retrieval and detection. | /wiki/map_metric |
| IoU | Intersection over Union | Overlap measure between predicted and ground truth regions, used in detection and segmentation. | /wiki/iou |
| BLEU | Bilingual Evaluation Understudy | N-gram overlap metric for machine translation, from Papineni et al. 2002. | /wiki/bleu |
| ROUGE | Recall-Oriented Understudy for Gisting Evaluation | N-gram overlap metric for summarization. | /wiki/rouge |
| METEOR | Metric for Evaluation of Translation with Explicit ORdering | Translation metric that considers stems and synonyms. | /wiki/meteor |
| CIDEr | Consensus-based Image Description Evaluation | TF-IDF weighted n-gram metric for image captioning. | /wiki/cider |
| WER | Word Error Rate | Edit distance metric for ASR systems, measured in words. | /wiki/wer |
| CER | Character Error Rate | Edit distance metric measured at the character level. | /wiki/cer |
| MOS | Mean Opinion Score | Subjective audio quality rating averaged over listeners. | /wiki/mos |
| PPL | Perplexity | Exponential of the average negative log likelihood, standard language model metric. | /wiki/perplexity |
| ECE | Expected Calibration Error | Measures how well a model's predicted probabilities match observed frequencies. | /wiki/ece |
| KL | Kullback-Leibler divergence | Information-theoretic measure of how one distribution differs from another. | /wiki/kl_divergence |
| JSD | Jensen-Shannon Divergence | Symmetric version of KL divergence. | /wiki/jsd |
| MSE | Mean Squared Error | Average squared difference between predictions and targets. | /wiki/mse |
| MAE | Mean Absolute Error | Average absolute difference between predictions and targets. | /wiki/mae |
| RMSE | Root Mean Squared Error | Square root of MSE, in the same units as the target. | /wiki/rmse |
| MAPE | Mean Absolute Percentage Error | Average of absolute errors expressed as a percentage of the true value. | /wiki/mape |
| NDCG | Normalized Discounted Cumulative Gain | Ranking quality measure weighted by position. | /wiki/ndcg |
| TPR | True Positive Rate | Sensitivity or recall, fraction of positives correctly identified. | /wiki/tpr |
| FPR | False Positive Rate | Fraction of negatives incorrectly flagged as positive. | /wiki/fpr |
| EER | Equal Error Rate | Operating point where FPR equals FNR, common in biometrics. | /wiki/eer |
| Pass@k | Pass at k | Fraction of problems solved by at least one of k samples, standard for code benchmarks. | /wiki/pass_at_k |
| Elo | Elo rating | Pairwise rating system, used by LMSYS Chatbot Arena. | /wiki/elo |
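Several of the metrics above reduce to short formulas. The following sketch shows one possible reading of four of them, under stated assumptions: natural-log token probabilities for perplexity, axis-aligned `(x1, y1, x2, y2)` boxes for IoU, and, for Pass@k, the widely used per-problem estimator `1 - C(n - c, k) / C(n, k)` computed from `n` samples of which `c` pass. The function names are illustrative rather than taken from any library.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts (assumes tp > 0)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def perplexity(token_log_probs):
    """Exponential of the average negative log likelihood (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) axis-aligned boxes."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def pass_at_k(n, c, k):
    """Per-problem pass@k estimate from n samples of which c are correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


f1 = f1_score(tp=8, fp=2, fn=2)
ppl = perplexity([math.log(0.25)] * 4)
overlap = iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
p1 = pass_at_k(n=10, c=3, k=1)
```

A benchmark-level Pass@k score would average `pass_at_k` over all problems; the sketch covers a single problem only.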
The shorthand for problems researchers try to solve and the test sets they use to compare progress.
| Abbreviation | Expansion | Description | Link |
|---|---|---|---|
| VQA | Visual Question Answering | Answering natural language questions about images. | /wiki/vqa |
| ASR | Automatic Speech Recognition | Converting speech audio to text. | /wiki/asr |
| TTS | Text-to-Speech | Synthesizing speech audio from text. | /wiki/tts |
| VAD | Voice Activity Detection | Detecting when speech is present in an audio stream. | /wiki/vad |
| SED | Sound Event Detection | Detecting and labeling sound events in audio. | /wiki/sed |
| T2I | Text-to-Image | Generating images from a text prompt. | /wiki/t2i |
| I2T | Image-to-Text | Generating text from an image, such as captioning or OCR. | /wiki/i2t |
| T2V | Text-to-Video | Generating video clips from a text prompt. | /wiki/t2v |
| I2V | Image-to-Video | Animating a still image into video. | /wiki/i2v |
| T2A | Text-to-Audio | Generating audio or music from a text prompt. | /wiki/t2a |
| T23D | Text-to-3D | Generating 3D models from a text prompt. | /wiki/t23d |
| OCR | Optical Character Recognition | Reading text from images of documents or signs. | /wiki/ocr |
| OD | Object Detection | Localizing and classifying objects in images. | /wiki/object_detection |
| SS | Semantic Segmentation | Pixel-level classification of an image. | /wiki/semantic_segmentation |
| IS | Instance Segmentation | Object detection plus per-instance pixel masks. | /wiki/instance_segmentation |
| SQuAD | Stanford Question Answering Dataset | Reading comprehension benchmark by Rajpurkar et al. | /wiki/squad |
| MMLU | Massive Multitask Language Understanding | 57-subject knowledge benchmark by Hendrycks et al. | /wiki/mmlu |
| MMLU-Pro | MMLU Professional | Harder MMLU successor with 10 answer choices and reasoning emphasis. | /wiki/mmlu_pro |
| MMMU | Massive Multi-discipline Multimodal Understanding | College-level multimodal benchmark across many subjects. | /wiki/mmmu |
| HumanEval | HumanEval | OpenAI's 164 hand-written Python coding problems. | /wiki/humaneval |
| MBPP | Mostly Basic Python Problems | Google's Python programming benchmark of about 1,000 problems. | /wiki/mbpp |
| APPS | Automated Programming Progress Standard | Competitive programming benchmark from Hendrycks et al. | /wiki/apps |
| SWE-bench | Software Engineering benchmark | Real GitHub issues paired with patches, evaluating LLM coding agents. | /wiki/swe_bench |
| RepoBench | RepoBench | Repository-level code completion benchmark. | /wiki/repobench |
| GSM8K | Grade School Math 8K | 8,500 grade-school word problems by OpenAI. | /wiki/gsm8k |
| MATH | Mathematics benchmark | 12,500 competition math problems by Hendrycks et al. | /wiki/math_benchmark |
| AIME | American Invitational Mathematics Examination | High school competition used as an AI reasoning benchmark. | /wiki/aime |
| GPQA | Graduate-Level Google-Proof Q&A | Hard science benchmark designed to resist web search. | /wiki/gpqa |
| HLE | Humanity's Last Exam | Cross-disciplinary expert-level benchmark by Center for AI Safety and Scale AI. | /wiki/hle |
| ARC | AI2 Reasoning Challenge | Allen Institute multiple-choice science exam benchmark. | /wiki/arc |
| ARC-AGI | Abstraction and Reasoning Corpus | Francois Chollet's visual reasoning benchmark, distinct from AI2 ARC. | /wiki/arc_agi |
| HellaSwag | HellaSwag | Sentence completion benchmark targeting commonsense inference. | /wiki/hellaswag |
| WinoGrande | WinoGrande | Large-scale Winograd schema benchmark for commonsense reasoning. | /wiki/winogrande |
| TruthfulQA | TruthfulQA | Benchmark measuring how often models repeat human misconceptions. | /wiki/truthfulqa |
| BIG-bench | Beyond the Imitation Game benchmark | Crowdsourced suite of 200+ tasks for language models. | /wiki/big_bench |
| AGIEval | AGIEval | Benchmark of human standardized exams across disciplines. | /wiki/agieval |
The chips, interconnects, and runtime formats that run the models.
| Abbreviation | Expansion | Description | Link |
|---|---|---|---|
| CPU | Central Processing Unit | General-purpose processor, still used for data preparation and serving small models. | /wiki/cpu |
| GPU | Graphics Processing Unit | Parallel processor originally for graphics, dominant for ML training. | /wiki/gpu |
| TPU | Tensor Processing Unit | Google's custom AI accelerator. | /wiki/tpu |
| NPU | Neural Processing Unit | Generic term for accelerators on phones and laptops dedicated to ML. | /wiki/npu |
| LPU | Language Processing Unit | Groq's branding for its inference chip. | /wiki/lpu |
| IPU | Intelligence Processing Unit | Graphcore's AI accelerator. | /wiki/ipu |
| DPU | Data Processing Unit | NVIDIA's term for SmartNIC class processors for networking and storage. | /wiki/dpu |
| VRAM | Video RAM | High-speed memory on a GPU, often the binding constraint for large-model training. | /wiki/vram |
| HBM | High Bandwidth Memory | Stacked DRAM technology used on data center GPUs and accelerators. | /wiki/hbm |
| SRAM | Static RAM | On-chip memory with low latency, used for caches and for the on-chip buffers that feed accelerator compute units such as TPU systolic arrays. | /wiki/sram |
| DRAM | Dynamic RAM | Standard system memory technology. | /wiki/dram |
| ASIC | Application-Specific Integrated Circuit | Custom silicon, of which TPUs are an example. | /wiki/asic |
| FPGA | Field-Programmable Gate Array | Reconfigurable hardware used for some inference deployments. | /wiki/fpga |
| CUDA | Compute Unified Device Architecture | NVIDIA's parallel computing platform and programming model. | /wiki/cuda |
| ROCm | Radeon Open Compute | AMD's open software stack for GPU compute. | /wiki/rocm |
| NVLink | NVIDIA NVLink | NVIDIA's high-bandwidth GPU-to-GPU interconnect. | /wiki/nvlink |
| NVSwitch | NVIDIA NVSwitch | Switch fabric that connects many NVLink-capable GPUs in a node. | /wiki/nvswitch |
| PCIe | Peripheral Component Interconnect Express | Standard host-to-device interconnect, used for most GPUs. | /wiki/pcie |
| IB | InfiniBand | Low-latency interconnect used in HPC and AI clusters. | /wiki/infiniband |
| RDMA | Remote Direct Memory Access | Network feature allowing direct memory access between hosts, used in IB and RoCE. | /wiki/rdma |
| ONNX | Open Neural Network Exchange | Cross-framework model format. | /wiki/onnx |
| MIG | Multi-Instance GPU | NVIDIA feature that partitions a single GPU into isolated instances. | /wiki/mig |
| SM | Streaming Multiprocessor | The basic compute unit inside an NVIDIA GPU. | /wiki/sm |
| TFLOPS | Tera Floating Point Operations Per Second | Common unit of compute throughput. | /wiki/tflops |
| MFU | Model FLOPs Utilization | Achieved throughput as a fraction of theoretical peak. | /wiki/mfu |
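As a worked example of the MFU row, the sketch below divides an estimated achieved throughput by a theoretical peak. It assumes the rough rule of about 6 FLOPs per parameter per token for dense transformer training (forward plus backward, ignoring attention), which does not come from the table above, and all input numbers are hypothetical.

```python
def model_flops_utilization(n_params, tokens_per_second, peak_flops_per_second):
    """Estimate MFU for dense transformer training.

    Uses the approximate 6 * parameters FLOPs-per-token rule, an assumption
    of this sketch that ignores attention and other overheads.
    """
    achieved = 6.0 * n_params * tokens_per_second
    return achieved / peak_flops_per_second


# Hypothetical numbers for illustration only.
mfu = model_flops_utilization(
    n_params=7e9,                 # 7B-parameter model
    tokens_per_second=10_000,     # measured training throughput
    peak_flops_per_second=1e15,   # aggregate peak of the hardware in FLOP/s
)
```

With these inputs the estimate comes out at 0.42, meaning 42% of theoretical peak.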
The software people actually type into their imports.
| Abbreviation | Expansion | Description | Link |
|---|---|---|---|
| TF | TensorFlow | Google's machine learning framework, originally graph-based. | /wiki/tensorflow |
| PT | PyTorch | Meta's eager-mode deep learning framework, now the research standard. | /wiki/pytorch |
| JAX | JAX | Google's NumPy-compatible framework with composable transformations. | /wiki/jax |
| HF | Hugging Face | Platform and library suite hosting models, datasets, and the Transformers library. | /wiki/hugging_face |
| DL4J | Deep Learning for Java | JVM-based deep learning library from Skymind. | /wiki/dl4j |
| TRT | TensorRT | NVIDIA's inference optimizer and runtime. | /wiki/tensorrt |
| TFLite | TensorFlow Lite | Mobile and edge runtime for TensorFlow models. | /wiki/tflite |
| MLX | MLX | Apple's array framework for Apple silicon. | /wiki/mlx |
| vLLM | vLLM | High-throughput LLM inference engine with PagedAttention. | /wiki/vllm |
| TGI | Text Generation Inference | Hugging Face's production LLM serving framework. | /wiki/tgi |
| SGLang | SGLang | Structured generation language and runtime for LLM serving. | /wiki/sglang |
| DSPy | DSPy | Stanford framework for programming, not prompting, language models. | /wiki/dspy |
Techniques and abstractions used in research, agents, and applied systems.
| Abbreviation | Expansion | Description | Link |
|---|---|---|---|
| MCP | Model Context Protocol | Anthropic's open protocol for connecting AI assistants to tools and data. | /wiki/model_context_protocol |
| RAG | Retrieval-Augmented Generation | Pattern of retrieving documents and conditioning generation on them. | /wiki/rag |
| CRAG | Corrective Retrieval-Augmented Generation | RAG variant that grades retrieved passages and rewrites queries when they look weak. | /wiki/crag |
| HyDE | Hypothetical Document Embeddings | Retrieval method that embeds a hypothesized answer instead of the query. | /wiki/hyde |
| ReAct | Reasoning and Acting | Agent prompting pattern from Yao et al. 2022 that interleaves thought and action. | /wiki/react |
| CoT | Chain of Thought | Eliciting step-by-step reasoning in the output. | /wiki/chain_of_thought |
| ToT | Tree of Thoughts | Search over multiple reasoning paths rather than a single chain. | /wiki/tree_of_thoughts |
| GoT | Graph of Thoughts | Reasoning structure where thoughts form a directed graph. | /wiki/graph_of_thoughts |
| ICL | In-Context Learning | Performing a new task purely from examples in the prompt, without parameter updates. | /wiki/in_context_learning |
| FSL | Few-Shot Learning | Learning from a small number of labeled examples. | /wiki/few_shot_learning |
| ZSL | Zero-Shot Learning | Performing a task without task-specific labeled training data. | /wiki/zero_shot_learning |
| KD | Knowledge Distillation | Training a smaller student model to mimic a larger teacher. | /wiki/knowledge_distillation |
| NAS | Neural Architecture Search | Automatically searching for good network architectures. | /wiki/nas |
| AutoML | Automated Machine Learning | Automating the data, feature, model, and tuning pipeline. | /wiki/automl |
| HPO | Hyperparameter Optimization | Searching over training hyperparameters like LR or batch size. | /wiki/hpo |
| CFG | Classifier-Free Guidance | Diffusion sampling trick that combines conditional and unconditional predictions, extrapolating toward the conditional one when the guidance scale exceeds 1. | /wiki/cfg |
| SDE | Stochastic Differential Equation | Continuous-time formulation of diffusion processes. | /wiki/sde |
| ODE | Ordinary Differential Equation | Deterministic formulation used in flow matching and DDIM-style samplers. | /wiki/ode |
| MCTS | Monte Carlo Tree Search | Search method underlying AlphaGo and many planning systems. | /wiki/mcts |
| BoN | Best-of-N sampling | Sampling N candidates and picking the highest scoring one. | /wiki/best_of_n |
| PTQ | Post-Training Quantization | Quantizing a trained model without retraining. | /wiki/ptq |
| QAT | Quantization-Aware Training | Training that simulates quantization in the forward pass. | /wiki/qat |
| GGUF | GGUF file format | Quantized model file format used by llama.cpp. | /wiki/gguf |
| GPTQ | GPTQ | Post-training quantization method targeting LLMs. | /wiki/gptq |
| AWQ | Activation-aware Weight Quantization | LLM quantization method that protects salient weights. | /wiki/awq |
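The CFG row above can be made concrete with a few lines. This is a minimal sketch of the usual combination rule, `uncond + scale * (cond - uncond)`, applied elementwise to two prediction vectors; the vectors and the name `cfg_combine` are hypothetical stand-ins for a diffusion model's noise predictions.

```python
def cfg_combine(uncond, cond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), elementwise."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0, -1.0]   # hypothetical unconditional prediction
cond = [1.0, 1.0, 0.0]      # hypothetical prompt-conditioned prediction

# scale = 1 reproduces the conditional prediction exactly.
same_as_cond = cfg_combine(uncond, cond, scale=1.0)

# scale > 1 pushes past the conditional prediction, away from the unconditional one.
guided = cfg_combine(uncond, cond, scale=3.0)
```

A scale of 0 returns the unconditional prediction, so values between 0 and 1 interpolate while the values above 1 used in practice extrapolate.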
Things you find in datasets and feature pipelines.
| Abbreviation | Expansion | Description | Link |
|---|---|---|---|
| i.i.d. | Independent and Identically Distributed | Standard statistical assumption that samples are drawn independently from the same distribution. | /wiki/iid |
| OOD | Out of Distribution | Inputs drawn from a different distribution than the training data. | /wiki/ood |
| OOV | Out of Vocabulary | Words or tokens that are absent from the model's vocabulary. | /wiki/oov |
| TF-IDF | Term Frequency-Inverse Document Frequency | Classical weighting scheme for sparse text features. | /wiki/tf_idf |
| BoW | Bag of Words | Representation of text as unordered token counts. | /wiki/bow |
| NER | Named Entity Recognition | Tagging spans like people, places, and organizations. | /wiki/ner |
| POS | Part-of-Speech tagging | Assigning grammatical categories to words. | /wiki/pos_tagging |
| SRL | Semantic Role Labeling | Tagging predicate-argument structure in sentences. | /wiki/srl |
| WSD | Word Sense Disambiguation | Selecting the intended sense of a polysemous word in context. | /wiki/wsd |
| IR | Information Retrieval | Finding relevant documents for a query. | /wiki/information_retrieval |
| QA | Question Answering | Returning an answer rather than a list of documents. | /wiki/question_answering |
| MT | Machine Translation | Automatic translation between languages. | /wiki/machine_translation |
| SMT | Statistical Machine Translation | Pre-neural translation using phrase tables and language models. | /wiki/smt |
| NMT | Neural Machine Translation | Translation using neural sequence-to-sequence models. | /wiki/nmt |
| BPE | Byte-Pair Encoding | Subword tokenization algorithm widely used in LLMs. | /wiki/bpe |
| SPM | SentencePiece | Subword tokenization toolkit by Google. | /wiki/sentencepiece |
| WP | WordPiece | Subword tokenizer used in BERT. | /wiki/wordpiece |
| EDA | Exploratory Data Analysis | Initial summarization and visualization of a dataset. | /wiki/eda |
| ETL | Extract, Transform, Load | Standard data pipeline pattern. | /wiki/etl |
| EHR | Electronic Health Record | Patient records used in clinical ML. | /wiki/ehr |
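
To make the BoW and TF-IDF entries above concrete, here is a minimal sketch that computes bag-of-words counts and then weights them by inverse document frequency. The function name `tf_idf` and the toy corpus are hypothetical. The formula used (length-normalized term frequency times the natural log of N over document frequency) is one common variant among several, so library implementations may give different numbers.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {token: weight} dict per document.

    docs is a list of token lists. Uses tf = count / doc length and
    idf = ln(N / df), an unsmoothed variant chosen for illustration.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)  # the bag-of-words representation
        total = len(doc)
        weights.append({
            token: (count / total) * math.log(n_docs / doc_freq[token])
            for token, count in counts.items()
        })
    return weights


docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
weights = tf_idf(docs)
print(weights[1])
```

In this toy corpus "the" appears in every document, so its weight is zero, while "dog" appears in only one document and receives the largest weight there.
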
A partial roll of the institutions whose names appear most often in papers and news. Many of these have their own articles on this wiki.
| Abbreviation | Expansion | Description | Link |
|---|---|---|---|
| OpenAI | OpenAI | AI lab founded in 2015, maker of GPT and ChatGPT. | /wiki/openai |
| Anthropic | Anthropic | AI safety company founded in 2021, maker of Claude. | /wiki/anthropic |
| DeepMind | Google DeepMind | London-based lab founded in 2010, acquired by Google in 2014. | /wiki/deepmind |
| FAIR | Fundamental AI Research | Meta's AI research lab, formerly Facebook AI Research. | /wiki/fair |
| MSR | Microsoft Research | Microsoft's corporate research arm. | /wiki/msr |
| GR | Google Research | Google's research division, separate from DeepMind. | /wiki/google_research |
| MIT | Massachusetts Institute of Technology | University with major AI groups including CSAIL. | /wiki/mit |
| CMU | Carnegie Mellon University | University with a large AI and ML faculty. | /wiki/cmu |
| BAIR | Berkeley AI Research | UC Berkeley's AI research lab. | /wiki/bair |
| AI2 | Allen Institute for AI | Seattle research institute founded by Paul Allen. | /wiki/ai2 |
| MILA | Montreal Institute for Learning Algorithms | Quebec research institute affiliated with Université de Montréal. | /wiki/mila |
| CHAI | Center for Human-Compatible AI | Berkeley center founded by Stuart Russell. | /wiki/chai |
| MIRI | Machine Intelligence Research Institute | Berkeley nonprofit focused on AI alignment theory. | /wiki/miri |
| HAI | Stanford Institute for Human-Centered AI | Stanford institute studying AI and its societal impact. | /wiki/stanford_hai |
| CSAIL | Computer Science and Artificial Intelligence Laboratory | MIT's main computing research lab. | /wiki/csail |
| NeurIPS | Conference on Neural Information Processing Systems | Premier ML conference, held annually. | /wiki/neurips |
| ICML | International Conference on Machine Learning | Major annual ML conference. | /wiki/icml |
| ICLR | International Conference on Learning Representations | Open-review ML conference focused on representation learning. | /wiki/iclr |
| CVPR | Conference on Computer Vision and Pattern Recognition | Top computer vision conference. | /wiki/cvpr |
| ECCV | European Conference on Computer Vision | Biennial computer vision conference. | /wiki/eccv |
| ACL | Association for Computational Linguistics | Major NLP conference and society. | /wiki/acl |
| EMNLP | Empirical Methods in Natural Language Processing | Major NLP conference. | /wiki/emnlp |
| AAAI | Association for the Advancement of Artificial Intelligence | Long-running AI conference and society. | /wiki/aaai |
| IJCAI | International Joint Conference on Artificial Intelligence | Long-running AI conference. | /wiki/ijcai |
Laws, frameworks, and standards bodies that increasingly shape what models can do and where.
| Abbreviation | Expansion | Description | Link |
|---|---|---|---|
| NIST AI RMF | NIST AI Risk Management Framework | Voluntary US framework for managing AI risks, first released January 2023. | /wiki/nist_ai_rmf |
| EU AI Act | European Union AI Act | EU regulation classifying AI systems by risk level, adopted in 2024. | /wiki/eu_ai_act |
| AISI | AI Safety Institute | National bodies in the UK and US tasked with evaluating frontier AI. | /wiki/aisi |
| DSA | Digital Services Act | EU regulation of online platforms covering some AI-driven services. | /wiki/dsa |
| GDPR | General Data Protection Regulation | EU data protection law that applies to many AI training and deployment scenarios. | /wiki/gdpr |
| C2PA | Coalition for Content Provenance and Authenticity | Industry standard for cryptographically signing media provenance. | /wiki/c2pa |
| HIPAA | Health Insurance Portability and Accountability Act | US law governing protected health information. | /wiki/hipaa |
| COPPA | Children's Online Privacy Protection Act | US law restricting data collection from children under 13. | /wiki/coppa |
| CCPA | California Consumer Privacy Act | California privacy law, amended and expanded by the CPRA, whose changes took effect in 2023. | /wiki/ccpa |
| CFR | Code of Federal Regulations | The codified rules of US federal agencies. | /wiki/cfr |
| OECD AI | OECD AI Principles | International principles for trustworthy AI adopted in 2019. | /wiki/oecd_ai |
| ISO/IEC 42001 | Artificial Intelligence Management System standard | International standard for AI governance systems, published in 2023. | /wiki/iso_iec_42001 |
| FERPA | Family Educational Rights and Privacy Act | US law protecting student educational records. | /wiki/ferpa |
| EO 14110 | Executive Order 14110 | The 2023 US executive order on AI, rescinded in 2025. | /wiki/executive_order_14110 |
The smaller corner that worries about what models do internally and how they affect people.
| Abbreviation | Expansion | Description | Link |
|---|---|---|---|
| AI Safety | AI Safety | Field studying how to keep advanced AI systems beneficial and controllable. | /wiki/ai_safety |
| AI Alignment | AI Alignment | Subfield focused on getting models to pursue intended goals. | /wiki/ai_alignment |
| CAI | Constitutional AI | Anthropic's training method using a written set of principles to guide model behavior. | /wiki/constitutional_ai |
| MI | Mechanistic Interpretability | Reverse engineering neural networks at the level of circuits and features. | /wiki/mechanistic_interpretability |
| SAE | Sparse Autoencoder | Autoencoder trained with a sparsity constraint, used to extract interpretable features from model activations. | /wiki/sparse_autoencoder |
| IRL | Inverse Reinforcement Learning | Recovering a reward function from observed behavior. | /wiki/irl |
| IDA | Iterated Distillation and Amplification | Alignment proposal by Paul Christiano that alternates between amplifying a model's capabilities and distilling the result into a faster model. | /wiki/ida |
| RSP | Responsible Scaling Policy | Anthropic-style commitments tying capability levels to safety measures. | /wiki/responsible_scaling_policy |
| ASL | AI Safety Level | Tiered capability and safety classification used by Anthropic. | /wiki/asl |
| CBRN | Chemical, Biological, Radiological, Nuclear | Risk categories used in frontier model safety evaluations. | /wiki/cbrn |
| METR | Model Evaluation and Threat Research | Nonprofit specializing in dangerous-capability evaluations. | /wiki/metr |
| Apollo | Apollo Research | AI evaluation nonprofit focused on deception and scheming. | /wiki/apollo_research |
| DAN | Do Anything Now | A long-running family of jailbreak prompts. | /wiki/dan |
A few three-letter strings carry more than one common meaning inside AI. LDA and SAE, both discussed in the introduction, are two examples.
When in doubt, the context of the surrounding paper or article almost always disambiguates.