Abbreviations

Updated Jul 16, 2026

See also: Acronyms, Terms and Guides

This page is a glossary of abbreviations, acronyms, and initialisms used throughout artificial intelligence and machine learning. Entries are grouped by category and each row gives the abbreviation, its expansion, a short description, and a wikilink target where one exists on this site. The list is not exhaustive, but it covers most of the shorthand that appears in modern AI research papers, blog posts, model cards, and infrastructure documentation.

Overview

AI as a field collects acronyms at an unusual rate. Some come from the underlying mathematics (PCA, SVM, KL), some from architecture names coined as catchy expansions (BERT, GPT, T5), some from training tricks (LoRA, DPO, RLHF), and some from infrastructure (CUDA, HBM, FSDP). The same letters can mean different things in different contexts. LDA is Linear Discriminant Analysis in classical statistics and Latent Dirichlet Allocation in topic modeling. SAE is a Sparse Autoencoder in interpretability work and the Society of Automotive Engineers elsewhere. When an abbreviation has more than one established meaning within AI, this glossary lists each meaning in its own row.

The original short list that headed this page is preserved below for continuity, followed by the longer category tables. Sources include the original papers, the Hugging Face model and dataset hubs, NIST and EU regulatory documents, and standard machine learning textbooks. When an abbreviation is contested or only loosely standardized, the most common expansion is given.

Preserved short list

Abbreviation	Expansion
ACC	Accuracy
ADA	AdaBoosted Decision Trees
AdaBoost	Adaptive Boosting
AdR	AdaBoostRegressor
GenAI	Generative AI

General AI and ML

The broadest umbrella terms in the field. Most of these appear early in almost any introductory AI book or news article.

Abbreviation	Expansion	Description	Link
AI	Artificial Intelligence	The general field of building machines that perform tasks normally associated with human intelligence.	/wiki/artificial_intelligence
ML	Machine Learning	Subfield of AI focused on algorithms that learn from data rather than being explicitly programmed.	/wiki/machine_learning
DL	Deep Learning	Machine learning using deep neural networks with many layers.	/wiki/deep_learning
AGI	Artificial General Intelligence	Hypothetical AI that matches or exceeds human capability across most cognitive tasks.	/wiki/agi
ASI	Artificial Superintelligence	Hypothetical AI that surpasses the best human minds in essentially every domain.	/wiki/asi
ANI	Artificial Narrow Intelligence	AI that performs well on a single task or narrow domain, the kind we have today.	/wiki/ani
GenAI	Generative AI	Models that produce new content such as text, images, audio, or code.	/wiki/generative_ai
NLP	Natural Language Processing	Subfield concerned with computational processing of human language.	/wiki/natural_language_processing
NLG	Natural Language Generation	The task of producing fluent text from data or representations.	/wiki/natural_language_generation
NLU	Natural Language Understanding	The task of interpreting meaning from human language.	/wiki/natural_language_understanding
CV	Computer Vision	Subfield concerned with image and video understanding.	/wiki/computer_vision
RL	Reinforcement Learning	Learning by trial and error from rewards delivered by an environment.	/wiki/reinforcement_learning
RLHF	Reinforcement Learning from Human Feedback	RL where the reward signal is trained from human preference comparisons.	/wiki/rlhf
RLAIF	Reinforcement Learning from AI Feedback	Variant of RLHF where another model produces the preference labels instead of humans.	/wiki/rlaif
SL	Supervised Learning	Learning a mapping from labeled input-output pairs.	/wiki/supervised_learning
USL	Unsupervised Learning	Learning structure from unlabeled data.	/wiki/unsupervised_learning
SSL	Self-Supervised Learning	Learning from labels that are derived automatically from the data itself.	/wiki/self_supervised_learning
TL	Transfer Learning	Using knowledge from one task to improve learning on another.	/wiki/transfer_learning
IL	Imitation Learning	Learning a policy by mimicking demonstrations rather than from a reward function.	/wiki/imitation_learning
MLOps	Machine Learning Operations	Practices and tooling for deploying and maintaining ML systems in production.	/wiki/mlops
LLMOps	Large Language Model Operations	MLOps specialized for the lifecycle of large language models.	/wiki/llmops
AIOps	AI for IT Operations	Use of ML to monitor, diagnose, and automate IT infrastructure.	/wiki/aiops
HCI	Human-Computer Interaction	Field studying how people interact with computing systems, increasingly relevant for AI products.	/wiki/hci
HITL	Human in the Loop	Workflows that combine automated decisions with human oversight.	/wiki/human_in_the_loop
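
The RLHF row says the reward signal is trained from human preference comparisons. A common way to do that, sketched here under the assumption of a scalar-output reward model, is a Bradley-Terry pairwise loss that pushes the reward of the preferred response above the rejected one. The function name `preference_loss` is illustrative, not a library API, and the reward values stand in for a hypothetical model's outputs.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Both arguments stand in for scalar outputs of a hypothetical reward model
    on the human-preferred and the rejected response.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch on the sign so exp never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Small loss when the preferred response already scores higher,
# large loss when the ranking is reversed.
print(preference_loss(2.0, 0.0))
print(preference_loss(0.0, 2.0))
```

Because only the difference of rewards enters the loss, the model learns relative scores, which matches data that records which of two responses a labeler preferred rather than an absolute rating.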

Models and architectures

The core inventory of model families and architectural building blocks. Architectures often share letters with optimizers and benchmarks, so context matters.

Abbreviation	Expansion	Description	Link
ANN	Artificial Neural Network	A model loosely inspired by biological neurons, composed of layers of weighted units.	/wiki/neural_network
CNN	Convolutional Neural Network	Network using convolutional filters, dominant in image recognition since 2012.	/wiki/cnn
RNN	Recurrent Neural Network	Network with loops that maintain state across sequence steps.	/wiki/rnn
LSTM	Long Short-Term Memory	RNN variant with gating that can learn long-range dependencies.	/wiki/lstm
GRU	Gated Recurrent Unit	Simpler gated RNN cell introduced by Cho et al. in 2014.	/wiki/gru
DNN	Deep Neural Network	A neural network with multiple hidden layers.	/wiki/dnn
MLP	Multi-Layer Perceptron	Feedforward network of fully connected layers, the workhorse architecture.	/wiki/mlp
GAN	Generative Adversarial Network	Two networks trained in opposition: a generator and a discriminator.	/wiki/gan
VAE	Variational Autoencoder	Probabilistic autoencoder that learns a continuous latent space.	/wiki/vae
AE	Autoencoder	Network that compresses and reconstructs its input.	/wiki/autoencoder
SAE	Sparse Autoencoder	Autoencoder with sparsity constraints, now widely used in mechanistic interpretability.	/wiki/sparse_autoencoder
DAE	Denoising Autoencoder	Autoencoder trained to reconstruct clean inputs from corrupted versions.	/wiki/denoising_autoencoder
BERT	Bidirectional Encoder Representations from Transformers	2018 Google encoder model that started the transformer wave in NLP.^[3]	/wiki/bert
GPT	Generative Pre-trained Transformer	OpenAI decoder model family, basis of ChatGPT.	/wiki/gpt
LLM	Large Language Model	A language model with billions of parameters trained on broad text corpora.	/wiki/large_language_model
SLM	Small Language Model	A smaller language model designed for edge devices or efficient inference.	/wiki/slm
MoE	Mixture of Experts	Architecture that routes each input through a small subset of expert subnetworks.	/wiki/mixture_of_experts
T5	Text-to-Text Transfer Transformer	2019 Google model that frames every NLP task as text in, text out.	/wiki/t5
ViT	Vision Transformer	Transformer applied directly to image patches, introduced by Google in 2020.	/wiki/vit
DiT	Diffusion Transformer	Transformer backbone for diffusion image and video models.	/wiki/dit
MM-DiT	Multimodal Diffusion Transformer	DiT variant that jointly processes text and image streams, used in Stable Diffusion 3 and Flux.	/wiki/mm_dit
VLM	Vision-Language Model	Model that takes both images and text as input.	/wiki/vlm
VLA	Vision-Language-Action	Robotics model that maps images and instructions to motor actions.	/wiki/vla
DDPM	Denoising Diffusion Probabilistic Model	The Ho et al. 2020 formulation that revived diffusion for image generation.^[4]	/wiki/ddpm
DDIM	Denoising Diffusion Implicit Model	Non-Markovian diffusion sampling that allows fewer steps and deterministic outputs.	/wiki/ddim
NeRF	Neural Radiance Field	Method that reconstructs 3D scenes by overfitting a small MLP to view-dependent radiance.	/wiki/nerf
GS	Gaussian Splatting	3D reconstruction using a cloud of explicit Gaussian primitives, typically much faster to render than NeRF.	/wiki/gaussian_splatting
KAN	Kolmogorov-Arnold Network	2024 architecture that places learnable activation functions on edges rather than nodes.	/wiki/kan
HMM	Hidden Markov Model	Probabilistic model with hidden states emitting observable symbols.	/wiki/hmm
GMM	Gaussian Mixture Model	Weighted sum of Gaussian distributions, often fit with EM.	/wiki/gmm
SVM	Support Vector Machine	Maximum margin classifier, dominant before deep learning for many tasks.	/wiki/svm
SVR	Support Vector Regression	Regression variant of SVM.	/wiki/svr
RF	Random Forest	Ensemble of decision trees with feature and sample bagging.	/wiki/random_forest
GBM	Gradient Boosting Machine	Sequential tree ensemble in which each new tree is fit to the negative gradient of the loss (the residuals, for squared error) of the trees before it.	/wiki/gbm
GBDT	Gradient Boosted Decision Trees	Synonym for GBM, common in industrial pipelines.	/wiki/gbdt
XGBoost	eXtreme Gradient Boosting	Popular high-performance GBDT library by Tianqi Chen.	/wiki/xgboost
LightGBM	Light Gradient Boosting Machine	Microsoft GBDT library that uses histogram binning and leaf-wise growth.	/wiki/lightgbm
CART	Classification and Regression Trees	Foundational decision tree algorithm by Breiman et al.	/wiki/cart
KNN	k-Nearest Neighbors	Non-parametric method that classifies by majority vote among nearest training points.	/wiki/knn
PCA	Principal Component Analysis	Linear dimensionality reduction that finds directions of maximum variance.	/wiki/pca
LDA	Linear Discriminant Analysis	Supervised projection that maximizes class separation. Also stands for Latent Dirichlet Allocation.	/wiki/lda
LDA	Latent Dirichlet Allocation	Probabilistic topic model by Blei et al. that finds latent topics in documents.	/wiki/latent_dirichlet_allocation
t-SNE	t-distributed Stochastic Neighbor Embedding	Nonlinear embedding for visualizing high-dimensional data.	/wiki/t_sne
UMAP	Uniform Manifold Approximation and Projection	Nonlinear embedding method, typically faster than t-SNE and often better at preserving global structure.	/wiki/umap
NMF	Non-negative Matrix Factorization	Matrix factorization with non-negativity constraints, often used for parts-based decompositions.	/wiki/nmf
RBM	Restricted Boltzmann Machine	Bipartite generative network, historically important for pretraining.	/wiki/rbm
GNN	Graph Neural Network	Network that operates on graph-structured data via message passing.	/wiki/gnn
GCN	Graph Convolutional Network	A specific GNN variant by Kipf and Welling.	/wiki/gcn
GAT	Graph Attention Network	GNN variant that uses attention weights between neighbors.	/wiki/gat
SSM	State Space Model	Sequence model based on linear dynamical systems, basis of S4 and Mamba.	/wiki/ssm
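
The k-NN row describes classification by majority vote among the nearest training points. The following is a minimal, unoptimized sketch, assuming Euclidean distance and a small in-memory dataset; practical implementations typically use spatial indexes rather than sorting every point. The name `knn_predict` and the toy data are illustrative.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank every training point by Euclidean distance to the query (O(n log n)).
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    # Tally the labels of the k closest points; ties go to the label counted first.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.3, 0.3)))  # a
print(knn_predict(points, labels, (4.5, 4.5)))  # b
```

There is no training step: the "model" is the stored data, which is why k-NN is called non-parametric.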

Training and optimization

Methods, optimizers, and parallelism strategies used to train modern models.

Abbreviation	Expansion	Description	Link
SGD	Stochastic Gradient Descent	Workhorse optimizer that updates parameters using gradients on minibatches.	/wiki/sgd
Adam	Adaptive Moment Estimation	Optimizer that combines momentum with per-parameter adaptive learning rates.	/wiki/adam
AdamW	Adam with decoupled Weight decay	Modified Adam by Loshchilov and Hutter that decouples weight decay from gradient updates.	/wiki/adamw
LR	Learning Rate	The step size scalar applied to gradient updates.	/wiki/learning_rate
LRS	Learning Rate Scheduler	A schedule that varies the learning rate during training.	/wiki/learning_rate_scheduler
BN	Batch Normalization	Normalizes activations across the minibatch dimension.	/wiki/batch_normalization
LN	Layer Normalization	Normalizes activations across the feature dimension within a single sample, used in transformers.	/wiki/layer_normalization
GN	Group Normalization	Normalizes across groups of channels, useful for small batches.	/wiki/group_normalization
RMSNorm	Root Mean Square Normalization	Layer norm variant without mean centering, used in LLaMA and many other LLMs.	/wiki/rmsnorm
BP	Backpropagation	Algorithm that computes gradients of a loss with respect to network parameters.	/wiki/backpropagation
BPTT	Backpropagation Through Time	Backpropagation applied to unrolled recurrent networks.	/wiki/bptt
AD	Automatic Differentiation	The general technique of computing exact derivatives from a numerical program.	/wiki/automatic_differentiation
AMP	Automatic Mixed Precision	Technique, built into frameworks such as PyTorch, that runs operations in lower precision where numerically safe to speed up training and reduce memory use.	/wiki/amp
FP32	32-bit Floating Point	IEEE 754 single precision, the historical default for ML training.	/wiki/fp32
FP16	16-bit Floating Point	IEEE 754 half precision, smaller exponent range than BF16.	/wiki/fp16
BF16	Brain Floating Point 16	Google's 16-bit format with 8 exponent bits, matching FP32 dynamic range.	/wiki/bf16
FP8	8-bit Floating Point	Low-precision formats with E4M3 and E5M2 variants, supported in hardware on recent accelerators such as NVIDIA Hopper and Blackwell GPUs.	/wiki/fp8
DDP	Distributed Data Parallel	Replicates the model on each device and synchronizes gradients across them.	/wiki/ddp
FSDP	Fully Sharded Data Parallel	PyTorch parallelism that shards parameters, gradients, and optimizer state across devices.	/wiki/fsdp
ZeRO	Zero Redundancy Optimizer	DeepSpeed memory optimization that partitions optimizer state, gradients, and parameters.	/wiki/zero
TP	Tensor Parallelism	Splits individual tensors across devices, often used inside a node.	/wiki/tensor_parallelism
PP	Pipeline Parallelism	Splits the model into stages, each placed on different devices.	/wiki/pipeline_parallelism
DP	Data Parallelism	Splits the batch across devices that each hold a full model copy.	/wiki/data_parallelism
EP	Expert Parallelism	Distributes experts of a MoE model across devices.	/wiki/expert_parallelism
PEFT	Parameter-Efficient Fine-Tuning	Family of fine-tuning methods that update only a small fraction of parameters.	/wiki/peft
LoRA	Low-Rank Adaptation	PEFT method that injects trainable low-rank matrices into frozen weights.^[5]	/wiki/lora
QLoRA	Quantized LoRA	LoRA on top of a 4-bit quantized base model, enabling fine-tuning on a single GPU.	/wiki/qlora
DoRA	Weight-Decomposed Low-Rank Adaptation	LoRA variant that decomposes weights into magnitude and direction.	/wiki/dora
SFT	Supervised Fine-Tuning	Fine-tuning a pretrained model on labeled prompt-response pairs.	/wiki/sft
DPO	Direct Preference Optimization	Preference learning method that skips the explicit reward model used in RLHF.^[6]	/wiki/dpo
IPO	Identity Preference Optimization	DPO variant that does not assume the Bradley-Terry preference model.	/wiki/ipo
KTO	Kahneman-Tversky Optimization	Preference method using prospect theory style loss on binary feedback.	/wiki/kto
ORPO	Odds Ratio Preference Optimization	Combines SFT and preference optimization in a single loss.	/wiki/orpo
GRPO	Group Relative Policy Optimization	DeepSeek RL method that scores groups of samples relative to each other.	/wiki/grpo
PPO	Proximal Policy Optimization	Standard RL algorithm by Schulman et al., long used in RLHF.^[15]	/wiki/ppo
TRPO	Trust Region Policy Optimization	Predecessor to PPO with explicit trust region constraint.	/wiki/trpo
SAC	Soft Actor-Critic	Off-policy continuous-control RL method with entropy regularization.	/wiki/sac
DDPG	Deep Deterministic Policy Gradient	Off-policy actor-critic for continuous action spaces.	/wiki/ddpg
A2C	Advantage Actor-Critic	Synchronous actor-critic method using an advantage estimate.	/wiki/a2c
A3C	Asynchronous Advantage Actor-Critic	Asynchronous actor-critic method from DeepMind (Mnih et al. 2016); A2C is its synchronous counterpart.	/wiki/a3c
EMA	Exponential Moving Average	Average of weights over training steps, often used for inference stability.	/wiki/ema
EM	Expectation-Maximization	Algorithm for fitting latent variable models such as GMMs.	/wiki/em
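
The Adam and AdamW rows differ only in how weight decay is applied. The sketch below performs one AdamW step on a list of scalar parameters in pure Python, assuming the common parameterization (used, for example, by PyTorch's `torch.optim.AdamW`) in which the decay term is scaled by the learning rate. The function name and default values are illustrative.

```python
import math


def adamw_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """Return updated params after one AdamW step; m and v are updated in place.

    `t` is the 1-based step count used for bias correction.
    """
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g       # first moment (momentum)
        v[i] = beta2 * v[i] + (1 - beta2) * g * g   # second moment (per-parameter scale)
        m_hat = m[i] / (1 - beta1 ** t)             # undo the bias toward zero
        v_hat = v[i] / (1 - beta2 ** t)
        adam_update = m_hat / (math.sqrt(v_hat) + eps)
        # Decoupled weight decay: shrink p directly instead of adding
        # weight_decay * p to the gradient, as L2 regularization in Adam would.
        new_params.append(p - lr * (adam_update + weight_decay * p))
    return new_params


# One step on f(x) = x**2 starting at x = 1.0, whose gradient is 2x.
params, m, v = [1.0], [0.0], [0.0]
params = adamw_step(params, [2 * params[0]], m, v, t=1)
print(params)
```

On the first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is close to 1, so the parameter moves by roughly `lr` regardless of the gradient's magnitude, plus the small decay term.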

Metrics and evaluation

Numbers people report in tables. Many overlap across NLP, CV, and speech.

Abbreviation	Expansion	Description	Link
ACC	Accuracy	Fraction of predictions that are correct.	/wiki/accuracy
F1	F1 score	Harmonic mean of precision and recall.	/wiki/f1_score
AUC	Area Under the Curve	Usually the area under a ROC or PR curve, summarizing ranking quality.	/wiki/auc
ROC	Receiver Operating Characteristic	Plot of true positive rate against false positive rate across thresholds.	/wiki/roc
PR	Precision-Recall	The curve and its summary metrics for imbalanced classification.	/wiki/precision_recall
MAP	Mean Average Precision	Average of per-query average precision, standard in retrieval and detection.	/wiki/map_metric
IoU	Intersection over Union	Overlap measure between predicted and ground truth regions, used in detection and segmentation.	/wiki/iou
BLEU	Bilingual Evaluation Understudy	N-gram overlap metric for machine translation, from Papineni et al. 2002.	/wiki/bleu
ROUGE	Recall-Oriented Understudy for Gisting Evaluation	N-gram overlap metric for summarization.	/wiki/rouge
METEOR	Metric for Evaluation of Translation with Explicit ORdering	Translation metric that considers stems and synonyms.	/wiki/meteor
CIDEr	Consensus-based Image Description Evaluation	TF-IDF weighted n-gram metric for image captioning.	/wiki/cider
WER	Word Error Rate	Edit distance metric for ASR systems, measured in words.	/wiki/wer
CER	Character Error Rate	Edit distance metric measured at the character level.	/wiki/cer
MOS	Mean Opinion Score	Subjective audio quality rating averaged over listeners.	/wiki/mos
PPL	Perplexity	Exponential of the average per-token negative log-likelihood, the standard language model metric.	/wiki/perplexity
ECE	Expected Calibration Error	Measures how well a model's predicted probabilities match observed frequencies.	/wiki/ece
KL	Kullback-Leibler divergence	Information-theoretic measure of how one distribution differs from another; not symmetric.	/wiki/kl_divergence
JSD	Jensen-Shannon Divergence	Symmetric, bounded divergence built from the KL divergence of each distribution to their average.	/wiki/jsd
MSE	Mean Squared Error	Average squared difference between predictions and targets.	/wiki/mse
MAE	Mean Absolute Error	Average absolute difference between predictions and targets.	/wiki/mae
RMSE	Root Mean Squared Error	Square root of MSE, in the same units as the target.	/wiki/rmse
MAPE	Mean Absolute Percentage Error	Average of absolute errors expressed as a percentage of the true value.	/wiki/mape
NDCG	Normalized Discounted Cumulative Gain	Ranking quality measure weighted by position.	/wiki/ndcg
TPR	True Positive Rate	Sensitivity or recall, fraction of positives correctly identified.	/wiki/tpr
FPR	False Positive Rate	Fraction of negatives incorrectly flagged as positive.	/wiki/fpr
EER	Equal Error Rate	Operating point where FPR equals FNR, common in biometrics.	/wiki/eer
Pass@k	Pass at k	Fraction of problems solved by at least one of k samples, standard for code benchmarks.	/wiki/pass_at_k
Elo	Elo rating	Pairwise rating system, used by LMSYS Chatbot Arena.	/wiki/elo
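
Two of the metrics above reduce to short formulas. The sketch below computes perplexity from per-token natural-log probabilities, and pass@k with the unbiased estimator 1 - C(n-c, k) / C(n, k) introduced alongside HumanEval, where n samples are drawn for a problem and c of them pass. Function names are illustrative; a benchmark score would average `pass_at_k` over all problems.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood; inputs are natural-log token probabilities."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def pass_at_k(n, c, k):
    """Unbiased pass@k for one problem: n samples drawn, c of them pass the tests."""
    if n - c < k:
        # Every size-k subset of the n samples contains at least one passing sample.
        return 1.0
    # 1 minus the probability that a random size-k subset contains no passing sample.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


# A model that assigns probability 0.25 to every token has perplexity 4
# (up to floating-point rounding).
print(perplexity([math.log(0.25)] * 10))
# 10 samples, 3 correct: pass@1 equals 3/10, and pass@5 is much higher.
print(pass_at_k(10, 3, 1), pass_at_k(10, 3, 5))
```

Drawing n samples larger than k and applying the estimator gives a lower-variance result than literally drawing k samples and checking whether any passes.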

Tasks and benchmarks

The shorthand for problems researchers try to solve and the test sets they use to compare progress.

Abbreviation	Expansion	Description	Link
VQA	Visual Question Answering	Answering natural language questions about images.	/wiki/vqa
ASR	Automatic Speech Recognition	Converting speech audio to text.	/wiki/asr
TTS	Text-to-Speech	Synthesizing speech audio from text.	/wiki/tts
VAD	Voice Activity Detection	Detecting when speech is present in an audio stream.	/wiki/vad
SED	Sound Event Detection	Detecting and labeling sound events in audio.	/wiki/sed
T2I	Text-to-Image	Generating images from a text prompt.	/wiki/t2i
I2T	Image-to-Text	Generating text from an image, such as captioning or OCR.	/wiki/i2t
T2V	Text-to-Video	Generating video clips from a text prompt.	/wiki/t2v
I2V	Image-to-Video	Animating a still image into video.	/wiki/i2v
T2A	Text-to-Audio	Generating audio or music from a text prompt.	/wiki/t2a
T23D	Text-to-3D	Generating 3D models from a text prompt.	/wiki/t23d
OCR	Optical Character Recognition	Reading text from images of documents or signs.	/wiki/ocr
OD	Object Detection	Localizing and classifying objects in images.	/wiki/object_detection
SS	Semantic Segmentation	Pixel-level classification of an image.	/wiki/semantic_segmentation
IS	Instance Segmentation	Object detection plus per-instance pixel masks.	/wiki/instance_segmentation
SQuAD	Stanford Question Answering Dataset	Reading comprehension benchmark by Rajpurkar et al.	/wiki/squad
MMLU	Massive Multitask Language Understanding	57-subject knowledge benchmark by Hendrycks et al.^[1]	/wiki/mmlu
MMLU-Pro	MMLU Professional	Harder MMLU successor with 10 answer choices and reasoning emphasis.	/wiki/mmlu_pro
MMMU	Massive Multi-discipline Multimodal Understanding	College-level multimodal benchmark across many subjects.	/wiki/mmmu
HumanEval	HumanEval	OpenAI's 164 hand-written Python coding problems.	/wiki/humaneval
MBPP	Mostly Basic Python Problems	Google's Python programming benchmark of about 1,000 problems.	/wiki/mbpp
APPS	Automated Programming Progress Standard	Competitive programming benchmark from Hendrycks et al.	/wiki/apps
SWE-bench	Software Engineering benchmark	Real GitHub issues paired with patches, evaluating LLM coding agents.	/wiki/swe_bench
RepoBench	RepoBench	Repository-level code completion benchmark.	/wiki/repobench
GSM8K	Grade School Math 8K	8,500 grade-school word problems by OpenAI.	/wiki/gsm8k
MATH	Mathematics benchmark	12,500 competition math problems by Hendrycks et al.	/wiki/math_benchmark
AIME	American Invitational Mathematics Examination	High school competition used as an AI reasoning benchmark.	/wiki/aime
GPQA	Graduate-Level Google-Proof Q&A	Hard science benchmark designed to resist web search.	/wiki/gpqa
HLE	Humanity's Last Exam	Cross-disciplinary expert-level benchmark by Center for AI Safety and Scale AI.	/wiki/hle
ARC	AI2 Reasoning Challenge	Allen Institute multiple-choice science exam benchmark.	/wiki/arc
ARC-AGI	Abstraction and Reasoning Corpus	François Chollet's visual reasoning benchmark, distinct from AI2 ARC.^[14]	/wiki/arc_agi
HellaSwag	HellaSwag	Sentence completion benchmark targeting commonsense inference.	/wiki/hellaswag
WinoGrande	WinoGrande	Large-scale Winograd schema benchmark for commonsense reasoning.	/wiki/winogrande
TruthfulQA	TruthfulQA	Benchmark measuring how often models repeat human misconceptions.	/wiki/truthfulqa
BIG-bench	Beyond the Imitation Game benchmark	Crowdsourced suite of 200+ tasks for language models.	/wiki/big_bench
AGIEval	AGIEval	Benchmark of human standardized exams across disciplines.	/wiki/agieval

Infrastructure and hardware

The chips, interconnects, and runtime formats that run the models.

Abbreviation	Expansion	Description	Link
CPU	Central Processing Unit	General-purpose processor, still used for data preparation and serving small models.	/wiki/cpu
GPU	Graphics Processing Unit	Parallel processor originally for graphics, dominant for ML training.	/wiki/gpu
TPU	Tensor Processing Unit	Google's custom AI accelerator.	/wiki/tpu
NPU	Neural Processing Unit	Generic term for accelerators on phones and laptops dedicated to ML.	/wiki/npu
LPU	Language Processing Unit	Groq's branding for its inference chip.	/wiki/lpu
IPU	Intelligence Processing Unit	Graphcore's AI accelerator.	/wiki/ipu
DPU	Data Processing Unit	SmartNIC-class processor for networking and storage offload, a term popularized by NVIDIA.	/wiki/dpu
VRAM	Video RAM	High-speed memory on a GPU, often the binding constraint for large model training.	/wiki/vram
HBM	High Bandwidth Memory	Stacked DRAM technology used on data center GPUs and accelerators.	/wiki/hbm
SRAM	Static RAM	Low-latency on-chip memory used for caches, register files, and on-chip buffers in GPUs and accelerators such as TPUs.	/wiki/sram
DRAM	Dynamic RAM	Standard system memory technology.	/wiki/dram
ASIC	Application-Specific Integrated Circuit	Custom silicon, of which TPUs are an example.	/wiki/asic
FPGA	Field-Programmable Gate Array	Reconfigurable hardware used for some inference deployments.	/wiki/fpga
CUDA	Compute Unified Device Architecture	NVIDIA's parallel computing platform and programming model.	/wiki/cuda
ROCm	Radeon Open Compute	AMD's open software stack for GPU compute.	/wiki/rocm
NVLink	NVIDIA NVLink	NVIDIA's high-bandwidth GPU-to-GPU interconnect.	/wiki/nvlink
NVSwitch	NVIDIA NVSwitch	Switch fabric that connects many NVLink-capable GPUs in a node.	/wiki/nvswitch
PCIe	Peripheral Component Interconnect Express	Standard host-to-device interconnect, used for most GPUs.	/wiki/pcie
IB	InfiniBand	Low-latency interconnect used in HPC and AI clusters.	/wiki/infiniband
RDMA	Remote Direct Memory Access	Network feature allowing direct memory access between hosts, used in IB and RoCE.	/wiki/rdma
ONNX	Open Neural Network Exchange	Cross-framework model format.	/wiki/onnx
MIG	Multi-Instance GPU	NVIDIA feature that partitions a single GPU into isolated instances.	/wiki/mig
SM	Streaming Multiprocessor	The basic compute unit inside an NVIDIA GPU.	/wiki/sm
TFLOPS	Tera Floating Point Operations Per Second	Common unit of compute throughput.	/wiki/tflops
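MFU	Model FLOPs Utilization	Throughput in model FLOPs per second, counting only the FLOPs the architecture requires (not recomputation), as a fraction of the hardware's theoretical peak.	/wiki/mfu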

Frameworks and libraries

The software people actually type into their imports.

Abbreviation	Expansion	Description	Link
TF	TensorFlow	Google's machine learning framework, originally graph-based.	/wiki/tensorflow
PT	PyTorch	Eager-mode deep learning framework originally developed by Meta, now governed by the PyTorch Foundation and the de facto research standard.	/wiki/pytorch
JAX	JAX	Google's NumPy-compatible framework with composable transformations.	/wiki/jax
HF	Hugging Face	Platform and library suite hosting models, datasets, and the Transformers library.^[12]	/wiki/hugging_face
DL4J	Deep Learning for Java	JVM-based deep learning library from Skymind.	/wiki/dl4j
TRT	TensorRT	NVIDIA's inference optimizer and runtime.	/wiki/tensorrt
TFLite	TensorFlow Lite	Mobile and edge runtime for TensorFlow models.	/wiki/tflite
MLX	MLX	Apple's array framework for Apple silicon.	/wiki/mlx
vLLM	vLLM	High-throughput LLM inference engine with PagedAttention.	/wiki/vllm
TGI	Text Generation Inference	Hugging Face's production LLM serving framework.	/wiki/tgi
SGLang	SGLang	Structured generation language and runtime for LLM serving.	/wiki/sglang
DSPy	DSPy	Stanford framework for programming, not prompting, language models.	/wiki/dspy

Tools and methods

Techniques and abstractions used in research, agents, and applied systems.

Abbreviation	Expansion	Description	Link
MCP	Model Context Protocol	Anthropic's open protocol for connecting AI assistants to tools and data.	/wiki/model_context_protocol
RAG	Retrieval-Augmented Generation	Pattern of retrieving documents and conditioning generation on them.	/wiki/rag
CRAG	Corrective Retrieval-Augmented Generation	RAG variant that grades retrieved passages and rewrites queries when they look weak.	/wiki/crag
HyDE	Hypothetical Document Embeddings	Retrieval method that embeds a hypothesized answer instead of the query.	/wiki/hyde
ReAct	Reasoning and Acting	Agent prompting pattern from Yao et al. 2022 that interleaves thought and action.^[7]	/wiki/react
CoT	Chain of Thought	Eliciting step-by-step reasoning in the output.^[8]	/wiki/chain_of_thought
ToT	Tree of Thoughts	Search over multiple reasoning paths rather than a single chain.	/wiki/tree_of_thoughts
GoT	Graph of Thoughts	Reasoning structure where thoughts form a directed graph.	/wiki/graph_of_thoughts
ICL	In-Context Learning	Performing a new task purely from examples in the prompt, without parameter updates.	/wiki/in_context_learning
FSL	Few-Shot Learning	Learning from a small number of labeled examples.	/wiki/few_shot_learning
ZSL	Zero-Shot Learning	Performing a task without task-specific labeled training data.	/wiki/zero_shot_learning
KD	Knowledge Distillation	Training a smaller student model to mimic a larger teacher.	/wiki/knowledge_distillation
NAS	Neural Architecture Search	Automatically searching for good network architectures.	/wiki/nas
AutoML	Automated Machine Learning	Automating the data, feature, model, and tuning pipeline.	/wiki/automl
HPO	Hyperparameter Optimization	Searching over training hyperparameters like LR or batch size.	/wiki/hpo
CFG	Classifier-Free Guidance	Diffusion sampling trick that combines conditional and unconditional predictions, extrapolating beyond the conditional one when the guidance scale exceeds 1.	/wiki/cfg
SDE	Stochastic Differential Equation	Continuous-time formulation of diffusion processes.	/wiki/sde
ODE	Ordinary Differential Equation	Deterministic formulation used in flow matching and DDIM-style samplers.	/wiki/ode
MCTS	Monte Carlo Tree Search	Search method underlying AlphaGo and many planning systems.	/wiki/mcts
BoN	Best-of-N sampling	Sampling N candidates and picking the highest scoring one.	/wiki/best_of_n
PTQ	Post-Training Quantization	Quantizing a trained model without retraining.	/wiki/ptq
QAT	Quantization-Aware Training	Training that simulates quantization in the forward pass.	/wiki/qat
GGUF	GGUF file format	Quantized model file format used by llama.cpp.	/wiki/gguf
GPTQ	GPTQ	Post-training quantization method targeting LLMs.	/wiki/gptq
AWQ	Activation-aware Weight Quantization	LLM quantization method that protects salient weights.	/wiki/awq
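
Two of the sampling techniques above are short enough to sketch directly. `cfg_combine` applies the standard classifier-free guidance rule, eps_uncond + w * (eps_cond - eps_uncond), elementwise to plain lists that stand in for a diffusion model's noise predictions. `best_of_n` assumes hypothetical `generate` and `score` callables in place of a real sampler and reward model. Both functions are illustrative, not a library API.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def cfg_combine(eps_uncond: List[float], eps_cond: List[float],
                guidance_scale: float) -> List[float]:
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond), elementwise.

    w = 0 gives the unconditional prediction, w = 1 the conditional one,
    and w > 1 extrapolates past the conditional prediction.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def best_of_n(generate: Callable[[], T], score: Callable[[T], float], n: int) -> T:
    """Draw n candidates from `generate` and return the highest-scoring one."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)


print(cfg_combine([0.0, 1.0], [1.0, 1.0], guidance_scale=3.0))  # [3.0, 1.0]

# Toy stand-ins: "samples" come from a fixed list, and the score prefers values near 5.
samples = iter([3, 9, 4])
print(best_of_n(lambda: next(samples), lambda x: -abs(x - 5), n=3))  # 4
```

Best-of-N spends N times the sampling compute for a single output, so its gains depend heavily on how well the scoring function tracks actual quality.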

Data and modalities

Things you find in datasets and feature pipelines.

Abbreviation	Expansion	Description	Link
i.i.d.	Independently and Identically Distributed	Standard statistical assumption that samples are drawn independently from the same distribution.	/wiki/iid
OOD	Out of Distribution	Inputs drawn from a different distribution than the training data.	/wiki/ood
OOV	Out of Vocabulary	Words or tokens absent from the model's vocabulary.	/wiki/oov
TF-IDF	Term Frequency Inverse Document Frequency	Classical weighting scheme for sparse text features.	/wiki/tf_idf
BoW	Bag of Words	Representation of text as unordered token counts.	/wiki/bow
NER	Named Entity Recognition	Tagging spans like people, places, and organizations.	/wiki/ner
POS	Part of Speech tagging	Assigning grammatical categories to words.	/wiki/pos_tagging
SRL	Semantic Role Labeling	Tagging predicate-argument structure in sentences.	/wiki/srl
WSD	Word Sense Disambiguation	Selecting the intended sense of a polysemous word in context.	/wiki/wsd
IR	Information Retrieval	Finding relevant documents for a query.	/wiki/information_retrieval
QA	Question Answering	Returning an answer rather than a list of documents.	/wiki/question_answering
MT	Machine Translation	Automatic translation between languages.	/wiki/machine_translation
SMT	Statistical Machine Translation	Pre-neural translation using phrase tables and language models.	/wiki/smt
NMT	Neural Machine Translation	Translation using neural sequence-to-sequence models.	/wiki/nmt
BPE	Byte-Pair Encoding	Subword tokenization algorithm widely used in LLMs.	/wiki/bpe
SPM	SentencePiece	Subword tokenization toolkit by Google.	/wiki/sentencepiece
WP	WordPiece	Subword tokenizer used in BERT.	/wiki/wordpiece
EDA	Exploratory Data Analysis	Initial summarization and visualization of a dataset.	/wiki/eda
ETL	Extract, Transform, Load	Standard data pipeline pattern.	/wiki/etl
EHR	Electronic Health Record	Patient records used in clinical ML.	/wiki/ehr
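
The BPE row can be made concrete with the classic merge-learning loop: start from characters, count adjacent symbol pairs weighted by word frequency, merge the most frequent pair, and repeat. This is a simplified sketch; production tokenizers, such as the byte-level BPE used in GPT-style models, add pre-tokenization, byte-level symbols, and other details it omits. The function name and the toy word counts are made up for illustration.

```python
from collections import Counter


def learn_bpe(word_counts, num_merges):
    """Learn BPE merge rules from a {word: count} dict; words start as character tuples."""
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by how often its word occurs.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent pair; ties go to the first seen
        merges.append(best)
        # Rewrite each word, replacing occurrences of the best pair left to right.
        new_vocab = {}
        for symbols, count in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + count
        vocab = new_vocab
    return merges, vocab


toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(toy_counts, num_merges=3)
print(merges)  # [('e', 's'), ('es', 't'), ('l', 'o')]
```

At tokenization time the learned merges are replayed in order on new text, so frequent substrings such as "est" become single tokens while rare words fall back to smaller pieces.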

Companies, labs, and organizations

A partial roll of the institutions whose names appear most often in papers and news. Many of these have their own articles on this wiki.

Abbreviation	Expansion	Description	Link
OpenAI	OpenAI	AI lab founded in 2015, maker of GPT and ChatGPT.	/wiki/openai
Anthropic	Anthropic	AI safety company founded in 2021, maker of Claude.	/wiki/anthropic
DeepMind	Google DeepMind	London-based lab founded in 2010, acquired by Google in 2014.	/wiki/deepmind
FAIR	Fundamental AI Research	Meta's AI research lab, formerly Facebook AI Research.	/wiki/fair
MSR	Microsoft Research	Microsoft's corporate research arm.	/wiki/msr
GR	Google Research	Google's research division, separate from DeepMind.	/wiki/google_research
MIT	Massachusetts Institute of Technology	University with major AI groups including CSAIL.	/wiki/mit
CMU	Carnegie Mellon University	University with a large AI and ML faculty.	/wiki/cmu
BAIR	Berkeley AI Research	UC Berkeley's AI research lab.	/wiki/bair
AI2	Allen Institute for AI	Seattle research institute founded by Paul Allen.	/wiki/ai2
MILA	Montreal Institute for Learning Algorithms	Quebec research institute affiliated with Université de Montréal.	/wiki/mila
CHAI	Center for Human-Compatible AI	Berkeley center founded by Stuart Russell.	/wiki/chai
MIRI	Machine Intelligence Research Institute	Berkeley nonprofit focused on AI alignment theory.	/wiki/miri
HAI	Stanford Institute for Human-Centered AI	Stanford institute studying AI and its societal impact.	/wiki/stanford_hai
CSAIL	Computer Science and Artificial Intelligence Laboratory	MIT's main computing research lab.	/wiki/csail
NeurIPS	Conference on Neural Information Processing Systems	Premier ML conference, held annually.	/wiki/neurips
ICML	International Conference on Machine Learning	Major annual ML conference.	/wiki/icml
ICLR	International Conference on Learning Representations	Open-review ML conference focused on representation learning.	/wiki/iclr
CVPR	Conference on Computer Vision and Pattern Recognition	Top computer vision conference.	/wiki/cvpr
ECCV	European Conference on Computer Vision	Biennial computer vision conference.	/wiki/eccv
ACL	Association for Computational Linguistics	Major NLP conference and society.	/wiki/acl
EMNLP	Empirical Methods in Natural Language Processing	Major NLP conference.	/wiki/emnlp
AAAI	Association for the Advancement of Artificial Intelligence	Long-running AI conference and society.	/wiki/aaai
IJCAI	International Joint Conference on Artificial Intelligence	Long-running AI conference.	/wiki/ijcai

Standards and governance

Laws, frameworks, and standards bodies that increasingly shape what models can do and where.

Abbreviation	Expansion	Description	Link
NIST AI RMF	NIST AI Risk Management Framework	Voluntary US framework for managing AI risks, first released January 2023.^[9]	/wiki/nist_ai_rmf
EU AI Act	European Union AI Act	EU regulation classifying AI systems by risk, formally adopted in 2024.^[10]	/wiki/eu_ai_act
AISI	AI Safety Institute	National bodies in the UK and US tasked with evaluating frontier AI.	/wiki/aisi
DSA	Digital Services Act	EU regulation of online platforms covering some AI-driven services.	/wiki/dsa
GDPR	General Data Protection Regulation	EU data protection law that applies to many AI training and deployment scenarios.	/wiki/gdpr
C2PA	Coalition for Content Provenance and Authenticity	Industry coalition and its open standard for cryptographically signed media provenance.	/wiki/c2pa
HIPAA	Health Insurance Portability and Accountability Act	US law governing protected health information.	/wiki/hipaa
COPPA	Children's Online Privacy Protection Act	US law restricting data collection from children under 13.	/wiki/coppa
CCPA	California Consumer Privacy Act	California privacy law, amended and expanded by the CPRA effective 2023.	/wiki/ccpa
CFR	Code of Federal Regulations	The codified rules of US federal agencies.	/wiki/cfr
OECD AI	OECD AI Principles	International principles for trustworthy AI adopted in 2019.	/wiki/oecd_ai
ISO/IEC 42001	Artificial Intelligence Management System standard	International standard for AI governance systems, published in 2023.^[11]	/wiki/iso_iec_42001
FERPA	Family Educational Rights and Privacy Act	US law protecting student educational records.	/wiki/ferpa
EO 14110	Executive Order 14110	The 2023 US executive order on AI, later rescinded in 2025.	/wiki/executive_order_14110

Safety and interpretability

Terms from the research areas concerned with what models do internally and how their behavior affects people.

Abbreviation	Expansion	Description	Link
AI Safety	AI Safety	Field studying how to keep advanced AI systems beneficial and controllable.	/wiki/ai_safety
AI Alignment	AI Alignment	Subfield focused on getting models to pursue intended goals.	/wiki/ai_alignment
CAI	Constitutional AI	Anthropic's training method using a written set of principles to guide model behavior.^[13]	/wiki/constitutional_ai
MI	Mechanistic Interpretability	Reverse engineering neural networks at the level of circuits and features.	/wiki/mechanistic_interpretability
SAE	Sparse Autoencoder	Used to extract interpretable features from model activations.	/wiki/sparse_autoencoder
IRL	Inverse Reinforcement Learning	Recovering a reward function from observed behavior.	/wiki/irl
IDA	Iterated Distillation and Amplification	Alignment proposal by Paul Christiano that alternates amplifying a model (for example, a human working with many copies of it) and distilling the amplified system into a new model.	/wiki/ida
RSP	Responsible Scaling Policy	Framework, introduced by Anthropic in 2023, of commitments tying capability levels to required safety measures.	/wiki/responsible_scaling_policy
ASL	AI Safety Level	Tiered capability and safety classification used by Anthropic.	/wiki/asl
CBRN	Chemical, Biological, Radiological, Nuclear	Risk categories used in frontier model safety evaluations.	/wiki/cbrn
METR	Model Evaluation and Threat Research	Nonprofit specializing in dangerous-capability evaluations.	/wiki/metr
Apollo	Apollo Research	AI evaluation nonprofit focused on deception and scheming.	/wiki/apollo_research
DAN	Do Anything Now	A long-running family of jailbreak prompts.	/wiki/dan

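The SAE row can be illustrated with a minimal sketch of one common sparse autoencoder formulation: encode an activation vector into nonnegative features with `ReLU(W_enc (x - b_dec) + b_enc)`, reconstruct it as `W_dec f + b_dec`, and score it with reconstruction error plus an L1 sparsity penalty. The class name, the tied initialization, and the pure-Python matrix helper are assumptions made for a self-contained example; the training loop is omitted.

```python
import random


def matvec(W, x):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class SparseAutoencoder:
    """Forward pass and loss of a one-hidden-layer sparse autoencoder.

    d_model is the width of the model activations being studied;
    d_features is the (usually larger) number of dictionary features.
    """

    def __init__(self, d_model, d_features, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / d_model ** 0.5
        # Encoder: d_features rows, each of length d_model.
        self.W_enc = [[rng.gauss(0.0, scale) for _ in range(d_model)]
                      for _ in range(d_features)]
        self.b_enc = [0.0] * d_features
        # Decoder: d_model rows of length d_features, initialized here as the
        # encoder's transpose (an assumption of this sketch).
        self.W_dec = [list(col) for col in zip(*self.W_enc)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        # f = ReLU(W_enc (x - b_dec) + b_enc): nonnegative feature activations.
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        pre = matvec(self.W_enc, centered)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, f):
        # x_hat = W_dec f + b_dec: the input rebuilt from feature directions.
        return [r + b for r, b in zip(matvec(self.W_dec, f), self.b_dec)]

    def loss(self, x, l1_coeff):
        # Mean squared reconstruction error plus an L1 penalty that pushes
        # most feature activations toward zero.
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        sparsity = sum(abs(v) for v in f)
        return mse + l1_coeff * sparsity, f


sae = SparseAutoencoder(d_model=4, d_features=8)
total, features = sae.loss([1.0, -0.5, 0.25, 0.0], l1_coeff=0.01)
```

The `l1_coeff` trades reconstruction quality against sparsity: larger values leave fewer active features per input, which is what makes individual features easier to interpret. Published interpretability work typically also constrains decoder column norms and trains on very large sets of activations, details left out here.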
Notes on ambiguous acronyms

Several abbreviations carry more than one common meaning inside AI:

LDA can mean Linear Discriminant Analysis (a classifier) or Latent Dirichlet Allocation (a topic model). Both are listed above.
SAE can mean Sparse Autoencoder (interpretability) or Society of Automotive Engineers (autonomy levels). Only the AI sense is listed here.
CAI can mean Constitutional AI (Anthropic) or Conversational AI as a marketing term.
DPO can mean Direct Preference Optimization (training) or Data Protection Officer (governance).
IPO can mean Identity Preference Optimization (training) or Initial Public Offering (finance), the latter appearing in coverage of AI companies going public.
GRPO can mean Group Relative Policy Optimization (DeepSeek) or unrelated terms in older RL literature.
ARC can mean the AI2 Reasoning Challenge (a benchmark of grade-school science questions) or the Abstraction and Reasoning Corpus from François Chollet, which is now usually called ARC-AGI.^[14]
AGI can refer to Artificial General Intelligence (the goal) or Adjusted Gross Income (US tax), but in AI contexts the former is almost always meant.
T5 (Text-to-Text Transfer Transformer) is a model name, not a version number; the 5 counts the five words in its expansion that begin with T.

When in doubt, the context of the surrounding paper or article almost always disambiguates.
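The advice to let context decide can be sketched in the spirit of simplified Lesk word sense disambiguation: score each candidate expansion by how many of its cue words appear in the surrounding text. The sense inventory and cue-word sets below are hypothetical illustrations drawn from the notes above, not a curated lexicon.

```python
# Hypothetical sense inventory: each acronym maps to (expansion, cue words).
# The cue words are illustrative assumptions, not an authoritative resource.
SENSES = {
    "LDA": [
        ("Linear Discriminant Analysis", {"classifier", "discriminant", "class", "projection"}),
        ("Latent Dirichlet Allocation", {"topic", "topics", "documents", "dirichlet", "corpus"}),
    ],
    "SAE": [
        ("Sparse Autoencoder", {"features", "activations", "interpretability", "dictionary"}),
        ("Society of Automotive Engineers", {"driving", "vehicle", "autonomy", "level"}),
    ],
    "DPO": [
        ("Direct Preference Optimization", {"preference", "reward", "policy", "fine-tuning"}),
        ("Data Protection Officer", {"gdpr", "privacy", "compliance", "officer"}),
    ],
}


def expand(acronym, context):
    """Return the expansion whose cue words overlap most with the context.

    Returns None for unknown acronyms or when no cue word appears at all.
    """
    words = {w.strip(".,;:()").lower() for w in context.split()}
    candidates = SENSES.get(acronym.upper(), [])
    if not candidates:
        return None
    # Overlap count per sense; ties go to the sense listed first.
    score, name = max(((len(cues & words), name) for name, cues in candidates),
                      key=lambda t: t[0])
    return name if score > 0 else None


print(expand("LDA", "We fit LDA with 20 topics on the news corpus."))
```

Returning `None` when nothing matches is a deliberate choice: guessing the first-listed sense would hide exactly the ambiguity the notes warn about.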

References

1. Hendrycks et al., "Measuring Massive Multitask Language Understanding," ICLR 2021. https://arxiv.org/abs/2009.03300
2. Vaswani et al., "Attention Is All You Need," NeurIPS 2017. https://arxiv.org/abs/1706.03762
3. Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," NAACL 2019. https://arxiv.org/abs/1810.04805
4. Ho, Jain, and Abbeel, "Denoising Diffusion Probabilistic Models," NeurIPS 2020. https://arxiv.org/abs/2006.11239
5. Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models," ICLR 2022. https://arxiv.org/abs/2106.09685
6. Rafailov et al., "Direct Preference Optimization: Your Language Model is Secretly a Reward Model," NeurIPS 2023. https://arxiv.org/abs/2305.18290
7. Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models," ICLR 2023. https://arxiv.org/abs/2210.03629
8. Wei et al., "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models," NeurIPS 2022. https://arxiv.org/abs/2201.11903
9. NIST, "AI Risk Management Framework (AI RMF 1.0)," January 2023. https://www.nist.gov/itl/ai-risk-management-framework
10. European Union, "Artificial Intelligence Act," Regulation (EU) 2024/1689.
11. ISO/IEC 42001:2023, "Information technology, Artificial intelligence, Management system."
12. Hugging Face Transformers documentation. https://huggingface.co/docs/transformers
13. Bai et al., "Constitutional AI: Harmlessness from AI Feedback," Anthropic 2022. https://arxiv.org/abs/2212.08073
14. Chollet, "On the Measure of Intelligence," 2019. https://arxiv.org/abs/1911.01547
15. Schulman et al., "Proximal Policy Optimization Algorithms," 2017. https://arxiv.org/abs/1707.06347

