Model Architecture

62 articles

ALiBi (Attention with Linear Biases)

Transformer Models

Albert Gu

People

AutoML (Automated Machine Learning)

Developer Tools, MLOps, Training & Optimization

Bahdanau attention

Deep Learning, Natural Language Processing

Bidirectional

Neural Networks

BitNet

Large Language Models, Microsoft

BitNet b1.58

Large Language Models, Microsoft

Byte Latent Transformer

Large Language Models, Meta AI

Cross-attention

Transformer Models

Depthwise Separable CNN

Computer Vision, Machine Learning

Depthwise separable convolutional neural network (sepCNN)

Computer Vision, Neural Networks

Differential Transformer

Microsoft, Transformer Models

Encoder

Deep Learning

Feature Pyramid Network (FPN)

Computer Vision

Graph Machine Learning Models

AI Models, Machine Learning

Hidden Markov Model

Machine Learning

Hyena

Deep Learning

Infini-Attention

Google, Transformer Models

Jamba

AI Companies, Large Language Models, Mixture of Experts

Jamba2

AI Models, Large Language Models, Mixture of Experts

Joint Embedding Predictive Architecture

Machine Learning, Meta AI

LSTM

Deep Learning, Neural Networks

Large Concept Model

Meta AI, Natural Language Processing

Layer normalization

Deep Learning

Linear Attention

Transformer Models

Liquid AI

AI Companies, AI Models

Long Short-Term Memory (LSTM)

Deep Learning, Machine Learning, Neural Networks

Long-context language models

Large Language Models

LongNet

Microsoft, Transformer Models

LongRoPE

Large Language Models, Microsoft

MEGABYTE

Meta AI, Transformer Models

MMDiT (Multimodal Diffusion Transformer)

Diffusion Models, Image Generation

Machine learning terms/Sequence Models

Machine Learning

Mamba

Large Language Models

Mamba 2

AI Models, Deep Learning

Mamba-3

Deep Learning

Mixture of Depths

Deep Learning, Machine Learning, Transformer Models

Multi-Head Self-Attention

Deep Learning, Machine Learning, Neural Networks

Multi-Query Attention (MQA)

Transformer Models

Multi-head Latent Attention

Deep Learning, Machine Learning, Neural Networks

Node (neural network)

Neural Networks

PagedAttention

AI Inference, AI Infrastructure

Perceiver

Google DeepMind, Neural Networks

RMSNorm

Artificial Intelligence, Transformer Models

RWKV

Deep Learning, Machine Learning, Neural Networks

RadixAttention

AI Inference, AI Infrastructure

Recurrent Neural Network

Deep Learning, Machine Learning, Neural Networks

Rotary Position Embedding

Deep Learning, Large Language Models, Transformer Models

Rotary Position Embedding (RoPE)

Deep Learning, Large Language Models, Transformer Models

Self-attention

Deep Learning, Machine Learning, Neural Networks

Sliding window attention

Transformer Models

Sparse attention

Deep Learning, Machine Learning, Transformer Models

SubQ

AI Companies, Large Language Models

SwiGLU

Deep Learning, Neural Networks

Titans (neural architecture)

Google, Neural Networks

Tower

Information Retrieval, Machine Learning

Transformers

Deep Learning, Neural Networks

Unidirectional

Vision Transformer

Computer Vision

YOCO (You Only Cache Once)

Large Language Models, Microsoft