Transformer models
46 articles
ALBERT
Deep Learning, Natural Language Processing
ALiBi (Attention with Linear Biases)
Model Architecture
Action Chunking with Transformers (ACT)
Machine Learning, Robotics
Aidan Gomez
AI Companies, People
Ashish Vaswani
People
BERT
Large Language Models
BioBERT
Healthcare AI, Large Language Models
Cross-attention
Model Architecture
DETR
Computer Vision, Deep Learning
DeBERTa
Deep Learning, Microsoft, Natural Language Processing
DeiT
Computer Vision, Deep Learning
Differential Transformer
Microsoft, Model Architecture
Diffusion Transformer (DiT)
Diffusion Models, Generative AI, Image Generation
DistilBERT
AI Models, Deep Learning, Natural Language Processing
ELECTRA
Deep Learning, Natural Language Processing
Flash Attention 3
AI Hardware, Algorithms
Generative pre-trained transformer
Large Language Models, OpenAI
Grouped-Query Attention
Deep Learning, Machine Learning
Hiera
Computer Vision, Meta AI
Induction Heads
Interpretability
Infini-Attention
Google, Model Architecture
KV Cache
AI Inference, Deep Learning, Machine Learning
Linear Attention
Model Architecture
Logit lens
Interpretability
LongNet
Microsoft, Model Architecture
Longformer
Large Language Models, Natural Language Processing
MEGABYTE
Meta AI, Model Architecture
Masked autoencoder (MAE)
Computer Vision, Machine Learning, Training & Optimization
Mixture of Depths
Deep Learning, Machine Learning, Model Architecture
Multi-Head Self-Attention
Deep Learning, Machine Learning, Model Architecture
Multi-Query Attention (MQA)
Model Architecture
Multi-head Latent Attention
Deep Learning, Machine Learning, Model Architecture
PaLM
Google DeepMind, Large Language Models, Natural Language Processing
Positional encoding
Deep Learning, Natural Language Processing
RMSNorm
Artificial Intelligence, Model Architecture
Ring Attention
Training & Optimization
RoBERTa
Deep Learning, Machine Learning, Natural Language Processing
Rotary Position Embedding
Deep Learning, Large Language Models, Model Architecture
Rotary Position Embedding (RoPE)
Deep Learning, Large Language Models, Model Architecture
Self-attention
Deep Learning, Machine Learning, Model Architecture
Sliding window attention
Model Architecture
Sparse attention
Deep Learning, Machine Learning, Model Architecture
Swin Transformer
Computer Vision, Deep Learning, Neural Networks
Switch Transformer
Google, Large Language Models, Mixture of Experts
T5 (language model)
Large Language Models
XLNet
Deep Learning, Machine Learning, Natural Language Processing