Model Architecture
62 articles
ALiBi (Attention with Linear Biases)
Transformer Models
Albert Gu
People
AutoML (Automated Machine Learning)
Developer Tools, MLOps, Training & Optimization
Bahdanau attention
Deep Learning, Natural Language Processing
Bidirectional
Neural Networks
BitNet
Large Language Models, Microsoft
BitNet b1.58
Large Language Models, Microsoft
Byte Latent Transformer
Large Language Models, Meta AI
Cross-attention
Transformer Models
Depthwise Separable CNN
Computer Vision, Machine Learning
Depthwise separable convolutional neural network (sepCNN)
Computer Vision, Neural Networks
Differential Transformer
Microsoft, Transformer Models
Encoder
Deep Learning
Feature Pyramid Network (FPN)
Computer Vision
Graph Machine Learning Models
AI Models, Machine Learning
Hidden Markov Model
Machine Learning
Hyena
Deep Learning
Infini-Attention
Google, Transformer Models
Jamba
AI Companies, Large Language Models, Mixture of Experts
Jamba2
AI Models, Large Language Models, Mixture of Experts
Joint Embedding Predictive Architecture
Machine Learning, Meta AI
LSTM
Deep Learning, Neural Networks
Large Concept Model
Meta AI, Natural Language Processing
Layer normalization
Deep Learning
Linear Attention
Transformer Models
Liquid AI
AI Companies, AI Models
Long Short-Term Memory (LSTM)
Deep Learning, Machine Learning, Neural Networks
Long-context language models
Large Language Models
LongNet
Microsoft, Transformer Models
LongRoPE
Large Language Models, Microsoft
MEGABYTE
Meta AI, Transformer Models
MMDiT (Multimodal Diffusion Transformer)
Diffusion Models, Image Generation
Machine learning terms/Sequence Models
Machine Learning
Mamba
Large Language Models
Mamba 2
AI Models, Deep Learning
Mamba-3
Deep Learning
Mixture of Depths
Deep Learning, Machine Learning, Transformer Models
Multi-Head Self-Attention
Deep Learning, Machine Learning, Neural Networks
Multi-Query Attention (MQA)
Transformer Models
Multi-head Latent Attention
Deep Learning, Machine Learning, Neural Networks
Node (neural network)
Neural Networks
PagedAttention
AI Inference, AI Infrastructure
Perceiver
Google DeepMind, Neural Networks
RMSNorm
Artificial Intelligence, Transformer Models
RWKV
Deep Learning, Machine Learning, Neural Networks
RadixAttention
AI Inference, AI Infrastructure
Recurrent Neural Network
Deep Learning, Machine Learning, Neural Networks
Rotary Position Embedding
Deep Learning, Large Language Models, Transformer Models
Rotary Position Embedding (RoPE)
Deep Learning, Large Language Models, Transformer Models
Self-attention
Deep Learning, Machine Learning, Neural Networks
Sliding window attention
Transformer Models
Sparse attention
Deep Learning, Machine Learning, Transformer Models
SubQ
AI Companies, Large Language Models
SwiGLU
Deep Learning, Neural Networks
Titans (neural architecture)
Google, Neural Networks
Tower
Information Retrieval, Machine Learning
Transformers
Deep Learning, Neural Networks
Unidirectional
Vision Transformer
Computer Vision
YOCO (You Only Cache Once)
Large Language Models, Microsoft