AI Inference
71 articles
AWQ (Activation-aware Weight Quantization)
Deep Learning, Large Language Models
AWS Inferentia
AI Hardware
Adaptive thinking
Anthropic, Reasoning Models
Beam search
Constitutional Classifiers
AI Alignment, AI Safety, Anthropic
Context caching
Developer Tools, Large Language Models
Continuous Batching
AI Infrastructure
DeepInfra
AI Companies, AI Infrastructure
Disaggregated serving
AI Infrastructure, Artificial Intelligence
Dynamic inference
Mixture of Experts
EAGLE (speculative decoding)
AI Infrastructure
EAGLE-2
Large Language Models
Etched Sohu
AI Hardware
ExLlamaV2 (EXL2)
Developer Tools, Open Source AI
FP4 (4-bit floating point)
AI Hardware, Training & Optimization
Fireworks AI
AI Companies, Developer Tools, Large Language Models
Flash-Decoding
Algorithms
GPTQ
Deep Learning
GRPO
Chinese AI, Reasoning Models, Reinforcement Learning
Google TPU 8i
AI Hardware, Google
GraphRAG
Information Retrieval, Microsoft, Open Source AI
Groq LPU
AI Hardware
H2O (Heavy-Hitter Oracle for KV Cache)
Large Language Models
Inference-time scaling
AI Research, Artificial Intelligence, Reasoning Models
Intel Crescent Island
AI Hardware
KTO
AI Alignment, Reinforcement Learning, Training & Optimization
KV Cache
Deep Learning, Machine Learning, Transformer Models
Knowledge Distillation
Deep Learning, Machine Learning
LLM inference engine
AI Infrastructure, Large Language Models
LLM.int8()
Large Language Models
Llama API
Developer Tools, Meta AI
Lookahead Decoding
Algorithms, Large Language Models
Medusa
AI Infrastructure
Model Compression
NVIDIA Dynamo
AI Infrastructure, Developer Tools, NVIDIA
NVIDIA Groq LPX Rack
AI Hardware, NVIDIA
NVIDIA NIM
AI Infrastructure, Developer Tools, Enterprise AI
NVIDIA Picasso
AI Hardware, AI Infrastructure, AI Models
NVIDIA Rubin CPX
AI Hardware, NVIDIA
NVIDIA TensorRT-LLM
NVIDIA, Open Source AI
NVIDIA Triton Inference Server
Deep Learning, Developer Tools, NVIDIA
NormalFloat 4-bit (NF4)
Training & Optimization
OctoAI
AI Companies, AI Infrastructure
Offline inference
MLOps
Online inference
MLOps
OpenVINO
Developer Tools, Open Source AI
Optimum-Quanto
Developer Tools, Open Source AI
PagedAttention
AI Infrastructure, Model Architecture
Positron AI
AI Companies, AI Hardware
Post-processing
MLOps
Product quantization
AI Infrastructure, Information Retrieval
Pruning
Machine Learning, Training & Optimization
QLoRA
Deep Learning, Large Language Models, Training & Optimization
Qualcomm AI200
AI Hardware, Data Centers
Qualcomm AI250
AI Hardware, Data Centers
Quantization
Deep Learning, Machine Learning
RLVR
Reasoning Models, Reinforcement Learning, Training & Optimization
RadixAttention
AI Infrastructure, Model Architecture
Rebellions REBEL-Quad
AI Hardware
Skeleton-of-Thought
Prompt Engineering