AI Wiki
Category: AI Inference

71 articles

AWQ (Activation-aware Weight Quantization)

Deep Learning, Large Language Models

AWS Inferentia

AI Hardware

Adaptive thinking

Anthropic, Reasoning Models

Beam search

Constitutional Classifiers

AI Alignment, AI Safety, Anthropic

Context caching

Developer Tools, Large Language Models

Continuous Batching

AI Infrastructure

DeepInfra

AI Companies, AI Infrastructure

Disaggregated serving

AI Infrastructure, Artificial Intelligence

Dynamic inference

Mixture of Experts

EAGLE (speculative decoding)

AI Infrastructure

EAGLE-2

Large Language Models

Etched Sohu

AI Hardware

ExLlamaV2 (EXL2)

Developer Tools, Open Source AI

FP4 (4-bit floating point)

AI Hardware, Training & Optimization

Fireworks AI

AI Companies, Developer Tools, Large Language Models

Flash-Decoding

Algorithms

GPTQ

Deep Learning

GRPO

Chinese AI, Reasoning Models, Reinforcement Learning

Google TPU 8i

AI Hardware, Google

GraphRAG

Information Retrieval, Microsoft, Open Source AI

Groq LPU

AI Hardware

H2O (Heavy-Hitter Oracle for KV Cache)

Large Language Models

Inference-time scaling

AI Research, Artificial Intelligence, Reasoning Models

Intel Crescent Island

AI Hardware

KTO

AI Alignment, Reinforcement Learning, Training & Optimization

KV Cache

Deep Learning, Machine Learning, Transformer Models

Knowledge Distillation

Deep Learning, Machine Learning

LLM inference engine

AI Infrastructure, Large Language Models

LLM.int8()

Large Language Models

Llama API

Developer Tools, Meta AI

Lookahead Decoding

Algorithms, Large Language Models

Medusa

AI Infrastructure

Model Compression

NVIDIA Dynamo

AI Infrastructure, Developer Tools, NVIDIA

NVIDIA Groq LPX Rack

AI Hardware, NVIDIA

NVIDIA NIM

AI Infrastructure, Developer Tools, Enterprise AI

NVIDIA Picasso

AI Hardware, AI Infrastructure, AI Models

NVIDIA Rubin CPX

AI Hardware, NVIDIA

NVIDIA TensorRT-LLM

NVIDIA, Open Source AI

NVIDIA Triton Inference Server

Deep Learning, Developer Tools, NVIDIA

NormalFloat 4-bit (NF4)

Training & Optimization

OctoAI

AI Companies, AI Infrastructure

Offline inference

MLOps

Online inference

MLOps

OpenVINO

Developer Tools, Open Source AI

Optimum-Quanto

Developer Tools, Open Source AI

PagedAttention

AI Infrastructure, Model Architecture

Positron AI

AI Companies, AI Hardware

Post-processing

MLOps

Product quantization

AI Infrastructure, Information Retrieval

Pruning

Machine Learning, Training & Optimization

QLoRA

Deep Learning, Large Language Models, Training & Optimization

Qualcomm AI200

AI Hardware, Data Centers

Qualcomm AI250

AI Hardware, Data Centers

Quantization

Deep Learning, Machine Learning

RLVR

Reasoning Models, Reinforcement Learning, Training & Optimization

RadixAttention

AI Infrastructure, Model Architecture

Rebellions REBEL-Quad

AI Hardware

Skeleton-of-Thought

Prompt Engineering