AA-LCR
ABench
Aider Polyglot
AIME 2024
AIME 2025
ARC-AGI-1
ARC-AGI-2
ARC-AGI-3
BALROG
BIG-Bench Hard
BrowseComp
BrowserGym
Codeforces
COLLIE
CharXiv
Creative Writing v3
Deep Research Bench
DeepResearch Bench
DROP
Dynabench
EQ-Bench 3
ERQA
Factorio Learning Environment
Finance Agent
Fox
FrontierMath
GeoBench
GPQA
GPQA Diamond
GSM8K
GSO
HealthBench
HealthBench Hard
HellaSwag
HumanEval
Humanity's Last Exam
IFBench
Inclusion Arena
InferenceMAX - https://inferencemax.semianalysis.com/
LiveBench
LiveCodeBench
Longform Creative Writing
MATH
MATH-500
MATH Level 5
MathArena
MathVista
METR
MGSM
Mind2Web
MMBench
MMLU
MMLU-Pro
MMMLU
MMMU
MMStar
OCRBench
OmniDocBench
OpenRouter LLM Rankings - https://openrouter.ai/rankings
OSWorld
POPE
SciCode
SimpleQA
SWE-bench
SWE-bench Verified
Tau-bench
Tau2-bench
Terminal-Bench
VideoMMMU
VimGolf
VisualWebArena
VQAv2
WebArena
WebDev Arena
WebShop
WebVoyager
WeirdML
Winoground