Category
Reasoning
14 articles
ARC-AGI 1
AGI, AI Benchmarks
Agent planning
AI Agents, Artificial Intelligence
GPQA
Benchmarks
GSM8K
Benchmarks, Large Language Models, Machine Learning
MATH (benchmark)
Benchmarks, Large Language Models, Machine Learning
Natural language inference (NLI)
NLP Benchmarks, Natural Language Processing
OpenAI o-series
Artificial Intelligence, Large Language Models, OpenAI
OpenAI o1
AI Models, Large Language Models, OpenAI
OpenAI o3
AI Models, Large Language Models, OpenAI
ReAct (prompting)
AI Agents, Prompt Engineering
Reflexion
AI Agents, Machine Learning
SimpleBench
2024 in artificial intelligence, AI Benchmarks, Common Sense
Test-time compute
Artificial Intelligence, Large Language Models, Machine Learning
Tree of Thoughts
2023 in AI, LLM techniques, Prompt engineering