Reasoning
19 articles
ARC-AGI 1
AGI, AI Benchmarks
Agent planning
AI Agents, Artificial Intelligence
Command A Reasoning
AI Companies, Large Language Models
GPQA
Benchmarks
GSM8K
Benchmarks, Large Language Models, Machine Learning
Kimi K2 Thinking
Chinese AI, Large Language Models, Open Source AI
MATH (benchmark)
Benchmarks, Large Language Models, Machine Learning
Magistral
Large Language Models, Open Source AI
MuSR
AI Benchmarks, Model Evaluation
Natural language inference (NLI)
NLP Benchmarks, Natural Language Processing
OpenAI o-series
Artificial Intelligence, Large Language Models, OpenAI
OpenAI o1
AI Models, Large Language Models, OpenAI
OpenAI o3
AI Models, Large Language Models, OpenAI
ProcessBench
AI Benchmarks, Model Evaluation
ReAct (prompting)
AI Agents, Prompt Engineering
Reflexion
AI Agents, Machine Learning
SimpleBench
2024 in artificial intelligence, AI Benchmarks, Common Sense
Test-time compute
Artificial Intelligence, Large Language Models, Machine Learning
Tree of Thoughts
2023 in AI, LLM techniques, Prompt engineering