42 articles
2025 Benchmarks, Document Understanding Benchmarks, Knowledge Work Benchmarks
2024 Benchmarks, Mathematical Reasoning Benchmarks, Pages with reference errors
2025 Benchmarks, Mathematical Reasoning Benchmarks, Olympiad Mathematics
Artificial intelligence
2019 Benchmarks, Abstract Reasoning Benchmarks, General Intelligence Benchmarks
2025 in artificial intelligence, Artificial Intelligence, Cognitive Science
2026 Benchmarks, Game-Based AI Evaluation, General Intelligence Benchmarks
2024 Benchmarks, Code Generation Benchmarks, Multi-language Benchmarks
2024 Benchmarks, Agentic AI Benchmarks, Game-Based AI Evaluation
Artificial intelligence
2024 Benchmarks, Information Retrieval Benchmarks, OpenAI
Tool Use Benchmarks, Web Agent Benchmarks
2023 Benchmarks, Compositional Reasoning Benchmarks, Constrained Generation
2024 Benchmarks, Chart Understanding, Multimodal Benchmarks
Large language models
2025 Benchmarks, Creative Writing Benchmarks, Language Model Benchmarks
2025 Benchmarks, Information Retrieval Benchmarks, Multi-step Task Benchmarks
2025 Benchmarks, Academic AI Evaluation, Multilingual Benchmarks
2020 Establishments, Adversarial Evaluation, Dynamic Benchmarks
2025 Benchmarks, Emotional Intelligence Benchmarks, Language Model Benchmarks
2025 Benchmarks, Embodied AI, Google DeepMind
2025 Benchmarks, Game-Based AI Evaluation, Long-term Planning Benchmarks
2025 Benchmarks, Code Optimization Benchmarks, Multi-language Benchmarks
2023 Benchmarks, 2024 Benchmarks, Earth Observation Benchmarks
2025 Benchmarks, Healthcare AI, Medical Benchmarks
2025 Benchmarks, Challenging Benchmarks, Clinical AI Evaluation
2024 Benchmarks, Constraint Satisfaction Benchmarks, Instruction Following Benchmarks
2024 Benchmarks, Creative Writing Benchmarks, Long-form Generation Benchmarks
2021 Benchmarks, Competition Mathematics, Mathematical Reasoning Benchmarks
2021 Benchmarks, Competition Mathematics, Mathematical Reasoning Benchmarks
2022 Establishments, AI Risk Assessment, AI Safety Organizations
2023 Benchmarks, Knowledge Benchmarks, Multilingual Benchmarks
2025 in artificial intelligence, Mathematical Reasoning, Mathematics Competitions
Large language models
2024 Benchmarks, Code Generation Benchmarks, OpenAI
2024 Benchmarks, Code Generation Benchmarks, Domain-Specific Benchmarks
2024 in artificial intelligence, Common Sense, Natural Language Processing
2025 Benchmarks, Agent Evaluation, Conversational AI Benchmarks
2025 Benchmarks, Educational AI, EvolvingLMMs-Lab
2025 Benchmarks, Agent Benchmarks, Benchmarks
2024 Benchmarks, Coding Benchmarks, Community-Driven Benchmarks
2024 Benchmarks, Code Generation Benchmarks, Machine Learning Benchmarks