14 articles
AI Benchmarks, Document Understanding Benchmarks, Knowledge Work Benchmarks
AI Benchmarks, Mathematical Reasoning Benchmarks, Olympiad Mathematics
AI Benchmarks, Creative Writing Benchmarks, Language Model Benchmarks
AI Benchmarks, Information Retrieval Benchmarks, Multi-step Task Benchmarks
AI Benchmarks, Academic AI Evaluation, Multilingual Benchmarks
AI Benchmarks, Emotional Intelligence Benchmarks, Language Model Benchmarks
AI Benchmarks, Embodied AI, Google DeepMind
AI Benchmarks, Game-Based AI Evaluation, Long-term Planning Benchmarks
AI Benchmarks, Code Optimization Benchmarks, Multi-language Benchmarks
AI Benchmarks, Healthcare AI, Medical Benchmarks
AI Benchmarks, Challenging Benchmarks, Clinical AI Evaluation
AI Benchmarks, Agent Evaluation, Conversational AI Benchmarks
AI Benchmarks, Educational AI, EvolvingLMMs-Lab
AI Benchmarks, Agent Benchmarks, Benchmarks