Category
Evaluation
7 articles
Agent evaluation
AI Agents, Benchmarks
Inter-rater agreement
Annotation, Statistics
PR AUC
Classification, Machine Learning
Precision-Recall Curve
Classification, Machine Learning
ROC (Receiver Operating Characteristic) Curve
Classification, Machine Learning
Root Mean Squared Error (RMSE)
Machine Learning, Regression, Statistics
WebArena
AI Agents, Benchmarks