Category
AI Alignment
7 articles
Apollo Research
AI Companies, AI Research, AI Safety
Constitutional Classifiers
AI Safety, AI Techniques, Anthropic
Frontier Model Forum
AI Industry, AI Safety, AI Standards
KTO
AI Techniques, AI Training, Preference Optimization
MACHIAVELLI (benchmark)
AI Benchmarks, AI Ethics, AI Safety
Redwood Research
AI Research, AI Safety
Reward hacking
AI Safety, Machine Learning, Reinforcement Learning