Category
Alignment
5 articles
Constitutional AI
AI Safety, Anthropic
DPO
Preference Learning
Direct Preference Optimization (DPO)
Deep Learning, Machine Learning, Natural Language Processing
InstructGPT
Language Models, OpenAI, RLHF
Reinforcement Learning from Human Feedback (RLHF)
Deep Learning, Machine Learning, Natural Language Processing