Papers: Difference between revisions
Startledcat (talk | contribs) |
|||
Line 166: | Line 166: | ||
|- | |- | ||
|[[Language Models are Unsupervised Multitask Learners (GPT-2)]] || 2018 || [https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf paper] || [[Natural Language Processing]] || [[OpenAI]] || [[GPT-2]] || | |[[Language Models are Unsupervised Multitask Learners (GPT-2)]] || 2018 || [https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf paper] || [[Natural Language Processing]] || [[OpenAI]] || [[GPT-2]] || | ||
|- | |||
|[[Deep reinforcement learning from human preferences]] || 2017/06/12 || [[arxiv:1706.03741]]<br>[https://openai.com/research/learning-from-human-preferences Blog post]<br>[https://github.com/mrahtz/learning-from-human-preferences GitHub] || || [[OpenAI]] || [[RLHF]] ([[Reinforcement Learning from Human Feedback]]) || | |||
|- | |- | ||
|} | |} |