GPT-2

GPT-2 is the second GPT model, released by OpenAI in February 2019. Although it is larger than its predecessor, GPT-1, the architecture is very similar. The main difference is that GPT-2 can multitask: it performs well on multiple tasks without being fine-tuned on task-specific examples. GPT-2 demonstrated that a language model trained on a larger dataset, with more parameters, comprehends natural language better and handles a wider range of tasks, and that performance improves consistently with model size. GPT-2 exceeded the state of the art on many tasks in a zero-shot setting.

To build the training data, the authors scraped the article text from the outbound links of Reddit posts that had received at least 3 karma. This produced a large, high-quality dataset called WebText, containing about 40 GB of text, far more than the BookCorpus data used to train GPT-1.

GPT-2 was trained on WebText and has 1.5 billion parameters, more than ten times as many as GPT-1's 117 million. It was evaluated on many downstream tasks, including summarisation, translation, question answering, and reading comprehension.
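As a rough illustration of the zero-shot behaviour described above, the sketch below prompts a GPT-2 checkpoint to summarise a passage by appending "TL;DR:" to the input, the prompt format the GPT-2 paper used to induce summarisation without any fine-tuning. It is a minimal example assuming the Hugging Face transformers library (not part of OpenAI's original release); the sample article is invented and the output will vary between runs.

```python
# Minimal zero-shot summarisation sketch with a GPT-2 checkpoint.
# Assumes the Hugging Face "transformers" library is installed.
from transformers import pipeline

# "gpt2" is the small 124M-parameter checkpoint; "gpt2-xl" corresponds
# to the 1.5-billion-parameter model discussed in the article.
generator = pipeline("text-generation", model="gpt2")

# Hypothetical input passage used only for illustration.
article = (
    "The James Webb Space Telescope captured new images of a distant galaxy, "
    "revealing star-forming regions that were previously invisible to astronomers."
)

# Appending "TL;DR:" and letting the model continue is the zero-shot
# summarisation prompt format described in the GPT-2 paper.
prompt = article + "\nTL;DR:"

output = generator(prompt, max_new_tokens=40, do_sample=True, top_k=50)
print(output[0]["generated_text"])
```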
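The WebText collection process described above can be sketched in a few lines: keep outbound links from Reddit posts with at least 3 karma and extract the linked pages' text. The snippet below is a simplified illustration under those assumptions; the URLs and karma values are hypothetical, and the actual pipeline used Reddit data dumps and dedicated content extractors rather than this code.

```python
# Simplified sketch of the WebText idea: filter Reddit outbound links by
# karma, fetch each page, and keep its visible text.
import requests
from bs4 import BeautifulSoup

# Hypothetical (karma, url) pairs standing in for scraped Reddit submissions.
submissions = [
    (12, "https://example.com/article-about-space"),
    (1,  "https://example.com/low-karma-link"),   # filtered out below
    (45, "https://example.com/long-form-essay"),
]

def extract_text(url):
    """Fetch a page and return its visible text, or None on failure."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.RequestException:
        return None
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text(separator=" ", strip=True)

corpus = []
for karma, url in submissions:
    if karma >= 3:  # karma threshold used as a simple quality filter
        text = extract_text(url)
        if text:
            corpus.append(text)

print(f"Collected {len(corpus)} documents")
```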