
GPT-2

{{Needs Expansion}}
[[GPT-2]] is the second [[GPT]] [[model]] released by [[OpenAI]], in February 2019. Although it is larger than its predecessor, [[GPT-1]], the two are architecturally very similar. The main difference is that GPT-2 can [[multitask]]: it performs well on multiple tasks without being [[trained]] on any task-specific [[examples]]. GPT-2 demonstrated that a [[language model]] trained on a larger [[dataset]] and with more [[parameters]] comprehends [[natural language]] better and performs better across a wider range of tasks; in short, the larger the language model, the better its performance. With 1.5 billion parameters (roughly ten times as many as GPT-1), GPT-2 exceeded the state-of-the-art on many tasks in [[zero shot]] settings. To build the [[training data]], the authors scraped the article text of outbound links from [[Reddit]] posts that had received at least 3 karma. This produced a large, high-quality [[dataset]] called [[WebText]], containing 40 GB of text from over 8 million documents, far more data than GPT-1 was trained on. GPT-2 was evaluated on many downstream tasks, including [[translation]], [[summarisation]], [[question and answering|question answering]], and [[reading comprehension]].
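
The zero-shot behaviour described above can be tried with the publicly released GPT-2 checkpoints. Below is a minimal sketch, assuming the Hugging Face transformers library and its 124M-parameter "gpt2" checkpoint (neither is mentioned in this article); the "TL;DR:" cue used to induce summarisation follows the prompt format described in the GPT-2 paper.

<syntaxhighlight lang="python">
# Minimal sketch of zero-shot task specification with GPT-2.
# Assumes the Hugging Face "transformers" library and the public "gpt2"
# (124M-parameter) checkpoint; these are illustrative choices, not part of
# the original article.
from transformers import pipeline, set_seed

set_seed(42)  # make the sampled continuation reproducible
generator = pipeline("text-generation", model="gpt2")

article = (
    "OpenAI released GPT-2 in February 2019. It was trained on WebText, a 40 GB "
    "corpus scraped from the outbound links of well-received Reddit posts."
)

# No fine-tuning and no task-specific examples: appending "TL;DR:" alone
# cues the model to attempt a summary of the preceding text.
prompt = article + "\nTL;DR:"

output = generator(prompt, max_new_tokens=40, do_sample=True, top_k=50)
print(output[0]["generated_text"])
</syntaxhighlight>

The same pattern (a plain-text cue at the end of the prompt, with no gradient updates) is what the article refers to as performing a task in a zero-shot setting.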