GPT-2

GPT-2 is the second GPT model from OpenAI, released in February 2019 alongside the paper Language Models are Unsupervised Multitask Learners. It is very similar to its predecessor, GPT-1, but much larger. The main difference is that GPT-2 can multitask: it performs well on multiple tasks in a zero-shot setting, that is, without being trained on any task-specific examples. GPT-2 demonstrated that a language model comprehends natural language better and performs better on more tasks when it is trained on a larger dataset and has more parameters. With about 1.5 billion parameters (more than 10 times as many as GPT-1), GPT-2 matched or exceeded the state of the art in zero-shot settings on 7 of the 8 language modelling datasets it was tested on.

To get the training data, the authors scraped article text from the outbound links of Reddit posts that had received at least 3 karma (net upvotes). This produced a large, high-quality dataset called WebText, containing 40GB of text from over 8 million documents, far more data than GPT-1 was trained on.

GPT-2 was tested on many downstream tasks, including translation, summarisation, question answering, and reading comprehension.
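The link-selection step behind WebText can be illustrated with a minimal sketch. It assumes the posts are already available as simple records; the `RedditPost` type, the `select_webtext_style_urls` helper, and the example URLs are hypothetical names made up for illustration, not part of any OpenAI or Reddit API. The default threshold of 3 karma follows the figure reported in the GPT-2 paper; the actual pipeline also fetched and cleaned the linked pages, which is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedditPost:
    """Hypothetical record for one Reddit submission (not a real API type)."""
    outbound_url: str
    karma: int


def select_webtext_style_urls(posts, min_karma=3):
    """Return de-duplicated outbound URLs from posts with enough karma.

    Assumption: karma is used as a cheap proxy for human-judged link quality.
    """
    seen = set()
    selected = []
    for post in posts:
        if post.karma < min_karma:
            continue  # too few upvotes, skip the link
        if post.outbound_url in seen:
            continue  # same article linked by another post
        seen.add(post.outbound_url)
        selected.append(post.outbound_url)
    return selected


# Made-up example data
posts = [
    RedditPost("https://example.com/a", 12),
    RedditPost("https://example.com/b", 1),
    RedditPost("https://example.com/a", 5),
    RedditPost("https://example.com/c", 3),
]
urls = select_webtext_style_urls(posts)
print(urls)  # ['https://example.com/a', 'https://example.com/c']
```

Filtering on karma lets human voting stand in for manual quality review, which is what made it practical to collect millions of documents.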