Improving Language Understanding by Generative Pre-Training (GPT)


Introduction

In June 2018, OpenAI introduced GPT-1, a language model that combined unsupervised pre-training with the transformer architecture and achieved significant progress in natural language understanding. The model was first pre-trained on a large unlabeled corpus and then fine-tuned for specific tasks; the team found that pre-training allowed it to perform well across a range of NLP tasks with relatively little task-specific fine-tuning. GPT-1 was trained on the BooksCorpus dataset and used masked self-attention in a transformer decoder with 117 million parameters, paving the way for later models that scaled up both parameter counts and training data. One noteworthy feature of GPT-1 was its ability to handle some natural language processing tasks, such as question answering and sentiment analysis, in a zero-shot setting: thanks to pre-training, the model could attempt a task from the instruction alone, without having been trained on labeled examples of that task. GPT-1 was a crucial building block in the development of language models with general language-based capabilities.
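The pre-training stage described above is ordinary next-token language modeling with a decoder-only transformer. The sketch below is a minimal, illustrative reconstruction of that objective in PyTorch, not OpenAI's original code; the toy layer sizes, the TinyDecoderLM class, and the random token batch are stand-ins for the 117-million-parameter model and the BooksCorpus data.

```python
# Minimal sketch of GPT-1-style unsupervised pre-training: a decoder-only
# transformer trained to predict the next token with masked self-attention.
# All sizes here are toy values, not the original configuration.
import torch
import torch.nn as nn

vocab_size, d_model, n_heads, n_layers, context = 1000, 128, 4, 2, 32

class TinyDecoderLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)   # token embeddings
        self.pos = nn.Embedding(context, d_model)      # learned position embeddings
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)     # projects back to vocabulary

    def forward(self, x):
        pos = torch.arange(x.size(1), device=x.device)
        h = self.tok(x) + self.pos(pos)
        # Causal mask: each position may only attend to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        h = self.blocks(h, mask=mask)
        return self.head(h)

model = TinyDecoderLM()
tokens = torch.randint(0, vocab_size, (8, context))   # stand-in for a corpus batch
logits = model(tokens[:, :-1])                        # predict token i+1 from tokens <= i
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size),
                                   tokens[:, 1:].reshape(-1))
loss.backward()                                       # one pre-training step's gradients
```

After pre-training on unlabeled text with this objective, the same weights are fine-tuned on each downstream task, which is what lets the model transfer with relatively little labeled data.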