GPT-1


GPT-1 is the first GPT model, released by OpenAI in June 2018 alongside the paper Improving Language Understanding by Generative Pre-Training. The developers found that combining the transformer architecture with unsupervised pre-training produced strong results. According to the developers, the pre-trained model was then fine-tuned for specific tasks in order to "strongly understand natural language."

GPT-1 was an important stepping stone toward a language model with general language-based abilities. It showed that language models can be pre-trained effectively, which may help them generalize. The architecture was capable of performing many NLP tasks with minimal fine-tuning. GPT-1 was pre-trained on BooksCorpus, which includes some 7,000 books. The model was trained using masked self-attention in the transformer's decoder. The architecture largely followed the decoder of the original transformer, with 12 layers and no encoder, and had 117 million parameters. This model was a precursor to the later, larger models, which have far more parameters and are trained on larger datasets.

A notable ability of GPT-1 was its performance on zero-shot tasks in natural language processing (e.g., question answering or sentiment analysis), which it owed to its pre-training. Zero-shot learning means that a model can perform a task without having seen any examples of that task. In zero-shot task transfer, the model is given no examples (or, in the closely related few-shot setting, only a handful) and must instead work out the task from the instructions it receives.
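The figure of 117 million parameters can be roughly reproduced with some arithmetic. The sketch below is an illustration and not the developers' own code. It assumes the hyperparameters commonly reported for GPT-1, none of which are stated in this article: 12 decoder layers, a hidden size of 768, a feed-forward size of 3,072, a context of 512 positions, and a vocabulary of 40,478 tokens. It also assumes that the output layer reuses the token embedding matrix. The function name `count_parameters` is hypothetical.

```python
def count_parameters(n_layers=12, d_model=768, d_ff=3072,
                     vocab_size=40478, n_positions=512):
    """Estimate the parameter count of a GPT-1-style decoder-only transformer.

    All default values are assumptions taken from commonly reported
    GPT-1 settings, not from this article.
    """
    # Embedding tables: one vector per token and one per position.
    token_embeddings = vocab_size * d_model
    position_embeddings = n_positions * d_model

    # Self-attention: query, key, value and output projections,
    # each a d_model x d_model weight matrix plus a bias vector.
    attention = 4 * (d_model * d_model + d_model)

    # Position-wise feed-forward network: two linear layers with biases.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

    # Two layer norms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)

    per_layer = attention + feed_forward + layer_norms

    # Assumption: the output projection shares weights with the token
    # embeddings, so it adds no parameters of its own.
    return token_embeddings + position_embeddings + n_layers * per_layer


total = count_parameters()
print(f"{total:,} parameters (about {total / 1e6:.0f} million)")
```

Under these assumptions the estimate comes to about 116.5 million, which rounds to the 117 million usually quoted. Most of the total sits in the 12 transformer blocks, with the token embedding table accounting for roughly a quarter.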