GPT-1: Difference between revisions

From AI Wiki



Revision as of 17:56, 2 March 2023

OpenAI released GPT-1 in June 2018. The developers found that combining the transformer architecture with unsupervised pre-training produced strong results. According to the developers, GPT-1 was fine-tuned for specific tasks in order to "strongly understand natural language."

GPT-1 was an important stepping stone toward a language model with general language-based abilities. It showed that language models can be pre-trained efficiently, which may help them generalize, and that the architecture could perform many NLP tasks with minimal fine-tuning. GPT-1 was trained on BooksCorpus, which includes some 7,000 books. The model relied on masked self-attention in the transformer's decoder. Its architecture largely followed the decoder of the original transformer and had 117 million parameters. This model was a precursor to later, larger models with far more parameters trained on larger datasets.

A notable ability was its performance on zero-shot tasks in natural language processing (e.g., question answering or sentiment analysis), thanks to its pre-training. Zero-shot learning means that a model performs a task without having seen any examples of it. In zero-shot task transfer, the model is given few or no examples and must instead infer the task from the instructions.
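The self-attention step mentioned above can be illustrated with a small sketch. This is not GPT-1's actual code: it is a minimal single-head version of masked (causal) self-attention as used in decoder-style transformers, written in NumPy. The function name `causal_self_attention`, the toy sizes, and the random weights are all assumptions chosen for illustration.

```python
import numpy as np


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head masked self-attention over a sequence x of shape (T, d).

    Illustrative sketch only; names and shapes are hypothetical.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Scaled dot-product scores between every pair of positions.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Causal mask: position i may attend only to positions <= i.
    T = x.shape[0]
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Row-wise softmax (subtract the row max for numerical stability).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy example: 4 tokens with 8-dimensional embeddings (arbitrary sizes).
rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.normal(size=(T, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)
```

Because of the mask, each row of `weights` is zero to the right of the diagonal, so a token's output depends only on itself and earlier tokens. This property is what lets a decoder be trained to predict the next token from the preceding text.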