{{Needs Expansion}}
[[GPT-1]] is the first [[GPT]] [[model]], released by [[OpenAI]] in June 2018 with the [[paper]] [[Improving Language Understanding by Generative Pre-Training]]. The developers found that combining the [[transformer]] [[architecture]] with [[unsupervised pretraining]] produced strong results on language understanding tasks. According to the developers, GPT-1 was then tailored to specific tasks in order to "strongly understand natural language."


GPT-1 was an important stepping stone toward a [[language model]] with general [[language-based abilities]]. It showed that language models can be efficiently [[pre-trained]], which may help them generalize. The architecture was capable of performing many [[NLP]] tasks with minimal [[fine-tuning]]. GPT-1 was trained on [[BooksCorpus]], which includes some 7,000 books. Masked [[self-attention]] in the transformer's decoder was used to train the model. The architecture followed the decoder of the original transformer and had 117 million [[parameters]]. This model was a precursor to the later, larger models with far more parameters trained on larger [[datasets]]. A notable ability was its performance on [[zero shot|zero-shot]] [[tasks]] in [[natural language processing]] (e.g., [[question-answering]] or [[sentiment analysis]]) thanks to its [[pre-training]]. [[Zero-shot learning]] means that a model can perform a task without seeing any [[examples]]. [[Zero-shot task transfer]] is when the model is given few or no examples and must instead understand the task from the [[instructions]].
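A minimal sketch of the pre-trained model in use, assuming the Hugging Face <code>transformers</code> library and its hosted <code>openai-gpt</code> checkpoint (neither is part of the original 2018 release): it loads the 117-million-parameter model and greedily predicts one next token from a prompt.

<syntaxhighlight lang="python">
# Sketch only: assumes the "transformers" and "torch" packages and the
# community-hosted "openai-gpt" checkpoint, not OpenAI's original code.
import torch
from transformers import OpenAIGPTTokenizer, OpenAIGPTLMHeadModel

tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")

# Roughly 117 million parameters, matching the figure cited above.
print(sum(p.numel() for p in model.parameters()))

# Forward pass over a lowercase prompt (the GPT-1 tokenizer lowercases text).
inputs = tokenizer("the book was about", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits      # shape: (batch, sequence, vocab)

next_id = logits[0, -1].argmax().item()  # greedy choice of the next token
print(tokenizer.decode([next_id]))
</syntaxhighlight>

Zero-shot use amounts to prompting this same pre-trained model directly, without any task-specific fine-tuning step.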


[[Category:Terms]] [[Category:Artificial intelligence terms]] [[Category:Models]] [[Category:GPT]] [[Category:OpenAI]]