GPT-1
Latest revision as of 18:55, 2 March 2023
GPT-1 was the first GPT model, released by OpenAI in June 2018 alongside the paper Improving Language Understanding by Generative Pre-Training. The developers found that combining the transformer architecture with unsupervised pre-training produced strong results, improving on the state of the art in 9 of the 12 tasks they studied. According to the developers, GPT-1 was tailored to specific tasks through fine-tuning in order to "strongly understand natural language."
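The two training stages (unsupervised pre-training, then supervised fine-tuning on a specific task) can be summarized as the objectives below. This is a sketch following the formulation in the GPT-1 paper: $\mathcal{U}$ is an unlabeled token corpus, $k$ the context window, $\Theta$ the model parameters, $\mathcal{C}$ a labeled dataset of token sequences $x^1, \ldots, x^m$ with labels $y$, and $\lambda$ a weighting hyperparameter.

```latex
% Stage 1, unsupervised pre-training: maximize the likelihood of each token
% given the k tokens that precede it.
L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta)

% Stage 2, supervised fine-tuning: maximize the likelihood of the correct label.
L_2(\mathcal{C}) = \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m)

% Fine-tuning objective with language modeling kept as an auxiliary term.
L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \, L_1(\mathcal{C})
```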
GPT-1 was an important stepping stone toward a language model with general language abilities. It showed that language models can be effectively pre-trained on unlabeled text, which may help them generalize, and the architecture could perform many NLP tasks with minimal fine-tuning. GPT-1 was pre-trained on BooksCorpus, a dataset of some 7,000 unpublished books. Rather than the original encoder-decoder transformer, it used a 12-layer, decoder-only transformer with masked self-attention, so each token could attend only to the tokens before it; the model had 117 million parameters. It was a precursor to later GPT models, which had far more parameters and were trained on larger datasets. Its notable ability was its performance on zero-shot tasks in natural language processing (e.g., question answering or sentiment analysis), thanks to its pre-training.
Zero-shot learning means that a model can perform a task without having seen any examples of it. In zero-shot task transfer, the model is given no examples and must work out the task from the instructions alone; when it is given a small number of examples, this is instead called few-shot learning.
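GPT-1's transformer uses masked (causal) self-attention: each position attends only to itself and earlier positions, which is what allows it to be trained to predict the next token. The sketch below shows one attention head in plain Python. It is an illustrative simplification (no learned query/key/value projections, a single head, toy inputs), not OpenAI's implementation.

```python
import math

def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are lists of vectors, one per token position. Position i may only
    attend to positions 0..i; weights for later positions are fixed at zero.
    Returns (outputs, attention_weights)."""
    n, d = len(q), len(q[0])
    outputs, weights = [], []
    for i in range(n):
        # Scaled dot-product scores against the current and earlier positions only.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps] + [0.0] * (n - i - 1)  # masked (future) positions
        weights.append(row)
        # Each output is the attention-weighted sum of the value vectors.
        outputs.append([sum(row[j] * v[j][c] for j in range(n))
                        for c in range(len(v[0]))])
    return outputs, weights

if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
    out, w = causal_self_attention(x, x, x)
    for row in w:
        print([round(p, 3) for p in row])
```

The printed weight matrix is lower-triangular: the first token can only see itself, and no row places weight on a later position.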
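The 117 million figure can be roughly reproduced from the model's hyperparameters. The values below (a 40,478-token BPE vocabulary, 512 positions, 768-dimensional hidden states, 12 layers, 3,072-unit feed-forward layers) are assumptions taken from the commonly used Hugging Face `openai-gpt` configuration rather than from this article, and the count assumes the output layer shares its weights with the token embeddings.

```python
# Assumed GPT-1 hyperparameters (Hugging Face `openai-gpt` config defaults).
VOCAB_SIZE = 40478   # BPE vocabulary size
CONTEXT = 512        # maximum sequence length (learned position embeddings)
D_MODEL = 768        # hidden size
N_LAYERS = 12        # decoder blocks
D_FF = 4 * D_MODEL   # feed-forward inner size (3072)

def linear(n_in, n_out):
    """Weight matrix plus bias vector of a dense layer."""
    return n_in * n_out + n_out

def layer_norm(d):
    """Gain and bias vectors of a layer norm."""
    return 2 * d

def block_params(d, d_ff):
    """Parameters in one decoder block."""
    attn = linear(d, 3 * d) + linear(d, d)   # fused Q/K/V projection + output projection
    mlp = linear(d, d_ff) + linear(d_ff, d)  # two-layer position-wise feed-forward net
    norms = 2 * layer_norm(d)                # one layer norm after attention, one after the MLP
    return attn + mlp + norms

def gpt1_param_count():
    """Total parameters: embeddings plus all decoder blocks."""
    embeddings = VOCAB_SIZE * D_MODEL + CONTEXT * D_MODEL  # token + position embeddings
    # Assumption: the output softmax reuses the token-embedding matrix (weight tying).
    return embeddings + N_LAYERS * block_params(D_MODEL, D_FF)

if __name__ == "__main__":
    total = gpt1_param_count()
    print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Run as a script, it prints `116,534,784 parameters (~116.5M)`, which is consistent with the roughly 117 million usually quoted. About a quarter of the total sits in the token-embedding matrix alone.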
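Zero-shot use of a pre-trained language model can be illustrated with sentiment analysis: with no labeled examples and no fine-tuning, append a cue word to the input and check whether the model finds "positive" or "negative" the more likely next word. The sketch below is loosely modeled on the sentiment heuristic described in the GPT-1 paper. `next_word_logprob` is a hypothetical interface standing in for a real model, and `toy_logprob` is a made-up keyword scorer used only so the example runs.

```python
import math
from typing import Callable

# Hypothetical interface: returns log P(word | context) under some language model.
LogProbFn = Callable[[str, str], float]

def zero_shot_sentiment(review: str, next_word_logprob: LogProbFn) -> str:
    """Classify sentiment without labeled examples or fine-tuning by comparing
    how likely "positive" and "negative" are as the next word."""
    context = review + " very"  # cue word nudging the model toward a sentiment word
    pos = next_word_logprob(context, "positive")
    neg = next_word_logprob(context, "negative")
    return "positive" if pos >= neg else "negative"

def toy_logprob(context: str, word: str) -> float:
    """Toy stand-in for a language model (hypothetical, for demonstration only):
    scores words by counting a few cue words in the context."""
    text = context.lower()
    good = sum(text.count(w) for w in ("great", "loved", "excellent"))
    bad = sum(text.count(w) for w in ("awful", "boring", "terrible"))
    score = good - bad if word == "positive" else bad - good
    return math.log(1 / (1 + math.exp(-score)))  # log-sigmoid keeps it a valid log-probability

if __name__ == "__main__":
    print(zero_shot_sentiment("I loved this film, it was great.", toy_logprob))
    print(zero_shot_sentiment("An awful, boring movie.", toy_logprob))
```

Swapping `toy_logprob` for a real model's next-token log-probabilities is what would make this a genuine zero-shot classifier; the surrounding logic stays the same.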