GPT-1

GPT-1 was an important stepping stone toward a [[language model]] with general [[language-based abilities]]. It showed that language models can be efficiently [[pre-trained]], which may help them generalize. The architecture was capable of performing many [[NLP]] tasks with minimal [[fine-tuning]]. GPT-1 was trained on [[BooksCorpus]], which includes some 7,000 unpublished books. Masked [[self-attention]] in the transformer's decoder was used to train the model. The architecture was a twelve-layer, decoder-only variant of the original transformer and had 117 million [[parameters]]. This model was a precursor to the later, much larger models with far more parameters, trained on larger [[datasets]]. Its most notable ability was its performance on [[zero shot]] [[tasks]] in [[natural language processing]] (e.g., [[question-answering]] or [[sentiment analysis]]), thanks to its [[pre-training]]. [[Zero-shot learning]] means that a model can perform a task without having seen any [[examples]] of it. In [[zero-shot task transfer]], the model is given few or no examples and must instead learn the task from the [[instructions]].
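The masked [[self-attention]] that drives such a decoder can be sketched in a few lines of [[PyTorch]]. This is a minimal single-head illustration, not GPT-1's actual implementation: only the 768-dimensional hidden size is taken from the original paper, while the random weights and the toy ten-token sequence are placeholders.

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head masked (causal) self-attention over a sequence x."""
    # x: (seq_len, d_model); project into queries, keys, and values
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.T) / (k.shape[-1] ** 0.5)  # scaled dot-product
    # Causal mask: each position may attend only to itself and earlier
    # positions, which is what lets the decoder learn next-token prediction.
    mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

d_model = 768                 # GPT-1's hidden size
x = torch.randn(10, d_model)  # a toy 10-token sequence (placeholder data)
w_q, w_k, w_v = (torch.randn(d_model, d_model) / d_model ** 0.5 for _ in range(3))
print(causal_self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([10, 768])
</syntaxhighlight>

GPT-1 itself stacks twelve such layers, each with twelve attention heads, together with feed-forward blocks and learned position embeddings.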


[[Category:Terms]] [[Category:Artificial intelligence terms]] [[Category:Models]] [[Category:GPT]] [[Category:OpenAI]]