GPT-1: Difference between revisions

GPT-1 (view source)

Revision as of 17:57, 2 March 2023

20 bytes added , 2 March 2023

no edit summary

Elegant angel

Line 4:		Line 4:
GPT-1 was an important stepping stone toward a [[language model]] that possesses general [[language-based abilities]]. It showed that language models can be efficiently [[pre-trained]], which may help them generalize. The architecture was capable of performing many [[NLP]] tasks with minimal [[fine-tuning]]. GPT-1 was pre-trained on [[BooksCorpus]], which includes some 7,000 books. The model relied on masked [[self-attention]] in the transformer's decoder. The architecture closely followed the decoder of the original transformer and had 117 million [[parameters]]. This model was a precursor to later, larger models (with far more parameters and trained on larger [[datasets]]). A notable ability was its performance on [[zero shot]] [[tasks]] in [[natural language processing]] (e.g., [[question-answering]] or [[sentiment analysis]]), thanks to its [[pre-training]].
[[Zero-shot learning]] means that a model can perform a task without seeing any [[examples]] of it. [[Zero-shot task transfer]] is when the model is given few or no examples; instead, it must infer the task from the [[instructions]].
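As a rough sanity check on the 117 million parameter figure, the count can be approximated from the shape of a decoder-only transformer. The sketch below is illustrative only: the hyperparameters (12 layers, 768-dimensional states, a 3,072-wide feed-forward layer, a 512-token context, and a vocabulary of 40,478 entries) are commonly reported values for GPT-1 and are assumptions here, not taken from the text above. The function name `count_parameters` is hypothetical, and the output layer is assumed to share weights with the token embedding.

```python
# Rough parameter count for a GPT-1-sized decoder-only transformer.
# All default hyperparameters are assumptions (commonly reported values),
# and the output layer is assumed to reuse the token embedding matrix.
def count_parameters(vocab_size=40478, context_len=512,
                     d_model=768, n_layers=12, d_ff=3072):
    token_emb = vocab_size * d_model           # token embedding matrix
    pos_emb = context_len * d_model            # learned position embeddings
    # Query, key, value and output projections, each with a bias vector.
    attn = 4 * (d_model * d_model + d_model)
    # Two-layer feed-forward block with biases.
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two layer norms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    per_layer = attn + mlp + layer_norms
    return token_emb + pos_emb + n_layers * per_layer


total = count_parameters()
print(f"{total:,} parameters (~{total / 1e6:.0f} million)")
```

Under these assumptions the total comes to about 116.5 million, which rounds to the 117 million usually quoted; roughly a quarter of that sits in the token embedding rather than in the transformer blocks.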

	[[Category:Terms]] [[Category:Artificial intelligence terms]] [[Category:GPT]] [[Category:OpenAI]]		[[Category:Terms]] [[Category:Artificial intelligence terms]] [[Category:Models]] [[Category:GPT]] [[Category:OpenAI]]