GPT

From AI Wiki

Revision as of 09:39, 3 March 2023

Main GPT Models

Model      | Release Date      | Parameters                          | Context Window               | Training Data
           |                   |                                     | Tokens | Words  | Equivalent | Amount  | Dataset
GPT-1      | June 11, 2018     | 117 Million                         | 512    | 358    |            | ~4.5 GB | BookCorpus
GPT-2      | February 14, 2019 | 1.5 Billion                         | 1,024  | 716    |            | 40 GB   | WebText
GPT-3      | June 11, 2020     | 175 Billion                         | 2,048  | 1,433  |            |         | Common Crawl, WebText2, Books1, Books2, Wikipedia
GPT-3.5    | January 2022      | 1.3 Billion, 6 Billion, 175 Billion | 4,000  | 2,800  |            |         |
ChatGPT    | November 30, 2022 | xxx                                 | 4,096  | 2,867  |            |         |
GPT-4 (v1) | ????              | xxx                                 | 8,000  | 5,600  |            |         |
GPT-4 (v2) | ????              | xxx                                 | 32,000 | 22,400 |            |         |
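
The Words figures in the table appear to follow from the Tokens figures at roughly 0.7 words per token, rounded down. That ratio is inferred from the table's own numbers and is not stated by the source; real tokenizers vary with the text and the language. A minimal sketch of the arithmetic, using a hypothetical helper name:

```python
# Assumption: the "Words" column equals tokens * 0.7, rounded down.
# The 7/10 ratio is inferred from the table, not given by the source.
WORDS_PER_10_TOKENS = 7


def approx_words(tokens: int) -> int:
    """Estimate how many English words fit in a context window of `tokens`."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return tokens * WORDS_PER_10_TOKENS // 10


if __name__ == "__main__":
    for name, tokens in [("GPT-1", 512), ("GPT-2", 1024), ("GPT-3", 2048),
                         ("ChatGPT", 4096), ("GPT-4 (v2)", 32000)]:
        print(f"{name}: {tokens} tokens ~ {approx_words(tokens)} words")
```

The estimate is only a rule of thumb: code, rare words, and non-English text usually need more tokens per word.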

What is GPT?

GPT, which stands for Generative Pre-trained Transformer, is a type of language model developed by OpenAI. It is based on the Transformer architecture and trained with unsupervised learning, and it can generate text that is often hard to distinguish from text written by humans.

How does GPT work?

Using unsupervised learning, GPT generates text by predicting the next word in a sentence from the words that came before it. The model is trained on a large corpus of text, such as web pages and books, to learn the patterns and relationships between words and how they are used in context.
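
The idea of predicting the next word from context can be illustrated with a toy model. The sketch below is not how GPT is implemented: it simply counts which word most often follows each word (a bigram model), whereas GPT uses a neural network conditioned on the whole preceding context. The function names and the sample corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text: str) -> dict:
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follow_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts


def predict_next(model: dict, word: str):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(model: dict, start: str, max_words: int = 5) -> str:
    """Greedily extend `start` one predicted word at a time."""
    output = [start.lower()]
    for _ in range(max_words):
        next_word = predict_next(model, output[-1])
        if next_word is None:
            break
        output.append(next_word)
    return " ".join(output)


if __name__ == "__main__":
    # Tiny illustrative corpus; a real model trains on vastly more text.
    corpus = "the cat sat on the mat . the cat ate the fish ."
    model = train_bigram_model(corpus)
    print(predict_next(model, "the"))
    print(generate(model, "sat", max_words=2))
```

Even this crude counter shows the training loop in miniature: read text, record what tends to come next, then reuse those statistics to continue a prompt.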

The Transformer architecture is used in GPT because it is well-suited for processing sequential data, such as text. The Transformer is built around self-attention mechanisms, which let the model weigh the words in a sentence against one another and learn the relationships between them.
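
Self-attention can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: the queries, keys, and values are the input vectors themselves, whereas a real Transformer learns separate projection matrices, uses multiple attention heads, and (in GPT) masks out later words. The word vectors are made-up values.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections.

    Assumption: queries, keys, and values are the input vectors themselves.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score the query word against every word in the sentence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # Each output is the attention-weighted average of all vectors.
        mixed = [sum(w * value[i] for w, value in zip(weights, vectors))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


if __name__ == "__main__":
    # Three made-up 2-dimensional word vectors (illustrative values only).
    sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    outputs, weights = self_attention(sentence)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each row of weights shows how strongly one word attends to every word in the sentence; words with similar vectors receive larger weights.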

Once the model is trained, it can be used for a variety of natural language processing (NLP) tasks, such as text generation, translation, and summarization. The model can also be fine-tuned for specific tasks by training it on smaller, task-specific datasets.

Applications of GPT

GPT has a wide range of applications in the field of NLP, including text generation, translation, and summarization.

Explain Like I'm 5 (ELI5)

GPT is a computer program that can write sentences like a person. It learned how to do this by reading lots of text on the internet. Now it can write new sentences and answer questions by using what it learned. It's like a smart robot that can talk and write in a way that sounds like a person!