GPT

Updated Jun 20, 2026

What is GPT

GPT, short for Generative Pre-trained Transformer, is a family of large language models developed by OpenAI and built on the Transformer architecture introduced by Vaswani et al. in 2017. The first GPT was described in a 2018 paper by Alec Radford and colleagues titled "Improving Language Understanding by Generative Pre-Training," which showed that a single decoder-only Transformer pre-trained on a large unlabeled corpus could be fine-tuned to beat task-specific models on a wide range of benchmarks. That paper stated its core finding plainly: "We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task."^[1] Each subsequent generation (GPT-2 in 2019, GPT-3 in 2020, GPT-3.5 and ChatGPT in 2022, GPT-4 in 2023, GPT-4o in 2024, GPT-4.1 in April 2025, and GPT-5 in August 2025) has scaled the same basic recipe of next-token prediction on internet-scale text, with refinements to data, alignment, and compute. The parameter count grew roughly 1,500-fold in the first two years alone, from 117 million in GPT-1 (2018) to 175 billion in GPT-3 (2020).^[1]^[2]^[3]

The acronym has spread well beyond OpenAI. Researchers and companies apply the term "GPT", often as a name suffix, to decoder-only Transformers trained with the generative pre-training objective. Models discussed under this label include BloombergGPT, Baidu's ERNIE family and its Wenxin Yiyan chatbot, EleutherAI's GPT-J and GPT-NeoX, and many domain-specific models. The U.S. Patent and Trademark Office twice rejected OpenAI's attempt to register "GPT" as a trademark, ruling in February 2024 that the term was "merely descriptive" of a functional class of models rather than a unique brand.^[4]^[5]

GPT is the system that triggered the modern wave of generative AI. ChatGPT, the dialogue interface OpenAI launched on November 30, 2022, reached 1 million users in five days and roughly 100 million monthly users within two months, the fastest consumer software adoption recorded at the time, and it pushed every major technology company into shipping its own competing assistant.^[6]

Lineage of OpenAI GPT models

The table below summarizes the headline OpenAI GPT releases. Parameter counts for GPT-4 and later are not officially disclosed; figures shown are widely reported industry estimates and are noted as such.

Model	Release date	Parameters	Context window	Notes
GPT-1	June 11, 2018	117 million	512 tokens	Introduced in Improving Language Understanding by Generative Pre-Training; trained on the BookCorpus dataset.
GPT-2	February 14, 2019 (small); November 5, 2019 (full)	Up to 1.5 billion	1,024 tokens	Initially withheld due to misuse concerns; full 1.5B weights released in stages. Trained on WebText (40 GB).
GPT-3	June 11, 2020 (paper May 28, 2020)	175 billion	2,048 tokens	Paper "Language Models are Few-Shot Learners" demonstrated in-context learning; trained on roughly 300 billion tokens. NeurIPS 2020 Best Paper.
InstructGPT	January 27, 2022	1.3B to 175B (multiple sizes)	2,048 tokens	First public OpenAI model trained with reinforcement learning from human feedback (RLHF).
GPT-3.5	March 15, 2022 (text-davinci-002)	175 billion (estimated)	4,096 tokens	Improved instruction-following; basis for the original ChatGPT launch.
ChatGPT	November 30, 2022	175 billion (GPT-3.5)	4,096 tokens	Free conversational web app; reached ~100 million MAUs in two months.
GPT-4	March 14, 2023	Not disclosed; rumored ~1.8T total parameters in a mixture-of-experts configuration	8K and 32K variants	First GPT to accept image inputs alongside text. Passed a simulated bar exam in roughly the top 10%.
GPT-4 Turbo	November 6, 2023	Not disclosed	128,000 tokens	Cheaper, faster GPT-4 with vision; knowledge cutoff extended to April 2023.
GPT-4o	May 13, 2024	Not disclosed	128,000 tokens	"Omni" multimodal model: real-time text, image, and audio in a single neural network.
GPT-4.1	April 14, 2025	Not disclosed	1,000,000 tokens	API-only release focused on coding, instruction following, and long-context reasoning.
GPT-5	August 7, 2025	Not disclosed	256,000 tokens (input), 128,000 output	Unified system with a router that switches between fast and "thinking" reasoning models; mini, nano, and pro variants.

Sources for this table: OpenAI announcements, the original GPT papers, and Wikipedia articles on each model.^[1]^[2]^[3]^[7]^[8]^[9]^[10]^[11]^[12]

Architecture

Every GPT model from GPT-1 onward is a decoder-only Transformer, meaning it consists of a stack of identical Transformer blocks that use masked self-attention so each token can only attend to tokens earlier in the sequence. There is no encoder, and there is no bidirectional context the way BERT has. The model takes a sequence of tokens, runs them through token embeddings and positional embeddings, then through the Transformer stack, and finally through a linear layer plus softmax that produces a probability distribution over the vocabulary for the next token.^[13]
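
The masking rule described above can be sketched in a few lines of plain Python. This is a minimal illustration of single-head causal attention weights, not OpenAI's implementation; the function names and the toy score matrix are assumptions made for the example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_weights(scores):
    """Turn raw attention scores into causally masked attention weights.

    scores[i][j] is the raw score of query position i against key position j.
    Keys at positions j > i lie in the future, so they get weight 0.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        visible = softmax(scores[i][: i + 1])   # only keys 0..i compete
        weights.append(visible + [0.0] * (n - i - 1))
    return weights

# Toy 3-token example with made-up scores.
toy_scores = [
    [0.0, 5.0, 5.0],
    [1.0, 1.0, 5.0],
    [0.0, 0.0, 0.0],
]
weights = causal_attention_weights(toy_scores)
for row in weights:
    print([round(w, 3) for w in row])
```

Even though the first two rows give high raw scores to later positions, those positions receive zero weight, which is what keeps next-token prediction from peeking at the answer.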

The core building block of each layer combines a multi-head causal self-attention sublayer with a position-wise feed-forward network. Residual connections and layer normalization sit around both sublayers. Position information is injected through learned positional embeddings in GPT-1 and GPT-2 and through more elaborate schemes such as rotary position embeddings in many later open-source GPT variants. Text is split into tokens with a byte pair encoding tokenizer; modern OpenAI models use encodings implemented in OpenAI's open-source tiktoken library.
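
To make byte pair encoding concrete, the sketch below learns two merges on a toy corpus. It works on characters rather than bytes and is not the tiktoken implementation; the corpus, the helper names, and the tie-breaking rule are all assumptions chosen for illustration.

```python
from collections import Counter

def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair in the corpus.

    words maps a tuple of symbols to how often that word occurs.
    Ties are broken by the pair itself so the result is deterministic.
    """
    pairs = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return max(pairs, key=lambda p: (pairs[p], p))

def merge_pair(words, pair):
    """Replace every occurrence of pair with a single merged symbol."""
    merged = Counter()
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if symbols[i:i + 2] == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] += freq
    return merged

# Toy corpus: word -> frequency, each word split into characters.
corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
words = Counter({tuple(w): f for w, f in corpus.items()})

merges = []
for _ in range(2):
    pair = most_frequent_pair(words)
    merges.append(pair)
    words = merge_pair(words, pair)

print(merges)
print(dict(words))
```

Each merge adds one entry to the vocabulary, so frequent fragments such as a shared suffix end up as single tokens while rare words stay split into smaller pieces.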

GPT-1 used 12 Transformer blocks and 12 attention heads in each block, with a 768-dimensional hidden state. GPT-2's largest variant scaled this to 48 blocks, 25 heads, and a 1,600-dimensional hidden state. GPT-3 took the same recipe to 96 blocks, 96 heads, and a 12,288-dimensional hidden state. The headline parameter counts (117 million for GPT-1, 1.5 billion for GPT-2, 175 billion for GPT-3) come almost entirely from the matrices inside attention and feed-forward layers; the embedding tables are large but a relatively small share of the total at scale.^[1]^[2]^[3]
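
The headline counts can be roughly reproduced from the layer counts and hidden sizes above. The sketch below uses the common approximation of about 12·d² weights per block (4·d² for attention plus 8·d² for a feed-forward network with a 4·d inner width) and ignores biases and layer norms. The vocabulary sizes (40,478 for GPT-1 and 50,257 for GPT-2 and GPT-3) are not stated in this article and are assumptions brought in for the estimate.

```python
def approx_params(n_layers, d_model, vocab_size, n_positions):
    """Rough parameter estimate for a GPT-style decoder.

    Returns (block_params, embedding_params). Biases and layer-norm
    parameters are ignored, so the totals are slight underestimates.
    """
    blocks = 12 * n_layers * d_model ** 2
    embeddings = (vocab_size + n_positions) * d_model
    return blocks, embeddings

# (layers, hidden size, assumed vocabulary size, context window)
configs = {
    "GPT-1": (12, 768, 40478, 512),
    "GPT-2 (largest)": (48, 1600, 50257, 1024),
    "GPT-3": (96, 12288, 50257, 2048),
}

totals = {}
for name, cfg in configs.items():
    blocks, embeddings = approx_params(*cfg)
    totals[name] = blocks + embeddings
    share = embeddings / totals[name]
    print(f"{name}: ~{totals[name] / 1e9:.2f}B parameters, embeddings {share:.1%}")
```

Under these assumptions the embedding tables account for roughly a quarter of GPT-1 but well under one percent of GPT-3, which is the sense in which they become a small share at scale.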

Decoder-only architectures became the dominant design for generative AI for two practical reasons. They are simple to scale because every layer has the same shape, and every token in a training document contributes a gradient signal because next-token prediction targets every position in the sequence. That makes training data-efficient relative to encoder-decoder setups. The cost is that the model cannot look at future context, so tasks like document classification or extractive question answering, where bidirectional models such as BERT once dominated, are framed in GPT as text-in text-out problems.^[13]^[14]

From GPT-4 onward, OpenAI is widely believed to use mixture-of-experts (MoE) routing inside the feed-forward layers, which lets the model store many parameters but activate only a fraction of them per token. Industry leaks suggest GPT-4 has roughly 1.8 trillion total parameters split across either 8 experts of about 220 billion parameters each or 16 experts of about 111 billion parameters each, and that two experts are routed per forward pass. OpenAI has never confirmed these numbers, and the GPT-4 technical report explicitly omits architecture, hardware, training compute, and dataset details.^[15]^[16]
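
Top-2 routing of the kind described in those leaks can be sketched generically. The code below is a toy illustration of mixture-of-experts routing in general, not a reconstruction of GPT-4: the experts are stand-in functions, and the gate scores are made-up numbers that a learned router would normally compute from each token.

```python
import math

def top_k_route(gate_logits, k=2):
    """Select the k experts with the highest gate scores.

    Returns (expert_index, weight) pairs whose weights sum to 1.
    """
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    top = max(gate_logits[i] for i in chosen)
    exps = [math.exp(gate_logits[i] - top) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

def make_expert(scale):
    """Stand-in for a feed-forward expert: just multiplies by a constant."""
    return lambda x: scale * x

def moe_forward(x, experts, gate_logits, k=2):
    """Run only the routed experts and mix their outputs."""
    routes = top_k_route(gate_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]

experts = [make_expert(i + 1) for i in range(8)]          # 8 toy experts
gate_logits = [0.0, 2.0, -1.0, 2.0, 0.5, 0.0, 0.0, 0.0]   # made-up router scores
output, used = moe_forward(1.0, experts, gate_logits)
print(used, output)
```

Only the two selected experts are evaluated, so the cost per token tracks the active experts rather than the total. If the rumored 16 experts of about 111 billion parameters were accurate, two routed experts would touch about 222 billion expert parameters per token.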

Context window length, the number of tokens the model can attend to at once, has expanded across generations. GPT-1 had a 512-token window, GPT-2 had 1,024, GPT-3 had 2,048, GPT-3.5 had 4,096, GPT-4 launched at 8,192 and 32,768 in two flavors, GPT-4 Turbo extended this to 128,000, GPT-4.1 jumped to 1 million tokens for input, and GPT-5 ships with 256,000 input and 128,000 output by default. Long-context training requires special engineering for attention scaling, including techniques such as FlashAttention, grouped-query attention, and various sparse and linear-attention variants. Most public details on these techniques come from open-source models because OpenAI does not document its production architecture.
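
A quick calculation shows why longer windows need this kind of engineering. The sketch below counts the query-key pairs that standard causal attention scores for a full window, per head and per layer; it is plain arithmetic and says nothing about how any production model actually handles long contexts.

```python
def causal_pairs(n_tokens):
    """Number of (query, key) pairs scored by causal attention.

    Query i attends to keys 0..i, so the total is 1 + 2 + ... + n.
    """
    return n_tokens * (n_tokens + 1) // 2

windows = {
    "GPT-1": 512,
    "GPT-3": 2048,
    "GPT-4 Turbo": 128000,
    "GPT-4.1": 1000000,
}
for name, n in windows.items():
    print(f"{name}: {n:,} tokens -> {causal_pairs(n):,} pairs per head per layer")
```

The pair count grows with the square of the window, so going from 512 tokens to 1 million multiplies it by several million, far more than the roughly 2,000-fold growth in the window itself.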

For multimodal GPTs (GPT-4 with vision, GPT-4o, GPT-5), the model also accepts image and audio inputs through dedicated encoders that project those modalities into the same token embedding space the language model already understands. GPT-4o is described as a single end-to-end neural network that processes audio, vision, and text in one model rather than relying on separate speech-to-text and text-to-speech pipelines, which is why it can respond to spoken input in roughly 320 milliseconds, close to human conversational latency.^[9]

How is GPT trained

GPT models are trained in two or three main stages.

Pre-training

The pre-training stage uses self-supervised next-token prediction, also known as causal language modeling. The model sees a token sequence drawn from a large corpus and learns to predict the next token at every position. The loss is the negative log likelihood of the correct token under the model's predicted distribution, summed over the sequence. There are no human labels at this stage; the labels come from the text itself.^[1]
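
The objective can be written out directly. The sketch below computes the summed negative log likelihood for a three-token sequence; the toy probability tables stand in for a model's softmax outputs and are assumptions for illustration.

```python
import math

def next_token_nll(predicted, targets):
    """Sum of -log p(correct next token) over all positions.

    predicted[t] maps candidate tokens to the probability the model
    assigns them after seeing the first t + 1 tokens; targets[t] is
    the token that actually comes next.
    """
    return sum(-math.log(dist[tok]) for dist, tok in zip(predicted, targets))

tokens = ["the", "cat", "sat"]
# The labels are just the input shifted left by one position.
targets = tokens[1:]
predicted = [
    {"cat": 0.5, "dog": 0.5},     # distribution after "the"
    {"sat": 0.25, "ran": 0.75},   # distribution after "the cat"
]
loss = next_token_nll(predicted, targets)
print(f"loss = {loss:.4f} nats")
```

Because the targets come from shifting the text itself, every position in every document supplies a training signal without any human labeling.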

Pre-training data scaled rapidly across generations. GPT-1 used the 4.5 GB BookCorpus. GPT-2 used 40 GB of WebText scraped from outbound Reddit links with a karma threshold, and the GPT-2 paper reported that the 1.5-billion-parameter model "achieves state of the art results on 7 out of 8 tested language modeling datasets in a zero-shot setting."^[2] GPT-3 used roughly 570 GB of filtered Common Crawl (about 410 billion tokens), plus WebText2, Books1, Books2, and English Wikipedia, and was trained on about 300 billion tokens sampled from that mixture. GPT-4 and later models are trained on undisclosed mixtures that almost certainly include licensed data, code repositories, and synthetic data generated by earlier models.^[2]^[3]

Fine-tuning and alignment

After pre-training, GPT models go through alignment so they follow instructions and avoid clearly unsafe outputs. There are usually two phases:

Supervised fine-tuning (SFT), where human contractors write demonstrations of helpful responses to a wide variety of prompts and the base model is fine-tuned to imitate them.
Reinforcement learning from human feedback (RLHF), introduced for OpenAI's deployed models in the InstructGPT paper by Ouyang et al. (2022). Labelers rank model responses, those rankings train a reward model, and the language model is fine-tuned with Proximal Policy Optimization (PPO) or a similar algorithm to maximize predicted reward.^[17]
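
The reward-model step in RLHF is commonly trained with a pairwise ranking loss, and the sketch below shows that loss on toy numbers. It illustrates only the idea of turning labeler rankings into a training signal; the reward values are invented, and the PPO stage is not shown.

```python
import math
from itertools import combinations

def pairwise_ranking_loss(rewards_best_first):
    """Mean of -log(sigmoid(r_better - r_worse)) over all ranked pairs.

    rewards_best_first holds the reward model's scores for several
    responses to one prompt, ordered from the labeler's favorite to
    least favorite. The loss is small when the scores agree with
    that ordering.
    """
    losses = []
    for better, worse in combinations(rewards_best_first, 2):
        # -log(sigmoid(d)) is the same as log(1 + exp(-d))
        losses.append(math.log1p(math.exp(-(better - worse))))
    return sum(losses) / len(losses)

agrees = pairwise_ranking_loss([2.0, 1.0, 0.0])     # scores match the ranking
disagrees = pairwise_ranking_loss([0.0, 1.0, 2.0])  # scores invert the ranking
print(f"agrees: {agrees:.3f}, disagrees: {disagrees:.3f}")
```

Minimizing this loss pushes the reward model to score preferred responses higher, and the resulting scalar reward is what the language model is then tuned to maximize.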

The InstructGPT paper reported that a 1.3 billion parameter RLHF-tuned model produced outputs that human raters preferred to those of the 175 billion parameter GPT-3, even though the smaller model has roughly 100 times fewer parameters; the gain came from alignment training rather than scale. In the authors' words, "outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters." RLHF is widely credited as the technique that made ChatGPT feel as useful as it does.^[17]

Later GPT models add further alignment stages such as Constitutional-AI-style critique loops, rule-based reward shaping, and tool-use pre-training where the model learns to call code interpreters, web search, and image generators. GPT-5 introduced a router model that decides at inference time whether to send a query to a fast "main" model or a slower "thinking" model that allocates more inference compute to reasoning.^[12]

Scaling laws

The scaling behavior of GPT-style models was formalized in two influential papers. Kaplan et al. (2020) at OpenAI showed that test loss falls as a power law in three quantities: parameter count, dataset size, and training compute. Their advice was to scale parameters faster than data within a fixed compute budget, which justified the parameter-heavy GPT-3 recipe.^[18]
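
A power law in parameter count has the form L(N) = (N_c / N)^α. The sketch below evaluates that form using approximate constants as reported by Kaplan et al. (α ≈ 0.076, N_c ≈ 8.8 × 10^13 non-embedding parameters). These constants are not given in this article, so treat them as illustrative; the formula also assumes data and compute are not the bottleneck.

```python
def kaplan_loss(n_params, n_c=8.8e13, alpha_n=0.076):
    """Power-law loss in parameter count: L(N) = (N_c / N) ** alpha_N."""
    return (n_c / n_params) ** alpha_n

for n in (1.5e9, 175e9):
    print(f"N = {n:.1e}: predicted loss {kaplan_loss(n):.3f}")

# Doubling N multiplies the predicted loss by 2 ** -alpha_N, whatever N is.
ratio = kaplan_loss(2e9) / kaplan_loss(1e9)
print(f"loss ratio after doubling parameters: {ratio:.4f}")
```

The small exponent is the practical point: each doubling of parameters buys only about a five percent reduction in predicted loss, which is why meaningful gains required order-of-magnitude jumps in scale.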

Hoffmann et al. (2022) at DeepMind, in the Chinchilla paper, re-ran the experiments more carefully and concluded that for a given compute budget, parameters and tokens should scale roughly in equal proportion. Many large language models trained before Chinchilla, including GPT-3, were therefore over-parameterized for the amount of data they had seen. Later GPT models are believed to follow a more Chinchilla-style data-to-parameter ratio, although exact numbers are not public.^[19]
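
The over-parameterization claim can be checked with back-of-the-envelope arithmetic. The sketch below relies on three assumptions that are not from this article: training compute is approximated as 6 FLOPs per parameter per token, the Chinchilla result is summarized by the popular rule of thumb of about 20 tokens per parameter, and GPT-3 is taken to have been trained on about 300 billion tokens, the figure given in the GPT-3 paper.

```python
import math

def training_flops(n_params, n_tokens):
    """Common approximation: about 6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

def compute_optimal(flops, tokens_per_param=20):
    """Split a compute budget so that tokens = tokens_per_param * params."""
    n_params = math.sqrt(flops / (6 * tokens_per_param))
    return n_params, tokens_per_param * n_params

gpt3_params, gpt3_tokens = 175e9, 300e9
budget = training_flops(gpt3_params, gpt3_tokens)
opt_params, opt_tokens = compute_optimal(budget)

print(f"GPT-3 budget: {budget:.2e} FLOPs, "
      f"{gpt3_tokens / gpt3_params:.1f} tokens per parameter")
print(f"Rule-of-thumb split: {opt_params / 1e9:.0f}B parameters, "
      f"{opt_tokens / 1e12:.2f}T tokens")
```

Under these assumptions the same compute would have been better spent on a model of roughly 50 billion parameters trained on about a trillion tokens, instead of 175 billion parameters at under two tokens per parameter.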

In-context learning and prompting

One of the surprises of GPT-3 was that the model could perform new tasks at inference time given only a few demonstrations in the prompt, with no parameter updates. The paper called this few-shot learning, and the broader phenomenon is now usually called in-context learning.^[3]
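
A few-shot prompt is simply text that contains worked examples followed by an unfinished one. The helper below is a hypothetical sketch of how such a prompt might be assembled; the function name, the label format, and the example pairs are assumptions, and no model is called.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, solved examples, and one open query."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"English: {source}")
        lines.append(f"French: {target}")
        lines.append("")
    lines.append(f"English: {query}")
    lines.append("French:")   # left open for the model to complete
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "plush giraffe",
)
print(prompt)
```

The model's weights never change; the demonstrations only condition the next-token distribution, so the task is specified entirely through the context.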

In-context learning is what made prompt engineering a distinct skill: changing the wording, ordering, or examples in the prompt can shift accuracy on a benchmark by tens of percentage points. Later work on chain-of-thought prompting (Wei et al. 2022) showed that asking the model to think step by step before answering improved performance on math word problems and logic puzzles, especially in larger models. The capability is sometimes described as emergent because it appears to switch on around a certain scale rather than improving smoothly.

In 2024 and 2025 OpenAI released a separate "reasoning" line, including o1, o3, and the GPT-5 thinking variants, that uses reinforcement learning to teach the model to spend more inference compute on chains of thought before producing a final answer. These models score much higher on math and coding benchmarks than non-reasoning siblings of comparable size.^[20]

ChatGPT and consumer adoption

OpenAI launched ChatGPT as a free "research preview" on November 30, 2022. The interface was a simple chat box wrapped around a fine-tuned GPT-3.5 model. Within five days it reached 1 million users; within two months it crossed 100 million monthly users, surpassing TikTok's nine months and Instagram's two-and-a-half years to that mark.^[6]

The launch is often called the "ChatGPT moment" because it forced the rest of the industry to react. Microsoft, which had invested $1 billion in OpenAI in 2019, expanded that to a multi-year, multibillion-dollar partnership in January 2023 and built ChatGPT-derived features into Bing, Office, GitHub Copilot, and Windows. Google issued a "code red" and accelerated its own Bard (later Gemini) chatbot. Anthropic released Claude in March 2023, and Baidu unveiled Ernie Bot the same month.

ChatGPT itself has gone through many backend models. The default has cycled through GPT-3.5, GPT-4, GPT-4 Turbo, GPT-4o, and GPT-5, with paid users gaining access to newer models first. OpenAI reported 300 million weekly active users in December 2024, 400 million in February 2025, and roughly 500 million by mid-2025, making ChatGPT among the most-used software products in the world.^[39]^[40]

What are Custom GPTs and the GPT Store

On November 6, 2023, at OpenAI's first DevDay, Sam Altman announced "GPTs," customizable versions of ChatGPT that any user could build with natural language instructions, retrieval over uploaded files, and optional connections to external APIs. Builders did not need to write code; they configured a system prompt, tools, and knowledge files through a chat-based builder.^[21]

On January 10, 2024, OpenAI opened the GPT Store, a marketplace where ChatGPT Plus, Team, and Enterprise subscribers could browse and use Custom GPTs published by other users. By that date users had already created more than 3 million Custom GPTs. Featured launches included Khan Academy's tutoring GPT, AllTrails' trail finder, Canva's design assistant, and the academic search GPT Consensus.^[22]

In early 2024, OpenAI announced a revenue program that pays U.S. builders based on user engagement with their GPTs, although the precise rates and eligibility have shifted over time. The Custom GPT format was the first widely deployed example of an agent-like layer on top of a base model: each GPT bundles a persistent persona, retrieval, and tool calls into a single shareable artifact.

Other "GPT" labels

Because OpenAI never secured a trademark on the acronym, dozens of organizations ship models, applications, and even unrelated products under "GPT" branding. The list below is not exhaustive but covers some of the most cited examples.

Model or product	Organization	Year	Notes
GPT-J-6B	EleutherAI	2021	6-billion-parameter open-weight decoder-only Transformer; an early attempt to replicate GPT-3 publicly.
GPT-NeoX-20B	EleutherAI	2022	20-billion-parameter open-weight model.
BloombergGPT	Bloomberg L.P.	March 2023	50-billion-parameter financial LLM trained on 363 billion tokens of Bloomberg's proprietary financial data plus 345 billion general tokens.
ERNIE / Wenxin Yiyan	Baidu	2019 onward	Chinese decoder-only Transformer family; the public chatbot launched March 16, 2023, and was renamed Wenxiaoyan in 2024. Reported 200 million users by April 2024.
BioGPT	Microsoft Research	2022	Domain-specific GPT trained on biomedical literature.
FinGPT	Multiple academic groups	2023	Open-source financial LLM project.
Cerebras-GPT	Cerebras	March 2023	Open-weight family of seven Chinchilla-scaled GPT models.
AssistGPT, AutoGPT, AgentGPT	Various	2023	Agent frameworks built on top of OpenAI APIs; not standalone models.

In its February 2024 ruling against OpenAI's trademark application, the USPTO specifically pointed to this proliferation as evidence that "GPT" had become a generic descriptor.^[4]

How does GPT compare with other LLM families

GPT is one of several major closed and open large language model families that emerged after the original Transformer paper. The table below sketches how the GPT line compares to its main competitors as of April 2026.

Family	Developer	License	Notable strength	Recent flagship
GPT	OpenAI	Closed (API and ChatGPT only)	Broadest ecosystem, strong general reasoning, large third-party tooling, GPT Store.	GPT-5 (August 2025)
Claude	Anthropic	Closed (API and chat)	Long-form writing, careful reasoning, lower hallucination rates on many evaluations.	Claude 4.5 Opus and Sonnet (2025)
Gemini	Google DeepMind	Closed (API, Gemini app, Workspace)	Native multimodality, very long context windows, integration with Google Search.	Gemini 2.5 Pro (2025)
LLaMA	Meta AI	Open weights with a community license	Strong base models that anyone can self-host or fine-tune.	LLaMA 4 (2025)
DeepSeek	DeepSeek (China)	Open weights	Cost-efficient training; competitive reasoning at a fraction of the inference cost.	DeepSeek-V3 and R1 (2024 to 2025)
Mistral	Mistral AI	Open weights and proprietary	Compact European-built models, strong multilingual coverage.	Mistral Large 2 (2025)

All of these systems use decoder-only Transformer cores, so the architectural distance between them is small. The differences come from training data, alignment recipes, safety tuning, and deployment surfaces. Independent reviewers tend to describe GPT models as the broadest "all-purpose" choice, Claude as the strongest at long-form writing and careful document work, Gemini as the best fit for users in Google's ecosystem, and LLaMA as the leading open-weight option.^[23]

What can GPT do

What a GPT can actually do depends on the version and the deployment, but the broad envelope is consistent across the family.

Open-ended text generation. Drafting emails, articles, fiction, marketing copy, and structured documents from a short prompt.
Question answering. Both closed-book (drawing on parametric knowledge) and retrieval-augmented (combining an external search step with the model).
Code generation, completion, and review. GPT-4 and later are the basis for GitHub Copilot, Cursor, and many other developer tools. GPT-5 reports 74.9% on SWE-bench Verified, a real-world software engineering benchmark.^[12]
Mathematical reasoning. GPT-5 thinking reports 94.6% on AIME 2025 without tools and is competitive with reasoning specialists on graduate-level math problems.^[12]
Multimodal understanding. From GPT-4 onward, the models accept images as input. GPT-4o adds real-time audio input and output. GPT-5 maintains and extends these modalities.
Tool and function calling. Through the OpenAI API, GPT models can call external functions, retrieve documents, browse the web, run Python in a sandbox, and generate images via DALL-E.
Translation. Strong on high-resource language pairs; weaker on low-resource languages where the training data is thin.
Persona and instruction following. A long system prompt can steer tone, format, and policy behavior, which is the basis for both Custom GPTs and enterprise deployments.
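
Tool calling, from the list above, generally works by having the model emit a structured request that the host application executes. The sketch below shows only that application-side dispatch step. The JSON shape, the tool name, and the stub function are hypothetical and are not the OpenAI API's actual request or response format.

```python
import json

def get_weather(city):
    """Stand-in tool; a real deployment would call an external service."""
    return {"city": city, "forecast": "unknown (stub)"}

# Registry of functions the application is willing to run for the model.
TOOLS = {"get_weather": get_weather}

def dispatch(tool_call_json):
    """Parse a model-emitted tool call and run the matching function."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](**call["arguments"])

# Hypothetical text a model might emit when it decides to use a tool.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch(model_output)
print(result)
```

The result would normally be appended to the conversation so the model can use it in its final answer. Keeping an explicit registry means the model can only trigger functions the application has chosen to expose.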

OpenAI reported that GPT-4 "passes a simulated bar exam with a score around the top 10% of test takers," achieving 298 out of 400 on the Uniform Bar Exam, a result later academic work argued overstated GPT-4's standing relative to the full pool of test takers.^[16]^[41]

What are the limitations of GPT

GPT models share the failure modes of all current large language models.

Hallucination. Generating fluent but factually wrong statements with high confidence. Hallucination rates have fallen across generations; OpenAI says GPT-5 with web search produces about 45% fewer factual errors than GPT-4o, but the problem has not gone away.^[12]
Stale knowledge. Each model has a training data cut-off, after which information is unknown unless retrieved at inference time.
Sensitivity to prompt phrasing. Small changes in wording or example ordering can change answers, sometimes substantially.
Long-tail reliability. Performance is excellent on common tasks and degrades on rare or adversarially constructed inputs.
Reasoning gaps. Even reasoning-tuned variants make algebraic mistakes, miscount, or fail at puzzles a careful human would solve.
Bias and stereotype amplification. Pretraining on internet text imports the biases of that text, and post-training can only partially correct it.
Privacy and copyright concerns. GPTs can occasionally reproduce passages from training data, which has spawned multiple lawsuits, including the December 2023 New York Times v. OpenAI and Microsoft case.
Cost and latency. Frontier GPT models still require expensive accelerator hardware; serving GPT-5 thinking at scale is materially more expensive than serving GPT-3.5.

Reception and impact

The research community received GPT-3 as a watershed paper; "Language Models are Few-Shot Learners" won a Best Paper award at NeurIPS 2020 and reframed how researchers think about scale and emergent behavior.^[3] ChatGPT's launch in November 2022 is the moment when most of the public, regulators, and the broader software industry took notice. Within a year, generative AI had become a fixture of national policy debates, including the U.S. executive order on AI of October 2023, the EU AI Act of 2024, and the UK AI Safety Summit at Bletchley Park.

Economic impact has been concrete. GitHub Copilot, built on OpenAI Codex (a GPT-3 derivative) and later GPT-4 class models, was reported to be in use by tens of millions of developers by 2025. A 2023 study by Brynjolfsson, Li, and Raymond examined the staggered rollout of a GPT-based assistant to 5,179 customer support agents and found that access to the tool raised issues resolved per hour by 14% on average, with a 34% gain for novice and low-skilled workers and little measurable effect on the most experienced staff.^[42] Many companies have built internal GPT deployments through Microsoft Azure OpenAI Service, ChatGPT Enterprise, and the OpenAI API, citing customer support, code authoring, and document workflows as primary use cases.

Reception has not been uniformly positive. Critics have pointed to the environmental cost of training and serving frontier models, the displacement of certain knowledge-work jobs, the concentration of frontier compute in a small number of U.S. and Chinese firms, the legal status of training on copyrighted work, and the safety risks of deploying systems whose internal reasoning is not interpretable. GPT-5's launch in August 2025 drew particular pushback from longtime ChatGPT users, who said the new default model felt "flat" compared to GPT-4o and complained that the automatic router sent them to a smaller model than they wanted; OpenAI subsequently adjusted defaults and exposed model selection more directly.^[12]

Cultural footprint

"GPT" has entered everyday language as shorthand for "AI chatbot," similar to how "Google" became a verb for web search. The phrase "according to ChatGPT" appears in news articles, court filings, classroom syllabi, and political speeches. The acronym is referenced in books, television, and stand-up routines, and "the model" or "the GPT" is often invoked the way "the algorithm" was a few years earlier. Many writers and artists have organized against generative AI tools, especially after the 2023 Writers Guild of America strike, which secured contractual protections against the unconsented use of GPT-style systems in television and film writing.

In academia, GPTs have prompted rapid changes to assessment practice. Many universities re-introduced in-class exams or oral defenses after 2023, and journals such as Nature and Science updated their editorial policies in early 2023 to require disclosure of any GPT use in submitted manuscripts. Detection tools that claim to identify GPT output have struggled to keep up; OpenAI itself shut down its public AI text classifier in mid-2023, citing low accuracy.

The legal system has also had to adapt. In the New York case Mata v. Avianca (2023), two attorneys were sanctioned after submitting a brief that cited fictional cases hallucinated by ChatGPT. Multiple court systems in the United States, the United Kingdom, and Australia have since issued standing orders that require lawyers to disclose any use of GPT-style tools in filings and to verify every citation independently. Several state bars have published advisory opinions on the duty of competence in working with generative AI.

Safety, alignment, and policy

OpenAI has framed each GPT release with a public safety document called a system card or model card, beginning with GPT-4. These documents describe the alignment training, red-teaming, capability evaluations, and known failure modes for the model. They are not peer-reviewed and have been criticized for omitting key technical details, but they have also become a de facto industry standard; competitors including Anthropic, Google DeepMind, and Meta now publish similar documents for their own flagship models.

Alignment research on GPT-class models is an active field. Major threads include scalable oversight (training models to do tasks humans cannot easily evaluate), interpretability (understanding what circuits inside the network are doing), jailbreaking defenses (preventing users from bypassing safety training), and evaluation of dangerous capabilities such as biosecurity, cybersecurity, and autonomous replication. The Frontier Model Forum, founded in July 2023 by OpenAI, Anthropic, Google, and Microsoft, coordinates some of this work across labs.

Policy interest in GPT has accelerated in parallel. The U.S. AI Executive Order of October 2023 set reporting thresholds for models trained with more than 10^26 floating-point operations, a bar that GPT-4 and its successors are believed to clear. The EU AI Act, finalized in 2024 and taking effect in stages through 2026, defines a category of general-purpose AI models with "systemic risk" and imposes additional transparency, evaluation, and incident reporting obligations on their providers. China requires generative AI services to undergo security assessments and content filtering, which is why models such as Wenxin Yiyan went through an approval process before public release. The United Kingdom established the AI Safety Institute, which conducts pre-deployment evaluations of frontier models, including GPT-class systems, under voluntary agreements with the labs.

References

Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. (2018). "Improving Language Understanding by Generative Pre-Training." OpenAI. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. (2019). "Language Models are Unsupervised Multitask Learners." OpenAI. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
Brown, T.B. et al. (2020). "Language Models are Few-Shot Learners." arXiv:2005.14165. https://arxiv.org/abs/2005.14165
United States Patent and Trademark Office (February 2024). Refusal of OpenAI "GPT" trademark application. Reported in: Coldewey, D. "No 'GPT' trademark for OpenAI." TechCrunch, February 15, 2024. https://techcrunch.com/2024/02/15/no-gpt-trademark-for-openai/
Goodmans LLP (2024). "OpenAI's GPT Trademark Application Rejected by USPTO." https://www.goodmans.ca/insights/post/goodmans-ip-blog/openai-s-gpt-trademark-application-rejected-by-uspto
Hu, K. (February 2, 2023). "ChatGPT sets record for fastest-growing user base." Reuters. https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/
OpenAI (November 30, 2022). "Introducing ChatGPT." https://openai.com/index/chatgpt/
OpenAI (March 14, 2023). "GPT-4." https://openai.com/index/gpt-4-research/
OpenAI (May 13, 2024). "Hello GPT-4o." https://openai.com/index/hello-gpt-4o/
OpenAI (April 14, 2025). "Introducing GPT-4.1 in the API." https://openai.com/index/gpt-4-1/
Wikipedia. "GPT-4.1." https://en.wikipedia.org/wiki/GPT-4.1
OpenAI (August 7, 2025). "Introducing GPT-5." https://openai.com/index/introducing-gpt-5/ ; Wikipedia. "GPT-5." https://en.wikipedia.org/wiki/GPT-5
Vaswani, A. et al. (2017). "Attention Is All You Need." NeurIPS 2017. https://arxiv.org/abs/1706.03762
Wolfe, C. (2023). "Decoder-Only Transformers: The Workhorse of Generative LLMs." https://cameronrwolfe.substack.com/p/decoder-only-transformers-the-workhorse
The Decoder (July 2023). "GPT-4 architecture, datasets, costs and more leaked." https://the-decoder.com/gpt-4-architecture-datasets-costs-and-more-leaked/
OpenAI (March 2023). "GPT-4 Technical Report." arXiv:2303.08774. https://arxiv.org/abs/2303.08774
Ouyang, L. et al. (2022). "Training language models to follow instructions with human feedback." arXiv:2203.02155. https://arxiv.org/abs/2203.02155
Kaplan, J. et al. (2020). "Scaling Laws for Neural Language Models." arXiv:2001.08361. https://arxiv.org/abs/2001.08361
Hoffmann, J. et al. (2022). "Training Compute-Optimal Large Language Models" (Chinchilla). arXiv:2203.15556. https://arxiv.org/abs/2203.15556
OpenAI (September 2024). "Learning to Reason with LLMs" (introducing OpenAI o1). https://openai.com/index/learning-to-reason-with-llms/
OpenAI (November 6, 2023). "Introducing GPTs." https://openai.com/index/introducing-gpts/
OpenAI (January 10, 2024). "Introducing the GPT Store." https://openai.com/index/introducing-the-gpt-store/
Wikipedia. "Generative pre-trained transformer." https://en.wikipedia.org/wiki/Generative_pre-trained_transformer
Wikipedia. "GPT-2." https://en.wikipedia.org/wiki/GPT-2
Wikipedia. "GPT-3." https://en.wikipedia.org/wiki/GPT-3
Wikipedia. "GPT-4o." https://en.wikipedia.org/wiki/GPT-4o
Wikipedia. "Ernie Bot." https://en.wikipedia.org/wiki/Ernie_Bot
Bloomberg L.P. (March 30, 2023). "Introducing BloombergGPT, Bloomberg's 50-billion parameter large language model." https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/
Wu, S. et al. (2023). "BloombergGPT: A Large Language Model for Finance." arXiv:2303.17564. https://arxiv.org/abs/2303.17564
Wikipedia. "GPT Store." https://en.wikipedia.org/wiki/GPT_Store
Wikipedia. "GPT-5.5." https://en.wikipedia.org/wiki/GPT-5.5
Wikipedia. "GPT-5.2." https://en.wikipedia.org/wiki/GPT-5.2
OpenAI (April 23, 2026). "Introducing GPT-5.5." https://openai.com/index/introducing-gpt-5-5/
Wiggers, K. (April 23, 2026). "OpenAI releases GPT-5.5, bringing company one step closer to an AI 'super app'." TechCrunch. https://techcrunch.com/2026/04/23/openai-chatgpt-gpt-5-5-ai-model-superapp/
OpenAI (May 5, 2026). "GPT-5.5 Instant: smarter, clearer, and more personalized." https://openai.com/index/gpt-5-5-instant/
Wiggers, K. (May 5, 2026). "OpenAI releases GPT-5.5 Instant, a new default model for ChatGPT." TechCrunch. https://techcrunch.com/2026/05/05/openai-releases-gpt-5-5-instant-a-new-default-model-for-chatgpt/
OpenAI (February 13, 2026). "Retiring GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini in ChatGPT." https://openai.com/index/retiring-gpt-4o-and-older-models/
Wiggers, K. (February 27, 2026). "ChatGPT reaches 900M weekly active users." TechCrunch. https://techcrunch.com/2026/02/27/chatgpt-reaches-900m-weekly-active-users/
Wiggers, K. (February 20, 2025). "OpenAI now serves 400 million users every week." TechCrunch. https://techcrunch.com/2025/02/20/openai-now-serves-400-million-users-every-week/
Reuters (July 21, 2025). "OpenAI to release web browser in challenge to Google Chrome" (reporting ChatGPT around 500 million weekly active users). https://www.reuters.com/business/media-telecom/openai-release-web-browser-challenge-google-chrome-2025-07-09/
Martinez, E. (2024). "Re-evaluating GPT-4's bar exam performance." Artificial Intelligence and Law. https://link.springer.com/article/10.1007/s10506-024-09396-9
Brynjolfsson, E.; Li, D.; Raymond, L. (2023). "Generative AI at Work." NBER Working Paper 31161. https://www.nber.org/papers/w31161
