Generative Pre-trained Transformer (GPT) is a family of autoregressive large language models developed by OpenAI. Built on the decoder-only transformer architecture, GPT models are pre-trained on massive text corpora using next-token prediction and then refined through supervised fine-tuning and reinforcement learning from human feedback (RLHF). Since the release of GPT-1 in 2018, the series has grown through several major iterations, including GPT-2, GPT-3, GPT-4, GPT-4o, GPT-4.5, GPT-5, and the post-GPT-5 point releases that arrived through 2025 and into 2026. Each generation has marked a notable jump in scale, capability, or deployment surface. GPT models have played a central role in the rapid expansion of generative AI, powering products such as ChatGPT and the OpenAI API, and influencing the broader development of language models across the industry.
Imagine you have a really smart parrot that has read millions and millions of books, websites, and articles. When you start a sentence, the parrot guesses the next word based on everything it has read. It does this one word at a time, over and over, until it has written a whole paragraph or page. That is basically what GPT does. "Generative" means it creates new text. "Pre-trained" means it studied a huge amount of text before you ever talked to it. "Transformer" is the name of the math recipe it uses to understand which words are important and how they relate to each other. The more books the parrot reads (more data) and the bigger its brain gets (more parameters), the better it becomes at writing things that sound like a real person wrote them.
GPT models use a decoder-only variant of the transformer architecture introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al. Unlike the original transformer, which includes both an encoder and a decoder, GPT retains only the decoder stack. Each decoder block consists of two primary sublayers: a causally masked self-attention layer and a position-wise feed-forward neural network. Layer normalization and residual connections are applied around each sublayer.
The causal (or "masked") self-attention mechanism ensures that when the model processes a sequence, each token can only attend to tokens at earlier positions. This left-to-right constraint is what makes the model autoregressive: it generates text one token at a time, with each new token conditioned on all preceding tokens.
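The causal constraint can be shown in a few lines. The following is a minimal single-head sketch (NumPy; the learned query/key/value projections and multi-head structure of real GPT models are omitted for clarity):

```python
import numpy as np

def causal_self_attention(x):
    """Single-head self-attention with a causal mask.

    x: (seq_len, d_model) token representations. Projections are
    omitted (identity weights) to keep the masking logic visible.
    """
    seq_len, d_model = x.shape
    scores = x @ x.T / np.sqrt(d_model)             # (seq_len, seq_len)
    # Causal mask: position i may only attend to positions j <= i.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)        # future positions -> -inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x, weights

x = np.random.default_rng(0).standard_normal((4, 8))
out, w = causal_self_attention(x)
print(np.triu(w, k=1))  # strictly upper triangle is all zeros
```

Because the masked scores become `-inf` before the softmax, every attention weight on a future position is exactly zero, which is what makes left-to-right generation possible.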
| Component | Description |
|---|---|
| Token embedding | Maps each input token ID to a dense vector representation |
| Positional embedding | Encodes the position of each token in the sequence (learned embeddings in GPT-1/2/3, rotary or ALiBi-style schemes in later models) |
| Causal self-attention | Computes weighted sums of token representations, restricted so each position attends only to earlier positions |
| Feed-forward network | A two-layer fully connected network applied independently to each position |
| Layer normalization | Normalizes activations to stabilize training; GPT-2 and later moved this before each sublayer (pre-norm) |
| Language model head | A linear projection from the final hidden state to the vocabulary, producing logits for next-token prediction |
| Mixture-of-experts (rumored, GPT-4 onward) | Sparse routing of tokens through a subset of expert sub-networks; widely reported but not officially confirmed by OpenAI |
The core training objective for all GPT models is next-token prediction (also called causal language modeling). Given a sequence of tokens, the model learns to predict the probability distribution over the vocabulary for the next token at each position. The loss function is the cross-entropy between the predicted distribution and the actual next token. This simple but powerful objective, applied over hundreds of billions to trillions of tokens, enables GPT models to learn grammar, factual knowledge, reasoning patterns, and even some degree of common sense from raw text.
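The objective itself is compact. The following is a toy sketch of the per-position cross-entropy (NumPy; the five-token vocabulary and logit values are invented for illustration):

```python
import numpy as np

def next_token_loss(logits, targets):
    """Average cross-entropy between the predicted next-token
    distributions and the actual next tokens.
    logits: (seq_len, vocab_size); targets: (seq_len,) token IDs."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Toy example: vocabulary of 5 tokens, sequence of 3 positions.
logits = np.array([[2.0, 0.1, 0.1, 0.1, 0.1],
                   [0.1, 3.0, 0.1, 0.1, 0.1],
                   [0.1, 0.1, 0.1, 4.0, 0.1]])
targets = np.array([0, 1, 3])  # the actual next tokens
print(next_token_loss(logits, targets))  # small: confident and correct
```

Training minimizes exactly this quantity, averaged over every position of every sequence in the corpus.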
With the o-series in late 2024, later folded into the GPT line by GPT-5 in 2025, OpenAI added a second axis of compute: thinking-time reasoning. Instead of producing an answer in a single forward pass, "thinking" variants of GPT generate a long internal chain of thought before emitting the user-visible response. The amount of internal reasoning is controllable through discrete effort levels (none, low, medium, high, and, in GPT-5.4, an additional "xhigh" level), letting developers trade latency and cost against accuracy on a per-request basis.
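In the API, effort is expressed as a per-request parameter. A sketch of what such a request body looks like (the `reasoning.effort` field follows OpenAI's public Responses API; the "xhigh" value is taken from the GPT-5.4 description above and is not verified against the live API):

```python
# Illustrative request payload only; no API call is made here.
payload = {
    "model": "gpt-5",
    "input": "How many primes are there below 1000?",
    # Higher effort = more internal reasoning tokens before the answer,
    # trading latency and cost for accuracy on this one request.
    "reasoning": {"effort": "high"},
}
print(payload["reasoning"]["effort"])
```

The same prompt sent with `"effort": "low"` returns faster and cheaper, at some cost in reliability on hard problems.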
OpenAI introduced GPT-1 in June 2018 with the paper "Improving Language Understanding by Generative Pre-Training" by Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. The model contained 117 million parameters organized in 12 transformer layers with a model dimension of 768 and a context window of 512 tokens.
GPT-1 was pre-trained on the BooksCorpus dataset, which contained roughly 7,000 unpublished books spanning a wide range of genres. The key insight of the paper was that generative pre-training on a large, unlabeled text corpus, followed by discriminative fine-tuning on specific downstream tasks, could achieve strong results across many natural language understanding benchmarks. The model improved on the state of the art in 9 out of 12 tasks evaluated, including textual entailment, question answering, and semantic similarity. This two-stage approach (unsupervised pre-training followed by supervised fine-tuning) became the foundational paradigm for subsequent GPT models and for the broader field of transfer learning in NLP.
GPT-2 was announced in February 2019 with the paper "Language Models are Unsupervised Multitask Learners" by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. The model scaled to 1.5 billion parameters (roughly 13 times GPT-1) and was trained on WebText, a dataset of approximately 8 million web pages (about 40 GB of text) scraped from outbound links on Reddit that received at least three upvotes.
The central finding of GPT-2 was that a sufficiently large language model, trained on sufficiently diverse data, could perform many tasks in a zero-shot setting, without any task-specific fine-tuning. The model demonstrated coherent multi-paragraph text generation, basic reading comprehension, translation, and summarization capabilities, all driven purely by next-token prediction at scale.
GPT-2 became notable for its controversial staged release strategy. OpenAI initially released only the smallest version (124 million parameters) in February 2019, citing concerns about potential misuse for generating convincing fake news, spam, and disinformation. The 355 million parameter version followed in May, the 774 million version in August, and the full 1.5 billion parameter model in November 2019. OpenAI published a report titled "Release Strategies and the Social Impacts of Language Models" alongside the staged rollout, noting that the feared large-scale misuse had not materialized from the smaller releases. This was one of the first high-profile cases of an AI lab arguing that a model posed sufficient misuse risk to warrant controlled release, setting a precedent for safety-conscious deployment in the field.
GPT-3 was described in the May 2020 paper "Language Models are Few-Shot Learners" by Tom Brown, Benjamin Mann, Nick Ryder, and dozens of co-authors at OpenAI. The paper presented a family of eight models ranging from 125 million to 175 billion parameters, with the largest designated as GPT-3.
The 175 billion parameter model used 96 transformer layers, a model dimension of 12,288, and 96 attention heads. It was trained on a mixture of datasets: a filtered version of Common Crawl (60% of weighted training data), WebText2, two book corpora, and English Wikipedia, totaling approximately 300 billion tokens. The context window was 2,048 tokens. Training compute was estimated to cost roughly $4.6 million on contemporary GPU hardware.
GPT-3's breakthrough contribution was demonstrating that in-context learning (or few-shot learning) emerged at scale. Rather than requiring gradient-based fine-tuning for each new task, users could simply provide a few examples of the desired input-output pattern in the prompt, and the model would generalize to new instances. This capability enabled GPT-3 to perform competitively on a wide range of NLP benchmarks, and in some cases match or exceed fine-tuned models, all without updating any model weights.
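Few-shot prompting requires no weight updates; the task is specified entirely in the prompt. A minimal sketch of how a GPT-3-era plain-text prompt was assembled (the translation task and format are illustrative):

```python
def few_shot_prompt(examples, query):
    """Build a plain-text few-shot prompt from (input, output) pairs."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    lines.append(f"Input: {query}\nOutput:")   # model completes from here
    return "\n\n".join(lines)

# English-to-French translation specified by two demonstrations:
examples = [("cheese", "fromage"), ("house", "maison")]
prompt = few_shot_prompt(examples, "bread")
print(prompt)
```

Fed this prompt, the model infers the pattern from the two demonstrations alone and continues it for the new input, with no gradient updates involved.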
OpenAI chose not to release GPT-3's weights publicly. Instead, the model was made available through the OpenAI API, which launched in a private beta in June 2020. This marked a shift toward a commercial API-based deployment model for large language models.
GPT-3.5 refers to a family of models that served as intermediate improvements between GPT-3 and GPT-4. These models incorporated instruction tuning, a technique where the model is further trained on examples of following human instructions, making it more reliable at interpreting and executing user prompts. The code-davinci-002 and text-davinci-003 variants are among the best-known GPT-3.5 models.
On November 30, 2022, OpenAI launched ChatGPT, a conversational interface built on top of a GPT-3.5 variant fine-tuned with RLHF. ChatGPT attracted 1 million users within five days and reached 100 million monthly active users by January 2023, making it the fastest-growing consumer application in history at that time. ChatGPT's success demonstrated the commercial viability of conversational AI and ignited widespread public interest in large language models.
The paper "Training Language Models to Follow Instructions with Human Feedback" by Long Ouyang et al., published on arXiv in March 2022 and presented at NeurIPS 2022, introduced InstructGPT. The paper addressed a fundamental problem: making language models larger does not inherently make them better at following user intent. Large models can produce outputs that are untruthful, toxic, or unhelpful.
InstructGPT applied a three-step alignment process:

1. Supervised fine-tuning (SFT): human labelers wrote demonstrations of the desired behavior, and the base model was fine-tuned on these prompt-response pairs.
2. Reward model training: labelers ranked several model outputs for the same prompt, and a reward model was trained to predict these preference rankings.
3. Reinforcement learning: the SFT model was optimized against the reward model using proximal policy optimization (PPO).
A striking result was that the 1.3 billion parameter InstructGPT model was preferred by human evaluators over the 175 billion parameter GPT-3, despite having roughly 100 times fewer parameters. InstructGPT also showed measurable improvements in truthfulness and reductions in toxic output generation. The RLHF methodology pioneered by InstructGPT became the standard alignment technique for virtually all subsequent commercial language models.
OpenAI released GPT-4 on March 14, 2023. The accompanying technical report described it as a "large-scale, multimodal model which can accept image and text inputs and produce text outputs." Unlike previous GPT models, OpenAI disclosed very few architectural details, citing competitive and safety considerations. The parameter count was not officially confirmed, though some press reports citing unnamed sources suggested it may contain around 1 trillion parameters arranged as a mixture of experts. CEO Sam Altman stated that the training cost exceeded $100 million.
GPT-4 represented a substantial capability jump over GPT-3.5. On the Uniform Bar Examination, GPT-4 scored in the top 10% of test takers, compared to GPT-3.5's bottom 10%. On the MMLU benchmark, GPT-4 achieved 86.4% accuracy in English and surpassed prior models across 24 of 26 other languages tested. On the Torrance Tests of Creative Thinking, GPT-4 scored in the top 1% for originality and fluency. The model also passed the USMLE medical licensing exam by more than 20 points above the passing threshold.
GPT-4 was released in two context-window variants: 8,192 tokens and 32,768 tokens, both substantially larger than GPT-3.5's 4,096-token limit.
GPT-4 Turbo, announced at OpenAI DevDay in November 2023, expanded the context window to 128,000 tokens (equivalent to roughly 300 pages of text), updated the knowledge cutoff to April 2023, added a JSON output mode, improved instruction following, and reduced API pricing to one-third of the original GPT-4 rate.
GPT-4o ("o" for "omni") was announced on May 13, 2024. It is a natively multimodal model that can process and generate text, images, and audio within a single architecture. Unlike earlier GPT-4 variants that relied on separate models for audio processing, GPT-4o handles voice-to-voice interaction natively, with audio response latency averaging 320 milliseconds, comparable to human conversational response times.
GPT-4o matches GPT-4 Turbo performance on English text and code tasks, while delivering significant improvements on non-English languages (supporting over 50 languages covering 97% of the world's speakers). It operates at roughly half the cost and twice the speed of GPT-4 Turbo through the API. The model uses a 128,000-token context window. GPT-4o was made available to free-tier ChatGPT users, significantly broadening access to frontier model capabilities.
In September 2024 OpenAI released o1-preview and o1-mini, the first models trained explicitly to perform extended chain-of-thought reasoning before producing a final answer. The full o1 model followed in December 2024 alongside ChatGPT Pro, a $200/month tier that bundled higher usage limits and access to "o1 pro mode," which spent additional compute per query for higher reliability on hard problems. o1 dramatically outperformed GPT-4o on competition math, graduate-level science questions (GPQA), and competitive programming, while being slower and more expensive per query.
GPT-4.5, codenamed "Orion," was released on February 27, 2025. OpenAI described it as the company's largest model trained with traditional unsupervised pre-training, signaling that this branch of the GPT line had reached the end of the road. Subscribers to the $200/month ChatGPT Pro tier received access on the day of announcement; ChatGPT Plus and Team users gained access the following week.
The model emphasized broader world knowledge, better instruction following, and what OpenAI called higher "EQ," with measurably lower hallucination rates than GPT-4o on internal evaluations. It outperformed GPT-4o across all 15 languages tested on MMLU, including Arabic, Bengali, Chinese, Hindi, Swahili, and Yoruba. Reception was mixed: critics characterized GPT-4.5 as an incremental upgrade rather than a leap, and the API pricing of $75 per million input tokens and $150 per million output tokens was an order of magnitude higher than GPT-4o, limiting practical adoption.
OpenAI deprecated GPT-4.5 from the API on July 14, 2025, in favor of the cheaper and faster GPT-4.1 family. On August 7, 2025, the day GPT-5 launched, GPT-4.5 was removed from ChatGPT Plus and Team, surviving only as a "Legacy Models" option for Pro subscribers.
On April 14, 2025, OpenAI released the GPT-4.1 model family, consisting of three variants: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. These models featured major improvements in coding, instruction following, and long-context comprehension, with a context window expanded to 1 million tokens.
| Model | Key strengths | Notable benchmarks |
|---|---|---|
| GPT-4.1 | Coding and instruction following | SWE-bench Verified: 21.4-point improvement over GPT-4o; MultiChallenge: 38.3% (10.5 points above GPT-4o) |
| GPT-4.1 mini | Small-model performance leader | Matches or exceeds GPT-4o in many benchmarks at 83% lower cost and nearly half the latency |
| GPT-4.1 nano | Speed and cost optimized | MMLU: 80.1%; GPQA: 50.3%; supports 1 million token context |
The GPT-4.1 series used a knowledge cutoff of June 2024 and was initially available only through the API before being rolled out to ChatGPT Plus, Pro, and Team users.
Two days after GPT-4.1, on April 16, 2025, OpenAI released o3 and o4-mini, the next generation of its reasoning models. (The "o2" name was skipped to avoid a trademark conflict with the British mobile carrier O2.) o3-mini had already shipped on January 31, 2025, and o3-pro followed on June 10, 2025.
These models are notable for being the first OpenAI reasoning models that can agentically chain together every tool inside ChatGPT, including web browsing, Python execution, file analysis, image generation, and visual reasoning over uploaded screenshots and diagrams. o3 reduced major errors by roughly 20% relative to o1 on difficult real-world tasks evaluated by external experts.
| Model | Release | Headline result |
|---|---|---|
| o3-mini | January 31, 2025 | Three reasoning effort levels (low/medium/high); free-tier accessible |
| o3 | April 16, 2025 | GPQA Diamond 87.7%; SWE-bench Verified 71.7%; Codeforces Elo 2727; ~3x ARC-AGI score over o1 |
| o4-mini | April 16, 2025 | Best benchmarked model on AIME 2024 and AIME 2025 at the time of release |
| o3-pro | June 10, 2025 | Parallel test-time compute for the most reliable single answer |
GPT-5 was released on August 7, 2025. It is the first GPT model to integrate fast-response and reasoning capabilities into a single product through a real-time router that decides, per turn, whether to answer with a high-throughput model (gpt-5-main) or hand the turn to a deeper reasoning model (gpt-5-thinking). The router considers conversation type, task complexity, tool requirements, and explicit user intent (for example, the phrase "think step by step").
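The real routing policy is learned and proprietary; the following toy sketch only illustrates the kinds of signals described above (the heuristics, thresholds, and keyword lists are invented for illustration):

```python
def route(message, tools_requested=False):
    """Pick a fast model or a reasoning model for one conversation turn.

    Illustrative heuristic only; GPT-5's actual router is a learned
    model, not a keyword check.
    """
    text = message.lower()
    if "think step by step" in text:       # explicit user intent
        return "gpt-5-thinking"
    if tools_requested:                    # tool-heavy turns need planning
        return "gpt-5-thinking"
    hard_markers = ("prove", "step by step", "debug", "optimize")
    if len(message) > 500 or any(m in text for m in hard_markers):
        return "gpt-5-thinking"            # long or hard-looking tasks
    return "gpt-5-main"                    # default fast path

print(route("What's the capital of France?"))          # gpt-5-main
print(route("Please think step by step about this."))  # gpt-5-thinking
```

The practical effect is that simple queries stay cheap and fast, while requests that look difficult, or that explicitly ask for deliberation, pay the latency cost of the reasoning model.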
OpenAI reported that GPT-5 set a new state of the art across math (94.6% on AIME 2025 without tools), real-world coding (74.9% on SWE-bench Verified, 88% on Aider Polyglot), multimodal understanding (84.2% on MMMU), and health (46.2% on HealthBench Hard). The company stated that GPT-5 with thinking matched or exceeded o3 while using 50% to 80% fewer output tokens, and reduced hallucination rates by up to 80% compared to GPT-4o. Agentic capabilities expanded too: GPT-5 can spin up its own desktop sandbox, drive a browser, and chain multi-step research workflows without supervision.
The product surface includes:
| Variant | Tier | Notes |
|---|---|---|
| gpt-5-main | Free, Plus, Pro | Default fast model used by the router for most queries |
| gpt-5-main-mini | Plus, free fallback | Smaller fast model for routine completions |
| gpt-5-thinking | Plus (limited), Pro (higher limits) | Deeper reasoning, longer latency |
| gpt-5-thinking-mini | Plus, API | Reasoning at lower cost |
| gpt-5-thinking-nano | API only | Smallest reasoning variant for embedded uses |
| gpt-5-thinking-pro | Pro only ("GPT-5 Pro") | Parallel test-time compute for highest reliability |
API pricing at launch was $0.625 per million input tokens and $5.00 per million output tokens for gpt-5-main, with GPT-5 Pro priced at a premium ($15 input / $120 output per million tokens). Free ChatGPT users received GPT-5 access for the first time, with tighter daily limits than paid tiers.
GPT-5.1 launched on November 12, 2025. Two variants shipped that day, with mini and nano versions arriving the week of November 19. The headline change was tone: GPT-5.1 Instant was retuned to feel warmer and more conversational than GPT-5, with eight selectable personality presets (default, friendly, professional, candid, encouraging, witty, efficient, and quirky). Adaptive reasoning was added to GPT-5.1 Instant so the model could opportunistically "think" on harder questions without flipping to a separate variant. GPT-5.1 Thinking adjusted its internal reasoning length to question difficulty more precisely than GPT-5, posting gains on AIME 2025 and Codeforces. A specialized GPT-5.1-Codex-Max variant for autonomous coding agents shipped later in the same release window.
GPT-5.2 was released on December 11, 2025, in three forms: GPT-5.2 Instant, GPT-5.2 Thinking (with a standard and an extended thinking mode), and GPT-5.2 Pro. The release focused on quantitative work, with measurable gains on spreadsheet construction, financial modeling, presentation generation, and multi-step project execution.
GPT-5.3-Codex, released on February 5, 2026, fused the Codex coding stack with the GPT-5 reasoning stack into a single agentic coding model, replacing the separate Codex line that had run alongside the consumer GPT line since 2021.
GPT-5.4 followed on March 5, 2026, initially as Thinking and Pro variants for paid users; mini and nano arrived on March 17, with the mini version available to free-tier users. GPT-5.4 added five discrete reasoning effort levels (none, low, medium, high, xhigh), a 272K context window for code and document workflows, built-in computer use capabilities, and improved deep research. OpenAI reported a 33% reduction in factual errors compared to GPT-5.2, and a score of 75% on the OSWorld-Verified computer-use benchmark, up from GPT-5.2's 47.3%. Reception praised the Thinking variant's lower hallucination rate but criticized the higher per-token API pricing for the smaller variants (roughly 4x the equivalent GPT-5 pricing).
GPT-5.5 became broadly available on April 23, 2026, rolling out to ChatGPT Plus, Pro, Business, and Enterprise users, with a higher-end GPT-5.5 Pro variant restricted to Pro, Business, and Enterprise tiers. OpenAI described GPT-5.5 as "a new class of intelligence," emphasizing improvements in agentic computer use, autonomous knowledge work, scientific research workflows (with claimed contributions to drug-discovery pipelines), and cybersecurity defense. The company reported that GPT-5.5 outperformed Google's Gemini 3.1 Pro and Anthropic's Claude Opus 4.5 across multiple benchmarks. API pricing roughly doubled headline rates relative to GPT-5.4 ($5 input / $30 output per million tokens, with GPT-5.5 Pro at $30 / $180), though OpenAI noted that lower per-task token consumption made the effective cost increase closer to 20%.
GPT-5.1 was retired from ChatGPT on March 11, 2026, with existing conversations migrated to the equivalent GPT-5.3 Instant, GPT-5.4 Thinking, or GPT-5.4 Pro variant. The same kind of silent migration is expected as the GPT-5 series continues to iterate.
| Version | Release date | Parameters | Context window | Training data | Key innovation |
|---|---|---|---|---|---|
| GPT-1 | June 2018 | 117M | 512 tokens | BooksCorpus | Generative pre-training + fine-tuning paradigm |
| GPT-2 | February 2019 | 1.5B | 1,024 tokens | WebText (40 GB) | Zero-shot task performance; staged release |
| GPT-3 | May 2020 | 175B | 2,048 tokens | Common Crawl mix (300B tokens) | Few-shot in-context learning |
| GPT-3.5 | March 2022 | Undisclosed | 4,096 tokens | Undisclosed | Instruction tuning; basis of ChatGPT |
| GPT-4 | March 2023 | Undisclosed (rumored ~1T MoE) | 8K / 32K tokens | Undisclosed | Multimodal input (text + image) |
| GPT-4 Turbo | November 2023 | Undisclosed | 128K tokens | Updated to April 2023 | Longer context, lower cost, JSON mode |
| GPT-4o | May 2024 | Undisclosed | 128K tokens | Undisclosed | Native multimodal (text + image + audio) |
| o1 | December 2024 | Undisclosed | 128K tokens | Undisclosed | First public reasoning model with inference-time chain of thought |
| GPT-4.5 "Orion" | February 27, 2025 | Undisclosed (largest pre-trained GPT) | 128K tokens | Undisclosed | Final pure pre-training scale-up; deprecated August 2025 |
| GPT-4.1 | April 14, 2025 | Undisclosed | 1M tokens | Updated to June 2024 | 1M context, major coding gains |
| o3 / o4-mini | April 16, 2025 | Undisclosed | 200K tokens | Undisclosed | Agentic tool use across all ChatGPT tools |
| GPT-5 | August 7, 2025 | Undisclosed | 256K-class | Undisclosed | Dynamic routing between fast and reasoning variants |
| GPT-5.1 | November 12, 2025 | Undisclosed | 256K-class | Undisclosed | Adaptive reasoning in Instant; eight personality presets |
| GPT-5.2 | December 11, 2025 | Undisclosed | 256K-class | Undisclosed | Spreadsheets, financial modeling, multi-step projects |
| GPT-5.3-Codex | February 5, 2026 | Undisclosed | 272K tokens | Undisclosed | Unified Codex + GPT-5 agentic coding model |
| GPT-5.4 | March 5, 2026 | Undisclosed | 272K tokens | Undisclosed | Five reasoning effort levels; 75% on OSWorld-Verified |
| GPT-5.5 | April 23, 2026 | Undisclosed | Undisclosed | Undisclosed | Agentic computer use; scientific research; cybersecurity |
API list prices for selected GPT and o-series models, in US dollars per million tokens. Volume discounts, batch processing, and prompt caching can substantially reduce effective cost.
| Model | Input | Output | Notes |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | Long-running default for cheap multimodal use |
| GPT-4.1 | $2.00 | $8.00 | 1M context |
| GPT-4.1 mini | $0.40 | $1.60 | Cost-leader for lightweight tasks |
| GPT-4.1 nano | $0.10 | $0.40 | Embedded and high-volume use |
| GPT-5 (main) | $0.625 | $5.00 | Routed default in ChatGPT |
| GPT-5 Pro | $15.00 | $120.00 | Parallel test-time compute |
| GPT-5.4 | $2.50 | $15.00 | March 2026 generation |
| GPT-5.5 | $5.00 | $30.00 | Current flagship as of April 2026 |
| GPT-5.5 Pro | $30.00 | $180.00 | Highest-reliability tier |
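Per-request cost follows directly from the list prices above. A quick sketch (prices from the table; the token counts are invented for illustration):

```python
def request_cost_usd(input_tokens, output_tokens, price_in, price_out):
    """Cost of one API call, with prices quoted in USD per million tokens."""
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out

# A 10,000-token prompt with a 2,000-token answer on GPT-5.5 ($5 / $30):
print(round(request_cost_usd(10_000, 2_000, 5.00, 30.00), 2))    # 0.11
# The same call on GPT-5.5 Pro ($30 / $180) costs six times more:
print(round(request_cost_usd(10_000, 2_000, 30.00, 180.00), 2))  # 0.66
```

Note that output tokens dominate for reasoning models, since internal chain-of-thought tokens are typically billed as output even though the user never sees them.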
All GPT models begin with unsupervised pre-training on large text corpora. The model processes sequences of tokens and learns to predict the next token at each position using cross-entropy loss. This phase requires enormous computational resources: GPT-3's training consumed an estimated 3,640 petaflop-days of compute. Pre-training datasets have grown from the 4.5 GB BooksCorpus used for GPT-1 to the hundreds-of-terabytes-scale web crawls used for later models. The datasets typically include filtered web text, books, academic papers, and code repositories. GPT-4.5 was officially identified as the last model in the family to be built primarily through this kind of large-scale unsupervised pre-training; subsequent models lean more heavily on synthetic data, reasoning traces, and reinforcement learning over verifiable rewards.
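The 3,640 petaflop-day figure is consistent with the standard rule-of-thumb estimate of roughly 6 FLOPs per parameter per training token. A quick check:

```python
# Rule-of-thumb training compute: ~6 * N * D FLOPs
# (N = parameters, D = training tokens).
N = 175e9   # GPT-3 parameters
D = 300e9   # GPT-3 training tokens
flops = 6 * N * D                        # ~3.15e23 FLOPs
petaflop_days = flops / (1e15 * 86_400)  # one petaflop-day = 1e15 FLOP/s for a day
print(round(petaflop_days))              # 3646, matching the reported ~3,640
```

The same back-of-envelope arithmetic is how training-compute figures are usually estimated for models whose details are undisclosed.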
After pre-training, the model is fine-tuned on a curated dataset of prompt-response pairs, where human labelers provide high-quality demonstrations of desired behavior. This stage teaches the model to follow instructions, maintain a conversational tone, and produce structured outputs. SFT was introduced as a formal alignment step with InstructGPT and has been applied to all subsequent GPT models.
The SFT model generates multiple responses to each prompt, which are then ranked by human evaluators. These rankings train a reward model that predicts human preference scores. The language model is then optimized using the PPO (Proximal Policy Optimization) algorithm to maximize the reward model's scores while staying close to the SFT model's distribution (to prevent reward hacking). RLHF has proven effective at reducing harmful outputs, improving factual accuracy, and making model behavior more aligned with user expectations.
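The reward model is trained with a pairwise ranking loss: for each human comparison, the chosen response should score higher than the rejected one. A minimal sketch of that loss (NumPy; the scalar scores stand in for reward-model outputs):

```python
import numpy as np

def preference_loss(chosen_scores, rejected_scores):
    """Pairwise ranking loss: -log sigmoid(r_chosen - r_rejected),
    averaged over comparisons. Minimizing it pushes the reward model
    to score the human-preferred response above the rejected one."""
    diff = np.asarray(chosen_scores) - np.asarray(rejected_scores)
    return -np.log(1.0 / (1.0 + np.exp(-diff))).mean()

# Reward model already ranks the pairs correctly -> small loss:
print(preference_loss([2.0, 1.5], [0.5, -1.0]))
# Rankings inverted -> large loss:
print(preference_loss([0.5, -1.0], [2.0, 1.5]))
```

During the PPO stage, this trained reward model (not humans directly) supplies the scalar reward signal for each sampled response.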
Starting with o1, OpenAI added a fourth stage that trains the model to generate long chains of internal reasoning and rewards trajectories that produce verifiably correct answers (for example, math problems with checkable solutions, unit-tested code, or browser actions whose outcomes can be inspected). This is the source of the dramatic reasoning gains in o3, GPT-5 thinking, and GPT-5.4 Thinking. It is also why the thinking variants tend to consume significantly more compute per response than the older RLHF-only models.
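In verifiable domains, the reward needs no human judgment at all: a programmatic checker scores each sampled trajectory. A toy sketch (the arithmetic task and checker are invented for illustration):

```python
def verifiable_reward(candidate_answer, checker):
    """Binary reward: 1.0 if the checker accepts the answer, else 0.0.
    In training, reasoning trajectories that end in accepted answers
    are reinforced; the rest are not."""
    return 1.0 if checker(candidate_answer) else 0.0

# Toy task: "what is 17 * 24?" -- verifiable by direct computation.
# (Real checkers include unit tests for code or proof validators for math.)
checker = lambda ans: ans == 17 * 24

# Two sampled chains of thought ending in different final answers:
print(verifiable_reward(408, checker))  # 1.0
print(verifiable_reward(398, checker))  # 0.0
```

Because the signal is automatic and exact, training can consume millions of such trajectories without human labeling, which is what makes this stage scalable.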
Each major release is preceded by a system card describing internal and external red-teaming results. For GPT-4 OpenAI worked with the Alignment Research Center on autonomous-replication and resource-acquisition tests. For GPT-5 and the post-GPT-5 line the company published extended evaluations covering biological and chemical uplift, cyber offense, persuasion, and self-exfiltration. OpenAI's Preparedness Framework, introduced in late 2023 and updated in 2025, classifies frontier models on a four-tier risk scale (low, medium, high, critical) with launch gates tied to mitigations.
Alongside the GPT series, OpenAI developed a parallel line of reasoning-focused models. In September 2024, OpenAI released o1-preview and o1-mini, models trained with reinforcement learning to perform extended chain-of-thought reasoning before producing a final answer. The full o1 model was released in December 2024.
In early 2025, OpenAI released o3, skipping the o2 designation to avoid a trademark conflict with the mobile carrier O2. The o3 model achieved 96.7% accuracy on the American Invitational Mathematics Examination (AIME) and 87.7% on a graduate-level science exam (GPQA Diamond). Unlike standard GPT models, which begin emitting the user-visible response immediately, the o-series models allocate additional "thinking" compute at inference time, generating internal chains of reasoning that are not exposed to the user. o4-mini, also released in April 2025, extended this approach to a smaller, cheaper footprint and topped contemporary leaderboards on AIME 2024 and AIME 2025.
With GPT-5, OpenAI integrated elements of the o-series reasoning approach directly into the GPT line through its dynamic routing system, which can invoke deeper reasoning when the task demands it. The standalone o-series brand effectively merged into the GPT-5.x family, although the underlying training techniques remain distinct.
The OpenAI API launched in private beta in June 2020 with GPT-3. It was one of the first commercial APIs to provide access to a frontier language model as a service, establishing the "model-as-a-service" business model that has since been adopted across the industry.
Key milestones in the API's evolution include:

- June 2020: private beta launch with access to GPT-3
- August 2021: Codex models for code generation, which powered GitHub Copilot
- March 2023: the ChatGPT API (gpt-3.5-turbo), priced roughly ten times lower than the preceding text-davinci-003
- June 2023: function calling for structured tool use
- November 2023: GPT-4 Turbo with a 128K context window and the Assistants API, announced at the first OpenAI DevDay
- 2024-2025: batch processing, prompt caching, and the Responses API for agentic tool use
GPT models power a wide range of third-party products and integrations, including Microsoft Copilot (formerly Bing Chat), GitHub Copilot, Khan Academy's Khanmigo tutoring assistant, Duolingo's conversation practice feature, and Snapchat's My AI chatbot. OpenAI has also provided fine-tuning capabilities, allowing enterprise customers to adapt GPT models to domain-specific tasks. By October 2025 ChatGPT crossed 800 million weekly active users, a roughly 60% jump from March 2025's 500 million figure, making it the world's most-used generative AI product by a wide margin.
The GPT series exists within a competitive landscape of large language models developed by several organizations.
| Model family | Developer | Open/closed | Notable features |
|---|---|---|---|
| GPT series | OpenAI | Closed (API access) | Pioneered generative pre-training; largest commercial deployment via ChatGPT |
| Claude | Anthropic | Closed (API access) | Emphasis on safety and Constitutional AI alignment; strong coding performance |
| Gemini | Google DeepMind | Closed (API access) | Natively multimodal; 1M+ token context; integrated with Google products |
| LLaMA | Meta | Open weights | Most widely adopted open-weight model family; enables community fine-tuning |
| Mistral | Mistral AI | Open weights (some models) | Efficient architectures; mixture-of-experts approach in Mixtral |
| DeepSeek | DeepSeek | Open weights | Competitive performance at lower cost; strong reasoning capabilities |
| Grok | xAI | Mixed (some open weights) | Real-time data integration with X; long context |
GPT models were among the earliest to demonstrate that scaling up language models and training data yields emergent capabilities. This insight, sometimes called the "scaling hypothesis," has influenced the development strategies of nearly every major AI lab. However, the trend has also prompted criticism regarding the sustainability and accessibility of a paradigm that requires ever-increasing computational resources.
The GPT series has been one of the most influential developments in the history of natural language processing. Key contributions include:

- Establishing generative pre-training followed by fine-tuning as the dominant transfer-learning paradigm in NLP
- Demonstrating that few-shot in-context learning emerges at sufficient scale, without any gradient updates
- Popularizing RLHF as the standard alignment technique for commercial language models
- Creating the model-as-a-service API business model now used across the industry
- Bringing conversational AI to mainstream audiences through ChatGPT
The GPT series has been at the center of debates about AI safety since GPT-2's staged release in 2019. Concerns include the potential for generating convincing disinformation, phishing emails, and spam at scale. Research has shown that humans can detect GPT-3-generated fake news only about 50% of the time. OpenAI has implemented usage policies, content filters, and rate limits to mitigate misuse, but the effectiveness of these measures remains a subject of ongoing debate. The agentic capabilities introduced with o3, GPT-5, and GPT-5.5 have raised an additional concern: a model that can drive a browser, run shell commands, and persist state across sessions opens new attack surfaces, including prompt injection from web pages and exfiltration of secrets from connected accounts.
OpenAI's training data practices have drawn significant legal scrutiny. The models are trained on web-scraped data that includes copyrighted material, raising questions about fair use and consent. Beginning in mid-2023, a series of lawsuits were filed against OpenAI by authors (including George R.R. Martin, John Grisham, and Jodi Picoult) and publishers (including The New York Times). In March 2025 Judge Sidney Stein of the Southern District of New York rejected OpenAI's motion to dismiss the New York Times suit, allowing the core copyright infringement claims to proceed to discovery and, eventually, trial. By 2025 more than a dozen related cases had been consolidated in New York federal court. OpenAI has argued that training on publicly available data constitutes fair use, but courts have not yet issued definitive rulings on the merits.
Training and running large language models requires substantial energy. GPT-3's training alone consumed an estimated 1,287 MWh of electricity. Inference costs are also significant: AI inference is estimated to account for roughly 60% of total AI energy consumption. Reports have indicated that GPT-5 consumes approximately 8.6 times the energy of GPT-4 during inference, with OpenAI's infrastructure requiring power equivalent to multiple nuclear reactors to meet demand. Data centers running these models also consume significant amounts of water for cooling. These environmental costs have prompted calls for greater transparency about the carbon footprint and resource consumption of large AI models.
GPT models can reproduce and amplify biases present in their training data. Research has documented problematic associations in GPT-3's outputs, including a tendency to associate certain demographic groups with negative stereotypes. Additionally, all GPT models are prone to "hallucination," generating plausible-sounding but factually incorrect information. While each successive model generation has reduced hallucination rates (OpenAI claims an 80% reduction in GPT-5 versus GPT-4o, and a further 33% in GPT-5.4 versus GPT-5.2), the problem has not been eliminated. Hallucination is also a moving target: as models gain agentic capabilities and longer time horizons, fabricated reasoning steps can compound across many tool calls before a human ever sees the output.
By default, OpenAI has used ChatGPT conversations to further train and improve its models, unless users explicitly opt out. This practice has raised privacy concerns, particularly for users who may share sensitive personal or business information in conversations without realizing it could become part of the training data. OpenAI has since introduced enterprise-tier products that do not use customer data for training and has provided opt-out mechanisms for individual users.
The shift toward thinking and Pro variants has steepened the price curve. GPT-5 Pro at $120 per million output tokens and GPT-5.5 Pro at $180 per million output tokens cost roughly 24x to 36x what GPT-5 Main costs per output token. Critics have argued that the most capable models are increasingly priced out of reach for individual researchers, students, and small startups, concentrating access within large enterprises and well-funded incumbents. OpenAI counters that smaller mini and nano variants have steadily improved on benchmarks and now match the previous generation's flagship at a fraction of the cost.