Generative Pre-trained Transformer (GPT) is a family of autoregressive large language models developed by OpenAI. Built on the decoder-only transformer architecture, GPT models are pre-trained on massive text corpora using next-token prediction and then refined through supervised fine-tuning and reinforcement learning from human feedback (RLHF). Since the release of GPT-1 in 2018, the series has grown through several major iterations, including GPT-2, GPT-3, GPT-4, GPT-4o, GPT-4.5, GPT-5, and the post-GPT-5 point releases that arrived through 2025 and into 2026. Each generation has marked a notable jump in scale, capability, or deployment surface. GPT models have played a central role in the rapid expansion of generative AI, powering products such as ChatGPT and the OpenAI API, and influencing the broader development of language models across the industry.
Imagine you have a really smart parrot that has read millions and millions of books, websites, and articles. When you start a sentence, the parrot guesses the next word based on everything it has read. It does this one word at a time, over and over, until it has written a whole paragraph or page. That is basically what GPT does. "Generative" means it creates new text. "Pre-trained" means it studied a huge amount of text before you ever talked to it. "Transformer" is the name of the math recipe it uses to understand which words are important and how they relate to each other. The more books the parrot reads (more data) and the bigger its brain gets (more parameters), the better it becomes at writing things that sound like a real person wrote them.
GPT models use a decoder-only variant of the transformer architecture introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al. Unlike the original transformer, which includes both an encoder and a decoder, GPT retains only the decoder stack. Each decoder block consists of two primary sublayers: a causally masked self-attention layer and a position-wise feed-forward neural network. Layer normalization and residual connections are applied around each sublayer.
The causal (or "masked") self-attention mechanism ensures that when the model processes a sequence, each token can only attend to tokens at earlier positions. This left-to-right constraint is what makes the model autoregressive: it generates text one token at a time, with each new token conditioned on all preceding tokens.
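The causal constraint can be shown in a few lines. The following is a minimal single-head sketch (NumPy; the learned query/key/value projections and multi-head structure of real GPT models are omitted for clarity):

```python
import numpy as np

def causal_self_attention(x):
    """Single-head self-attention with a causal mask.

    x: (seq_len, d_model) token representations. Projections are
    omitted (identity weights) to keep the masking logic visible.
    """
    seq_len, d_model = x.shape
    scores = x @ x.T / np.sqrt(d_model)             # (seq_len, seq_len)
    # Causal mask: position i may only attend to positions j <= i.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)        # future positions -> -inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x, weights

x = np.random.default_rng(0).standard_normal((4, 8))
out, w = causal_self_attention(x)
print(np.triu(w, k=1))  # strictly upper triangle is all zeros
```

Because the masked scores become `-inf` before the softmax, every attention weight on a future position is exactly zero, which is what makes left-to-right generation possible.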
| Component | Description |
|---|---|
| Token embedding | Maps each input token ID to a dense vector representation |
| Positional embedding | Encodes the position of each token in the sequence (learned embeddings in GPT-1/2/3, rotary or ALiBi-style schemes in later models) |
| Causal self-attention | Computes weighted sums of token representations, restricted so each position attends only to earlier positions |
| Feed-forward network | A two-layer fully connected network applied independently to each position |
| Layer normalization | Normalizes activations to stabilize training; GPT-2 and later moved this before each sublayer (pre-norm) |
| Language model head | A linear projection from the final hidden state to the vocabulary, producing logits for next-token prediction |
| Mixture-of-experts (rumored, GPT-4 onward) | Sparse routing of tokens through a subset of expert sub-networks; widely reported but not officially confirmed by OpenAI |
The core training objective for all GPT models is next-token prediction (also called causal language modeling). Given a sequence of tokens, the model learns to predict the probability distribution over the vocabulary for the next token at each position. The loss function is the cross-entropy between the predicted distribution and the actual next token. This simple but powerful objective, applied over hundreds of billions to trillions of tokens, enables GPT models to learn grammar, factual knowledge, reasoning patterns, and even some degree of common sense from raw text.
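The objective itself is compact. The following is a toy sketch of the per-position cross-entropy (NumPy; the five-token vocabulary and logit values are invented for illustration):

```python
import numpy as np

def next_token_loss(logits, targets):
    """Average cross-entropy between the predicted next-token
    distributions and the actual next tokens.
    logits: (seq_len, vocab_size); targets: (seq_len,) token IDs."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Toy example: vocabulary of 5 tokens, sequence of 3 positions.
logits = np.array([[2.0, 0.1, 0.1, 0.1, 0.1],
                   [0.1, 3.0, 0.1, 0.1, 0.1],
                   [0.1, 0.1, 0.1, 4.0, 0.1]])
targets = np.array([0, 1, 3])  # the actual next tokens
print(next_token_loss(logits, targets))  # small: confident and correct
```

Training minimizes exactly this quantity, averaged over every position of every sequence in the corpus.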
With the o-series in late 2024, later folded into the GPT line by GPT-5 in 2025, OpenAI added a second axis of compute: thinking-time reasoning. Instead of producing an answer in a single forward pass, "thinking" variants of GPT generate a long internal chain of thought before emitting the user-visible response. The amount of internal reasoning is controllable through discrete effort levels (none, low, medium, high, and, in GPT-5.4, an additional "xhigh" level), letting developers trade latency and cost against accuracy on a per-request basis.
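In the API, effort is expressed as a per-request parameter. A sketch of what such a request body looks like (the `reasoning.effort` field follows OpenAI's public Responses API; the "xhigh" value is taken from the GPT-5.4 description above and is not verified against the live API):

```python
# Illustrative request payload only; no API call is made here.
payload = {
    "model": "gpt-5",
    "input": "How many primes are there below 1000?",
    # Higher effort = more internal reasoning tokens before the answer,
    # trading latency and cost for accuracy on this one request.
    "reasoning": {"effort": "high"},
}
print(payload["reasoning"]["effort"])
```

The same prompt sent with `"effort": "low"` returns faster and cheaper, at some cost in reliability on hard problems.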
OpenAI introduced GPT-1 in June 2018 with the paper "Improving Language Understanding by Generative Pre-Training" by Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. The model contained 117 million parameters organized in 12 transformer layers with a model dimension of 768 and a context window of 512 tokens.
GPT-1 was pre-trained on the BooksCorpus dataset, which contained roughly 7,000 unpublished books spanning a wide range of genres. The key insight of the paper was that generative pre-training on a large, unlabeled text corpus, followed by discriminative fine-tuning on specific downstream tasks, could achieve strong results across many natural language understanding benchmarks. The model improved on the state of the art in 9 out of 12 tasks evaluated, including textual entailment, question answering, and semantic similarity. This two-stage approach (unsupervised pre-training followed by supervised fine-tuning) became the foundational paradigm for subsequent GPT models and for the broader field of transfer learning in NLP.
GPT-2 was announced in February 2019 with the paper "Language Models are Unsupervised Multitask Learners" by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. The model scaled to 1.5 billion parameters (roughly 13 times GPT-1) and was trained on WebText, a dataset of approximately 8 million web pages (about 40 GB of text) scraped from outbound links on Reddit that received at least three upvotes.
The central finding of GPT-2 was that a sufficiently large language model, trained on sufficiently diverse data, could perform many tasks in a zero-shot setting, without any task-specific fine-tuning. The model demonstrated coherent multi-paragraph text generation, basic reading comprehension, translation, and summarization capabilities, all driven purely by next-token prediction at scale.
GPT-2 became notable for its controversial staged release strategy. OpenAI initially released only the smallest version (124 million parameters) in February 2019, citing concerns about potential misuse for generating convincing fake news, spam, and disinformation. The 355 million parameter version followed in May, the 774 million version in August, and the full 1.5 billion parameter model in November 2019. OpenAI published a report titled "Release Strategies and the Social Impacts of Language Models" alongside the staged rollout, noting that the feared large-scale misuse had not materialized from the smaller releases. This was one of the first high-profile cases of an AI lab arguing that a model posed sufficient misuse risk to warrant controlled release, setting a precedent for safety-conscious deployment in the field.
GPT-3 was described in the May 2020 paper "Language Models are Few-Shot Learners" by Tom Brown, Benjamin Mann, Nick Ryder, and dozens of co-authors at OpenAI. The paper presented a family of eight models ranging from 125 million to 175 billion parameters, with the largest designated as GPT-3.
The 175 billion parameter model used 96 transformer layers, a model dimension of 12,288, and 96 attention heads. It was trained on a mixture of datasets: a filtered version of Common Crawl (60% of weighted training data), WebText2, two book corpora, and English Wikipedia, totaling approximately 300 billion tokens. The context window was 2,048 tokens. Training compute was estimated to cost roughly $4.6 million on contemporary GPU hardware.
GPT-3's breakthrough contribution was demonstrating that in-context learning (or few-shot learning) emerged at scale. Rather than requiring gradient-based fine-tuning for each new task, users could simply provide a few examples of the desired input-output pattern in the prompt, and the model would generalize to new instances. This capability enabled GPT-3 to perform competitively on a wide range of NLP benchmarks, and in some cases match or exceed fine-tuned models, all without updating any model weights.
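Few-shot prompting requires no weight updates; the task is specified entirely in the prompt. A minimal sketch of how a GPT-3-era plain-text prompt was assembled (the translation task and format are illustrative):

```python
def few_shot_prompt(examples, query):
    """Build a plain-text few-shot prompt from (input, output) pairs."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    lines.append(f"Input: {query}\nOutput:")   # model completes from here
    return "\n\n".join(lines)

# English-to-French translation specified by two demonstrations:
examples = [("cheese", "fromage"), ("house", "maison")]
prompt = few_shot_prompt(examples, "bread")
print(prompt)
```

Fed this prompt, the model infers the pattern from the two demonstrations alone and continues it for the new input, with no gradient updates involved.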
OpenAI chose not to release GPT-3's weights publicly. Instead, the model was made available through the OpenAI API, which launched in a private beta in June 2020. This marked a shift toward a commercial API-based deployment model for large language models.
GPT-3.5 refers to a family of models that served as intermediate improvements between GPT-3 and GPT-4. These models incorporated instruction tuning, a technique where the model is further trained on examples of following human instructions, making it more reliable at interpreting and executing user prompts. The code-davinci-002 and text-davinci-003 variants are among the best-known GPT-3.5 models.
On November 30, 2022, OpenAI launched ChatGPT, a conversational interface built on top of a GPT-3.5 variant fine-tuned with RLHF. ChatGPT attracted 1 million users within five days and reached 100 million monthly active users by January 2023, making it the fastest-growing consumer application in history at that time. ChatGPT's success demonstrated the commercial viability of conversational AI and ignited widespread public interest in large language models.
The paper "Training Language Models to Follow Instructions with Human Feedback" by Long Ouyang et al., published on arXiv in March 2022 and presented at NeurIPS 2022, introduced InstructGPT. The paper addressed a fundamental problem: making language models larger does not inherently make them better at following user intent. Large models can produce outputs that are untruthful, toxic, or unhelpful.
InstructGPT applied a three-step alignment process:

1. Supervised fine-tuning (SFT): human labelers wrote demonstrations of the desired behavior, and the base model was fine-tuned on these prompt-response pairs.
2. Reward model training: labelers ranked several model outputs for the same prompt, and a reward model was trained to predict these preference rankings.
3. Reinforcement learning: the SFT model was optimized against the reward model using proximal policy optimization (PPO).
A striking result was that the 1.3 billion parameter InstructGPT model was preferred by human evaluators over the 175 billion parameter GPT-3, despite having roughly 100 times fewer parameters. InstructGPT also showed measurable improvements in truthfulness and reductions in toxic output generation. The RLHF methodology pioneered by InstructGPT became the standard alignment technique for virtually all subsequent commercial language models.
OpenAI released GPT-4 on March 14, 2023. The accompanying technical report described it as a "large-scale, multimodal model which can accept image and text inputs and produce text outputs." Unlike previous GPT models, OpenAI disclosed very few architectural details, citing competitive and safety considerations. The parameter count was not officially confirmed, though some press reports citing unnamed sources suggested it may contain around 1 trillion parameters arranged as a mixture of experts. CEO Sam Altman stated that the training cost exceeded $100 million.
GPT-4 represented a substantial capability jump over GPT-3.5. On the Uniform Bar Examination, GPT-4 scored in the top 10% of test takers, compared to GPT-3.5's bottom 10%. On the MMLU benchmark, GPT-4 achieved 86.4% accuracy in English and surpassed prior models across 24 of 26 other languages tested. On the Torrance Tests of Creative Thinking, GPT-4 scored in the top 1% for originality and fluency. The model also passed the USMLE medical licensing exam by more than 20 points above the passing threshold.
GPT-4 was released in two context-window variants: 8,192 tokens and 32,768 tokens, both substantially larger than GPT-3.5's 4,096-token limit.
GPT-4 Turbo, announced at OpenAI DevDay in November 2023, expanded the context window to 128,000 tokens (equivalent to roughly 300 pages of text), updated the knowledge cutoff to April 2023, added a JSON output mode, improved instruction following, and reduced API pricing to one-third of the original GPT-4 rate.
GPT-4o ("o" for "omni") was announced on May 13, 2024. It is a natively multimodal model that can process and generate text, images, and audio within a single architecture. Unlike earlier GPT-4 variants that relied on separate models for audio processing, GPT-4o handles voice-to-voice interaction natively, with audio response latency averaging 320 milliseconds, comparable to human conversational response times.
GPT-4o matches GPT-4 Turbo performance on English text and code tasks, while delivering significant improvements on non-English languages (supporting over 50 languages covering 97% of the world's speakers). It operates at roughly half the cost and twice the speed of GPT-4 Turbo through the API. The model uses a 128,000-token context window. GPT-4o was made available to free-tier ChatGPT users, significantly broadening access to frontier model capabilities.
In September 2024 OpenAI released o1-preview and o1-mini, the first models trained explicitly to perform extended chain-of-thought reasoning before producing a final answer. The full o1 model followed in December 2024 alongside ChatGPT Pro, a $200/month tier that bundled higher usage limits and access to "o1 pro mode," which spent additional compute per query for higher reliability on hard problems. o1 dramatically outperformed GPT-4o on competition math, graduate-level science questions (GPQA), and competitive programming, while being slower and more expensive per query.
GPT-4.5, codenamed "Orion," was released on February 27, 2025. OpenAI described it as the company's largest model trained with traditional unsupervised pre-training, signaling that this branch of the GPT line had reached the end of the road. Subscribers to the $200/month ChatGPT Pro tier received access on the day of announcement; ChatGPT Plus and Team users gained access the following week.
The model emphasized broader world knowledge, better instruction following, and what OpenAI called higher "EQ," with measurably lower hallucination rates than GPT-4o on internal evaluations. It outperformed GPT-4o across all 15 languages tested on MMLU, including Arabic, Bengali, Chinese, Hindi, Swahili, and Yoruba. Reception was mixed: critics characterized GPT-4.5 as an incremental upgrade rather than a leap, and the API pricing of $75 per million input tokens and $150 per million output tokens was an order of magnitude higher than GPT-4o, limiting practical adoption.
OpenAI deprecated GPT-4.5 from the API on July 14, 2025, in favor of the cheaper and faster GPT-4.1 family. On August 7, 2025, the day GPT-5 launched, GPT-4.5 was removed from ChatGPT Plus and Team, surviving only as a "Legacy Models" option for Pro subscribers.
On April 14, 2025, OpenAI released the GPT-4.1 model family, consisting of three variants: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. These models featured major improvements in coding, instruction following, and long-context comprehension, with a context window expanded to 1 million tokens.
| Model | Key strengths | Notable benchmarks |
|---|---|---|
| GPT-4.1 | Coding and instruction following | SWE-bench Verified: 21.4-point improvement over GPT-4o; MultiChallenge: 38.3% (10.5 points above GPT-4o) |
| GPT-4.1 mini | Small-model performance leader | Matches or exceeds GPT-4o in many benchmarks at 83% lower cost and nearly half the latency |
| GPT-4.1 nano | Speed and cost optimized | MMLU: 80.1%; GPQA: 50.3%; supports 1 million token context |
The GPT-4.1 series used a knowledge cutoff of June 2024 and was initially available only through the API before being rolled out to ChatGPT Plus, Pro, and Team users.
Two days after GPT-4.1, on April 16, 2025, OpenAI released o3 and o4-mini, the next generation of its reasoning models. (The "o2" name was skipped to avoid a trademark conflict with the British mobile carrier O2.) o3-mini had already shipped on January 31, 2025, and o3-pro followed on June 10, 2025.
These models are notable for being the first OpenAI reasoning models that can agentically chain together every tool inside ChatGPT, including web browsing, Python execution, file analysis, image generation, and visual reasoning over uploaded screenshots and diagrams. o3 reduced major errors by roughly 20% relative to o1 on difficult real-world tasks evaluated by external experts.
| Model | Release | Headline result |
|---|---|---|
| o3-mini | January 31, 2025 | Three reasoning effort levels (low/medium/high); free-tier accessible |
| o3 | April 16, 2025 | GPQA Diamond 87.7%; SWE-bench Verified 71.7%; Codeforces Elo 2727; ~3x ARC-AGI score over o1 |
| o4-mini | April 16, 2025 | Best benchmarked model on AIME 2024 and AIME 2025 at the time of release |
| o3-pro | June 10, 2025 | Parallel test-time compute for the most reliable single answer |
GPT-5 was released on August 7, 2025. It is the first GPT model to integrate fast-response and reasoning capabilities into a single product through a real-time router that decides, per turn, whether to answer with a high-throughput model (gpt-5-main) or hand the turn to a deeper reasoning model (gpt-5-thinking). The router considers conversation type, task complexity, tool requirements, and explicit user intent (for example, the phrase "think step by step").
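The real routing policy is learned and proprietary; the following toy sketch only illustrates the kinds of signals described above (the heuristics, thresholds, and keyword lists are invented for illustration):

```python
def route(message, tools_requested=False):
    """Pick a fast model or a reasoning model for one conversation turn.

    Illustrative heuristic only; GPT-5's actual router is a learned
    model, not a keyword check.
    """
    text = message.lower()
    if "think step by step" in text:       # explicit user intent
        return "gpt-5-thinking"
    if tools_requested:                    # tool-heavy turns need planning
        return "gpt-5-thinking"
    hard_markers = ("prove", "step by step", "debug", "optimize")
    if len(message) > 500 or any(m in text for m in hard_markers):
        return "gpt-5-thinking"            # long or hard-looking tasks
    return "gpt-5-main"                    # default fast path

print(route("What's the capital of France?"))          # gpt-5-main
print(route("Please think step by step about this."))  # gpt-5-thinking
```

The practical effect is that simple queries stay cheap and fast, while requests that look difficult, or that explicitly ask for deliberation, pay the latency cost of the reasoning model.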
OpenAI reported that GPT-5 set a new state of the art across math (94.6% on AIME 2025 without tools), real-world coding (74.9% on SWE-bench Verified, 88% on Aider Polyglot), multimodal understanding (84.2% on MMMU), and health (46.2% on HealthBench Hard). The company stated that GPT-5 with thinking matched or exceeded o3 while using 50% to 80% fewer output tokens, and reduced hallucination rates by up to 80% compared to GPT-4o. Agentic capabilities expanded too: GPT-5 can spin up its own desktop sandbox, drive a browser, and chain multi-step research workflows without supervision.
The product surface includes:
| Variant | Tier | Notes |
|---|---|---|
| gpt-5-main | Free, Plus, Pro | Default fast model used by the router for most queries |
| gpt-5-main-mini | Plus, free fallback | Smaller fast model for routine completions |
| gpt-5-thinking | Plus (limited), Pro (higher limits) | Deeper reasoning, longer latency |
| gpt-5-thinking-mini | Plus, API | Reasoning at lower cost |
| gpt-5-thinking-nano | API only | Smallest reasoning variant for embedded uses |
| gpt-5-thinking-pro | Pro only ("GPT-5 Pro") | Parallel test-time compute for highest reliability |
API pricing at launch was $0.625 per million input tokens and $5.00 per million output tokens for gpt-5-main, with GPT-5 Pro priced at a premium ($15 input / $120 output per million tokens). Free ChatGPT users received GPT-5 access for the first time, with tighter daily limits than paid tiers.
GPT-5.1 launched on November 12, 2025. Two variants shipped that day, with mini and nano versions arriving the week of November 19. The headline change was tone: GPT-5.1 Instant was retuned to feel warmer and more conversational than GPT-5, with eight selectable personality presets (default, friendly, professional, candid, encouraging, witty, efficient, and quirky). Adaptive reasoning was added to GPT-5.1 Instant so the model could opportunistically "think" on harder questions without flipping to a separate variant. GPT-5.1 Thinking adjusted its internal reasoning length to question difficulty more precisely than GPT-5, posting gains on AIME 2025 and Codeforces. A specialized GPT-5.1-Codex-Max variant for autonomous coding agents shipped later in the same release window.
GPT-5.2 was released on December 11, 2025, in three forms: GPT-5.2 Instant, GPT-5.2 Thinking (with a standard and an extended thinking mode), and GPT-5.2 Pro. The release focused on quantitative work, with measurable gains on spreadsheet construction, financial modeling, presentation generation, and multi-step project execution.
GPT-5.3-Codex, released on February 5, 2026, fused the Codex coding stack with the GPT-5 reasoning stack into a single agentic coding model, replacing the separate Codex line that had run alongside the consumer GPT line since 2021.
GPT-5.4 followed on March 5, 2026, initially as Thinking and Pro variants for paid users; mini and nano arrived on March 17, with the mini version available to free-tier users. GPT-5.4 added five discrete reasoning effort levels (none, low, medium, high, xhigh), a 272K context window for code and document workflows, built-in computer use capabilities, and improved deep research. OpenAI reported a 33% reduction in factual errors compared to GPT-5.2, and a score of 75% on the OSWorld-Verified computer-use benchmark, up from GPT-5.2's 47.3%. Reception praised the Thinking variant's lower hallucination rate but criticized the higher per-token API pricing for the smaller variants (roughly 4x the equivalent GPT-5 pricing).
GPT-5.5 became broadly available on April 23, 2026, rolling out to ChatGPT Plus, Pro, Business, and Enterprise users, with a higher-end GPT-5.5 Pro variant restricted to Pro, Business, and Enterprise tiers. OpenAI described GPT-5.5 as "a new class of intelligence," emphasizing improvements in agentic computer use, autonomous knowledge work, scientific research workflows (with claimed contributions to drug-discovery pipelines), and cybersecurity defense. The company reported that GPT-5.5 outperformed Google's Gemini 3.1 Pro and Anthropic's Claude Opus 4.5 across multiple benchmarks. API pricing roughly doubled headline rates relative to GPT-5.4 ($5 input / $30 output per million tokens, with GPT-5.5 Pro at $30 / $180), though OpenAI noted that lower per-task token consumption made the effective cost increase closer to 20%.
GPT-5.1 was retired from ChatGPT on March 11, 2026, with existing conversations migrated to the equivalent GPT-5.3 Instant, GPT-5.4 Thinking, or GPT-5.4 Pro variant. The same kind of silent migration is expected as the GPT-5 series continues to iterate.
| Version | Release date | Parameters | Context window | Training data | Key innovation |
|---|---|---|---|---|---|
| GPT-1 | June 2018 | 117M | 512 tokens | BooksCorpus | Generative pre-training + fine-tuning paradigm |
| GPT-2 | February 2019 | 1.5B | 1,024 tokens | WebText (40 GB) | Zero-shot task performance; staged release |
| GPT-3 | May 2020 | 175B | 2,048 tokens | Common Crawl mix (300B tokens) | Few-shot in-context learning |
| GPT-3.5 | March 2022 | Undisclosed | 4,096 tokens | Undisclosed | Instruction tuning; basis of ChatGPT |
| GPT-4 | March 2023 | Undisclosed (rumored ~1T MoE) | 8K / 32K tokens | Undisclosed | Multimodal input (text + image) |
| GPT-4 Turbo | November 2023 | Undisclosed | 128K tokens | Updated to April 2023 | Longer context, lower cost, JSON mode |
| GPT-4o | May 2024 | Undisclosed | 128K tokens | Undisclosed | Native multimodal (text + image + audio) |
| o1 | December 2024 | Undisclosed | 128K tokens | Undisclosed | First public reasoning model with inference-time chain of thought |
| GPT-4.5 "Orion" | February 27, 2025 | Undisclosed (largest pre-trained GPT) | 128K tokens | Undisclosed | Final pure pre-training scale-up; deprecated August 2025 |
| GPT-4.1 | April 14, 2025 | Undisclosed | 1M tokens | Updated to June 2024 | 1M context, major coding gains |
| o3 / o4-mini | April 16, 2025 | Undisclosed | 200K tokens | Undisclosed | Agentic tool use across all ChatGPT tools |
| GPT-5 | August 7, 2025 | Undisclosed | 256K-class | Undisclosed | Dynamic routing between fast and reasoning variants |
| GPT-5.1 | November 12, 2025 | Undisclosed | 256K-class | Undisclosed | Adaptive reasoning in Instant; eight personality presets |
| GPT-5.2 | December 11, 2025 | Undisclosed | 256K-class | Undisclosed | Spreadsheets, financial modeling, multi-step projects |
| GPT-5.3-Codex | February 5, 2026 | Undisclosed | 272K tokens | Undisclosed | Unified Codex + GPT-5 agentic coding model |
| GPT-5.4 | March 5, 2026 | Undisclosed | 272K tokens | Undisclosed | Five reasoning effort levels; 75% on OSWorld-Verified |
| GPT-5.5 | April 23, 2026 | Undisclosed | Undisclosed | Undisclosed | Agentic computer use; scientific research; cybersecurity |
API list prices for selected GPT and o-series models, in US dollars per million tokens. Volume discounts, batch processing, and prompt caching can substantially reduce effective cost.
| Model | Input | Output | Notes |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | Long-running default for cheap multimodal use |
| GPT-4.1 | $2.00 | $8.00 | 1M context |
| GPT-4.1 mini | $0.40 | $1.60 | Cost-leader for lightweight tasks |
| GPT-4.1 nano | $0.10 | $0.40 | Embedded and high-volume use |
| GPT-5 (main) | $0.625 | $5.00 | Routed default in ChatGPT |
| GPT-5 Pro | $15.00 | $120.00 | Parallel test-time compute |
| GPT-5.4 | $2.50 | $15.00 | March 2026 generation |
| GPT-5.5 | $5.00 | $30.00 | Current flagship as of April 2026 |
| GPT-5.5 Pro | $30.00 | $180.00 | Highest-reliability tier |
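Per-request cost follows directly from the list prices above. A quick sketch (prices from the table; the token counts are invented for illustration):

```python
def request_cost_usd(input_tokens, output_tokens, price_in, price_out):
    """Cost of one API call, with prices quoted in USD per million tokens."""
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out

# A 10,000-token prompt with a 2,000-token answer on GPT-5.5 ($5 / $30):
print(round(request_cost_usd(10_000, 2_000, 5.00, 30.00), 2))    # 0.11
# The same call on GPT-5.5 Pro ($30 / $180) costs six times more:
print(round(request_cost_usd(10_000, 2_000, 30.00, 180.00), 2))  # 0.66
```

Note that output tokens dominate for reasoning models, since internal chain-of-thought tokens are typically billed as output even though the user never sees them.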
All GPT models begin with unsupervised pre-training on large text corpora. The model processes sequences of tokens and learns to predict the next token at each position using cross-entropy loss. This phase requires enormous computational resources: GPT-3's training consumed an estimated 3,640 petaflop-days of compute. Pre-training datasets have grown from the 4.5 GB BooksCorpus used for GPT-1 to the hundreds-of-terabytes-scale web crawls used for later models. The datasets typically include filtered web text, books, academic papers, and code repositories. GPT-4.5 was officially identified as the last model in the family to be built primarily through this kind of large-scale unsupervised pre-training; subsequent models lean more heavily on synthetic data, reasoning traces, and reinforcement learning over verifiable rewards.
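The 3,640 petaflop-day figure is consistent with the standard rule-of-thumb estimate of roughly 6 FLOPs per parameter per training token. A quick check:

```python
# Rule-of-thumb training compute: ~6 * N * D FLOPs
# (N = parameters, D = training tokens).
N = 175e9   # GPT-3 parameters
D = 300e9   # GPT-3 training tokens
flops = 6 * N * D                        # ~3.15e23 FLOPs
petaflop_days = flops / (1e15 * 86_400)  # one petaflop-day = 1e15 FLOP/s for a day
print(round(petaflop_days))              # 3646, matching the reported ~3,640
```

The same back-of-envelope arithmetic is how training-compute figures are usually estimated for models whose details are undisclosed.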
After pre-training, the model is fine-tuned on a curated dataset of prompt-response pairs, where human labelers provide high-quality demonstrations of desired behavior. This stage teaches the model to follow instructions, maintain a conversational tone, and produce structured outputs. SFT was introduced as a formal alignment step with InstructGPT and has been applied to all subsequent GPT models.
The SFT model generates multiple responses to each prompt, which are then ranked by human evaluators. These rankings train a reward model that predicts human preference scores. The language model is then optimized using the PPO (Proximal Policy Optimization) algorithm to maximize the reward model's scores while staying close to the SFT model's distribution (to prevent reward hacking). RLHF has proven effective at reducing harmful outputs, improving factual accuracy, and making model behavior more aligned with user expectations.
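The reward model is trained with a pairwise ranking loss: for each human comparison, the chosen response should score higher than the rejected one. A minimal sketch of that loss (NumPy; the scalar scores stand in for reward-model outputs):

```python
import numpy as np

def preference_loss(chosen_scores, rejected_scores):
    """Pairwise ranking loss: -log sigmoid(r_chosen - r_rejected),
    averaged over comparisons. Minimizing it pushes the reward model
    to score the human-preferred response above the rejected one."""
    diff = np.asarray(chosen_scores) - np.asarray(rejected_scores)
    return -np.log(1.0 / (1.0 + np.exp(-diff))).mean()

# Reward model already ranks the pairs correctly -> small loss:
print(preference_loss([2.0, 1.5], [0.5, -1.0]))
# Rankings inverted -> large loss:
print(preference_loss([0.5, -1.0], [2.0, 1.5]))
```

During the PPO stage, this trained reward model (not humans directly) supplies the scalar reward signal for each sampled response.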
Starting with o1, OpenAI added a fourth stage that trains the model to generate long chains of internal reasoning and rewards trajectories that produce verifiably correct answers (for example, math problems with checkable solutions, unit-tested code, or browser actions whose outcomes can be inspected). This is the source of the dramatic reasoning gains in o3, GPT-5 thinking, and GPT-5.4 Thinking. It is also why the thinking variants tend to consume significantly more compute per response than the older RLHF-only models.
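In verifiable domains, the reward needs no human judgment at all: a programmatic checker scores each sampled trajectory. A toy sketch (the arithmetic task and checker are invented for illustration):

```python
def verifiable_reward(candidate_answer, checker):
    """Binary reward: 1.0 if the checker accepts the answer, else 0.0.
    In training, reasoning trajectories that end in accepted answers
    are reinforced; the rest are not."""
    return 1.0 if checker(candidate_answer) else 0.0

# Toy task: "what is 17 * 24?" -- verifiable by direct computation.
# (Real checkers include unit tests for code or proof validators for math.)
checker = lambda ans: ans == 17 * 24

# Two sampled chains of thought ending in different final answers:
print(verifiable_reward(408, checker))  # 1.0
print(verifiable_reward(398, checker))  # 0.0
```

Because the signal is automatic and exact, training can consume millions of such trajectories without human labeling, which is what makes this stage scalable.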
Each major release is preceded by a system card describing internal and external red-teaming results. For GPT-4 OpenAI worked with the Alignment Research Center on autonomous-replication and resource-acquisition tests. For GPT-5 and the post-GPT-5 line the company published extended evaluations covering biological and chemical uplift, cyber offense, persuasion, and self-exfiltration. OpenAI's Preparedness Framework, introduced in late 2023 and updated in 2025, classifies frontier models on a four-tier risk scale (low, medium, high, critical) with launch gates tied to mitigations.
Alongside the GPT series, OpenAI developed a parallel line of reasoning-focused models. In September 2024, OpenAI released o1-preview and o1-mini, models trained with reinforcement learning to perform extended chain-of-thought reasoning before producing a final answer. The full o1 model was released in December 2024.
In early 2025, OpenAI released o3, skipping the o2 designation to avoid a trademark conflict with the mobile carrier O2. The o3 model achieved 96.7% accuracy on the American Invitational Mathematics Examination (AIME) and 87.7% on a graduate-level science exam (GPQA Diamond). Unlike standard GPT models, which begin emitting the user-visible response immediately, the o-series models allocate additional "thinking" compute at inference time, generating internal chains of reasoning that are not exposed to the user. o4-mini, also released in April 2025, extended this approach to a smaller, cheaper footprint and topped contemporary leaderboards on AIME 2024 and AIME 2025.
With GPT-5, OpenAI integrated elements of the o-series reasoning approach directly into the GPT line through its dynamic routing system, which can invoke deeper reasoning when the task demands it. The standalone o-series brand effectively merged into the GPT-5.x family, although the underlying training techniques remain distinct.
The OpenAI API launched in private beta in June 2020 with GPT-3. It was one of the first commercial APIs to provide access to a frontier language model as a service, establishing the "model-as-a-service" business model that has since been adopted across the industry.
Key milestones in the API's evolution include:

- June 2020: private beta launch with access to GPT-3
- August 2021: Codex models for code generation, which powered GitHub Copilot
- March 2023: the ChatGPT API (gpt-3.5-turbo), priced roughly ten times lower than the preceding text-davinci-003
- June 2023: function calling for structured tool use
- November 2023: GPT-4 Turbo with a 128K context window and the Assistants API, announced at the first OpenAI DevDay
- 2024-2025: batch processing, prompt caching, and the Responses API for agentic tool use
GPT models power a wide range of third-party products and integrations, including Microsoft Copilot (formerly Bing Chat), GitHub Copilot, Khan Academy's Khanmigo tutoring assistant, Duolingo's conversation practice feature, and Snapchat's My AI chatbot. OpenAI has also provided fine-tuning capabilities, allowing enterprise customers to adapt GPT models to domain-specific tasks. By October 2025 ChatGPT crossed 800 million weekly active users, a roughly 60% jump from March 2025's 500 million figure, making it the world's most-used generative AI product by a wide margin.
The GPT series exists within a competitive landscape of large language models developed by several organizations.
| Model family | Developer | Open/closed | Notable features |
|---|---|---|---|
| GPT series | OpenAI | Closed (API access) | Pioneered generative pre-training; largest commercial deployment via ChatGPT |
| Claude | Anthropic | Closed (API access) | Emphasis on safety and Constitutional AI alignment; strong coding performance |
| Gemini | Google DeepMind | Closed (API access) | Natively multimodal; 1M+ token context; integrated with Google products |
| LLaMA | Meta | Open weights | Most widely adopted open-weight model family; enables community fine-tuning |
| Mistral | Mistral AI | Open weights (some models) | Efficient architectures; mixture-of-experts approach in Mixtral |
| DeepSeek | DeepSeek | Open weights | Competitive performance at lower cost; strong reasoning capabilities |
| Grok | xAI | Mixed (some open weights) | Real-time data integration with X; long context |
GPT models were among the earliest to demonstrate that scaling up language models and training data yields emergent capabilities. This insight, sometimes called the "scaling hypothesis," has influenced the development strategies of nearly every major AI lab. However, the trend has also prompted criticism regarding the sustainability and accessibility of a paradigm that requires ever-increasing computational resources.
The GPT series has been one of the most influential developments in the history of natural language processing. Key contributions include:

- Establishing generative pre-training followed by fine-tuning as the dominant transfer-learning paradigm in NLP
- Demonstrating that few-shot in-context learning emerges at sufficient scale, without any gradient updates
- Popularizing RLHF as the standard alignment technique for commercial language models
- Creating the model-as-a-service API business model now used across the industry
- Bringing conversational AI to mainstream audiences through ChatGPT
The GPT series has been at the center of debates about AI safety since GPT-2's staged release in 2019. Concerns include the potential for generating convincing disinformation, phishing emails, and spam at scale. Research has shown that humans can detect GPT-3-generated fake news only about 50% of the time. OpenAI has implemented usage policies, content filters, and rate limits to mitigate misuse, but the effectiveness of these measures remains a subject of ongoing debate. The agentic capabilities introduced with o3, GPT-5, and GPT-5.5 have raised an additional concern: a model that can drive a browser, run shell commands, and persist state across sessions opens new attack surfaces, including prompt injection from web pages and exfiltration of secrets from connected accounts.
OpenAI's training data practices have drawn significant legal scrutiny. The models are trained on web-scraped data that includes copyrighted material, raising questions about fair use and consent. Beginning in mid-2023, a series of lawsuits were filed against OpenAI by authors (including George R.R. Martin, John Grisham, and Jodi Picoult) and publishers (including The New York Times). In March 2025 Judge Sidney Stein of the Southern District of New York rejected OpenAI's motion to dismiss the New York Times suit, allowing the core copyright infringement claims to proceed to discovery and, eventually, trial. By 2025 more than a dozen related cases had been consolidated in New York federal court. OpenAI has argued that training on publicly available data constitutes fair use, but courts have not yet issued definitive rulings on the merits.
Training and running large language models requires substantial energy. GPT-3's training alone consumed an estimated 1,287 MWh of electricity. Inference costs are also significant: AI inference is estimated to account for roughly 60% of total AI energy consumption. Reports have indicated that GPT-5 consumes approximately 8.6 times the energy of GPT-4 during inference, with OpenAI's infrastructure requiring power equivalent to multiple nuclear reactors to meet demand. Data centers running these models also consume significant amounts of water for cooling. These environmental costs have prompted calls for greater transparency about the carbon footprint and resource consumption of large AI models.
GPT models can reproduce and amplify biases present in their training data. Research has documented problematic associations in GPT-3's outputs, including a tendency to associate certain demographic groups with negative stereotypes. Additionally, all GPT models are prone to "hallucination," generating plausible-sounding but factually incorrect information. While each successive model generation has reduced hallucination rates (OpenAI claims an 80% reduction in GPT-5 versus GPT-4o, and a further 33% in GPT-5.4 versus GPT-5.2), the problem has not been eliminated. Hallucination is also a moving target: as models gain agentic capabilities and longer time horizons, fabricated reasoning steps can compound across many tool calls before a human ever sees the output.
By default, OpenAI has used ChatGPT conversations to further train and improve its models, unless users explicitly opt out. This practice has raised privacy concerns, particularly for users who may share sensitive personal or business information in conversations without realizing it could become part of the training data. OpenAI has since introduced enterprise-tier products that do not use customer data for training and has provided opt-out mechanisms for individual users.
The shift toward thinking and Pro variants has steepened the price curve. GPT-5 Pro at $120 per million output tokens and GPT-5.5 Pro at $180 per million output tokens cost roughly 24x to 36x what GPT-5 Main costs per output token. Critics have argued that the most capable models are increasingly priced out of reach for individual researchers, students, and small startups, concentrating access within large enterprises and well-funded incumbents. OpenAI counters that smaller mini and nano variants have steadily improved on benchmarks and now match the previous generation's flagship at a fraction of the cost.