GPT-4 (Generative Pre-trained Transformer 4) is a large language model developed by OpenAI. Released on March 14, 2023, it is the fourth model in the GPT series and was the first GPT model to accept both text and image inputs, making it a multimodal system. GPT-4 represented a major performance leap over its predecessor, GPT-3.5, scoring in the top percentiles on a range of professional and academic exams, including the Uniform Bar Exam (approximately the 90th percentile) and the SAT.
OpenAI chose not to disclose technical details about GPT-4's architecture, parameter count, training data, or hardware. The accompanying technical report explicitly stated that such information was withheld due to "the competitive landscape and the safety implications of large-scale models." CEO Sam Altman confirmed that the training cost exceeded $100 million.
GPT-4 was initially available through ChatGPT Plus (a $20/month subscription) and the OpenAI API. It has since been succeeded by several variants, including GPT-4 Turbo (November 2023), GPT-4o (May 2024), and GPT-4o mini (July 2024), each improving on cost, speed, or capability.
Like its predecessors, GPT-4 is a Transformer-based model pre-trained on large datasets of text taken from the internet. During pre-training, the model learned to predict the next token (roughly corresponding to a word or subword) in a sequence. According to leaked reports from SemiAnalysis and other industry analysts, GPT-4 uses a mixture of experts (MoE) architecture with approximately 1.8 trillion total parameters spread across 120 layers. The model reportedly contains 16 expert sub-networks, each with roughly 111 billion parameters in the MLP layers, and uses a top-2 routing approach where each token is processed by two experts per forward pass. OpenAI has never confirmed these figures.
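The reported top-2 routing can be illustrated with a toy sketch: a router scores every expert for each token, the two highest-scoring experts process the token, and their outputs are combined with renormalized gate weights. This is a minimal illustration of the general mixture-of-experts mechanism, not OpenAI's implementation; the expert count and top-k values are the unconfirmed leaked figures.

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 16   # reported expert count (unconfirmed)
TOP_K = 2          # reported experts routed per token (unconfirmed)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(router_logits):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:TOP_K]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

def moe_layer(token, router_logits, experts):
    """Combine the selected experts' outputs, weighted by the gate."""
    return sum(weight * experts[i](token) for i, weight in route_token(router_logits))

# Toy experts: each just scales its scalar input by a different factor.
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
print(moe_layer(1.0, logits, experts))
```

The key property is that only `TOP_K` of the `NUM_EXPERTS` sub-networks run for any given token, which is what keeps inference cost far below what the total parameter count would suggest.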
The training dataset reportedly consisted of approximately 13 trillion tokens drawn from both publicly available internet text and data licensed from third-party providers, supplemented by code-based data. Some fine-tuning data was sourced from Scale AI and internal teams.
Microsoft built a custom Azure supercomputer with over 10,000 GPUs and high-bandwidth networking specifically for OpenAI's training workloads. GPT-3.5 served as an early test run on this infrastructure before GPT-4 training began. Sam Altman stated that training GPT-4 cost over $100 million in compute alone, and OpenAI spends around $200 million per year maintaining its supercomputing systems.
After pre-training, OpenAI fine-tuned GPT-4 using reinforcement learning from human feedback (RLHF). Human reviewers ranked model outputs by quality and safety, and this feedback trained a reward model that guided further optimization. GPT-4 also incorporated an additional safety reward signal during RLHF, provided by a GPT-4 zero-shot classifier that judged safety boundaries and response style on safety-related prompts.
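The reward-model step of RLHF is commonly trained with a pairwise (Bradley-Terry style) objective: given a human-preferred output and a rejected one, the loss pushes the reward model to score the preferred output higher. The sketch below shows that objective in isolation; it is a generic illustration of the technique, not OpenAI's training code.

```python
import math

def pairwise_reward_loss(score_chosen: float, score_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)): the loss is low when the reward
    model already assigns the human-preferred output a higher score, and high
    when the ranking is inverted."""
    diff = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# The loss shrinks as the reward margin for the preferred output grows.
for margin in (0.0, 1.0, 2.0):
    print(margin, round(pairwise_reward_loss(margin, 0.0), 4))
```

A reward model trained this way is then used as the optimization target for the policy (the language model) during the reinforcement-learning phase.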
To build a diverse training signal for safety alignment, OpenAI drew from multiple sources: labeled production data, outputs from human red-teaming sessions, and model-generated prompts. The safety reward was applied across both allowed and disallowed content categories to prevent the model from over-refusing legitimate requests.
OpenAI engaged over 50 external experts from fields including AI alignment, cybersecurity, biosecurity, and international security to adversarially test GPT-4 before release. These red teamers probed the model for dangerous capabilities and failure modes, including potential for generating harmful content, assisting with weapons development, and facilitating social engineering.
The results of these safety interventions were measurable. Compared to GPT-3.5, GPT-4 was 82% less likely to respond to requests for disallowed content. It also complied with OpenAI's policies on sensitive topics (such as medical advice and self-harm) 29% more often than GPT-3.5. OpenAI published both a technical report and a system card documenting these evaluations.
OpenAI's technical report for GPT-4 contains no details about the model's size, architecture, hardware, training compute, or dataset construction. Everything known about the architecture therefore comes from unofficial sources.
| Detail | Reported value | Source |
|---|---|---|
| Total parameters | ~1.8 trillion | SemiAnalysis (leaked) |
| Number of layers | ~120 | SemiAnalysis (leaked) |
| Expert count (MoE) | 16 | SemiAnalysis (leaked) |
| Experts routed per token | 2 | SemiAnalysis (leaked) |
| MLP parameters per expert | ~111 billion | SemiAnalysis (leaked) |
| Training tokens | ~13 trillion | SemiAnalysis (leaked) |
| Training cost | >$100 million | Sam Altman (confirmed) |
| Context window (original) | 8,192 or 32,768 tokens | OpenAI (official) |
| Knowledge cutoff (original) | September 2021 | OpenAI (official) |
The mixture-of-experts approach, if accurate, explains how GPT-4 could contain far more parameters than GPT-3 (175 billion) while keeping inference costs manageable. Only a fraction of the total parameters are active for any given token, since each token is routed to just two of the 16 experts.
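A back-of-envelope calculation using the leaked (and unconfirmed) figures makes this concrete. Treating everything outside the expert MLPs as shared parameters is a simplification, so the exact split should not be taken literally:

```python
# All figures are the leaked, unconfirmed SemiAnalysis estimates.
TOTAL_PARAMS = 1.8e12           # ~1.8 trillion total
EXPERTS = 16
PARAMS_PER_EXPERT_MLP = 111e9   # ~111B per expert's MLP
ACTIVE_EXPERTS = 2              # top-2 routing

expert_params_total = EXPERTS * PARAMS_PER_EXPERT_MLP   # ~1.78T in expert MLPs
shared_params = TOTAL_PARAMS - expert_params_total      # attention etc. (rough)
active_params = shared_params + ACTIVE_EXPERTS * PARAMS_PER_EXPERT_MLP

print(f"active per token: ~{active_params / 1e9:.0f}B "
      f"of {TOTAL_PARAMS / 1e12:.1f}T ({active_params / TOTAL_PARAMS:.0%})")
```

Under these assumptions, only a small fraction of the total parameters participates in each forward pass, which is the whole point of the sparse design.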
GPT-4 produces text that is substantially more coherent, accurate, and nuanced than GPT-3.5. It can follow complex multi-step instructions, write code in dozens of programming languages, draft legal documents, solve math problems, and translate between languages. On natural language processing benchmarks, it set new records at launch across multiple categories.
One of GPT-4's most notable strengths at release was its improved ability to follow instructions. It could adopt specific personas through system messages, generate output in structured formats like JSON or XML, and maintain consistency across long conversations.
GPT-4 was the first model in the GPT series to accept image inputs alongside text. Users could upload photographs, charts, screenshots, and handwritten notes, and the model would describe, analyze, or answer questions about them. OpenAI demonstrated this capability early on with examples like identifying objects in photos and reading text from images of documents.
The vision capability was not available at launch. OpenAI released the GPT-4V(ision) system card on September 25, 2023, and began rolling out image input to ChatGPT Plus and Enterprise users shortly after.
One early deployment partner was Be My Eyes, a company that develops assistive technology for blind and low-vision users. Beginning in March 2023, Be My Eyes and OpenAI collaborated on "Be My AI," a tool that used GPT-4's vision capabilities to describe the visual world. By September 2023, the beta test group had grown to 16,000 users requesting an average of 25,000 image descriptions per day.
GPT-4 introduced improved support for system messages, which allow developers and users to set the model's behavior, tone, and constraints at the start of a conversation. This feature gave developers finer control over outputs compared to GPT-3.5, enabling applications ranging from customer service bots with specific personas to coding assistants restricted to particular languages.
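The shape of such a request can be sketched as a plain payload: the system message comes first and sets persona and constraints, followed by the user turns. The model name, field values, and content below are illustrative, and no request is actually sent.

```python
# Shape of a chat request body that uses a system message to pin down
# persona and output constraints. Values are illustrative only.
request_body = {
    "model": "gpt-4",
    "messages": [
        {"role": "system",
         "content": "You are a terse SQL tutor. Answer only with SQL."},
        {"role": "user",
         "content": "Show all customers who ordered in 2023."},
    ],
    "temperature": 0.2,
}

# The system message precedes all user turns.
roles = [m["role"] for m in request_body["messages"]]
print(roles)
```

Because the system message applies to the whole conversation, a developer can enforce a persona or format once rather than repeating instructions in every user turn.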
GPT-4's most widely reported result at launch was its performance on standardized exams. While GPT-3.5 generally scored in the lower percentiles, GPT-4 performed at or above the level of most human test-takers on many professional and academic tests.
| Exam | GPT-4 Points | GPT-4 Percentile | GPT-4 (no vision) Points | GPT-4 (no vision) Percentile | GPT-3.5 Points | GPT-3.5 Percentile |
|---|---|---|---|---|---|---|
| Uniform Bar Exam (MBE+MEE+MPT) | 298 / 400 | ~90th | 298 / 400 | ~90th | 213 / 400 | ~10th |
| LSAT | 163 | ~88th | 161 | ~83rd | 149 | ~40th |
| SAT Evidence-Based Reading & Writing | 710 / 800 | ~93rd | 710 / 800 | ~93rd | 670 / 800 | ~87th |
| SAT Math | 700 / 800 | ~89th | 690 / 800 | ~89th | 590 / 800 | ~70th |
| Graduate Record Examination (GRE) Quantitative | 163 / 170 | ~80th | 157 / 170 | ~62nd | 147 / 170 | ~25th |
| Graduate Record Examination (GRE) Verbal | 169 / 170 | ~99th | 165 / 170 | ~96th | 154 / 170 | ~63rd |
| Graduate Record Examination (GRE) Writing | 4 / 6 | ~54th | 4 / 6 | ~54th | 4 / 6 | ~54th |
| USABO Semifinal Exam 2020 | 87 / 150 | 99th-100th | 87 / 150 | 99th-100th | 43 / 150 | 31st-33rd |
| USNCO Local Section Exam 2022 | 36 / 60 | — | 38 / 60 | — | 24 / 60 | — |
| Medical Knowledge Self-Assessment Program | 75% | — | 75% | — | 53% | — |
| Codeforces Rating | 392 | below 5th | 392 | below 5th | 260 | below 5th |
| AP Art History | 5 | 86th-100th | 5 | 86th-100th | 5 | 86th-100th |
| AP Biology | 5 | 85th-100th | 5 | 85th-100th | 4 | 62nd-85th |
| AP Calculus BC | 4 | 43rd-59th | 4 | 43rd-59th | 1 | 0th-7th |
The jump from GPT-3.5 to GPT-4 was especially dramatic on the Bar Exam, where GPT-4 rose from the 10th percentile to the 90th, and on the LSAT, where it moved from the 40th to the 88th percentile. GRE Verbal performance reached the 99th percentile. However, GPT-4 still scored below the 5th percentile on competitive programming (Codeforces), indicating that while it could write functional code, it struggled with the algorithmic problem-solving required in programming competitions.
| Benchmark | GPT-4 (shots) | GPT-3.5 (shots) | LM SOTA (best external LM, few-shot) | SOTA (best external model; may include benchmark-specific training) |
|---|---|---|---|---|
| MMLU | 86.4% (5-shot) | 70.0% (5-shot) | 70.7% (U-PaLM, 5-shot) | 75.2% (Flan-PaLM, 5-shot) |
| HellaSwag | 95.3% (10-shot) | 85.5% (10-shot) | 84.2% (LLaMA, validation set) | 85.6% (ALUM) |
| AI2 Reasoning Challenge (ARC) | 96.3% (25-shot) | 85.2% (25-shot) | 85.2% (PaLM, 8-shot) | 86.5% (ST-MOE) |
| WinoGrande | 87.5% (5-shot) | 81.6% (5-shot) | 85.1% (PaLM, 5-shot) | 85.1% (PaLM, 5-shot) |
| HumanEval | 67.0% (0-shot) | 48.1% (0-shot) | 26.2% (PaLM, 0-shot) | 65.8% (CodeT + GPT-3.5) |
| DROP (F1 score) | 80.9 (3-shot) | 64.1 (3-shot) | 70.8 (PaLM, 1-shot) | 88.4 (QDGAT) |
GPT-4 achieved 86.4% on MMLU (Massive Multitask Language Understanding), a benchmark that tests knowledge across 57 academic subjects. This was more than 16 percentage points above GPT-3.5 and exceeded the previous best language model result (70.7% from U-PaLM). On HellaSwag, a commonsense reasoning benchmark, GPT-4 scored 95.3%. On the ARC (AI2 Reasoning Challenge), it reached 96.3%.
Code generation, measured by HumanEval, improved from 48.1% (GPT-3.5) to 67.0% (GPT-4), surpassing the previous best result of 65.8% achieved by CodeT combined with GPT-3.5.
| Benchmark | GPT-4 (shots) | Few-shot SOTA | SOTA (best external model; may include benchmark-specific training) |
|---|---|---|---|
| VQAv2 | 77.2% (0-shot) | 67.6% (Flamingo, 32-shot) | 84.3% |
| TextVQA | 78.0% (0-shot) | 37.9% (Flamingo, 32-shot) | 71.8% (PaLI-17B) |
| ChartQA | 78.5% | — | 58.6% (Pix2Struct Large) |
| AI2 Diagram (AI2D) | 78.2% (0-shot) | — | 42.1% (Pix2Struct Large) |
| DocVQA | 88.4% (0-shot, pixel-only) | — | 88.4% (ERNIE-Layout 2.0) |
| Infographic VQA | 75.1% (0-shot, pixel-only) | — | 61.2% (Applica.ai TILT) |
| TVQA | 87.3% (0-shot) | — | 86.5% (MERLOT Reserve Large) |
| LSMDC | 45.7% (0-shot) | 31.0% (MERLOT Reserve, 0-shot) | 52.9% |
GPT-4's zero-shot performance on visual question answering tasks was competitive with or superior to models that had been specifically trained on those benchmarks. On DocVQA, GPT-4 matched the previous state-of-the-art score of 88.4% without any task-specific training. On TextVQA, GPT-4's 78.0% exceeded the prior best of 71.8% from PaLI-17B.
GPT-4 launched with two context window sizes; later variants expanded the window considerably:
| Variant | Context window | Approximate page equivalent |
|---|---|---|
| gpt-4 (8K) | 8,192 tokens | ~12 pages |
| gpt-4-32k | 32,768 tokens | ~50 pages |
| gpt-4-turbo | 128,000 tokens | ~300 pages |
| gpt-4o | 128,000 tokens | ~300 pages |
The original 8K variant was the most widely available. The 32K variant was released to a limited set of API users. When GPT-4 Turbo launched in November 2023, the context window expanded to 128,000 tokens, roughly equivalent to 300 pages of text. GPT-4o retained the 128K window.
In practice, performance on long-context tasks degraded as input length grew. Independent evaluations found that GPT-4 Turbo's attention quality dropped noticeably beyond approximately 32,000 tokens, with reduced accuracy on needle-in-a-haystack retrieval tasks at the upper end of the context window.
On November 6, 2023, at OpenAI's first DevDay conference, the company announced GPT-4 Turbo. The new model introduced several improvements over the original GPT-4.
GPT-4 Turbo expanded the context window from 8K/32K tokens to 128,000 tokens, allowing users to include far more text in a single prompt. Its training data knowledge cutoff was updated to April 2023 (later extended to December 2023 in the April 2024 release). The model also added JSON mode, which constrains outputs to valid JSON, and improved function calling, allowing multiple functions to be invoked in a single API call.
Instruction-following was notably better. GPT-4 Turbo was more reliable at producing output in specific formats like XML, markdown tables, or structured data, and it more consistently adhered to system message constraints.
GPT-4 Turbo was significantly cheaper than the original GPT-4:
| Model | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
|---|---|---|
| GPT-4 (8K) | $30.00 | $60.00 |
| GPT-4 (32K) | $60.00 | $120.00 |
| GPT-4 Turbo | $10.00 | $30.00 |
| GPT-4o | $2.50 | $10.00 |
| GPT-4o mini | $0.15 | $0.60 |
Input tokens cost one-third as much as on the original GPT-4, and output tokens half as much. This made GPT-4-level intelligence accessible to a much wider range of applications.
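The pricing relationships in the table reduce to simple per-token arithmetic. The sketch below computes the cost of one workload across three generations, using the table's list prices (USD per million tokens); model keys are illustrative labels, not API model IDs.

```python
# USD per 1M tokens, taken from the pricing table above.
PRICES = {
    "gpt-4-8k":    {"in": 30.00, "out": 60.00},
    "gpt-4-turbo": {"in": 10.00, "out": 30.00},
    "gpt-4o":      {"in": 2.50,  "out": 10.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the listed per-million-token rates."""
    p = PRICES[model]
    return (input_tokens * p["in"] + output_tokens * p["out"]) / 1_000_000

# Same workload (100K input tokens, 20K output tokens) across generations:
for model in PRICES:
    print(model, round(request_cost(model, 100_000, 20_000), 4))
```

Running the same workload through each generation shows the scale of the drop: the GPT-4o cost is roughly a tenth of the original GPT-4 cost.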
GPT-4 Turbo initially launched as a preview model (gpt-4-1106-preview). The generally available version with vision support, gpt-4-turbo-2024-04-09, shipped on April 9, 2024, with a knowledge cutoff of December 2023.
On May 13, 2024, OpenAI released GPT-4o (the "o" stands for "omni"). GPT-4o was a new model trained end-to-end across text, vision, and audio, meaning all input modalities are handled by a single neural network rather than separate models piped together.
GPT-4o accepts text, images, and audio as input, and can produce text, images, and audio as output. Its audio processing was a step change from earlier models. Previous GPT versions used a pipeline of separate models to handle voice (speech-to-text, then the language model, then text-to-speech). GPT-4o processes audio natively, allowing it to respond to spoken input in as little as 232 milliseconds, with an average latency of 320 milliseconds. This is roughly comparable to human conversational response time.
In terms of text and code performance, GPT-4o matched GPT-4 Turbo on English-language tasks and significantly outperformed it on non-English languages. It supported over 50 languages at launch, which OpenAI estimated covered more than 97% of the world's speakers.
GPT-4o was 50% cheaper than GPT-4 Turbo in the API and ran roughly twice as fast. The initial pricing was $5 per million input tokens and $15 per million output tokens, later reduced to $2.50 input and $10.00 output. OpenAI also made GPT-4o available to free-tier ChatGPT users with usage limits, marking the first time a GPT-4-class model was accessible without a paid subscription.
On July 18, 2024, OpenAI released GPT-4o mini, a smaller and faster version of GPT-4o designed for high-volume, cost-sensitive applications. It has a 128K context window, supports up to 16,384 output tokens per request, and has a knowledge cutoff of October 2023.
GPT-4o mini is priced at $0.15 per million input tokens and $0.60 per million output tokens, making it more than 60% cheaper than GPT-3.5 Turbo and orders of magnitude cheaper than the original GPT-4.
Despite its small size, GPT-4o mini scored 82.0% on MMLU, compared to 77.9% for Gemini Flash and 73.8% for Claude Haiku. On HumanEval (coding), it scored 87.2%, well above both Gemini Flash (71.5%) and Claude Haiku (75.9%). On MGSM (multilingual math reasoning), it reached 87.0%.
GPT-4 API access was initially limited. When the model launched in March 2023, only developers on a waitlist could access it. OpenAI gradually expanded access throughout 2023 and made the GPT-4 API generally available to all paying developers on July 6, 2023. The 32K context variant remained restricted for longer.
ChatGPT Plus, OpenAI's $20/month subscription, was the primary consumer-facing way to use GPT-4. The subscription launched on February 1, 2023 (initially with GPT-3.5), and GPT-4 was added as an option in March 2023. Plus subscribers could toggle between GPT-3.5 and GPT-4, though GPT-4 had a message cap (originally 25 messages per three hours, later relaxed).
Microsoft, which invested billions in OpenAI, integrated GPT-4 into multiple products. Bing Chat (later renamed Microsoft Copilot) began using GPT-4 in early 2023. Microsoft 365 Copilot, announced in March 2023, embedded GPT-4 into Word, Excel, PowerPoint, Outlook, and Teams. In January 2024, Microsoft launched Copilot Pro ($20/month), giving subscribers priority access to the latest GPT-4 models across Microsoft 365 apps.
GitHub Copilot, Microsoft's AI coding assistant, also integrated GPT-4 for its chat functionality, allowing developers to ask questions about code, generate functions, and debug issues.
GPT-4's API was widely adopted across industries. Companies like Duolingo (language learning), Khan Academy (education), Morgan Stanley (financial document search), and Stripe (fraud detection and documentation) announced GPT-4-powered features shortly after launch. The Be My Eyes partnership for visually impaired users became one of the most cited examples of GPT-4's practical applications.
GPT-4 launched into a rapidly changing competitive field. Within months, several competitors released models with overlapping or superior capabilities.
The performance gap between GPT-4 and GPT-3.5 was large across almost every measured dimension. On MMLU, GPT-4 scored 86.4% versus 70.0% for GPT-3.5. On the Bar Exam, GPT-4 jumped from the 10th to the 90th percentile. On internal factuality benchmarks, GPT-4 scored 40% higher than GPT-3.5. GPT-4 was also better at following complex instructions and producing structured output.
However, GPT-4 was significantly slower and more expensive. GPT-3.5 Turbo remained the default model for cost-sensitive applications throughout 2023 due to its lower latency and much lower price.
Anthropic's Claude 2 launched in July 2023, followed by Claude 3 (Opus, Sonnet, and Haiku) in March 2024. Claude 3 Opus was broadly competitive with GPT-4 Turbo on reasoning and knowledge benchmarks, and it offered a 200K-token context window compared to GPT-4 Turbo's 128K. Claude models were generally considered stronger at long-document analysis and more cautious in their safety behavior. Claude 3.5 Sonnet, released in June 2024, outperformed GPT-4 by 23 points on GPQA (a graduate-level science reasoning benchmark) while costing significantly less.
Google released Gemini 1.0 Ultra in December 2023, positioning it as a GPT-4 competitor. Gemini Ultra slightly outperformed GPT-4 on MMLU (90.0% vs. 86.4%) and offered native multimodal capabilities similar to GPT-4o. Gemini 1.5 Pro, released in February 2024, introduced a 1-million-token context window, far exceeding GPT-4 Turbo's 128K.
| Feature | GPT-4 Turbo | Claude 3 Opus | Gemini 1.5 Pro |
|---|---|---|---|
| MMLU | 86.4% | 86.8% | 81.9% |
| Context window | 128K tokens | 200K tokens | 1M tokens |
| Multimodal input | Text + images | Text + images | Text + images + video + audio |
| Audio support | No (pipeline) | No | Yes (native) |
| API input price (per 1M tokens) | $10.00 | $15.00 | $7.00 |
| API output price (per 1M tokens) | $30.00 | $75.00 | $21.00 |
OpenAI's technical report and system card documented several known weaknesses of GPT-4.
GPT-4 still generates plausible-sounding but false statements. OpenAI acknowledged this directly: "GPT-4 is not fully reliable and still hallucinates facts and makes reasoning errors." While GPT-4 scored 40% higher than GPT-3.5 on internal adversarial factuality evaluations, hallucinations remained a persistent problem. OpenAI noted that hallucinations become more dangerous as models grow more fluent, because users build trust when the model is correct most of the time and then fail to catch the errors.
Despite strong benchmark scores, GPT-4 can fail on problems that require multi-step logical reasoning, especially in novel contexts it has not seen during training. Its performance on competitive programming (Codeforces rating below the 5th percentile) shows that raw coding ability does not translate to algorithmic problem-solving under constraints.
The original GPT-4 had a knowledge cutoff of September 2021, meaning it had no information about events after that date. GPT-4 Turbo updated this to April 2023, and the April 2024 release extended it to December 2023. Users who asked about recent events would receive outdated or incorrect information unless the model was connected to external tools like web browsing.
Although GPT-4 Turbo advertised a 128K-token context window, practical performance degraded at longer input lengths. Independent testing showed attention drift beyond roughly 32K tokens, with the model becoming less reliable at locating and using information placed deep within long inputs.
GPT-4 can reflect biases present in its training data, producing content that perpetuates stereotypes or skews toward certain cultural perspectives. OpenAI's system card noted that the model may amplify biases and that its safety training does not eliminate all problematic outputs.
The safety training that reduced harmful outputs also introduced a tendency to refuse legitimate requests. Users reported that GPT-4 would sometimes decline to answer factual questions or generate benign creative content because the request superficially resembled a disallowed category. OpenAI acknowledged this tradeoff and worked to reduce over-refusal in subsequent model updates.
At launch, GPT-4 was slow and expensive compared to GPT-3.5. The original GPT-4 8K model cost $30 per million input tokens and $60 per million output tokens, roughly 30 times more than GPT-3.5 Turbo. Latency was also higher, making it impractical for real-time applications. This improved significantly with GPT-4 Turbo and GPT-4o.
OpenAI implemented a multi-layered safety approach for GPT-4.
In addition to standard RLHF, OpenAI used a rule-based reward model (RBRM) that applied specific, predefined rules to evaluate model outputs during training. This allowed the safety team to encode precise behavioral guidelines without relying solely on human labeler judgment.
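The idea of a rule-based reward can be shown with a deliberately tiny sketch: a predefined rule checks whether the model's behavior matches policy for the request type and emits a reward accordingly. This is a toy stand-in; OpenAI's actual RBRMs were GPT-4-based zero-shot classifiers judging far richer properties, including refusal style.

```python
def rule_based_reward(prompt_is_disallowed: bool, response: str) -> int:
    """Toy rule-based reward: +1 when the behavior matches policy, -1 otherwise.
    The refusal check here is a crude placeholder for a learned classifier."""
    refused = response.lower().startswith(("i can't", "i cannot", "i won't"))
    if prompt_is_disallowed:
        return 1 if refused else -1   # disallowed requests must be refused
    return 1 if not refused else -1   # benign requests must not be over-refused

print(rule_based_reward(True, "I can't help with that."))   # correct refusal
print(rule_based_reward(False, "I cannot help with that.")) # over-refusal
```

Note that the rule penalizes both failure modes: complying with a disallowed request and refusing a benign one, mirroring the balance described above between safety and over-refusal.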
GPT-4 included a moderation layer that classifies both inputs and outputs. The system filters requests that violate OpenAI's usage policies, including content related to violence, illegal activity, sexual content involving minors, and generation of malware.
OpenAI described its approach as "iterative deployment," releasing GPT-4 to progressively larger groups of users while monitoring for misuse and unexpected behavior. The ChatGPT Plus rollout, API waitlist, and gradual capability expansion (vision was delayed months after launch) all reflected this strategy.
Beyond internal red teaming, OpenAI invited external organizations to evaluate GPT-4's safety properties. The Alignment Research Center (ARC) conducted an early evaluation of GPT-4's ability to autonomously acquire resources and avoid being shut down. ARC concluded that GPT-4 was "ineffective at the autonomous replication task" but noted that future, more capable models could pose such risks.
| Date | Event |
|---|---|
| March 14, 2023 | GPT-4 released; available to ChatGPT Plus subscribers and API waitlist |
| March 23, 2023 | ChatGPT Plugins announced (GPT-4 only) |
| July 6, 2023 | GPT-4 API made generally available to all paying developers |
| September 25, 2023 | GPT-4V(ision) system card published; image input begins rolling out |
| November 6, 2023 | GPT-4 Turbo announced at DevDay (128K context, JSON mode, lower pricing) |
| January 10, 2024 | GPT-4 Turbo with vision enters preview |
| April 9, 2024 | GPT-4 Turbo with vision becomes generally available (gpt-4-turbo-2024-04-09) |
| May 13, 2024 | GPT-4o released (native multimodal: text, vision, audio) |
| July 18, 2024 | GPT-4o mini released (small, fast, cost-efficient variant) |
GPT-4's release accelerated several trends in the AI industry.
GPT-4 pushed competitors to move faster. Google expedited the release of its Gemini models, and Anthropic scaled up Claude. Meta released Llama 2 as an open-weight model partly to offer an alternative to closed-source systems like GPT-4. The period from March 2023 to mid-2024 saw the most intense competition among large language model developers in the history of the field.
GPT-4's improved reliability and instruction-following made it the first LLM that many enterprises considered production-ready. Microsoft's integration into the Office suite, GitHub, and Azure gave GPT-4 distribution at corporate scale. According to OpenAI, more than 92% of Fortune 500 companies were using OpenAI products by early 2024.
The rapid price drops from GPT-4 to GPT-4 Turbo to GPT-4o (a 92% reduction in output token cost over 14 months) put downward pressure on the entire LLM market. Competitors had to match or undercut these prices, making capable language models accessible to startups and individual developers.
GPT-4's commercial success and closed-source nature motivated a wave of open-source and open-weight LLM development. Projects like Llama 2, Mistral, Mixtral, and Falcon aimed to provide GPT-4-level capabilities without dependence on a single API provider. By mid-2024, several open-weight models were approaching GPT-4-level performance on standard benchmarks.
GPT-4's capabilities drew attention from governments worldwide. The European Union's AI Act, finalized in 2024, was partly shaped by debates about the risks posed by models of GPT-4's caliber. In the United States, Sam Altman testified before the Senate Judiciary Committee in May 2023, and OpenAI signed voluntary safety commitments at the White House in July 2023.
GPT-4 remained an active model for roughly two years. OpenAI deprecated the original GPT-4 (8K and 32K variants) in favor of GPT-4 Turbo and later GPT-4o. By early 2025, the original GPT-4 had been removed from ChatGPT in favor of newer models, and its API endpoints were gradually sunset as GPT-4o and subsequent models took over.