GPT-4 (Generative Pre-trained Transformer 4) is a large language model developed by OpenAI. Released on March 14, 2023, it is the fourth model in the GPT series and was the first GPT model to accept both text and image inputs, making it a multimodal system. GPT-4 represented a major performance leap over its predecessor, GPT-3.5, scoring in the top percentiles on a range of professional and academic exams, including the Uniform Bar Exam (approximately the 90th percentile) and the SAT.
OpenAI chose not to disclose technical details about GPT-4's architecture, parameter count, training data, or hardware. The accompanying technical report explicitly stated that such information was withheld due to "the competitive landscape and the safety implications of large-scale models." CEO Sam Altman confirmed that the training cost exceeded $100 million.
GPT-4 was initially available through ChatGPT Plus (a $20/month subscription) and the OpenAI API. It has since been succeeded by several variants, including GPT-4 Turbo (November 2023), GPT-4o (May 2024), and GPT-4o mini (July 2024), each improving on cost, speed, or capability.
Like its predecessors, GPT-4 is a Transformer-based model pre-trained on large datasets of text taken from the internet. During pre-training, the model learned to predict the next token (roughly corresponding to a word or subword) in a sequence. According to leaked reports from SemiAnalysis and other industry analysts, GPT-4 uses a mixture of experts (MoE) architecture with approximately 1.8 trillion total parameters spread across 120 layers. The model reportedly contains 16 expert sub-networks, each with roughly 111 billion parameters in the MLP layers, and uses a top-2 routing approach where each token is processed by two experts per forward pass. OpenAI has never confirmed these figures.
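The reported top-2 routing can be illustrated with a toy sketch: a router scores every expert for each token, the two highest-scoring experts process the token, and their outputs are combined with renormalized gate weights. This is a minimal illustration of the general mixture-of-experts mechanism, not OpenAI's implementation; the expert count and top-k values are the unconfirmed leaked figures.

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 16   # reported expert count (unconfirmed)
TOP_K = 2          # reported experts routed per token (unconfirmed)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(router_logits):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:TOP_K]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

def moe_layer(token, router_logits, experts):
    """Combine the selected experts' outputs, weighted by the gate."""
    return sum(weight * experts[i](token) for i, weight in route_token(router_logits))

# Toy experts: each just scales its scalar input by a different factor.
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
print(moe_layer(1.0, logits, experts))
```

The key property is that only `TOP_K` of the `NUM_EXPERTS` sub-networks run for any given token, which is what keeps inference cost far below what the total parameter count would suggest.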
The training dataset reportedly consisted of approximately 13 trillion tokens drawn from both publicly available internet text and data licensed from third-party providers, supplemented by code-based data. Some fine-tuning data was sourced from Scale AI and internal teams.
Microsoft built a custom Azure supercomputer with over 10,000 GPUs and high-bandwidth networking specifically for OpenAI's training workloads. GPT-3.5 served as an early test run on this infrastructure before GPT-4 training began. Sam Altman stated that training GPT-4 cost over $100 million in compute alone, and OpenAI spends around $200 million per year maintaining its supercomputing systems.
After pre-training, OpenAI fine-tuned GPT-4 using reinforcement learning from human feedback (RLHF). Human reviewers ranked model outputs by quality and safety, and this feedback trained a reward model that guided further optimization. GPT-4 also incorporated an additional safety reward signal during RLHF, provided by a GPT-4 zero-shot classifier that judged safety boundaries and response style on safety-related prompts.
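The reward-model step of RLHF is commonly trained with a pairwise (Bradley-Terry style) objective: given a human-preferred output and a rejected one, the loss pushes the reward model to score the preferred output higher. The sketch below shows that objective in isolation; it is a generic illustration of the technique, not OpenAI's training code.

```python
import math

def pairwise_reward_loss(score_chosen: float, score_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)): the loss is low when the reward
    model already assigns the human-preferred output a higher score, and high
    when the ranking is inverted."""
    diff = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# The loss shrinks as the reward margin for the preferred output grows.
for margin in (0.0, 1.0, 2.0):
    print(margin, round(pairwise_reward_loss(margin, 0.0), 4))
```

A reward model trained this way is then used as the optimization target for the policy (the language model) during the reinforcement-learning phase.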
To build a diverse training signal for safety alignment, OpenAI drew from multiple sources: labeled production data, outputs from human red-teaming sessions, and model-generated prompts. The safety reward was applied across both allowed and disallowed content categories to prevent the model from over-refusing legitimate requests.
OpenAI engaged over 50 external experts from fields including AI alignment, cybersecurity, biosecurity, and international security to adversarially test GPT-4 before release. These red teamers probed the model for dangerous capabilities and failure modes, including potential for generating harmful content, assisting with weapons development, and facilitating social engineering.
The results of these safety interventions were measurable. Compared to GPT-3.5, GPT-4 was 82% less likely to respond to requests for disallowed content. It also complied with OpenAI's policies on sensitive topics (such as medical advice and self-harm) 29% more often than GPT-3.5. OpenAI published both a technical report and a system card documenting these evaluations.
OpenAI's technical report for GPT-4 contains no details about the model's size, architecture, hardware, training compute, or dataset construction. Everything known about the architecture therefore comes from unofficial sources.
| Detail | Reported value | Source |
|---|---|---|
| Total parameters | ~1.8 trillion | SemiAnalysis (leaked) |
| Number of layers | ~120 | SemiAnalysis (leaked) |
| Expert count (MoE) | 16 | SemiAnalysis (leaked) |
| Experts routed per token | 2 | SemiAnalysis (leaked) |
| MLP parameters per expert | ~111 billion | SemiAnalysis (leaked) |
| Training tokens | ~13 trillion | SemiAnalysis (leaked) |
| Training cost | >$100 million | Sam Altman (confirmed) |
| Context window (original) | 8,192 or 32,768 tokens | OpenAI (official) |
| Knowledge cutoff (original) | September 2021 | OpenAI (official) |
The mixture-of-experts approach, if accurate, explains how GPT-4 could contain far more parameters than GPT-3 (175 billion) while keeping inference costs manageable. Only a fraction of the total parameters are active for any given token, since each token is routed to just two of the 16 experts.
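A back-of-envelope calculation using the leaked (and unconfirmed) figures makes this concrete. Treating everything outside the expert MLPs as shared parameters is a simplification, so the exact split should not be taken literally:

```python
# All figures are the leaked, unconfirmed SemiAnalysis estimates.
TOTAL_PARAMS = 1.8e12           # ~1.8 trillion total
EXPERTS = 16
PARAMS_PER_EXPERT_MLP = 111e9   # ~111B per expert's MLP
ACTIVE_EXPERTS = 2              # top-2 routing

expert_params_total = EXPERTS * PARAMS_PER_EXPERT_MLP   # ~1.78T in expert MLPs
shared_params = TOTAL_PARAMS - expert_params_total      # attention etc. (rough)
active_params = shared_params + ACTIVE_EXPERTS * PARAMS_PER_EXPERT_MLP

print(f"active per token: ~{active_params / 1e9:.0f}B "
      f"of {TOTAL_PARAMS / 1e12:.1f}T ({active_params / TOTAL_PARAMS:.0%})")
```

Under these assumptions, only a small fraction of the total parameters participates in each forward pass, which is the whole point of the sparse design.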
GPT-4 produces text that is substantially more coherent, accurate, and nuanced than GPT-3.5. It can follow complex multi-step instructions, write code in dozens of programming languages, draft legal documents, solve math problems, and translate between languages. On natural language processing benchmarks, it set new records at launch across multiple categories.
One of GPT-4's most notable strengths at release was its improved ability to follow instructions. It could adopt specific personas through system messages, generate output in structured formats like JSON or XML, and maintain consistency across long conversations.
GPT-4 was the first model in the GPT series to accept image inputs alongside text. Users could upload photographs, charts, screenshots, and handwritten notes, and the model would describe, analyze, or answer questions about them. OpenAI demonstrated this capability early on with examples like identifying objects in photos and reading text from images of documents.
The vision capability was not available at launch. OpenAI released the GPT-4V(ision) system card on September 25, 2023, and began rolling out image input to ChatGPT Plus and Enterprise users shortly after.
One early deployment partner was Be My Eyes, a company that develops assistive technology for blind and low-vision users. Beginning in March 2023, Be My Eyes and OpenAI collaborated on "Be My AI," a tool that used GPT-4's vision capabilities to describe the visual world. By September 2023, the beta test group had grown to 16,000 users requesting an average of 25,000 image descriptions per day.
GPT-4 introduced improved support for system messages, which allow developers and users to set the model's behavior, tone, and constraints at the start of a conversation. This feature gave developers finer control over outputs compared to GPT-3.5, enabling applications ranging from customer service bots with specific personas to coding assistants restricted to particular languages.
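The shape of such a request can be sketched as a plain payload: the system message comes first and sets persona and constraints, followed by the user turns. The model name, field values, and content below are illustrative, and no request is actually sent.

```python
# Shape of a chat request body that uses a system message to pin down
# persona and output constraints. Values are illustrative only.
request_body = {
    "model": "gpt-4",
    "messages": [
        {"role": "system",
         "content": "You are a terse SQL tutor. Answer only with SQL."},
        {"role": "user",
         "content": "Show all customers who ordered in 2023."},
    ],
    "temperature": 0.2,
}

# The system message precedes all user turns.
roles = [m["role"] for m in request_body["messages"]]
print(roles)
```

Because the system message applies to the whole conversation, a developer can enforce a persona or format once rather than repeating instructions in every user turn.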
GPT-4's most widely reported result at launch was its performance on standardized exams. While GPT-3.5 generally scored in the lower percentiles, GPT-4 performed at or above the level of most human test-takers on many professional and academic tests.
| Exam | GPT-4 Points | GPT-4 Percentile | GPT-4 (no vision) Points | GPT-4 (no vision) Percentile | GPT-3.5 Points | GPT-3.5 Percentile |
|---|---|---|---|---|---|---|
| Uniform Bar Exam (MBE+MEE+MPT) | 298 / 400 | ~90th | 298 / 400 | ~90th | 213 / 400 | ~10th |
| LSAT | 163 | ~88th | 161 | ~83rd | 149 | ~40th |
| SAT Evidence-Based Reading & Writing | 710 / 800 | ~93rd | 710 / 800 | ~93rd | 670 / 800 | ~87th |
| SAT Math | 700 / 800 | ~89th | 690 / 800 | ~89th | 590 / 800 | ~70th |
| Graduate Record Examination (GRE) Quantitative | 163 / 170 | ~80th | 157 / 170 | ~62nd | 147 / 170 | ~25th |
| Graduate Record Examination (GRE) Verbal | 169 / 170 | ~99th | 165 / 170 | ~96th | 154 / 170 | ~63rd |
| Graduate Record Examination (GRE) Writing | 4 / 6 | ~54th | 4 / 6 | ~54th | 4 / 6 | ~54th |
| USABO Semifinal Exam 2020 | 87 / 150 | 99th-100th | 87 / 150 | 99th-100th | 43 / 150 | 31st-33rd |
| USNCO Local Section Exam 2022 | 36 / 60 | — | 38 / 60 | — | 24 / 60 | — |
| Medical Knowledge Self-Assessment Program | 75% | — | 75% | — | 53% | — |
| Codeforces Rating | 392 | below 5th | 392 | below 5th | 260 | below 5th |
| AP Art History | 5 | 86th-100th | 5 | 86th-100th | 5 | 86th-100th |
| AP Biology | 5 | 85th-100th | 5 | 85th-100th | 4 | 62nd-85th |
| AP Calculus BC | 4 | 43rd-59th | 4 | 43rd-59th | 1 | 0th-7th |
The jump from GPT-3.5 to GPT-4 was especially dramatic on the Bar Exam, where GPT-4 rose from the 10th percentile to the 90th, and on the LSAT, where it moved from the 40th to the 88th percentile. GRE Verbal performance reached the 99th percentile. However, GPT-4 still scored below the 5th percentile on competitive programming (Codeforces), indicating that while it could write functional code, it struggled with the algorithmic problem-solving required in programming competitions.
| Benchmark | GPT-4 (shots) | GPT-3.5 (shots) | LM SOTA (best external LM, few-shot) | SOTA (best external model; may include benchmark-specific training) |
|---|---|---|---|---|
| MMLU | 86.4% (5-shot) | 70.0% (5-shot) | 70.7% (U-PaLM, 5-shot) | 75.2% (Flan-PaLM, 5-shot) |
| HellaSwag | 95.3% (10-shot) | 85.5% (10-shot) | 84.2% (LLaMA, validation set) | 85.6% (ALUM) |
| AI2 Reasoning Challenge (ARC) | 96.3% (25-shot) | 85.2% (25-shot) | 85.2% (PaLM, 8-shot) | 86.5% (ST-MOE) |
| WinoGrande | 87.5% (5-shot) | 81.6% (5-shot) | 85.1% (PaLM, 5-shot) | 85.1% (PaLM, 5-shot) |
| HumanEval | 67.0% (0-shot) | 48.1% (0-shot) | 26.2% (PaLM, 0-shot) | 65.8% (CodeT + GPT-3.5) |
| DROP (F1 score) | 80.9 (3-shot) | 64.1 (3-shot) | 70.8 (PaLM, 1-shot) | 88.4 (QDGAT) |
GPT-4 achieved 86.4% on MMLU (Massive Multitask Language Understanding), a benchmark that tests knowledge across 57 academic subjects. This was more than 16 percentage points above GPT-3.5 and exceeded the previous best language model result (70.7% from U-PaLM). On HellaSwag, a commonsense reasoning benchmark, GPT-4 scored 95.3%. On the ARC (AI2 Reasoning Challenge), it reached 96.3%.
Code generation, measured by HumanEval, improved from 48.1% (GPT-3.5) to 67.0% (GPT-4), surpassing the previous best result of 65.8% achieved by CodeT combined with GPT-3.5.
| Benchmark | GPT-4 (shots) | Few-shot SOTA | SOTA (best external model; may include benchmark-specific training) |
|---|---|---|---|
| VQAv2 | 77.2% (0-shot) | 67.6% (Flamingo, 32-shot) | 84.3% |
| TextVQA | 78.0% (0-shot) | 37.9% (Flamingo, 32-shot) | 71.8% (PaLI-17B) |
| ChartQA | 78.5% | — | 58.6% (Pix2Struct Large) |
| AI2 Diagram (AI2D) | 78.2% (0-shot) | — | 42.1% (Pix2Struct Large) |
| DocVQA | 88.4% (0-shot, pixel-only) | — | 88.4% (ERNIE-Layout 2.0) |
| Infographic VQA | 75.1% (0-shot, pixel-only) | — | 61.2% (Applica.ai TILT) |
| TVQA | 87.3% (0-shot) | — | 86.5% (MERLOT Reserve Large) |
| LSMDC | 45.7% (0-shot) | 31.0% (MERLOT Reserve, 0-shot) | 52.9% |
GPT-4's zero-shot performance on visual question answering tasks was competitive with or superior to models that had been specifically trained on those benchmarks. On DocVQA, GPT-4 matched the previous state-of-the-art score of 88.4% without any task-specific training. On TextVQA, GPT-4's 78.0% exceeded the prior best of 71.8% from PaLI-17B.
GPT-4 launched with two context window sizes; later variants expanded the window considerably:
| Variant | Context window | Approximate page equivalent |
|---|---|---|
| gpt-4 (8K) | 8,192 tokens | ~12 pages |
| gpt-4-32k | 32,768 tokens | ~50 pages |
| gpt-4-turbo | 128,000 tokens | ~300 pages |
| gpt-4o | 128,000 tokens | ~300 pages |
The original 8K variant was the most widely available. The 32K variant was released to a limited set of API users. When GPT-4 Turbo launched in November 2023, the context window expanded to 128,000 tokens, roughly equivalent to 300 pages of text. GPT-4o retained the 128K window.
In practice, performance on long-context tasks degraded as input length grew. Independent evaluations found that GPT-4 Turbo's attention quality dropped noticeably beyond approximately 32,000 tokens, with reduced accuracy on needle-in-a-haystack retrieval tasks at the upper end of the context window.
On November 6, 2023, at OpenAI's first DevDay conference, the company announced GPT-4 Turbo. The new model introduced several improvements over the original GPT-4.
GPT-4 Turbo expanded the context window from 8K/32K tokens to 128,000 tokens, allowing users to include far more text in a single prompt. Its training data knowledge cutoff was updated to April 2023 (later extended to December 2023 in the April 2024 release). The model also added JSON mode, which constrains outputs to valid JSON, and improved function calling, allowing multiple functions to be invoked in a single API call.
Instruction-following was notably better. GPT-4 Turbo was more reliable at producing output in specific formats like XML, markdown tables, or structured data, and it more consistently adhered to system message constraints.
GPT-4 Turbo was significantly cheaper than the original GPT-4:
| Model | Input cost (per 1M tokens) | Output cost (per 1M tokens) |
|---|---|---|
| GPT-4 (8K) | $30.00 | $60.00 |
| GPT-4 (32K) | $60.00 | $120.00 |
| GPT-4 Turbo | $10.00 | $30.00 |
| GPT-4o | $2.50 | $10.00 |
| GPT-4o mini | $0.15 | $0.60 |
Input tokens cost one-third as much as on the original GPT-4, and output tokens half as much. This made GPT-4-level intelligence accessible to a much wider range of applications.
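The pricing relationships in the table reduce to simple per-token arithmetic. The sketch below computes the cost of one workload across three generations, using the table's list prices (USD per million tokens); model keys are illustrative labels, not API model IDs.

```python
# USD per 1M tokens, taken from the pricing table above.
PRICES = {
    "gpt-4-8k":    {"in": 30.00, "out": 60.00},
    "gpt-4-turbo": {"in": 10.00, "out": 30.00},
    "gpt-4o":      {"in": 2.50,  "out": 10.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the listed per-million-token rates."""
    p = PRICES[model]
    return (input_tokens * p["in"] + output_tokens * p["out"]) / 1_000_000

# Same workload (100K input tokens, 20K output tokens) across generations:
for model in PRICES:
    print(model, round(request_cost(model, 100_000, 20_000), 4))
```

Running the same workload through each generation shows the scale of the drop: the GPT-4o cost is roughly a tenth of the original GPT-4 cost.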
GPT-4 Turbo initially launched as a preview model (gpt-4-1106-preview). The generally available version with vision support, gpt-4-turbo-2024-04-09, shipped on April 9, 2024, with a knowledge cutoff of December 2023.
On May 13, 2024, OpenAI released GPT-4o (the "o" stands for "omni"). GPT-4o was a new model trained end-to-end across text, vision, and audio, meaning all input modalities are handled by a single neural network rather than separate models piped together.
GPT-4o accepts text, images, and audio as input, and can produce text, images, and audio as output. Its audio processing was a step change from earlier models. Previous GPT versions used a pipeline of separate models to handle voice (speech-to-text, then the language model, then text-to-speech). GPT-4o processes audio natively, allowing it to respond to spoken input in as little as 232 milliseconds, with an average latency of 320 milliseconds. This is roughly comparable to human conversational response time.
In terms of text and code performance, GPT-4o matched GPT-4 Turbo on English-language tasks and significantly outperformed it on non-English languages. It supported over 50 languages at launch, which OpenAI estimated covered more than 97% of the world's speakers.
GPT-4o was 50% cheaper than GPT-4 Turbo in the API and ran roughly twice as fast. The initial pricing was $5 per million input tokens and $15 per million output tokens, later reduced to $2.50 input and $10.00 output. OpenAI also made GPT-4o available to free-tier ChatGPT users with usage limits, marking the first time a GPT-4-class model was accessible without a paid subscription.
On July 18, 2024, OpenAI released GPT-4o mini, a smaller and faster version of GPT-4o designed for high-volume, cost-sensitive applications. It has a 128K context window, supports up to 16,384 output tokens per request, and has a knowledge cutoff of October 2023.
GPT-4o mini is priced at $0.15 per million input tokens and $0.60 per million output tokens, making it more than 60% cheaper than GPT-3.5 Turbo and orders of magnitude cheaper than the original GPT-4.
Despite its small size, GPT-4o mini scored 82.0% on MMLU, compared to 77.9% for Gemini Flash and 73.8% for Claude Haiku. On HumanEval (coding), it scored 87.2%, well above both Gemini Flash (71.5%) and Claude Haiku (75.9%). On MGSM (multilingual math reasoning), it reached 87.0%.
GPT-4 API access was initially limited. When the model launched in March 2023, only developers on a waitlist could access it. OpenAI gradually expanded access throughout 2023 and made the GPT-4 API generally available to all paying developers on July 6, 2023. The 32K context variant remained restricted for longer.
ChatGPT Plus, OpenAI's $20/month subscription, was the primary consumer-facing way to use GPT-4. The subscription launched on February 1, 2023 (initially with GPT-3.5), and GPT-4 was added as an option in March 2023. Plus subscribers could toggle between GPT-3.5 and GPT-4, though GPT-4 had a message cap (originally 25 messages per three hours, later relaxed).
Microsoft, which invested billions in OpenAI, integrated GPT-4 into multiple products. Bing Chat (later renamed Microsoft Copilot) began using GPT-4 in early 2023. Microsoft 365 Copilot, announced in March 2023, embedded GPT-4 into Word, Excel, PowerPoint, Outlook, and Teams. In January 2024, Microsoft launched Copilot Pro ($20/month), giving subscribers priority access to the latest GPT-4 models across Microsoft 365 apps.
GitHub Copilot, Microsoft's AI coding assistant, also integrated GPT-4 for its chat functionality, allowing developers to ask questions about code, generate functions, and debug issues.
GPT-4's API was widely adopted across industries. Companies like Duolingo (language learning), Khan Academy (education), Morgan Stanley (financial document search), and Stripe (fraud detection and documentation) announced GPT-4-powered features shortly after launch. The Be My Eyes partnership for visually impaired users became one of the most cited examples of GPT-4's practical applications.
GPT-4 launched into a rapidly changing competitive field. Within months, several competitors released models with overlapping or superior capabilities.
The performance gap between GPT-4 and GPT-3.5 was large across almost every measured dimension. On MMLU, GPT-4 scored 86.4% versus 70.0% for GPT-3.5. On the Bar Exam, GPT-4 jumped from the 10th to the 90th percentile. On internal factuality benchmarks, GPT-4 scored 40% higher than GPT-3.5. GPT-4 was also better at following complex instructions and producing structured output.
However, GPT-4 was significantly slower and more expensive. GPT-3.5 Turbo remained the default model for cost-sensitive applications throughout 2023 due to its lower latency and much lower price.
Anthropic's Claude 2 launched in July 2023, followed by Claude 3 (Opus, Sonnet, and Haiku) in March 2024. Claude 3 Opus was broadly competitive with GPT-4 Turbo on reasoning and knowledge benchmarks, and it offered a 200K-token context window compared to GPT-4 Turbo's 128K. Claude models were generally considered stronger at long-document analysis and more cautious in their safety behavior. Claude 3.5 Sonnet, released in June 2024, outperformed GPT-4 by 23 points on GPQA (a graduate-level science reasoning benchmark) while costing significantly less.
Google released Gemini 1.0 Ultra in December 2023, positioning it as a GPT-4 competitor. Gemini Ultra slightly outperformed GPT-4 on MMLU (90.0% vs. 86.4%) and offered native multimodal capabilities similar to GPT-4o. Gemini 1.5 Pro, released in February 2024, introduced a 1-million-token context window, far exceeding GPT-4 Turbo's 128K.
| Feature | GPT-4 Turbo | Claude 3 Opus | Gemini 1.5 Pro |
|---|---|---|---|
| MMLU | 86.4% | 86.8% | 81.9% |
| Context window | 128K tokens | 200K tokens | 1M tokens |
| Multimodal input | Text + images | Text + images | Text + images + video + audio |
| Audio support | No (pipeline) | No | Yes (native) |
| API input price (per 1M tokens) | $10.00 | $15.00 | $7.00 |
| API output price (per 1M tokens) | $30.00 | $75.00 | $21.00 |
OpenAI's technical report and system card documented several known weaknesses of GPT-4.
GPT-4 still generates plausible-sounding but false statements. OpenAI acknowledged this directly: "GPT-4 is not fully reliable and still hallucinates facts and makes reasoning errors." While GPT-4 scored 40% higher than GPT-3.5 on internal adversarial factuality evaluations, hallucinations remained a persistent problem. OpenAI noted that hallucinations become more dangerous as models grow more fluent, because users build trust when the model is correct most of the time and then fail to catch the errors.
Despite strong benchmark scores, GPT-4 can fail on problems that require multi-step logical reasoning, especially in novel contexts it has not seen during training. Its performance on competitive programming (Codeforces rating below the 5th percentile) shows that raw coding ability does not translate to algorithmic problem-solving under constraints.
The original GPT-4 had a knowledge cutoff of September 2021, meaning it had no information about events after that date. GPT-4 Turbo updated this to April 2023, and the April 2024 release extended it to December 2023. Users who asked about recent events would receive outdated or incorrect information unless the model was connected to external tools like web browsing.
Although GPT-4 Turbo advertised a 128K-token context window, practical performance degraded at longer input lengths. Independent testing showed attention drift beyond roughly 32K tokens, with the model becoming less reliable at locating and using information placed deep within long inputs.
GPT-4 can reflect biases present in its training data, producing content that perpetuates stereotypes or skews toward certain cultural perspectives. OpenAI's system card noted that the model may amplify biases and that its safety training does not eliminate all problematic outputs.
The safety training that reduced harmful outputs also introduced a tendency to refuse legitimate requests. Users reported that GPT-4 would sometimes decline to answer factual questions or generate benign creative content because the request superficially resembled a disallowed category. OpenAI acknowledged this tradeoff and worked to reduce over-refusal in subsequent model updates.
At launch, GPT-4 was slow and expensive compared to GPT-3.5. The original GPT-4 8K model cost $30 per million input tokens and $60 per million output tokens, roughly 30 times more than GPT-3.5 Turbo. Latency was also higher, making it impractical for real-time applications. This improved significantly with GPT-4 Turbo and GPT-4o.
OpenAI implemented a multi-layered safety approach for GPT-4.
In addition to standard RLHF, OpenAI used a rule-based reward model (RBRM) that applied specific, predefined rules to evaluate model outputs during training. This allowed the safety team to encode precise behavioral guidelines without relying solely on human labeler judgment.
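The idea of a rule-based reward can be shown with a deliberately tiny sketch: a predefined rule checks whether the model's behavior matches policy for the request type and emits a reward accordingly. This is a toy stand-in; OpenAI's actual RBRMs were GPT-4-based zero-shot classifiers judging far richer properties, including refusal style.

```python
def rule_based_reward(prompt_is_disallowed: bool, response: str) -> int:
    """Toy rule-based reward: +1 when the behavior matches policy, -1 otherwise.
    The refusal check here is a crude placeholder for a learned classifier."""
    refused = response.lower().startswith(("i can't", "i cannot", "i won't"))
    if prompt_is_disallowed:
        return 1 if refused else -1   # disallowed requests must be refused
    return 1 if not refused else -1   # benign requests must not be over-refused

print(rule_based_reward(True, "I can't help with that."))   # correct refusal
print(rule_based_reward(False, "I cannot help with that.")) # over-refusal
```

Note that the rule penalizes both failure modes: complying with a disallowed request and refusing a benign one, mirroring the balance described above between safety and over-refusal.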
GPT-4 included a moderation layer that classifies both inputs and outputs. The system filters requests that violate OpenAI's usage policies, including content related to violence, illegal activity, sexual content involving minors, and generation of malware.
OpenAI described its approach as "iterative deployment," releasing GPT-4 to progressively larger groups of users while monitoring for misuse and unexpected behavior. The ChatGPT Plus rollout, API waitlist, and gradual capability expansion (vision was delayed months after launch) all reflected this strategy.
Beyond internal red teaming, OpenAI invited external organizations to evaluate GPT-4's safety properties. The Alignment Research Center (ARC) conducted an early evaluation of GPT-4's ability to autonomously acquire resources and avoid being shut down. ARC concluded that GPT-4 was "ineffective at the autonomous replication task" but noted that future, more capable models could pose such risks.
| Date | Event |
|---|---|
| March 14, 2023 | GPT-4 released; available to ChatGPT Plus subscribers and API waitlist |
| March 23, 2023 | ChatGPT Plugins announced (GPT-4 only) |
| July 6, 2023 | GPT-4 API made generally available to all paying developers |
| September 25, 2023 | GPT-4V(ision) system card published; image input begins rolling out |
| November 6, 2023 | GPT-4 Turbo announced at DevDay (128K context, JSON mode, lower pricing) |
| January 10, 2024 | GPT-4 Turbo with vision enters preview |
| April 9, 2024 | GPT-4 Turbo with vision becomes generally available (gpt-4-turbo-2024-04-09) |
| May 13, 2024 | GPT-4o released (native multimodal: text, vision, audio) |
| July 18, 2024 | GPT-4o mini released (small, fast, cost-efficient variant) |
GPT-4's release accelerated several trends in the AI industry.
GPT-4 pushed competitors to move faster. Google expedited the release of its Gemini models, and Anthropic scaled up Claude. Meta released Llama 2 as an open-weight model partly to offer an alternative to closed-source systems like GPT-4. The period from March 2023 to mid-2024 saw the most intense competition among large language model developers in the history of the field.
GPT-4's improved reliability and instruction-following made it the first LLM that many enterprises considered production-ready. Microsoft's integration into the Office suite, GitHub, and Azure gave GPT-4 distribution at corporate scale. According to OpenAI, more than 92% of Fortune 500 companies were using OpenAI products by early 2024.
The rapid price drops from GPT-4 to GPT-4 Turbo to GPT-4o (a 92% reduction in output token cost over 14 months) put downward pressure on the entire LLM market. Competitors had to match or undercut these prices, making capable language models accessible to startups and individual developers.
GPT-4's commercial success and closed-source nature motivated a wave of open-source and open-weight LLM development. Projects like Llama 2, Mistral, Mixtral, and Falcon aimed to provide GPT-4-level capabilities without dependence on a single API provider. By mid-2024, several open-weight models were approaching GPT-4-level performance on standard benchmarks.
GPT-4's capabilities drew attention from governments worldwide. The European Union's AI Act, finalized in 2024, was partly shaped by debates about the risks posed by models of GPT-4's caliber. In the United States, Sam Altman testified before the Senate Judiciary Committee in May 2023, and OpenAI signed voluntary safety commitments at the White House in July 2023.
GPT-4 remained an active model for roughly two years. OpenAI deprecated the original GPT-4 (8K and 32K variants) in favor of GPT-4 Turbo and later GPT-4o. By early 2025, the original GPT-4 had been removed from ChatGPT in favor of newer models, and its API endpoints were gradually sunset as GPT-4o and subsequent models took over.