GPT-3.5
Last reviewed
Apr 30, 2026
Sources
15 citations
Review status
Source-backed
Revision
v7 ยท 3,731 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Apr 30, 2026
Sources
15 citations
Review status
Source-backed
Revision
v7 ยท 3,731 words
Add missing citations, update stale details, or suggest a clearer explanation.
GPT-3.5 is a family of large language models developed by OpenAI that served as the backbone of the original ChatGPT release on November 30, 2022. Representing a significant step forward from GPT-3, the GPT-3.5 series incorporated techniques from the InstructGPT research program, including supervised fine-tuning and reinforcement learning from human feedback (RLHF), to produce models that were substantially better at following user instructions and generating helpful responses. Through 2023 and much of 2024, GPT-3.5 was the most widely used large language model in the world, powering billions of conversations and API calls before being gradually superseded by newer, more capable models.
The story of GPT-3.5 begins with OpenAI's earlier work on aligning language models with human intent. In January 2022, OpenAI published its InstructGPT paper, which described a three-step process for training language models to follow instructions more reliably [1]. The process involved first collecting demonstration data from human labelers and using it for supervised fine-tuning (SFT), then training a reward model on human comparisons of model outputs, and finally optimizing the language model against that reward model using proximal policy optimization (PPO). Human evaluators preferred InstructGPT outputs over those of the base GPT-3 model 85% of the time, even when the InstructGPT variant used far fewer parameters [2].
The InstructGPT training pipeline established the methodology that would later be applied to GPT-3.5 and subsequent models. The process involved three distinct phases [1][2]:
Phase 1: Supervised Fine-Tuning (SFT). OpenAI hired a team of approximately 40 human labelers to write high-quality demonstrations of desired model behavior for a set of prompts. The SFT dataset contained approximately 13,000 training prompts, drawn from both the OpenAI API and labeler-written examples. The base GPT-3 model was fine-tuned on these demonstrations to produce an initial "instruction-following" model.
Phase 2: Reward Model Training. A separate dataset of approximately 33,000 prompts was used to train the reward model. For each prompt, the SFT model generated multiple outputs, and human labelers ranked them from best to worst. The reward model was trained to predict these human preference rankings, learning to assign higher scores to outputs that humans considered more helpful, accurate, and well-formatted.
Phase 3: PPO Reinforcement Learning. The SFT model was then further optimized using proximal policy optimization (PPO), a reinforcement learning algorithm, with the reward model providing the training signal. The PPO training set contained additional prompts without human labels. OpenAI found that mixing in additional pre-training gradients during PPO training helped prevent the model from losing its general capabilities while being optimized for instruction-following.
The results were striking: the 1.3 billion parameter InstructGPT model was preferred by human evaluators over the 175 billion parameter base GPT-3 model, despite having roughly 100 times fewer parameters. The RLHF training process required less than 2% of the computation and data needed for GPT-3's pre-training, demonstrating that alignment techniques could be remarkably efficient. InstructGPT also showed improvements in truthfulness and reductions in toxic output generation while maintaining minimal performance regressions on public NLP datasets.[1][2]
This three-step process (SFT, reward modeling, PPO) became the template that OpenAI applied to create GPT-3.5 and, later, GPT-4. It also established the RLHF methodology that other AI labs, including Anthropic and Google, would adopt for their own models.
OpenAI applied these alignment techniques to newer base models that had been trained on more recent data and with architectural refinements. On November 28, 2022, OpenAI introduced text-davinci-003, a completion model that demonstrated improved long-form writing quality and instruction-following ability. Two days later, on November 30, OpenAI publicly launched ChatGPT, which was built on a model from what the company termed the "GPT-3.5 series" [3]. The chatbot was released as a free research preview and became a cultural phenomenon almost overnight, reaching 1 million users within five days and 100 million users within two months, making it the fastest-growing consumer application in history at the time [4].
OpenAI has not published a dedicated technical report for GPT-3.5, and the company has disclosed relatively little about the precise architecture and training details. What is known is that GPT-3.5 builds on the decoder-only transformer architecture established by GPT-3, which in its largest configuration uses 175 billion parameters, 96 attention layers, and a batch size of 3.2 million tokens [5]. The GPT-3.5 models are believed to retain this general architecture while benefiting from improvements in training data, training procedures, and the RLHF alignment process described above.
The original GPT-3 was trained on a dataset composed primarily of filtered Common Crawl data (60%), supplemented by WebText2 (22%), Books1 (8%), Books2 (8%), and Wikipedia (3%), totaling roughly 300 billion tokens of training text [5]. The GPT-3.5 models were trained on data with a more recent cutoff (September 2021 for earlier versions, and later cutoffs for subsequent updates), though OpenAI has not released specific details about the composition of the GPT-3.5 training corpus.
The initial context window for GPT-3.5 models was 4,096 tokens, quadrupling GPT-3's 2,048-token limit. This was later expanded to 16,384 tokens with the introduction of dedicated 16K-context variants in June 2023 [6].
The GPT-3.5 family encompasses several distinct model variants that OpenAI released over the course of 2022 and 2023. These ranged from completion-style models accessed through the legacy Completions API to chat-oriented models designed for the Chat Completions API.
| Model | Release Date | Type | Context Window | Key Features |
|---|---|---|---|---|
| text-davinci-003 | November 28, 2022 | Completion | 4,097 tokens | Improved instruction following and long-form generation; last major InstructGPT-style completion model |
| gpt-3.5-turbo-0301 | March 1, 2023 | Chat | 4,097 tokens | First chat-optimized GPT-3.5 model; introduced the Chat Completions API format |
| gpt-3.5-turbo-0613 | June 13, 2023 | Chat | 4,097 tokens | Added function calling support; improved steerability |
| gpt-3.5-turbo-16k-0613 | June 13, 2023 | Chat | 16,384 tokens | Extended context variant; 4x the context of the base model |
| gpt-3.5-turbo-1106 | November 6, 2023 | Chat | 16,385 tokens | JSON mode support; parallel function calling; 16K context as default |
| gpt-3.5-turbo-0125 | January 25, 2024 | Chat | 16,385 tokens | Improved format-following accuracy; fixed non-English function call encoding bug |
| gpt-3.5-turbo-instruct | September 2023 | Completion | 4,097 tokens | Drop-in replacement for text-davinci-003; uses Completions API |
Released on November 28, 2022, text-davinci-003 was the most capable model in the GPT-3 Completions API lineup before the shift to chat-based models. It used the older Completions API format (single prompt in, single completion out) and was trained with RLHF techniques from the InstructGPT program. It excelled at long-form content generation and was widely used for text summarization, creative writing, and complex instruction following. OpenAI deprecated text-davinci-003 on January 4, 2024, recommending gpt-3.5-turbo-instruct as a direct replacement [7].
The launch of gpt-3.5-turbo on March 1, 2023, was a pivotal moment for the OpenAI API ecosystem. This model introduced the Chat Completions API format, which structures interactions as a sequence of messages with roles (system, user, assistant), mirroring the conversational interface of ChatGPT. It was significantly cheaper than text-davinci-003 and became the recommended model for most use cases. The initial version (gpt-3.5-turbo-0301) had a 4,097-token context window and training data through September 2021 [8].
On June 13, 2023, OpenAI released gpt-3.5-turbo-16k-0613, which quadrupled the context window to 16,384 tokens. This allowed developers to process roughly 20 pages of text in a single request, opening up use cases like document summarization, long-form analysis, and extended conversations that previously required workarounds like chunking or summarization chains [6]. The 16K model was priced at twice the per-token rate of the standard 4K model. Starting with the November 2023 update (gpt-3.5-turbo-1106), all gpt-3.5-turbo models defaulted to 16K context, rendering the separate 16K variant unnecessary.
Released in September 2023, gpt-3.5-turbo-instruct was designed as a drop-in replacement for text-davinci-003 and other legacy completion models. Unlike the chat-optimized turbo variants, this model uses the Completions API format and is optimized for single-turn instruction following rather than multi-turn conversation [7].
ChatGPT's launch on November 30, 2022 was one of the most consequential product launches in the history of technology. Built on GPT-3.5, the chatbot was released as a free "research preview" with minimal marketing, yet it immediately captured global attention.[3][4]
ChatGPT shattered every previous record for consumer application adoption:
| Milestone | Time to Reach | Previous Record Holder | Time |
|---|---|---|---|
| 1 million users | 5 days | 75 days | |
| 100 million monthly active users | 2 months | TikTok | 9 months |
| 57 million users | 1 month | - | - |
UBS analysts reported that "in 20 years following the Internet space, we cannot recall a faster ramp in a consumer Internet app." By February 2025, ChatGPT had reached 400 million weekly active users, and by late 2025, the figure exceeded 800 million.[4][14]
The launch triggered a cascade of effects across the technology industry and broader society:
The success of ChatGPT also validated the RLHF approach. While GPT-3 had been available through the API since 2020, it was ChatGPT's conversational polish, derived from RLHF training on GPT-3.5, that made the technology accessible and appealing to a mass audience.
One of GPT-3.5's most significant competitive advantages was its price. At a time when GPT-4 was available but extremely expensive, GPT-3.5 Turbo offered a compelling balance of capability and cost that made it the default choice for most production applications.
| Model | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Period |
|---|---|---|---|
| text-davinci-003 | $20.00 | $20.00 | 2022-2023 |
| gpt-3.5-turbo (initial) | $2.00 | $2.00 | March 2023 |
| gpt-3.5-turbo-16k | $3.00 | $4.00 | June 2023 |
| gpt-3.5-turbo-1106 | $1.00 | $2.00 | November 2023 |
| gpt-3.5-turbo-0125 | $0.50 | $1.50 | January 2024 |
| GPT-4 (8K, for comparison) | $30.00 | $60.00 | March 2023 |
| GPT-4o-mini (for comparison) | $0.15 | $0.60 | July 2024 |
By January 2024, the final GPT-3.5 Turbo variant (gpt-3.5-turbo-0125) was priced at $0.50 per million input tokens and $1.50 per million output tokens. This was 60x cheaper than GPT-4 on input and 40x cheaper on output, making GPT-3.5 the only economically viable option for high-volume applications like customer support bots, content generation pipelines, and real-time chat experiences [9].
OpenAI achieved these dramatic price reductions through a combination of model distillation, inference optimization, and hardware improvements. Each successive version of gpt-3.5-turbo brought lower prices alongside modest quality improvements, demonstrating a consistent trend of making the technology more accessible over time.
On August 22, 2023, OpenAI announced the availability of fine-tuning for gpt-3.5-turbo, marking the first time developers could customize a chat-optimized model through the OpenAI API [10]. This was a highly anticipated feature, as it allowed businesses and developers to adapt GPT-3.5 to specific domains, writing styles, or output formats without relying solely on prompt engineering.
Fine-tuning pricing was set at $8.00 per million training tokens, with inference costs of $3.00 per million input tokens and $6.00 per million output tokens for fine-tuned models. A typical fine-tuning job with 100,000 training tokens (roughly 75,000 words) would cost approximately $0.80 [10]. OpenAI noted that in early tests, a fine-tuned GPT-3.5 Turbo model could match or exceed the performance of base GPT-4 on narrow, well-defined tasks, offering a cost-effective alternative for specialized applications.
The availability of GPT-3.5 fine-tuning catalyzed a significant ecosystem of specialized applications. Common fine-tuning use cases included:
| Use Case | Description | Typical Benefit |
|---|---|---|
| Customer support | Training on company-specific Q&A pairs | Consistent brand voice, domain accuracy |
| Content generation | Fine-tuning on a publication's writing style | Tone consistency, reduced editing |
| Code generation | Training on internal code patterns and conventions | Adherence to coding standards |
| Data extraction | Structured output from unstructured text | Higher accuracy on specific formats |
| Classification | Routing and categorization tasks | Near-GPT-4 accuracy at fraction of cost |
The fine-tuning feature was particularly valuable because it allowed organizations to bridge the gap between GPT-3.5's base capabilities and GPT-4's performance on specific tasks without paying GPT-4 inference costs. For companies processing millions of API calls per day, this price difference translated to savings of tens of thousands of dollars monthly.
Fine-tuning supported the standard Chat Completions format and could handle up to 4,096 tokens per training example. OpenAI later extended fine-tuning support to GPT-4 and subsequent models, but GPT-3.5 Turbo fine-tuning remained popular due to its lower cost and faster training times.
GPT-3.5 served as the default model powering ChatGPT from its launch in November 2022 through July 2024. During this period, the free tier of ChatGPT used GPT-3.5 exclusively, while paid subscribers on the ChatGPT Plus plan ($20/month, introduced February 2023) gained access to GPT-4 with usage limits. This two-tier structure meant that the vast majority of ChatGPT interactions worldwide were handled by GPT-3.5, as most users remained on the free tier.
The model's speed was another advantage in the ChatGPT context. GPT-3.5 generated responses noticeably faster than GPT-4, providing a more fluid conversational experience for everyday tasks like brainstorming, writing assistance, and general Q&A. For many users, the speed difference outweighed GPT-4's superior reasoning capabilities.
With the November 2023 DevDay announcements, the GPT-3.5 Turbo model powering ChatGPT was updated to default to 16K context, improving the chatbot's ability to maintain coherence over longer conversations [11].
During its period as the primary ChatGPT model, GPT-3.5 powered an extraordinary volume of interactions. By mid-2023, ChatGPT users were sending hundreds of millions of messages per day. OpenAI reported that more than 2 million developers were using the ChatGPT API to power chatbots and digital assistants, making it one of the most widely adopted AI platforms in the world. Over 92% of Fortune 500 companies were utilizing OpenAI's products by 2024, with a significant portion of that usage running on GPT-3.5 Turbo due to its cost advantages.[14][15]
OpenAI's revenue grew from approximately $1 billion in 2023 to $3.7 billion in 2024, with GPT-3.5 powering a substantial share of the API volume that contributed to that growth. Since the launch of custom GPTs (GPT Builder) in late 2023, over 3 million custom chatbots were created, many built on GPT-3.5 Turbo as their underlying model.[15]
GPT-3.5 demonstrated strong performance across a wide range of natural language processing tasks, including:
However, GPT-3.5 had notable limitations compared to its successors. It was more prone to hallucination (generating plausible-sounding but incorrect information), less reliable at complex reasoning and multi-step problem solving, and weaker at following intricate or nuanced instructions. On benchmarks like the MMLU (Massive Multitask Language Understanding), GPT-3.5 scored around 70%, while GPT-4 achieved approximately 86% [12]. On coding benchmarks like HumanEval, GPT-3.5 solved roughly 48% of problems compared to GPT-4's 67% [12].
These gaps were significant enough that professional users, researchers, and developers working on tasks requiring high accuracy or complex reasoning typically preferred GPT-4 despite its higher cost. But for general-purpose conversational use, content drafting, and straightforward text processing, GPT-3.5 remained more than adequate for most users' needs.
OpenAI has followed a systematic deprecation schedule for GPT-3.5 model variants, progressively retiring older snapshot versions while maintaining the generic gpt-3.5-turbo endpoint.
| Model | Deprecated | Shutdown Date | Replacement |
|---|---|---|---|
| text-davinci-003 | July 2023 | January 4, 2024 | gpt-3.5-turbo-instruct |
| gpt-3.5-turbo-0301 | June 2023 | June 13, 2024 | gpt-3.5-turbo-0125 |
| gpt-3.5-turbo-0613 | November 2023 | June 13, 2024 | gpt-3.5-turbo-0125 |
| gpt-3.5-turbo-16k-0613 | November 2023 | June 13, 2024 | gpt-3.5-turbo-0125 |
| gpt-3.5-turbo-1106 | November 2023 | Ongoing | gpt-3.5-turbo-0125 |
The deprecation of GPT-3.5 snapshot models reflects OpenAI's broader strategy of consolidating around the latest model versions and encouraging migration to newer model families [7].
The deprecation of GPT-3.5 models created significant work for developers who had built applications around specific model snapshots. OpenAI's deprecation policy typically provided a minimum of three months' notice before shutting down access to a model version, but in practice, developers often needed to retune prompts, update integration code, and retest applications with the replacement model. Each model version had subtle differences in behavior, tone, and capability that could affect production applications.
The transition was particularly challenging for organizations that had invested in fine-tuned GPT-3.5 models. Fine-tuned model snapshots followed the same deprecation schedule as their base models, meaning that organizations had to re-run their fine-tuning jobs on newer base models and validate that the fine-tuned outputs still met their quality requirements.
On July 18, 2024, OpenAI released GPT-4o-mini, a compact, cost-optimized variant of GPT-4o that was explicitly positioned as the successor to GPT-3.5 Turbo [13]. GPT-4o-mini matched or exceeded GPT-3.5 Turbo on virtually every benchmark while being significantly cheaper ($0.15 per million input tokens vs. $0.50) and adding multimodal capabilities (the ability to process images alongside text).
With the release of GPT-4o-mini, OpenAI replaced GPT-3.5 as the default model for free-tier ChatGPT users. This transition marked the effective end of GPT-3.5's role as the workhorse model for OpenAI's consumer product, though the API endpoint remains available for existing integrations.
OpenAI's official recommendation as of mid-2024 is to use GPT-4o-mini as a drop-in replacement for gpt-3.5-turbo in all new projects, citing better performance, lower cost, and additional capabilities [13].
GPT-3.5 occupies a unique position in the history of artificial intelligence. While GPT-3 had demonstrated the potential of large language models in 2020, and GPT-4 would later push the boundaries of model capability, it was GPT-3.5, through ChatGPT, that brought conversational AI into the mainstream. The model's combination of reasonable quality, fast response times, and eventual low cost made it accessible to hundreds of millions of people worldwide.
Several aspects of GPT-3.5's legacy stand out:
Democratization of AI access. By powering the free tier of ChatGPT, GPT-3.5 gave ordinary users their first hands-on experience with a capable language model. This drove an enormous wave of public interest and experimentation that accelerated AI adoption across industries.
Validation of RLHF. The success of ChatGPT demonstrated that RLHF could transform a base language model into a product that millions of people found genuinely useful. The InstructGPT-to-GPT-3.5 pipeline became a template that other AI labs, including Anthropic and Google, adopted for their own models.
Establishing the API economy. GPT-3.5 Turbo's combination of low cost and adequate quality spawned an entire ecosystem of applications, startups, and tools built on the OpenAI API. Companies like Jasper, Copy.ai, and countless others built businesses primarily on GPT-3.5 before later upgrading to newer models.
Setting the standard for model pricing. The successive price drops of GPT-3.5 Turbo, from $2.00/M tokens at launch to $0.50/M tokens in January 2024, established user expectations for rapid cost decreases in the LLM market. This pricing pressure influenced the strategies of competitors and contributed to the broader trend of falling inference costs across the industry.
Fine-tuning as a product feature. The availability of GPT-3.5 fine-tuning in August 2023 popularized the concept of customizing foundation models for specific business needs, paving the way for fine-tuning support across the industry.
Industry-wide adoption. By 2024, 92% of Fortune 500 companies were using OpenAI's products, with many having started on GPT-3.5 Turbo. The education sector led adoption, with 56% of universities and 42% of K-12 schools utilizing GPT-based tools. Marketing professionals (77%), consultants (71%), and advertisers (67%) reported using ChatGPT in their work.[15]
Though GPT-3.5 is no longer at the frontier of language model capability, its role in catalyzing the AI revolution of 2023-2024 ensures its place as one of the most consequential AI systems ever deployed.