Inflection 3
Last reviewed
May 16, 2026
Sources
20 citations
Review status
Source-backed
Revision
v1 ยท 3,491 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 16, 2026
Sources
20 citations
Review status
Source-backed
Revision
v1 ยท 3,491 words
Add missing citations, update stale details, or suggest a clearer explanation.
Inflection 3 is the third-generation family of large language models developed by Inflection AI, announced on October 7, 2024. The release marked the company's first major model launch after the March 2024 departure of co-founders Mustafa Suleyman and Karen Simonyan to lead Microsoft AI, and its strategic pivot from a consumer chatbot company into an enterprise generative AI vendor. The family contains two sibling models: Pi 3.0, the empathetic conversational model that continues to power the Pi (AI chatbot) consumer assistant, and Productivity 3.0, a sister model tuned for instruction following and structured outputs such as JSON.
Both Inflection 3 models share an 8,000-token context window and were launched together with a commercial application programming interface, priced at $2.50 per million input tokens and $10.00 per million output tokens. The release coincided with the debut of Inflection for Enterprise, a packaged AI system built on the Inflection 3 models and powered by Intel Gaudi 3 accelerators running in Intel Tiber AI Cloud, with an on-premises appliance scheduled for early 2025. The launch was the first major product milestone delivered under chief executive Sean White, who had assumed the role in March 2024 after the Microsoft transition.
Inflection AI was founded in 2022 in Palo Alto, California, as a public benefit corporation by DeepMind co-founder Mustafa Suleyman, DeepMind chief scientist Karen Simonyan, and LinkedIn co-founder Reid Hoffman. The startup's mission was to build a class of AI it called "personal intelligence," delivered through a consumer companion named Pi. Pi was designed to function less like a search-style chatbot and more like a kind, supportive listener, with deliberate emphasis on emotional intelligence, conversational warmth, and safety guardrails.
The company became one of the most heavily funded private artificial intelligence labs of the era. In June 2023 it closed a $1.3 billion funding round backed by Microsoft, Nvidia, Reid Hoffman, Bill Gates, and Eric Schmidt, valuing the company at roughly $4 billion. Inflection AI used that capital to build a 22,000-GPU H100 supercomputer in partnership with CoreWeave and Microsoft Azure, and to train successively larger Pi-powering foundation models.
The first generation, Inflection-1, was released in June 2023 and was reported by the company to be competitive with GPT-3.5 on internal benchmarks. The second generation, Inflection-2, arrived in November 2023 and brought Pi closer to the performance of GPT-4 on language and reasoning benchmarks. In March 2024 Inflection released Inflection-2.5, which the company stated reached approximately 94% of GPT-4's performance on a battery of academic benchmarks. On the MMLU benchmark, Inflection-2.5 scored 85.5, compared to 87.3 for the contemporary GPT-4 model. Inflection-2.5 was trained with roughly 40% of the FLOPs that OpenAI was understood to have used for GPT-4, a claim the company highlighted as evidence that smaller, more efficient labs could approach frontier capability.
Pi launched in May 2023 as a free chatbot accessible through a web app, iOS app, WhatsApp, Messenger, Instagram, and short message service. The product was distinguished by its conversational tone, its tendency to ask follow-up questions rather than dump information, and its refusal to indulge confrontational or manipulative interactions. By March 2024, Inflection AI reported that Pi had reached approximately six million monthly active users. Pi was free and consumer-only at that stage; Inflection had not yet introduced an application programming interface or any priced enterprise tier.
On March 19, 2024, Mustafa Suleyman and Karen Simonyan announced their departure from Inflection AI to launch Microsoft AI, a new consumer AI division reporting to Satya Nadella. Suleyman became executive vice president and chief executive of Microsoft AI, with responsibility for Copilot, Bing, and Edge. Simonyan joined as chief scientist of the new division. Most of Inflection AI's 70-person workforce moved with them, an outcome the press described as an "acqui-hire" while leaving the underlying corporate entity intact.
In parallel, Microsoft entered into a licensing agreement reportedly worth $650 million for non-exclusive rights to Inflection AI's trained models and intellectual property. Reid Hoffman, who remained on Inflection's board, said the proceeds would be used to give every Inflection investor a positive return. The arrangement drew scrutiny from antitrust regulators in the United States, the United Kingdom, and the European Union, who examined whether the structure was, in effect, an acquisition. Investigators in each jurisdiction eventually closed their reviews without ordering remedies.
With the founding team gone, Inflection AI promoted board member Sean White, a former chief research and development officer at Mozilla and co-founder of Braingels, to chief executive. White inherited a company that had lost most of its research staff but retained its trained models, its hardware contracts, its corporate structure, and a significant remaining cash balance from the licensing payment. White, with chief operating officer Ted Shelton, announced that Pi would continue to operate while Inflection rebuilt itself as a vendor of enterprise AI systems and developer infrastructure. As an interim measure the company moved Inflection-2.5 hosting onto Microsoft Azure, freeing capital from the CoreWeave compute commitment.
Inflection 3 was the first model release built end-to-end under the post-transition leadership. Rather than a single flagship model, Inflection 3 is a family of two task-specialized siblings sharing a common base. Both models were trained to the same context length and both are exposed through the same application programming interface, but they differ in instruction-tuning data, behavioral defaults, and target use case.
Pi 3.0 is the continuation of the consumer Pi product line and the empathetic conversational model in the family. According to the company, Pi 3.0 was trained to preserve and extend the qualities that distinguished earlier Pi releases: a persistent personality, the willingness to ask follow-up questions, sensitivity to emotional cues, and refusal patterns tuned to avoid both unsafe content and abrupt, transactional answers. Inflection AI describes the model's distinguishing strengths as "backstory, emotional intelligence, productivity, and safety," and recommends it for scenarios such as customer support, employee assistance, coaching conversations, and any environment where rapport with a human user is part of the value proposition.
Pi 3.0 continued to power the public-facing Pi chatbot at pi.ai and across the messaging surfaces that had hosted earlier generations of Pi. Existing Pi users were transitioned automatically and at no cost. Inflection also positioned Pi 3.0 as the recommended model for any enterprise application that requires tonal control, such as healthcare patient-facing chat, mental wellness products, or branded customer service.
Productivity 3.0 is the new sibling, introduced for the first time in the Inflection 3 generation. It is positioned as the model to use when literal, structured, instruction-following behavior matters more than warmth. Inflection describes Productivity 3.0 as "optimized for following instructions," specifically calling out tasks that require JSON output, strict adherence to provided guidelines, retrieval-augmented question answering over enterprise documents, classification, and agentic tool calls. The model is the company's first model explicitly designed for use as the language layer inside larger automated workflows, rather than as a chat partner.
Productivity 3.0 was created in direct response to feedback from enterprise pilot customers who told Inflection that Pi's stylistic flourishes, including its tendency to elaborate, sometimes interfered with downstream systems expecting deterministic outputs. By splitting the family into Pi 3.0 for empathetic conversation and Productivity 3.0 for structured execution, Inflection AI allowed customers to mix and match within the same product. A retail company, for example, could use Pi 3.0 for the customer-facing chat interface and Productivity 3.0 for the underlying order lookup, refund classification, and policy retrieval steps.
The two models share the same context window and the same commercial pricing. Inflection AI did not publish the parameter counts, training token totals, or architectural details for either model, in line with the company's practice for the Inflection-1 and Inflection-2 generations. Inflection has stated that both models in the family share a common pretraining base and diverge only at the fine-tuning and reinforcement-learning stage.
| Specification | Pi 3.0 | Productivity 3.0 |
|---|---|---|
| Release date | October 7, 2024 | October 7, 2024 |
| Family | Inflection 3 | Inflection 3 |
| Context window | 8,000 tokens | 8,000 tokens |
| Primary tuning | Empathetic conversation, backstory, safety | Instruction following, structured output |
| Recommended use | Customer support, employee assistance, coaching, wellness | JSON output, classification, retrieval, agents, RPA |
| Input price | $2.50 per million tokens | $2.50 per million tokens |
| Output price | $10.00 per million tokens | $10.00 per million tokens |
| Consumer product | Powers Pi at pi.ai | Not exposed to consumers |
| Hardware backend | Intel Gaudi 3 (Tiber AI Cloud) | Intel Gaudi 3 (Tiber AI Cloud) |
| Parameter count | Not disclosed | Not disclosed |
| Modalities | Text in, text out | Text in, text out |
The Inflection 3 application programming interface was made available at developers.inflection.ai with OpenAI-compatible request and response schemas. Inflection AI also distributed the models through OpenRouter and via a UiPath marketplace component for robotic process automation workflows. Pi 3.0 continued to support voice conversation through the Pi consumer app, although the voice runtime was not exposed through the commercial application programming interface at the time of launch.
Inflection AI did not publish standardized benchmark scores for either Inflection 3 model. The company argued that benchmarks such as MMLU had been saturated by frontier laboratories and were no longer informative for enterprise selection. Instead Inflection promoted the family using customer pilot results and side-by-side response samples, particularly for empathy-sensitive conversational tasks where Pi 3.0 was claimed to outperform comparable transactional chatbots. The lack of public benchmark data drew critical commentary from outlets including The Register and IEEE Spectrum, which observed that the company was asking enterprise customers to trust internal evidence in place of independent measurement.
Sean White became chief executive of Inflection AI in March 2024. White had previously served as chief research and development officer at Mozilla, where he led the Firefox browser organization, and as a senior research executive at Nokia. He brought to Inflection a background in privacy-oriented consumer technology and academic research, rather than the deep learning research pedigree associated with Suleyman and Simonyan. Ted Shelton, an enterprise software veteran, joined as chief operating officer to lead the commercial and sales rebuild.
Under White, Inflection AI made three strategic decisions that shaped Inflection 3. First, the company committed to a dual-track product strategy in which Pi would continue as a free consumer assistant while the bulk of new revenue would come from enterprise contracts. Second, Inflection AI invested in a packaged enterprise system rather than just exposing the underlying models as a developer interface; Inflection for Enterprise was sold as a managed solution with reference architecture, integration partners, and a dedicated customer success function. Third, Inflection AI signed a hardware exclusivity with Intel Gaudi 3 rather than continuing on Nvidia H100 capacity from CoreWeave, a choice intended to reduce unit economics and to differentiate Inflection from the dozen other API vendors competing on Nvidia silicon.
In interviews around the Inflection 3 launch, White described the company's new posture as "empathetic enterprise AI," a phrase Inflection began to use in marketing material. He argued that Pi's heritage in emotional intelligence was a defensible product moat in business settings such as call centers, claims processing, internal employee help desks, and patient triage, where the Fortune 500 had been reluctant to deploy more transactional large language models because of brand risk.
Inflection 3 was launched together with Inflection for Enterprise, the company's first packaged business product. Inflection for Enterprise bundles the Inflection 3 models with a delivery stack the company calls a virtual coworker. The system is trained on the customer's own data, policies, and tone-of-voice guidelines, with the goal of producing an assistant that mirrors the company's culture rather than reading like a generic chatbot. Customer fine-tuning uses what Inflection calls "reinforcement learning from employee feedback," in which workers at the customer rate model responses and those signals are used to align the deployed instance.
The enterprise offering targets three concerns Inflection identified in pilot customers: data control, regulatory compliance, and total cost of ownership. Inflection for Enterprise can be deployed in three configurations: hosted in Intel Tiber AI Cloud, hosted in a customer's own cloud account, or installed on premises as an Intel Gaudi 3 appliance. The on-premises option was announced for delivery in the first quarter of 2025. By offering on-premises hosting, Inflection AI was able to compete in regulated industries that prohibit sending sensitive prompts to a third-party application programming interface, including parts of healthcare, financial services, and government.
On October 22, 2024, two weeks after the Inflection 3 launch, UiPath and Inflection AI announced a strategic partnership aimed at security-focused industries. The partnership produced an Inflection.Pi component on the UiPath Marketplace that allowed UiPath robotic process automation workflows to call Inflection 3 models. The partnership was positioned as an alternative to OpenAI-based automation flows for customers concerned about model provenance and data residency.
| Model | Vendor | Input ($ per 1M tokens) | Output ($ per 1M tokens) | Context |
|---|---|---|---|---|
| Inflection 3 Pi | Inflection AI | 2.50 | 10.00 | 8K |
| Inflection 3 Productivity | Inflection AI | 2.50 | 10.00 | 8K |
| Claude 3.5 Sonnet | Anthropic | 3.00 | 15.00 | 200K |
| GPT-4o | OpenAI | 2.50 | 10.00 | 128K |
| GPT-4o mini | OpenAI | 0.15 | 0.60 | 128K |
At launch, Inflection 3's API pricing matched the headline rate of GPT-4o, a deliberate positioning that allowed Inflection AI to compete on hardware and deployment flexibility rather than per-token cost. The shorter 8K context window was the main technical drawback of the family compared to contemporary frontier models from Anthropic and OpenAI.
The single largest infrastructure decision behind Inflection 3 was the switch from Nvidia H100 GPUs to Intel Gaudi 3 accelerators. The Gaudi 3 partnership was announced jointly by Inflection AI chief executive Sean White and Intel chief executive Pat Gelsinger on October 7, 2024, the same day Inflection 3 launched. Inflection 3 instances are served from Intel Tiber AI Cloud, Intel's managed AI cloud, and the same Gaudi 3 stack underpins the on-premises Inflection for Enterprise appliance scheduled for Q1 2025.
Intel's Gaudi 3 accelerator is a 128 GB high-bandwidth-memory part with 3.7 terabytes per second of memory bandwidth and roughly 1,835 dense BF16 teraFLOPS of compute. A reference Gaudi 3 server with eight accelerators was priced at approximately $125,000 at the time of launch, which Intel claimed was about two-thirds the list price of an equivalent Nvidia H100 system. White said publicly that the partnership produced "up to 2x improved price performance" for Inflection 3 workloads compared with competitive offerings, although the company did not publish the specific workloads behind that figure.
For Intel, the Inflection 3 partnership was significant beyond its dollar value, because it represented one of the first large-scale generative AI deployments that used Gaudi 3 instead of Nvidia silicon. The other notable Gaudi 3 customer announced in 2024 was IBM Cloud, with availability slated for early 2025. The Inflection AI win supported Intel's broader narrative that the AI accelerator market was not destined to be a one-vendor segment.
The Gaudi 3 stack is reserved for Inflection 3 and Inflection for Enterprise. Inflection AI's previous flagship, Inflection-2.5, was migrated to Microsoft Azure as part of the March 2024 reorganization. Older Pi conversation history and the Inflection-2.5 endpoint continued to live on Azure during the Inflection 3 rollout, with Pi 3.0 progressively taking over as the default model for new user sessions on pi.ai.
The Inflection 3 launch was covered as a recovery story rather than a frontier model release. Coverage in Maginative, The Register, HPCwire, and Fortune described the announcement as the clearest signal yet that Inflection AI had reconstituted itself as a viable independent company after the founders' departure. The Register highlighted the move away from Nvidia as the more newsworthy element, framing the Gaudi 3 partnership as a vote of confidence in Intel's AI accelerator roadmap from a credible buyer.
Reaction from the developer community was mixed. Reviewers noted that the 8K context window was small compared with the 128K and 200K context windows offered by GPT-4o and Claude 3.5 Sonnet, limiting Inflection 3's usefulness for long-document retrieval-augmented generation. The absence of public benchmark scores attracted criticism. IEEE Spectrum published a retrospective in late 2024 titled "What Happened to this $4 Billion Chatbot?" that questioned whether the empathetic-AI niche was large enough to support a standalone foundation model company.
Enterprise commentators were more receptive. The choice to ship an on-premises appliance, the UiPath partnership, and the willingness to fine-tune on customer-specific data sets were called out as differentiators in industry analyst notes covering the launch. Inflection AI did not publicly disclose its initial enterprise customer count, but executives told Fortune in December 2024 that the company had signed early pilots in financial services, healthcare, and the public sector.
At the time of the Inflection 3 launch the enterprise generative AI market was crowded with task-specialized model families. The closest peers were Claude 3.5 Sonnet from Anthropic, which was positioned as a high-quality reasoning model for business analysis, and GPT-4o from OpenAI, which was positioned as a general purpose flagship. Inflection 3's distinctive sales pitch was empathy plus deployment flexibility, not raw capability.
| Dimension | Inflection 3 | Claude 3.5 Sonnet | GPT-4o |
|---|---|---|---|
| Vendor | Inflection AI | Anthropic | OpenAI |
| Release | October 2024 | June 2024 (updated October 2024) | May 2024 |
| Models in family | Pi 3.0, Productivity 3.0 | Sonnet | GPT-4o, GPT-4o mini |
| Context window | 8K | 200K | 128K |
| Multimodal | Text only | Text, vision | Text, vision, audio |
| Hardware backend | Intel Gaudi 3 | Mixed (AWS Trainium, Nvidia) | Nvidia |
| Deployment | Cloud, on-prem appliance | API only | API only |
| Empathy specialization | Yes, primary | No, general purpose | No, general purpose |
The table illustrates the tradeoffs Inflection AI made. The company gave up the long-context advantage, multimodality, and broad benchmark leadership in exchange for an emotionally tuned base model, an enterprise-friendly deployment story, and lower hardware unit costs.