Azure OpenAI Service

AI Tools & Products Artificial Intelligence

39 min read

Updated Jun 21, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 21, 2026

Fact-checked

In review queue

Sources

22 citations

Revision

v8 · 7,849 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Azure OpenAI Service is Microsoft's enterprise cloud offering that delivers OpenAI's frontier large language models and generative artificial intelligence through the Microsoft Azure platform, pairing models such as GPT-5, GPT-4o, the o-series reasoning models, DALL-E, and Whisper with Azure's enterprise-grade security, compliance, data residency, and data-privacy guarantees. It reached general availability on January 16, 2023, and has become the dominant enterprise channel for production OpenAI workloads ^[1]. The single most important distinction from the direct OpenAI API is data handling: Microsoft guarantees that customer prompts and completions are not sent to OpenAI and are never used to train OpenAI models ^[5].

Since late 2024 the service has been progressively folded into Azure AI Foundry, Microsoft's broader AI development platform. The product is sometimes referred to as Microsoft Foundry Models or Azure OpenAI in Azure AI Foundry, but the underlying API surface, billing, and resource type remain the same. Customers continue to provision Azure OpenAI resources from the Azure portal, and the older *.openai.azure.com endpoints still work alongside newer Foundry endpoints. In practice, most documentation now treats Azure OpenAI Service as the OpenAI-branded slice of Foundry rather than a standalone product.

History and development

The roots of Azure OpenAI Service trace back to Microsoft's initial partnership with OpenAI, which began in 2019 with a $1 billion investment. Microsoft first debuted Azure OpenAI Service in limited preview in November 2021, allowing select customers to access large-scale generative AI models backed by Azure's cloud infrastructure ^[2]. At the time, the available models were GPT-3 and the original Codex.

On January 16, 2023, Microsoft announced the general availability of Azure OpenAI Service, expanding access to GPT-3.5, Codex, and DALL-E 2 with enterprise features like content filtering, role-based access control, and regional deployment options ^[1]. Eric Boyd, then corporate vice president of AI Platform at Microsoft, framed the launch around enterprise reach, writing that with general availability "more businesses can apply for access to the most advanced AI models in the world" ^[1]. Shortly after, ChatGPT was integrated as a fine-tuned version of GPT-3.5 running on Azure AI infrastructure, and on March 9, 2023 Microsoft made GPT-4 available in preview through Azure OpenAI, only days after OpenAI's own launch.

Microsoft's total investment in OpenAI has reached approximately $13 billion across multiple funding rounds ^[3]. Following a restructuring of the partnership in late 2025, Microsoft holds an investment in OpenAI Group PBC valued at approximately $135 billion, representing roughly 27 percent on an as-converted diluted basis ^[3]^[19]. Under the terms of the partnership, Azure remains the exclusive cloud provider for stateless OpenAI APIs, OpenAI committed to purchasing an incremental $250 billion in Azure services over the life of the agreement, and Microsoft's IP rights to OpenAI models and products were extended through 2032 ^[3]^[19].

In 2024 and 2025, Microsoft rebranded its broader AI development platform as Azure AI Foundry (formerly Azure AI Studio), positioning Azure OpenAI Service as a core component within a larger ecosystem of AI tools and services. The Foundry rollout was gradual: in November 2024 the company introduced "Azure AI Foundry" as the umbrella name at Microsoft Ignite, in early 2025 the Azure AI Studio web portal was redirected to ai.azure.com under the Foundry brand, and through 2025 the OpenAI-specific resource type was renamed inside the portal to "Azure OpenAI in Azure AI Foundry." Older API paths and SDK clients still referencing azure-ai-openai and *.openai.azure.com continue to function, which has caused some confusion about whether the brand has truly changed.

Stargate and infrastructure expansion

On January 21, 2025, OpenAI, SoftBank, Oracle, and MGX announced the Stargate Project at the White House, a venture intending to deploy $100 billion immediately and up to $500 billion over four years on AI compute capacity for OpenAI inside the United States ^[16]. Microsoft was named as one of the initial technology partners alongside Arm, NVIDIA, and Oracle, but it was not a co-investor in the venture and instead retained a right of first refusal on new OpenAI capacity orders ^[16]. OpenAI's announcement stated that the project "builds on the existing OpenAI partnership with Microsoft" and that "OpenAI will continue to increase its consumption of Azure" ^[16]. In the October 2025 partnership rework, that right of first refusal was relaxed, formally allowing OpenAI to contract directly with other cloud providers for training capacity while keeping Azure as the exclusive home for stateless OpenAI APIs ^[3].

The practical effect is that Azure data centers continue to host most of the inference traffic for hosted OpenAI models, including the workloads served through Azure OpenAI Service. Microsoft has been expanding capacity in step with that demand. Through 2025 and into early 2026, the company opened or expanded data center campuses in Wisconsin, Georgia, Iowa, Texas, Arizona, Wyoming, Spain, Sweden, Italy, and the United Arab Emirates, with several sites built out specifically to host AI accelerators for OpenAI workloads. In its January 2026 earnings, Microsoft reported quarterly capital expenditure above $37 billion, almost all of it on AI-capable infrastructure ^[13].

Available models

Azure OpenAI Service provides access to OpenAI's full range of production models, though new model releases may appear on OpenAI's direct API slightly before they become available on Azure. The lag is typically days to a few weeks for major launches, and Azure availability often starts in a small set of regions before expanding. Some preview models such as GPT-5 Pro and Sora reached Azure within hours of OpenAI's announcement; others such as o1-pro took longer because of capacity constraints.

Model family	Specific models	Key capabilities
GPT-5 series	GPT-5, GPT-5 Pro, GPT-5-mini, GPT-5-nano, GPT-5 chat	Frontier reasoning, generation, multimodal
GPT-4.5	GPT-4.5	High-quality conversational reasoning
GPT-4.1 series	GPT-4.1, GPT-4.1 mini, GPT-4.1 nano	1M token context, general purpose
GPT-4o series	GPT-4o, GPT-4o mini	Multimodal (text, vision, audio)
GPT-4 series	GPT-4 Turbo, GPT-4 32k (legacy)	Long-context text and vision
GPT-3.5 series (legacy)	GPT-3.5 Turbo, GPT-3.5 Turbo Instruct	Cost-efficient chat, slated for retirement
o-series (reasoning)	o1, o1-mini, o3, o3-mini, o4-mini	Advanced reasoning, math, science, code
Codex	codex-mini, codex-1	Coding-tuned variants for the Codex agent
Image generation	DALL-E 3, gpt-image-1	Text-to-image, edits, transparent backgrounds
Speech	Whisper, gpt-4o-transcribe, gpt-4o-mini-transcribe	Transcription and translation
Realtime	gpt-4o-realtime, gpt-realtime	Streaming voice agent stack
Video	Sora, Sora 2	Text-to-video generation, preview
Embeddings	text-embedding-3-large, text-embedding-3-small, ada-002 (legacy)	Text embeddings for search and RAG

When did GPT-5 reach Azure?

GPT-5 became generally available in Azure AI Foundry in 2025 (announced in August 2025, with broad rollout through September), and Microsoft describes it as "the most powerful large language model (LLM) ever released across key benchmarks" ^[4]^[20]. The flagship GPT-5 model exposes a 272,000-token context window for deep analysis and complex automation, while the GPT-5 chat variant supports a 128,000-token context for natural multimodal, multi-turn conversations ^[20]. It pairs frontier reasoning with high-performance generation and cost efficiency, and the GPT-5-mini and GPT-5-nano variants offer progressively lower costs for less demanding workloads ^[4]. Foundry also ships a model router, described by Microsoft as a fine-tuned small language model that "evaluates each prompt and decides the optimal model based on complexity, performance needs, and cost efficiency," which the company says can save up to 60 percent on inferencing cost across the GPT-5 family ^[20]. OpenAI's subsequent GPT-5.2 reached general availability in Microsoft Foundry in December 2025 ^[20]. GPT-4.5, marketed by OpenAI as its largest non-reasoning model when it launched in February 2025, was made available on Azure in March 2025 and remains an option for customers who prefer GPT-4.x style instruction following over the o-series reasoning chain.

The o-series reasoning models on Azure inherit OpenAI's tiered pricing for reasoning effort, exposing a reasoning_effort parameter that callers can set to low, medium, or high to trade latency against quality. The o3 and o4-mini models also support tool use and structured outputs, making them practical for agent workloads when paired with Foundry Agents.

Sora and Sora 2 reached Azure in preview in late 2025 in a small number of regions (initially East US 2 and Sweden Central) and remain limited-access. Realtime API support, including the gpt-4o-realtime preview and the gpt-realtime model that reached GA in late 2025, is available in the Sweden Central, East US 2, and West US regions for low-latency voice streaming.

The legacy GPT-3.5 Turbo and ada-002 models continue to be supported, but Microsoft has begun publishing retirement schedules for them on the Azure model lifecycle page; customers running on these models are encouraged to migrate to GPT-4o mini and text-embedding-3-small as direct replacements ^[14].

Regional deployments

Unlike the direct OpenAI API, where customers do not pick a region, Azure OpenAI requires customers to choose a region when creating an Azure OpenAI resource. Each region is part of an Azure geography (United States, Europe, Asia Pacific, and so on) and supports a subset of models. Microsoft maintains a public model availability matrix that customers consult to find the closest region offering the model and deployment type they want ^[14].

The table below lists the major Azure regions where Azure OpenAI is generally available, grouped by geography. Specific model availability varies by region; flagship models like GPT-5 and the o-series typically light up first in the United States and in Sweden Central before propagating to other geographies.

Geography	Region	Notes
United States	East US, East US 2	Frontier model launches, Sora preview
United States	West US, West US 3	GPT-4o, GPT-4.1, GPT-5 capacity
United States	North Central US, South Central US	Heavy enterprise demand, broad model coverage
United States	Central US, West Central US	Smaller capacity, fine-tuning region for select models
Europe	West Europe (Netherlands)	Broad coverage, EU data residency
Europe	North Europe (Ireland)	Broad coverage, EU data residency
Europe	Sweden Central	Frontier launches in Europe, Sora and Realtime preview
Europe	France Central, Switzerland North	Country-level data residency, financial sector demand
Europe	UK South	UK data residency, public sector
Asia Pacific	Japan East	Most popular APAC region for GPT-4 family
Asia Pacific	Australia East	ANZ data residency, government workloads
Asia Pacific	Korea Central, India South	Localized capacity, smaller model menus
Americas	Canada East	Canadian data residency
Americas	Brazil South	South American capacity
Middle East	UAE North	Sovereign-style availability for the region

The practical implication of regional deployment is that an application calling Azure OpenAI in eastus2 is hitting a different physical fleet than one calling swedencentral, with potentially different model versions, different latencies, and different rate limits. Customers running global products often deploy in two or three regions and route traffic with Azure Front Door or a custom load balancer.

Global deployment types, introduced in 2024, hide some of this complexity by routing requests to whichever Azure region has spare capacity for the chosen model. Global Standard and Global Provisioned deployments behave like a single logical endpoint while still associating data residency with the customer's home region for processing.

Key features

Enterprise security and compliance

Azure OpenAI Service inherits the full security stack of the Azure platform. This includes encryption of data at rest using FIPS 140-2 compliant 256-bit AES encryption, encryption of data in transit, virtual network isolation, and integration with Azure Key Vault for secrets management ^[5]. The service holds certifications for GDPR, HIPAA, ISO 27001, SOC 2, and other major compliance frameworks.

Organizations can deploy models in specific Azure regions to meet data residency requirements. Microsoft also introduced Data Zones in 2025, which give customers more granular control over where their data is processed and stored ^[5]^[6].

Network isolation and private endpoints

For organizations requiring strict network-level security, Azure OpenAI Service supports private endpoints that assign a private IP address within the customer's Azure Virtual Network (VNet). Traffic between applications and Azure OpenAI travels over the Microsoft backbone network and never traverses the public internet. Combined with the option to disable public network access entirely, this creates true network isolation for sensitive workloads ^[9].

Key network security capabilities include:

Feature	Description
Private endpoints	Assign a private IP within the customer's VNet
VNet integration	Keep all traffic on the Microsoft backbone network
Public access control	Option to disable public endpoint entirely
DNS configuration	Private DNS zones for transparent name resolution
Network Security Groups	Granular traffic filtering rules
Customer-managed keys	Optional encryption key control via Azure Key Vault

Managed identity authentication

Microsoft recommends using Microsoft Entra ID (formerly Azure Active Directory) managed identities for authentication instead of API keys. Managed identities eliminate the need to store, manage, and rotate secrets, reducing the attack surface for credential-based compromises. As of April 2025, permissions for managed identities must be assigned manually rather than being auto-granted, giving organizations more explicit control over access ^[9].

Entra ID authentication is the preferred path for production deployments because it ties API access to the same identity primitives that govern the rest of Azure: user accounts, service principals, role assignments, conditional access policies, and Privileged Identity Management. API keys remain available for backward compatibility and for use cases like quick prototyping, but Microsoft has been making them optional rather than default in Foundry resources.

Compliance certifications

The full list of certifications that apply to Azure OpenAI Service is broad, in part because the service inherits most of them from the Azure platform itself. The table below summarizes the certifications most often relevant to customers in regulated industries ^[17].

Standard or program	Status	Notes
SOC 1, SOC 2, SOC 3	In scope	Annual independent audit reports available via the Service Trust Portal
ISO 27001, 27017, 27018, 27701	In scope	Information security and privacy management systems
ISO 22301, 9001, 20000-1	In scope	Continuity, quality, and IT service management
HIPAA Business Associate Agreement	Available	Required for protected health information workloads
HITRUST CSF	In scope	Healthcare-focused control framework
PCI DSS	In scope	Payment card workloads with documented controls
FedRAMP High	In scope (Azure Government)	Federal civilian and DoD IL2 workloads
DoD Impact Level 4, IL5, IL6	Varies by region	DoD environments through Azure Government and Azure Government Secret
IRS 1075	In scope	Federal tax information
CJIS	In scope	Criminal justice information
GDPR, EU-US Data Privacy Framework	In scope	EU data protection
EU sovereign cloud (Microsoft Cloud for Sovereignty)	Available	Customer-controlled keys and operator restrictions
C5 (Germany), IRAP (Australia), CCCS (Canada)	Region-specific	Country compliance frameworks

Customer-managed keys (CMK) are supported through Azure Key Vault, which means a customer can revoke access to its own data by disabling the key. Audit logs are written to Azure Monitor and can be exported to a customer-controlled storage account or SIEM. Microsoft also publishes a quarterly transparency report on abuse monitoring and law enforcement requests for Azure OpenAI.

What data privacy guarantees does Azure OpenAI provide?

One of the most important distinctions between Azure OpenAI Service and the direct OpenAI API is data handling. Microsoft makes several explicit guarantees ^[5]:

Customer prompts and completions are not sent to OpenAI (the company)
Customer data is not used to train, retrain, or improve OpenAI models
Customer data is not used to improve Microsoft products or services without explicit permission
Customer data is not available to other Azure customers

These commitments make Azure OpenAI Service attractive to regulated industries, including healthcare, financial services, and government agencies, where data handling policies are strictly governed. It is worth noting that in March 2025, Microsoft updated its terms to disclose that fine-tuning operations may involve temporary data relocation outside the selected geography ^[5]. Customers who require strict in-region processing should verify the regional support matrix for fine-tuning before relying on it for sensitive data.

Content filtering and responsible AI

Azure OpenAI Service includes a built-in content filtering system that runs alongside model inference. The system evaluates both inputs and outputs for categories including hate speech, violence, sexual content, and self-harm. Filters can be configured to different severity levels, and organizations can request modified content filtering configurations through Microsoft for specific use cases ^[6].

The broader responsible AI framework includes abuse monitoring, which detects patterns of misuse, and Jailbreak detection, which identifies attempts to circumvent model safety instructions. Abuse monitoring writes a separate stream of metadata that Microsoft retains for up to 30 days; customers approved through the Limited Access Review for sensitive workloads (for example, journalism, research on hate speech, or law enforcement) can opt out of human review of abuse monitoring data.

Content filter configuration details

The content filtering system uses an ensemble of multi-class classification models to detect harmful content across four primary categories, each evaluated at four severity levels: safe, low, medium, and high. The default configuration filters at the medium severity threshold for both prompts and completions ^[10].

Filter category	Description	Default threshold
Hate and fairness	Content promoting discrimination or stereotypes	Medium
Sexual	Sexually explicit or suggestive content	Medium
Violence	Content depicting or promoting violence	Medium
Self-harm	Content related to self-harm or suicide	Medium

Beyond the four core categories, Azure OpenAI provides additional specialized filters ^[10]:

Special filter	Function
Prompt attack / jailbreak detection	Detects and blocks attempts to bypass content moderation
Protected material (text)	Identifies known copyrighted text (song lyrics, articles, recipes)
Protected material (code)	Detects source code matching public repositories
Groundedness detection	Verifies responses are grounded in provided source material
PII detection	Identifies personally identifiable information in inputs and outputs

All customers can create custom content filtering policies tailored to their use cases, with settings configurable separately for prompts and completions. Customers approved for modified content filtering through Microsoft's Limited Access Review process can turn filters off entirely for specific scenarios ^[10].

The filtering layer is one of the more visible behavioral differences between Azure OpenAI Service and the direct OpenAI API. Azure's filters are typically stricter and more configurable, while OpenAI's direct API relies on the moderation endpoint and the model's own refusal training. Developers porting code from OpenAI to Azure sometimes see prompts that worked on api.openai.com get blocked on Azure, particularly for classification or red-teaming use cases that touch policy categories.

Deployment options

Unlike the direct OpenAI API, where models are accessed through a shared endpoint, Azure OpenAI Service requires customers to create explicit deployments of specific model versions in their Azure subscription. This approach gives teams control over which model version their application uses and when to upgrade, avoiding unexpected behavior changes from automatic model updates.

Azure OpenAI Service offers several deployment types, each optimized for different workload characteristics ^[11]:

Deployment type	Description	Billing	Best for
Global Standard	Shared infrastructure across Azure global network	Per-token (pay-as-you-go)	Variable workloads, experimentation
Standard (Regional)	Shared infrastructure within a specific Azure region	Per-token (pay-as-you-go)	Data residency requirements
Data Zone Standard	Pay-as-you-go within EU or US data zones	Per-token	Multi-region pooled traffic with regulated zone
Global Provisioned (Provisioned-Managed)	Dedicated compute across Azure global network	Hourly (per PTU)	High-volume, latency-sensitive production
Regional Provisioned	Dedicated compute within a specific region	Hourly (per PTU)	Data residency plus guaranteed throughput
Data Zone Provisioned	Dedicated compute within EU or US data zones	Hourly (per PTU)	Regulatory compliance with throughput guarantees
Global Batch	Async processing across global network	Per-token (50% discount)	Large-scale, non-time-sensitive workloads
Provisioned-Managed Spillover	Spillover from PTU to standard during traffic spikes	Hybrid	Reservation-backed plus burst capacity

The Standard deployment types use shared infrastructure with pay-per-token pricing, making them suitable for variable or experimental workloads. Provisioned deployments use dedicated compute with reserved throughput, offering lower and more predictable latency than shared deployments. Global Batch is a separate path entirely; it bundles requests into a 24-hour processing window for a 50 percent discount and is the Azure equivalent of OpenAI's Batch API.

Provisioned throughput units (PTUs)

Provisioned Throughput Units are model-agnostic quota units that organizations allocate to deployments for guaranteed processing capacity. PTUs abstract away the underlying compute, allowing customers to allocate capacity across different models and rebalance as needs change ^[11].

Key characteristics of PTU deployments include:

Minimum deployment sizes vary by model. As of early 2026, GPT-5 Global Provisioned starts at 15 PTUs, while smaller models such as GPT-4o mini start at 25 PTUs.
Billing is hourly, with discounts available through monthly or annual Azure reservations.
Lower and more consistent latency compared to Standard deployments.
Capacity is guaranteed regardless of overall service demand.
PTUs can be reallocated between models within the same deployment type.

The internal scaling factor for a PTU varies by model and version: a single GPT-5 PTU yields a different tokens-per-minute throughput than a single GPT-4o PTU. Microsoft publishes a capacity calculator inside the Azure portal that estimates how many PTUs a workload will need based on observed input and output token rates.

As of early 2026, PTU provisioned throughput starts at approximately $2,448 per month for the smallest valid deployment of a flagship model. Organizations whose pay-as-you-go token costs exceed roughly $1,800 per month typically benefit from committing to a PTU reservation ^[11]. The break-even point depends heavily on the input-to-output ratio: workloads that produce long completions tend to favor PTUs earlier than workloads dominated by retrieval-augmented short answers.

Provisioned-Managed Spillover, introduced in 2025, lets customers absorb traffic spikes by overflowing from PTU capacity onto pay-as-you-go for the same model. The feature reduces the cost of overprovisioning, because operators can size PTU reservations for steady-state load instead of peak load. Spillover requests are billed at the standard per-token rate.

Fine-tuning

Azure OpenAI Service supports fine-tuning of GPT-4o, GPT-4o mini, GPT-4.1, GPT-3.5 Turbo, and selected o-series models in a subset of regions. Fine-tuning allows organizations to adapt a model's behavior using their own labeled training data, improving performance on domain-specific tasks while retaining the model's general capabilities. Fine-tuned models are hosted within the customer's Azure subscription and are not shared with other tenants.

The fine-tuning process in Azure follows several steps:

Data preparation: prepare a training dataset of prompt-completion pairs in JSONL format.
Upload: upload the training file to Azure OpenAI.
Job creation: create a fine-tuning job specifying the base model, training file, and hyperparameters.
Training: Azure trains the model on dedicated compute. Training data does not leave the customer's region with the exception noted earlier for certain geographies.
Deployment: deploy the fine-tuned model to a dedicated endpoint.
Inference: use the fine-tuned model through the same API as base models.

Three fine-tuning techniques are supported. Supervised fine-tuning is the default and uses labeled prompt-completion pairs. Direct Preference Optimization (DPO) trains the model from pairs of preferred and rejected outputs. Reinforcement fine-tuning, in preview for o-series models, lets customers shape the model's reasoning chain with a reward model. The DPO and reinforcement options arrived on Azure several months after the OpenAI direct API.

Fine-tuning is charged based on training compute hours and hosting costs for the resulting model. Hosting costs apply per hour, separately from training, which means a fine-tuned model continues to incur charges as long as it is deployed even if no inference traffic flows. Training data is retained for 30 days after job completion for debugging purposes, then deleted ^[5].

Assistants API and migration to Foundry Agents

The Azure OpenAI Assistants API provided capabilities for building conversational AI agents with persistent threads, file search, code interpretation, and function calling. However, Microsoft announced in 2025 that the Assistants API is deprecated and will be retired on August 26, 2026 ^[12]. Microsoft recommends migration to the Microsoft Foundry Agents service, which provides the same capabilities with deeper integration into the Azure AI Foundry ecosystem.

Foundry Agents offers enhanced features including multi-agent orchestration, improved tool integration, and native support for the evolving agentic AI paradigm that the industry has adopted since 2025. The migration story for callers using the OpenAI assistants beta is straightforward in spirit but mechanical in practice: the conceptual model maps cleanly but identifiers and SDKs change. In the new service, classic threads become Conversations, runs become Responses, and assistants become Agent Versions, so most customers schedule a code freeze and a single switchover rather than running both APIs in parallel ^[12]^[22]. The August 26, 2026 cutover is firm: after that date, calls to the legacy /v1/assistants, /v1/threads, and /v1/threads/runs endpoints return an error with no degraded mode or grace period ^[22].

Azure AI Foundry integration

Azure OpenAI Service is now a core component of Azure AI Foundry (the successor to Azure AI Studio), which provides a unified development environment for building AI applications. Through Foundry, developers can combine OpenAI models with other Azure services like Azure AI Search (for retrieval-augmented generation), Azure Cosmos DB, Power BI, and Azure Data Lake for end-to-end AI solutions ^[4].

Azure AI Foundry extends the platform beyond OpenAI models to include open-weight models from Meta (Llama), Mistral, Cohere, DeepSeek, and others through the Model Catalog. This positions Azure as an increasingly model-agnostic platform, though OpenAI models remain its primary differentiator. Some non-OpenAI models are offered through a Models-as-a-Service (MaaS) billing path that uses the same per-token meter as Azure OpenAI but bills the third party through Microsoft.

Key Foundry capabilities include:

Prompt flow: visual tool for designing, testing, and deploying prompt-based workflows.
Model Catalog: access to hundreds of models from multiple providers.
Evaluation tools: built-in model evaluation and comparison capabilities.
AI Search integration: managed RAG pipeline using Azure AI Search as the retrieval layer.
Model router: a fine-tuned small language model that automatically routes each prompt to the optimal GPT-5-family model, which Microsoft says can cut inferencing cost by up to 60 percent ^[20].
Agent Service: successor to the Assistants API with full agent lifecycle management.

The naming distinction between Azure OpenAI and Azure AI Foundry causes some friction in customer conversations. Azure OpenAI Service is the resource type and the contractual product. Azure AI Foundry is the developer experience layered on top. A customer can use Azure OpenAI without touching Foundry, but most new projects start in the Foundry portal, which provisions Azure OpenAI resources behind the scenes.

SDKs and APIs

Microsoft maintains first-party Azure OpenAI client libraries in .NET, Python, JavaScript and TypeScript, Java, and Go. Each SDK exposes the same chat, completions, embeddings, image generation, transcription, and assistants endpoints, with Azure-specific extensions for managed identity, content filter results, and deployment routing.

Language	Package	Notes
.NET	Azure.AI.OpenAI	Built on the Azure SDK conventions, supports Entra ID and DefaultAzureCredential
Python	azure-ai-openai (formerly openai with `api_type="azure"`)	The newer azure-ai-openai package is distinct from openai-python
JavaScript / TypeScript	@azure/openai	Browser and Node usage, integrates with @azure/identity
Java	com.azure:azure-ai-openai	Uses the Azure Java SDK reactor patterns
Go	github.com/Azure/azure-sdk-for-go/sdk/ai/azopenai	First-class Go client

The Python story is the most confusing for newcomers. The openai-python library shipped Azure support natively for years through the api_type="azure" configuration option, and that path still works. Microsoft also publishes its own azure-ai-openai package that wraps the same endpoints with stricter type hints and Azure-style credential plumbing. Both are supported. Many production codebases use openai-python for portability between Azure and OpenAI direct, while azure-ai-openai is preferred when teams want tighter Azure integration.

The REST API uses Azure-specific base URLs of the form https://{resource-name}.openai.azure.com/openai/deployments/{deployment-id}/chat/completions?api-version=2025-04-01-preview. The api-version query parameter is the most important difference from the OpenAI direct API: Azure freezes the API surface at specific versions and lets customers pin to a stable preview or GA version rather than tracking the live OpenAI API. As of May 2026, the recommended GA version is 2024-10-21, and the active preview version is 2025-04-01-preview. Microsoft typically supports each API version for at least 12 months after a successor ships.

The Responses API, OpenAI's agent-friendly successor to Chat Completions, reached Azure in preview in mid-2025 and went GA on most regions in early 2026. Customers building new agent workloads are pointed at Responses, while existing Chat Completions code continues to work and is not slated for retirement ^[18].

Pricing

Azure OpenAI Service pricing mirrors OpenAI's API pricing structure closely, with some differences for provisioned capacity and regional differentials. Per-token pricing is published in US dollars per 1 million tokens and is the same across most Azure geographies, although a small number of regions carry a 5 to 10 percent uplift to reflect local infrastructure costs ^[7].

Model	Input (per 1M tokens)	Output (per 1M tokens)	Cached input (per 1M tokens)
GPT-5 Global	$1.25	$10.00	$0.13
GPT-5 Pro Global	$15.00	$120.00	N/A
GPT-5-mini	$0.25	$2.00	N/A
GPT-5-nano	$0.05	$0.40	N/A
GPT-4.1	$2.00	$8.00	$0.50
GPT-4o	$2.50	$10.00	$1.25
GPT-4o mini	$0.15	$0.60	$0.075
o3	$10.00	$40.00	$2.50
o4-mini	$1.10	$4.40	$0.275
GPT-3.5 Turbo (legacy)	$0.50	$1.50	N/A

Prompt caching, called "context caching" in some Azure docs, applies automatically to most chat models when prompt prefixes match recent traffic. The discount is typically 50 to 90 percent depending on model. Global Batch deployments earn a flat 50 percent discount on top of the per-token rate.

For organizations with predictable, high-volume workloads, Provisioned Throughput Units (PTUs) offer guaranteed capacity with monthly or annual reservations. For example, GPT-5 Global provisioned capacity starts at 15 PTUs minimum, priced at $1 per hour, $260 per month, or $2,652 per year per PTU ^[7].

PTU pricing economics

The decision between pay-as-you-go and provisioned capacity depends on usage volume and latency requirements. The following table summarizes the trade-offs ^[7]^[11]:

Pricing model	Monthly cost range	Best for	Latency
Pay-as-you-go (Standard)	Variable, based on token usage	Unpredictable or low-volume workloads	Variable, may spike under load
Pay-as-you-go (Global)	Variable, based on token usage	Multi-region workloads	Routed to lowest-latency region
PTU monthly	From ~$2,448/month	Predictable high-volume workloads	Consistent, low
PTU annual	From ~$2,652/year per PTU	Long-term production deployments	Consistent, low

Organizations should consider PTU reservations when monthly pay-as-you-go costs consistently exceed approximately $1,800, as the dedicated capacity provides both cost savings and performance benefits at that threshold ^[11]. Annual reservations carry an additional discount of roughly 15 percent over the monthly rate.

Reservations are managed in the Azure Reservations portal and can be exchanged or cancelled with prorated refunds, similar to how Azure Compute Reserved Instances work. This makes commitment less risky for customers who are uncertain about long-term volume.

How does Azure OpenAI differ from the direct OpenAI API?

The choice between Azure OpenAI Service and the direct OpenAI API is one of the most common decisions enterprises face when adopting OpenAI models.

Factor	Azure OpenAI Service	Direct OpenAI API
Authentication	API key plus optional Microsoft Entra ID (recommended)	API key only
Data privacy	Data stays in Azure, not sent to OpenAI	Data processed by OpenAI, opt-out available
SLA	99.9% guaranteed uptime	99.82% observed uptime, no formal SLA
Model availability	Slight delay after OpenAI releases, days to weeks	Immediate access to new models
Setup complexity	Requires Azure account and quota request	Simple email signup and credit card
Compliance	HIPAA, SOC 2, ISO 27001, GDPR, FedRAMP High	Limited compliance certifications
Pricing	Same per-token rates plus PTU options and Global Batch	Pay-per-token plus standard Batch API
Ecosystem	Deep Azure and Microsoft 365 integration	Standalone API and ChatGPT-Enterprise links
Version control	Explicit deployment of pinned model versions	Auto-updated model aliases
Network isolation	Private endpoints, VNet integration, sovereign cloud	Shared endpoints
Content filtering	Configurable, multi-category with custom policies	Standard safety system, less configurable
Region selection	Customer chooses deployment region	OpenAI chooses, US-centric
API versioning	Pinned to dated `api-version` query parameter	Live API surface
Fine-tuning availability	GPT-4o, GPT-4o mini, GPT-4.1, GPT-3.5, selected o-series in some regions	Broader model coverage, faster after launch
Realtime API	gpt-realtime in select regions	Available globally, sometimes ahead of Azure
Sora video	Preview in East US 2 and Sweden Central	Available through OpenAI direct earlier

For prototyping and experimentation, the direct OpenAI API offers a faster path to getting started. For production deployments in enterprise environments, Azure OpenAI Service provides stronger guarantees around uptime, data privacy, and compliance ^[8]. A common pattern is to develop against the OpenAI direct API for speed, then port to Azure OpenAI before going to production. The Python and Node SDKs are designed to make this swap mostly mechanical, requiring a configuration change rather than rewriting code.

Notable customers and use cases

Azure OpenAI Service has been adopted broadly across industries. Microsoft regularly publishes customer stories on the Azure blog and at Ignite. The list below is a representative sample rather than an exhaustive catalog.

Customer	Industry	Reported use case
Coca-Cola	Consumer goods	Marketing copy generation, customer support, product innovation
BMW Group	Automotive	Vehicle data analysis, internal knowledge agents, technician support
ICRC (International Committee of the Red Cross)	Nonprofit	Multilingual translation for humanitarian operations
NBA	Sports	Game commentary tools, fan engagement, content production
Volvo Cars	Automotive	Software documentation, customer support, internal copilots
Mercedes-Benz	Automotive	In-car voice assistant pilots and engineering productivity
KPMG	Professional services	Tax research, audit support, internal copilots
Bayer	Pharmaceuticals	Document analysis and research summarization
AT&T	Telecommunications	Customer service summarization and call analytics
Carlsberg Group	Beverages	Commercial planning, marketing, internal copilots
Heineken	Beverages	Sales operations, marketing copy
H&R Block	Financial services	Tax assistance chatbot for filers
Vodafone	Telecommunications	Customer support agent and analytics
Air India	Aviation	Customer service chatbot "Maharaja"
Mediterranean Shipping Company (MSC)	Logistics	Document processing and shipping operations

Microsoft itself is the largest single consumer of Azure OpenAI capacity. GitHub Copilot, Microsoft 365 Copilot, Copilot for Sales, Copilot for Service, the Bing search experience, the Microsoft Copilot consumer assistant, and the security copilot all run on top of Azure OpenAI Service. Their throughput volumes are large enough that Microsoft has historically reserved dedicated capacity in Azure regions before broader customer demand catches up.

In government and defense, the Department of Defense, the National Security Agency, NASA JPL, and several US federal civilian agencies have deployed workloads on Azure OpenAI through the FedRAMP High and Azure Government Secret offerings. UK government, Australian government, and several EU member states have run pilots through their respective sovereign clouds.

Operational considerations

Running Azure OpenAI in production raises a set of operational questions that do not apply on the OpenAI direct API.

Quotas and capacity. Azure OpenAI quotas are per-region and per-deployment, expressed in tokens per minute (TPM) and requests per minute (RPM). Customers request quota increases through the Azure portal, and the response time can range from minutes for small bumps to several days for capacity that requires fleet provisioning ^[15].

Monitoring and observability. Azure Monitor exposes metrics for token consumption, latency percentiles, content filter activations, and rate limit errors. Customers commonly forward these metrics to Application Insights, Log Analytics, or external SIEM systems. The content filter signal is particularly useful for tracking what kinds of prompts users are submitting and how often the filter is firing.

Availability and failover. The service publishes per-region availability in the Azure Service Health dashboard. Customers running multi-region deployments typically pair an eastus2 deployment with a swedencentral or westeurope deployment and use Azure Front Door or a custom router to fail over. Global Standard and Global Provisioned deployments hide some of this complexity but still warrant a backup region for disaster recovery.

Quota management for PTU. PTU reservations sit on top of the Azure quota system and are not transferable across regions. A customer with PTU reservations in eastus cannot fail over to westus3 without a separate reservation. This makes capacity planning a multi-region exercise for PTU customers.

Cost management. Microsoft Cost Management groups Azure OpenAI charges by deployment, model, and resource group, which makes it relatively easy to track spend per application or business unit. Cost anomalies tend to come from runaway agent loops and unbounded max_tokens settings; teams that ship agents on Azure OpenAI typically wire up budget alerts and per-tenant rate limits before going to production.

Differences from OpenAI direct in practice

Beyond the headline differences in privacy and SLA, several smaller behaviors trip up developers porting from openai.com to Azure.

Deployment indirection. On openai.com, callers reference a model by name (gpt-4o). On Azure, callers reference a deployment that points to a specific model and version (gpt-4o-2024-08-06). The deployment indirection is what enables customers to pin model versions, but it means callers must know the deployment name configured by their Azure administrator.

Api-version pinning. Each Azure call carries an api-version query parameter. Calling without one returns an error. Some preview features, such as the Responses API and reasoning effort parameter, are gated behind specific preview versions and are not visible to callers on older versions.

Content filter responses. Azure adds a prompt_filter_results and content_filter_results block to many responses, with severity scores per category. Code that strictly validates against the OpenAI response schema can break on Azure unless it allows extra fields.

Function calling and tool use. Tool use and structured outputs are supported on Azure for GPT-4o, GPT-4.1, the o-series, and GPT-5, but specific tool features (parallel tool calls, strict JSON schema enforcement) sometimes lag the OpenAI direct API by an api-version or two.

Fine-tuning availability. Fine-tuning availability on Azure tracks the OpenAI catalog with a delay. As of May 2026, GPT-5 fine-tuning is not yet available on Azure (it shipped on the OpenAI direct API in late 2025), and reinforcement fine-tuning for o-series models is in preview only on Azure.

Realtime API. The Realtime API on Azure is available in three regions and supports the gpt-realtime model that reached GA in late 2025. WebRTC support, voice activity detection, and built-in safety classifiers are at parity with OpenAI direct.

Relationship to Microsoft's OpenAI investment

Microsoft's partnership with OpenAI is one of the defining business relationships in the AI industry. Beginning with a $1 billion investment in 2019, Microsoft has invested a cumulative total of approximately $13 billion in OpenAI ^[3]. This investment gave Microsoft exclusive rights to serve OpenAI's models through Azure and played a central role in establishing Azure as a leading platform for AI workloads.

In October 2025, the two companies announced a restructured partnership as OpenAI transitioned from a capped-profit structure to a public benefit corporation. Under the new terms, Microsoft holds approximately 27 percent of OpenAI on a diluted basis, with a stake valued at around $135 billion ^[3]^[19]. Azure remains the exclusive cloud provider for stateless OpenAI APIs, OpenAI committed to $250 billion in Azure spending, and Microsoft retains exclusive IP rights to OpenAI models and products until AGI is declared, with those rights extended through 2032 ^[3]^[19]. In Microsoft's own framing, "what began as an investment in a research organization has grown into one of the most successful partnerships in our industry" ^[19].

The partnership has been financially significant for both sides. In Q2 FY2026 (reported January 28, 2026), Microsoft's net income was increased by $7.6 billion (and diluted earnings per share by $1.02) from net gains on its OpenAI investment, against total quarterly revenue of $81.3 billion, up 17 percent year over year ^[13]. Azure and other cloud services grew 39 percent that quarter ^[13]. The investment also carried costs in earlier periods; in Q1 FY2026, Microsoft reported a $3.1 billion drag on net income tied to its OpenAI investment accounting ^[3].

The restructuring also formally allowed OpenAI to contract with other clouds for training capacity. OpenAI subsequently announced a $50 billion compute partnership with Amazon Web Services and a smaller deal with Google Cloud, while keeping Azure as the host for the API customers consume. For Azure OpenAI Service, the restructuring is mostly invisible: the contract terms with Azure customers are unchanged, and the API endpoints and deployment types continue to behave the same way.

Comparison with other hosted-frontier offerings

Azure OpenAI Service competes with a small set of cloud-hosted frontier model offerings. The closest analogues are Amazon Bedrock and Google Cloud Vertex AI, which package frontier models from Anthropic, Google, Mistral, Cohere, and others under similar enterprise terms.

Feature	Azure OpenAI Service	AWS Bedrock	Google Vertex AI
Primary models	OpenAI GPT-5, GPT-4o, o-series	Claude, Llama, Nova, Mistral	Gemini, Claude (selected)
Frontier exclusivity	OpenAI hosted-API exclusive on Azure	Anthropic primary cloud	Gemini exclusive
Authentication	Microsoft Entra ID, API keys	AWS IAM	Google Cloud IAM
Data residency	Region selection plus Data Zones	Region selection	Region selection
Provisioned capacity	Provisioned Throughput Units (PTUs)	Provisioned Throughput	Reserved capacity
Pricing parity with direct API	Tracks OpenAI rates closely	Tracks Anthropic rates closely	Tracks Google rates closely
Compliance	SOC 2, HIPAA, FedRAMP High, ISO 27001, IRS 1075	SOC, HIPAA, FedRAMP	SOC, HIPAA, FedRAMP
Sovereign cloud	Microsoft Cloud for Sovereignty	AWS Top Secret region	Google sovereign clouds (with partners)

The core trade-off is which set of frontier models a customer needs. Customers committed to OpenAI models for product reasons usually pick Azure OpenAI; customers committed to Anthropic Claude usually pick Bedrock; customers committed to Gemini usually pick Vertex. In practice, larger enterprises run all three, with Azure OpenAI as the default for Microsoft-aligned workloads.

Current state (2025-2026)

As of May 2026, Azure OpenAI Service is the dominant enterprise channel for accessing OpenAI models. The general availability of GPT-5 in Azure AI Foundry brought the platform's most powerful model yet, with a 272,000-token context window, alongside the cost-optimized GPT-5-mini and GPT-5-nano variants and the December 2025 arrival of GPT-5.2 ^[20]. The Realtime API for low-latency voice and the Responses API for agent workloads have become the recommended endpoints for new applications, while Chat Completions remains supported for existing code.

Microsoft has expanded the Azure AI Foundry ecosystem to include a broader range of models beyond OpenAI's, including open-weight models from Meta, Mistral, Cohere, DeepSeek, and others, positioning it as an increasingly model-agnostic platform. The integration of AI agents, code interpreters, and tool use capabilities reflects the industry's shift toward agentic AI workflows. The deprecation of the Assistants API in favor of the Foundry Agents service signals Microsoft's commitment to a more integrated, platform-level approach to agent development.

The relationship between Microsoft and OpenAI continues to evolve. Reports in early 2026 suggested tensions over OpenAI's separate cloud deals, including the reported $50 billion agreement with Amazon Web Services, which Microsoft viewed as potentially conflicting with Azure exclusivity provisions ^[3]. Despite these dynamics, Azure OpenAI Service remains central to both companies' strategies and continues to serve as the primary enterprise on-ramp for OpenAI's technology.

Looking forward, the most likely changes through 2026 are wider regional availability for Sora 2 and the Realtime API, broader fine-tuning support for the GPT-5 family, and the gradual phase-out of the GPT-3.5 Turbo deployments in favor of GPT-4o mini and GPT-5-nano. The naming question (Azure OpenAI versus Azure AI Foundry Models) is unlikely to be resolved in the short term, because the legacy resource type continues to ship and the brand-level Foundry rollout has not displaced it.

References

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

7 revisions by 1 contributors · full history

Suggest edit

Azure OpenAI Service

History and development

Stargate and infrastructure expansion

Available models

When did GPT-5 reach Azure?

Regional deployments

Key features

Enterprise security and compliance

Network isolation and private endpoints

Managed identity authentication

Compliance certifications

What data privacy guarantees does Azure OpenAI provide?

Content filtering and responsible AI

Content filter configuration details

Deployment options

Provisioned throughput units (PTUs)

Fine-tuning

Assistants API and migration to Foundry Agents

Azure AI Foundry integration

SDKs and APIs

Pricing

PTU pricing economics

How does Azure OpenAI differ from the direct OpenAI API?

Notable customers and use cases

Operational considerations

Differences from OpenAI direct in practice

Relationship to Microsoft's OpenAI investment

Comparison with other hosted-frontier offerings

Current state (2025-2026)

References

Improve this article

What links here (24 of 37)

What links here (24 of 37)

History and development

Stargate and infrastructure expansion

Available models

When did GPT-5 reach Azure?

Regional deployments

Key features

Enterprise security and compliance

Network isolation and private endpoints

Managed identity authentication

Compliance certifications

What data privacy guarantees does Azure OpenAI provide?

Content filtering and responsible AI

Content filter configuration details

Deployment options

Provisioned throughput units (PTUs)

Fine-tuning

Assistants API and migration to Foundry Agents

Azure AI Foundry integration

SDKs and APIs

Pricing

PTU pricing economics

How does Azure OpenAI differ from the direct OpenAI API?

Notable customers and use cases

Operational considerations

Differences from OpenAI direct in practice

Relationship to Microsoft's OpenAI investment

Comparison with other hosted-frontier offerings

Current state (2025-2026)

References

Improve this article

Related Articles

Claude Sonnet 4.5

Microsoft 365 Copilot

Model Context Protocol

Apple Intelligence

AI in Healthcare

AI Drug Discovery

What links here (24 of 37)

Related Articles

Claude Sonnet 4.5

Microsoft 365 Copilot

Model Context Protocol

Apple Intelligence

AI in Healthcare

AI Drug Discovery

What links here (24 of 37)