Amazon Bedrock is a fully managed service from Amazon Web Services (AWS) that enables developers and enterprises to build, scale, and deploy generative artificial intelligence applications using foundation models from leading AI providers. Rather than requiring users to manage infrastructure, train models from scratch, or handle complex deployment pipelines, Bedrock provides access to a curated selection of high-performing models through a single, unified API. It reached general availability on September 28, 2023, and has since expanded into one of the most comprehensive AI development platforms in the cloud computing landscape [1].
AWS first previewed Amazon Bedrock in April 2023, positioning it as the company's answer to growing demand for enterprise-grade generative AI tools. The service became generally available on September 28, 2023, initially launching in three AWS regions: US East (N. Virginia), US West (Oregon), and Asia Pacific (Tokyo) [2]. At launch, the platform offered models from AI21 Labs, Anthropic, Cohere, Stability AI, and Amazon's own Titan family, with Meta's Llama 2 models following shortly after.
Since then, AWS has steadily expanded Bedrock's capabilities and model catalog. In December 2025, AWS announced its largest single expansion, adding 18 fully managed open-weight models to the platform. That same month, AWS introduced the Amazon Nova 2 family of foundation models, which offer advanced reasoning capabilities with competitive price-performance characteristics [3]. By early 2026, Bedrock provides access to nearly 100 serverless models, cementing its position as one of the broadest multi-provider model platforms available.
In October 2025, AWS launched Amazon Bedrock AgentCore, a platform-level service for building, deploying, and operating AI agents at scale without managing infrastructure. AgentCore marked a significant evolution in the platform's capabilities, reflecting the broader industry shift toward agentic AI architectures [9]. By early 2026, Amazon Bedrock powers generative AI for more than 100,000 organizations worldwide, from startups to global enterprises across a wide range of industries [10].
One of Bedrock's primary differentiators is its multi-provider approach. Instead of locking customers into a single model family, it offers a marketplace of foundation models spanning text generation, image generation, and embeddings.
| Provider | Notable Models | Capabilities |
|---|---|---|
| Anthropic | Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 4 | Text generation, analysis, code, vision |
| Meta | Llama 3.1 (8B, 70B, 405B), Llama 3.3 70B | Text generation, code, multilingual |
| Amazon | Titan Text, Titan Embeddings, Amazon Nova 2 Lite, Nova 2 Pro | Text generation, embeddings, reasoning |
| Mistral AI | Mistral Large 3, Magistral Small 1.2, Ministral 3 series, Voxtral | Text generation, code, multilingual, audio |
| Cohere | Command R, Command R+, Embed | Text generation, RAG-optimized, embeddings |
| Stability AI | Stable Diffusion XL, SDXL Turbo | Image generation |
| AI21 Labs | Jamba-Instruct | Text generation, long context |
| DeepSeek | DeepSeek R1 | Reasoning, code |
| Google | Gemma 3 (4B, 12B, 27B) | Text generation, lightweight |
| NVIDIA | Nemotron Nano 2 series | Text generation, vision |
AWS continues to add new providers and models on a regular basis. The platform also includes Amazon's own Nova 2 family, which was announced in December 2025 and includes Nova 2 Lite for cost-effective everyday workloads and Nova 2 Pro (Preview) for complex, multi-step reasoning tasks [3].
Bedrock prices models on a per-token basis, with costs varying significantly across providers and model sizes. The following table provides representative on-demand pricing for commonly used models as of early 2026.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|
| Claude 3.5 Sonnet | $3.00 | $15.00 | Batch: $1.50 / $7.50 |
| Claude 3.5 Haiku | $1.00 | $5.00 | Fastest Claude model |
| Llama 2 Chat 70B | $1.95 | $2.56 | Open-weight |
| DeepSeek v3.2 | $0.62 | $1.85 | Reasoning-focused |
| Gemma 3 4B | $0.04 | $0.08 | Lightweight, budget |
| Gemma 3 12B | $0.09 | $0.29 | Mid-range |
| Gemma 3 27B | $0.23 | $0.38 | Higher capability |
| Ministral 3B | $0.10 | $0.10 | Budget option |
| Voxtral Mini | $0.04 | $0.04 | Audio processing |
Output tokens typically cost several times more than input tokens on most larger models, reflecting the higher computational cost of text generation versus processing [7], though some small models (such as Ministral 3B and Voxtral Mini) price both symmetrically. Budget-conscious deployments can leverage smaller models like Gemma 3 4B or Ministral 3B at a fraction of the cost of frontier models.
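The cost impact of model choice can be sketched with simple per-token arithmetic. The prices below are the illustrative Claude 3.5 Haiku and Gemma 3 4B figures from the table above, not authoritative list prices:

```python
# (input, output) dollars per 1M tokens, taken from the representative
# pricing table above -- illustrative figures, not current list prices.
PRICES = {
    "claude-3-5-haiku": (1.00, 5.00),
    "gemma-3-4b": (0.04, 0.08),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate monthly on-demand cost for a given token volume."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# A workload of 20M input / 5M output tokens per month:
print(f"Haiku:  ${monthly_cost('claude-3-5-haiku', 20_000_000, 5_000_000):,.2f}")
print(f"Gemma:  ${monthly_cost('gemma-3-4b', 20_000_000, 5_000_000):,.2f}")
```

At this volume the lightweight model runs roughly two orders of magnitude cheaper, which is why the routing and tiering features discussed later matter so much for cost control.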
Bedrock provides a unified API that lets developers switch between different foundation models with minimal code changes. This abstraction layer means that teams can experiment with multiple models, benchmark their performance on specific tasks, and select the best fit without rebuilding their application logic. All API calls are made through standard AWS SDKs, making integration straightforward for organizations already using AWS infrastructure.
Retrieval-augmented generation (RAG) is a technique that enhances the accuracy of AI-generated responses by grounding them in external data sources. Bedrock Knowledge Bases is a fully managed capability that handles the entire RAG workflow, from data ingestion and indexing to retrieval and prompt augmentation [4]. Users connect their data sources (Amazon S3 buckets, web crawlers, or databases), select an embeddings model, and Bedrock automatically chunks, indexes, and stores the data in a vector store. When a query comes in, the system retrieves relevant information and passes it to the generation model as context.
According to AWS, RAG with Knowledge Bases can reduce hallucinated outputs by 50 to 90 percent compared to relying solely on a model's parametric knowledge [4]. New documents can be reindexed automatically without retraining the model, and the system supports both structured and unstructured data with metadata filtering.
The Knowledge Bases pipeline operates in two phases. During the ingestion phase, Bedrock accepts documents from connected data sources, splits them into configurable chunk sizes, generates vector embeddings using a selected embeddings model (such as Amazon Titan Embeddings or Cohere Embed), and stores the resulting vectors in a supported vector store. Supported vector stores include Amazon OpenSearch Serverless, Amazon Aurora PostgreSQL, Pinecone, and Redis Enterprise Cloud [4].
During the retrieval phase, when a user submits a query, the system generates a query embedding, performs a semantic similarity search against the vector store, retrieves the most relevant document chunks, augments the prompt with the retrieved context, and passes the augmented prompt to the selected generation model. The system also supports in-built session context management and source attribution, returning citations that trace each part of the response back to its source document [11].
Knowledge Bases can be integrated with Bedrock Guardrails for content filtering and with Bedrock Agents for multi-step task execution that requires grounding in enterprise data.
Bedrock Agents allows developers to create AI-powered assistants that can reason through multi-step tasks, call external APIs, and interact with Knowledge Bases. An agent takes a user request, breaks it down into subtasks, decides which tools or data sources to query, executes those steps, and returns a coherent response. This makes it possible to build applications like customer service bots that can look up order status, process refunds, and answer product questions in a single conversation.
Agents are defined through action groups (the APIs and functions they can call) and can be integrated with Knowledge Bases and Guardrails for controlled, grounded responses.
Bedrock Agents use a managed orchestration architecture where a foundation model serves as the reasoning engine. When a user sends a request, the agent invokes the FM to create a reasoning trace, which determines the sequence of actions required to fulfill the request. At each step, the agent decides whether to call an action group, query a Knowledge Base, or return a response to the user.
Multi-agent collaboration, introduced in 2025, allows developers to build systems where multiple specialized agents work together under the coordination of a supervisor agent. The supervisor agent breaks complex processes into manageable steps, assigns tasks to domain-specialist sub-agents, and aggregates their results. Each sub-agent can focus on a specific capability, such as database queries, API calls, or document analysis, enabling a separation of concerns that improves both reliability and maintainability [9].
AgentCore, launched in October 2025, provides the infrastructure layer for deploying and operating agents at scale. Its components include [9]:
| Component | Function |
|---|---|
| Runtime | Secure, serverless environment for deploying agents and tools; supports any framework |
| Memory | Session persistence and context retention across interactions |
| Gateway | Transforms existing tools and services into agent-ready capabilities |
| Identity | Authentication and authorization for agent-to-service communication |
| Observability | Monitoring, logging, and debugging for agent behavior |
AgentCore supports the Agent-to-Agent (A2A) protocol, announced in late 2025, which enables interoperability between agents built on different frameworks, including AWS Strands Agents, OpenAI Agents SDK, LangGraph, Google ADK, and Claude Agents SDK. The A2A protocol allows agents to share context, capabilities, and reasoning in a standardized, verifiable format [12].
Bedrock Guardrails provides configurable safety filters that sit between the user and the model. Organizations can define policies to block harmful content, enforce topic boundaries, redact personally identifiable information, and check for hallucinations. AWS reports that Guardrails can block up to 88 percent of harmful content and identify correct model responses with up to 99 percent accuracy through its Automated Reasoning checks [5].
Guardrails also include contextual grounding checks, which verify whether a model's response is actually supported by the source material provided. This is especially useful in RAG applications where responses should be traceable back to retrieved documents.
The Guardrails system provides several distinct categories of protection [5][11]:
| Feature | Description |
|---|---|
| Content filters | Block harmful or inappropriate text and images across categories (hate, sexual, violence, self-harm) with configurable severity thresholds |
| Prompt attack detection | Identify and block malicious prompts attempting to bypass moderation or alter model behavior |
| Denied topics | Define topics the model should refuse to discuss (e.g., illegal advice, competitor analysis) |
| Word filters | Block specific words or phrases such as profanity, competitor names, or internal terminology |
| PII redaction | Automatically detect and redact personally identifiable information from inputs and outputs |
| Contextual grounding | Verify that responses are supported by retrieved source material |
| Automated Reasoning | Logic-based verification that checks model outputs against defined business rules |
Guardrails can be applied to any model available through Bedrock, including custom and fine-tuned models. They are also integrated with Knowledge Bases and Agents, creating layered protection across the full application stack.
Bedrock supports fine-tuning of select models using labeled training data. Organizations can adapt a foundation model to their specific domain, terminology, or response style without building a model from scratch. Bedrock also supports continued pre-training with unlabeled data for deeper customization. All fine-tuning happens within the AWS environment, and training data never leaves the customer's account [6].
Bedrock offers four distinct approaches to model customization [13]:
| Method | Description | Data Required | Best For |
|---|---|---|---|
| Supervised fine-tuning | Adapt model behavior using labeled prompt-completion pairs | Labeled examples | Domain-specific tasks, tone adjustment |
| Continued pre-training | Train on unlabeled domain data to deepen knowledge | Unlabeled text corpus | Industry jargon, proprietary knowledge |
| Distillation | Transfer capabilities from a large "teacher" model to a smaller "student" model | Prompt dataset (automated synthesis) | Cost reduction while maintaining quality |
| Reinforcement fine-tuning | Optimize using reward functions with rule-based or AI-based graders | Evaluation criteria | Code generation, math, instruction following |
Reinforcement fine-tuning, introduced in December 2025, is particularly notable. It enables developers to improve model accuracy without deep machine learning expertise or large volumes of labeled data. At launch, the feature supports Amazon Nova 2 Lite, with additional models planned. AWS reports that reinforcement fine-tuning delivers average accuracy gains of 66 percent over base models [13].
Bedrock provides built-in model evaluation tools that let teams compare different models on their own datasets. Automatic evaluations charge only for the model inference used, with no minimum usage commitments. This is particularly valuable for organizations that want to rigorously benchmark models before committing to a specific one in production.
Introduced in 2025, Intelligent Prompt Routing allows Bedrock to automatically direct requests to different models within the same model family based on prompt complexity. For example, simple queries might be routed to Claude 3 Haiku (cheaper, faster), while complex queries go to Claude 3.5 Sonnet (more capable). AWS claims this can reduce costs by up to 30 percent without compromising accuracy [6]. The routing system also supports Llama (routing between Llama 3.3 70B and 3.1 8B) and Nova (routing between Nova Pro and Nova Lite) model families.
Bedrock Flows is a visual workflow authoring tool that lets users orchestrate multiple components, including foundation models, prompts, agents, Knowledge Bases, Guardrails, and AWS services, into coherent pipelines. Teams can design, test, and iterate on multi-step AI workflows using a drag-and-drop interface. Flows pricing is $0.035 per 1,000 node transitions [7].
Bedrock uses a pay-per-use pricing structure with several options designed to match different workload patterns.
| Pricing Tier | Description | Best For |
|---|---|---|
| On-Demand (Standard) | Pay per input/output token at base rates | Variable or experimental workloads |
| Priority | 75% premium over standard rates for guaranteed low latency | Latency-sensitive production workloads |
| Flex | 50% discount vs. on-demand; best-effort processing | Cost-sensitive, latency-tolerant workloads |
| Batch Inference | 50% discount vs. on-demand; results within 24 hours | Large-scale, non-time-sensitive processing |
| Provisioned Throughput | Reserved model units for guaranteed capacity (1-month and 6-month terms) | High-volume production workloads |
| Bedrock Flows | $0.035 per 1,000 node transitions | Multi-step orchestrated pipelines |
| Model Evaluation | Charged only for inference used | Benchmarking and model selection |
The introduction of the Priority and Flex tiers in 2025 added flexibility for workloads with different latency and cost requirements. Batch inference is particularly attractive for offline processing tasks, offering the same model quality at half the on-demand price [7].
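The tier trade-offs above reduce to simple multipliers on the on-demand rate. This back-of-envelope sketch uses an illustrative $1.00 / $5.00 per-1M-token base price, not any specific model's list price:

```python
# Illustrative on-demand base rates, $ per 1M tokens (not a real price list).
BASE_INPUT, BASE_OUTPUT = 1.00, 5.00

# Multipliers from the tier table above.
TIER_MULTIPLIER = {
    "on_demand": 1.00,
    "priority": 1.75,   # 75% premium for low latency
    "flex": 0.50,       # 50% discount, best-effort
    "batch": 0.50,      # 50% discount, results within 24 hours
}

def tier_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a workload under a given inference tier."""
    rate = TIER_MULTIPLIER[tier]
    return rate * (input_tokens * BASE_INPUT + output_tokens * BASE_OUTPUT) / 1_000_000

# Same workload (10M in / 2M out) across tiers:
for tier in TIER_MULTIPLIER:
    print(f"{tier:>10}: ${tier_cost(tier, 10_000_000, 2_000_000):,.2f}")
```

For a workload that tolerates delay, the Flex and Batch multipliers halve the bill relative to on-demand, while Priority nearly doubles it; picking the tier per workload class is the main lever here.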
Amazon Bedrock has attracted a diverse range of enterprise customers across industries. Several notable deployments illustrate the platform's production capabilities.
| Customer | Industry | Use Case | Results |
|---|---|---|---|
| Robinhood | Financial services | AI-first financial analysis and customer support | Scaled from 500M to 5B tokens/day in 6 months; 80% AI cost reduction [10] |
| Toyota Motor North America | Automotive | RAG-driven dealer assistant for vehicle information | Over 7,000 dealer interactions per month [10] |
| Apex Fintech Solutions | Financial services | Financial crime investigation with agent-to-agent communication | Automated complex investigation workflows [10] |
| Epsilon | Marketing | Intelligent agents for campaign workflow automation | Enterprise-grade campaign management with security compliance [10] |
| CloudZero | Cloud FinOps | AI-powered cloud cost advisor platform | 50x growth; 75% reduction in developer cognitive load [10] |
| Fujitsu | Supply chain | Agentic supply chain workflows with guardian agent monitoring | Continuous monitoring and correction of agent drift [10] |
These deployments demonstrate that Bedrock is being used in production at significant scale, with customers processing billions of tokens daily and integrating AI into mission-critical business processes.
Bedrock encrypts all data in transit and at rest. Customer prompts and outputs are not stored by AWS or used to train or improve foundation models. All data processing occurs within the customer's own AWS account, and Bedrock integrates with AWS Identity and Access Management (IAM), AWS PrivateLink, and AWS CloudTrail for access control and auditing. This makes it suitable for industries with strict compliance requirements, including healthcare, finance, and government.
Bedrock competes primarily with Azure OpenAI Service (Microsoft) and Google Vertex AI (Google Cloud).
| Feature | Amazon Bedrock | Azure OpenAI Service | Google Vertex AI |
|---|---|---|---|
| Model Providers | 10+ providers | Primarily OpenAI | Primarily Google (Gemini) + Model Garden |
| Approach | Multi-vendor marketplace | Deep OpenAI integration | Data-first, analytics-driven |
| RAG Support | Knowledge Bases (managed) | Azure AI Search integration | Vertex AI Search |
| Agent Framework | Bedrock Agents + AgentCore | Azure AI Agent Service | Vertex AI Agents |
| Safety Tools | Bedrock Guardrails | Content filtering + Responsible AI | Responsible AI toolkit |
| Pricing Model | Per-token, batch, flex, provisioned | Per-token, PTUs | Per-token, compute-hour |
| Ecosystem | AWS services integration | Microsoft/Office 365 integration | BigQuery, Dataflow integration |
| Multi-Agent | A2A protocol, supervisor agents | Agent orchestration | Agent Engine |
| Fine-Tuning | Supervised, continued pre-training, distillation, reinforcement | Supervised fine-tuning | Supervised, RLHF |
Bedrock's main advantage is breadth of model choice. While Azure centers on OpenAI's models and Vertex AI focuses on Google's Gemini family, Bedrock offers the widest selection of third-party providers in a single managed platform. For enterprises already invested in AWS infrastructure, Bedrock also provides the smoothest integration path [8].
For typical enterprise applications processing 10 to 50 million tokens monthly, Bedrock generally offers 15 to 25 percent lower costs than Azure, though Azure becomes more competitive at scale with reserved capacity [8]. Bedrock's Flex tier offers a unique advantage for latency-tolerant workloads, providing 50 percent discounts that have no direct equivalent in Azure or Vertex AI.
As of early 2026, Amazon Bedrock has grown into one of the most feature-rich AI platforms in the cloud market. The addition of nearly 100 serverless models, including open-weight models from Google, NVIDIA, MiniMax, and Moonshot AI, has broadened its appeal beyond the initial set of providers. The Nova 2 family positions Amazon's own models as competitive alternatives for reasoning tasks.
AWS has also invested heavily in agentic AI capabilities, with Bedrock Agents, AgentCore, and Flows forming the backbone of increasingly sophisticated multi-step AI workflows. The support for the A2A protocol reflects a commitment to interoperability in a market where enterprises often use multiple AI frameworks and providers. The platform's Guardrails feature has matured into a comprehensive responsible AI solution that addresses enterprise concerns about safety, hallucination, and compliance.
Looking ahead, AWS continues to expand regional availability, add new model providers, and deepen integration with the broader AWS ecosystem. Bedrock's position as a model-agnostic platform gives it a structural advantage in a market where model leadership shifts rapidly between providers.