Amazon Bedrock
Last reviewed
Sources
16 citations
Review status
Source-backed
Revision
v6 ยท 3,991 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Sources
16 citations
Review status
Source-backed
Revision
v6 ยท 3,991 words
Add missing citations, update stale details, or suggest a clearer explanation.
Amazon Bedrock is a fully managed service from Amazon Web Services (AWS) that offers a choice of high-performing foundation models from leading AI companies, accessible through a single unified API, so developers and enterprises can build, scale, and deploy generative artificial intelligence applications and agents without managing infrastructure or training models from scratch [1]. AWS describes it as "the platform for building generative AI applications and agents at production scale," and reports that Bedrock powers generative AI for more than 100,000 organizations worldwide, from startups to global enterprises across every industry [6][10]. It reached general availability on September 28, 2023, and by early 2026 offered access to nearly 100 serverless models, making it one of the broadest multi-provider model platforms in cloud computing [1][14].
Bedrock is used to add generative AI capabilities to applications without operating model infrastructure. Common workloads include text generation and summarization, conversational assistants and chatbots, retrieval-augmented generation over private data, code generation, image generation, document and video analysis, and increasingly the building and operation of autonomous AI agents. Because models are exposed through one API and standard AWS SDKs, teams can prototype with several models, benchmark them on their own tasks, and move the best fit into production without rebuilding application logic. The official AWS definition states that "Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies" [1].
AWS first previewed Amazon Bedrock in April 2023 at its annual summit, positioning it as the company's answer to growing demand for enterprise-grade generative AI tools. The service became generally available on September 28, 2023, initially launching in two AWS regions: US East (N. Virginia) and US West (Oregon) [1][2]. At launch, the platform offered models from AI21 Labs, Anthropic, Cohere, Stability AI, and Amazon's own Titan family, with Meta's Llama 2 (in fine-tuned 13B and 70B versions) added shortly after as the first fully managed Llama 2 API [1][2]. Bedrock achieved HIPAA eligibility and GDPR compliance at general availability [2].
Since then, AWS has steadily expanded Bedrock's capabilities and model catalog. In December 2025, AWS announced its largest single expansion, adding 18 fully managed open-weight models to the platform, with new entries from Mistral, Google, NVIDIA, OpenAI, MiniMax, Moonshot, and Qwen; with that launch Bedrock reached nearly 100 serverless models [14]. That same month, on December 2, 2025, AWS introduced the Amazon Nova 2 family of foundation models, which offer advanced reasoning with competitive price-performance characteristics [3].
In October 2025, AWS launched Amazon Bedrock AgentCore, a platform-level service for building, deploying, and operating AI agents at scale without managing infrastructure. AgentCore became generally available on October 13, 2025 and marked a significant evolution in the platform's capabilities, reflecting the broader industry shift toward agentic AI architectures [9][15]. By early 2026, Amazon Bedrock powers generative AI for more than 100,000 organizations worldwide, spanning startups to global enterprises across every industry [10].
Amazon Bedrock was previewed in April 2023 and reached general availability on September 28, 2023. The following timeline summarizes major milestones [1][3][9][14][15].
| Date | Milestone |
|---|---|
| April 2023 | Bedrock previewed at the AWS Summit |
| September 28, 2023 | General availability in US East (N. Virginia) and US West (Oregon) |
| 2025 | Intelligent Prompt Routing, Priority and Flex pricing tiers, multi-agent collaboration introduced |
| July 2025 | Bedrock AgentCore released in preview |
| October 13, 2025 | Bedrock AgentCore reaches general availability |
| December 2, 2025 | Amazon Nova 2 family announced; 18 open-weight models added (largest expansion to date) |
| December 3, 2025 | Reinforcement fine-tuning announced |
| Early 2026 | Nearly 100 serverless models; 100,000+ organizations using Bedrock |
One of Bedrock's primary differentiators is its multi-provider approach. Instead of locking customers into a single model family, it offers a marketplace of foundation models spanning text generation, image generation, and embeddings.
| Provider | Notable Models | Capabilities |
|---|---|---|
| Anthropic | Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 4 | Text generation, analysis, code, vision |
| Meta | Llama 3.3 (8B, 70B, 405B) | Text generation, code, multilingual |
| Amazon | Titan Text, Titan Embeddings, Amazon Nova 2 Lite, Nova 2 Pro | Text generation, embeddings, reasoning |
| Mistral AI | Mistral Large 3, Magistral Small 1.2, Ministral 3 series, Voxtral | Text generation, code, multilingual, audio |
| Cohere | Command R, Command R+, Embed | Text generation, RAG-optimized, embeddings |
| Stability AI | Stable Diffusion XL, SDXL Turbo | Image generation |
| AI21 Labs | Jamba-Instruct | Text generation, long context |
| DeepSeek | DeepSeek R1 | Reasoning, code |
| Gemma 3 (4B, 12B, 27B) | Text generation, lightweight | |
| NVIDIA | Nemotron Nano 2 series | Text generation, vision |
AWS continues to add new providers and models on a regular basis. The platform also includes Amazon's own Nova 2 family, which was announced on December 2, 2025 and includes Nova 2 Lite, a fast, cost-effective reasoning model for everyday workloads, and Nova 2 Pro (Preview), positioned as Amazon's most intelligent model for complex, multi-step tasks [3]. Both Nova 2 models support extended thinking with step-by-step reasoning at three thinking-intensity levels (low, medium, and high), include built-in tools such as a code interpreter and web grounding, support remote Model Context Protocol (MCP) tools, and provide a one-million-token context window [3].
Bedrock prices models on a per-token basis, with costs varying significantly across providers and model sizes. The following table provides representative on-demand pricing for commonly used models as of early 2026.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|
| Claude 3.5 Sonnet | $6.00 | $30.00 | Batch: $3.00 / $15.00 |
| Claude 3.5 Haiku | $1.00 | $5.00 | Fastest Claude model |
| Llama 2 Chat 70B | $1.95 | $2.56 | Open-weight |
| DeepSeek v3.2 | $0.62 | $1.85 | Reasoning-focused |
| Gemma 3 4B | $0.04 | $0.08 | Lightweight, budget |
| Gemma 3 12B | $0.09 | $0.29 | Mid-range |
| Gemma 3 27B | $0.23 | $0.38 | Higher capability |
| Ministral 3B | $0.10 | $0.10 | Budget option |
| Voxtral Mini | $0.04 | $0.04 | Audio processing |
Output tokens typically cost 3 to 5 times more than input tokens across most models, reflecting the higher computational cost of text generation versus processing [7]. Budget-conscious deployments can leverage smaller models like Gemma 3 4B or Ministral 3B at a fraction of the cost of frontier models.
Bedrock provides a unified API that lets developers switch between different foundation models with minimal code changes. This abstraction layer means that teams can experiment with multiple models, benchmark their performance on specific tasks, and select the best fit without rebuilding their application logic. All API calls are made through standard AWS SDKs, making integration straightforward for organizations already using AWS infrastructure.
Retrieval-augmented generation (RAG) is a technique that enhances the accuracy of AI-generated responses by grounding them in external data sources. Bedrock Knowledge Bases is a fully managed capability that handles the entire RAG workflow, from data ingestion and indexing to retrieval and prompt augmentation [4]. Users connect their data sources (Amazon S3 buckets, web crawlers, or databases), select an embeddings model, and Bedrock automatically chunks, indexes, and stores the data in a vector store. When a query comes in, the system retrieves relevant information and passes it to the generation model as context.
According to AWS, RAG with Knowledge Bases can reduce hallucinated outputs by 50 to 90 percent compared to relying solely on a model's parametric knowledge [4]. New documents can be reindexed automatically without retraining the model, and the system supports both structured and unstructured data with metadata filtering.
The Knowledge Bases pipeline operates in two phases. During the ingestion phase, Bedrock accepts documents from connected data sources, splits them into configurable chunk sizes, generates vector embeddings using a selected embeddings model (such as Amazon Titan Embeddings or Cohere Embed), and stores the resulting vectors in a supported vector store. Supported vector stores include Amazon OpenSearch Serverless, Amazon Aurora PostgreSQL, Pinecone, and Redis Enterprise Cloud [4].
During the retrieval phase, when a user submits a query, the system generates a query embedding, performs a semantic similarity search against the vector store, retrieves the most relevant document chunks, augments the prompt with the retrieved context, and passes the augmented prompt to the selected generation model. The system also supports in-built session context management and source attribution, returning citations that trace each part of the response back to its source document [11].
Knowledge Bases can be integrated with Bedrock Guardrails for content filtering and with Bedrock Agents for multi-step task execution that requires grounding in enterprise data.
Bedrock Agents allows developers to create AI-powered assistants that can reason through multi-step tasks, call external APIs, and interact with Knowledge Bases. An agent takes a user request, breaks it down into subtasks, decides which tools or data sources to query, executes those steps, and returns a coherent response. This makes it possible to build applications like customer service bots that can look up order status, process refunds, and answer product questions in a single conversation.
Agents are defined through action groups (the APIs and functions they can call) and can be integrated with Knowledge Bases and Guardrails for controlled, grounded responses.
Bedrock Agents use a managed orchestration architecture where a foundation model serves as the reasoning engine. When a user sends a request, the agent invokes the FM to create a reasoning trace, which determines the sequence of actions required to fulfill the request. At each step, the agent decides whether to call an action group, query a Knowledge Base, or return a response to the user.
Multi-agent collaboration, introduced in 2025, allows developers to build systems where multiple specialized agents work together under the coordination of a supervisor agent. The supervisor agent breaks complex processes into manageable steps, assigns tasks to domain-specialist sub-agents, and aggregates their results. Each sub-agent can focus on a specific capability, such as database queries, API calls, or document analysis, enabling a separation of concerns that improves both reliability and maintainability [9].
AgentCore, which reached general availability on October 13, 2025, provides the infrastructure layer for deploying and operating agents at scale. AWS describes it as "an agentic platform to build, deploy and operate highly capable agents securely at scale using any framework, model, or protocol" [15]. Its components include [9][15]:
| Component | Function |
|---|---|
| Runtime | Secure, serverless environment for deploying agents and tools; supports any framework, complete session isolation, and execution windows up to 8 hours |
| Memory | Session persistence and context retention across interactions |
| Gateway | Transforms existing tools, APIs, and Lambda functions into agent-ready capabilities and connects to MCP servers |
| Identity | Authentication and authorization for agent-to-service communication, with OAuth integration and secure token storage |
| Observability | Monitoring, logging, and debugging for agent behavior via Amazon CloudWatch dashboards, OpenTelemetry compatible |
AgentCore supports the Agent-to-Agent (A2A) protocol, announced in late 2025, which enables interoperability between agents built on different frameworks, including AWS Strands Agents, OpenAI Agents SDK, LangGraph, Google ADK, and Claude Agents SDK. The A2A protocol allows agents to share context, capabilities, and reasoning in a standardized, verifiable format [12].
Bedrock Guardrails provides configurable safety filters that sit between the user and the model. Organizations can define policies to block harmful content, enforce topic boundaries, redact personally identifiable information, and check for hallucinations. AWS reports that Guardrails can block up to 88 percent of harmful content, including multimodal image and text content, and that its Automated Reasoning checks identify correct model responses with up to 99 percent accuracy [5][6]. AWS notes that the Automated Reasoning checks "mathematically verify natural language content against your defined policies," providing provable explanations rather than pattern matching [5].
Guardrails also include contextual grounding checks, which verify whether a model's response is actually supported by the source material provided. This is especially useful in RAG applications where responses should be traceable back to retrieved documents.
The Guardrails system provides several distinct categories of protection [5][11]:
| Feature | Description |
|---|---|
| Content filters | Block harmful or inappropriate text and images across categories (hate, sexual, violence, self-harm) with configurable severity thresholds |
| Prompt attack detection | Identify and block malicious prompts attempting to bypass moderation or alter model behavior |
| Denied topics | Define topics the model should refuse to discuss (e.g., illegal advice, competitor analysis) |
| Word filters | Block specific words or phrases such as profanity, competitor names, or internal terminology |
| PII redaction | Automatically detect and redact personally identifiable information from inputs and outputs |
| Contextual grounding | Verify that responses are supported by retrieved source material |
| Automated Reasoning | Logic-based verification that checks model outputs against defined business rules |
Guardrails can be applied to any model available through Bedrock, including custom and fine-tuned models. They are also integrated with Knowledge Bases and Agents, creating layered protection across the full application stack.
Bedrock supports fine-tuning of select models using labeled training data. Organizations can adapt a foundation model to their specific domain, terminology, or response style without building a model from scratch. Bedrock also supports continued pre-training with unlabeled data for deeper customization. All fine-tuning happens within the AWS environment, and training data never leaves the customer's account [6].
Bedrock offers four distinct approaches to model customization [13]:
| Method | Description | Data Required | Best For |
|---|---|---|---|
| Supervised fine-tuning | Adapt model behavior using labeled prompt-completion pairs | Labeled examples | Domain-specific tasks, tone adjustment |
| Continued pre-training | Train on unlabeled domain data to deepen knowledge | Unlabeled text corpus | Industry jargon, proprietary knowledge |
| Distillation | Transfer capabilities from a large "teacher" model to a smaller "student" model | Prompt dataset (automated synthesis) | Cost reduction while maintaining quality |
| Reinforcement fine-tuning | Optimize using reward functions with rule-based or AI-based graders | Evaluation criteria | Code generation, math, instruction following |
Distilled models run up to 500 percent faster and cost up to 75 percent less than the larger models they are derived from, while preserving comparable accuracy [6]. Reinforcement fine-tuning, introduced on December 3, 2025, is particularly notable: it enables developers to improve model accuracy without deep machine learning expertise or large volumes of labeled data. It supports two complementary approaches: Reinforcement Learning with Verifiable Rewards (RLVR), which uses rule-based graders for objective tasks such as code generation and mathematical reasoning, and Reinforcement Learning from AI Feedback (RLAIF), which uses AI-based judges for subjective tasks such as instruction following and tone consistency. At launch the feature supported Amazon Nova 2 Lite, with additional models planned, and AWS reports that reinforcement fine-tuning delivers average accuracy gains of 66 percent over base models [13].
Bedrock provides built-in model evaluation tools that let teams compare different models on their own datasets. Automatic evaluations charge only for the model inference used, with no minimum usage commitments. This is particularly valuable for organizations that want to rigorously benchmark models before committing to a specific one in production.
Introduced in 2025, Intelligent Prompt Routing allows Bedrock to automatically direct requests to different models within the same model family based on prompt complexity. For example, simple queries might be routed to Claude 3 Haiku (cheaper, faster), while complex queries go to Claude 3.5 Sonnet (more capable). AWS claims this can cut costs by up to 30 percent while maintaining quality [6]. The routing system also supports Llama (routing between Llama 3.3 70B and 3.1 8B) and Nova (routing between Nova Pro and Nova Lite) model families.
Bedrock Flows is a visual workflow authoring tool that lets users orchestrate multiple components, including foundation models, prompts, agents, Knowledge Bases, Guardrails, and AWS services, into coherent pipelines. Teams can design, test, and iterate on multi-step AI workflows using a drag-and-drop interface. Flows pricing is $0.035 per 1,000 node transitions [7].
Bedrock uses a pay-per-use pricing structure with several options designed to match different workload patterns.
| Pricing Tier | Description | Best For |
|---|---|---|
| On-Demand (Standard) | Pay per input/output token at base rates | Variable or experimental workloads |
| Priority | 75% premium over standard rates for guaranteed low latency | Latency-sensitive production workloads |
| Flex | 50% discount vs. on-demand; best-effort processing | Cost-sensitive, latency-tolerant workloads |
| Batch Inference | 50% discount vs. on-demand; results within 24 hours | Large-scale, non-time-sensitive processing |
| Provisioned Throughput | Reserved model units for guaranteed capacity (1-month and 6-month terms) | High-volume production workloads |
| Bedrock Flows | $0.035 per 1,000 node transitions | Multi-step orchestrated pipelines |
| Model Evaluation | Charged only for inference used | Benchmarking and model selection |
The introduction of the Priority and Flex tiers in 2025 added flexibility for workloads with different latency and cost requirements. Batch inference is particularly attractive for offline processing tasks, offering the same model quality at half the on-demand price [7].
Amazon Bedrock has attracted a diverse range of enterprise customers across industries. Several notable deployments illustrate the platform's production capabilities.
| Customer | Industry | Use Case | Results |
|---|---|---|---|
| Robinhood | Financial services | AI-first financial analysis and customer support | Scaled from 500M to 5B tokens/day in 6 months; 80% AI cost reduction; development time cut by half [10][16] |
| Toyota Motor North America | Automotive | RAG-driven dealer assistant for vehicle information | Over 7,000 dealer interactions per month [10] |
| Apex Fintech Solutions | Financial services | Financial crime investigation with agent-to-agent communication | Automated complex investigation workflows [10] |
| Epsilon | Marketing | Intelligent agents for campaign workflow automation | Enterprise-grade campaign management with security compliance [10] |
| CloudZero | Cloud FinOps | AI-powered cloud cost advisor platform | 50x growth; 75% reduction in developer cognitive load [10] |
| Fujitsu | Supply chain | Agentic supply chain workflows with guardian agent monitoring | Continuous monitoring and correction of agent drift [10] |
These deployments demonstrate that Bedrock is being used in production at significant scale, with customers processing billions of tokens daily and integrating AI into mission-critical business processes. Robinhood, which built a secure "FinCrimes Agent" on Bedrock to synthesize customer and transactional data into investigative summaries, has made the platform central to its operations. According to Dev Tagare, Robinhood's Head of AI, "Amazon Bedrock's model diversity, security, and compliance features are purpose-built for regulated industries" [16].
Bedrock encrypts all data in transit and at rest. Customer prompts and outputs are not stored by AWS or used to train or improve foundation models. All data processing occurs within the customer's own AWS account, and Bedrock integrates with AWS Identity and Access Management (IAM), AWS PrivateLink, and AWS CloudTrail for access control and auditing. Bedrock achieved HIPAA eligibility and GDPR compliance at general availability, making it suitable for industries with strict compliance requirements, including healthcare, finance, and government [2].
Bedrock competes primarily with Azure OpenAI Service (Microsoft) and Google Vertex AI (Google Cloud).
| Feature | Amazon Bedrock | Azure OpenAI Service | Google Vertex AI |
|---|---|---|---|
| Model Providers | 10+ providers | Primarily OpenAI | Primarily Google (Gemini) + Model Garden |
| Approach | Multi-vendor marketplace | Deep OpenAI integration | Data-first, analytics-driven |
| RAG Support | Knowledge Bases (managed) | Azure AI Search integration | Vertex AI Search |
| Agent Framework | Bedrock Agents + AgentCore | Azure AI Agent Service | Vertex AI Agents |
| Safety Tools | Bedrock Guardrails | Content filtering + Responsible AI | Responsible AI toolkit |
| Pricing Model | Per-token, batch, flex, provisioned | Per-token, PTUs | Per-token, compute-hour |
| Ecosystem | AWS services integration | Microsoft/Office 365 integration | BigQuery, Dataflow integration |
| Multi-Agent | A2A protocol, supervisor agents | Agent orchestration | Agent Engine |
| Fine-Tuning | Supervised, continued pre-training, distillation, reinforcement | Supervised fine-tuning | Supervised, RLHF |
Bedrock's main advantage is breadth of model choice. While Azure centers on OpenAI's models and Vertex AI focuses on Google's Gemini family, Bedrock offers the widest selection of third-party providers in a single managed platform. For enterprises already invested in AWS infrastructure, Bedrock also provides the smoothest integration path [8].
For typical enterprise applications processing 10 to 50 million tokens monthly, Bedrock generally offers 15 to 25 percent lower costs than Azure, though Azure becomes more competitive at scale with reserved capacity [8]. Bedrock's Flex tier offers a unique advantage for latency-tolerant workloads, providing 50 percent discounts that have no direct equivalent in Azure or Vertex AI.
As of early 2026, Amazon Bedrock has grown into one of the most feature-rich AI platforms in the cloud market, powering generative AI for more than 100,000 organizations worldwide [10]. The addition of nearly 100 serverless models, including open-weight models from Google, NVIDIA, MiniMax, and Moonshot AI, has broadened its appeal beyond the initial set of providers [14]. The Nova 2 family positions Amazon's own models, with their one-million-token context windows and tiered reasoning, as competitive alternatives for agentic and reasoning tasks [3].
AWS has also invested heavily in agentic AI capabilities, with Bedrock Agents, AgentCore, and Flows forming the backbone of increasingly sophisticated multi-step AI workflows. The support for the A2A protocol reflects a commitment to interoperability in a market where enterprises often use multiple AI frameworks and providers. The platform's Guardrails feature has matured into a comprehensive responsible AI solution that addresses enterprise concerns about safety, hallucination, and compliance.
Looking ahead, AWS continues to expand regional availability, add new model providers, and deepen integration with the broader AWS ecosystem. Bedrock's position as a model-agnostic platform gives it a structural advantage in a market where model leadership shifts rapidly between providers.