Google Vertex AI

Enterprise AI Google MLOps


Updated Jul 23, 2026


Google Vertex AI is the unified machine learning and generative artificial intelligence platform offered by Google Cloud, announced at Google I/O on May 18, 2021 as a consolidation of the legacy AI Platform with the AutoML product family.^[1]^[2] The platform spans the full machine learning lifecycle, including managed notebooks, custom training, MLOps tooling such as pipelines and a feature store, online and batch prediction, and a generative AI stack centered on Model Garden, Vertex AI Studio, and the Agent Builder.^[3]^[4] It serves as the primary surface for enterprise access to Google's first-party Gemini, Imagen, Veo, and Lyria foundation models, as well as a managed Model-as-a-Service catalog of third-party models including Anthropic's Claude family, Meta's Llama models, and Mistral AI offerings.^[5]^[6] Vertex AI competes directly with Amazon Bedrock and Microsoft's Azure OpenAI Service, and at Google Cloud Next 2026 was rebranded as the Gemini Enterprise Agent Platform while preserving its underlying APIs and SDKs.^[7]^[8]

At launch, Google Cloud stated that "Vertex AI requires nearly 80% fewer lines of code to train a model versus competitive platforms," a claim that became the platform's defining marketing pitch.^[9] By 2026 the platform had grown from an MLOps console into a catalog of more than 200 enterprise-ready foundation models, an open-source agent framework downloaded over 7 million times, and the centerpiece of Google Cloud's enterprise AI strategy.^[28]^[19]^[8]

Attribute	Value
Type	Managed ML and generative AI platform
Operator	Google Cloud
Announced	May 18, 2021 (Google I/O)
Predecessor	AI Platform, AutoML
Foundation models	Gemini, Imagen, Veo, Lyria, MedLM, Embeddings
Third-party MaaS	Claude (Anthropic), Llama (Meta), Mistral, AI21, Cohere
Model Garden size	200+ enterprise-ready models (2026)
Agent framework	Agent Development Kit (ADK), 7M+ Python downloads
Pipelines runtime	Kubeflow Pipelines SDK
2026 rebrand	Gemini Enterprise Agent Platform (Cloud Next 2026, Apr 22, 2026)

What is Vertex AI used for?

Vertex AI is used to build, train, deploy, monitor, and serve machine learning models, and to access and orchestrate foundation models for generative AI applications, all on managed Google Cloud infrastructure. In practice it covers two broad workloads: classic MLOps (managed notebooks, custom training across CPUs, GPUs, and Cloud TPUs, pipelines, a feature store, model registry, drift monitoring, and online and batch prediction) and generative AI (prompt design, fine-tuning, retrieval-augmented generation, and production agents built on Gemini and third-party models).^[3]^[4] Enterprises adopt it for use cases such as catalog enrichment, demand forecasting, fraud detection, clinical documentation, code assistance, and conversational agents, while consuming Google's first-party Gemini, Imagen, Veo, and Lyria models alongside a managed catalog of Claude, Llama, Mistral, AI21, and Cohere models.^[5]^[28]

History

Origins as AI Platform

Before Vertex AI, Google Cloud sold machine learning services under several disjoint brands. Cloud AI Platform offered notebooks, training jobs, and prediction endpoints aimed at custom ML engineers, while Cloud AutoML targeted users who lacked deep ML expertise by exposing point-and-click model creation for tabular data, vision, video, language, and translation use cases. The split forced enterprise teams to stitch together two distinct toolchains for adjacent problems, which Google Cloud's product management team eventually framed as a barrier to production deployment.^[1]

According to Andrew Moore, who at the time served as vice president and general manager of Cloud AI and Industry Solutions, the unifying motivation for the new platform was to "get data scientists and engineers out of the orchestration weeds, and create an industry-wide shift that would make everyone get serious about moving AI out of pilot purgatory and into full-scale production."^[1] Craig Wiley, then director of product management, described the prevailing situation in starker terms, calling enterprise machine learning "in crisis" because most companies investing in ML were "not getting value from it."^[2]

When was Vertex AI released?

Vertex AI was announced on May 18, 2021 at Google I/O and became generally available the same day.^[9]^[2] At launch, Google Cloud framed the platform as a single managed surface bringing together the prior AI Platform notebooks, training, and prediction services with the AutoML family. The marketing claim that Vertex AI required "nearly 80% fewer lines of code to train a model versus competitive platforms" appeared verbatim in the original press release and was repeated across launch coverage, though the precise comparison method was not formally specified.^[9]^[1]

The launch components fell into two groups. The first set covered MLOps primitives: Vertex Vizier for hyperparameter optimization, Vertex Feature Store as a managed feature serving system, Vertex Experiments for run tracking, Vertex Model Monitoring for production drift detection, Vertex ML Metadata for lineage tracking, Vertex Pipelines for ML workflow orchestration, and Vertex ML Edge Manager (then experimental) for edge deployment.^[1] The second set covered general-purpose modeling tools: managed notebooks, custom training jobs, AutoML for vision, language, structured data, and forecasting, plus prebuilt models for vision, language, conversation, and structured data accessible through APIs.^[9]

Launch customers cited by Google Cloud included L'Oreal subsidiary ModiFace (virtual try-on and skin diagnostics), WPP agency Essence (collaborative data science), Sabre Labs (travel personalization), and Iron Mountain (records-related ML).^[1] Implementation partners named in the announcement included Accenture and Deloitte.^[1]

Generative AI pivot (2023)

Vertex AI's most significant evolution after launch was its transformation into a generative AI platform during 2023. On June 7, 2023, Google Cloud made generative AI support on Vertex AI generally available, exposing foundation models including PaLM 2, Imagen, and Codey through the new Generative AI Studio for prompt design and tuning, and through Model Garden, then a catalog of more than 60 first-party and third-party models.^[10] PaLM 2 was subsequently extended with a 32,000-token context window and the ability to ground responses in enterprise data sources.^[11]

At Google Cloud Next 2023 in late August 2023, Model Garden grew past 100 models with the addition of Meta's Llama 2 and Code Llama, the Technology Innovation Institute's Falcon LLM, and the pre-announced inclusion of Anthropic's Claude 2.^[11] The same announcement introduced Vertex AI Extensions for connecting models to external APIs (including BigQuery, AlloyDB, and partner databases via DataStax, MongoDB, and Redis), Vertex AI Data Connectors for read-only data ingestion, the general availability of Vertex AI Search and Conversation, SynthID watermarking for generated images via Google DeepMind, and Colab Enterprise as a managed notebook environment.^[11]

On December 13, 2023, Gemini 1.0 Pro was made available to Google Cloud customers through Vertex AI and Google AI Studio shortly after the model family's December 6 announcement, with Gemini 1.0 Ultra following on an allowlist basis in early 2024.^[12] On December 14, 2023, Google Cloud introduced MedLM, a family of foundation models fine-tuned for healthcare workflows, available in a large variant for complex tasks and a medium variant tuned for scalable adaptation to specific use cases, built on Med-PaLM 2.^[13]

Claude on Vertex AI

Anthropic's Claude 3 Sonnet and Claude 3 Haiku models became generally available on Vertex AI on March 20, 2024, with Claude 3 Opus following in subsequent weeks.^[5] Successive Claude releases were added to the Model Garden in tandem with Anthropic's own launches, including Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude Sonnet 4 and Claude Opus 4 (announced as available on Vertex AI alongside their general availability in 2025), and Claude Sonnet 4.5.^[14]^[15] Anthropic documented Vertex AI as one of three official enterprise distribution channels alongside the Anthropic API and Amazon Bedrock.^[14]

Agent Builder, RAG Engine, and the ADK

At Google Cloud Next in April 2024, Google Cloud introduced Vertex AI Agent Builder, a no-code product for assembling conversational agents on Gemini that consolidated Vertex AI Search and Vertex AI Conversation with new RAG APIs and vector search primitives.^[16] Google Cloud chief executive Thomas Kurian described the goal as allowing customers to "very easily and quickly build conversational agents."^[16]

On January 10, 2025, Google made Vertex AI RAG Engine generally available as a fully managed retrieval-augmented generation backend. The engine supported parsing with configurable chunking; retrieval against Vertex AI Vector Search, Vertex AI Search, Pinecone, or Weaviate; generation against Gemini, Llama, and Claude; and data connectors for Cloud Storage, Google Drive, Jira, Slack, websites, and BigQuery.^[17]

At Google Cloud Next 2025 on April 9, 2025, the company announced the Agent Development Kit (ADK), an open-source Python framework for multi-agent system development described as the same framework used to power agents inside Google products such as Agentspace.^[18] ADK shipped with bidirectional audio and video streaming, hierarchical multi-agent orchestration, built-in evaluation, and a deployment path to Vertex AI Agent Engine. The framework supported models from Gemini, the Vertex AI Model Garden, and third-party providers including Anthropic, Meta, Mistral, and AI21 via LiteLLM.^[18] In a subsequent Google Cloud blog update, the company reported that "since Agent Builder's public inception earlier this year, we've seen tremendous traction with components such as our Python Agent Development Kit (ADK), which has been downloaded over 7 million times."^[19]

Generative media expansion (2025)

On May 21, 2025, Google Cloud announced the next generation of generative media models on Vertex AI: Imagen 4 (text-to-image, public preview), Veo 3 (text-to-video with native speech and audio, private preview at launch), and Lyria 2 (text-to-music, generally available).^[20] All three integrated SynthID watermarking and configurable safety filters and were accessible through the Vertex AI Media Studio console and Vertex AI API.^[20] On April 8, 2026, Lyria 3 and Lyria 3 Pro arrived, with Lyria 3 Pro supporting compositions up to three minutes with structural elements such as intros, verses, choruses, and bridges, and vocal generation with timed lyrics.^[21]

Gemini 3 and the 2026 rebrand

On November 19, 2025, Google Cloud announced that Gemini 3 was available in Gemini Enterprise and on Vertex AI in preview for developers. The model was reported to hold a top score of 1501 Elo on LMArena and offered a one-million-token context window and improved tool-calling accuracy; Geotab reported that it reduced the company's tool-calling mistakes by roughly 30%.^[22] Access channels included Vertex AI, Google Antigravity, the Gemini CLI, and third-party integrations such as Cursor, GitHub Copilot, JetBrains, Figma, and Replit.^[22] As of March 26, 2026, the initial gemini-3-pro-preview alias was retired in favor of gemini-3.1-pro-preview.^[23]

Is Vertex AI being discontinued?

No. Vertex AI was rebranded, not discontinued. At Google Cloud Next 2026, held April 22, 2026 in Las Vegas, Google Cloud retired the Vertex AI brand name and rebranded the platform as Gemini Enterprise Agent Platform, absorbing the Agentspace product into a unified Gemini Enterprise offering.^[7]^[8] Google Cloud CEO Thomas Kurian framed the strategy as owning the full stack from custom silicon to the enterprise inbox, telling the Cloud Next 2026 audience that "the experimental phase is behind us, and now the real challenge begins."^[46] The rebrand was described as architectural rather than purely cosmetic: existing Vertex AI workloads continued to run unchanged under the new namespace, with SDKs, billing, and APIs migrated forward, while a June 24, 2026 deadline applied to migration away from certain deprecated SDK modules.^[7] Many engineering teams and third-party platforms continued to refer to the platform by its Vertex AI name through 2026, and Google Cloud's documentation retained "Vertex AI" in product page headings such as "Gemini Enterprise Agent Platform (formerly Vertex AI)."^[8]

Core MLOps Components

Vertex AI Workbench

Workbench is Vertex AI's managed JupyterLab notebook environment, available in user-managed and fully managed variants. Notebook instances integrate with Cloud Storage, BigQuery, and the rest of Google Cloud's data stack, and a notebook executor announced in November 2021 allowed teams to schedule notebooks for ad hoc or recurring execution.^[24] Workbench was joined in 2023 by Colab Enterprise, a managed Colab variant launched at Google Cloud Next 2023 for organizations preferring the Colab interface.^[11]

Custom Training

Custom Training provides managed compute for arbitrary training scripts, with built-in support for PyTorch, TensorFlow, JAX, and scikit-learn. Jobs can scale across CPUs, GPUs, and Cloud TPU pods. The platform offers managed hyperparameter tuning (originally branded Vertex Vizier and based on Google's internal Vizier optimizer) and distributed training primitives.^[9]^[4]
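The idea behind managed hyperparameter tuning can be illustrated with a minimal sketch. It uses plain random search, a deliberately simpler technique than the Vizier optimizer named above, and the objective function `toy_validation_loss` is a hypothetical stand-in for the metric a real training job would report.

```python
import random

def toy_validation_loss(learning_rate, batch_size):
    """Hypothetical stand-in for a metric reported by a training job."""
    # Arbitrary bowl-shaped surface with its minimum at lr=0.01, batch_size=64.
    return (learning_rate - 0.01) ** 2 * 1e4 + ((batch_size - 64) / 64) ** 2

def random_search(num_trials, seed=0):
    """Sample configurations at random and keep the best one seen."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(num_trials):
        params = {
            # Log-uniform sample between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "batch_size": rng.choice([16, 32, 64, 128]),
        }
        loss = toy_validation_loss(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

best_params, best_loss = random_search(num_trials=50)
print(best_params, round(best_loss, 4))
```

A managed service would run each trial as a separate training job and would typically choose the next configuration from earlier results rather than sampling blindly.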

Vertex AI Pipelines

Vertex AI Pipelines is a serverless orchestrator for ML workflows. It executes pipelines authored against the open-source Kubeflow Pipelines SDK and the TensorFlow Extended (TFX) SDK, removing the need to operate Kubeflow clusters directly while preserving SDK portability.^[25] Pipelines are typically used to chain preprocessing, training, evaluation, and deployment components, and integrate natively with Model Registry, Model Monitoring, and Feature Store.
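The chaining of preprocessing, training, evaluation, and deployment can be sketched as a small dependency graph. This is an illustration in standard-library Python only, not the Kubeflow Pipelines or TFX SDK; the step functions and the shared `ctx` dictionary are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical steps; each one reads from and writes to a shared context.
def preprocess(ctx):
    ctx["rows"] = [x / 10 for x in range(10)]

def train(ctx):
    ctx["model_mean"] = sum(ctx["rows"]) / len(ctx["rows"])

def evaluate(ctx):
    ctx["passed"] = abs(ctx["model_mean"] - 0.45) < 1e-9

def deploy(ctx):
    # Deployment is gated on the evaluation result.
    ctx["deployed"] = ctx["passed"]

STEPS = {"preprocess": preprocess, "train": train,
         "evaluate": evaluate, "deploy": deploy}

# Each step maps to the set of steps that must finish before it starts.
DEPENDENCIES = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}

def run_pipeline():
    """Execute the steps in an order that respects every dependency."""
    ctx, order = {}, []
    for name in TopologicalSorter(DEPENDENCIES).static_order():
        STEPS[name](ctx)
        order.append(name)
    return order, ctx

order, ctx = run_pipeline()
print(order)  # ['preprocess', 'train', 'evaluate', 'deploy']
```

An orchestrator adds what this sketch leaves out: isolated execution of each step, artifact storage, caching, and retries.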

Vertex AI Feature Store

The Feature Store manages features for both training and online serving. It uses BigQuery as the offline store and optimized online nodes for low-latency lookups, exposing a single feature definition for both training and serving to prevent training-serving skew.^[4]
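Why a single feature definition prevents training-serving skew can be shown with a minimal sketch. The `FeatureDefinition` class and the `avg_item_value` feature are hypothetical and do not reflect the Feature Store API; the point is only that the offline and online paths call the same transform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class FeatureDefinition:
    """One definition, reused by both the training and serving paths."""
    name: str
    transform: Callable[[dict], float]

# Hypothetical feature: average value per item in an order.
AVG_ITEM_VALUE = FeatureDefinition(
    name="avg_item_value",
    transform=lambda row: row["order_total"] / max(row["item_count"], 1),
)

def build_training_column(rows: List[dict], feature: FeatureDefinition) -> List[float]:
    """Offline path: apply the definition to historical rows."""
    return [feature.transform(row) for row in rows]

def serve_feature(request: dict, feature: FeatureDefinition) -> Dict[str, float]:
    """Online path: apply the same definition to a single live request."""
    return {feature.name: feature.transform(request)}

history = [
    {"order_total": 90.0, "item_count": 3},
    {"order_total": 20.0, "item_count": 0},
]
print(build_training_column(history, AVG_ITEM_VALUE))  # [30.0, 20.0]
print(serve_feature(history[0], AVG_ITEM_VALUE))       # {'avg_item_value': 30.0}
```

If the two paths each reimplemented the division, a difference as small as the zero-item guard would make the model see different values in production than in training.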

Model Registry

Vertex AI Model Registry is a searchable repository for model versions and metadata. It tracks lineage from training runs, supports model versioning and aliasing, and allows direct deployment from a registered version to an endpoint.^[4]

Model Monitoring

Vertex AI Model Monitoring detects distributional drift and training-serving skew on production endpoints. It supports both feature distribution monitoring and feature attribution monitoring, which tracks how each input feature's contribution to predictions changes over time using Vertex Explainable AI. For AutoML tabular models, explainability is automatically configured, so users can enable skew and drift detection with a single gcloud command and configure per-feature thresholds.^[26]
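Drift detection with a per-feature threshold can be sketched as a distance between two histograms. The choice of Jensen-Shannon divergence, the bin edges, and the 0.3 threshold are assumptions made for illustration; the article does not specify which statistic the service computes.

```python
import math

def histogram(values, bin_edges):
    """Normalized histogram; values beyond the edges land in the end bins."""
    counts = [0] * (len(bin_edges) + 1)
    for v in values:
        counts[sum(v >= edge for edge in bin_edges)] += 1
    return [c / len(values) for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def check_drift(baseline, serving, bin_edges, threshold=0.3):
    """Return the drift score and whether it exceeds the threshold."""
    score = js_divergence(histogram(baseline, bin_edges),
                          histogram(serving, bin_edges))
    return score, score > threshold

edges = [2.5, 5, 7.5, 10]
training_values = list(range(1, 11))   # baseline feature values
shifted_values = list(range(11, 21))   # serving values after a shift

print(check_drift(training_values, training_values, edges))
print(check_drift(training_values, shifted_values, edges))
```

Comparing the serving window against training data corresponds to skew detection, while comparing it against an earlier serving window corresponds to drift detection; the arithmetic is the same.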

Vertex AI Prediction and Endpoints

The prediction service supports two modes. Online prediction serves models behind low-latency HTTPS endpoints with autoscaling and traffic splitting between model versions. Batch prediction processes large input files asynchronously against managed compute, billed per job. Endpoints can be public or private (using Private Service Connect), regional or multi-region.^[4]^[27]
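Traffic splitting between model versions amounts to weighted random routing. The sketch below is a local simulation under assumed names (`pick_version`, the version labels, and a 90/10 split are hypothetical) and does not call any Vertex AI API.

```python
import random
from collections import Counter

def pick_version(traffic_split, rng):
    """Choose a model version with probability proportional to its share."""
    if sum(traffic_split.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    versions = list(traffic_split)
    weights = [traffic_split[v] for v in versions]
    return rng.choices(versions, weights=weights, k=1)[0]

# Hypothetical canary rollout: 10% of requests go to the new version.
split = {"model-v1": 90, "model-v2": 10}
rng = random.Random(42)
routed = Counter(pick_version(split, rng) for _ in range(10_000))
canary_share = routed["model-v2"] / 10_000
print(routed, canary_share)
```

Shifting the percentages gradually lets a team compare the new version against the old one on live traffic before completing the rollout.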

Generative AI Components

Vertex AI Studio

Vertex AI Studio (originally launched as Generative AI Studio in June 2023) is a console workspace for prompt design, prompt versioning, evaluation, and supervised fine-tuning across the supported foundation models.^[10] The Studio includes Media Studio sub-experiences for image, video, and music generation against Imagen, Veo, and Lyria, alongside text-oriented chat and completion playgrounds.^[20]

Model Garden

Model Garden is the curated catalog of foundation and open models accessible from Vertex AI. At launch in June 2023 it contained more than 60 models; by late 2025 and into 2026 it cataloged more than 200 enterprise-ready models spanning Google's own Gemini, Imagen 3, Imagen 4, Veo, Veo 3, Lyria, and MedLM; open models such as Llama and Google's Gemma; and third-party proprietary models including Anthropic's Claude, Mistral variants, AI21 Labs Jamba, and Cohere Command models.^[28]^[11]

Models are made available through several deployment patterns. Some, including Gemini and most third-party proprietary models, are Model-as-a-Service (MaaS) APIs that customers consume without provisioning infrastructure. Others, such as many Llama variants and certain Mistral models, are served through customer-managed virtual machine deployment from Model Garden templates. Beginning in 2025, Vertex AI added a pattern for "self-deploying" partner proprietary models inside customers' own VPCs, supporting AI21 Labs, CAMB.AI, Contextual AI, CSM, Mistral, Qodo, Virtue AI, and WRITER.^[29]

Generative AI APIs

Vertex AI exposes generative APIs for several model families:

Gemini (text, image, video, audio, code), including current production models such as Gemini 2.5 Flash, Gemini 2.5 Pro, Gemini 3 Pro, and Gemini 3 Flash.^[30]^[22]
Imagen and Imagen 4 for text-to-image and image editing.^[20]
Veo and Veo 3 for text-to-video generation, with Veo 3 introducing native dialogue, sound effects, and music alongside video frames.^[20]
Lyria, Lyria 2, Lyria 3, and Lyria 3 Pro for text-to-music generation, with reference image conditioning and timed lyric generation in Lyria 3.^[21]
Chirp speech models for speech recognition and synthesis, and text and multimodal embedding APIs.^[31]
MedLM (large and medium variants) for healthcare workflows including patient summary generation and clinical question answering.^[13]

Agent Builder and Agent Engine

Vertex AI Agent Builder (launched April 2024) provides no-code and low-code construction of conversational agents on Gemini, with managed retrieval and grounding via Vertex AI Search.^[16] Agent Engine is the managed runtime for production agents, providing autoscaling, observability, and integration with more than 100 prebuilt connectors and enterprise data systems.^[18] The Agent Development Kit (April 2025) gave developers an open-source Python framework that can deploy to Agent Engine or to other container runtimes, with model-agnostic support across Vertex AI Model Garden and external providers via LiteLLM.^[18]^[19] By Cloud Next 2026 the ADK had reached a version 1.0 stable release across four languages (Python, Go, Java, and TypeScript).^[8]

Vertex AI Search and RAG Engine

Vertex AI Search (formerly Enterprise Search, previously sold within "Vertex AI Search and Conversation") is a managed search-and-retrieval service that abstracts the ingestion, OCR, chunking, embedding, indexing, and access control layers of a search system, and it is positioned as a turnkey backend for retrieval-augmented generation pipelines.^[32] Industry-tuned variants exist for retail commerce, media, and healthcare and life sciences, including Vertex AI Search for Healthcare which reached general availability in 2024.^[32]^[33]

The Vertex AI RAG Engine, announced as generally available January 10, 2025, complements Vertex AI Search by providing programmable RAG primitives: configurable parsing and chunking, a choice of vector backends (Vertex AI Vector Search, Pinecone, Weaviate), and direct integration with Gemini, Claude, and Llama generation models.^[17]
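The two RAG primitives described here, chunking and retrieval, can be sketched in a few lines. This is a toy illustration: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector backend, and none of the names correspond to the RAG Engine API.

```python
import math
import re
from collections import Counter

def chunk_text(text, chunk_size=10, overlap=2):
    """Split text into windows of chunk_size words sharing `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + chunk_size]) for i in starts]

def embed(text):
    """Toy embedding: lowercase token counts (a real system uses a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, top_k=1):
    """Rank chunks by similarity to the query and return the best top_k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]

document = (
    "Refunds are issued within five business days of approval. "
    "Shipping to Canada takes ten days and costs twelve dollars. "
    "Support is available by chat every weekday morning."
)
chunks = chunk_text(document)
top = retrieve("How long does shipping to Canada take?", chunks)
print(top[0])
```

The retrieved chunk would then be placed in the prompt of a generation model. Chunk size and overlap trade retrieval precision against context completeness, which is why they are exposed as configuration.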

Gemini Code Assist

Gemini Code Assist is Google Cloud's AI coding assistant, available in free, Standard, and Enterprise editions. It evolved from the earlier Duet AI for Developers brand, which transitioned to Gemini Code Assist in 2024; Duet AI for Developers was available at no cost on a one-user-per-billing-account basis until May 11, 2024.^[34] Code Assist Standard provides AI coding assistance with enterprise security controls, while the Enterprise edition supports private source code customization and deep Google Cloud integration.^[34]

Third-Party Model Access

A defining feature of Vertex AI relative to its early MLOps positioning is its catalog of third-party models served through Google Cloud infrastructure. The following table summarizes Vertex AI's notable third-party Model-as-a-Service offerings as of mid-2026.

Provider	Representative models on Vertex AI	First Vertex availability
Anthropic	Claude 2; Claude 3 Sonnet, Haiku, Opus; Claude 3.5 Sonnet; Claude 3.7 Sonnet; Claude Sonnet 4; Claude Opus 4; Claude Sonnet 4.5	Claude 2 pre-announced at Cloud Next 2023; Claude 3 Sonnet and Haiku GA March 20, 2024 ^[5]^[11]
Meta	Llama 2; Llama 3; Llama 3.1 (including 405B); Llama 4	Llama 2 at Cloud Next 2023; Llama 3.1 family added July 24, 2024 ^[11]^[28]
Mistral AI	Mixtral 8x7B and managed Mistral variants	Available via MaaS by 2024 ^[28]^[11]
AI21 Labs	Jamba 1.5	Available via MaaS ^[28]
Cohere	Command series	Available via Model Garden ^[28]
Falcon (TII)	Falcon LLM	Added at Cloud Next 2023 ^[11]

To support resilient Anthropic deployments across regions, Vertex AI introduced multi-region endpoints for Claude, available in US and EU geographies, that provide failover while respecting data residency. It also introduced a global endpoint that dynamically routes Claude requests across regions for capacity; the global endpoint is separate from regional quotas and carries no data residency guarantees.^[27]

Pricing Model

How is Vertex AI priced?

Vertex AI uses a multi-dimensional pricing model that depends on which product surface is being consumed.^[35]^[36]

Foundation model inference is priced per token (typically per 1 million input tokens and per 1 million output tokens) for generative APIs. As of early 2026, Gemini 2.5 Flash-Lite was listed at $0.10 per million input tokens and $0.40 per million output tokens, Gemini 2.5 Flash at $0.30/$2.50, and Gemini 2.5 Pro at $1.25/$10.00 for prompts up to 200,000 tokens, doubling above that threshold. Gemini 3 Flash for enterprises was listed at $0.50/$3.00, and Gemini 3.1 Pro at $2.00/$12.00 (with the same long-context doubling).^[36]
Custom training is billed per node-hour by machine type, with separate pricing tiers for CPU, GPU, and Cloud TPU configurations.^[35]
Supervised fine-tuning for foundation models is billed per training token; AutoML image training was published at $3.465 per hour, with deployed AutoML image endpoints at $1.375 per hour.^[35]
Online prediction for custom models is billed per node-hour of deployed compute; batch prediction is billed per job.^[35]
Agent Builder and Vertex AI Search carry separate per-query and per-document pricing, and several products such as Vertex AI Studio and Agent Builder include an Express Mode for evaluation without enabling billing.^[36]
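The per-token arithmetic can be sketched as a small estimator using the Gemini 2.5 list prices quoted above. The dictionary keys are illustrative labels rather than confirmed model IDs, and the sketch assumes that the long-context doubling applies to both input and output rates whenever the prompt exceeds 200,000 tokens.

```python
# USD per 1 million tokens, copied from the figures quoted in this section.
PRICES = {
    "gemini-2.5-flash-lite": {"input": 0.10, "output": 0.40, "long_context": False},
    "gemini-2.5-flash": {"input": 0.30, "output": 2.50, "long_context": False},
    "gemini-2.5-pro": {"input": 1.25, "output": 10.00, "long_context": True},
}
LONG_CONTEXT_THRESHOLD = 200_000  # prompt tokens

def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request at list prices."""
    price = PRICES[model]
    # Assumption: both rates double once the prompt passes the threshold.
    doubled = price["long_context"] and input_tokens > LONG_CONTEXT_THRESHOLD
    multiplier = 2.0 if doubled else 1.0
    tokens_cost = input_tokens * price["input"] + output_tokens * price["output"]
    return multiplier * tokens_cost / 1_000_000

print(round(estimate_cost("gemini-2.5-flash", 1_000_000, 100_000), 4))  # 0.55
print(round(estimate_cost("gemini-2.5-pro", 100_000, 10_000), 4))       # 0.225
print(round(estimate_cost("gemini-2.5-pro", 300_000, 10_000), 4))       # 0.95
```

The third call shows the effect of the threshold: tripling the prompt from 100,000 to 300,000 tokens raises the estimate by more than four times.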

For embedding and grounding APIs, Google Cloud transitioned to computation-based metrics on April 14, 2025, billing $0.00003 per 1,000 characters of input and $0.00009 per 1,000 characters of output for affected services.^[36]

Security, Compliance, and Multi-Region

Vertex AI integrates with Google Cloud's broader security stack. Identity and access management is handled through Google Cloud IAM, with predefined roles such as Vertex AI User and Vertex AI Admin and the option for custom roles with conditional IAM bindings.^[37] Data at rest is encrypted by Google-managed keys by default, with Customer-Managed Encryption Keys (CMEK) available via Cloud KMS for training datasets, models, pipeline artifacts, and Gemini model resources; data in transit is encrypted with TLS 1.2 or higher.^[37]

VPC Service Controls allow administrators to create a service perimeter around Vertex AI resources to mitigate data exfiltration, protecting training data, models, online inference requests, batch inference results, and Gemini model traffic.^[37] Cloud Logging captures Vertex AI audit events including Data Access logs for Data Read and Data Write operations, which are typically required for regulated workloads such as HIPAA-covered healthcare deployments.^[37]

Generative AI on Vertex AI supports CMEK, VPC Service Controls, and data residency.^[37] Regional deployment is supported across more than thirty Google Cloud regions; for Anthropic Claude models, the platform also offers multi-region endpoints in the United States and European Union for cross-region failover within a single residency geography, and a global endpoint with no residency guarantees but dynamic capacity routing.^[27]

Adoption and Case Studies

By Google Cloud Next 2023, Google Cloud reported that Vertex AI customer accounts had grown 15-fold quarter over quarter and named customers including GE Appliances, Typeface, GitLab, Omnicom, Workiva, and Connected Stories among its early generative AI adopters.^[11]

Wayfair has documented multiple Vertex AI deployments. In January 2025 the retailer expanded its Google Cloud partnership to apply Gemini on Vertex AI across its catalog of roughly 30 million product listings, reporting a 67% reduction in the time required to curate new and updated product listings and a 5x faster update cadence for product attributes.^[38]^[44] Wayfair's Supply Chain Science team migrated to Vertex AI Pipelines, Hyperparameter Tuning, and Experiments for its production ML stack, reducing the time to bring a new real-time model from approximately one month to about one hour by leveraging automated processes and a streamlined deployment architecture.^[39]

In healthcare, HCA Healthcare and Augmedix used MedLM on Vertex AI to convert ambient clinician-patient conversations into electronic health record drafts; BenchSci used MedLM to accelerate biomarker discovery across more than 100 million experiments; and Accenture and Deloitte used MedLM for claims processing and provider directory chatbot deployments.^[13] L'Oreal subsidiary ModiFace used Vertex AI for virtual try-on and skin diagnostic computer vision applications from launch.^[1]

Competitive Position

How does Vertex AI differ from Amazon Bedrock and Azure OpenAI?

Vertex AI competes with several other managed AI platforms for enterprise foundation-model workloads. Independent analysis generally frames Amazon Bedrock as offering the widest third-party catalog under a single API at the lowest cost for high-volume committed usage, Azure OpenAI as strongest for Microsoft-centric and regulated environments, and Vertex AI as strongest for Google Cloud-native data workloads, very-long-context inference, and integrated multimodal media generation.^[40]^[41]

Platform	Operator	Primary first-party models	Notable third-party MaaS	Distinct features
Vertex AI / Gemini Enterprise Agent Platform	Google Cloud	Gemini, Imagen, Veo, Lyria, MedLM, Gemma	Claude, Llama, Mistral, AI21, Cohere	Long-context Gemini (1M+ tokens), tight BigQuery integration, native multimodal media generation ^[40]^[22]
Amazon Bedrock	Amazon Web Services	Titan, Nova	Claude (Anthropic), Llama, Mistral, Cohere, Stability, AI21	Broad partner catalog, deep AWS service integration, Savings Plans ^[40]
Azure OpenAI Service	Microsoft	OpenAI GPT-4o, GPT-4o-mini, o-series reasoning models	OpenAI models exclusively in this surface (Azure AI Foundry handles wider catalog)	Microsoft 365 Copilot integration, regulated-industry certifications ^[40]
Anthropic API (and Claude on AWS)	Anthropic	Claude family	(First-party only)	Direct from model vendor; reference deployment for Claude features ^[14]

Independent industry analysis has repeatedly framed the three hyperscaler platforms as not strictly interchangeable. Vertex AI's long-context advantage rests on the Gemini 1.5 Pro 2-million-token window and the Gemini 3 1-million-token window, and its integrated multimodal generation spans text, image, video, and music.^[40]^[41]

Beyond the hyperscaler trio, Vertex AI also overlaps with offerings such as the OpenAI Responses API for stateful generative pipelines, as well as with Microsoft's broader Azure AI Foundry (the post-2024 evolution of Azure ML and Azure AI Studio) which catalogs third-party models in a manner analogous to Model Garden.

Limitations and Criticisms

Reviewer feedback and analyst reports identify several recurring criticisms of Vertex AI.^[42]^[43]

Pricing complexity is the most frequently cited issue. The multi-dimensional billing model (training hours, prediction nodes, AutoML usage, storage, foundation-model tokens, agent and search per-query charges) makes accurate cost estimation difficult, and the absence of a hard ceiling on spend has been noted as a source of operational anxiety for new teams.^[42]^[36]

Learning curve and tooling depth are also commonly noted. The platform's breadth (Workbench, Pipelines, Feature Store, Model Registry, Model Monitoring, Agent Builder, Search, Studio, RAG Engine, ADK) can feel overwhelming to first-time users, and onboarding often requires existing Google Cloud expertise.^[42]^[43] Reviewers have observed that not all components are equally mature, with newer features occasionally undergoing API changes that disrupt downstream workflows, and that AutoML capabilities offer less granular control than fully custom training pipelines.^[42]

Operational concerns include slow lifecycle operations (instance start, stop, restart) for managed notebooks and occasional regional capacity exhaustion for specific accelerator types.^[42] Lock-in is a concern raised in independent reviews: as workflows accumulate dependencies on Vertex AI's MLOps services and proprietary connectors, migration to other providers becomes harder despite the use of open SDKs such as Kubeflow Pipelines.^[42]

The 2026 rebrand to Gemini Enterprise Agent Platform introduced a further source of friction: existing customers face a June 24, 2026 deadline to migrate from certain deprecated Vertex AI SDK modules.^[7]

Vertex AI in the Generative AI Stack

Within Google Cloud's product line, Vertex AI sits adjacent to several related surfaces. The consumer-facing Gemini app and the developer-focused Google AI Studio expose Gemini models for individual end users and rapid prototyping respectively, while Vertex AI targets enterprise application development with managed deployment, monitoring, security perimeters, and committed-use pricing. NotebookLM (Google's notebook reasoning product) and Gemini Enterprise (the business productivity surface that absorbed Agentspace at the 2026 rebrand) are end-user applications layered on the same model substrate.^[7]

Inside Vertex AI itself, the Agent2Agent (A2A) protocol and the Model Context Protocol integration in ADK enable interoperability with agent systems hosted on other platforms. ADK's LangChain and LlamaIndex integrations let developers compose Vertex-hosted agents with components developed against the broader open-source ecosystem.^[18] A2A, which Google introduced and later donated to the Linux Foundation's Agentic AI Foundation, reached "more than 150 organizations supporting the standard, deep integration across Google, Microsoft and AWS platforms, and active production deployments across multiple industries" by its first-anniversary milestone in April 2026.^[45] Rao Surapaneni, a vice president and general manager in Google Cloud's Business Applications Platform, said of the protocol that "this momentum has quickly moved the project into production-ready use."^[45]

Significance

As a platform, Vertex AI marks Google Cloud's transition from a portfolio of disjoint ML services to a unified enterprise platform for large language models and machine learning. Its consolidation of MLOps primitives with a foundation-model catalog and a generative agent runtime mirrors a pattern adopted by all major cloud providers between 2023 and 2026, with Vertex AI's long-context Gemini variants, integrated multimodal generation, and tight coupling to BigQuery serving as its primary points of differentiation.^[40] By the time of its 2026 rebrand to Gemini Enterprise Agent Platform, Vertex AI had grown from an MLOps console for traditional ML practitioners into the centerpiece of Google Cloud's enterprise AI strategy.^[7]^[8]

References

1. Craig Wiley and Henry Tappen, "Google Cloud Unveils Vertex AI, One Platform, Every ML Tool You Need", Google Cloud Blog, 2021-05-18. https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-launches-vertex-ai-unified-platform-for-mlops. Accessed 2026-05-20.
2. Frederic Lardinois, "Google Cloud launches Vertex AI, a new managed machine learning platform", TechCrunch, 2021-05-18. https://techcrunch.com/2021/05/18/google-cloud-launches-vertex-a-new-managed-machine-learning-platform/. Accessed 2026-05-20.
3. Google Cloud, "Vertex AI Platform", Google Cloud product page, 2026. https://cloud.google.com/vertex-ai. Accessed 2026-05-20.
4. Google Cloud Documentation, "Vertex AI overview and components", Google Cloud, 2026. https://docs.cloud.google.com/vertex-ai/docs/release-notes. Accessed 2026-05-20.
5. Warren Barkley, "Anthropic's Claude 3 models go GA on Vertex AI", Google Cloud Blog, 2024-03-20. https://cloud.google.com/blog/products/ai-machine-learning/anthropics-claude-3-models-go-ga-on-vertex-ai. Accessed 2026-05-20.
6. Google Cloud Documentation, "Vertex AI partner models for MaaS", Google Cloud, 2026. https://docs.cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-partner-models. Accessed 2026-05-20.
7. Suresh Kumar Ariya Gowder, "Vertex AI Is Dead. Long Live Gemini Enterprise Agent Platform.", Medium (Think in AI Agents), 2026-04-23. https://medium.com/system-design-mastery-series/vertex-ai-is-dead-long-live-gemini-enterprise-agent-platform-15e44986ca20. Accessed 2026-05-20.
8. Google Cloud, "Gemini Enterprise Agent Platform (formerly Vertex AI)", Google Cloud product page, 2026. https://cloud.google.com/products/gemini-enterprise-agent-platform. Accessed 2026-05-20.
9. Google Cloud, "Google Cloud Launches Vertex AI, Making Machine Learning More Accessible and Useful For Developers and Businesses", Google Cloud Press Corner, 2021-05-18. https://www.googlecloudpresscorner.com/2021-05-18-Google-Cloud-Launches-Vertex-AI,-Making-Machine-Learning-More-Accessible-and-Useful-For-Developers-and-Businesses. Accessed 2026-05-20.
10. June Yang, "Generative AI support on Vertex AI is now generally available", Google Cloud Blog, 2023-06-07. https://cloud.google.com/blog/products/ai-machine-learning/generative-ai-support-on-vertexai. Accessed 2026-05-20.
11. Google Cloud, "Vertex AI: Next 2023 announcements", Google Cloud Blog, 2023-08-29. https://cloud.google.com/blog/products/ai-machine-learning/vertex-ai-next-2023-announcements. Accessed 2026-05-20.
12. Google Cloud, "Gemini support on Vertex AI", Google Cloud Blog, 2023-12-13. https://cloud.google.com/blog/products/ai-machine-learning/gemini-support-on-vertex-ai. Accessed 2026-05-20.
13. Aashima Gupta and Amy Waldron, "Introducing MedLM for the healthcare industry", Google Cloud Blog, 2023-12-13. https://cloud.google.com/blog/topics/healthcare-life-sciences/introducing-medlm-for-the-healthcare-industry. Accessed 2026-05-20.
14. Anthropic, "Claude on Vertex AI", Claude API Documentation, 2026. https://platform.claude.com/docs/en/build-with-claude/claude-on-vertex-ai. Accessed 2026-05-20.
15. Google Cloud, "Anthropic's Claude Opus 4 and Claude Sonnet 4 on Vertex AI", Google Cloud Blog, 2025-05-22. https://cloud.google.com/blog/products/ai-machine-learning/anthropics-claude-opus-4-and-claude-sonnet-4-on-vertex-ai. Accessed 2026-05-20.
16. Frederic Lardinois, "With Vertex AI Agent Builder, Google Cloud aims to simplify agent creation", TechCrunch, 2024-04-09. https://techcrunch.com/2024/04/09/with-vertex-ai-agent-builder-google-cloud-aims-to-simplify-agent-creation/. Accessed 2026-05-20.
17. Google Cloud, "Vertex AI RAG Engine: Build and deploy RAG implementations with your data", Google Cloud Blog, 2025-01-10. https://cloud.google.com/blog/products/ai-machine-learning/introducing-vertex-ai-rag-engine. Accessed 2026-05-20.
18. Google Developers Blog, "Agent Development Kit: Making it easy to build multi-agent applications", Google Developers Blog, 2025-04-09. https://developers.googleblog.com/en/agent-development-kit-easy-to-build-multi-agent-applications/. Accessed 2026-05-20.
19. Google Cloud, "More ways to build and scale AI agents with Vertex AI Agent Builder", Google Cloud Blog, 2025. https://cloud.google.com/blog/products/ai-machine-learning/more-ways-to-build-and-scale-ai-agents-with-vertex-ai-agent-builder. Accessed 2026-05-20.
20. Google Cloud, "Announcing Veo 3, Imagen 4, and Lyria 2 on Vertex AI", Google Cloud Blog, 2025-05-21. https://cloud.google.com/blog/products/ai-machine-learning/announcing-veo-3-imagen-4-and-lyria-2-on-vertex-ai. Accessed 2026-05-20.
21. Google Cloud, "Lyria 3 and Lyria 3 Pro on Vertex AI", Google Cloud Blog, 2026-04-08. https://cloud.google.com/blog/products/ai-machine-learning/lyria-3-and-lyria-3-pro-on-vertex-ai. Accessed 2026-05-20.
22. Google Cloud, "Gemini 3 is available for enterprise", Google Cloud Blog, 2025-11-19. https://cloud.google.com/blog/products/ai-machine-learning/gemini-3-is-available-for-enterprise. Accessed 2026-05-20.
23. Google Cloud Documentation, "Gemini 3.1 Pro", Gemini Enterprise Agent Platform documentation, 2026. https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/3-1-pro. Accessed 2026-05-20.
24. Google Cloud, "Schedule and execute notebooks with Vertex AI Workbench", Google Cloud Blog, 2021-11. https://cloud.google.com/blog/products/ai-machine-learning/schedule-and-execute-notebooks-with-vertex-ai-workbench. Accessed 2026-05-20.
25. Google Cloud Documentation, "Build a pipeline", Vertex AI Pipelines documentation, 2026. https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline. Accessed 2026-05-20.
26. Google Cloud Documentation, "Monitor feature skew and drift", Vertex AI Model Monitoring documentation, 2026. https://cloud.google.com/vertex-ai/docs/model-monitoring/using-model-monitoring. Accessed 2026-05-20.
27. Google Cloud, "Global endpoint for Claude models generally available on Vertex AI", Google Cloud Blog, 2025. https://cloud.google.com/blog/products/ai-machine-learning/global-endpoint-for-claude-models-generally-available-on-vertex-ai. Accessed 2026-05-20.
28. Google Cloud, "Model Garden on Gemini Enterprise Agent Platform", Google Cloud product page, 2026. https://cloud.google.com/model-garden. Accessed 2026-05-20.
29. Google Cloud, "More choice, more control: self-deploy proprietary models in your VPC with Vertex AI", Google Cloud Blog, 2025. https://cloud.google.com/blog/products/ai-machine-learning/new-proprietary-models-vertex-model-garden. Accessed 2026-05-20.
30. Google Cloud, "Gemini 2.5 on Vertex AI: Pro, Flash and Model Optimizer Live", Google Cloud Blog, 2025-06. https://cloud.google.com/blog/products/ai-machine-learning/gemini-2-5-pro-flash-on-vertex-ai. Accessed 2026-05-20.
31. Google Cloud, "Generative AI on Vertex AI documentation", Google Cloud Documentation, 2026. https://docs.cloud.google.com/vertex-ai/generative-ai/docs. Accessed 2026-05-20.
32. Google Cloud, "Agent Search on Gemini Enterprise Agent Platform (formerly Vertex AI Search)", Google Cloud product page, 2026. https://cloud.google.com/enterprise-search. Accessed 2026-05-20.
33. Google Cloud, "Google Cloud Launches General Availability of Vertex AI Search for Healthcare and Healthcare Data Engine", PR Newswire, 2024-10. https://www.prnewswire.com/news-releases/google-cloud-launches-general-availability-of-vertex-ai-search-for-healthcare-and-healthcare-data-engine-302278827.html. Accessed 2026-05-20.
34. Google for Developers, "Gemini Code Assist overview", Google for Developers, 2026. https://developers.google.com/gemini-code-assist/docs/overview. Accessed 2026-05-20.
35. Google Cloud, "Gemini Enterprise Agent Platform pricing", Google Cloud, 2026. https://cloud.google.com/vertex-ai/pricing. Accessed 2026-05-20.
36. CloudZero, "Google Vertex AI Pricing: Complete Enterprise Guide (2026)", CloudZero Blog, 2026. https://www.cloudzero.com/blog/google-vertex-ai-pricing/. Accessed 2026-05-20.
37. Google Cloud Documentation, "VPC Service Controls with Vertex AI", Google Cloud, 2026. https://docs.cloud.google.com/vertex-ai/docs/general/vpc-service-controls. Accessed 2026-05-20.
38. Google Cloud, "Wayfair case study: catalog enrichment with Gemini on Vertex AI", Google Cloud Customers, 2024. https://cloud.google.com/customers/wayfair-ai. Accessed 2026-05-20.
39. Google Cloud, "How Vertex AI helps Wayfair achieve real-time model serving", Google Cloud Blog, 2024. https://cloud.google.com/blog/products/ai-machine-learning/how-vertex-ai-helps-wayfair-achieve-real-time-model-serving/. Accessed 2026-05-20.
40. Internative, "Enterprise AI Platform: Vertex vs Bedrock vs Foundry (2026)", Internative Insights, 2026. https://internative.net/insights/blog/enterprise-ai-platform-comparison-vertex-bedrock-foundry-2026. Accessed 2026-05-20.
41. CloudOptimo, "Amazon Bedrock vs Azure OpenAI vs Google Vertex AI: An In-Depth Analysis", CloudOptimo Blog, 2026. https://www.cloudoptimo.com/blog/amazon-bedrock-vs-azure-openai-vs-google-vertex-ai-an-in-depth-analysis/. Accessed 2026-05-20.
42. The CTO Club, "Vertex AI Review 2026: Pros, Cons, Features, and Pricing", The CTO Club, 2026. https://thectoclub.com/tools/vertex-ai-review/. Accessed 2026-05-20.
43. G2, "Vertex AI Reviews 2026: Details, Pricing, and Features", G2 Crowd, 2026. https://www.g2.com/products/google-vertex-ai/reviews. Accessed 2026-05-20.
44. Wayfair and Google Cloud, "Wayfair and Google Cloud Announce Expanded Partnership to Transform Online Retail with Gemini", Google Cloud Press Corner, 2025-01-12. https://www.googlecloudpresscorner.com/2025-01-12-Wayfair-and-Google-Cloud-Announce-Expanded-Partnership-to-Transform-Online-Retail-with-Gemini. Accessed 2026-06-20.
45. The Linux Foundation, "A2A Protocol Surpasses 150 Organizations, Lands in Major Cloud Platforms, and Sees Enterprise Production Use in First Year", Linux Foundation Press Release, 2026-04-09. https://www.linuxfoundation.org/press/a2a-protocol-surpasses-150-organizations-lands-in-major-cloud-platforms-and-sees-enterprise-production-use-in-first-year. Accessed 2026-06-20.
46. Ivan Mehta, "Google Cloud Next 2026: AI agents, A2A protocol, Workspace Studio, and the full-stack bet against OpenAI and Anthropic", The Next Web, 2026-04-22. https://thenextweb.com/news/google-cloud-next-ai-agents-agentic-era. Accessed 2026-06-20.
