Google Vertex AI
Last reviewed
May 20, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 5,153 words
Google Vertex AI is the unified machine learning and generative artificial intelligence platform offered by Google Cloud, announced at Google I/O on May 18, 2021 as a consolidation of the legacy AI Platform with the AutoML product family.[^1][^2] The platform spans the full machine learning lifecycle, including managed notebooks, custom training, MLOps tooling such as pipelines and a feature store, online and batch prediction, and a generative AI stack centered on Model Garden, Vertex AI Studio, and the Agent Builder.[^3][^4] It serves as the primary surface for enterprise access to Google's first-party Gemini, Imagen, Veo, and Lyria foundation models, as well as a managed Model-as-a-Service catalog of third-party models including Anthropic's Claude family, Meta's Llama models, and Mistral AI offerings.[^5][^6] Vertex AI competes directly with Amazon Bedrock and Microsoft's Azure OpenAI Service, and at Google Cloud Next 2026 was rebranded as the Gemini Enterprise Agent Platform while preserving its underlying APIs and SDKs.[^7][^8]
| Attribute | Value |
|---|---|
| Type | Managed ML and generative AI platform |
| Operator | Google Cloud |
| Announced | May 18, 2021 (Google I/O) |
| Predecessor | AI Platform, AutoML |
| Foundation models | Gemini, Imagen, Veo, Lyria, MedLM, Embeddings |
| Third-party MaaS | Claude (Anthropic), Llama (Meta), Mistral, AI21, Cohere |
| Pipelines runtime | Kubeflow Pipelines SDK |
| 2026 rebrand | Gemini Enterprise Agent Platform |
Before Vertex AI, Google Cloud sold machine learning services under several disjoint brands. Cloud AI Platform offered notebooks, training jobs, and prediction endpoints aimed at custom ML engineers, while Cloud AutoML targeted users who lacked deep ML expertise by exposing point-and-click model creation for tabular data, vision, video, language, and translation use cases. The split forced enterprise teams to stitch together two distinct toolchains for adjacent problems, which Google Cloud's product management team eventually framed as a barrier to production deployment.[^1]
According to Andrew Moore, who at the time served as vice president and general manager of Cloud AI and Industry Solutions, the unifying motivation for the new platform was to "get data scientists and engineers out of the orchestration weeds, and create an industry-wide shift that would make everyone get serious about moving AI out of pilot purgatory and into full-scale production."[^1] Craig Wiley, then director of product management, described the prevailing situation in starker terms, calling enterprise machine learning "in crisis" because most companies investing in ML were "not getting value from it."[^2]
Vertex AI was announced on May 18, 2021 at Google I/O and became generally available the same day.[^9][^2] At launch, Google Cloud framed the platform as a single managed surface bringing together the prior AI Platform notebooks, training, and prediction services with the AutoML family. The marketing claim that Vertex AI required "roughly 80% fewer lines of code to train a model" compared to competitive platforms appeared in the original press release and was repeated across launch coverage, though the precise comparison method was not formally specified.[^9][^1]
The launch components fell into two groups. The first set covered MLOps primitives: Vertex Vizier for hyperparameter optimization, Vertex Feature Store as a managed feature serving system, Vertex Experiments for run tracking, Vertex Model Monitoring for production drift detection, Vertex ML Metadata for lineage tracking, Vertex Pipelines for ML workflow orchestration, and Vertex ML Edge Manager (then experimental) for edge deployment.[^1] The second set covered general-purpose modeling tools: managed notebooks, custom training jobs, AutoML for vision, language, structured data, and forecasting, plus prebuilt models for vision, language, conversation, and structured data accessible through APIs.[^9]
Launch customers cited by Google Cloud included L'Oreal subsidiary ModiFace (virtual try-on and skin diagnostics), WPP agency Essence (collaborative data science), Sabre Labs (travel personalization), and Iron Mountain (records-related ML).[^1] Implementation partners named in the announcement included Accenture and Deloitte.[^1]
Vertex AI's most significant evolution after launch was its transformation into a generative AI platform during 2023. On June 7, 2023, Google Cloud made generative AI support on Vertex AI generally available, exposing foundation models including PaLM 2, Imagen, and Codey through the new Generative AI Studio for prompt design and tuning, and through Model Garden, then a catalog of more than 60 first-party and third-party models.[^10] PaLM 2 access was extended with a 32,000-token context window and a capability for grounding responses in enterprise data sources.[^11]
At Google Cloud Next in late August 2023, Model Garden grew past 100 models with the addition of Meta's Llama 2 and Code Llama, the Technology Innovation Institute's Falcon LLM, and the pre-announced inclusion of Anthropic's Claude 2.[^11] The same announcement introduced Vertex AI Extensions for connecting models to external APIs (including BigQuery, AlloyDB, and partner databases via DataStax, MongoDB, and Redis), Vertex AI Data Connectors for read-only data ingestion, the general availability of Vertex AI Search and Conversation, SynthID watermarking from Google DeepMind for generated images, and Colab Enterprise as a managed notebook environment.[^11]
On December 13, 2023, Gemini 1.0 Pro was made available to Google Cloud customers through Vertex AI and Google AI Studio shortly after the model family's December 6 announcement, with Gemini 1.0 Ultra following on an allowlist basis in early 2024.[^12] On December 14, 2023, Google Cloud introduced MedLM, a family of foundation models fine-tuned for healthcare workflows, available in a large variant for complex tasks and a medium variant tuned for scalable adaptation to specific use cases, built on Med-PaLM 2.[^13]
Anthropic's Claude 3 Sonnet and Claude 3 Haiku models became generally available on Vertex AI on March 20, 2024, with Claude 3 Opus following in subsequent weeks.[^5] Successive Claude releases were added to the Model Garden in tandem with Anthropic's own launches, including Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude Sonnet 4 and Claude Opus 4 (announced as available on Vertex AI alongside their general availability in 2025), and Claude Sonnet 4.5.[^14][^15] Anthropic documented Vertex AI as one of three official enterprise distribution channels alongside the Anthropic API and Amazon Bedrock.[^14]
At Google Cloud Next in April 2024, Google Cloud introduced Vertex AI Agent Builder, a no-code product for assembling conversational agents on Gemini that consolidated Vertex AI Search and Vertex AI Conversation with new RAG APIs and vector search primitives.[^16] Then-Google Cloud chief executive Thomas Kurian described the goal as allowing customers to "very easily and quickly build conversational agents."[^16]
On January 10, 2025, Google made Vertex AI RAG Engine generally available as a fully managed retrieval-augmented generation backend. The engine supported parsing with configurable chunking; retrieval against Vertex AI Vector Search, Vertex AI Search, Pinecone, or Weaviate; generation against Gemini, Llama, and Claude; and data connectors for Cloud Storage, Google Drive, Jira, Slack, websites, and BigQuery.[^17]
At Google Cloud Next 2025 on April 9, 2025, the company announced the Agent Development Kit (ADK), an open-source Python framework for multi-agent system development described as the same framework used to power agents inside Google products such as Agentspace.[^18] ADK shipped with bidirectional audio and video streaming, hierarchical multi-agent orchestration, built-in evaluation, and a deployment path to Vertex AI Agent Engine. The framework supported models from Gemini, the Vertex AI Model Garden, and third-party providers including Anthropic, Meta, Mistral, and AI21 via LiteLLM.[^18] By the time of a subsequent Google Cloud blog update, the Python ADK had been downloaded more than 7 million times.[^19]
On May 21, 2025, Google Cloud announced the next generation of generative media models on Vertex AI: Imagen 4 (text-to-image, public preview), Veo 3 (text-to-video with native speech and audio, private preview at launch), and Lyria 2 (text-to-music, generally available).[^20] All three integrated SynthID watermarking and configurable safety filters and were accessible through the Vertex AI Media Studio console and Vertex AI API.[^20] On April 8, 2026, Lyria 3 and Lyria 3 Pro arrived, with Lyria 3 Pro supporting compositions up to three minutes with structural elements such as intros, verses, choruses, and bridges, and vocal generation with timed lyrics.[^21]
On November 19, 2025, Google Cloud announced that Gemini 3 was available in Gemini Enterprise and on Vertex AI in preview for developers. Gemini 3 was reported to hold a top score of 1501 Elo on LMArena and to offer a one-million-token context window, and Geotab reported that the model's improved tool-calling accuracy reduced its tool-calling mistakes by roughly 30%.[^22] Access channels included Vertex AI, Google Antigravity, the Gemini CLI, and third-party integrations such as Cursor, GitHub Copilot, JetBrains, Figma, and Replit.[^22] As of March 26, 2026, the initial gemini-3-pro-preview alias was retired in favor of gemini-3.1-pro-preview.[^23]
At Google Cloud Next 2026, held April 22, 2026 in Las Vegas, Google Cloud retired the Vertex AI brand and rebranded the platform as Gemini Enterprise Agent Platform, absorbing the Agentspace product into a unified Gemini Enterprise offering.[^7][^8] The rebrand was described as architectural rather than purely cosmetic: existing Vertex AI workloads continued to run unchanged under the new namespace, with SDKs, billing, and APIs migrated forward, while a June 24, 2026 deadline applied to migration away from certain deprecated SDK modules.[^7] Many engineering teams and third-party platforms continued to refer to the platform by its Vertex AI name through 2026, and Google Cloud's documentation retained "Vertex AI" in product page headings such as "Gemini Enterprise Agent Platform (formerly Vertex AI)."[^8]
Workbench is Vertex AI's managed JupyterLab notebook environment, available in user-managed and fully managed variants. Notebook instances integrate with Cloud Storage, BigQuery, and the rest of Google Cloud's data stack, and a notebook executor announced in November 2021 allowed teams to schedule notebooks for ad hoc or recurring execution.[^24] Workbench was joined in 2023 by Colab Enterprise, a managed Colab variant launched at Google Cloud Next 2023 for organizations preferring the Colab interface.[^11]
Custom Training provides managed compute for arbitrary training scripts, with built-in support for PyTorch, TensorFlow, JAX, and scikit-learn. Jobs can scale across CPUs, GPUs, and Cloud TPU pods. The platform offers managed hyperparameter tuning (originally branded Vertex Vizier and based on Google's internal Vizier optimizer) and distributed training primitives.[^9][^4]
Vertex AI Pipelines is a serverless orchestrator for ML workflows. It executes pipelines authored against the open-source Kubeflow Pipelines SDK and the TensorFlow Extended (TFX) SDK, removing the need to operate Kubeflow clusters directly while preserving SDK portability.[^25] Pipelines are typically used to chain preprocessing, training, evaluation, and deployment components, and integrate natively with Model Registry, Model Monitoring, and Feature Store.
The Feature Store manages features for both training and online serving. It uses BigQuery as the offline store and optimized online nodes for low-latency lookups, exposing a single feature definition for both training and serving to prevent training-serving skew.[^4]
Vertex AI Model Registry is a searchable repository for model versions and metadata. It tracks lineage from training runs, supports model versioning and aliasing, and allows direct deployment from a registered version to an endpoint.[^4]
Vertex AI Model Monitoring detects distributional drift and training-serving skew on production endpoints. It supports both feature distribution monitoring and feature attribution monitoring, which tracks how each input feature's contribution to predictions changes over time using Vertex Explainable AI. For AutoML tabular models, explainability is automatically configured, so users can enable skew and drift detection with a single gcloud command and configure per-feature thresholds.[^26]
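The per-feature threshold idea can be illustrated with a small, self-contained sketch. It assumes a categorical feature compared with L-infinity distance (the largest change in any category's share), which is one of the distance measures Google's documentation describes for this service; the function names, sample data, and threshold values below are hypothetical and are not part of any Vertex AI API.

```python
from collections import Counter


def distribution(values):
    """Normalize raw categorical values into a {category: probability} map."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot build a distribution from no values")
    return {category: n / total for category, n in counts.items()}


def l_infinity_distance(baseline, current):
    """Largest absolute difference in probability across all categories."""
    categories = set(baseline) | set(current)
    return max(abs(baseline.get(c, 0.0) - current.get(c, 0.0)) for c in categories)


def drifted_features(baseline_data, current_data, thresholds):
    """Return {feature: distance} for features whose distance exceeds its threshold."""
    alerts = {}
    for feature, threshold in thresholds.items():
        distance = l_infinity_distance(
            distribution(baseline_data[feature]),
            distribution(current_data[feature]),
        )
        if distance > threshold:
            alerts[feature] = distance
    return alerts


# Hypothetical samples: a baseline window and a recent serving window.
baseline = {"country": ["US"] * 70 + ["DE"] * 30, "plan": ["free"] * 50 + ["pro"] * 50}
recent = {"country": ["US"] * 40 + ["DE"] * 60, "plan": ["free"] * 52 + ["pro"] * 48}

# Hypothetical per-feature thresholds; only "country" moves far enough to alert.
alerts = drifted_features(baseline, recent, {"country": 0.2, "plan": 0.2})
print(alerts)
```

Using the training set as the baseline corresponds to skew detection, while using an earlier serving window corresponds to drift detection; the comparison logic is the same in both cases.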
The prediction service supports two modes. Online prediction serves models behind low-latency HTTPS endpoints with autoscaling and traffic splitting between model versions. Batch prediction processes large input files asynchronously against managed compute, billed per job. Endpoints can be public or private (using Private Service Connect), regional or multi-region.[^4][^27]
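Traffic splitting between model versions amounts to weighted routing. The following is a minimal simulation of that behavior, not the service's implementation: it assumes a split expressed as integer percentages that sum to 100, and the version labels and helper names are hypothetical.

```python
import random


def validate_split(traffic_split):
    """Require non-negative integer percentages that sum to exactly 100."""
    if any(not isinstance(p, int) or p < 0 for p in traffic_split.values()):
        raise ValueError("percentages must be non-negative integers")
    if sum(traffic_split.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")


def pick_version(traffic_split, rng):
    """Choose one deployed model version in proportion to its traffic share."""
    validate_split(traffic_split)
    versions = list(traffic_split)
    weights = [traffic_split[v] for v in versions]
    return rng.choices(versions, weights=weights, k=1)[0]


# Hypothetical canary rollout: 90% of requests to the stable version, 10% to the candidate.
split = {"model-v1": 90, "model-v2": 10}
rng = random.Random(0)  # seeded so the simulation is repeatable
sample = [pick_version(split, rng) for _ in range(10_000)]
share_v2 = sample.count("model-v2") / len(sample)
print(f"model-v2 received {share_v2:.1%} of simulated requests")
```

A rollout would typically raise the candidate's share in steps while watching error rates and latency, then move it to 100 once the new version is trusted.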
Vertex AI Studio (originally launched as Generative AI Studio in June 2023) is a console workspace for prompt design, prompt versioning, evaluation, and supervised fine-tuning across the supported foundation models.[^10] The Studio includes Media Studio sub-experiences for image, video, and music generation against Imagen, Veo, and Lyria, alongside text-oriented chat and completion playgrounds.[^20]
Model Garden is the curated catalog of foundation and open models accessible from Vertex AI. At launch in June 2023 it contained more than 60 models; by late 2025 and into 2026 it cataloged more than 200 enterprise-ready models spanning Google's own Gemini, Imagen 3, Imagen 4, Veo, Veo 3, Lyria, and MedLM; open models such as Llama and Google's Gemma; and third-party proprietary models including Anthropic's Claude, Mistral variants, AI21 Labs Jamba, and Cohere Command models.[^28][^11]
Models are made available through several deployment patterns. Some, including Gemini and most third-party proprietary models, are Model-as-a-Service (MaaS) APIs that customers consume without provisioning infrastructure. Others, such as many Llama variants and certain Mistral models, are served through customer-managed virtual machine deployment from Model Garden templates. Beginning in 2025, Vertex AI added a pattern for "self-deploying" partner proprietary models inside customers' own VPCs, supporting AI21 Labs, CAMB.AI, Contextual AI, CSM, Mistral, Qodo, Virtue AI, and WRITER.[^29]
Vertex AI exposes generative APIs for several model families:
Vertex AI Agent Builder (launched April 2024) provides no-code and low-code construction of conversational agents on Gemini, with managed retrieval and grounding via Vertex AI Search.[^16] Agent Engine is the managed runtime for production agents, providing autoscaling, observability, and integration with more than 100 prebuilt connectors and enterprise data systems.[^18] The Agent Development Kit (April 2025) gave developers an open-source Python framework that can deploy to Agent Engine or to other container runtimes, with model-agnostic support across Vertex AI Model Garden and external providers via LiteLLM.[^18][^19]
Vertex AI Search (formerly Enterprise Search, previously sold within "Vertex AI Search and Conversation") is a managed search-and-retrieval service that abstracts the ingestion, OCR, chunking, embedding, indexing, and access control layers of a search system. It is positioned as a turnkey backend for retrieval-augmented generation pipelines.[^32] Industry-tuned variants exist for retail commerce, media, and healthcare and life sciences, including Vertex AI Search for Healthcare, which reached general availability in 2024.[^32][^33]
The Vertex AI RAG Engine, announced as generally available January 10, 2025, complements Vertex AI Search by providing programmable RAG primitives: configurable parsing and chunking, a choice of vector backends (Vertex AI Vector Search, Pinecone, Weaviate), and direct integration with Gemini, Claude, and Llama generation models.[^17]
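Configurable chunking usually comes down to two settings: how large each chunk is and how much adjacent chunks overlap. The sketch below shows the general technique on whitespace-separated words; it is an illustration only, and the function name and the `chunk_size` and `chunk_overlap` parameters are assumptions of this example rather than the RAG Engine's own interface, which may count tokens instead of words.

```python
def chunk_words(text, chunk_size, chunk_overlap):
    """Split text into word windows of chunk_size that share chunk_overlap words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


document = "one two three four five six seven eight nine ten"
chunks = chunk_words(document, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk, at the cost of indexing some text twice.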
Gemini Code Assist is Google Cloud's AI coding assistant, available in free, Standard, and Enterprise editions. It evolved from the earlier Duet AI for Developers brand, which transitioned to Gemini Code Assist in 2024; Duet AI for Developers was available at no cost on a one-user-per-billing-account basis until May 11, 2024.[^34] Code Assist Standard provides AI coding assistance with enterprise security controls, while the Enterprise edition supports private source code customization and deep Google Cloud integration.[^34]
A defining feature of Vertex AI relative to its early MLOps positioning is its catalog of third-party models served through Google Cloud infrastructure. The following table summarizes Vertex AI's notable third-party Model-as-a-Service offerings as of mid-2026.
| Provider | Representative models on Vertex AI | First Vertex availability |
|---|---|---|
| Anthropic | Claude 2; Claude 3 Sonnet, Haiku, Opus; Claude 3.5 Sonnet; Claude 3.7 Sonnet; Claude Sonnet 4; Claude Opus 4; Claude Sonnet 4.5 | Claude 2 pre-announced at Cloud Next 2023; Claude 3 Sonnet and Haiku GA March 20, 2024 [^5][^11] |
| Meta | Llama 2; Llama 3; Llama 3.1 (including 405B); Llama 4 | Llama 2 at Cloud Next 2023; Llama 3.1 family added July 24, 2024 [^11][^28] |
| Mistral AI | Mixtral 8x7B and managed Mistral variants | Available via MaaS by 2024 [^28][^11] |
| AI21 Labs | Jamba 1.5 | Available via MaaS [^28] |
| Cohere | Command series | Available via Model Garden [^28] |
| Falcon (TII) | Falcon LLM | Added at Cloud Next 2023 [^11] |
To support cross-cloud Anthropic deployments, Vertex AI introduced multi-region endpoints for Claude available in US and EU geographies for resilience while respecting data residency, and a global endpoint that dynamically routes Claude requests across regions for capacity, separate from regional quotas and without data residency guarantees.[^27]
Vertex AI uses a multi-dimensional pricing model that depends on which product surface is being consumed.[^35][^36]
For embedding and grounding APIs, Google Cloud transitioned to computation-based metrics on April 14, 2025, billing $0.00003 per 1,000 characters of input and $0.00009 per 1,000 characters of output for affected services.[^36]
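As a worked example of these character-based rates, the sketch below estimates the charge for one workload. The two rates are the figures quoted above; the workload sizes and the function name are hypothetical, and the calculation ignores any rounding, minimums, or discounts that actual billing might apply.

```python
from decimal import Decimal

# Rates quoted in the text, in US dollars per 1,000 characters.
INPUT_RATE_PER_1K = Decimal("0.00003")
OUTPUT_RATE_PER_1K = Decimal("0.00009")


def estimate_cost(input_chars, output_chars):
    """Return the estimated charge in USD for the given character counts."""
    if input_chars < 0 or output_chars < 0:
        raise ValueError("character counts must be non-negative")
    input_cost = Decimal(input_chars) / 1000 * INPUT_RATE_PER_1K
    output_cost = Decimal(output_chars) / 1000 * OUTPUT_RATE_PER_1K
    return input_cost + output_cost


# Hypothetical workload: 10 million input characters and 2 million output characters.
cost = estimate_cost(10_000_000, 2_000_000)
print(f"${cost:.2f}")  # 10,000 x 0.00003 + 2,000 x 0.00009 = 0.30 + 0.18 = $0.48
```

`Decimal` is used instead of floating point so that very small per-unit rates accumulate without binary rounding error.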
Vertex AI integrates with Google Cloud's broader security stack. Identity and access management is handled through Google Cloud IAM, with predefined roles such as Vertex AI User and Vertex AI Admin and the option for custom roles with conditional IAM bindings.[^37] Data at rest is encrypted by Google-managed keys by default, with Customer-Managed Encryption Keys (CMEK) available via Cloud KMS for training datasets, models, pipeline artifacts, and Gemini model resources; data in transit is encrypted with TLS 1.2 or higher.[^37]
VPC Service Controls allow administrators to create a service perimeter around Vertex AI resources to mitigate data exfiltration, protecting training data, models, online inference requests, batch inference results, and Gemini model traffic.[^37] Cloud Logging captures Vertex AI audit events including Data Access logs for Data Read and Data Write operations, which are typically required for regulated workloads such as HIPAA-covered healthcare deployments.[^37]
Generative AI on Vertex AI supports CMEK, VPC Service Controls, and data residency.[^37] Regional deployment is supported across more than thirty Google Cloud regions; for Anthropic Claude models, the platform also offers multi-region endpoints in the United States and European Union for cross-region failover within a single residency geography, and a global endpoint with no residency guarantees but dynamic capacity routing.[^27]
By Google Cloud Next 2023, Google Cloud reported that Vertex AI customer accounts had grown 15-fold quarter over quarter and named customers including GE Appliances, Typeface, GitLab, Omnicom, Workiva, and Connected Stories among its early generative AI adopters.[^11]
Wayfair has documented multiple Vertex AI deployments. The retailer's product catalog team used Gemini on Vertex AI to automate catalog enrichment, reporting a 67% reduction in the time required to curate new and updated product listings and a 5x faster update cadence for product attributes.[^38] Wayfair's Supply Chain Science team migrated to Vertex AI Pipelines, Hyperparameter Tuning, and Experiments for its production ML stack, reducing the time to bring a new real-time model from approximately one month to about one hour by leveraging automated processes and a streamlined deployment architecture.[^39]
In healthcare, HCA Healthcare and Augmedix used MedLM on Vertex AI to convert ambient clinician-patient conversations into electronic health record drafts; BenchSci used MedLM to accelerate biomarker discovery across more than 100 million experiments; and Accenture and Deloitte used MedLM for claims processing and provider directory chatbot deployments.[^13] L'Oreal subsidiary ModiFace used Vertex AI for virtual try-on and skin diagnostic computer vision applications from launch.[^1]
Vertex AI competes with several other managed AI platforms for enterprise foundation-model workloads.[^40][^41]
| Platform | Operator | Primary first-party models | Notable third-party MaaS | Distinct features |
|---|---|---|---|---|
| Vertex AI / Gemini Enterprise Agent Platform | Google Cloud | Gemini, Imagen, Veo, Lyria, MedLM, Gemma | Claude, Llama, Mistral, AI21, Cohere | Long-context Gemini (1M+ tokens), tight BigQuery integration, native multimodal media generation [^40][^22] |
| Amazon Bedrock | Amazon Web Services | Titan, Nova | Claude (Anthropic), Llama, Mistral, Cohere, Stability, AI21 | Broad partner catalog, deep AWS service integration, Savings Plans [^40] |
| Azure OpenAI Service | Microsoft | OpenAI GPT-4o, GPT-4o-mini, o-series reasoning models | OpenAI models exclusively in this surface (Azure AI Foundry handles wider catalog) | Microsoft 365 Copilot integration, regulated-industry certifications [^40] |
| Anthropic API (and Claude on AWS) | Anthropic | Claude family | (First-party only) | Direct from model vendor; reference deployment for Claude features [^14] |
Independent industry analysis has repeatedly framed the three hyperscaler platforms as not strictly interchangeable. Reviewers note that Bedrock provides the widest third-party catalog under a single API and is generally the lowest cost for high-volume usage with committed spend, that Azure OpenAI wins for Microsoft-centric and regulated environments, and that Vertex AI is strongest for Google Cloud-native data workloads, very-long-context inference (the Gemini 1.5 Pro 2-million-token window and the Gemini 3 1-million-token window), and integrated multimodal generation across text, image, video, and music.[^40][^41]
Beyond the hyperscaler trio, Vertex AI also overlaps with offerings such as the OpenAI Responses API for stateful generative pipelines, as well as with Microsoft's broader Azure AI Foundry (the post-2024 evolution of Azure ML and Azure AI Studio), which catalogs third-party models in a manner analogous to Model Garden.
Reviewer feedback and analyst reports identify several recurring criticisms of Vertex AI.[^42][^43]
Pricing complexity is the most frequently cited issue. The multi-dimensional billing model (training hours, prediction nodes, AutoML usage, storage, foundation-model tokens, agent and search per-query charges) makes accurate cost estimation difficult, and the absence of a hard ceiling on spend has been noted as a source of operational anxiety for new teams.[^42][^36]
Learning curve and tooling depth are also commonly noted. The platform's breadth (Workbench, Pipelines, Feature Store, Model Registry, Model Monitoring, Agent Builder, Search, Studio, RAG Engine, ADK) can feel overwhelming to first-time users, and onboarding often requires existing Google Cloud expertise.[^42][^43] Reviewers have observed that not all components are equally mature, with newer features occasionally undergoing API changes that disrupt downstream workflows, and that AutoML capabilities offer less granular control than fully custom training pipelines.[^42]
Operational concerns include slow lifecycle operations (instance start, stop, restart) for managed notebooks and occasional regional capacity exhaustion for specific accelerator types.[^42] Lock-in is a concern raised in independent reviews: as workflows accumulate dependencies on Vertex AI's MLOps services and proprietary connectors, migration to other providers becomes harder despite the use of open SDKs such as Kubeflow Pipelines.[^42]
The 2026 rebrand to Gemini Enterprise Agent Platform introduced a further source of friction: existing customers face a June 24, 2026 deadline to migrate from certain deprecated Vertex AI SDK modules.[^7]
Within Google Cloud's product line, Vertex AI sits adjacent to several related surfaces. The consumer-facing Gemini app and the developer-focused Google AI Studio expose Gemini models for individual end users and rapid prototyping respectively, while Vertex AI targets enterprise application development with managed deployment, monitoring, security perimeters, and committed-use pricing. NotebookLM (Google's notebook reasoning product) and Gemini Enterprise (the business productivity surface that absorbed Agentspace at the 2026 rebrand) are end-user applications layered on the same model substrate.[^7]
Inside Vertex AI itself, the Agent2Agent (A2A) protocol and the Model Context Protocol integration in ADK enable interoperability with agent systems hosted on other platforms. ADK's LangChain and LlamaIndex integrations let developers compose Vertex-hosted agents with components developed against the broader open-source ecosystem.[^18]
As a platform, Vertex AI marks Google Cloud's transition from a portfolio of disjoint ML services to a unified enterprise platform for large language models and machine learning. Its consolidation of MLOps primitives with a foundation-model catalog and a generative agent runtime mirrors a pattern adopted by all major cloud providers between 2023 and 2026, with Vertex AI's long-context Gemini variants, integrated multimodal generation, and tight coupling to BigQuery serving as its primary points of differentiation.[^40] By the time of its 2026 rebrand to Gemini Enterprise Agent Platform, Vertex AI had grown from an MLOps console for traditional ML practitioners into the centerpiece of Google Cloud's enterprise AI strategy.[^7][^8]