# Machine learning terms/Google Cloud

> Source: https://aiwiki.ai/wiki/machine_learning_terms_google_cloud
> Updated: 2026-06-21
> Categories: AI Infrastructure, Google, Machine Learning
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

*See also: [Machine learning terms](/wiki/machine_learning_terms), [Vertex AI](/wiki/vertex_ai), [Cloud TPU](/wiki/cloud_tpu), [TPU](/wiki/tpu), [Google](/wiki/google), [Google Cloud](/wiki/google_cloud_terms)*

**Google Cloud** is the public cloud arm of [Google](/wiki/google) and one of the three dominant providers of [machine learning](/wiki/machine_learning) infrastructure, alongside [Amazon Web Services](/wiki/aws) and [Microsoft Azure](/wiki/microsoft_azure). Its machine learning portfolio is built around three layers: a managed ML and generative AI service called [Vertex AI](/wiki/vertex_ai), a custom silicon family called the [Tensor Processing Unit](/wiki/tpu) (TPU), and an integrated supercomputing fabric called AI Hypercomputer that ties accelerators, networking, storage, and software into a single offering. What distinguishes the platform is that it is the same stack [Google DeepMind](/wiki/google_deepmind) and Google's own product teams use to train and serve flagship models such as [Gemini](/wiki/gemini), [Imagen](/wiki/imagen), [Veo](/wiki/veo), [Lyria](/wiki/lyria), [Chirp](/wiki/chirp), and [AlphaFold](/wiki/alphafold). Vertex AI launched as a unified platform on May 18, 2021, and Google said at launch it required roughly 80% fewer lines of code to train a model than competing platforms [9]. The TPU underneath it has shipped seven public generations since the first chips went into Google data centers in 2015 [3][10], and in October 2025 [Anthropic](/wiki/anthropic) committed to up to one million Google Cloud TPUs and "well over a gigawatt of capacity" coming online in 2026, the single largest disclosed TPU deal to date [11].

This page is the gateway hub for Google Cloud machine learning entries on the AI Wiki. It introduces the major products, surveys the TPU generations and their pricing, compares the platform with [AWS](/wiki/aws) and [Azure](/wiki/microsoft_azure) equivalents, and provides a curated index of every Google Cloud term with its own dedicated wiki page.

## what is google cloud machine learning?

Google Cloud's ML offerings span the full lifecycle of [artificial intelligence](/wiki/artificial_intelligence) work: data ingestion, feature engineering, custom training, [hyperparameter tuning](/wiki/hyperparameter_tuning), serving, monitoring, and the building of [generative AI](/wiki/generative_ai) applications and [agents](/wiki/ai_agent). Customers can either use prebuilt APIs that wrap a single capability (vision, speech, translation, document parsing) or take full control through Vertex AI's notebooks, training jobs, pipelines, and inference endpoints. Most of the same accelerators that power Google's first-party products are exposed to enterprises through Vertex AI, the [Google Kubernetes Engine](/wiki/gke) (GKE), and the [Compute Engine](/wiki/compute_engine) virtual machine surface.

The portfolio reached its current shape through a series of consolidations. In 2017 Google released [Cloud ML Engine](/wiki/cloud_ml_engine) (later renamed AI Platform) for managed [TensorFlow](/wiki/tensorflow) training, then in 2018 added [AutoML](/wiki/automl) for no-code modeling. On May 18, 2021, at Google I/O, Google merged the two into a single managed service called Vertex AI, which it described as unifying its AutoML and AI Platform capabilities behind one API [9]. After the launch of [Bard](/wiki/bard) and [Gemini](/wiki/gemini), Vertex AI became the primary path for enterprise customers to consume frontier Google models alongside third-party models from [Anthropic](/wiki/anthropic), [Mistral AI](/wiki/mistral), and Meta. At [Google Cloud Next](/wiki/google_cloud_next) 2026, Google rebranded the surface as the Gemini Enterprise Agent Platform while continuing to ship the underlying Vertex AI services [1][2].

## vertex ai

[Vertex AI](/wiki/vertex_ai) is the unified machine learning and generative AI platform offered by Google Cloud. It consolidates data labeling, [AutoML](/wiki/automl) training, custom training, [hyperparameter tuning](/wiki/hyperparameter_tuning), pipelines, [feature stores](/wiki/feature_store), [model registries](/wiki/model_registry), online and batch prediction, model evaluation, and monitoring into a single API and console. Customers do not run the underlying training schedulers or inference endpoints themselves: they call Vertex AI services, and Google manages the queues, autoscalers, and accelerator fleets behind the scenes. At launch Google framed the design goal directly: "Vertex AI requires nearly 80% fewer lines of code to train a model versus competitive platforms, enabling data scientists and ML engineers across all levels of expertise the ability to implement Machine Learning Operations (MLOps)" [9].

The service has three broad surfaces:

| Surface | Purpose | Typical entry points |
|---|---|---|
| **Predictive AI** | Train and serve classical [supervised learning](/wiki/supervised_learning) models, [regression](/wiki/regression), [classification](/wiki/classification), forecasting, and ranking | AutoML, custom training jobs, [hyperparameter tuning](/wiki/hyperparameter_tuning), [feature store](/wiki/feature_store), batch prediction, online endpoints |
| **Generative AI** | Use [foundation models](/wiki/foundation_model) and large language models for text, code, image, video, audio, and multimodal applications | Vertex AI Studio, Model Garden, tuning jobs, evaluation, RAG Engine, prompt management |
| **Agentic AI** | Build [agents](/wiki/ai_agent) that call tools, retrieve documents, and reason in multi-step workflows | Vertex AI Agent Builder (now Gemini Enterprise Agent Platform), Agent Development Kit, Agent Engine, Agent Starter Pack |

### automl, custom training, and the model registry

[AutoML](/wiki/automl) is Vertex AI's no-code path. Users upload a labeled tabular, image, text, or video dataset and Vertex AI runs an architecture search and hyperparameter search to produce a deployable model. Custom training, in contrast, runs arbitrary container images on managed [GPUs](/wiki/gpu) or [TPUs](/wiki/tpu). It supports [TensorFlow](/wiki/tensorflow), [PyTorch](/wiki/pytorch), [JAX](/wiki/jax), [scikit-learn](/wiki/scikit_learn), and [XGBoost](/wiki/xgboost) out of the box, and integrates with Vertex AI Pipelines for orchestration. The model registry is a versioned catalog of trained artifacts that ties model lineage to deployments, evaluations, and tuning jobs.

### model garden

Model Garden is Vertex AI's curated catalog of models. It exposes Google's first-party models (Gemini, [Gemma](/wiki/gemma), Imagen, Veo, Lyria, Chirp, [PaLM](/wiki/palm), [Codey](/wiki/codey)), third-party models (Anthropic's [Claude](/wiki/claude) family, Mistral models, [Llama](/wiki/llama)), and open models from [Hugging Face](/wiki/hugging_face). Each entry can be deployed to a Vertex AI endpoint, fine-tuned, or queried via the Vertex AI API, with billing and quotas handled by Google Cloud.

## gemini api on vertex

The [Gemini](/wiki/gemini) family of multimodal models is available on Google Cloud through two complementary surfaces: [Google AI Studio](/wiki/google_ai_studio) for prototyping with personal accounts, and the Vertex AI Gemini API for production use with enterprise governance, regional endpoints, [VPC Service Controls](/wiki/vpc_service_controls), and [Customer-Managed Encryption Keys](/wiki/cmek). Vertex AI was the first commercial surface for Gemini when the model family launched in December 2023, and it remains the default channel for enterprise deployments.

The table below summarizes the main Gemini variants available on Vertex AI as of 2026.

| Model | Context window | Strengths | Typical use |
|---|---|---|---|
| [Gemini 3 Pro](/wiki/gemini_3) | 1,000,000 tokens | Most advanced reasoning, agentic workflows, code, multimodal | Complex enterprise reasoning, long-document analysis |
| [Gemini 3 Flash](/wiki/gemini_3) | 1,000,000 tokens | Lower latency, lower cost, multimodal | High-throughput consumer and customer-support apps |
| [Gemini 2.5 Pro](/wiki/gemini_2_5_pro) | 1,000,000 tokens | Adaptive thinking, deep reasoning, code | Coding agents, RAG, structured extraction |
| [Gemini 2.5 Flash](/wiki/gemini_2_5) | 1,000,000 tokens | Cheap, fast, capable enough for most tasks | Bulk inference, classification, summarization |

Vertex AI exposes Gemini through several inference modes, including standard online inference, [provisioned throughput](/wiki/provisioned_throughput) for predictable capacity, [batch prediction](/wiki/batch_prediction) for large offline jobs, and [global endpoints](/wiki/global_endpoint) that route requests to the nearest healthy region. The same models can also be tuned with [supervised fine-tuning](/wiki/fine_tuning), [reinforcement learning from human feedback](/wiki/rlhf), or [distillation](/wiki/knowledge_distillation) directly inside Vertex AI tuning jobs.

## tensor processing unit (tpu)

The [Tensor Processing Unit](/wiki/tpu) is Google's family of custom [application-specific integrated circuits](/wiki/asic) (ASICs) for [deep learning](/wiki/deep_learning) workloads. TPUs are organized around a [systolic array](/wiki/systolic_array) for matrix multiplication and use [bfloat16](/wiki/bfloat16) as their primary low-precision number format. The first TPU was deployed inside Google data centers in 2015 to serve voice search and was publicly disclosed by CEO Sundar Pichai at Google I/O on May 18, 2016, where he said the chips had been running in production for more than a year [10]. Google's 2017 paper reported the first-generation TPU delivering 15 to 30 times higher performance and 30 to 80 times higher performance per watt than contemporary CPUs and GPUs on inference [12]. Google has since shipped seven public generations, of which TPU v5e, TPU v5p, Trillium (TPU v6e), and Ironwood (TPU v7) are still actively sold through [Cloud TPU](/wiki/cloud_tpu) [3].
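
To make the systolic-array idea concrete, the following is a minimal pure-Python sketch of an output-stationary array, in which each processing element keeps one accumulator and operands arrive skewed in time. It is a teaching toy under simplifying assumptions, not a description of the actual TPU dataflow or of any Google API; the function name `systolic_matmul` is invented for illustration.

```python
def systolic_matmul(a, b):
    """Multiply a (n x k) by b (k x m) on a simulated n x m grid of processing elements (PEs)."""
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]   # one stationary accumulator per PE
    cycles = n + m + k - 2              # cycles until the last operand pair reaches PE (n-1, m-1)
    for t in range(cycles):
        for i in range(n):
            for j in range(m):
                # Row i of `a` enters from the left delayed by i cycles and column j of `b`
                # enters from the top delayed by j cycles, so operand pair s meets at
                # PE (i, j) on cycle t = s + i + j.
                s = t - i - j
                if 0 <= s < k:
                    acc[i][j] += a[i][s] * b[s][j]
    return acc, cycles


product, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product, cycles)  # [[19, 22], [43, 50]] 4
```

The point of the skewed schedule is that every element only ever talks to its neighbours, which is why the structure maps well onto dense matrix multiplication.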

### when did each tpu generation ship?

The table below summarizes the main TPU generations on Google Cloud.

| Generation | Year | Focus | Peak performance per chip | Memory per chip | Max pod size | Notable users and notes |
|---|---|---|---|---|---|---|
| [TPU v1](/wiki/tpu_v1) | 2015 | Inference (8-bit integer) | 92 TOPS (INT8) | 8 GB DDR3 | Single chip in server | Google internal only |
| [TPU v2](/wiki/tpu_v2) | 2017 | Training, [bfloat16](/wiki/bfloat16) | 45 TFLOPS (BF16) | 16 GB HBM | 256 chips per pod | Cloud TPU launch customers |
| [TPU v3](/wiki/tpu_v3) | 2018 | Larger training | 123 TFLOPS (BF16) | 32 GB HBM | 1,024 chips per pod | Liquid-cooled |
| [TPU v4](/wiki/tpu_v4) | 2021 | Optical reconfigurable pods | 275 TFLOPS (BF16) | 32 GB HBM | 4,096 chips per pod | Used to train [PaLM](/wiki/palm) and early [Gemini](/wiki/gemini) |
| [TPU v5e](/wiki/tpu_v5e) | 2023 | Cost-efficient training and inference | 197 TFLOPS (BF16) | 16 GB HBM | 256 chips per pod | Anthropic, Hugging Face, AssemblyAI |
| [TPU v5p](/wiki/tpu_v5p) | 2023 | Highest training performance | 459 TFLOPS (BF16) | 95 GB HBM | 8,960 chips per pod | Used to train [Gemini 1.0 Ultra](/wiki/gemini_ultra) |
| [Trillium (TPU v6e)](/wiki/trillium) | 2024 | High performance per dollar, third-gen SparseCore | 918 TFLOPS (BF16) | 32 GB HBM | 256 chips per pod | Used to train [Gemini 2.0](/wiki/gemini_2) |
| [Ironwood (TPU v7)](/wiki/tpu_ironwood) | 2025 | Inference-first | 4,614 TFLOPS (FP8) | 192 GB HBM3E | 9,216-chip superpods | Anthropic, Lightricks, Essential AI |

Trillium reached general availability on December 11, 2024, providing a 4.7x increase in peak compute performance per chip over TPU v5e, double the HBM capacity and bandwidth, double the interchip interconnect bandwidth, three times the host DRAM, scaling to 256 chips per pod, and a 67% gain in energy efficiency [3]. Google reports Trillium delivering up to 2.1x better performance per dollar than TPU v5e and 2.5x better than TPU v5p when training dense LLMs such as [Llama 2 70B](/wiki/llama_2) and [Llama 3.1 405B](/wiki/llama_3), and up to 4x faster training for those dense models than v5e [3]. Ironwood is the first TPU generation explicitly designed for inference, reflecting the industry shift toward serving rather than training-dominated compute. Each Ironwood chip delivers 4,614 teraFLOPS at FP8 precision and carries 192 GB of HBM3E with about 7.4 TB/s of bandwidth, and a full Ironwood superpod links 9,216 liquid-cooled chips through optical circuit switching to reach 42.5 exaFLOPS of FP8 performance and 1.77 PB of shared memory [13].
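
The superpod totals and the Trillium speed-up quoted above follow from the per-chip figures by simple multiplication. The short check below reproduces them; it assumes decimal units (1 exaFLOPS = 10^6 teraFLOPS, 1 PB = 10^6 GB), and the function and constant names are made up for this sketch.

```python
CHIPS_PER_SUPERPOD = 9_216      # Ironwood superpod size
FP8_TFLOPS_PER_CHIP = 4_614     # Ironwood peak FP8 teraFLOPS per chip
HBM_GB_PER_CHIP = 192           # Ironwood HBM3E capacity per chip


def superpod_totals(chips, tflops_per_chip, hbm_gb_per_chip):
    """Return (exaFLOPS, petabytes) for a pod, using decimal unit prefixes."""
    exaflops = chips * tflops_per_chip / 1_000_000
    petabytes = chips * hbm_gb_per_chip / 1_000_000
    return exaflops, petabytes


exaflops, petabytes = superpod_totals(CHIPS_PER_SUPERPOD, FP8_TFLOPS_PER_CHIP, HBM_GB_PER_CHIP)
print(f"{exaflops:.1f} exaFLOPS, {petabytes:.2f} PB")  # 42.5 exaFLOPS, 1.77 PB

# Trillium (918 BF16 TFLOPS) relative to TPU v5e (197 BF16 TFLOPS), per the generation table.
print(f"{918 / 197:.1f}x")  # 4.7x
```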

### cloud tpu vocabulary

Cloud TPU exposes a specific vocabulary that maps physical hardware to logical resources. Each term has its own dedicated wiki page (linked in the index at the bottom of this article).

| Term | Meaning |
|---|---|
| [TPU chip](/wiki/tpu_chip) | A single ASIC with one or more TensorCores |
| [TPU device](/wiki/tpu_device) | A board containing four TPU chips |
| [TPU node](/wiki/tpu_node) | A logical Cloud TPU resource exposed to a user, possibly spanning many chips |
| [TPU slice](/wiki/tpu_slice) | A subset of a TPU pod allocated to one job |
| [TPU pod](/wiki/tpu_pod) | The full set of interconnected chips in a single ICI fabric |
| [TPU type](/wiki/tpu_type) | The hardware generation and topology requested at provisioning time |
| [TPU master](/wiki/tpu_master) | The control plane host that drives a TPU job |
| [TPU worker](/wiki/tpu_worker) | A host machine that owns a slice of TPUs and runs the user program |
| [TPU resource](/wiki/tpu_resource) | The Cloud Resource Manager object representing a TPU allocation |
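
One way to keep these terms straight is to model them as a small data structure. The sketch below is illustrative only: the class name, field names, and the `"example-64"` type label are hypothetical and are not Cloud TPU API objects. It relies on the four-chips-per-device figure from the table above.

```python
from dataclasses import dataclass

CHIPS_PER_DEVICE = 4  # per the vocabulary table: a TPU device is a board of four chips


@dataclass(frozen=True)
class TpuSlice:
    tpu_type: str   # generation and topology label (hypothetical format)
    chips: int      # chips allocated to this job
    pod_chips: int  # chips in the full pod the slice is carved from

    def __post_init__(self):
        if not 0 < self.chips <= self.pod_chips:
            raise ValueError("a slice must hold between 1 chip and the whole pod")

    @property
    def devices(self):
        """Boards needed to hold the slice, rounding up to whole devices."""
        return -(-self.chips // CHIPS_PER_DEVICE)

    @property
    def pod_fraction(self):
        """Share of the pod that this slice occupies."""
        return self.chips / self.pod_chips


job = TpuSlice(tpu_type="example-64", chips=64, pod_chips=256)
print(job.devices, job.pod_fraction)  # 16 0.25
```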

### cloud tpu pricing model

Cloud TPU pricing is published per chip-hour. Customers can buy time through three contract types: on-demand pricing, one-year and three-year [committed use discounts](/wiki/committed_use_discount) (CUDs), and Spot/preemptible pricing for fault-tolerant workloads. Spot TPUs typically save more than 50% relative to on-demand and are widely used for long pretraining runs that checkpoint frequently. TPUs are billed for the entire slice (rather than per active chip), so the unit economics reward dense scheduling and efficient use of the [Dynamic Workload Scheduler](/wiki/dynamic_workload_scheduler).
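
The effect of whole-slice billing can be sketched with a few lines of arithmetic. The price of 1.00 USD per chip-hour and the 50% discount used below are placeholders chosen to keep the numbers readable, not published rates, and the function names are invented for this example.

```python
def slice_cost(chips_in_slice, hours, usd_per_chip_hour, discount=0.0):
    """Cost of holding a whole slice; idle chips are billed like busy ones."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return chips_in_slice * hours * usd_per_chip_hour * (1.0 - discount)


def effective_price(chips_in_slice, active_chips, usd_per_chip_hour):
    """Price paid per chip-hour of useful work when only some chips are active."""
    if not 0 < active_chips <= chips_in_slice:
        raise ValueError("active_chips must be between 1 and chips_in_slice")
    return usd_per_chip_hour * chips_in_slice / active_chips


PLACEHOLDER_PRICE = 1.00  # hypothetical USD per chip-hour, not a real rate

print(slice_cost(256, 10, PLACEHOLDER_PRICE))                # 2560.0
print(slice_cost(256, 10, PLACEHOLDER_PRICE, discount=0.5))  # 1280.0
print(round(effective_price(256, 192, PLACEHOLDER_PRICE), 2))  # 1.33
```

In this toy case, leaving a quarter of the slice idle raises the effective price of useful compute by a third, which is the incentive for dense scheduling described above.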

## ai hypercomputer and gke for ai workloads

AI Hypercomputer is Google Cloud's branding for the integrated supercomputing system that combines compute (TPU and GPU), networking, storage, and software (orchestration, frameworks, libraries) into one offering. AI Hypercomputer underpins nearly every AI workload on Google Cloud and is the deployment fabric used by Google's own product teams as well as external research labs.

The major components are:

| Component | Role |
|---|---|
| [Cloud TPU](/wiki/cloud_tpu) and [GPU](/wiki/gpu) instances | Accelerator compute |
| [Cluster Director](/wiki/cluster_director) | Cluster management and job scheduling for GPU and TPU pods |
| [Dynamic Workload Scheduler](/wiki/dynamic_workload_scheduler) | Queueing system that bin-packs accelerator demand to maximize utilization |
| [Managed Lustre](/wiki/managed_lustre) | High-throughput file system, multiple performance tiers, up to 8 PiB |
| [GKE](/wiki/gke) | Managed [Kubernetes](/wiki/kubernetes) for AI workloads |
| [GKE Inference Gateway](/wiki/gke_inference_gateway) | Prefix-aware load balancer for [LLM](/wiki/large_language_model) inference workloads with recurring prompts |
| [MaxText](/wiki/maxtext) and [MaxDiffusion](/wiki/maxdiffusion) | Reference [JAX](/wiki/jax) implementations of LLMs and diffusion models tuned for TPU pods |
| [vLLM TPU](/wiki/vllm_tpu) | Port of [vLLM](/wiki/vllm) inference engine to JAX and TPU |

Google Kubernetes Engine is the primary container surface for AI Hypercomputer. GKE supports both TPU and GPU node pools, exposes the Container Storage Interface (CSI) drivers needed for high-throughput data access, and provides specialized features such as the GKE Inference Gateway, prefix-aware autoscalers, and topology-aware scheduling. Customers running large-scale training or serving on GKE typically combine these features with [Kueue](/wiki/kueue) for batch job queueing and the Dynamic Workload Scheduler for accelerator reservations.
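
The idea behind prefix-aware load balancing can be illustrated with a toy router that sends prompts sharing a leading prefix to the same replica, so that cached work for that prefix can be reused. This sketch is not how GKE Inference Gateway is implemented; a production gateway would also weigh replica load and health. The function name, replica names, and the 64-character prefix length are all assumptions.

```python
import hashlib


def pick_replica(prompt, replicas, prefix_chars=64):
    """Choose a replica from a stable hash of the prompt's leading characters."""
    prefix = prompt[:prefix_chars]
    digest = hashlib.sha256(prefix.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


replicas = ["replica-a", "replica-b", "replica-c"]  # hypothetical backend names
system_prompt = "You are a support assistant. " * 3  # 87 characters shared by every request

first = pick_replica(system_prompt + "Where is my order?", replicas)
second = pick_replica(system_prompt + "How do I reset my password?", replicas)
print(first == second)  # True
```

Both requests land on the same replica because their first 64 characters are identical; only the user question after the shared system prompt differs.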

## bigquery ml

[BigQuery ML](/wiki/bigquery_ml) (BQML) lets data analysts train and run [machine learning](/wiki/machine_learning) models using only SQL inside Google's serverless data warehouse. The product was first released in 2018 and originally supported only [linear regression](/wiki/linear_regression) and [logistic regression](/wiki/logistic_regression). It has since expanded to cover [k-means clustering](/wiki/k_means), [matrix factorization](/wiki/matrix_factorization), [time-series forecasting](/wiki/time_series_forecasting), [boosted trees](/wiki/boosted_tree) (powered by XGBoost), [deep neural networks](/wiki/deep_neural_network) (powered by TensorFlow and Keras), and [autoencoders](/wiki/autoencoder).
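
As a rough illustration of what "training with only SQL" looks like, the helper below renders a BigQuery ML `CREATE MODEL` statement as a string. The dataset, table, and column names are hypothetical, the helper itself is invented for this sketch, and it does not submit anything to BigQuery.

```python
def create_model_sql(model, model_type, label_col, source_table):
    """Render a BigQuery ML CREATE MODEL statement.

    Plain string formatting is used for readability; do not pass untrusted input.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label_col}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


# Hypothetical dataset, table, and label column names.
sql = create_model_sql(
    model="demo_dataset.churn_model",
    model_type="logistic_reg",
    label_col="churned",
    source_table="demo_dataset.customers",
)
print(sql)
```

The resulting statement would be run like any other query in BigQuery, after which the trained model can be queried from SQL as well.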

BigQuery ML also exposes a `REMOTE` model type that calls Cloud AI APIs from inside SQL. The most common forms are:

| SQL function | Backing service | Use |
|---|---|---|
| `ML.GENERATE_TEXT` | Vertex AI Gemini | Generate or transform text from BigQuery rows |
| `ML.GENERATE_EMBEDDING` | Vertex AI text and multimodal embedding models | Vector embeddings for retrieval and clustering |
| `ML.PROCESS_DOCUMENT` | [Document AI](/wiki/document_ai) | Parse PDFs and other structured documents |
| `ML.TRANSCRIBE` | [Speech-to-Text](/wiki/speech_to_text) | Convert audio files into text |
| `ML.UNDERSTAND_TEXT` | [Cloud Natural Language API](/wiki/cloud_natural_language) | Sentiment, entity, and syntax analysis |
| `ML.ANNOTATE_IMAGE` | [Vision API](/wiki/vision_api) | Label detection, OCR, and face analysis |
| `ML.TRANSLATE` | [Cloud Translation](/wiki/translation_api) | Translate text between languages |

Because BigQuery already governs the data, BQML is a popular path for analytics teams who want to add ML and generative AI to existing dashboards without writing Python code or managing GPUs.

## prebuilt cloud ai apis

Google Cloud offers a set of focused, prebuilt APIs that wrap a single ML capability. These services existed before the Vertex AI consolidation and continue to evolve as both standalone products and as services callable from Vertex AI and BigQuery.

### document ai

[Document AI](/wiki/document_ai) is a managed platform for parsing structured and unstructured documents. It exposes pretrained processors for common document types (invoices, receipts, contracts, IRS tax forms, identity documents, mortgage and lending packets) and a Custom Extractor that uses generative AI to extract structured data from arbitrary documents with little training data. Document AI integrates with [Workflows](/wiki/workflows), [Eventarc](/wiki/eventarc), [Cloud Functions](/wiki/cloud_functions), and BigQuery, and is a common building block for accounts payable automation, KYC pipelines, and contract review.

### cloud translation

[Cloud Translation](/wiki/translation_api), historically known as the Translation API, provides [neural machine translation](/wiki/neural_machine_translation) across more than 100 languages. The Basic (v2) tier exposes pretrained NMT for fast batch translation, while the Advanced (v3) tier adds custom AutoML translation models, glossaries for consistent terminology, [batch document](/wiki/batch_document_translation) translation for PDFs, DOCX, and PPTX files, and Gemini-powered Adaptive Translation. Pricing is volume based: the first 500,000 characters per month are free, with additional usage at roughly $20 per million characters and $0.08 per page for document translation.
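
The volume-based pricing above can be turned into a quick estimate. The sketch below uses the approximate figures quoted in this section and assumes, as a simplification, that the free allowance applies only to text characters and that document pages are billed separately; actual billing rules may differ.

```python
FREE_CHARS_PER_MONTH = 500_000   # free allowance quoted above
USD_PER_MILLION_CHARS = 20.0     # approximate rate quoted above
USD_PER_PAGE = 0.08              # document translation rate quoted above


def monthly_translation_cost(chars, pages=0):
    """Estimate a month's bill from text characters and document pages."""
    billable_chars = max(0, chars - FREE_CHARS_PER_MONTH)
    text_cost = billable_chars / 1_000_000 * USD_PER_MILLION_CHARS
    return text_cost + pages * USD_PER_PAGE


print(f"${monthly_translation_cost(3_000_000, pages=100):.2f}")  # $58.00
print(f"${monthly_translation_cost(400_000):.2f}")               # $0.00
```

In the first case, 2.5 million characters are billable at about $50, and 100 pages add about $8.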

### speech-to-text and text-to-speech

[Speech-to-Text](/wiki/speech_to_text) is the [Chirp](/wiki/chirp)-powered transcription API. It supports more than 125 languages and variants, real-time streaming and asynchronous batch transcription, multi-speaker [diarization](/wiki/diarization), profanity filtering, and word-level timestamps. The latest models are based on [Universal Speech Model](/wiki/universal_speech_model) (USM) and Chirp foundation models. The companion [Text-to-Speech](/wiki/text_to_speech_google) API uses [WaveNet](/wiki/wavenet) and [Studio](/wiki/text_to_speech_studio) voices to generate natural-sounding speech.

### vision api

The [Vision API](/wiki/vision_api) provides label detection, [optical character recognition](/wiki/optical_character_recognition) (OCR), face and landmark detection, logo recognition, [explicit content detection](/wiki/safesearch), object localization, and product search through pretrained models. It can be called directly or through BigQuery's `ML.ANNOTATE_IMAGE` function. The first 1,000 units per month for most features are included in the free tier.

### video intelligence

[Video Intelligence API](/wiki/video_intelligence) extends similar capabilities to video. It identifies shot changes, labels objects and scenes over time, transcribes speech, detects faces and text, recognizes logos and celebrities, and flags explicit or violent content. The first 1,000 minutes of video analysis per month are free, with per-minute pricing for additional usage rounded up to the next full minute.
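
The rounding rule matters most for large numbers of short clips. The sketch below assumes rounding is applied per video before the free allowance is subtracted, and it uses a placeholder rate of 0.10 USD per minute; both are assumptions for illustration rather than published billing details.

```python
import math

FREE_MINUTES_PER_MONTH = 1_000  # free allowance quoted above


def billed_minutes(video_durations_seconds):
    """Total minutes after rounding each video up to the next full minute."""
    return sum(math.ceil(seconds / 60) for seconds in video_durations_seconds)


def monthly_cost(video_durations_seconds, usd_per_minute):
    """Cost of the minutes beyond the free allowance at a given per-minute rate."""
    paid = max(0, billed_minutes(video_durations_seconds) - FREE_MINUTES_PER_MONTH)
    return paid * usd_per_minute


print(billed_minutes([61, 120, 30]))  # 5

# 1,200 clips of 61 seconds are billed as 2 minutes each under this assumption.
PLACEHOLDER_RATE = 0.10  # hypothetical USD per minute, not a real rate
print(f"${monthly_cost([61] * 1200, PLACEHOLDER_RATE):.2f}")  # $140.00
```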

### natural language and discovery engine

The [Cloud Natural Language API](/wiki/cloud_natural_language) exposes sentiment analysis, entity recognition, syntactic parsing, and content classification. The [Discovery Engine](/wiki/discovery_engine) backs Vertex AI Search and Conversation by indexing structured and unstructured corpora with vector and lexical retrieval.

## vertex ai search and conversation

[Vertex AI Search and Conversation](/wiki/vertex_ai_search), originally launched as Generative App Builder (Gen App Builder) and later folded into [Vertex AI Agent Builder](/wiki/vertex_ai_agent_builder), is the platform for building [generative AI](/wiki/generative_ai) search experiences and chat agents on top of Google's enterprise search index. The service has three main pieces:

- **Vertex AI Search** for retrieval-augmented question answering across websites, BigQuery, [Cloud Storage](/wiki/cloud_storage), [Confluence](/wiki/confluence), [Jira](/wiki/jira), and other connected sources, with built-in citations and grounding.
- **Vertex AI Conversation** for building chatbots that can transact (for example, take payments, schedule appointments, or look up account information).
- **Vertex AI Agent Builder** for assembling [agents](/wiki/ai_agent) that combine retrieval, tool use, and orchestration into multi-step workflows.

At Google Cloud Next 2026, Google rebranded the surface as the **Gemini Enterprise Agent Platform** and merged the Vertex AI Agent Builder, [Agentspace](/wiki/agentspace), and Gemini Enterprise products into a single offering. The platform exposes new capabilities such as Agent Studio (visual builder), Agent-to-Agent Orchestration, Agent Registry, Agent Identity, Agent Gateway, and Agent Observability, while preserving the underlying Vertex AI APIs for backwards compatibility [2].

## alphafold via vertex

[AlphaFold](/wiki/alphafold) is the [Google DeepMind](/wiki/google_deepmind) protein-structure-prediction system that won the [CASP](/wiki/casp) 14 competition in 2020. There, AlphaFold 2 produced the best prediction for 88 of 97 targets and scored above 90 on the global distance test for roughly two-thirds of proteins, a level the CASP organizers judged competitive with experimental methods and described as a solution to the 50-year-old protein folding problem [14]. Google Cloud exposes AlphaFold to enterprises in two ways: as a hosted [Vertex AI Pipelines](/wiki/vertex_ai_pipelines) workflow that runs end-to-end inference on managed accelerators, and through a user-friendly AlphaFold Portal on Vertex AI that lets researchers submit amino-acid sequences without writing Python or managing notebooks. The pipeline takes a sequence, performs multiple-sequence-alignment and feature-engineering steps, runs the [neural network](/wiki/neural_network) inference on TPUs or GPUs, and produces predicted 3D structures. The biotech company Nuclera and many academic groups use Vertex AI as their AlphaFold backend [4].

## comparison with aws and azure

Google Cloud, [Amazon Web Services](/wiki/aws), and [Microsoft Azure](/wiki/microsoft_azure) are the three dominant enterprise machine learning clouds. Each platform exposes a managed end-to-end ML service plus a generative AI front door. The table below summarizes the main analogues.

| Capability | Google Cloud | AWS | Azure |
|---|---|---|---|
| Managed ML platform | [Vertex AI](/wiki/vertex_ai) | [Amazon SageMaker](/wiki/amazon_sagemaker) AI | [Azure Machine Learning](/wiki/azure_machine_learning) |
| Foundation-model marketplace | Vertex AI Model Garden | [Amazon Bedrock](/wiki/amazon_bedrock) | [Azure AI Foundry](/wiki/azure_ai_foundry) Models |
| First-party model family | [Gemini](/wiki/gemini), [Imagen](/wiki/imagen), [Veo](/wiki/veo), [Lyria](/wiki/lyria), [Chirp](/wiki/chirp), [Gemma](/wiki/gemma) | [Amazon Nova](/wiki/amazon_nova), [Titan](/wiki/amazon_titan) | [Phi](/wiki/phi) family, plus deep [OpenAI](/wiki/openai) integration via [Azure OpenAI](/wiki/azure_openai) |
| Custom AI accelerator | [TPU](/wiki/tpu) | [Trainium](/wiki/aws_trainium), [Inferentia](/wiki/inferentia) | Maia 100 |
| GPU partner | [NVIDIA](/wiki/nvidia) ([H100](/wiki/h100), [B200](/wiki/b200)), [AMD](/wiki/amd) | NVIDIA, AMD | NVIDIA, AMD |
| Generative app surface | Vertex AI Search and Conversation, Gemini Enterprise Agent Platform | Amazon Q, Bedrock Agents | Azure AI Foundry Agent Service, Copilot Studio |
| Data warehouse with ML | [BigQuery ML](/wiki/bigquery_ml) | Amazon SageMaker Lakehouse, [Redshift ML](/wiki/redshift_ml) | [Microsoft Fabric](/wiki/microsoft_fabric), [Synapse Analytics](/wiki/synapse_analytics) |
| Vector database | Vertex AI Vector Search, [AlloyDB](/wiki/alloydb) for [pgvector](/wiki/pgvector) | OpenSearch Serverless, Aurora pgvector | Azure AI Search |

The three platforms differ less in scope than in default model partner. Google ships its own frontier family on Vertex; AWS leans on Anthropic, Meta, and Mistral inside Bedrock; and Azure is the privileged channel for OpenAI's GPT-5 and o-series models through Azure OpenAI and Azure AI Foundry. Industry coverage in 2025 and 2026 generally credits SageMaker with the strongest [MLOps](/wiki/mlops) tooling and instance variety, Azure AI Foundry with the deepest enterprise productivity integrations, and Vertex AI with the cleanest first-party generative model integration and the only fully owned custom training silicon at scale via TPUs [5][6].

## tpu vs gpu trade-offs

A recurring question for ML teams adopting Google Cloud is whether to train on Google's [TPU](/wiki/tpu) or to keep using NVIDIA [GPUs](/wiki/gpu). Both are available on Vertex AI and GKE. The main trade-offs follow.

### where do tpus win?

- **Tensor-heavy workloads at scale.** TPUs are purpose-built around a [systolic array](/wiki/systolic_array) for [matrix multiplication](/wiki/matrix_multiplication). Standard transformer training and inference fit this shape almost perfectly.
- **Performance per dollar at large batch sizes.** Trillium delivers up to 2.1x better performance per dollar than TPU v5e and up to 2.5x better than TPU v5p on dense LLM training [3]. Independent benchmarks have reported up to 4x better cost-performance than NVIDIA H100 on representative LLM training and serving workloads at large batch sizes [7].
- **Energy efficiency.** Google reports Trillium is 67% more energy efficient than TPU v5e, and that Ironwood is roughly 30x more power efficient than the first Cloud TPU (v2) on a performance-per-watt basis [3][13].
- **Inference at very large scale.** Ironwood is the first TPU built primarily for inference. Customers such as Anthropic and Lightricks have publicly cited TPU inference economics as a reason to adopt Cloud TPU. Midjourney reportedly cut monthly inference spend from about $2.1 million to under $700,000, a reduction of roughly two-thirds (the source cites 65%), after migrating its image-generation fleet from GPUs to TPU v6e [15].

### where do gpus win?

- **Software ecosystem.** [CUDA](/wiki/cuda), cuDNN, and the broader NVIDIA stack remain the default in academic and open-source code. Most third-party libraries are tested first on NVIDIA hardware.
- **Workload diversity.** GPUs handle non-tensor workloads (graph algorithms, scientific computing, [reinforcement learning](/wiki/reinforcement_learning) environments, [diffusion model](/wiki/diffusion_model) sampling at low batch sizes) more flexibly than TPUs.
- **Mixed framework usage.** Teams that need to switch between [PyTorch](/wiki/pytorch), [JAX](/wiki/jax), and [TensorFlow](/wiki/tensorflow) often find GPU drivers more uniform than TPU runtimes, where peak performance still requires JAX or PyTorch/XLA tuning.
- **Single-node debugging.** Single-GPU developer machines and modest 2 to 8 GPU nodes are easier to provision and iterate on than minimum-size TPU slices.

Many Google Cloud customers run a hybrid: large-scale pretraining and high-throughput inference on Cloud TPU pods, exploration and short fine-tuning runs on NVIDIA GPUs, and edge deployment on [Edge TPU](/wiki/edge_tpu) or NVIDIA Jetson devices.

## notable customers and case studies

Google reports that as of 2025, more than a third of its new public cloud case studies involve a Cloud AI product, the highest share among the three major hyperscalers [8]. The most prominent single commitment is [Anthropic](/wiki/anthropic)'s: in October 2025 it agreed to expand to up to one million Google Cloud TPUs and more than a gigawatt of capacity coming online in 2026, a deal reported to be worth tens of billions of dollars [11]. A non-exhaustive list of widely cited deployments follows.

| Customer | Industry | Use of Google Cloud ML |
|---|---|---|
| [Anthropic](/wiki/anthropic) | AI research | Trains and serves [Claude](/wiki/claude) on Cloud TPU at very large scale; committed to up to 1M TPUs and more than 1 GW of capacity coming online in 2026 [11] |
| [Salesforce](/wiki/salesforce) | Enterprise SaaS | Uses Vertex AI and Gemini for [Einstein](/wiki/einstein) inside CRM |
| [Etsy](/wiki/etsy) | E-commerce | Personalization for ~90 million shoppers via Vertex AI, BigQuery, [Dataflow](/wiki/dataflow), Gemini; Gemini alt text lifted SEO visits 5% and conversions 3% [16] |
| [BMW Group](/wiki/bmw) | Manufacturing | SORDI.ai 3D digital twins built on Vertex AI |
| [Harvey](/wiki/harvey_ai) | Legal AI | Gemini 2.5 Pro on Vertex AI for long-document legal review |
| Domina | Logistics | Predicts package returns across 20 million annual shipments |
| Amdocs | Telecom software | Telco Customer Experience Agent on Gemini Enterprise |
| Huge | Business services | AI agents for market research and contract analysis |
| Lightricks | Creative apps | Inference workloads on Ironwood TPU |
| Essential AI | Foundation models | Training on Ironwood TPU |
| Nuclera | Biotech | Runs AlphaFold inference on Vertex AI |
| Mercedes-Benz | Automotive | Uses Vertex AI for in-car voice and driver assistance |
| Wendy's | QSR | Drive-thru voice ordering on Vertex AI and Google speech |
| Uber | Mobility | Vertex AI and BigQuery ML for pricing, ETAs, and support |

Google's own products are large internal users too: [Search](/wiki/google_search), [YouTube](/wiki/youtube) recommendations, [Gmail](/wiki/gmail) Smart Compose, [Google Photos](/wiki/google_photos), [Google Translate](/wiki/google_translate), and [Waymo](/wiki/waymo) all rely on the same TPU and Vertex AI surfaces external customers use.

## index of google cloud term wiki pages

*See also: [Machine learning terms](/wiki/machine_learning_terms)*

- [Cloud TPU](/wiki/cloud_tpu)
- [Tensor Processing Unit (TPU)](/wiki/tensor_processing_unit_tpu)
- [TPU](/wiki/tpu)
- [TPU chip](/wiki/tpu_chip)
- [TPU device](/wiki/tpu_device)
- [TPU master](/wiki/tpu_master)
- [TPU node](/wiki/tpu_node)
- [TPU pod](/wiki/tpu_pod)
- [TPU resource](/wiki/tpu_resource)
- [TPU slice](/wiki/tpu_slice)
- [TPU type](/wiki/tpu_type)
- [TPU worker](/wiki/tpu_worker)

## references

[1] Google Cloud Blog, "Welcome to Google Cloud Next 26," 2026, https://cloud.google.com/blog/topics/google-cloud-next/welcome-to-google-cloud-next26.

[2] Google Cloud, "Gemini Enterprise Agent Platform (formerly Vertex AI)," 2026, https://cloud.google.com/products/gemini-enterprise-agent-platform.

[3] Google Cloud Blog, "Trillium TPU is GA," December 11, 2024, https://cloud.google.com/blog/products/compute/trillium-tpu-is-ga.

[4] Google Cloud Blog, "AlphaFold Portal on Vertex AI," https://cloud.google.com/blog/products/ai-machine-learning/alphafold-portal-on-vertex-ai-alphafold-inference-pipeline.

[5] Index.dev, "Vertex AI vs AWS Bedrock vs Azure AI Foundry: Features, Pricing in 2026," 2026, https://www.index.dev/skill-vs-skill/ai-aws-bedrock-vs-azure-ai-vs-vertex.

[6] AWS in Plain English, "AWS Bedrock/SageMaker vs Azure AI Foundry vs Google Vertex AI: The Ultimate Cloud AI Platform (2026 Edition)," 2026, https://aws.plainenglish.io/aws-bedrock-sagemaker-vs-azure-ai-foundry-vs-google-vertex-ai-the-ultimate-cloud-ai-platform-2026-03bbbab919b2.

[7] Introl Blog, "TPU v6e vs GPU: 4x Better AI Performance Per Dollar," https://introl.com/blog/google-tpu-v6e-vs-gpu-4x-better-ai-performance-per-dollar-guide.

[8] Google Cloud Blog, "Real-world gen AI use cases from the world's leading organizations," https://cloud.google.com/transform/101-real-world-generative-ai-use-cases-from-industry-leaders.

[9] Google Cloud, "Google Cloud Launches Vertex AI, Making Machine Learning More Accessible and Useful For Developers and Businesses," May 18, 2021, https://www.googlecloudpresscorner.com/2021-05-18-Google-Cloud-Launches-Vertex-AI,-Making-Machine-Learning-More-Accessible-and-Useful-For-Developers-and-Businesses.

[10] Google Cloud Blog, "Google supercharges machine learning tasks with TPU custom chip," May 18, 2016, https://cloud.google.com/blog/products/ai-machine-learning/google-supercharges-machine-learning-tasks-with-custom-chip.

[11] Anthropic, "Expanding our use of Google Cloud TPUs and Services," October 2025, https://www.anthropic.com/news/expanding-our-use-of-google-cloud-tpus-and-services.

[12] Norman P. Jouppi et al., "In-Datacenter Performance Analysis of a Tensor Processing Unit," ISCA 2017, https://arxiv.org/abs/1704.04760.

[13] Google Cloud Blog, "Inside the Ironwood TPU codesigned AI stack," 2025, https://cloud.google.com/blog/products/compute/inside-the-ironwood-tpu-codesigned-ai-stack.

[14] DeepMind, "AlphaFold: a solution to a 50-year-old grand challenge in biology," November 30, 2020, https://deepmind.google/discover/blog/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology/.

[15] FourWeekMBA, "Why Midjourney's 65% Cost Cut Reveals AI's Hardware Future," 2025, https://fourweekmba.com/why-midjourneys-65-cost-cut-reveals-ais-hardware-future/.

[16] Google Cloud, "Etsy case study," https://cloud.google.com/customers/etsy-ai.