Machine learning terms/Google Cloud
Last reviewed
May 9, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v2 · 3,993 words
See also: Machine learning terms, Vertex AI, Cloud TPU, TPU, Google, Google Cloud
Google Cloud is the public cloud arm of Google and one of the three dominant providers of machine learning infrastructure, alongside Amazon Web Services and Microsoft Azure. Google Cloud's machine learning portfolio is unusual in that it is the same stack Google DeepMind and Google's own product teams use to train and serve flagship models such as Gemini, Imagen, Veo, Lyria, Chirp, and AlphaFold. The platform is built around three layers: a managed ML and generative AI service called Vertex AI, a custom silicon family called the Tensor Processing Unit (TPU), and an integrated supercomputing fabric called AI Hypercomputer that ties accelerators, networking, storage, and software into a single offering.
This page is the gateway hub for Google Cloud machine learning entries on the AI Wiki. It introduces the major products, surveys the TPU generations and their pricing, compares the platform with AWS and Azure equivalents, and provides a curated index of every Google Cloud term with its own dedicated wiki page.
Google Cloud's ML offerings span the full lifecycle of artificial intelligence work: data ingestion, feature engineering, custom training, hyperparameter tuning, serving, monitoring, and the building of generative AI applications and agents. Customers can either use prebuilt APIs that wrap a single capability (vision, speech, translation, document parsing) or take full control through Vertex AI's notebooks, training jobs, pipelines, and inference endpoints. Most of the same accelerators that power Google's first-party products are exposed to enterprises through Vertex AI, the Google Kubernetes Engine (GKE), and the Compute Engine virtual machine surface.
The portfolio reached its current shape through a series of consolidations. In 2017 Google released Cloud ML Engine for managed TensorFlow training, then in 2018 added AutoML for no-code modeling. On May 18, 2021, at Google I/O, Google merged the two into a single managed service called Vertex AI. After the launch of Bard and Gemini, Vertex AI became the primary path for enterprise customers to consume frontier Google models alongside third-party models from Anthropic, Mistral AI, and Meta. At Google Cloud Next 2026, Google rebranded the surface as the Gemini Enterprise Agent Platform while continuing to ship the underlying Vertex AI services [1][2].
Vertex AI is the unified machine learning and generative AI platform offered by Google Cloud. It consolidates data labeling, AutoML training, custom training, hyperparameter tuning, pipelines, feature stores, model registries, online and batch prediction, model evaluation, and monitoring into a single API and console. Customers do not run the underlying training schedulers or inference endpoints themselves: they call Vertex AI services, and Google manages the queues, autoscalers, and accelerator fleets behind the scenes.
The service has three broad surfaces:
| Surface | Purpose | Typical entry points |
|---|---|---|
| Predictive AI | Train and serve classical supervised learning models, regression, classification, forecasting, and ranking | AutoML, custom training jobs, hyperparameter tuning, feature store, batch prediction, online endpoints |
| Generative AI | Use foundation models and large language models for text, code, image, video, audio, and multimodal applications | Vertex AI Studio, Model Garden, tuning jobs, evaluation, RAG Engine, prompt management |
| Agentic AI | Build agents that call tools, retrieve documents, and reason in multi-step workflows | Vertex AI Agent Builder (now Gemini Enterprise Agent Platform), Agent Development Kit, Agent Engine, Agent Starter Pack |
AutoML is Vertex AI's no-code path. Users upload a labeled tabular, image, text, or video dataset and Vertex AI runs an architecture search and hyperparameter search to produce a deployable model. Custom training, in contrast, runs arbitrary container images on managed GPUs or TPUs. It supports TensorFlow, PyTorch, JAX, scikit-learn, and XGBoost out of the box, and integrates with Vertex AI Pipelines for orchestration. The model registry is a versioned catalog of trained artifacts that ties model lineage to deployments, evaluations, and tuning jobs.
Model Garden is Vertex AI's curated catalog of models. It exposes Google's first-party models (Gemini, Gemma, Imagen, Veo, Lyria, Chirp, PaLM, Codey), third-party models (Anthropic's Claude family, Mistral models, Llama), and open models from Hugging Face. Each entry can be deployed to a Vertex AI endpoint, fine-tuned, or queried via the Vertex AI API, with billing and quotas handled by Google Cloud.
The Gemini family of multimodal models is available on Google Cloud through two complementary surfaces: Google AI Studio for prototyping with personal accounts, and the Vertex AI Gemini API for production use with enterprise governance, regional endpoints, VPC Service Controls, and Customer-Managed Encryption Keys. Vertex AI was the first commercial surface for Gemini when it launched in December 2023 and remains the default channel for enterprise deployments.
The table below summarizes the main Gemini variants available on Vertex AI as of 2026.
| Model | Context window | Strengths | Typical use |
|---|---|---|---|
| Gemini 3 Pro | 1,000,000 tokens | Most advanced reasoning, agentic workflows, code, multimodal | Complex enterprise reasoning, long-document analysis |
| Gemini 3 Flash | 1,000,000 tokens | Lower latency, lower cost, multimodal | High-throughput consumer and customer-support apps |
| Gemini 2.5 Pro | 1,000,000 tokens | Adaptive thinking, deep reasoning, code | Coding agents, RAG, structured extraction |
| Gemini 2.5 Flash | 1,000,000 tokens | Cheap, fast, capable enough for most tasks | Bulk inference, classification, summarization |
Vertex AI exposes Gemini through several inference modes, including standard online inference, provisioned throughput for predictable capacity, batch prediction for large offline jobs, and global endpoints that route requests to the nearest healthy region. The same models can also be tuned with supervised fine-tuning, reinforcement learning from human feedback, or distillation directly inside Vertex AI tuning jobs.
The Tensor Processing Unit is Google's family of custom application-specific integrated circuits (ASICs) for deep learning workloads. TPUs are organized around a systolic array for matrix multiplication and use bfloat16 as their primary low-precision number format. The first TPU was deployed inside Google data centers in 2015 to serve voice search and was publicly disclosed at Google I/O in May 2016. Google has since shipped seven public generations, of which TPU v5e, TPU v5p, Trillium (TPU v6e), and Ironwood (TPU v7) are still actively sold through Cloud TPU [3].
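To make the bfloat16 format concrete: it keeps the sign bit and the 8 exponent bits of an IEEE 754 float32 but only 7 mantissa bits, so a float32 can be converted by keeping its upper 16 bits. The sketch below emulates that conversion in plain Python with round-to-nearest-even. The function name `to_bfloat16` is illustrative, and this is not how TPU hardware or any framework implements the conversion.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round a Python float to the nearest bfloat16 value, returned as a float.

    Illustrative emulation only: NaN inputs are not handled specially, and
    values outside the float32 range raise OverflowError in struct.pack.
    """
    # Reinterpret the value as the 32 bits of an IEEE 754 float32.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 low bits that are about to be dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    # Reinterpret the shortened bit pattern as a float32 again.
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    for value in (1.0, 3.14159, 65504.0):
        print(f"{value!r} -> {to_bfloat16(value)!r}")
```

With only 7 mantissa bits, 3.14159 becomes 3.140625, while the float32 exponent range is preserved. That trade of precision for range is why the format suits deep learning workloads.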
The table below summarizes the main TPU generations on Google Cloud.
| Generation | Year | Focus | Peak performance per chip | HBM | Max pod size | Notable customers |
|---|---|---|---|---|---|---|
| TPU v1 | 2015 | Inference (8-bit integer) | 92 TOPS (INT8) | 8 GB DDR3 | Single chip in server | Google internal only |
| TPU v2 | 2017 | Training, bfloat16 | 45 TFLOPS (BF16) | 16 GB HBM | 256 chips per pod | Cloud TPU launch customers |
| TPU v3 | 2018 | Larger training | 123 TFLOPS (BF16) | 32 GB HBM | 1,024 chips per pod | Liquid-cooled |
| TPU v4 | 2021 | Optical reconfigurable pods | 275 TFLOPS (BF16) | 32 GB HBM | 4,096 chips per pod | Used to train PaLM and early Gemini |
| TPU v5e | 2023 | Cost-efficient training and inference | 197 TFLOPS (BF16) | 16 GB HBM | 256 chips per pod | Anthropic, Hugging Face, AssemblyAI |
| TPU v5p | 2023 | Highest training performance | 459 TFLOPS (BF16) | 95 GB HBM | 8,960 chips per pod | Used to train Gemini 1.0 Ultra |
| Trillium (TPU v6e) | 2024 | High performance per dollar, third-gen SparseCore | 918 TFLOPS (BF16) | 32 GB HBM | 256 chips per pod | Used to train Gemini 2.0 |
| Ironwood (TPU v7) | 2025 | Inference-first | 4,614 TFLOPS (FP8) | 192 GB HBM3E | 9,216-chip superpods | Anthropic, Lightricks, Essential AI |
Trillium provides up to a 4.7x increase in compute performance per chip over TPU v5e, doubles HBM capacity and bandwidth, scales to 256 chips per pod, and is over 67% more energy efficient. Google reports Trillium delivering up to 2.1x better performance per dollar than TPU v5e and 2.5x better than TPU v5p when training dense LLMs such as Llama 2 70B and Llama 3.1 405B [3]. Ironwood is the first TPU generation explicitly designed for inference, reflecting the industry shift toward serving rather than training-dominated compute. Each Ironwood chip delivers 4,614 teraFLOPS at FP8 precision and carries 192 GB of HBM3E with 7.37 TB/s of bandwidth.
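The per-chip figures in the table can be turned into rough pod-level peaks by multiplying by the maximum pod size. The sketch below does that arithmetic with values copied from the table. The results are theoretical peaks that ignore utilization, and Ironwood's figure is FP8 while the others are BF16, so the numbers are not directly comparable across precisions.

```python
# (peak TFLOPS per chip, maximum chips per pod), copied from the table above.
# Ironwood's figure is FP8; the others are BF16, so compare with care.
TPU_SPECS = {
    "TPU v5e": (197, 256),
    "TPU v5p": (459, 8_960),
    "Trillium (TPU v6e)": (918, 256),
    "Ironwood (TPU v7)": (4_614, 9_216),
}


def pod_peak_exaflops(generation: str) -> float:
    """Theoretical peak of a full pod in exaFLOPS (1 exaFLOPS = 1e6 TFLOPS)."""
    tflops_per_chip, chips_per_pod = TPU_SPECS[generation]
    return tflops_per_chip * chips_per_pod / 1e6


def per_chip_speedup(newer: str, older: str) -> float:
    """Ratio of per-chip peak TFLOPS between two generations."""
    return TPU_SPECS[newer][0] / TPU_SPECS[older][0]


if __name__ == "__main__":
    for name in TPU_SPECS:
        print(f"{name}: {pod_peak_exaflops(name):.2f} exaFLOPS per full pod")
    ratio = per_chip_speedup("Trillium (TPU v6e)", "TPU v5e")
    print(f"Trillium vs v5e per chip: {ratio:.2f}x")
```

Under these figures a full Ironwood superpod peaks at about 42.5 exaFLOPS (FP8), and the Trillium-to-v5e per-chip ratio works out to roughly 4.7, in line with the 4.7x figure quoted above.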
Cloud TPU exposes a specific vocabulary that maps physical hardware to logical resources. Each term has its own dedicated wiki page (linked in the index at the bottom of this article).
| Term | Meaning |
|---|---|
| TPU chip | A single ASIC with one or more TensorCores |
| TPU device | A board containing four TPU chips |
| TPU node | A logical Cloud TPU resource exposed to a user, possibly spanning many chips |
| TPU slice | A subset of a TPU pod allocated to one job |
| TPU pod | The full set of interconnected chips in a single ICI fabric |
| TPU type | The hardware generation and topology requested at provisioning time |
| TPU master | The control plane host that drives a TPU job |
| TPU worker | A host machine that owns a slice of TPUs and runs the user program |
| TPU resource | The Cloud Resource Manager object representing a TPU allocation |
Cloud TPU pricing is published per chip-hour. Customers can buy time through three contract types: on-demand pricing, one-year and three-year committed use discounts (CUDs), and Spot/preemptible pricing for fault-tolerant workloads. Spot TPUs typically save more than 50% relative to on-demand and are widely used for long pretraining runs that checkpoint frequently. TPUs are billed for the entire slice (rather than per active chip), so the unit economics reward dense scheduling and efficient use of the Dynamic Workload Scheduler.
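Slice-level billing can be illustrated with a small calculation. In the sketch below, the per-chip-hour price and the Spot discount are hypothetical placeholders, not figures from Google's price list. Only the rule that the whole slice is billed for the job's duration comes from the paragraph above.

```python
def slice_cost(
    chips_in_slice: int,
    hours: float,
    price_per_chip_hour: float,
    discount: float = 0.0,
) -> float:
    """Cost of holding a TPU slice: every chip in the slice is billed for
    the full duration, whether or not the job keeps it busy."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return chips_in_slice * hours * price_per_chip_hour * (1.0 - discount)


def cost_per_useful_chip_hour(
    chips_in_slice: int, chips_used: int, price_per_chip_hour: float
) -> float:
    """Effective price per chip-hour of useful work when part of a slice idles."""
    if not 0 < chips_used <= chips_in_slice:
        raise ValueError("chips_used must be between 1 and chips_in_slice")
    return price_per_chip_hour * chips_in_slice / chips_used


if __name__ == "__main__":
    PRICE = 2.00  # hypothetical USD per chip-hour, for illustration only
    print("on-demand:", slice_cost(256, 24, PRICE))
    print("spot (assumed 50% off):", slice_cost(256, 24, PRICE, discount=0.5))
    print("effective rate at 192/256 chips busy:",
          round(cost_per_useful_chip_hour(256, 192, PRICE), 2))
```

The second function shows why dense scheduling matters: if a quarter of a slice sits idle, the effective price of each useful chip-hour rises by a third.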
AI Hypercomputer is Google Cloud's branding for the integrated supercomputing system that combines compute (TPU and GPU), networking, storage, and software (orchestration, frameworks, libraries) into one offering. AI Hypercomputer underpins nearly every AI workload on Google Cloud and is the deployment fabric used by Google's own product teams as well as external research labs.
The major components are:
| Component | Role |
|---|---|
| Cloud TPU and GPU instances | Accelerator compute |
| Cluster Director | Cluster management and job scheduling for GPU and TPU pods |
| Dynamic Workload Scheduler | Queueing system that bin-packs accelerator demand to maximize utilization |
| Managed Lustre | High-throughput file system, multiple performance tiers, up to 8 PiB |
| GKE | Managed Kubernetes for AI workloads |
| GKE Inference Gateway | Prefix-aware load balancer for LLM inference workloads with recurring prompts |
| MaxText and MaxDiffusion | Reference JAX implementations of LLMs and diffusion models tuned for TPU pods |
| vLLM TPU | Port of vLLM inference engine to JAX and TPU |
Google Kubernetes Engine is the primary container surface for AI Hypercomputer. GKE supports both TPU and GPU node pools, exposes the Container Storage Interface (CSI) drivers needed for high-throughput data access, and provides specialized features such as the GKE Inference Gateway, prefix-aware autoscalers, and topology-aware scheduling. Customers running large-scale training or serving on GKE typically combine these features with Kueue for batch job queueing and the Dynamic Workload Scheduler for accelerator reservations.
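The idea behind prefix-aware load balancing can be shown with a toy router. The sketch below is not the GKE Inference Gateway's algorithm, which this article does not describe. It only illustrates why prompts that share a leading prefix (for example, the same system prompt) are sent to the same replica: that replica may already hold cached state for the prefix. The class name, replica names, and fixed-length character prefix are all assumptions made for illustration.

```python
import hashlib


class PrefixAwareRouter:
    """Toy router: prompts sharing a leading prefix go to the same replica."""

    def __init__(self, replicas: list[str], prefix_chars: int = 64) -> None:
        if not replicas:
            raise ValueError("at least one replica is required")
        self.replicas = list(replicas)
        self.prefix_chars = prefix_chars

    def route(self, prompt: str) -> str:
        # Hash only the leading characters, so requests that start with the
        # same system prompt map to the same replica regardless of the rest.
        prefix = prompt[: self.prefix_chars]
        digest = hashlib.sha256(prefix.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.replicas)
        return self.replicas[index]


if __name__ == "__main__":
    router = PrefixAwareRouter(["replica-a", "replica-b", "replica-c"], prefix_chars=32)
    system = "You are a support agent for Example Corp. "
    first = router.route(system + "Where is my order?")
    second = router.route(system + "How do I reset my password?")
    print(first, second, first == second)
```

A production balancer would also have to weigh replica load and health. Pure hashing, as used here, can overload one replica when a single prefix dominates traffic.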
BigQuery ML (BQML) lets data analysts train and run machine learning models using only SQL inside Google's serverless data warehouse. The product was first released in 2018 and originally supported only linear regression and logistic regression. It has since expanded to cover k-means clustering, matrix factorization, time-series forecasting, boosted trees (powered by XGBoost), deep neural networks (powered by TensorFlow and Keras), and autoencoders.
BigQuery ML also exposes a REMOTE model type that calls Cloud AI APIs from inside SQL. The most common forms are:
| SQL function | Backing service | Use |
|---|---|---|
| ML.GENERATE_TEXT | Vertex AI Gemini | Generate or transform text from BigQuery rows |
| ML.GENERATE_EMBEDDING | Vertex AI text and multimodal embedding models | Vector embeddings for retrieval and clustering |
| ML.PROCESS_DOCUMENT | Document AI | Parse PDFs and other structured documents |
| ML.TRANSCRIBE | Speech-to-Text | Convert audio files into text |
| ML.UNDERSTAND_TEXT | Vertex AI Natural Language | Sentiment, entity, and syntax analysis |
| ML.ANNOTATE_IMAGE | Vision API | Label detection, OCR, and face analysis |
| ML.TRANSLATE | Cloud Translation | Translate text between languages |
Because BigQuery already governs the data, BQML is a popular path for analytics teams who want to add ML and generative AI to existing dashboards without writing Python code or managing GPUs.
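As an illustration of the SQL-only workflow, the sketch below builds two BigQuery ML statements as strings in Python: a `CREATE MODEL` statement and an `ML.PREDICT` query. The dataset, table, model, and column names are hypothetical. Running the statements would require a BigQuery client and project, which are not shown here.

```python
def create_model_sql(
    model: str, source_table: str, label: str, model_type: str = "logistic_reg"
) -> str:
    """Return a BigQuery ML CREATE MODEL statement.

    Identifiers are interpolated as-is, so pass only trusted names.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


def predict_sql(model: str, input_table: str) -> str:
    """Return a query that scores the rows of input_table with ML.PREDICT."""
    return (
        f"SELECT * FROM ML.PREDICT(MODEL `{model}`,\n"
        f"  (SELECT * FROM `{input_table}`))"
    )


if __name__ == "__main__":
    # All names below are placeholders for illustration.
    print(create_model_sql("demo.churn_model", "demo.customers", "churned"))
    print()
    print(predict_sql("demo.churn_model", "demo.new_customers"))
```

Both training and scoring are ordinary SQL statements, so they can be scheduled and governed like any other query in the warehouse.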
Google Cloud offers a set of focused, prebuilt APIs that wrap a single ML capability. These services existed before the Vertex AI consolidation and continue to evolve as both standalone products and as services callable from Vertex AI and BigQuery.
Document AI is a managed platform for parsing structured and unstructured documents. It exposes pretrained processors for common document types (invoices, receipts, contracts, IRS tax forms, identity documents, mortgage and lending packets) and a Custom Extractor that uses generative AI to extract structured data from arbitrary documents with little training data. Document AI integrates with Workflows, Eventarc, Cloud Functions, and BigQuery, and is a common building block for accounts payable automation, KYC pipelines, and contract review.
Cloud Translation, historically known as the Translation API, provides neural machine translation across more than 100 languages. The Basic (v2) tier exposes pretrained NMT for fast batch translation, while the Advanced (v3) tier adds custom AutoML translation models, glossaries for consistent terminology, batch document translation for PDFs, DOCX, and PPTX files, and Gemini-powered Adaptive Translation. Pricing is volume based: the first 500,000 characters per month are free, with additional usage at roughly $20 per million characters and $0.08 per page for document translation.
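The pricing described above reduces to simple arithmetic. The sketch below uses the approximate figures quoted in the preceding paragraph: 500,000 free characters per month, roughly $20 per million characters after that, and $0.08 per page for documents. Actual prices vary by tier and may change, so the constants are illustrative only.

```python
FREE_CHARS_PER_MONTH = 500_000
USD_PER_MILLION_CHARS = 20.0  # "roughly $20" per the text; illustrative
USD_PER_DOCUMENT_PAGE = 0.08


def monthly_translation_cost(characters: int, document_pages: int = 0) -> float:
    """Estimate a month's bill in USD from the figures quoted above.

    Assumes the free allowance applies only to text characters, which the
    source paragraph does not spell out.
    """
    billable_chars = max(0, characters - FREE_CHARS_PER_MONTH)
    text_cost = billable_chars / 1_000_000 * USD_PER_MILLION_CHARS
    return text_cost + document_pages * USD_PER_DOCUMENT_PAGE


if __name__ == "__main__":
    print(monthly_translation_cost(400_000))        # within the free allowance
    print(monthly_translation_cost(2_500_000))      # 2M billable characters
    print(monthly_translation_cost(500_000, 100))   # document pages only
```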
Speech-to-Text is Google Cloud's transcription API. It supports more than 125 languages and variants, real-time streaming and asynchronous batch transcription, multi-speaker diarization, profanity filtering, and word-level timestamps. Its latest models are the Chirp foundation models, which build on Google's Universal Speech Model (USM). The companion Text-to-Speech API uses WaveNet and Studio voices to generate natural-sounding speech.
The Vision API provides label detection, optical character recognition (OCR), face and landmark detection, logo recognition, explicit content detection, object localization, and product search through pretrained models. It can be called directly or through BigQuery's ML.ANNOTATE_IMAGE function. The first 1,000 units per month for most features are included in the free tier.
Video Intelligence API extends similar capabilities to video. It identifies shot changes, labels objects and scenes over time, transcribes speech, detects faces and text, recognizes logos and celebrities, and flags explicit or violent content. The first 1,000 minutes of video analysis per month are free, with per-minute pricing for additional usage rounded up to the next full minute.
The Cloud Natural Language API exposes sentiment analysis, entity recognition, syntactic parsing, and content classification. The Discovery Engine backs Vertex AI Search and Conversation by indexing structured and unstructured corpora with vector and lexical retrieval.
Vertex AI Search and Conversation, originally launched as Generative App Builder (Gen App Builder) and later folded into Vertex AI Agent Builder, is the platform for building generative AI search experiences and chat agents on top of Google's enterprise search index. The service has three main pieces:
At Google Cloud Next 2026, Google rebranded the surface as the Gemini Enterprise Agent Platform and merged the Vertex AI Agent Builder, Agentspace, and Gemini Enterprise products into a single offering. The platform exposes new capabilities such as Agent Studio (visual builder), Agent-to-Agent Orchestration, Agent Registry, Agent Identity, Agent Gateway, and Agent Observability, while preserving the underlying Vertex AI APIs for backwards compatibility [2].
AlphaFold is the Google DeepMind protein-structure-prediction system that won the CASP14 competition in 2020. Google Cloud exposes AlphaFold to enterprises in two ways: as a hosted Vertex AI Pipelines workflow that runs end-to-end inference on managed accelerators, and through the AlphaFold Portal on Vertex AI, which lets researchers submit amino-acid sequences without writing Python or managing notebooks. The pipeline takes a sequence, performs multiple-sequence-alignment and feature-engineering steps, runs the neural network inference on TPUs or GPUs, and produces predicted 3D structures. The biotech company Nuclera and many academic groups use Vertex AI as their AlphaFold backend [4].
Google Cloud, Amazon Web Services, and Microsoft Azure are the three dominant enterprise machine learning clouds. Each platform exposes a managed end-to-end ML service plus a generative AI front door. The table below summarizes the main analogues.
| Capability | Google Cloud | AWS | Azure |
|---|---|---|---|
| Managed ML platform | Vertex AI | Amazon SageMaker AI | Azure Machine Learning |
| Foundation-model marketplace | Vertex AI Model Garden | Amazon Bedrock | Azure AI Foundry Models |
| First-party model family | Gemini, Imagen, Veo, Lyria, Chirp, Gemma | Amazon Nova, Titan | Phi family, plus deep OpenAI integration via Azure OpenAI |
| Custom AI accelerator | TPU | Trainium, Inferentia | Maia 100 |
| GPU partner | NVIDIA (H100, B200), AMD | NVIDIA, AMD | NVIDIA, AMD |
| Generative app surface | Vertex AI Search and Conversation, Gemini Enterprise Agent Platform | Amazon Q, Bedrock Agents | Azure AI Foundry Agent Service, Copilot Studio |
| Data warehouse with ML | BigQuery ML | Amazon SageMaker Lakehouse, Redshift ML | Microsoft Fabric, Synapse Analytics |
| Vector database | Vertex AI Vector Search, AlloyDB for pgvector | OpenSearch Serverless, Aurora pgvector | Azure AI Search |
The three platforms differ less in scope than in default model partner. Google ships its own frontier family on Vertex; AWS leans on Anthropic, Meta, and Mistral inside Bedrock; and Azure is the privileged channel for OpenAI's GPT-5 and o-series models through Azure OpenAI and Azure AI Foundry. Industry coverage in 2025 and 2026 generally credits SageMaker with the strongest MLOps tooling and instance variety, Azure AI Foundry with the deepest enterprise productivity integrations, and Vertex AI with the cleanest first-party generative model integration and the only fully owned custom training silicon at scale via TPUs [5][6].
A recurring question for ML teams adopting Google Cloud is whether to train on Google's TPU or to keep using NVIDIA GPUs. Both are available on Vertex AI and GKE. The main trade-offs follow.
Many Google Cloud customers run a hybrid: large-scale pretraining and high-throughput inference on Cloud TPU pods, exploration and short fine-tuning runs on NVIDIA GPUs, and edge deployment on Edge TPU or NVIDIA Jetson devices.
Google reports that as of 2025, more than a third of its new public cloud case studies involve a Cloud AI product, the highest share among the three major hyperscalers [8]. A non-exhaustive list of widely cited deployments follows.
| Customer | Industry | Use of Google Cloud ML |
|---|---|---|
| Anthropic | AI research | Trains and serves Claude on Cloud TPU at very large scale |
| Salesforce | Enterprise SaaS | Uses Vertex AI and Gemini for Einstein inside CRM |
| Etsy | E-commerce | Personalization for 90 million shoppers via Vertex AI, BigQuery, Dataflow, Gemini |
| BMW Group | Manufacturing | SORDI.ai 3D digital twins built on Vertex AI |
| Harvey | Legal AI | Gemini 2.5 Pro on Vertex AI for long-document legal review |
| Domina | Logistics | Predicts package returns across 20 million annual shipments |
| Amdocs | Telecom software | Telco Customer Experience Agent on Gemini Enterprise |
| Huge | Business services | AI agents for market research and contract analysis |
| Lightricks | Creative apps | Inference workloads on Ironwood TPU |
| Essential AI | Foundation models | Training on Ironwood TPU |
| Nuclera | Biotech | Runs AlphaFold inference on Vertex AI |
| Mercedes-Benz | Automotive | Uses Vertex AI for in-car voice and driver assistance |
| Wendy's | Quick-service restaurants | Drive-thru voice ordering on Vertex AI and Google speech |
| Uber | Mobility | Vertex AI and BigQuery ML for pricing, ETAs, and support |
Google's own products are large internal users too: Search, YouTube recommendations, Gmail Smart Compose, Google Photos, Google Translate, and Waymo all rely on the same TPU and Vertex AI surfaces external customers use.
[1] Google Cloud Blog, "Welcome to Google Cloud Next 26," 2026, https://cloud.google.com/blog/topics/google-cloud-next/welcome-to-google-cloud-next26.
[2] Google Cloud, "Gemini Enterprise Agent Platform (formerly Vertex AI)," 2026, https://cloud.google.com/products/gemini-enterprise-agent-platform.
[3] Google Cloud Blog, "Trillium TPU is GA," December 11, 2024, https://cloud.google.com/blog/products/compute/trillium-tpu-is-ga.
[4] Google Cloud Blog, "AlphaFold Portal on Vertex AI," https://cloud.google.com/blog/products/ai-machine-learning/alphafold-portal-on-vertex-ai-alphafold-inference-pipeline.
[5] Index.dev, "Vertex AI vs AWS Bedrock vs Azure AI Foundry: Features, Pricing in 2026," 2026, https://www.index.dev/skill-vs-skill/ai-aws-bedrock-vs-azure-ai-vs-vertex.
[6] AWS in Plain English, "AWS Bedrock/SageMaker vs Azure AI Foundry vs Google Vertex AI: The Ultimate Cloud AI Platform (2026 Edition)," 2026, https://aws.plainenglish.io/aws-bedrock-sagemaker-vs-azure-ai-foundry-vs-google-vertex-ai-the-ultimate-cloud-ai-platform-2026-03bbbab919b2.
[7] Introl Blog, "TPU v6e vs GPU: 4x Better AI Performance Per Dollar," https://introl.com/blog/google-tpu-v6e-vs-gpu-4x-better-ai-performance-per-dollar-guide.
[8] Google Cloud Blog, "Real-world gen AI use cases from the world's leading organizations," https://cloud.google.com/transform/101-real-world-generative-ai-use-cases-from-industry-leaders.