Gemini Robotics is a family of robotics foundation models developed by Google DeepMind that extends the Gemini multimodal model line into the physical world. The first two models in the family, Gemini Robotics and Gemini Robotics-ER (short for Embodied Reasoning), were announced on March 12, 2025, and were both built on the Gemini 2.0 base model.[1][2] Subsequent releases added an on-device variant in June 2025, the Gemini Robotics 1.5 generation in September 2025, and a refreshed reasoning model, Gemini Robotics-ER 1.6, in April 2026.[3][4][5][6]
The line is positioned as a general-purpose vision-language-action (VLA) and embodied reasoning stack that lets robots perceive, plan, and act across different tasks and embodiments. Google DeepMind describes three properties as central to the design: generality across novel situations, interactivity through ordinary language, and dexterity for fine manipulation.[1][2] The models are intended to be used together, with the embodied reasoning model acting as a high-level planner that calls the action model and other tools, including Google Search and user-defined functions.[4][7]
Apptronik is the lead humanoid hardware partner for the program, and Google DeepMind has run a trusted tester program that included Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools at launch and grew to more than sixty organizations by late 2025.[2][8][9] MIT Technology Review described the launch as one of the first major applications of generative AI to advanced robotics, while IEEE Spectrum called the announcement a step toward foundation models for embodied agents.[10][11]
Robotics research has long struggled to produce policies that generalize beyond the narrow conditions they were trained on. Earlier efforts at Google, notably the RT-1 (2022) and RT-2 (2023) projects, trained transformer policies on large sets of real demonstrations and, in RT-2's case, layered action prediction on top of vision-language models, showing that internet-scale pretraining could improve a robot's response to novel objects and instructions. Outside Google, the OpenVLA model from Stanford and its partners and the Pi-0 model from Physical Intelligence extended the same recipe with different choices about action representation and embodiment coverage.[12]
Gemini Robotics inherits this approach but starts from a much larger and more recent base. Carolina Parada, who leads robotics at Google DeepMind, described the team's strategy as broad task learning instead of single-task specialization, with the bet that generalization would emerge once the model had enough exposure to varied tasks.[13] The models are built on top of Gemini 2.0, which already encodes wide visual and linguistic context, and they add physical actions as an additional output modality alongside text.[1][14]
The project was designed in collaboration with hardware partners from the start. Apptronik, maker of the Apollo humanoid robot, is the lead humanoid partner, and Google DeepMind opened a trusted tester program for Gemini Robotics-ER on the same day as the public announcement.[2][15]
The table below summarizes the major Gemini Robotics releases through April 2026.
| Date | Release | Notes |
|---|---|---|
| March 12, 2025 | Gemini Robotics and Gemini Robotics-ER announced | Built on Gemini 2.0; both models gated to trusted testers at launch[1][2] |
| March 25, 2025 | Technical report posted to arXiv | Paper number 2503.20020, titled "Gemini Robotics: Bringing AI into the Physical World"[16] |
| June 24, 2025 | Gemini Robotics On-Device released | First on-robot VLA in the family; first VLA from Google DeepMind opened to fine-tuning[3][17] |
| September 25, 2025 | Gemini Robotics 1.5 and Gemini Robotics-ER 1.5 announced | Two-model agentic stack; ER 1.5 made available in preview through the Gemini API and Google AI Studio[4][18] |
| April 14, 2026 | Gemini Robotics-ER 1.6 released | Improved spatial reasoning and instrument reading; deployed on Boston Dynamics Spot for industrial inspection[5][6][19] |
The Gemini Robotics product page groups the models into three roles: a vision-language-action model that turns visual input and instructions into motor commands, a reasoning model that plans, and an on-device variant that runs locally on the robot.[20] The same role structure has been preserved across model generations, with the version numbers (1.5, 1.6) tracking improvements within each role rather than denoting separate product lines.
| Model | Type | First released | Role |
|---|---|---|---|
| Gemini Robotics | Vision-language-action | March 2025 | Cloud-served VLA that issues motor commands |
| Gemini Robotics-ER | Vision-language model | March 2025 | Embodied reasoning, perception, and planning layer |
| Gemini Robotics On-Device | Vision-language-action | June 2025 | Local VLA optimized for on-robot inference |
| Gemini Robotics 1.5 | Vision-language-action | September 2025 | Reasoning-augmented VLA that explains its actions before executing |
| Gemini Robotics-ER 1.5 | Vision-language model | September 2025 | High-level planner with tool calling and adjustable thinking budget |
| Gemini Robotics-ER 1.6 | Vision-language model | April 2026 | Refresh focused on spatial reasoning, multi-view understanding, and instrument reading |
The embodied reasoning models are vision-language models that emit structured outputs such as points, object detections, success judgments, and code rather than continuous joint commands. The vision-language-action models translate pixels and instructions directly into motor commands. In Google's deployment guidance, the two models are used together: the reasoning model produces a plan and calls the action model (or other tools) to execute each step.[4][7]
Gemini Robotics models inherit the transformer architecture and the multimodal pretraining data of Gemini 2.0. They are then fine-tuned on robot-specific data, including teleoperated demonstrations on real robots and synthetic trajectories generated in simulation. The technical report describes the resulting system as a generalist VLA that can perform object detection, pointing, trajectory and grasp prediction, multi-view correspondence, and 3D bounding box prediction without task-specific heads.[16]
From the original announcement onward, Google DeepMind has framed the family as a two-model system. Gemini Robotics-ER handles perception and decision-making: it recognizes elements in the scene, estimates their size and location, predicts grasp points and trajectories, and emits code to execute the action. Gemini Robotics handles execution: it converts the visual context plus an instruction into the joint-level commands that drive the robot.[2][13] The 1.5 generation made this split explicit by giving Gemini Robotics-ER 1.5 native tool-calling abilities so it can call Gemini Robotics 1.5 (or any other VLA) the same way a language agent calls a function.[4][7]
According to the technical report, the Gemini Robotics VLA is split between a vision-language backbone hosted in the cloud and a small action decoder running on the robot's onboard computer. The team reduced the backbone's query latency from seconds to under 160 ms and reported an end-to-end latency of about 250 ms from raw camera observation to a chunk of low-level joint commands, supporting an effective control frequency of 50 Hz.[16] The split lets the system use the full Gemini 2.0 weights for reasoning while still issuing high-frequency motor commands locally, a design that contrasts both with fully on-device VLAs such as Gemini Robotics On-Device and with cloud-only approaches that lack a dedicated on-robot decoder.
The model emits actions in chunks rather than one step at a time. Action chunking lets the policy plan several timesteps of motion in advance, which Google DeepMind argues helps it produce smoother trajectories and tolerate the network round trip to the cloud backbone.[16]
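These numbers imply a simple latency budget: at 50 Hz each action step spans 20 ms, so a roughly 250 ms round trip to the cloud backbone covers about 13 steps of motion, and each chunk must be long enough to keep the robot busy while the next chunk is requested. The sketch below illustrates that overlap under stated assumptions; the `request_chunk` and command interfaces are hypothetical placeholders, not part of any published Gemini Robotics SDK.

```python
import asyncio

CONTROL_HZ = 50                # effective control frequency reported in the technical report
STEP_S = 1.0 / CONTROL_HZ      # 20 ms per low-level action step
BACKBONE_LATENCY_S = 0.25      # ~250 ms observation-to-chunk latency

async def request_chunk(observation):
    """Hypothetical stand-in for the cloud backbone plus local action decoder."""
    await asyncio.sleep(BACKBONE_LATENCY_S)
    # A chunk must span at least ceil(0.25 / 0.02) = 13 steps to hide the round trip.
    return [f"joint_cmd_{i}" for i in range(16)]

async def control_loop(get_observation, send_command, num_chunks=5):
    """Execute the current action chunk at 50 Hz while prefetching the next one."""
    pending = asyncio.create_task(request_chunk(get_observation()))
    for _ in range(num_chunks):
        chunk = await pending
        # Request the next chunk before the current one is consumed.
        pending = asyncio.create_task(request_chunk(get_observation()))
        for command in chunk:
            send_command(command)
            await asyncio.sleep(STEP_S)
    pending.cancel()

if __name__ == "__main__":
    asyncio.run(control_loop(lambda: "camera_frame", print, num_chunks=2))
```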
The Gemini Robotics paper describes the training mix as "a large and diverse dataset consisting of action-labeled robot data as well as other multimodal data." The robot-data portion includes thousands of hours of expert teleoperated demonstrations collected over twelve months on ALOHA 2 robots. The multimodal portion includes web documents, code, images, audio, video, and embodied reasoning and visual question answering data inherited from Gemini 2.0 pretraining.[16] Ablation studies in the same paper found that training a Gemini Robotics specialist model from scratch, rather than fine-tuning the generalist checkpoint, dropped success rates on evaluation tasks to 0%, which the authors interpreted as evidence that the multimodal pretraining is doing most of the heavy lifting for generalization.[16]
In the 1.5 generation, the VLA was given the ability to think in natural language before producing actions. Google DeepMind described this as helping the robot "assess and complete complex tasks more transparently," and noted that the model could explain its plan in natural language while moving.[4] On the planning side, Gemini Robotics-ER 1.5 added an adjustable thinking budget that lets developers trade response speed for reasoning depth, and explicit checks against payload limits and workspace constraints to filter physically infeasible plans.[7]
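Because Gemini Robotics-ER 1.5 is served through the Gemini API, the thinking budget is configured the same way as for other Gemini thinking models. The minimal sketch below uses the google-genai Python client; the model identifier and budget value are assumptions based on Google's preview naming and may differ in practice.

```python
# Minimal sketch: setting a thinking budget for an embodied-reasoning query.
# The model id below is an assumed preview name, not a guaranteed identifier.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",   # assumed preview model id
    contents="Plan the steps to clear the mugs from the table into the sink.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=1024,  # tokens of internal reasoning; smaller budgets respond faster
        ),
    ),
)
print(response.text)
```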
Google DeepMind has emphasized that a single Gemini Robotics model can drive multiple robot embodiments. Internal evaluations showed that tasks demonstrated only on the bi-arm ALOHA 2 platform during training also worked on Apptronik's Apollo humanoid and on the bi-arm Franka FR3, with no per-robot specialization.[1][4] In the on-device release, Google reported that the model could be adapted to the Franka FR3 and Apollo from ALOHA training data with as few as 50 to 100 demonstrations per new task.[3][17]
Gemini Robotics On-Device is a smaller VLA designed to run locally on the robot's onboard compute, without requiring a cloud connection. It is engineered for low-latency inference on bi-arm platforms, with general-purpose dexterity comparable to the cloud-served model on out-of-distribution and multi-step tasks.[3] InfoQ reported that the on-device model completed evaluation tasks successfully more than 60% of the time on average, against roughly 80% for the cloud variant, and that it outperformed other published on-device VLA baselines.[17] It was the first VLA Google DeepMind released for fine-tuning by external developers, distributed through the Gemini Robotics SDK with a MuJoCo physics simulator integration.[3][17]
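The Gemini Robotics SDK itself is gated behind a waitlist, but the simulation side of the fine-tuning workflow can be sketched with the open-source mujoco Python package: load a scene, roll out a candidate policy, and check task success. In the sketch below only the mujoco calls reflect a real public API; the scene file, policy function, and success check are hypothetical stand-ins for what a fine-tuned checkpoint and task definition would provide.

```python
# Sketch of a MuJoCo rollout for evaluating a fine-tuned policy in simulation.
import numpy as np
import mujoco

model = mujoco.MjModel.from_xml_path("bi_arm_scene.xml")  # placeholder scene description
data = mujoco.MjData(model)

def policy(observation: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a fine-tuned VLA; here it outputs zero controls."""
    return np.zeros(model.nu)

def task_succeeded(data: mujoco.MjData) -> bool:
    """Hypothetical success detector, e.g. checking an object's final pose."""
    return False

for step in range(2000):
    observation = np.concatenate([data.qpos, data.qvel])
    data.ctrl[:] = policy(observation)
    mujoco.mj_step(model, data)
    if task_succeeded(data):
        print(f"task succeeded at step {step}")
        break
```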
Google DeepMind's public materials describe three recurring properties for Gemini Robotics: generality, interactivity, and dexterity.[1][2]
| Capability | Description |
|---|---|
| Generality | Adapts to new objects, instructions, and environments. Google reported that Gemini Robotics more than doubles the score of other state-of-the-art VLA models on a comprehensive generalization benchmark.[1] |
| Interactivity | Responds to commands phrased in everyday conversational language across multiple languages, monitors the scene continuously, and can adjust mid-task when objects move.[1][2] |
| Dexterity | Performs multi-step manipulation including origami folding, packing snacks into a Ziploc bag, folding clothes, unzipping bags, and pouring salad dressing.[1][3] |
| Tool use | The 1.5 generation can call digital tools such as Google Search and other VLA models to retrieve information or execute sub-steps.[4][7] |
| Embodiment transfer | Adapts across ALOHA 2, the bi-arm Franka FR3, and humanoid platforms such as Apptronik's Apollo, often without per-robot training.[1][4] |
Gemini Robotics-ER, the reasoning model, contributes a different set of capabilities focused on perception and planning: 2D pointing and object detection grounded in size, weight, and affordance information, multi-view correspondence, success and failure detection from camera streams, and code generation that calls other tools to execute physical actions.[7][16]
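Google's developer examples return these perception outputs as structured JSON rather than free text. The sketch below requests 2D points for named objects and parses the result; the model identifier, the prompt wording, and the convention of [y, x] points normalized to 0-1000 follow Google's published preview examples but should be treated as assumptions that may change between releases.

```python
# Sketch of a pointing query against the embodied reasoning model via the Gemini API.
import json
from google import genai
from google.genai import types

client = genai.Client()

with open("workcell.jpg", "rb") as f:
    image = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

prompt = (
    "Point to the banana and the clear container. Answer as a JSON list of "
    '{"point": [y, x], "label": <name>} entries with coordinates normalized to 0-1000.'
)

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed preview model id
    contents=[image, prompt],
    config=types.GenerateContentConfig(response_mime_type="application/json"),
)

for item in json.loads(response.text):
    y, x = item["point"]
    print(f'{item["label"]}: normalized (y={y}, x={x})')
```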
The Gemini Robotics technical report decomposes generalization into three axes that are evaluated separately, a structure that reflects how earlier RT-2 work measured progress on robot foundation models.[16]
| Axis | Definition | Example |
|---|---|---|
| Visual generalization | Invariance to visual changes that do not affect the actions required to solve the task | New backgrounds, lighting changes, or distractor objects added to a familiar scene |
| Instruction generalization | Robustness to paraphrased or differently structured instructions | "Put the banana into the container" versus "Place the yellow fruit inside the clear box" |
| Action generalization | Ability to adapt or synthesize new motions for tasks the robot has not been trained on | Slam-dunking a new toy basketball, or grasping objects of an unseen shape |
Google reported that on a comprehensive generalization benchmark spanning these axes, Gemini Robotics more than doubled the average score of prior state-of-the-art VLA baselines, with the largest gaps appearing on instruction and action generalization rather than visual generalization alone.[1][16]
The demonstrations released with the original Gemini Robotics announcement included a robotic arm placing a banana into a clear container while the container was repositioned, folding glasses into a case, performing origami, and slam-dunking a small basketball into a net despite never having seen those specific objects in training.[1][10] Apollo, Apptronik's humanoid robot, was shown sorting laundry, placing colored blocks into trays, and loading bread into Ziploc bags.[15][21]
For the 1.5 generation, Google DeepMind highlighted a multi-step waste sorting task in which the robot first looked up local recycling rules using Google Search, then identified each object visually, then placed it in the correct bin: a sequence that required tool use, planning, and physical manipulation in a single mission.[4]
Google has published several headline benchmark numbers for the Gemini Robotics family. The numbers below are taken directly from Google DeepMind's announcements and the underlying technical reports.
| Model | Benchmark | Result | Source |
|---|---|---|---|
| Gemini Robotics (March 2025) | Comprehensive generalization benchmark | More than 2x average score over prior state-of-the-art VLAs | [1] |
| Gemini Robotics-ER (March 2025) | End-to-end robotic control | 2x to 3x success rate over Gemini 2.0 baseline | [1] |
| Gemini Robotics-ER 1.5 (September 2025) | 15 academic embodied reasoning benchmarks (ERQA, Point-Bench, RefSpatial, RoboSpatial-Pointing, Where2Place, BLINK, CV-Bench, EmbSpatial, MindCube, RoboSpatial-VQA, SAT, Cosmos-Reason1, Minimal Video Pairs, OpenEQA, VSI-Bench) | Highest aggregated score among models tested by Google | [4] |
| Gemini Robotics On-Device (June 2025) | Average task success across seven evaluation tasks | About 60% on-device, about 80% for cloud variant | [17] |
| Gemini Robotics-ER 1.6 (April 2026) | Instrument reading without agentic vision | 86% | [5] |
| Gemini Robotics-ER 1.6 (April 2026) | Instrument reading with agentic vision | 93% | [5] |
| Gemini Robotics models | ASIMOV semantic safety benchmark | Over 80% accuracy on hazardous-scenario questions, including bleach-and-vinegar mixing | [11][22] |
The IEEE Spectrum coverage of the April 2026 release reported that ER 1.6 lifted instrument reading accuracy from a Gemini Robotics-ER 1.5 baseline of 23% to 98% when equipped with agentic vision in Boston Dynamics' deployment, illustrating how much the agentic vision pipeline contributes on top of the base model.[19]
Gemini Robotics is intentionally embodiment-flexible, but most public demonstrations have used a small set of hardware partners.
| Partner | Robot | Role |
|---|---|---|
| Google DeepMind in-house | ALOHA 2 (bi-arm research platform) | Primary training and evaluation platform[1][16] |
| Franka Robotics | Franka FR3 (bi-arm) | Cross-embodiment evaluation; on-device adaptation target[3][17] |
| Apptronik | Apollo (humanoid) | Lead humanoid partner; demonstrations of laundry sorting, color-block placement, and packing[15][21] |
| Boston Dynamics | Spot (quadruped) | Industrial inspection, gauge reading, and autonomous navigation with Gemini Robotics-ER 1.6[6][19] |
| Agile Robots | Industrial bi-arm platforms | Trusted tester for Gemini Robotics-ER[2][8] |
| Agility Robotics | Digit (humanoid) | Trusted tester for Gemini Robotics-ER[2][8] |
| Enchanted Tools | Mirokai (mobile humanoid) | Trusted tester for Gemini Robotics-ER[2][8] |
Google DeepMind reported that the trusted tester program had grown to over sixty participants by the September 2025 update, with Apptronik named as a continuing partner during that release.[18]
Boston Dynamics integrated Gemini Robotics into the Spot SDK by exposing a small set of "tools" (navigation between locations, image capture, object identification, grasping, and placement) that Gemini Robotics could call. The integration deliberately limits the model to the existing API surface, so it cannot invent actions beyond what Spot is already permitted to do.[23] On top of this, the AIVI-Learning visual inspection product on Spot and the Orbit fleet manager incorporated Gemini Robotics-ER 1.6 to read analog gauges, thermometers, sight glasses, and digital displays during autonomous patrol.[6][19]
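The pattern described here, exposing a fixed menu of robot skills as callable tools, maps directly onto the Gemini API's function-calling support, in which plain Python functions can be passed as tools and the model is restricted to calling them. In the sketch below the wrapper functions are hypothetical stand-ins for calls into the Spot SDK; only the google-genai tool-passing pattern reflects a documented API, and the model identifier is assumed.

```python
# Sketch: constraining the model to a fixed tool surface, in the spirit of the
# Spot integration. The three wrapper functions are hypothetical placeholders.
from google import genai
from google.genai import types

def navigate_to(waypoint: str) -> str:
    """Drive the robot to a named waypoint on its recorded map."""
    return f"arrived at {waypoint}"

def capture_image(camera: str) -> str:
    """Capture an image from a named onboard camera and return a handle."""
    return f"image handle from {camera}"

def identify_object(image_handle: str) -> str:
    """Run object identification on a previously captured image."""
    return "pressure gauge"

client = genai.Client()
response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed preview model id
    contents="Go to the pump room and report what the gauge next to valve 3 shows.",
    config=types.GenerateContentConfig(
        # Python callables passed as tools enable automatic function calling;
        # the model can only invoke the functions listed here.
        tools=[navigate_to, capture_image, identify_object],
    ),
)
print(response.text)
```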
Google DeepMind has framed Gemini Robotics as a robotics program with an explicit safety layer rather than a research demo. The company says its robotics models are reviewed through its Responsibility and Safety Council and evaluated against the ASIMOV benchmark suite for semantic and physical safety constraints.[1][22] In MIT Technology Review's launch coverage, the team described a constitutional approach inspired by Isaac Asimov's laws of robotics that produces a data-driven robot constitution to align behavior with human values.[10]
The Gemini Robotics 1.5 release upgraded the safety stack in three ways: it added a high-level semantic reasoning step that lets the planner think about safety before acting, aligned the planner's output with the existing Gemini Safety Policies, and triggered low-level on-board safety subsystems for collision avoidance.[4] An updated benchmark, ASIMOV v2, added broader tail coverage, new safety question types, and video modalities, and Google reported state-of-the-art results on it for Gemini Robotics-ER 1.5.[4]
For the April 2026 ER 1.6 release, Google reported better adherence to physical constraints (such as weight limits and liquid handling), a roughly 10-percentage-point improvement on video-based safety hazard identification compared to Gemini 3.0 Flash, and stronger compliance with Gemini safety policies on adversarial prompts.[5]
Gemini Robotics sits in a small but growing category of robotics foundation models. The table below summarizes how it compares to the most widely discussed peers as of April 2026.
| Model | Developer | Released | Approach | Action representation | Notable embodiments |
|---|---|---|---|---|---|
| RT-1 | Google Research | 2022 | Transformer policy on real demonstrations | Discrete tokens | Everyday Robots mobile manipulator |
| RT-2 | Google DeepMind | 2023 | VLM (PaLI-X / PaLM-E) fine-tuned on robot data | Discrete tokens | Bi-arm research robots |
| OpenVLA | Stanford and partners | June 2024 | Open-source 7B VLA on Open X-Embodiment data | Discrete tokens | 22 embodiments via Open X-Embodiment |
| Pi-0 | Physical Intelligence | 2024 | Diffusion-based VLA with continuous joint outputs at 50 Hz | Continuous, diffusion-generated trajectories | Multiple bi-arm and humanoid platforms |
| GR00T N1 | NVIDIA | 2025 | Foundation VLA for humanoids | Continuous joint actions | Humanoid robots |
| Gemini Robotics 1.5 | Google DeepMind | September 2025 | VLA on top of Gemini 2.0 with thinking before action | Continuous joint actions | ALOHA 2, Franka FR3, Apptronik Apollo |
In architecture terms, Pi-0 emphasizes diffusion-based continuous control and a hardware-agnostic, real-data philosophy, while OpenVLA and the RT line use discrete action tokens. Gemini Robotics shares Pi-0's continuous control philosophy but inherits the much larger Gemini 2.0 base, which gives it stronger world knowledge and tool-use behavior at the expense of fully open weights.[12] OpenVLA is open source and has been shown to outperform RT-2 on a suite of manipulation tasks despite a smaller parameter count, while Gemini Robotics is closed source and accessed through partner programs and the Gemini API.[12]
Google DeepMind has staggered developer access across the family. The vision-language-action models have generally been gated to partner programs, while the embodied reasoning models have been the first to reach broader developer audiences.
| Model | Access route as of April 2026 | Notes |
|---|---|---|
| Gemini Robotics (March 2025) | Trusted tester program only | Required signup form; available to a small set of partners[2][8] |
| Gemini Robotics-ER (March 2025) | Trusted tester program only | Same partner program as the VLA[2][8] |
| Gemini Robotics On-Device (June 2025) | Waitlist, then Gemini Robotics SDK | First VLA in the family released for fine-tuning, distributed with MuJoCo integration[3][17] |
| Gemini Robotics 1.5 (September 2025) | Select partners only | Continued partner-only distribution for the action model[4][18] |
| Gemini Robotics-ER 1.5 (September 2025) | Public preview through Gemini API and Google AI Studio | First Gemini Robotics model on the public Gemini API[4][7] |
| Gemini Robotics-ER 1.6 (April 2026) | Gemini API and Google AI Studio | Sample Colab notebooks and reference integrations published with the release[5] |
Google's developer materials for Gemini 2.5 highlight related primitives that complement Gemini Robotics, including the Live API for real-time voice interaction with robots, function-calling for defining robot APIs as tools, and code-generation patterns for pick-and-place planning. These primitives are exposed through Google AI Studio, the Gemini API, and Vertex AI for any application built on Gemini 2.5, not just the dedicated robotics models.[25]
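One of those primitives, declaring a robot API as a tool schema, can be illustrated with an explicit function declaration. The pick_and_place operation below is a hypothetical robot API used only for illustration; the declaration and tool-passing calls follow the google-genai client, and a general Gemini 2.5 model name is used because these primitives are documented for any Gemini 2.5 application.

```python
# Sketch: declaring a hypothetical robot API as an explicit tool schema.
from google import genai
from google.genai import types

pick_and_place = types.FunctionDeclaration(
    name="pick_and_place",
    description="Pick up a named object and place it at a named location.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={
            "object_name": types.Schema(type=types.Type.STRING),
            "target_location": types.Schema(type=types.Type.STRING),
        },
        required=["object_name", "target_location"],
    ),
)

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Put the red block in the blue tray.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[pick_and_place])],
    ),
)

# The model answers with a structured function call rather than free text.
for call in response.function_calls or []:
    print(call.name, dict(call.args))
```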
The public deployments of Gemini Robotics fall into several broad use cases.
| Use case | Examples | Models involved |
|---|---|---|
| Industrial inspection | Reading analog gauges, thermometers, sight glasses, and digital displays during autonomous patrol with Boston Dynamics Spot | Gemini Robotics-ER 1.6[6][19] |
| Logistics and manufacturing | Apptronik Apollo trial deployments at Mercedes-Benz, GXO Logistics, and Jabil | Gemini Robotics, Gemini Robotics 1.5[15][21] |
| Household tasks | Sorting laundry, packing snacks into bags, folding garments, organizing shoes by handwritten instructions | Gemini Robotics, Gemini Robotics 1.5, Gemini Robotics On-Device[1][3][23] |
| Multi-step agentic missions | Sorting trash and recycling using local rules looked up via Google Search, then physically placing each object in the correct bin | Gemini Robotics 1.5 + Gemini Robotics-ER 1.5[4] |
| Research demonstrations | Origami folding, slam-dunking small basketballs, drawing cards, pouring salad dressing | Gemini Robotics, Gemini Robotics On-Device[1][3][10] |
The combination of Gemini Robotics-ER and a vision-language-action model has been pitched as a generic agent loop for embodied tasks: the planner inspects the scene, decomposes the goal into sub-tasks, calls a VLA or external API for each step, and uses success detection to decide whether to retry or move on.[4][7]
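That loop can be written down schematically. Everything in the sketch below is a hypothetical stand-in: plan() for the embodied reasoning planner, execute() for a VLA or external API, and succeeded() for camera-based success detection; none of it corresponds to a published SDK interface.

```python
# Schematic of the planner-executor loop described above.
from typing import Callable, List

def run_mission(
    goal: str,
    plan: Callable[[str, str], List[str]],      # ER-style planner: goal + scene -> sub-tasks
    execute: Callable[[str], None],             # VLA or external API executing one sub-task
    succeeded: Callable[[str], bool],           # success detection from camera streams
    max_retries: int = 2,
) -> bool:
    scene = "initial camera observation"        # placeholder for real perception input
    for step in plan(goal, scene):              # decompose the goal into sub-tasks
        for attempt in range(max_retries + 1):
            execute(step)                       # carry out the sub-task
            if succeeded(step):
                break                           # move on to the next sub-task
            if attempt == max_retries:
                return False                    # give up after repeated failures
    return True
```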
Reception of the launch combined enthusiasm about scope with skepticism about real-world readiness. Stanford bioengineer Jan Liphardt told MIT Technology Review that an intermediate layer of physical intelligence was the missing piece between cognition, large language models, and decision-making, and that Gemini Robotics was a credible attempt to fill that gap.[10] IEEE Spectrum noted that the dexterity demonstrations relied on task-specific, high-quality training data rather than fully general skills, and that the embodied reasoning model's reliance on human-centric training data could produce suboptimal grasps for some robotic end effectors.[11]
MIT Technology Review's hands-on commentary observed that the demonstrations remained "quite slow and a little janky," while crediting the underlying generalization with a clear step up from prior systems.[10] Axios characterized the launch as the moment the humanoid robot industry started to converge with frontier AI labs, with Google DeepMind, Apptronik, Boston Dynamics, and Agility Robotics all named in the same announcement.[24]
Later coverage of Gemini Robotics-ER 1.6 was more pointed about practical impact. IEEE Spectrum and The Robot Report both highlighted the deployment on Boston Dynamics' Spot for industrial inspection as the moment when Gemini Robotics moved from research lab to revenue-generating field robot.[19]
Google DeepMind has acknowledged several limitations of the Gemini Robotics family in its own materials and in interviews. The cloud-served VLA depends on a network connection, which constrains deployments where bandwidth or latency is unreliable; the on-device model addresses this but trades some capability for size, with average task success closer to 60% than to the cloud variant's roughly 80% in Google's own evaluations.[3][17] Generalization gains depend heavily on the training data mix, and dexterous tasks such as origami folding require careful per-task data curation rather than emerging fully zero-shot.[11]
The reasoning model can hallucinate spatial properties or affordances, and the September 2025 release added explicit physical-feasibility checks to filter out plans the planner generated but the robot could not safely execute.[7] Demonstrations remain comparatively slow: motions are deliberate rather than human-speed, and several reviewers have noted that the dexterity reels published by Google DeepMind are edited highlights from longer takes.[10][11]
Finally, Gemini Robotics models are largely closed: weights are not published, and access is gated through partner programs, the Gemini Robotics SDK trusted-tester waitlist, and the Gemini API for the embodied reasoning models. This contrasts with peers such as OpenVLA, whose weights and training data are public, and limits independent reproduction of Google's benchmark numbers.[12]