Project Astra
Last reviewed
Jun 3, 2026
Sources
10 citations
Review status
Source-backed
Revision
v1 · 1,515 words
Project Astra is a research prototype from Google DeepMind exploring what a universal AI assistant might look like: a single agent that can see and hear the world in real time, hold a natural spoken conversation, remember what it has just encountered, and take actions on a user's behalf. It was first shown at Google I/O in May 2024 and has since served as a testbed whose capabilities feed into shipping Google products, rather than as a standalone app.[1][2]
By early 2024, Google had consolidated much of its AI research under Google DeepMind and was racing to turn its Gemini family of multimodal models into consumer products. The company framed Project Astra as the next step in that effort: an attempt to build agents that perceive and reason about their surroundings continuously, the way a person does, rather than answering isolated text prompts. DeepMind chief executive Demis Hassabis described it as "a new initiative within DeepMind to create AI-powered apps and agents for real-time, multimodal understanding," and put the long-term ambition plainly: "Imagine agents that can see and hear what we do, better understand the context we're in and respond quickly in conversation."[1]
The timing was pointed. OpenAI had unveiled its multimodal GPT-4o model a day before Google I/O, demonstrating a fast, spoken, camera-aware assistant of its own, so Project Astra arrived as a direct answer to that announcement.[3]
Hassabis introduced Project Astra during the Google I/O keynote on May 14, 2024.[1][3] The reveal centered on a single continuous video clip, which Google said was recorded in one take. In it, a person walks around an office holding up a phone with the camera running and talks to the assistant without pause.[3][4]
Across the demo, Astra identifies objects the camera passes over and reasons about them on the fly. Asked what part of a setup makes sound, it picks out a speaker; when the user circles the top of the speaker and asks what that component is, it answers that the part is a tweeter. It reads and explains a snippet of code shown on a monitor, identifies a neighborhood from the view out a window, and composes a band name for a toy. The most-quoted moment comes when the user asks where she left her glasses: Astra recalls having seen them earlier on a desk, even though they are no longer in view, which illustrates a short rolling visual memory of what the camera has recently captured. The clip then continues through a pair of prototype smart glasses, suggesting how the same assistant might work in a wearable form factor.[3][4]
Two themes ran through the presentation. The first was low latency. DeepMind said it had built prototype agents that process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching that information for efficient recall, so the assistant could respond in conversation without an awkward delay.[1] The second was that this was research, not a product launch. Google labeled Astra a prototype and said some of its capabilities would begin reaching Gemini products later in the year.[1][2]
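The three steps DeepMind describes (encoding input continuously, merging video and speech into one timeline, and caching it for recall) can be illustrated with a toy sketch. Google has not published Astra's implementation, so everything here is an assumption made for illustration: the `Event` and `RollingTimeline` names, the 60-second retention window, and the short text summaries that stand in for encoded frames and speech transcripts.

```python
from bisect import insort
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class Event:
    """One timeline entry (hypothetical structure, not Astra's actual format)."""
    timestamp: float                      # seconds since the session started
    modality: str = field(compare=False)  # "video" or "speech"
    summary: str = field(compare=False)   # stand-in for an encoded frame or transcript


class RollingTimeline:
    """Merges video and speech events into one time-ordered, bounded cache."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.events: list[Event] = []

    def add(self, event: Event) -> None:
        # Insert in timestamp order so both modalities share a single timeline.
        insort(self.events, event)
        # Evict anything older than the retention window, measured from the newest event.
        cutoff = self.events[-1].timestamp - self.window_seconds
        while self.events and self.events[0].timestamp < cutoff:
            self.events.pop(0)

    def recall(self, keyword: str) -> Optional[Event]:
        # Scan newest-first and return the latest event that mentions the keyword.
        for event in reversed(self.events):
            if keyword.lower() in event.summary.lower():
                return event
        return None


timeline = RollingTimeline(window_seconds=60.0)  # assumed window, chosen for the demo
timeline.add(Event(2.0, "video", "glasses on desk next to a red apple"))
timeline.add(Event(5.0, "speech", "user asks what makes sound"))
timeline.add(Event(30.0, "video", "window view of a street"))

hit = timeline.recall("glasses")
print(hit.timestamp, hit.modality)  # 2.0 video
```

The bounded window is what makes the memory "rolling": once an event falls outside it, a later `recall` returns `None` for that item. A real system would presumably match on learned embeddings rather than keywords, but the eviction and lookup pattern is the same.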
Project Astra is built around a few interlocking ideas that distinguish it from a conventional chatbot.
| Capability | What it does |
|---|---|
| Live multimodal input | Continuously processes streaming camera video and audio together rather than single images or one-off prompts.[1] |
| Low-latency speech | Responds in spoken conversation at roughly the cadence of human dialogue, with the ability to be interrupted.[1][5] |
| Memory | Maintains a rolling memory of recent events and, in later versions, longer in-session recall and some memory across past conversations.[4][5] |
| Tool use | Calls on services such as Google Search, Lens and Maps to ground its answers.[5] |
| Agentic action | In its 2025 form, can carry out multi-step tasks, navigate interfaces and control a device on the user's behalf.[6][7] |
The underlying intelligence comes from Google's Gemini models. At the unveiling, DeepMind principal scientist Oriol Vinyals described Astra as "a real-time voice interface" with "extremely powerful multimodal capabilities combined with long context," drawing on Gemini 1.5 Pro's large context window.[1] In December 2024, Astra was rebuilt on Gemini 2.0 Flash, part of the Gemini 2.0 model family that Google positioned for what it called the agentic era.[5]
Project Astra was conceived less as something users would download than as a research engine whose features migrate into Google's existing apps once they are ready. Google has repeatedly described its consumer features as having been "first explored" in Astra.[2][6]
The clearest example is Gemini Live, the conversational voice mode in the Gemini app. Google announced Gemini Live at its hardware event on August 13, 2024, as a low-latency, free-flowing voice interface, initially for paying subscribers.[8] The real-time camera and screen-sharing input that Astra demonstrated then made its way into Gemini Live. Those visual features began rolling out to subscribers on Android around March 2025 and were made free for Android users in April 2025. At I/O 2025 they were extended free to both Android and iOS, with the iPhone rollout starting May 20.[9][10] Google described each of these visual capabilities as powered by Project Astra.[9]
Astra also fed the broader Gemini 2.0 push into agentic AI. In the December 11, 2024 Gemini 2.0 announcement, Google detailed an upgraded Astra prototype and unveiled a sibling research project, Project Mariner, an agent that operates a web browser to complete tasks.[5] The December update gave Astra up to 10 minutes of in-session memory plus recall of some earlier conversations, the ability to converse in multiple and mixed languages with better handling of accents, native tool use across Search, Lens and Maps, and lower-latency streaming audio.[5] Google also began testing Astra on prototype glasses with a small group of trusted testers, alongside the Android phone testers already trying it.[5]
At Google I/O on May 20, 2025, Google folded Project Astra into a larger pitch: turning the Gemini app into a universal AI assistant. Hassabis again set the goal as building "a universal AI assistant that will perform everyday tasks for us," and the company tied Astra's research to its work extending Gemini toward a "world model" that can plan and reason about the physical environment.[6][2]
The 2025 updates concentrated on three areas. First, voice output was upgraded to more natural-sounding native audio. Second, memory was extended well beyond the early prototype, which both Google and the press had noted retained context only briefly, so that the assistant could carry details across a longer interaction; in one demo it recalled a pet's name mentioned earlier in order to recommend a suitable bike basket. Third, Astra gained computer control, letting it take actions, navigate apps and interfaces, look things up, and even place phone calls to businesses to complete a multi-step request such as fixing a bike.[6][7]
Google said these capabilities would arrive across several surfaces over time rather than as one product: enhanced versions of Gemini Live, new experiences in Search such as a live-camera "Search Live" feature in AI Mode, an updated Live API for developers with audio-visual input and native audio output, and new form factors including Android XR glasses being developed with partners.[6][7] The DeepMind site lists a Trusted Tester program, including recruitment of blind and low-vision users, as part of how it continues to develop the prototype.[2]
Project Astra was widely judged the standout of Google I/O 2024. Coverage cast it as Google's credible response to OpenAI's GPT-4o and praised the smooth, low-latency video demo, even as reporters cautioned that a polished one-take clip is not the same as a shipping product.[3][4] Commentary in 2024 and 2025 tracked the same tension: enthusiasm that Astra pointed toward a genuinely new kind of always-on, see-and-hear assistant, set against skepticism about how much of the demo would translate into reliable everyday use and on what timeline.[3][6] By I/O 2025, observers noted that the once-speculative prototype was steadily materializing inside real Google products, particularly Gemini Live, even though its most ambitious pieces (agentic control and the smart-glasses hardware) remained early and largely confined to testers.[6][9]