World Labs
Last reviewed
May 16, 2026
Sources
15 citations
Review status
Source-backed
Revision
v1 · 3,407 words
World Labs is an American artificial intelligence company headquartered in San Francisco, California. Founded in early 2024 by computer scientist Fei-Fei Li, University of Michigan professor Justin Johnson, former Meta Reality Labs researcher Christoph Lassner, and former Google researcher Ben Mildenhall, the company develops what it calls "Large World Models" (LWMs): generative artificial intelligence systems that can perceive, generate, reason about and interact with three-dimensional environments. World Labs positions itself at the center of an emerging field known as spatial intelligence, a term Li has popularized to describe the spatial counterpart of language intelligence.
World Labs emerged from stealth in September 2024 with a $230 million Series A round at a valuation of slightly over one billion dollars, making it one of the fastest companies in history to reach unicorn status. In November 2025 the company released its first commercial product, Marble, a multimodal world model that generates persistent, downloadable three-dimensional environments from text, images, video or coarse 3D layouts. A month earlier, in October 2025, it had previewed RTFM (Real-Time Frame Model), a research system that streams interactive 3D scenes on a single GPU. In February 2026 the company closed a follow-on funding round of more than one billion dollars, led by a consortium that included AMD, Autodesk, NVIDIA, Fidelity Management and Research Company, and the Singapore-based conglomerate Sea.
The company is often grouped with Google DeepMind (Genie 3), Meta (V-JEPA 2) and NVIDIA (Cosmos) as one of the four principal players in the so-called world-model race.
World Labs was incorporated in January 2024 in Palo Alto, California. The company grew out of roughly a year of conversations between Fei-Fei Li and her former Stanford PhD student Justin Johnson, both of whom had become convinced that the next frontier of artificial intelligence would be the modeling of physical space rather than language. Li, a co-director of Stanford's Institute for Human-Centered AI (HAI), framed the project as the construction of systems that move "from seeing to doing, from perceiving to reasoning, and from imagining to creating."
The four co-founders combined distinct lineages within modern computer vision and graphics. Fei-Fei Li had previously built ImageNet, the labeled dataset that catalyzed the deep-learning revolution of the early 2010s. Justin Johnson was first author of the influential paper "Perceptual Losses for Real-Time Style Transfer and Super-Resolution" and led work on visual question answering. Ben Mildenhall was the lead author of the 2020 paper that introduced Neural Radiance Fields (NeRF), the volumetric scene representation that launched the modern line of neural-rendering research. Christoph Lassner had previously worked at Meta Reality Labs Research, where he co-developed Pulsar, a sphere-based differentiable renderer used in some of the earliest commercial avatars for VR headsets.
The company's founding investors included Andreessen Horowitz, Radical Ventures, NEA, Adobe Ventures and NVentures (NVIDIA's venture arm). Marc Benioff, Geoff Hinton, Eric Schmidt and Jeff Dean participated as individual angels. Within nine months of incorporation the company had hired roughly thirty researchers, many of them from Google, Meta Reality Labs and Stanford.
| Name | Role | Prior affiliation | Notable prior work |
|---|---|---|---|
| Fei-Fei Li | Co-founder, CEO | Stanford University (HAI), Google Cloud | ImageNet, CS231n, AI4ALL (co-founder) |
| Justin Johnson | Co-founder | University of Michigan | Perceptual losses, visual question answering, CLEVR dataset |
| Ben Mildenhall | Co-founder | Google Research | Lead author, NeRF (2020); RawNeRF; MipNeRF |
| Christoph Lassner | Co-founder | Meta Reality Labs Research | Pulsar renderer; photorealistic avatars |
World Labs has raised capital across two publicly disclosed rounds, both at valuations above one billion dollars. The September 2024 Series A was widely covered as one of the largest seed-to-Series-A jumps for a vision-focused start-up, and the February 2026 round drew unusual attention because it brought together two semiconductor rivals, AMD and NVIDIA, as co-investors in the same private company.
| Date | Round | Amount | Valuation | Selected investors |
|---|---|---|---|---|
| September 13, 2024 | Series A | $230 million | $1.0+ billion | Andreessen Horowitz, Radical Ventures, NEA, NVentures, Adobe Ventures, Marc Benioff |
| February 19, 2026 | Series B | $1.0+ billion | Approximately $4 billion (reported) | AMD, NVIDIA, Autodesk, Fidelity, Sea Ltd., Emerson Collective |
The February 2026 round included a $200 million commitment from Autodesk, the largest single investment by Autodesk in any external start-up at that point. Under the terms of the agreement, Autodesk became both a strategic investor and an industrial adviser, with the explicit goal of integrating Marble's outputs into the company's architecture, engineering and construction (AEC) software pipelines. The Singapore-based conglomerate Sea, parent of Garena and Shopee, joined for its gaming and entertainment exposure. Both AMD and NVIDIA, despite being competitors in the GPU market, took equity stakes, reflecting how attractive spatial-intelligence research has become for makers of accelerated hardware.
World Labs has been one of the most aggressive popularizers of the phrase "spatial intelligence," which Fei-Fei Li brought to wide attention in a TED talk in April 2024 and elaborated in a Time magazine essay later that year. The company defines spatial intelligence as the ability of an AI system to construct an internal, three-dimensional, physically coherent representation of the world, to reason about it geometrically, and to act upon it. Large language models process text as a one-dimensional sequence of tokens; World Labs argues that any agent operating in the real world (a robot, a self-driving car, a mixed-reality headset, a video-game character) instead requires a model that natively understands volume, occlusion, parallax, lighting and motion.
The company refers to its in-house systems as Large World Models (LWMs), a deliberate echo of the phrase Large Language Model. An LWM, in the company's framing, is a foundation model whose primary modality is 3D space rather than tokens. LWMs are evaluated on three properties:
These three properties are reflected in the design of the company's two flagship systems, Marble and RTFM.
World Labs has released two distinct generative systems with overlapping but separate intended uses. Marble is a persistent-world generator aimed at content creation, and RTFM is a real-time interactive renderer aimed at exploration and embodied agents.
| Product | First release | Modality | Persistence | Primary use cases | Access |
|---|---|---|---|---|---|
| Marble | November 12, 2025 | Text, image, video, 3D layout to persistent 3D | Persistent, downloadable | Game and film prototyping, VR, robotic sim, AEC | Free tier and paid subscription |
| RTFM | October 16, 2025 (research preview) | One or more images to streamed video frames | Persistent via spatial memory | Real-time exploration of generated and real-world scenes | Public web demo |
Marble, launched on November 12, 2025 after a two-month closed beta, is World Labs' first commercial product and the first generative world model to be sold to the public as a finished software service rather than a research preview. Users can provide the system with a text prompt, a single photo, a small set of photos, a video, a panoramic image, or a coarse 3D layout, and Marble produces a complete, spatially consistent 3D environment that can be navigated in a browser, exported to game engines, or viewed in VR.
A distinctive feature of Marble compared with earlier world models is that its outputs are not generated frame-by-frame at runtime but are baked into persistent assets. Once generated, a Marble world can be downloaded as a Gaussian splat (the native representation), as a triangle mesh (both a low-fidelity collider mesh and a higher-quality variant) or as a high-resolution video with pixel-accurate camera control. The mesh export, in particular, allows Marble worlds to be loaded directly into Unreal Engine, Unity, Blender or NVIDIA Isaac Sim. At launch, Marble worlds could also be viewed in stereoscopic 3D on the Apple Vision Pro and Meta Quest 3 headsets.
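For readers unfamiliar with the native format, the sketch below illustrates the generic idea behind Gaussian-splat rendering as described in the 3D Gaussian Splatting literature, not Marble's actual file format or renderer. Each splat carries a center, an anisotropic covariance, an opacity and a color, and a pixel's color is obtained by alpha-compositing the splats that cover it from front to back. It assumes the splats have already been projected to 2D screen space (so each has a 2×2 covariance and a view depth); `Splat2D`, `gaussian_weight` and `composite_pixel` are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass

# Illustrative sketch of generic Gaussian-splat compositing; names are hypothetical,
# not a World Labs API or file format.

@dataclass
class Splat2D:
    """A Gaussian splat already projected to screen space."""
    mean: tuple[float, float]          # pixel-space center
    cov: tuple[float, float, float]    # 2x2 covariance [[a, b], [b, c]] stored as (a, b, c)
    opacity: float                     # peak opacity in [0, 1]
    color: tuple[float, float, float]  # RGB in [0, 1]
    depth: float                       # view-space depth, used for sorting

def gaussian_weight(s: Splat2D, px: float, py: float) -> float:
    """Evaluate exp(-0.5 * d^T Sigma^-1 d) for the offset d from the splat center."""
    a, b, c = s.cov
    det = a * c - b * b
    if det <= 0:
        return 0.0  # degenerate covariance contributes nothing
    dx, dy = px - s.mean[0], py - s.mean[1]
    # Inverse of [[a, b], [b, c]] is [[c, -b], [-b, a]] / det.
    q = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q)

def composite_pixel(splats: list[Splat2D], px: float, py: float) -> tuple[float, float, float]:
    """Front-to-back alpha compositing of the splats covering one pixel."""
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for s in sorted(splats, key=lambda s: s.depth):  # nearest splat first
        # Clamp alpha below 1 so transmittance never collapses to exactly zero.
        alpha = min(0.99, s.opacity * gaussian_weight(s, px, py))
        for i in range(3):
            rgb[i] += transmittance * alpha * s.color[i]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # stop once the pixel is effectively opaque
            break
    return (rgb[0], rgb[1], rgb[2])
```

Splats can be passed in any order because `composite_pixel` sorts them by depth; a production renderer would instead tile the screen and sort splats on the GPU, and would derive each 2D covariance by projecting a 3D covariance built from a per-splat scale and rotation.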
Marble ships with two editing systems. The first is a prompt-based AI editor, which allows users to modify the world through natural-language instructions ("remove the lamp," "make the room snowy," "add a doorway here"). The second is Chisel, an experimental hybrid 3D editor in which the user lays out coarse spatial structure using primitive boxes, planes and walls, and then writes a text prompt that determines the scene's visual style. Chisel decouples structure from style and is the feature World Labs most often cites as evidence that Marble is not merely a text-to-3D generator but a controllable spatial authoring tool.
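The separation of structure and style that Chisel exposes can be pictured as a scene description with two independent parts: a list of coarse primitives and a free-text style prompt. World Labs has not published Chisel's internal format, so the schema below is purely hypothetical; `Primitive`, `ChiselLayout`, `restyle` and the JSON field names are illustrative only. It shows why restyling need not disturb geometry: swapping the prompt reuses the same primitives.

```python
import json
from dataclasses import dataclass, asdict, replace

# Hypothetical schema for illustration only; not Chisel's actual data format.

@dataclass(frozen=True)
class Primitive:
    """One coarse structural element."""
    kind: str                            # e.g. "box", "plane" or "wall" (assumed vocabulary)
    center: tuple[float, float, float]   # meters, y-up world frame (assumed)
    size: tuple[float, float, float]     # extents along x, y, z in meters

@dataclass(frozen=True)
class ChiselLayout:
    """Structure and style kept as independent fields."""
    primitives: tuple[Primitive, ...]    # structure: coarse geometry only
    style_prompt: str                    # style: free-text description

    def restyle(self, prompt: str) -> "ChiselLayout":
        """Return a copy with a new style prompt; the geometry is reused unchanged."""
        return replace(self, style_prompt=prompt)

    def to_json(self) -> str:
        """Serialize the layout, e.g. to hand it to a (hypothetical) generator."""
        return json.dumps(asdict(self), indent=2)

room = ChiselLayout(
    primitives=(
        Primitive("plane", (0.0, 0.0, 0.0), (6.0, 0.0, 4.0)),  # floor
        Primitive("wall", (0.0, 1.5, -2.0), (6.0, 3.0, 0.1)),  # back wall
        Primitive("box", (1.0, 0.4, 0.0), (1.2, 0.8, 0.6)),    # table-sized block
    ),
    style_prompt="a sunlit Scandinavian kitchen",
)
snowy = room.restyle("an abandoned cabin buried in snow")
```

Making both classes frozen means a restyled layout can safely share the original's primitive tuple, which mirrors the idea that style edits leave the blocked-out structure untouched.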
Marble launched with a free tier that allows a limited number of world generations per month at reduced resolution, and paid tiers for individuals, studios and enterprises. In an update branded Marble 1.1, released in early 2026, the company added auto-expansion, a feature that lets a generated world grow outward beyond its original bounds when the user attempts to navigate past its edges. A composition mode, also added in 2026, allows multiple generated worlds to be stitched together into a single contiguous environment.
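Auto-expansion implies some test for when the viewer reaches the edge of a generated region, and composition implies placing worlds side by side in a shared coordinate frame. World Labs has not described either mechanism; the sketch below assumes, purely for illustration, that each world has an axis-aligned footprint on the ground plane, that a new tile is requested in the direction the camera is heading once it comes within a margin of the boundary, and that the new tile is placed so it abuts the existing one. `WorldBounds`, `expansion_direction`, `place_adjacent` and the 1-meter default margin are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical illustration of edge-triggered expansion; not World Labs' implementation.

@dataclass(frozen=True)
class WorldBounds:
    """Axis-aligned footprint of a generated world on the ground plane (meters)."""
    min_x: float
    min_z: float
    max_x: float
    max_z: float

def expansion_direction(b: WorldBounds, cam_x: float, cam_z: float,
                        margin: float = 1.0) -> tuple[int, int] | None:
    """Return (dx, dz) with components in {-1, 0, 1} pointing toward the edge the
    camera is approaching, or None while it is comfortably inside the world."""
    dx = -1 if cam_x < b.min_x + margin else (1 if cam_x > b.max_x - margin else 0)
    dz = -1 if cam_z < b.min_z + margin else (1 if cam_z > b.max_z - margin else 0)
    return None if (dx, dz) == (0, 0) else (dx, dz)

def place_adjacent(existing: WorldBounds, new_w: float, new_d: float,
                   direction: tuple[int, int]) -> WorldBounds:
    """Bounds for a new tile of size new_w x new_d that abuts `existing`
    on the given side, expressed in the same shared frame."""
    dx, dz = direction
    if dx == 1:
        min_x = existing.max_x
    elif dx == -1:
        min_x = existing.min_x - new_w
    else:
        min_x = existing.min_x
    if dz == 1:
        min_z = existing.max_z
    elif dz == -1:
        min_z = existing.min_z - new_d
    else:
        min_z = existing.min_z
    return WorldBounds(min_x, min_z, min_x + new_w, min_z + new_d)
```

A real system would also have to blend the seam between tiles so that geometry and lighting stay continuous, which is where a generative model, rather than simple placement, does the work.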
RTFM (Real-Time Frame Model), unveiled in October 2025 while Marble was still in closed beta, is World Labs' first publicly demonstrated real-time generative world model. Unlike Marble, which bakes a world ahead of time, RTFM generates each video frame on the fly in response to the user's viewpoint, running at interactive frame rates on a single NVIDIA H100 GPU. The model takes one or more input images, optionally with associated camera poses, and produces a stream of new frames that depict the scene from new viewpoints.
The defining technical contribution of RTFM is what World Labs calls spatial memory. Rather than maintaining a temporal buffer of past frames as a recurrent video model would, RTFM associates each generated frame with a pose in three-dimensional space. When the user requests a new viewpoint, the model retrieves nearby frames from its spatial memory and uses them as conditioning context. This design, which the company describes as "context juggling," allows the model to produce scenes that remain three-dimensionally consistent over arbitrarily long exploration sessions, without ever constructing an explicit mesh or radiance field. RTFM's training corpus is a large collection of posed video, and the model has empirically learned to reproduce phenomena such as specular reflections, glossiness and dynamic shadows that traditional explicit 3D representations struggle to capture.
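The retrieval step described above can be sketched as a pose-indexed frame store. The sketch assumes, for illustration only, that each generated frame is stored with its camera position and unit viewing direction, and that "nearby" means positional distance plus a weighted angular difference between viewing directions. World Labs has not published RTFM's retrieval rule; `PosedFrame`, `SpatialMemory`, `retrieve` and the `angle_weight` parameter are hypothetical. In the real system, the retrieved frames would be fed to the generator as conditioning context.

```python
import math
from dataclasses import dataclass, field

# Illustrative sketch of pose-indexed retrieval; names and scoring are hypothetical.

Vec3 = tuple[float, float, float]

@dataclass
class PosedFrame:
    frame_id: int
    position: Vec3  # camera center in world space
    forward: Vec3   # unit viewing direction

@dataclass
class SpatialMemory:
    """Frames indexed by camera pose rather than by time."""
    frames: list[PosedFrame] = field(default_factory=list)
    angle_weight: float = 2.0  # assumed penalty, in meters per radian of view difference

    def add(self, frame: PosedFrame) -> None:
        self.frames.append(frame)

    def _score(self, f: PosedFrame, pos: Vec3, fwd: Vec3) -> float:
        dist = math.dist(f.position, pos)
        cos = sum(a * b for a, b in zip(f.forward, fwd))
        angle = math.acos(max(-1.0, min(1.0, cos)))  # clamp guards against rounding
        return dist + self.angle_weight * angle

    def retrieve(self, pos: Vec3, fwd: Vec3, k: int = 4) -> list[PosedFrame]:
        """Return the k stored frames whose poses best match the requested view."""
        return sorted(self.frames, key=lambda f: self._score(f, pos, fwd))[:k]

mem = SpatialMemory()
# Walk 20 m down a corridor along +x, storing one frame per meter.
for i in range(20):
    mem.add(PosedFrame(i, (float(i), 1.6, 0.0), (1.0, 0.0, 0.0)))
# Return to the start: the oldest frames are retrieved, not the most recent ones.
context = mem.retrieve((0.4, 1.6, 0.0), (1.0, 0.0, 0.0), k=2)
print([f.frame_id for f in context])  # -> [0, 1]
```

Because lookup depends only on the requested pose, revisiting a location costs the same however long ago it was seen; a larger system would replace the linear scan with a spatial index such as a k-d tree.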
RTFM was released as a public web demo at rtfm.worldlabs.ai. As of May 2026, RTFM remains a research preview and is not part of Marble's paid tiers, although World Labs has indicated that real-time generation will eventually be integrated with Marble for embodied applications.
The company has positioned its world models for several markets where 3D content is currently expensive or slow to author.
Game studios have been among Marble's most visible early adopters. A Marble world can serve as an inspirational "greybox" for level designers, as a fully textured background environment, or, with the mesh export and Chisel, as a playable area. The platform's ability to round-trip into Unreal Engine and Unity, combined with its native VR support, has made it attractive to small and mid-sized studios that lack large environment-art teams. Sea, which invested in World Labs in 2026, has been particularly vocal about its plans to use Marble for the rapid creation of game backdrops within its Garena studios.
World Labs has partnered with HTC VIVE to bring Marble-generated environments into virtual-production stages, where they can be displayed on LED walls behind live actors. Because Marble worlds are persistent and geometrically consistent, they can be filmed from multiple camera angles within a single shoot without regenerating the environment, an advantage over earlier text-to-video models, which produced individual views without underlying geometry.
In November 2025 NVIDIA published a technical blog and tutorial demonstrating that Marble's exported meshes could be ingested directly into Isaac Sim, the company's robotics simulator, where they could be used to train manipulation and navigation policies. Because each Marble world is a self-consistent 3D asset, policies trained inside it can, in principle, be carried over to physical robots with standard sim-to-real techniques. World Labs has subsequently released a dedicated robotics case study describing how its environments can scale the supply of training scenes for general-purpose robotic foundation models.
Autodesk's $200 million investment in February 2026 was explicitly oriented toward integrating Marble into AEC pipelines. The two companies have begun demonstrations in which a single sketch or photograph of a real building is converted into a navigable 3D model that can be loaded into Revit, Maya or Forma. World Labs and Autodesk are jointly developing extensions that would allow generated 3D models to be tagged with BIM (Building Information Modeling) metadata.
Because every Marble output can be rendered as Gaussian splats in stereoscopic 3D, World Labs has marketed Marble heavily to VR content creators. Worlds can be viewed natively on the Apple Vision Pro and Meta Quest 3, and a partnership with HTC VIVE supports the VIVE XR Elite. The company has described VR as the most natural display surface for spatial intelligence and as a category that benefits disproportionately from generative environment creation, since hand-authored VR scenes are expensive to produce.
During 2025 and 2026 the phrase "world model" became one of the central rallying points of frontier artificial intelligence research, attracting investment from many of the largest technology companies. World Labs is most often compared with three other groups (Google DeepMind, Meta AI and NVIDIA), and sometimes with domain-specific efforts such as Wayve's GAIA-2; each pursues a different architecture and a different commercial intent.
| Lab | Flagship system | Primary modality | Persistence | Real-time? | Intended use |
|---|---|---|---|---|---|
| World Labs | Marble, RTFM | 3D scenes from text, image, video, layout | Persistent (Marble); spatial memory (RTFM) | RTFM is real-time | Content creation, VR, robotic sim, AEC |
| Google DeepMind | Genie 3 | Interactive video from text | Several minutes of memory | Yes, 24 frames per second | Generalist research, agent training |
| Meta AI | V-JEPA 2 | Video, predicted in a learned embedding space | Latent only (no pixel output) | Yes (planning) | Robot policy learning, action anticipation |
| NVIDIA | Cosmos (Predict, Transfer, Reason) | Video, multi-modal | Up to 30 seconds | Yes | Industrial simulation, AV, robotics |
| Wayve | GAIA-2 | Driving video | Several seconds | Yes | Autonomous-vehicle training |
World Labs occupies a distinctive position in this landscape because its flagship product, Marble, is the only one that produces a downloadable, asset-style output rather than a stream of pixels. Where Genie 3, NVIDIA's Cosmos Predict and Wayve's GAIA-2 all generate video at runtime, Marble bakes a world that can be used by other tools. The company has argued that this architectural choice is what gives Marble its commercial traction, because game engines, robotic simulators and AEC software are all built around discrete assets rather than generated video.
RTFM, by contrast, is World Labs' answer to the real-time, on-the-fly world models. Genie 3 generates navigable video at roughly 24 frames per second, conditioning each new frame on previously generated frames; RTFM replaces that temporal context with an explicit spatial memory, a choice that the company believes will scale better to large environments. Compared with V-JEPA 2, RTFM is pixel-level and is designed for human consumption rather than for latent action prediction by robots. Compared with Cosmos, RTFM is more compact, running on a single H100, although neither system produces an explicit 3D representation.
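The scaling argument can be made concrete with a toy comparison. A fixed-length temporal context keeps only the most recent frames, so after a long walk the frames showing the starting point have been evicted; a pose-indexed memory can still return them. The sketch models a walk as one frame per meter along a line and uses a window of 8 frames; both the window size and the one-dimensional representation are arbitrary assumptions for illustration, not parameters of Genie 3 or RTFM, and the function names are hypothetical.

```python
from collections import deque

# Toy contrast between a temporal window and pose-indexed lookup; illustrative only.

def temporal_context(positions: list[float], window: int) -> list[int]:
    """Frame ids visible to a model that conditions only on its last `window` frames."""
    buffer = deque(maxlen=window)  # oldest ids are evicted automatically
    for frame_id, _ in enumerate(positions):
        buffer.append(frame_id)
    return list(buffer)

def spatial_context(positions: list[float], query_x: float, k: int) -> list[int]:
    """Frame ids whose recorded positions lie closest to the queried position."""
    ids = sorted(range(len(positions)), key=lambda i: abs(positions[i] - query_x))
    return sorted(ids[:k])

# Walk 100 m forward (one frame per meter), then ask about the starting point.
walk = [float(x) for x in range(100)]
recent = temporal_context(walk, window=8)
nearby = spatial_context(walk, query_x=0.0, k=8)
print(recent)  # frames 92 through 99: nothing from the start survives
print(nearby)  # frames 0 through 7: the start of the walk is still retrievable
```

Once the walk exceeds the window, the temporal buffer has no record of the start, whereas the spatial lookup's answer depends only on where the viewer is. This is an intuition for the company's scaling claim, not a measurement of either system.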
World Labs publishes a steady stream of technical writing on its company blog and has released two notable pieces of developer tooling. Spark is an open-source renderer for Gaussian splats, released in 2025 to make Marble exports easy to display on the web. Spark is built on the three.js WebGL library, is designed to be embedded in web applications, and supports a streaming variant for incrementally loaded worlds. The company has also released a developer API for Marble, which exposes the generation and editing pipeline programmatically and allows third-party integrations such as the NVIDIA Isaac Sim plug-in.
The research arm of the company has emphasized scaling laws for spatial models, the construction of large-scale posed-video datasets, and the integration of explicit and implicit 3D representations. Christoph Lassner has led much of the renderer work, drawing on his Pulsar background. Ben Mildenhall has continued to work on neural representations, building on his NeRF and MipNeRF work. Justin Johnson has led work on multimodal control. Fei-Fei Li has set strategic direction, in particular pushing the company toward what she has publicly called "the AlexNet moment for spatial intelligence," a moment she expects within the next two to four years.
World Labs' two successive jumps in valuation (past one billion dollars in 2024 and to a reported four billion dollars in 2026) have made it one of the most discussed AI start-ups outside the language-model space. Coverage in TechCrunch, Fast Company, the Financial Times and the New York Times has tended to emphasize three themes: the perceived shift of investor attention from text to space, the credibility lent by a co-founding team that includes a NeRF author and the creator of ImageNet, and the speed with which Marble moved from research preview to commercial product. Analysts at Bessemer Venture Partners and a16z (itself a World Labs investor) have argued that world models are likely to be the most important enabling technology for general-purpose robotics and that World Labs is best positioned among the start-ups in the space because of the persistence of its outputs.
Critics have raised several concerns. The first is data provenance: like other generative systems trained on large web video corpora, Marble and RTFM raise unresolved copyright questions, and the company has been vague in public communications about its training corpus. The second is evaluation: there is no consensus benchmark for the quality of a generated 3D world, and reviewers have noted that Marble's outputs, while striking, can still contain geometric inconsistencies near the edges of generated regions. The third is competition with open-source NeRF and Gaussian-splat ecosystems, which can produce high-quality reconstructions from real captures without the need for a proprietary model.
Fei-Fei Li has responded to such critiques by emphasizing that World Labs is building a category of system, not a single model, and by framing spatial intelligence as a multi-decade research program rather than a single product. In a February 2026 interview with the Financial Times, she argued that the success of language models has "created the misconception that intelligence is one-dimensional" and that the next great challenge of artificial intelligence is to build agents that "think in three dimensions, plus time."