Marble (World Labs)
Last reviewed
May 16, 2026
Sources
15 citations
Review status
Source-backed
Revision
v1 · 3,478 words
Marble is a multimodal generative world model developed by World Labs, the spatial intelligence startup co-founded by Stanford computer scientist Fei-Fei Li. Marble synthesizes persistent, navigable three-dimensional environments from a range of inputs, including text prompts, single images, multiple images, video clips, panoramas, and coarse 3D layouts. The generated worlds can be edited inside Marble's browser-based tools and exported as Gaussian splats, polygonal meshes, or rendered videos for use in downstream applications such as game development, visual effects, virtual reality, and robotics simulation.
World Labs first opened Marble as a limited beta preview on September 16, 2025, and made the product generally available on November 12, 2025, at the URL marble.worldlabs.ai. The launch made Marble the first commercial product from World Labs and one of the first commercially available world models offering persistent, downloadable 3D outputs rather than streamed video frames. Marble is offered as a freemium service with three paid tiers, and is positioned by the company as an early step toward broader systems for spatial intelligence.
World Labs emerged from stealth in September 2024 with approximately US$230 million in seed and Series A funding from investors including Andreessen Horowitz, Radical Ventures, and NEA. The company was co-founded by Fei-Fei Li, often referred to as the "godmother of artificial intelligence" for her foundational work on ImageNet, alongside Justin Johnson, Christoph Lassner, and Ben Mildenhall, one of the researchers behind Neural Radiance Fields (NeRFs).
In December 2024 the company demonstrated an early prototype that could generate an explorable 3D scene from a single image. Throughout the first half of 2025 it published research previews and continued to refine the model, focusing on geometric consistency and the ability to revisit and edit generated environments. On September 16, 2025, World Labs announced the Marble brand and opened a limited beta preview, inviting users via a waitlist to generate their own 3D worlds.
During the beta period, World Labs released several incremental updates. In October 2025 the company shipped what it called "bigger and better worlds," expanding the size and detail of generated environments and integrating its real-time frame model with the persistent generation pipeline. On November 12, 2025, Marble exited beta and became generally available alongside the public introduction of the Chisel editing tool, the Composer mode for combining worlds, and the four-tier subscription plan.
Marble is designed around the production of persistent 3D scenes that remain stable across multiple viewings, in contrast to streaming generators that re-synthesize geometry on every frame. Once a world is generated, it is stored as a 3D asset that can be revisited, edited, expanded, and exported. The system supports a wide range of input modalities and produces several output formats, summarized below.
| Modality | Direction | Notes |
|---|---|---|
| Text prompt | Input | Natural language description; available on all tiers including Free. |
| Single image | Input | Photograph, rendering, or generated image; available on Free tier. |
| Panorama | Input | Equirectangular or 360-degree image; available on Free tier. |
| Multiple images | Input | Several views from different angles for fuller coverage. |
| Video clip | Input | Short video used to infer a scene's structure and style. |
| Coarse 3D layout | Input | Boxes, planes, and imported meshes via the Chisel editor. |
| Gaussian splats | Output | High-fidelity radiance representation in PLY format. |
| High-quality mesh | Output | Triangle mesh in GLB format, roughly 600,000 to 1 million triangles, with texture or vertex color. |
| Collider mesh | Output | Simplified GLB mesh of a few megabytes, intended for physics simulation. |
| Rendered video | Output | Video file with user-controlled camera trajectory. |
Gaussian splat outputs are rendered in Marble's web viewer through World Labs' open-source Spark renderer, which is built on the WebGL graphics stack and runs in modern browsers without dedicated software. The collider mesh is purposely lower resolution because it is meant to define rigid-body boundaries in game engines and robotic simulators rather than to be drawn on screen. The high-quality mesh is supplied in two flavors: a roughly 600,000-triangle version with texture maps and a roughly 1-million-triangle version with vertex color data. Marble takes up to an hour to produce the high-quality mesh in an offline processing step.
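As an illustration of working with the PLY export, the following minimal sketch reads a PLY header to report how many elements a file declares and which properties each carries. It relies only on the general PLY header layout. The sample header, including the `opacity` property, is an assumption based on common Gaussian splat exports and is not taken from World Labs documentation.

```python
import io


def read_ply_header(stream):
    """Parse a PLY header from a binary stream.

    Returns (format_name, element_counts, element_properties).
    """
    if stream.readline().strip() != b"ply":
        raise ValueError("not a PLY file")
    fmt = None
    counts = {}
    props = {}
    current = None
    while True:
        line = stream.readline()
        if not line:
            raise ValueError("file ended before end_header")
        tokens = line.decode("ascii").split()
        if not tokens or tokens[0] == "comment":
            continue
        if tokens[0] == "format":
            fmt = tokens[1]
        elif tokens[0] == "element":
            current = tokens[1]
            counts[current] = int(tokens[2])
            props[current] = []
        elif tokens[0] == "property":
            if current is None:
                raise ValueError("property declared before any element")
            # The property name is always the last token on the line.
            props[current].append(tokens[-1])
        elif tokens[0] == "end_header":
            return fmt, counts, props


# Hypothetical header for illustration only; a real export declares more properties.
sample = (
    b"ply\n"
    b"format binary_little_endian 1.0\n"
    b"element vertex 3\n"
    b"property float x\n"
    b"property float y\n"
    b"property float z\n"
    b"property float opacity\n"
    b"end_header\n"
)
fmt, counts, props = read_ply_header(io.BytesIO(sample))
print(fmt, counts["vertex"], props["vertex"])
```

For a file on disk, the same function can be called on a handle opened with `open(path, "rb")`; the binary payload that follows the header is not parsed here.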
Chisel is an experimental three-dimensional editor that ships inside the Marble interface. It allows users to "decouple structure from style," in the company's words, by sketching the bulk geometry of a world out of simple primitives such as boxes and planes, optionally importing existing 3D assets, and then attaching a text prompt that specifies the visual style or thematic content of the scene. Marble combines the coarse layout with the prompt to fill in materials, lighting, vegetation, props, and other detail.
The Chisel workflow is intended to give creators more deterministic control over composition than a pure text-to-3D pipeline. A user can lay out a room with predetermined wall positions, place markers for furniture, and then instruct Marble to interpret the layout in different styles such as a Victorian study, a science-fiction laboratory, or an anime apartment, without changing the underlying geometry. Marble supports a range of aesthetic styles, including cartoon, sci-fi, fantasy, anime, photorealistic, and retro low-poly looks.
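The idea of decoupling structure from style can be sketched as a data structure in which the coarse geometry and the style prompt are separate fields, so that one can change without the other. The `Box`, `LayoutRequest`, and `restyle` names below are hypothetical and exist only for illustration; they do not reflect Chisel's actual file format or any World Labs API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Box:
    """A hypothetical coarse primitive: center and size in scene units."""
    center: tuple
    size: tuple


@dataclass(frozen=True)
class LayoutRequest:
    """Hypothetical pairing of fixed geometry with a free-form style prompt."""
    boxes: tuple
    style_prompt: str


def restyle(request, new_prompt):
    """Return a copy with a new style prompt and the same geometry."""
    return replace(request, style_prompt=new_prompt)


wall = Box(center=(0.0, 1.5, 0.0), size=(4.0, 3.0, 0.2))
study = LayoutRequest(boxes=(wall,), style_prompt="Victorian study")
lab = restyle(study, "science-fiction laboratory")

print(lab.style_prompt)
print(lab.boxes == study.boxes)
```

Because both classes are frozen, the original request is left untouched and the two variants share identical geometry.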
Beyond Chisel, Marble offers a suite of AI-native editing operations performed directly on generated worlds. These include local edits such as removing or swapping objects, and larger transformations such as restyling the entire scene, replacing materials, or reorganizing the spatial structure. Edits can be triggered by text prompts or by image references and operate on the persistent representation rather than on a 2D video frame.
Marble also supports scene expansion, in which a user selects a region of the world and asks the model to extend it outward, generating new traversable space that joins seamlessly with the existing geometry. Expansion can also be used to add finer detail to a specific area of a scene without regenerating the rest. A related feature, Composer mode, lets users place multiple separately generated worlds inside a shared coordinate frame and stitch them together into one larger environment, which is useful for building game levels or virtual production sets that exceed the size of a single generation.
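Placing separately generated worlds in a shared coordinate frame amounts to applying a rigid transform to each one. The sketch below rotates points about the vertical axis and then translates them. It assumes a right-handed, y-up convention and a yaw-plus-offset placement, neither of which is confirmed by World Labs, and `place_in_shared_frame` is a hypothetical helper rather than part of Composer.

```python
import math


def place_in_shared_frame(points, yaw_degrees, offset):
    """Rotate points about the y axis by yaw_degrees, then translate by offset."""
    yaw = math.radians(yaw_degrees)
    c, s = math.cos(yaw), math.sin(yaw)
    ox, oy, oz = offset
    placed = []
    for x, y, z in points:
        # Standard right-handed rotation about the y axis, followed by translation.
        placed.append((c * x + s * z + ox, y + oy, -s * x + c * z + oz))
    return placed


# Two points from a second world, turned 90 degrees and moved 10 units along x.
second_world = [(1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
for point in place_in_shared_frame(second_world, 90.0, (10.0, 0.0, 0.0)):
    print(tuple(round(v, 6) for v in point))
```

A full placement would apply the same transform to every splat position and orientation, or to every mesh vertex, in the world being moved.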
For video outputs, Marble provides what World Labs calls pixel-accurate camera control, allowing users to plot a precise camera trajectory through the world and render it to a video file. Generated videos can be further enhanced with detail passes, artifact removal, and dynamic elements such as drifting smoke or moving water while preserving the user-specified camera path.
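A planned camera path through a static 3D scene can be represented as a list of keyframes, each with a position and a viewing direction. The following sketch generates keyframes for a straight-line dolly move that keeps the camera aimed at a fixed target. The function name and keyframe layout are assumptions made for illustration and do not describe Marble's trajectory format.

```python
import math


def dolly_keyframes(start, end, target, steps):
    """Return (position, forward) pairs for a straight-line move aimed at target."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    frames = []
    for i in range(steps):
        t = i / (steps - 1)
        # Linear interpolation between the start and end positions.
        pos = tuple(a + t * (b - a) for a, b in zip(start, end))
        direction = tuple(q - p for p, q in zip(pos, target))
        length = math.sqrt(sum(d * d for d in direction))
        if length == 0:
            raise ValueError("camera position coincides with target")
        # Unit vector pointing from the camera toward the target.
        forward = tuple(d / length for d in direction)
        frames.append((pos, forward))
    return frames


for position, forward in dolly_keyframes((0, 0, 0), (4, 0, 0), (2, 0, -2), 3):
    print(position, forward)
```

Because the scene is fixed geometry, rendering the same keyframes twice yields the same framing, which is the property that distinguishes this workflow from prompting a video model for camera motion.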
In October 2025 World Labs published a separate model called RTFM, short for Real-Time Frame Model. RTFM is a real-time generative system that produces video frames on the fly as a user navigates a virtual space, running on a single NVIDIA H100 GPU. Unlike Marble, RTFM does not output a downloadable 3D asset; instead it treats each rendered frame as a posed observation, using previously generated frames as a kind of spatial memory to maintain consistency over long interactions.
World Labs has described RTFM and Marble as complementary parts of the same product family. Marble is the persistent, editable representation that can be exported and integrated with other tools, while RTFM is the lightweight real-time engine that can stream a continuous experience without holding the full scene in memory. World Labs has integrated RTFM with Marble so that interactive previews inside the Marble interface can be powered by the real-time model. As World Labs co-founder Justin Johnson summarized in interviews around the public launch, Marble emphasizes "fidelity, control, and reusability," while RTFM emphasizes "latency and interactivity."
Marble is offered under a freemium model, with a Free tier and three paid subscription tiers. Generation budgets are measured per month and scale with subscription cost. The Free tier accepts only text, single images, and panoramas as inputs, while paid tiers unlock multi-image and video inputs, the Chisel editor, scene expansion, the Composer mode, and commercial usage rights.
| Tier | Price (USD per month) | Generations | Notable features |
|---|---|---|---|
| Free | 0 | 4 | Text, image, and panorama inputs; web viewer access. |
| Standard | 20 | 12 | Multi-image and video inputs; advanced editing; Chisel. |
| Pro | 35 | 25 | Scene expansion; Composer mode; commercial rights. |
| Max | 95 | 75 | All features; highest generation budget; priority processing. |
The pricing structure was announced alongside the public launch on November 12, 2025. World Labs has indicated that enterprise arrangements and higher-volume access are available outside the standard tiers, and the company has published documentation at docs.worldlabs.ai covering the export pipelines for Gaussian splats and meshes.
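The relationship between price and generation budget can be made concrete by dividing each tier's monthly price by its monthly generations. The short calculation below uses only the figures in the table above; the per-generation cost is a derived illustration, not a price published by World Labs.

```python
# Monthly price in USD and monthly generations, taken from the tier table.
tiers = {
    "Standard": (20, 12),
    "Pro": (35, 25),
    "Max": (95, 75),
}


def cost_per_generation(price_usd, generations):
    """Effective cost of one generation if the full monthly budget is used."""
    return price_usd / generations


for name, (price, generations) in tiers.items():
    print(f"{name}: ${cost_per_generation(price, generations):.2f} per generation")
```

On these figures the effective cost falls from about $1.67 per generation on Standard to about $1.40 on Pro and about $1.27 on Max, assuming the full monthly budget is used.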
World Labs and its early customers have publicized several application areas for Marble. Co-founder Justin Johnson has emphasized that the product is intended as a complement to existing production pipelines, not a replacement for them, and the most prominent use cases sit in adjacent media and simulation industries.
Marble's mesh and splat exports are directly compatible with the standard formats used by game engines such as Unity and Unreal Engine, and World Labs has highlighted workflows for importing Marble worlds into both engines as well as Blender and Houdini. The collider mesh export is specifically intended for game physics: it provides a lightweight invisible geometry that defines surfaces for collision detection, while the higher-fidelity Gaussian splat or textured mesh is used for rendering. World Labs has demonstrated first-person prototypes in which Marble splats are paired with collider meshes to support character locomotion and shooting interactions.
Johnson has framed Marble's role in gaming as the production of background environments and ambient spaces rather than full playable levels, with hand-authored gameplay logic and characters added on top of Marble assets.
Marble's pixel-accurate camera control is positioned as a response to a long-standing problem in AI-generated video, where camera motion is inconsistent or impossible to plan precisely. Because Marble produces an actual 3D scene rather than a frame-by-frame video stream, a virtual camera can be moved through it like a physical camera, supporting traditional cinematography techniques such as dollies, cranes, and parallax. World Labs has published case studies in its "Bringing Marble to Life" blog series in which independent filmmakers and visual effects artists use Marble to scout and render environments that would otherwise require manual modeling.
Every Marble world is viewable in virtual reality through compatible headsets, including Apple's Vision Pro and Meta's Quest 3. Because the underlying representation is geometrically consistent, VR users can move within a scene with full six-degree-of-freedom tracking rather than the limited window of a 2D video. World Labs has positioned VR as a primary distribution channel for creator-made Marble worlds, particularly for therapeutic, educational, and cinematic experiences.
Alongside Marble's general release, NVIDIA published a technical workflow showing how Marble outputs can be imported into Isaac Sim, NVIDIA's robotics simulation platform. The workflow exports Gaussian splats in PLY format and collider meshes in GLB format from Marble, converts the PLY data to the OpenUSD format using NVIDIA Omniverse's NuRec 3DGRUT algorithm, and aligns the geometry and physics inside Isaac Sim. NVIDIA reported that this enables environment construction in hours rather than weeks of manual modeling, and demonstrated controlling its Nova Carter robot inside a Marble-generated photorealistic scene.
The robotics application is especially relevant because high-quality, photorealistic training environments are scarce, and the cost of capturing or modeling real spaces is significant. Fei-Fei Li has linked this use case to her broader thesis about spatial intelligence, arguing that machines need the ability to understand and generate three-dimensional space in order to act in it.
World Labs had not published a detailed technical paper describing the full architecture of Marble at the time of the public launch. The company has, however, described several design principles that distinguish Marble from contemporaneous world models. Marble is described as a multimodal system that accepts a heterogeneous mix of inputs and produces an internal 3D representation rather than emitting pixels directly. The representation can be materialized into a Gaussian splat radiance field, a polygonal mesh, or a sequence of rendered frames at export time.
Marble's outputs leverage Gaussian splatting, a 3D scene representation popularized in 2023 in which a scene is encoded as millions of anisotropic semi-transparent particles. Gaussian splats are well suited to Marble's goals because they can be rendered in real time in web browsers, support high visual fidelity, and can be edited locally without retraining a large neural network. The Spark renderer that Marble uses for its in-browser previews is published under an open-source license by World Labs.
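The way semi-transparent particles combine into a pixel color can be illustrated with front-to-back alpha compositing, the general blending rule used in Gaussian splatting renderers. The sketch below is a simplification: it takes each splat's contribution at one pixel as a given opacity value, whereas a real renderer derives that value from the projected Gaussian. It is not the implementation used by Marble or Spark.

```python
def composite_front_to_back(splats):
    """Blend (depth, rgb, alpha) samples along one ray, nearest first.

    Returns the accumulated color and the remaining transmittance.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for _, rgb, alpha in sorted(splats, key=lambda s: s[0]):
        # Each splat contributes in proportion to the light not yet blocked.
        weight = transmittance * alpha
        for k in range(3):
            color[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
    return tuple(color), transmittance


# A half-transparent red splat in front of an opaque blue one.
samples = [
    (2.0, (0.0, 0.0, 1.0), 1.0),
    (1.0, (1.0, 0.0, 0.0), 0.5),
]
print(composite_front_to_back(samples))
```

Editing a scene locally corresponds to changing or removing individual entries in such a list, which is why no network retraining is required for that kind of edit.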
For the mesh pathway, Marble converts the underlying representation into a triangle mesh through a separate offline pipeline. The triangle mesh is provided in two variants of different complexity to support different downstream applications, and a simplified collider mesh is supplied for physics use.
The 2024 to 2025 period saw the rapid emergence of generative world models from multiple leading research organizations. These systems differ along several axes, including whether they emit a downloadable 3D asset or a stream of video, whether they target real-time interaction, and whether their primary use case is media production, robotics training, or general AI representation learning.
| Model | Developer | Output | Real-time interaction | Primary positioning |
|---|---|---|---|---|
| Marble | World Labs | Persistent 3D world: Gaussian splats, meshes, video | Real-time preview via RTFM integration | Editable, exportable worlds for games, VFX, VR, robotics |
| RTFM | World Labs | Streamed video frames | Yes, single H100 GPU | Real-time interactive exploration |
| Genie 3 | Google DeepMind | Streamed video frames | Yes, research preview | Long-horizon consistent interactive worlds for research |
| V-JEPA 2 | Meta AI | Latent video predictions; not user-facing 3D | No interactive 3D output | Pretrained representations and robot planning |
| NVIDIA Cosmos | NVIDIA | Generated video frames and embeddings | Limited; offline batch | World foundation models for physical AI development |
Marble is most directly comparable to Google DeepMind's Genie 3, which was released as a research preview in August 2025. Genie 3 is notable for its progress on the long-horizon consistency problem in real-time generated worlds, keeping scenery from drifting or disappearing when the camera turns around. However, Genie 3 generates pixels frame by frame and does not expose a downloadable 3D asset, which limits its use in production pipelines that depend on standard formats such as GLB or PLY. Marble's positioning, by contrast, emphasizes downloadable, editable, geometrically explicit outputs at the cost of generating each world up front rather than on the fly.
Meta AI's V-JEPA 2, released in 2025, is a different style of world model. It is a self-supervised video prediction system trained on more than one million hours of video, and is positioned primarily as a backbone for action anticipation and zero-shot robotic planning. V-JEPA 2 does not generate a user-facing 3D scene; instead, it predicts latent representations of future video that downstream agents can plan against. Internal benchmarks reported by Meta indicate that planning with V-JEPA 2 runs significantly faster than planning with NVIDIA's Cosmos because V-JEPA 2 reasons in a learned latent space rather than synthesizing pixels.
NVIDIA Cosmos, introduced at the Consumer Electronics Show in January 2025, is a family of world foundation models that includes Cosmos Predict for future-frame generation, Cosmos Transfer for video-to-world editing, and Cosmos Reason, a vision-language model for robot planning. Cosmos is targeted primarily at physical AI workflows such as autonomous vehicles, humanoid robots, and industrial automation, and operates as a backend for developers rather than as a consumer-facing creative tool. Marble is in turn integrated with NVIDIA's Isaac Sim, suggesting that the two ecosystems are increasingly complementary rather than directly competitive.
Other early entrants in the consumer-facing world model space include Decart and Odyssey, both of which were available as web demos through 2025. Like Genie 3, those systems emphasize real-time streamed exploration rather than downloadable 3D assets.
Marble's general release on November 12, 2025, drew substantial coverage in both technology and entertainment trade publications. TechCrunch described Marble as a product that "speeds up the world model race" and noted that the persistent, exportable output places World Labs ahead of competitors who have focused on real-time streamed video. SiliconANGLE characterized Marble as an inflection point in the commercialization of spatial intelligence, while Radiance Fields highlighted Marble's use of Gaussian splats as a step forward for 3D content creation accessible through the web browser.
Fei-Fei Li framed the launch in terms of her long-running argument that artificial intelligence systems need to develop spatial intelligence to be genuinely useful in the physical world. In statements accompanying the launch she described Marble as "the first step toward creating a truly spatially intelligent world model" and emphasized that the implications extend beyond gaming and creative tools to science, medicine, and robotics.
Industry reaction has been mixed. A 2025 Game Developers Conference survey found that about one third of respondents believed generative AI has a negative impact on the gaming industry, citing concerns about intellectual property, energy consumption, and asset quality. Justin Johnson has responded to these concerns by characterizing Marble as a tool for generating environments and ambient assets rather than a substitute for level designers or artists.
Researchers have also positioned Marble within a broader 2025 inflection in world model research. Public commentary has contrasted Fei-Fei Li's emphasis on generative 3D content for creators with Meta chief AI scientist Yann LeCun's view that the most important world models are those, like V-JEPA 2, that learn latent representations of physics for embodied reasoning. The two positions are not mutually exclusive, but they imply different research and product trajectories.
For World Labs, Marble represents the company's transition from a research-stage startup to a revenue-generating product company. The company has framed Marble as both a commercial product and a stepping stone toward a more general spatial intelligence platform. Subsequent releases are expected to deepen interactivity, lengthen the temporal horizon of generated experiences, and broaden the set of domains in which generated worlds can be used. World Labs has been explicit that real-time, persistent, and physically grounded world models are converging into a single product family.
For the wider industry, Marble's release marks one of the first commercial deployments of a generative model whose primary output is a downloadable 3D asset rather than text or pixels. By making persistent, editable 3D worlds available through a subscription priced comparably to other creative AI tools, Marble lowers the cost and time required for environment creation in games, film, VR, and simulation. Whether this leads to broader adoption depends on continued improvements in fidelity, the resolution of intellectual property questions around training data, and the willingness of established creative pipelines to integrate Gaussian splat assets alongside conventional polygonal meshes.