HY-World 2.0
Last reviewed: Jun 3, 2026
Sources: 8 citations
Review status: Source-backed
Revision: v1 · 1,394 words
HY-World 2.0 (also written HunyuanWorld 2.0 or Hunyuan World Model 2.0) is an open multimodal 3D world model from Tencent's Hunyuan team, with a technical report and first code release published on April 16, 2026.[1][2] The model takes text, single images, multi-view images, or video as input and produces persistent 3D worlds that can be exported as polygon meshes, 3D Gaussian splats, and point clouds. Its main departure from the video-based world models that dominated 2025 is that it outputs editable 3D geometry directly, so a generated scene can be dropped into Blender, Unreal Engine, Unity, or NVIDIA Isaac Sim rather than played back as a fixed clip.[1]
HY-World 2.0 is described by Tencent as a multimodal world model for reconstructing, generating, and simulating 3D worlds.[1] It is not a single network but a framework of cooperating components, released under the tencent-hy-world-2.0-community license on Hugging Face and ModelScope.[2] The system splits its job into two tasks that share a common 3D representation: world generation from text or a single image, and world reconstruction from multi-view images or video.
Both paths converge on the same output, a set of 3D Gaussian splats plus extractable mesh and point-cloud geometry, which is what makes the result reusable in standard graphics and robotics pipelines.[2]
HY-World 2.0 is the second major generation in a fast-moving Tencent line. The predecessor, HunyuanWorld 1.0, shipped in late July 2025 as what Tencent called the first open-source, simulation-capable model for immersive 3D world generation.[3] Version 1.0 worked by generating a 360-degree panorama as a "world proxy," decomposing it into semantic layers, then running hierarchical 3D reconstruction to produce a layered mesh that could be exported to game engines. Tencent later added a 1.0-Lite variant trimmed to run on consumer GPUs with under 17 GB of VRAM.[4]
A parallel research thread, HunyuanWorld-Voyager, arrived in September 2025. Voyager is an RGB-D video diffusion model that generates world-consistent point-cloud video along a user-defined camera path and supports fast 3D reconstruction from that output.[5] HY-World 2.0 folds the lessons from both efforts into a single framework and pushes harder on producing clean, persistent 3D assets instead of frame sequences. The broader Hunyuan family also includes the Hunyuan3D object generators and HunyuanVideo, so the world model sits inside a fairly deep stack of Tencent generative tools.
The release is built from four named modules, each with its own checkpoint, plus a lightweight variant of the panorama model.[2]
| Component | Role | Approx. parameters |
|---|---|---|
| WorldMirror 2.0 | Feed-forward 3D reconstruction from images or video | ~1.2B |
| HY-Pano 2.0 | Text or image to 360-degree panorama | ~80B |
| HY-Pano-2-Qwen | Lightweight panorama variant | ~425M |
| WorldStereo 2.0 | Panorama to expanded multi-view 3D world | ~17B |
| WorldNav | Camera trajectory planning | not disclosed |
WorldMirror 2.0 is the piece most people will touch first, partly because its code and weights led the staged rollout. It is a unified feed-forward model that predicts depth, surface normals, camera intrinsics and extrinsics, dense 3D point clouds, and 3D Gaussian splat attributes in a single forward pass.[2] It runs at flexible resolutions (Tencent cites roughly 50K to 500K pixels), accepts optional camera and depth priors, and supports multi-GPU inference with FSDP and BF16. One practical quirk worth knowing: in multi-GPU mode the number of input images has to be at least the number of GPUs.[2]
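The two input constraints described here, the pixel range and the images-per-GPU rule, can be checked before a job is launched. The following is a minimal sketch under stated assumptions: `fit_to_pixel_budget` and `check_multi_gpu_batch` are hypothetical helpers written for illustration, not functions from the HY-World 2.0 codebase, and the 50K and 500K bounds are the rough figures Tencent cites rather than exact limits.

```python
import math

# Approximate pixel range cited for WorldMirror 2.0 (assumed bounds, not exact limits).
MIN_PIXELS = 50_000
MAX_PIXELS = 500_000


def fit_to_pixel_budget(width: int, height: int,
                        min_pixels: int = MIN_PIXELS,
                        max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Scale (width, height) so the pixel count lands inside the budget,
    preserving aspect ratio. Returns the input unchanged if already in range."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    pixels = width * height
    if min_pixels <= pixels <= max_pixels:
        return width, height
    if pixels > max_pixels:
        scale = math.sqrt(max_pixels / pixels)
        # Round down so the result never exceeds the upper bound.
        return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))
    scale = math.sqrt(min_pixels / pixels)
    # Round up so the result never falls below the lower bound.
    return math.ceil(width * scale), math.ceil(height * scale)


def check_multi_gpu_batch(num_images: int, num_gpus: int) -> None:
    """Raise ValueError if a multi-GPU run has fewer input images than GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus > 1 and num_images < num_gpus:
        raise ValueError(
            f"multi-GPU mode needs at least {num_gpus} images, got {num_images}"
        )
```

For example, a 1920×1080 frame is well over the upper bound and would be scaled down, while a call such as `check_multi_gpu_batch(2, 4)` fails early instead of partway through a distributed run.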
The generation path is heavier. HY-Pano 2.0, at around 80 billion parameters, is the largest single piece and handles the initial panorama. The smaller HY-Pano-2-Qwen variant exists for users who cannot run the full model.
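A rough weights-only estimate shows how far apart the components are in hardware terms. The sketch below assumes the approximate parameter counts from the table and assumes every weight is stored in BF16 (2 bytes per parameter); it ignores activations, caches, and framework overhead, so real memory use will be higher. The function name `weight_memory_gb` is illustrative.

```python
# Approximate parameter counts taken from the component table.
# WorldNav is omitted because its size is not disclosed.
PARAMS = {
    "WorldMirror 2.0": 1.2e9,
    "HY-Pano 2.0": 80e9,
    "HY-Pano-2-Qwen": 425e6,
    "WorldStereo 2.0": 17e9,
}

BYTES_PER_PARAM_BF16 = 2  # assumption: weights held in BF16 (16 bits each)


def weight_memory_gb(num_params: float,
                     bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Weights-only footprint in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, count in PARAMS.items():
        print(f"{name:<16} ~{weight_memory_gb(count):.2f} GB")
```

Under these assumptions the full HY-Pano 2.0 needs about 160 GB for weights alone, against roughly 34 GB for WorldStereo 2.0, 2.4 GB for WorldMirror 2.0, and under 1 GB for HY-Pano-2-Qwen.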
For text-to-world or single-image-to-world, HY-World 2.0 runs a four-stage pipeline.[2]
The reconstruction path is simpler: feed multi-view images or a video to WorldMirror 2.0 and get the same kind of 3D output without the generative front end.
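The split between the two paths amounts to a dispatch on input type. The sketch below only illustrates that routing as the article describes it; the `InputKind` and `WorldPath` enums and the `select_path` function are hypothetical names, not part of the released code.

```python
from enum import Enum


class InputKind(Enum):
    TEXT = "text"
    SINGLE_IMAGE = "single_image"
    MULTI_VIEW_IMAGES = "multi_view_images"
    VIDEO = "video"


class WorldPath(Enum):
    GENERATION = "generation"          # generative pipeline, starting from a panorama
    RECONSTRUCTION = "reconstruction"  # WorldMirror 2.0 only, no generative front end


def select_path(kind: InputKind) -> WorldPath:
    """Map an input type to the path described in the article."""
    if kind in (InputKind.TEXT, InputKind.SINGLE_IMAGE):
        return WorldPath.GENERATION
    return WorldPath.RECONSTRUCTION
```

Either path ends in the same kind of 3D output, so downstream export code does not need to know which one produced the scene.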
By 2026, "world models" had become one of the most crowded areas in AI, and the entrants do not all mean the same thing by the term. It helps to group them by what they actually hand you.
Genie 3 from Google DeepMind, previewed in August 2025, is a real-time interactive world model. It generates navigable environments at 720p and around 24 frames per second and keeps them consistent for a few minutes, but the world is a stream of generated frames rather than a downloadable 3D file, and access stayed limited to a research preview.[6] NVIDIA Cosmos, including the Cosmos 3 generation shown at COMPUTEX 2026, takes yet another angle. Cosmos is a family of world foundation models aimed squarely at physical AI: generating photorealistic, physics-grounded video and synthetic data to train robots and autonomous vehicles, with components for prediction, transfer, and reasoning.[7]
The closest comparison is Marble from World Labs, the spatial-intelligence company co-founded by Fei-Fei Li, which launched commercially in late 2025. Marble accepts text, images, video, or coarse 3D layouts and, like HY-World 2.0, produces persistent, downloadable 3D environments that export to Gaussian splats, meshes, or video.[8] The two arrive at a similar product from opposite directions: Marble is a polished hosted service, while HY-World 2.0 is an open release you run yourself. That openness, weights and code under a community license, is the main thing HY-World 2.0 brings to a field where the strongest interactive models have stayed closed.
Tencent released HY-World 2.0 in stages rather than all at once.[2] The April 16, 2026 launch shipped the technical report and the WorldMirror 2.0 reconstruction code. HY-Pano 2.0 inference code and weights followed on May 11, and the world generation code plus WorldStereo 2.0 weights landed on May 18. Everything is distributed under the tencent-hy-world-2.0-community license, with checkpoints on Hugging Face and ModelScope and an interactive demo at Tencent's sceneTo3D site.[2] As with HunyuanWorld 1.0's Lite edition, the staged structure and a smaller HY-Pano variant suggest Tencent is trying to keep at least part of the system reachable on hardware short of a multi-GPU server.
The pitch for outputting real 3D assets rather than video is reusability. Because a generated or reconstructed scene comes out as editable meshes and Gaussian splats, the obvious applications are content and game development: rapidly blocking out maps, level prototypes, and environment art that artists can then refine in their existing tools.[1] Compatibility with Isaac Sim points at the other big use case, robotics and embodied AI, where a cheap supply of varied, physically plausible 3D scenes is useful for training and testing agents in simulation. Digital twins and VR or AR content round out the list. The honest caveat is that generative 3D worlds in 2026 are still uneven on fine geometry and physical accuracy, so how much of this holds up in production rather than in demos is something that the benchmarks in the technical report, together with independent outside testing, will have to settle.