# HY-World 2.0

> Source: https://aiwiki.ai/wiki/hy_world_2
> Updated: 2026-06-03
> Categories: Chinese AI, Multimodal AI, World Models
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**HY-World 2.0** (also written HunyuanWorld 2.0 or Hunyuan World Model 2.0) is an open multimodal 3D world model from [Tencent](/wiki/tencent)'s Hunyuan team, with a technical report and first code release published on April 16, 2026.[1][2] The model takes text, single images, multi-view images, or video as input and produces persistent 3D worlds that can be exported as polygon meshes, [3D Gaussian splats](/wiki/gaussian_splatting), and point clouds. Its main departure from the video-based [world models](/wiki/world_models) that dominated 2025 is that it outputs editable 3D geometry directly, so a generated scene can be dropped into [Blender](/wiki/blender), Unreal Engine, Unity, or [NVIDIA Isaac Sim](/wiki/nvidia_isaac_sim) rather than played back as a fixed clip.[1]

## What it is

HY-World 2.0 is described by Tencent as a multimodal world model for reconstructing, generating, and simulating 3D worlds.[1] It is not a single network but a framework of cooperating components, released under the `tencent-hy-world-2.0-community` license on [Hugging Face](/wiki/hugging_face) and ModelScope.[2] The system splits its job into two tasks that share a common 3D representation:

- World reconstruction: turning real captures (multiple photos of a place, or a video walkthrough) into a 3D model.
- World generation: inventing a navigable 3D scene from a text prompt or a single image.

Both paths converge on the same output, a set of 3D Gaussian splats plus extractable mesh and point-cloud geometry, which is what makes the result reusable in standard graphics and robotics pipelines.[2]
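
To make the "reusable in standard pipelines" point concrete, here is a minimal sketch of writing a point cloud to ASCII PLY, a widely supported interchange format that tools such as Blender can import. The function name `point_cloud_to_ply` and the choice of PLY are assumptions for illustration; the sources cited here do not specify which file formats HY-World 2.0 writes.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def point_cloud_to_ply(points: Iterable[Point]) -> str:
    """Serialize xyz points as an ASCII PLY document (hypothetical helper)."""
    pts = list(points)
    # Standard PLY header: magic word, format, element count, properties.
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(pts)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    # One vertex per line, coordinates separated by spaces.
    body = [f"{x:.6f} {y:.6f} {z:.6f}" for x, y, z in pts]
    return "\n".join(header + body) + "\n"


if __name__ == "__main__":
    demo = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
    print(point_cloud_to_ply(demo))
```

A real export would also carry colors, normals, or per-splat attributes; this sketch covers positions only.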

## The Hunyuan world model line

HY-World 2.0 is the second major generation in a fast-moving Tencent line. The predecessor, [HunyuanWorld 1.0](/wiki/hunyuanworld), shipped in late July 2025 as what Tencent called the first open-source, simulation-capable model for immersive 3D world generation.[3] Version 1.0 worked by generating a 360-degree panorama as a "world proxy," decomposing it into semantic layers, then running hierarchical 3D reconstruction to produce a layered mesh that could be exported to game engines. Tencent later added a 1.0-Lite variant trimmed to run on consumer GPUs with under 17 GB of VRAM.[4]

A parallel research thread, [HunyuanWorld-Voyager](/wiki/hunyuanworld_voyager), arrived in September 2025. Voyager is an RGB-D video diffusion model that generates world-consistent point-cloud video along a user-defined camera path and supports fast 3D reconstruction from that output.[5] HY-World 2.0 folds the lessons from both efforts into a single framework and pushes harder on producing clean, persistent 3D assets instead of frame sequences. The broader Hunyuan family also includes the [Hunyuan3D](/wiki/hunyuan_3d) object generators and [HunyuanVideo](/wiki/hunyuan_video), so the world model sits inside a fairly deep stack of Tencent generative tools.

## Components and parameters

The release is built from four named modules, each with its own checkpoint; the table below also lists HY-Pano-2-Qwen, a lightweight variant of the panorama model.[2]

| Component | Role | Approx. parameters |
|---|---|---|
| WorldMirror 2.0 | Feed-forward 3D reconstruction from images or video | ~1.2B |
| HY-Pano 2.0 | Text or image to 360-degree panorama | ~80B |
| HY-Pano-2-Qwen | Lightweight panorama variant | ~425M |
| WorldStereo 2.0 | Panorama to expanded multi-view 3D world | ~17B |
| WorldNav | Camera trajectory planning | not disclosed |

WorldMirror 2.0 is the piece most people will touch first, partly because its code and weights led the staged rollout. It is a unified feed-forward model that predicts depth, surface normals, camera intrinsics and extrinsics, dense 3D point clouds, and 3D Gaussian splat attributes in a single forward pass.[2] It runs at flexible resolutions (Tencent cites roughly 50K to 500K pixels), accepts optional camera and depth priors, and supports multi-GPU inference with FSDP and BF16. One practical quirk worth knowing: in multi-GPU mode the number of input images has to be at least the number of GPUs.[2]
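
The two input constraints mentioned above (a pixel budget of roughly 50K to 500K per image, and at least as many images as GPUs in multi-GPU mode) can be sketched as a small pre-flight check. This is an illustrative helper, not part of the HY-World 2.0 code: the function names, the exact bounds, and the uniform-rescale strategy are all assumptions, and any patch-size or alignment requirements the real model has are not covered.

```python
import math
from typing import Tuple

# Assumed bounds, taken from the "roughly 50K to 500K pixels" figure.
MIN_PIXELS = 50_000
MAX_PIXELS = 500_000


def fit_to_pixel_budget(width: int, height: int,
                        min_pixels: int = MIN_PIXELS,
                        max_pixels: int = MAX_PIXELS) -> Tuple[int, int]:
    """Uniformly rescale (width, height) so width * height lands in the budget."""
    pixels = width * height
    if pixels > max_pixels:
        scale = math.sqrt(max_pixels / pixels)
        # Round down so the result never exceeds the upper bound.
        return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))
    if pixels < min_pixels:
        scale = math.sqrt(min_pixels / pixels)
        # Round up so the result never falls below the lower bound.
        return math.ceil(width * scale), math.ceil(height * scale)
    return width, height


def check_multi_gpu(num_images: int, num_gpus: int) -> None:
    """Raise if a multi-GPU run has fewer input images than GPUs."""
    if num_gpus > 1 and num_images < num_gpus:
        raise ValueError(
            f"multi-GPU mode needs at least {num_gpus} images, got {num_images}"
        )


if __name__ == "__main__":
    print(fit_to_pixel_budget(1920, 1080))
    check_multi_gpu(num_images=8, num_gpus=4)  # passes silently
```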

The generation path is heavier. HY-Pano 2.0, at around 80 billion parameters, is the largest single piece and handles the initial panorama. The smaller HY-Pano-2-Qwen variant exists for users who cannot run the full model.
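
A back-of-the-envelope calculation shows why the smaller variant matters. The sketch below assumes weights are stored in BF16 at 2 bytes per parameter (the source mentions BF16 only for WorldMirror 2.0 inference, so applying it to every component is an assumption) and counts weights alone, ignoring activations and other runtime memory. Under those assumptions the weights come to about 160 GB for HY-Pano 2.0, 34 GB for WorldStereo 2.0, 2.4 GB for WorldMirror 2.0, and 0.85 GB for HY-Pano-2-Qwen.

```python
# Assumption: BF16 storage, 2 bytes per parameter, weights only.
BYTES_PER_PARAM_BF16 = 2

# Approximate parameter counts from the component table.
APPROX_PARAMS = {
    "WorldMirror 2.0": 1.2e9,
    "HY-Pano 2.0": 80e9,
    "HY-Pano-2-Qwen": 425e6,
    "WorldStereo 2.0": 17e9,
}


def weight_gigabytes(num_params: float,
                     bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, count in APPROX_PARAMS.items():
        print(f"{name}: ~{weight_gigabytes(count):.2f} GB of weights in BF16")
```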

## How generation works

For text-to-world or single-image-to-world, HY-World 2.0 runs a four-stage pipeline.[2]

1. Panorama generation. HY-Pano 2.0 turns the prompt or input image into a 360-degree panorama that anchors the scene.
2. Trajectory planning. WorldNav, guided by a vision-language model, plans a camera path through the scene with obstacle-aware navigation, so the virtual camera does not try to fly through walls.
3. World expansion. WorldStereo 2.0 generates additional keyframes along that path, using memory to keep newly revealed areas consistent with what was already seen.
4. World composition. WorldMirror 2.0 extracts frames, depth, normals, and camera parameters, then trains a 3D Gaussian splat representation that ties everything into one coherent, persistent scene.

The reconstruction path is simpler: feed multi-view images or a video to WorldMirror 2.0 and get the same kind of 3D output without the generative front end.
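
The two paths can be summarized as a routing table from input type to the components that run. This is a descriptive sketch only: `Stage`, `plan_stages`, and the input labels are hypothetical names chosen for illustration, and none of them correspond to the actual HY-World 2.0 API.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical input labels for the two paths described above.
GENERATIVE_INPUTS = {"text", "image"}
RECONSTRUCTION_INPUTS = {"multi_view", "video"}


@dataclass(frozen=True)
class Stage:
    name: str       # what the stage does
    component: str  # which released module handles it


GENERATION_STAGES = [
    Stage("panorama generation", "HY-Pano 2.0"),
    Stage("trajectory planning", "WorldNav"),
    Stage("world expansion", "WorldStereo 2.0"),
    Stage("world composition", "WorldMirror 2.0"),
]

RECONSTRUCTION_STAGES = [
    Stage("feed-forward reconstruction", "WorldMirror 2.0"),
]


def plan_stages(input_kind: str) -> List[Stage]:
    """Return the ordered stages for a given input type."""
    if input_kind in GENERATIVE_INPUTS:
        return list(GENERATION_STAGES)
    if input_kind in RECONSTRUCTION_INPUTS:
        return list(RECONSTRUCTION_STAGES)
    raise ValueError(f"unsupported input kind: {input_kind!r}")


if __name__ == "__main__":
    for kind in ("text", "video"):
        names = [stage.component for stage in plan_stages(kind)]
        print(f"{kind}: {' -> '.join(names)}")
```

Either way, WorldMirror 2.0 is the last component to run, which is why both paths end in the same kind of 3D output.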

## Where it fits in the 2026 world-model race

2026 turned "world models" into one of the most crowded areas in AI, and the entrants do not all mean the same thing by the term. It helps to group them by what they actually hand you.

[Genie 3](/wiki/genie_3) from Google DeepMind, previewed in August 2025, is a real-time interactive world model. It generates navigable environments at 720p and around 24 frames per second and keeps them consistent for a few minutes, but the world is a stream of generated frames rather than a downloadable 3D file, and access stayed limited to a research preview.[6] [NVIDIA Cosmos](/wiki/nvidia_cosmos), including the [Cosmos 3](/wiki/nvidia_cosmos_3) generation shown at COMPUTEX 2026, takes a different angle. Cosmos is a family of world foundation models aimed squarely at [physical AI](/wiki/physical_ai): generating photorealistic, physics-grounded video and synthetic data to train robots and autonomous vehicles, with components for prediction, transfer, and reasoning.[7]

The closest comparison is [Marble](/wiki/marble_world_labs) from [World Labs](/wiki/world_labs), the spatial-intelligence company co-founded by Fei-Fei Li, which launched commercially in late 2025. Marble accepts text, images, video, or coarse 3D layouts and, like HY-World 2.0, produces persistent, downloadable 3D environments that export to Gaussian splats, meshes, or video.[8] The two arrive at a similar product from opposite directions: Marble is a polished hosted service, while HY-World 2.0 is an open release you run yourself. That openness, weights and code under a community license, is the main thing HY-World 2.0 brings to a field where the strongest interactive models have stayed closed.

## Open-source release and license

Tencent released HY-World 2.0 in stages rather than all at once.[2] The April 16, 2026 launch shipped the technical report and the WorldMirror 2.0 reconstruction code. HY-Pano 2.0 inference code and weights followed on May 11, and the world generation code plus WorldStereo 2.0 weights landed on May 18. Everything is distributed under the `tencent-hy-world-2.0-community` license, with checkpoints on Hugging Face and ModelScope and an interactive demo at Tencent's sceneTo3D site.[2] As with HunyuanWorld 1.0's Lite edition, the staged structure and a smaller HY-Pano variant suggest Tencent is trying to keep at least part of the system reachable on hardware short of a multi-GPU server.

## Use cases

The pitch for outputting real 3D assets rather than video is reusability. Because a generated or reconstructed scene comes out as editable meshes and Gaussian splats, the obvious applications are content and game development: rapidly blocking out maps, level prototypes, and environment art that artists can then refine in their existing tools.[1] Compatibility with Isaac Sim points at the other big use case, robotics and embodied AI, where a cheap supply of varied, physically plausible 3D scenes is useful for training and testing agents in simulation. Digital twins and VR or AR content round out the list. The honest caveat is that generative 3D worlds in 2026 are still uneven on fine geometry and physical accuracy, so how much of this holds up in production rather than demos is something that the benchmarks in the technical report, together with independent outside testing, will have to settle.

## References

1. Tencent Hunyuan, "HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds" (technical report), April 16, 2026. https://3d-models.hunyuan.tencent.com/world/world2_0/HY_World_2_0.pdf
2. Tencent-Hunyuan, "HY-World-2.0" (GitHub repository and Hugging Face model card). https://github.com/Tencent-Hunyuan/HY-World-2.0 and https://huggingface.co/tencent/HY-World-2.0
3. Tencent-Hunyuan, "HunyuanWorld-1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels," July 2025. https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0 (paper: https://arxiv.org/abs/2507.21809)
4. The Decoder, "Tencent releases 'Hunyuan World Model 1.0-Lite' for faster, resource-efficient 3D scene generation," 2025. https://the-decoder.com/tencent-releases-hunyuan-world-model-1-0-as-an-open-source-ai-for-3d-scene-generation/
5. Tencent-Hunyuan, "HunyuanWorld-Voyager," September 2025. https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager
6. Google DeepMind, "Genie 3: A new frontier for world models," August 2025. https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
7. NVIDIA, "NVIDIA Cosmos: World Foundation Models Powering Physical AI." https://www.nvidia.com/en-us/ai/cosmos/
8. World Labs, "Marble: A Multimodal World Model," 2025. https://www.worldlabs.ai/blog/marble-world-model

