HY-World 2.0

Updated Jun 3, 2026

HY-World 2.0 (also written HunyuanWorld 2.0 or Hunyuan World Model 2.0) is an open multimodal 3D world model from Tencent's Hunyuan team, with a technical report and first code release published on April 16, 2026.^[1]^[2] The model takes text, single images, multi-view images, or video as input and produces persistent 3D worlds that can be exported as polygon meshes, 3D Gaussian splats, and point clouds. Its main departure from the video-based world models that dominated 2025 is that it outputs editable 3D geometry directly, so a generated scene can be dropped into Blender, Unreal Engine, Unity, or NVIDIA Isaac Sim rather than played back as a fixed clip.^[1]

What it is

HY-World 2.0 is described by Tencent as a multimodal world model for reconstructing, generating, and simulating 3D worlds.^[1] It is not a single network but a framework of cooperating components, released under the tencent-hy-world-2.0-community license on Hugging Face and ModelScope.^[2] The system splits its job into two tasks that share a common 3D representation:

World reconstruction: turning real captures (multiple photos of a place, or a video walkthrough) into a 3D model.
World generation: inventing a navigable 3D scene from a text prompt or a single image.

Both paths converge on the same output, a set of 3D Gaussian splats plus extractable mesh and point-cloud geometry, which is what makes the result reusable in standard graphics and robotics pipelines.^[2]
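As a rough illustration of why point-cloud output is easy to reuse, the sketch below serializes a handful of XYZ points as ASCII PLY, a widely supported interchange format that tools such as Blender and MeshLab can import. The source does not state which file formats HY-World 2.0 actually writes, so the choice of PLY and the helper name `points_to_ascii_ply` are assumptions made only for this example.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def points_to_ascii_ply(points: Iterable[Point]) -> str:
    """Serialize XYZ points as an ASCII PLY document (positions only).

    Hypothetical helper for illustration; not part of the HY-World 2.0 code.
    """
    pts = list(points)
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(pts)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    # One vertex per line, in the order declared by the header properties.
    body = [f"{x:.6f} {y:.6f} {z:.6f}" for x, y, z in pts]
    return "\n".join(header + body) + "\n"


if __name__ == "__main__":
    demo = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
    print(points_to_ascii_ply(demo))
```

A real Gaussian-splat export would carry extra per-point attributes (for example scale, rotation, opacity, and color); this sketch covers positions only.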

The Hunyuan world model line

HY-World 2.0 is the second major generation in a fast-moving Tencent line. The predecessor, HunyuanWorld 1.0, shipped in late July 2025 as what Tencent called the first open-source, simulation-capable model for immersive 3D world generation.^[3] Version 1.0 worked by generating a 360-degree panorama as a "world proxy," decomposing it into semantic layers, then running hierarchical 3D reconstruction to produce a layered mesh that could be exported to game engines. Tencent later added a 1.0-Lite variant trimmed to run on consumer GPUs with under 17 GB of VRAM.^[4]

A parallel research thread, HunyuanWorld-Voyager, arrived in September 2025. Voyager is an RGB-D video diffusion model that generates world-consistent point-cloud video along a user-defined camera path and supports fast 3D reconstruction from that output.^[5] HY-World 2.0 folds the lessons from both efforts into a single framework and pushes harder on producing clean, persistent 3D assets instead of frame sequences. The broader Hunyuan family also includes the Hunyuan3D object generators and HunyuanVideo, so the world model sits inside a fairly deep stack of Tencent generative tools.

Components and parameters

The release is built from four named modules, each with its own checkpoint, plus a lightweight variant of the panorama generator.^[2]

Component	Role	Approx. parameters
WorldMirror 2.0	Feed-forward 3D reconstruction from images or video	~1.2B
HY-Pano 2.0	Text or image to 360-degree panorama	~80B
HY-Pano-2-Qwen	Lightweight panorama variant	~425M
WorldStereo 2.0	Panorama to expanded multi-view 3D world	~17B
WorldNav	Camera trajectory planning	not disclosed

WorldMirror 2.0 is the piece most people will touch first, partly because its code and weights led the staged rollout. It is a unified feed-forward model that predicts depth, surface normals, camera intrinsics and extrinsics, dense 3D point clouds, and 3D Gaussian splat attributes in a single forward pass.^[2] It runs at flexible resolutions (Tencent cites roughly 50K to 500K pixels), accepts optional camera and depth priors, and supports multi-GPU inference with FSDP and BF16. One practical quirk worth knowing: in multi-GPU mode the number of input images has to be at least the number of GPUs.^[2]
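The two input constraints mentioned above (a pixel budget of roughly 50K to 500K per image, and at least as many images as GPUs in multi-GPU mode) can be checked before launching a job. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the bounds are the approximate figures quoted in this article, and the real WorldMirror 2.0 code may resize or validate inputs differently.

```python
import math
from typing import Tuple

MIN_PIXELS = 50_000    # approximate lower bound quoted in the article
MAX_PIXELS = 500_000   # approximate upper bound quoted in the article


def fit_to_pixel_budget(width: int, height: int,
                        min_pixels: int = MIN_PIXELS,
                        max_pixels: int = MAX_PIXELS) -> Tuple[int, int]:
    """Uniformly rescale (width, height) so the pixel count fits the budget."""
    pixels = width * height
    if pixels > max_pixels:
        scale = math.sqrt(max_pixels / pixels)
        # Round down so the result never exceeds the upper bound.
        return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))
    if pixels < min_pixels:
        scale = math.sqrt(min_pixels / pixels)
        # Round up so the result never falls below the lower bound.
        return math.ceil(width * scale), math.ceil(height * scale)
    return width, height


def check_multi_gpu_batch(num_images: int, num_gpus: int) -> None:
    """Raise if a multi-GPU run has fewer input images than GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus > 1 and num_images < num_gpus:
        raise ValueError(
            f"multi-GPU mode needs at least {num_gpus} images, got {num_images}"
        )


if __name__ == "__main__":
    print(fit_to_pixel_budget(1920, 1080))  # downscaled to fit the upper bound
    check_multi_gpu_batch(num_images=8, num_gpus=4)  # passes silently
```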

The generation path is heavier. HY-Pano 2.0, at around 80 billion parameters, is the largest single piece and handles the initial panorama. The smaller HY-Pano-2-Qwen variant exists for users who cannot run the full model.
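To put the size gap in perspective, a back-of-the-envelope estimate multiplies each parameter count by the bytes needed per parameter. The sketch below assumes BF16 storage (2 bytes per parameter), uses the approximate counts from the table in this article, and counts weights only; activations, caches, and framework overhead would add to these figures.

```python
BYTES_PER_PARAM_BF16 = 2  # assumption: weights stored in BF16

# Approximate parameter counts as listed in this article's table.
APPROX_PARAMS = {
    "WorldMirror 2.0": 1.2e9,
    "HY-Pano 2.0": 80e9,
    "HY-Pano-2-Qwen": 425e6,
    "WorldStereo 2.0": 17e9,
}


def weight_footprint_gb(num_params: float,
                        bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Return the weight-only memory footprint in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, params in APPROX_PARAMS.items():
        print(f"{name}: ~{weight_footprint_gb(params):.2f} GB of weights")
```

Under these assumptions the full panorama model needs on the order of 160 GB for weights alone, against under 1 GB for the lightweight variant, which is why the smaller checkpoint matters for single-GPU users.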

How generation works

For text-to-world or single-image-to-world, HY-World 2.0 runs a four-stage pipeline.^[2]

1. Panorama generation. HY-Pano 2.0 turns the prompt or input image into a 360-degree panorama that anchors the scene.
2. Trajectory planning. WorldNav, guided by a vision-language model, plans a camera path through the scene with obstacle-aware navigation, so the virtual camera does not try to fly through walls.
3. World expansion. WorldStereo 2.0 generates additional keyframes along that path, using memory to keep newly revealed areas consistent with what was already seen.
4. World composition. WorldMirror 2.0 extracts frames, depth, normals, and camera parameters, then trains a 3D Gaussian splat representation that ties everything into one coherent, persistent scene.
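The four stages above can be read as a simple hand-off of data from one module to the next. The sketch below models only that ordering and the dependencies between stages; every class, function, and field name is hypothetical, the stage bodies are placeholders rather than model calls, and none of it reflects the actual HY-World 2.0 API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WorldState:
    """Data passed between stages (all fields are illustrative placeholders)."""
    prompt: str
    panorama: str = ""
    camera_path: List[str] = field(default_factory=list)
    keyframes: List[str] = field(default_factory=list)
    scene: str = ""
    completed: List[str] = field(default_factory=list)


def panorama_generation(state: WorldState) -> WorldState:
    # Stand-in for HY-Pano 2.0: prompt or image -> 360-degree panorama.
    state.panorama = f"panorama({state.prompt})"
    state.completed.append("panorama_generation")
    return state


def trajectory_planning(state: WorldState) -> WorldState:
    # Stand-in for WorldNav: plan camera poses through the panorama.
    if not state.panorama:
        raise ValueError("trajectory planning needs a panorama")
    state.camera_path = [f"pose_{i}" for i in range(3)]
    state.completed.append("trajectory_planning")
    return state


def world_expansion(state: WorldState) -> WorldState:
    # Stand-in for WorldStereo 2.0: one new keyframe per planned pose.
    if not state.camera_path:
        raise ValueError("world expansion needs a camera path")
    state.keyframes = [f"keyframe@{pose}" for pose in state.camera_path]
    state.completed.append("world_expansion")
    return state


def world_composition(state: WorldState) -> WorldState:
    # Stand-in for WorldMirror 2.0 plus splat fitting: keyframes -> one scene.
    if not state.keyframes:
        raise ValueError("world composition needs keyframes")
    state.scene = f"splats({len(state.keyframes)} keyframes)"
    state.completed.append("world_composition")
    return state


STAGES: List[Callable[[WorldState], WorldState]] = [
    panorama_generation,
    trajectory_planning,
    world_expansion,
    world_composition,
]


def run_pipeline(prompt: str) -> WorldState:
    """Run the four placeholder stages in order and return the final state."""
    state = WorldState(prompt=prompt)
    for stage in STAGES:
        state = stage(state)
    return state


if __name__ == "__main__":
    print(run_pipeline("a quiet courtyard at dusk").scene)
```

Each stage refuses to run without its predecessor's output, which mirrors the way the panorama anchors the path, the path drives the keyframes, and the keyframes feed the final scene.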

The reconstruction path is simpler: feed multi-view images or a video to WorldMirror 2.0 and get the same kind of 3D output without the generative front end.
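For readers unfamiliar with how depth maps, camera intrinsics, and point clouds relate, the standard pinhole-camera relation is a useful reference: a pixel (u, v) with depth d maps to the camera-space point ((u - cx) d / fx, (v - cy) d / fy, d). The sketch below applies that textbook formula in plain Python. It is not a description of WorldMirror 2.0's internals, since the article describes the model as predicting point clouds directly, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def unproject_depth(depth: Sequence[Sequence[float]],
                    fx: float, fy: float,
                    cx: float, cy: float) -> List[Point]:
    """Lift a depth map to camera-space 3D points with the pinhole model.

    depth[v][u] is the depth at pixel column u, row v; non-positive
    values are treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0:
                continue  # no valid depth at this pixel
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append((x, y, d))
    return points


if __name__ == "__main__":
    tiny_depth = [[2.0, 0.0],
                  [2.0, 4.0]]
    for p in unproject_depth(tiny_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5):
        print(p)
```

Applying the camera extrinsics to these points would then place them in a shared world frame, which is what lets several views be merged into one scene.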

Where it fits in the 2026 world-model race

By 2026, "world models" had become one of the most crowded areas in AI, and the entrants do not all mean the same thing by the term. It helps to group them by what they actually hand you.

Genie 3 from Google DeepMind, previewed in August 2025, is a real-time interactive world model. It generates navigable environments at 720p and around 24 frames per second and keeps them consistent for a few minutes, but the world is a stream of generated frames rather than a downloadable 3D file, and access stayed limited to a research preview.^[6] NVIDIA Cosmos, including the Cosmos 3 generation shown at COMPUTEX 2026, takes a different angle. Cosmos is a family of world foundation models aimed squarely at physical AI: generating photorealistic, physics-grounded video and synthetic data to train robots and autonomous vehicles, with components for prediction, transfer, and reasoning.^[7]

The closest comparison is Marble from World Labs, the spatial-intelligence company co-founded by Fei-Fei Li, which launched commercially in late 2025. Marble accepts text, images, video, or coarse 3D layouts and, like HY-World 2.0, produces persistent, downloadable 3D environments that export to Gaussian splats, meshes, or video.^[8] The two arrive at a similar product from opposite directions: Marble is a polished hosted service, while HY-World 2.0 is an open release you run yourself. That openness, weights and code under a community license, is the main thing HY-World 2.0 brings to a field where the strongest interactive models have stayed closed.

Open-source release and license

Tencent released HY-World 2.0 in stages rather than all at once.^[2] The April 16, 2026 launch shipped the technical report and the WorldMirror 2.0 reconstruction code. HY-Pano 2.0 inference code and weights followed on May 11, and the world generation code plus WorldStereo 2.0 weights landed on May 18. Everything is distributed under the tencent-hy-world-2.0-community license, with checkpoints on Hugging Face and ModelScope and an interactive demo at Tencent's sceneTo3D site.^[2] As with HunyuanWorld 1.0's Lite edition, the staged structure and a smaller HY-Pano variant suggest Tencent is trying to keep at least part of the system reachable on hardware short of a multi-GPU server.

Use cases

The pitch for outputting real 3D assets rather than video is reusability. Because a generated or reconstructed scene comes out as editable meshes and Gaussian splats, the obvious applications are content and game development: rapidly blocking out maps, level prototypes, and environment art that artists can then refine in their existing tools.^[1] Compatibility with Isaac Sim points at the other big use case, robotics and embodied AI, where a cheap supply of varied, physically plausible 3D scenes is useful for training and testing agents in simulation. Digital twins and VR or AR content round out the list. The honest caveat is that generative 3D worlds in 2026 are still uneven on fine geometry and physical accuracy, so how much of this holds up in production rather than demos is something the benchmarks in the technical report, together with independent outside testing, will have to settle.

References

1. Tencent Hunyuan, "HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds" (technical report), April 16, 2026. https://3d-models.hunyuan.tencent.com/world/world2_0/HY_World_2_0.pdf
2. Tencent-Hunyuan, "HY-World-2.0" (GitHub repository and Hugging Face model card). https://github.com/Tencent-Hunyuan/HY-World-2.0 and https://huggingface.co/tencent/HY-World-2.0
3. Tencent-Hunyuan, "HunyuanWorld-1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels," July 2025. https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0 (paper: https://arxiv.org/abs/2507.21809)
4. The Decoder, "Tencent releases 'Hunyuan World Model 1.0-Lite' for faster, resource-efficient 3D scene generation," 2025. https://the-decoder.com/tencent-releases-hunyuan-world-model-1-0-as-an-open-source-ai-for-3d-scene-generation/
5. Tencent-Hunyuan, "HunyuanWorld-Voyager," September 2025. https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager
6. Google DeepMind, "Genie 3: A new frontier for world models," August 2025. https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
7. NVIDIA, "NVIDIA Cosmos: World Foundation Models Powering Physical AI." https://www.nvidia.com/en-us/ai/cosmos/
8. World Labs, "Marble: A Multimodal World Model," 2025. https://www.worldlabs.ai/blog/marble-world-model
