HunyuanWorld 1.0

Updated May 31, 2026

HunyuanWorld 1.0 is an open model from Tencent that generates explorable 3D worlds from a text prompt or a single image. The Hunyuan team released it on July 26, 2025, and it belongs to the family of generative AI systems that try to build whole scenes rather than single objects or flat pictures. The pitch is easy to state and hard to deliver. You describe a place, or you hand the model one photo of a place, and you get back a 3D environment you can walk through, render, and edit with ordinary graphics tools.

The project frames itself as "the first open-source, simulation-capable, immersive 3D world generation model" that is also compatible with existing computer graphics pipelines. ^[1]^[2] That last part matters most in practice. Plenty of research systems can produce something that looks like a 3D scene, but the output is often locked inside a custom renderer or a point cloud that game engines cannot use directly. HunyuanWorld 1.0 exports meshes, so its worlds drop into the same pipelines artists already use for games, film, and virtual reality.

What it generates

The model works in two modes. In text-to-world mode you write a prompt, something like a sunlit medieval courtyard or a neon street at night, and the system invents a matching scene. In image-to-world mode you supply a single photo or rendered frame, and the model extends it into a full surrounding environment. ^[1]^[3] In both cases the result is a 360-degree world. You are not stuck looking at the scene from one fixed camera. You can turn around and the world is there behind you.

The output is mesh-based, which is the choice that separates this work from video-style world generators. A mesh is the standard way computer graphics describes a surface: a set of vertices and the faces that connect them. Almost every engine and modeling tool speaks that language. Because the scene comes out as meshes, you can light it, apply materials, run collision against it, and import it into tools like Blender, Unity, or Unreal Engine without converting from some exotic format first. ^[1]^[3]
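To make the vertices-and-faces idea concrete, here is a minimal sketch that serializes a two-triangle square to Wavefront OBJ text. OBJ is used only because it is a simple, widely supported mesh format; this is not a claim about which file format the release writes, and the names `mesh_to_obj`, `quad_vertices`, and `quad_faces` are illustrative.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]  # zero-based indices into the vertex list


def mesh_to_obj(vertices: List[Vertex], faces: List[Face]) -> str:
    """Serialize a triangle mesh to Wavefront OBJ text."""
    lines = [f"v {x:.6f} {y:.6f} {z:.6f}" for x, y, z in vertices]
    # OBJ face indices are one-based, so shift every index by one.
    lines += [f"f {a + 1} {b + 1} {c + 1}" for a, b, c in faces]
    return "\n".join(lines) + "\n"


# A unit square in the XY plane, split into two triangles.
quad_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
quad_faces = [(0, 1, 2), (0, 2, 3)]
obj_text = mesh_to_obj(quad_vertices, quad_faces)
print(obj_text)
```

The resulting text is the kind of file Blender, Unity, and Unreal Engine can import directly, which is the practical meaning of pipeline compatibility here.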

How it works

The technical report describes the system as a framework that combines the strengths of two earlier approaches. ^[3] Video-based world generators give you rich variety but tend to drift out of 3D consistency, and they are expensive to render. Pure 3D methods stay geometrically consistent but are starved for training data and lean on memory-hungry representations. HunyuanWorld 1.0 tries to take the useful parts of each.

The pipeline runs in stages. First the model generates a panorama, a single wraparound image that captures the whole scene in 360 degrees. This panorama acts as what the authors call a world proxy, a 2D stand-in for the 3D world that is cheap to produce and easy to keep coherent. ^[3] The panorama generator is a diffusion transformer, referred to as Panorama-DiT, and the open release is built on the FLUX text-to-image diffusion model. The authors note the method can be adapted to other image generators such as Stable Diffusion. ^[4]
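A panorama can stand in for a 3D world because every pixel corresponds to a viewing direction. The sketch below assumes an equirectangular layout, the most common panorama format, and an axis convention chosen for illustration (+Y up, image center looking along +Z). Neither assumption is taken from the report, and `pixel_to_direction` is a hypothetical helper.

```python
import math
from typing import Tuple


def pixel_to_direction(u: float, v: float, width: int, height: int) -> Tuple[float, float, float]:
    """Map an equirectangular pixel (u, v) to a unit view direction.

    Assumed convention: +Y is up, the image center looks along +Z,
    and longitude increases toward the right edge of the image.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi  # longitude in [-pi, pi)
    lat = (0.5 - (v + 0.5) / height) * math.pi       # latitude, +pi/2 at the top row
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


# The exact image center maps to the forward direction.
forward = pixel_to_direction(1023.5, 511.5, 2048, 1024)
```

Once each pixel has a direction, a per-pixel depth value is enough to place it in space, which is what the later reconstruction stages rely on.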

Second comes the part the authors lean on hardest, a semantically layered 3D mesh representation. Instead of treating the panorama as one flat shell, the system separates it into meaningful layers. Sky goes in one layer, the distant background in another, and individual foreground objects in their own layers. ^[3] The decomposition is handled by a vision-language model in an agentic loop, with object detection from Grounding DINO and segmentation from ZIM. Each layer then gets a depth estimate from a geometry model such as MoGe, and the layers are turned into separate meshes by sheet warping, a grid-mesh technique drawn from earlier work called WorldSheet. The meshes are composited into one navigable scene. ^[4] This layered reconstruction is what gives the world real depth instead of looking like a photo pasted on the inside of a sphere.
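The core of turning a depth-annotated layer into geometry can be sketched as follows. This is a simplified illustration rather than the authors' implementation. It assumes an equirectangular layer, places one vertex per depth sample along that sample's viewing ray, and joins neighbors into triangles. The function name `warp_sheet` and the axis convention are assumptions, and the sketch omits details a real pipeline would need, such as layer masks, pole caps, and handling of depth discontinuities.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Tri = Tuple[int, int, int]


def warp_sheet(depth: Sequence[Sequence[float]]) -> Tuple[List[Vec3], List[Tri]]:
    """Lift a regular grid over a panoramic layer into 3D using per-sample depth."""
    rows, cols = len(depth), len(depth[0])
    vertices: List[Vec3] = []
    for r in range(rows):
        lat = (0.5 - (r + 0.5) / rows) * math.pi
        for c in range(cols):
            lon = ((c + 0.5) / cols - 0.5) * 2.0 * math.pi
            d = depth[r][c]
            # Push the grid vertex out along its viewing ray by the depth value.
            vertices.append((
                d * math.cos(lat) * math.sin(lon),
                d * math.sin(lat),
                d * math.cos(lat) * math.cos(lon),
            ))

    faces: List[Tri] = []
    for r in range(rows - 1):
        for c in range(cols):
            c_next = (c + 1) % cols  # wrap horizontally so the panorama closes
            top_left = r * cols + c
            top_right = r * cols + c_next
            bottom_left = (r + 1) * cols + c
            bottom_right = (r + 1) * cols + c_next
            # Two triangles per grid cell.
            faces.append((top_left, bottom_left, top_right))
            faces.append((top_right, bottom_left, bottom_right))
    return vertices, faces


# A tiny 3x4 layer at constant depth 2.0 becomes a band of a sphere of radius 2.
verts, tris = warp_sheet([[2.0] * 4 for _ in range(3)])
```

Running one such pass per layer, with sky, background, and each foreground object kept separate, yields the stack of meshes that the compositing step then merges into one scene.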

Third, the layering pays off as interactivity. Because foreground objects are disentangled into their own meshes rather than baked into the background, you can select a tree or a crate and move it, replace it, or animate it. ^[3] For the foreground objects the pipeline can call an image-to-3D generator such as Tencent's own Hunyuan 3D to produce clean standalone assets. ^[4] The public release ships four model checkpoints, two for panorama generation from text or image and two for inpainting the scene and the sky. ^[5]
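The practical effect of disentangled layers can be shown with a toy scene structure. The sketch assumes each layer is stored as its own mesh under a name. `LayerMesh`, `translate_layer`, and the layer names are hypothetical and do not reflect the release's data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class LayerMesh:
    vertices: List[Vec3]
    faces: List[Tuple[int, int, int]] = field(default_factory=list)


def translate_layer(scene: Dict[str, LayerMesh], name: str, offset: Vec3) -> None:
    """Move one named layer in place, leaving every other layer untouched."""
    dx, dy, dz = offset
    layer = scene[name]
    layer.vertices = [(x + dx, y + dy, z + dz) for x, y, z in layer.vertices]


scene = {
    "sky": LayerMesh([(0.0, 100.0, 0.0)]),
    "background": LayerMesh([(0.0, 0.0, 50.0)]),
    "crate": LayerMesh([(1.0, 0.0, 2.0), (2.0, 0.0, 2.0), (1.0, 1.0, 2.0)], [(0, 1, 2)]),
}

# Slide the crate without disturbing the sky or the background.
translate_layer(scene, "crate", (0.5, 0.0, -1.0))
```

If the crate were baked into the background mesh, the same edit would require cutting geometry apart first, which is the problem the layered representation avoids.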

The headline advantages from the report line up with these stages. ^[3]

Capability	What it provides
360-degree panoramic world proxy	Immersive scenes you can look around in any direction
Mesh export	Direct compatibility with standard graphics pipelines
Semantic layering	Sky, background, and foreground reconstructed as separate meshes
Disentangled objects	Individual items can be moved, swapped, or animated
Text and image input	World generation from a prompt or from a single picture

How it performs

The team evaluated HunyuanWorld 1.0 against earlier systems on two tasks. For panorama generation it compared against methods such as Diffusion360, MVDiffusion, PanFusion, and LayerPano3D. For full 3D world generation it compared against WonderJourney, DimensionX, and LayerPano3D. ^[3] The metrics are the usual mix for this area: no-reference image quality scores such as BRISQUE, NIQE, and Q-Align, plus CLIP-based scores for how well a result matches the text prompt and the input image. ^[3] The paper reports state-of-the-art results across these comparisons. For text-to-world generation, for instance, it lists a BRISQUE of 34.6 (lower is better) and a CLIP text score of 24.0 (higher is better) against LayerPano3D's 35.3 and 22.0, and for image-to-world it reports a BRISQUE of 36.2 against WonderJourney's 51.8. ^[6]

These are the authors' own numbers from the technical report, so they should be read as the developers' evaluation rather than an independent benchmark. The broader point is that HunyuanWorld 1.0 holds up on image quality while also delivering a usable 3D mesh, which most of the baselines do not.
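To put the quoted figures in proportion, the short calculation below expresses each reported score as a change relative to the baseline it is compared with. It uses only the numbers cited above. `relative_change` is an illustrative helper, and the comments note that lower is better for BRISQUE while higher is better for CLIP scores.

```python
def relative_change(ours: float, baseline: float) -> float:
    """Signed change of `ours` relative to `baseline`, as a fraction."""
    return (ours - baseline) / baseline


# Figures quoted from the technical report's tables.
brisque_text = relative_change(34.6, 35.3)   # vs LayerPano3D; lower BRISQUE is better
clip_text = relative_change(24.0, 22.0)      # vs LayerPano3D; higher CLIP score is better
brisque_image = relative_change(36.2, 51.8)  # vs WonderJourney; lower BRISQUE is better

for label, value in [
    ("text-to-world BRISQUE", brisque_text),
    ("text-to-world CLIP-T", clip_text),
    ("image-to-world BRISQUE", brisque_image),
]:
    print(f"{label}: {value:+.1%}")
```

The text-to-world BRISQUE gap over LayerPano3D is about 2 percent, the CLIP text score gap is about 9 percent, and the image-to-world BRISQUE gap over WonderJourney is about 30 percent.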

Use cases

The model is aimed at any workflow that needs a 3D environment quickly. The paper names virtual reality, physical simulation, game development, and interactive content creation. ^[3] For VR the 360-degree panoramic worlds are a natural fit, since the whole point is to look around freely. For games and film the mesh output and disentangled objects let a small team rough in a level or a set and then refine it by hand. For robotics and embodied AI the meshes can serve as training environments, places where an agent can be dropped to learn navigation or manipulation, which is where the simulation-capable label comes from.

Open release and licensing

Tencent published the code and weights openly, with the repository on GitHub and the model on Hugging Face. ^[1]^[2] The release covers both generation modes and the supporting pipeline. It is not a hosted demo behind an API but a model you download and run yourself.

Running the full model at high resolution takes a serious GPU, and the original release leaned toward data-center hardware. To widen access, on August 15, 2025 the team published a quantized build, HunyuanWorld-1.0-lite, that runs on consumer cards such as an NVIDIA RTX 4090. ^[1] That makes local use practical for people without a workstation-class accelerator, at some cost in fidelity.

The license is the Tencent HunyuanWorld-1.0 Community License Agreement, identified as tencent-hunyuanworld-1.0-community. ^[7] It allows commercial use without a fee, with two limits worth knowing. The grant applies to a defined Territory that excludes the European Union, the United Kingdom, and South Korea, so the standard terms do not cover users in those places. ^[7] And if a product built on the model passes 1 million monthly active users, the user has to request a separate license from Tencent, which Tencent can grant or refuse at its discretion. ^[7] This is the same broad shape as the licenses on other Hunyuan releases, permissive for most people in practice while carving out the largest deployments and a few jurisdictions.

Relation to Hunyuan 3D and other models

It is easy to confuse HunyuanWorld with Hunyuan 3D, and the two are related but do different jobs. Hunyuan 3D generates a single 3D object from an image or a prompt, a chair, a character, a prop. HunyuanWorld generates a whole scene, the room that holds the chair and everything around it. ^[1] In fact HunyuanWorld can call Hunyuan 3D to make the foreground objects inside its scenes, so the object model is a component of the world model rather than a competitor to it. ^[4] The two share the Hunyuan brand with other Tencent generators such as HunyuanVideo for video, but each targets a separate output type.

In the wider research landscape, HunyuanWorld 1.0 belongs to the push toward a generative world model, a system that produces navigable environments rather than fixed media. It contrasts with video-first world generators that synthesize frames of a moving camera but do not hand you reusable geometry, and with object-level 3D generation that makes single assets. Its distinguishing trait is the pairing of an explorable 360-degree scene with an exported mesh that ordinary tools can read.

Later versions

Tencent kept building on the line after the 1.0 release, and the follow-ups branch in two directions. HunyuanWorld-Voyager, released on September 2, 2025, takes the video route. It is an interactive RGBD video generation model conditioned on camera input. Given a single image and a user-defined camera path, it generates world-consistent 3D point-cloud sequences, producing aligned color and depth frames for direct 3D reconstruction. ^[8] Tencent described it as the industry's first ultra-long-range world model with native 3D reconstruction, and it ranked first on Stanford's WorldScore benchmark. ^[8] Where 1.0 builds a static mesh world you explore freely, Voyager focuses on long camera traversals with depth, which puts it closer to the video-based branch the original paper had set out to complement.

The numbered line continued separately. HunyuanWorld 1.1, also called WorldMirror, arrived on October 22, 2025 and reconstructs 3D scenes from videos or multi-view images. ^[1] Later updates followed, including a real-time interactive system and a 2.0 release, which shows how quickly the area moved over the months after the first model.

The table below sorts out the family members that are easy to mix up.

Release	Date	What it does
HunyuanWorld 1.0	July 2025	Text or image to a layered, mesh-based 3D world
HunyuanWorld-1.0-lite	August 2025	Quantized build for consumer GPUs
HunyuanWorld-Voyager	September 2025	Camera-controllable RGBD video with native 3D reconstruction
HunyuanWorld 1.1 (WorldMirror)	October 2025	3D reconstruction from video or multi-view images

Limitations

The honest limits sit alongside the strengths. The full model wants a large GPU, so high-resolution generation is out of reach for most individual users without the quantized lite build or rented cloud hardware. ^[1] The output is a reconstruction from a single panorama, so geometry far from the original viewpoint is inferred rather than observed, and surfaces that were never visible in the proxy can look thin or incomplete when you move deep into the scene. Disentangled objects help interactivity, but the separation is only as good as the segmentation, so cluttered scenes can leave objects fused or mislabeled. And the licensing carve-outs mean teams in the European Union, the United Kingdom, and South Korea, or anyone planning a product at million-user scale, cannot simply assume the default terms cover them. ^[7] None of this undercuts the core idea. For a freely available model, getting an editable, engine-ready 3D world out of a sentence is a real step, and the open release lets the rest of the field build on it.

References

1. Tencent Hunyuan. "HunyuanWorld-1.0." GitHub repository. https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0
2. Tencent. "HunyuanWorld-1 model card." Hugging Face. https://huggingface.co/tencent/HunyuanWorld-1
3. HunyuanWorld Team, Tencent. "HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels." arXiv:2507.21809, July 2025. https://arxiv.org/abs/2507.21809
4. HunyuanWorld 1.0 technical report, HTML version (pipeline, FLUX base, Panorama-DiT, sheet warping, Hunyuan3D foreground, MoGe depth, ZIM and Grounding DINO). https://arxiv.org/html/2507.21809v2
5. Tencent Hunyuan. "HunyuanWorld-1.0 README, model checkpoints and acknowledgements." GitHub. https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0/blob/main/README.md
6. HunyuanWorld 1.0, quantitative benchmark tables (BRISQUE, NIQE, Q-Align, CLIP-T, CLIP-I). arXiv:2507.21809. https://arxiv.org/pdf/2507.21809
7. Tencent. "TENCENT HUNYUANWORLD-1.0 COMMUNITY LICENSE AGREEMENT." GitHub. https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0/blob/main/LICENSE
8. Tencent Hunyuan. "HunyuanWorld-Voyager." GitHub repository. https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager
9. Black Forest Labs. "FLUX.1 image generation models." https://github.com/black-forest-labs/flux
10. The Decoder. "Tencent releases Hunyuan World Model 1.0-lite for faster, resource-efficient 3D scene generation." https://the-decoder.com/tencent-releases-hunyuan-world-model-1-0-lite-for-faster-resource-efficient-3d-scene-generation/
