
Tencent Hy
@TencentHunyuan • 39,513 subscribers
Tencent's foundation model for text, image, video, and 3D generation.

We’re open-sourcing HY-World 2.0, a multimodal world model that generates, reconstructs, and simulates interactive *3D worlds* from text, images, and videos. Outputs can be integrated into game engines and embodied simulation pipelines. Key highlights: 🔹 One-click world generation Turn text or image into interactive 3D worlds automatically. 🔹 Pipeline-ready 3D outputs Editable 3D worlds for Unity and Unreal Engine, with standard 3D exports including mesh, 3DGS, and point clouds. 🔹 Unified world model system One model family for world generation and reconstruction across synthetic and real-world scenes. 🔹 Interactive character mode Explore generated 3D worlds in real time with physics-aware movement and collision support. ✨ Apply for access: 🔗 GitHub: 🤗 Hugging Face: 📄 Technical Report:
Tencent Hy • 368,087 views • 1 month ago

We're thrilled to release and open-source Hunyuan3D World Model 1.0! This model enables you to generate immersive, explorable, and interactive 3D worlds from just a sentence or an image. It's the industry's first open-source 3D world generation model, compatible with CG pipelines for full editability and simulation. It is set to transform game development, VR, digital content creation, and more. Get started now👇🏻 Project Page: Try it now: Github: Hugging Face:
Tencent Hy • 1,229,913 views • 10 months ago

✨We are excited to open-source Tencent HY-Motion 1.0, a billion-parameter text-to-motion model built on the Diffusion Transformer (DiT) architecture and flow matching. Tencent HY-Motion 1.0 empowers developers and individual creators alike by transforming natural language into high-fidelity, fluid, and diverse 3D character animations, delivering exceptional instruction-following capabilities across a broad range of categories. The generated 3D animation assets can be seamlessly integrated into typical 3D animation pipelines.🎮🎥 Highlights: 🔹Billion-Scale DiT: Successfully scaled flow-matching DiT to 1B+ parameters, setting a new ceiling for instruction-following capability and generated motion quality. 🔹Full-Stage Training Strategy: The industry’s first motion generation model featuring a complete Pre-training → SFT → RL loop to optimize physical plausibility and semantic accuracy. 🔹Comprehensive Category Coverage: Features 200+ motion categories across 6 major classes—the most comprehensive in the industry, curated via a meticulous data pipeline. 🌐Project Page: 🔗Github: 🤗Hugging Face: 📄Technical report:
Tencent Hy • 327,656 views • 5 months ago

We’re excited to announce the release and open-sourcing of HunyuanImage 3.0, the largest and most powerful open-source text-to-image model to date, with over 80 billion total parameters, of which 13 billion are activated per token during inference. Its output quality is fully comparable to that of the industry’s flagship closed-source models.🚀🚀🚀 HunyuanImage 3.0 originates from our internally developed native multimodal large language model, with fine-tuning and post-training focused on text-to-image generation. This unique foundation gives the model a powerful set of capabilities: ✅Reason with world knowledge ✅Understand complex, thousand-word prompts ✅Generate precise text within images Unlike traditional DiT-based image generation models, HunyuanImage 3.0’s MoE architecture uses a Transfusion-based approach to deeply couple Diffusion and LLM training in a single, powerful system. Built on Hunyuan-A13B, HunyuanImage 3.0 was trained on a massive dataset: 5 billion image-text pairs, video frames, interleaved image-text data, and 6 trillion tokens of text corpora. This hybrid training across multimodal generation, understanding, and LLM capabilities allows the model to seamlessly integrate multiple tasks. Whether you're an illustrator, designer, or creator, this is built to slash your workflow from hours to minutes. HunyuanImage 3.0 can generate intricate text, detailed comics, expressive emojis, and lively, engaging illustrations for educational content. The current release focuses solely on text-to-image generation; future updates will include image-to-image, image editing, multi-turn interaction, and more. 👉🏻Try it now: 🔗GitHub: 🤗Hugging Face:
Tencent Hy • 412,491 views • 8 months ago

🚀🚀🚀Introducing HY World 1.5 (WorldPlay)! We have now open-sourced the most systematic and comprehensive real-time world model framework in the industry. In HY World 1.5, we develop WorldPlay, a streaming video diffusion model that enables real-time, interactive world modeling with long-term geometric consistency, resolving the trade-off between speed and memory that limits current methods. You can generate and explore 3D worlds simply by inputting text or images. Walk, look around, and interact like you're playing a game. Highlights: 🔹Real-Time: Generates long-horizon streaming video at 24 FPS with superior consistency. 🔹Geometric Consistency: Achieved using a Reconstituted Context Memory mechanism that dynamically rebuilds context from past frames to alleviate memory attenuation. 🔹Robust Control: Uses a Dual Action Representation for robust response to user keyboard and mouse inputs. 🔹Versatile Applications: Supports both first-person and third-person perspectives, enabling applications like promptable events and infinite world extension. 👉🏻Try it now: 🌐Project Page: 🔗Github: 🤗Hugging Face: 📄Technical Report:
Tencent Hy • 189,561 views • 5 months ago

Today, we introduce HunyuanImage 3.0-Instruct, a native multimodal model focusing on image-editing by integrating visual understanding with precise image synthesis! 🚀 It understands input images and reasons before generating images. Built on an 80B-parameter MoE architecture (13B activated), it natively unifies deep multimodal comprehension and high-fidelity generation. 🧠 A "Thinking" Model with Native CoT & MixGRPO: The model doesn’t just execute commands, it processes them through a Native Chain-of-Thought (CoT) schema. Enhanced by our self-developed MixGRPO algorithm, it reasons through complex instructions to achieve flawless intent alignment and human-preference consistency. 🎨 Precise Editing & Multi-Image Fusion: The model enables accurate image editing by adding, removing, or modifying elements while keeping non-target areas perfectly intact. It also excels at seamless multi-image fusion, synthesizing complex scenes by extracting and blending elements from multiple sources into a unified, consistent output. 🏆 SOTA Performance: HunyuanImage 3.0-Instruct sets a new benchmark in visual quality and alignment, delivering performance that matches leading proprietary models. We aim to enable the community to explore new ideas with a state-of-the-art foundation model, fostering a dynamic and vibrant image generation ecosystem. 🛠️🎨 💻Try it at (PC only):
Tencent Hy • 125,611 views • 4 months ago

🚀🚀🚀Hunyuan 3D Studio just leveled up to 1.1! We've integrated the art-grade 3D generative model, Hunyuan 3D-PolyGen 1.5, to deliver the industry's most advanced mesh quality directly to your workflow. 🎨 🖌️ Art-Grade Quad Mesh: We've pioneered an end-to-end native quad mesh generation method. Unlike older methods that generated only tri-meshes, PolyGen 1.5 directly learns quad topology to produce cleaner, continuous edge loops and superior wireframe quality. 🎮 Professional Viability: Achieving this topology standard makes models instantly production-ready for game artists, 3D designers, and developers across game development, animation, and VR content creation. ⚙️ Flexible Output: PolyGen 1.5 supports both Quad and Triangular Topology, ensuring viability for both soft-surface and hard-surface models in professional pipelines. PolyGen 1.5 sets a new SOTA in stability, detail, and wireframe quality. Explore Hunyuan 3D Studio 1.1 and see the results:
Tencent HY • 145,578 views • 6 months ago

Today, we are open-sourcing Hunyuan World 1.1 (WorldMirror), a universal feed-forward 3D reconstruction model. 🚀🚀🚀 While our previously released Hunyuan World 1.0 (open-sourced, lite version deployable on consumer GPUs) focused on generating 3D worlds from text or single-view images, Hunyuan World 1.1 significantly expands the input scope by unlocking video-to-3D and multi-view-to-3D world creation. Highlights: 🔹Any Input, Maximized Flexibility and Fidelity: Flexibly integrates diverse geometric priors (camera poses, intrinsics, depth maps) to resolve structural ambiguities and ensure geometrically consistent 3D outputs. 🔹Any Output, SOTA Results: This elegant architecture simultaneously generates multiple 3D representations: dense point clouds, multi-view depth maps, camera parameters, surface normals, and 3D Gaussian splats. 🔹Single-GPU & Fast Inference: As an all-in-one, feed-forward model, Hunyuan World 1.1 runs on a single GPU and delivers all 3D attributes in a single forward pass, within seconds. 🌐Project Page: 🔗Github: 🤗Hugging Face: ✨Demo: 📄Technical Report:
Tencent Hy • 168,144 views • 7 months ago

HunyuanWorld-Voyager is here and fully open-source! The world’s first ultra-long-range world model with native 3D reconstruction, redefining AI-driven spatial intelligence for VR, gaming, and simulations. ✅Direct 3D Output: Exports point cloud videos to 3D formats without tools like COLMAP, enabling instant 3D application use. ✅Innovative 3D Memory: Introduces a scalable world caching mechanism, ensuring geometric consistency across any camera trajectory. ✅Top-Ranked Performance: #1 on Stanford’s WorldScore, excelling in video generation and 3D reconstruction benchmarks. Built on HunyuanWorld 1.0, Voyager blends video generation with 3D modeling, delivering camera-controlled, high-fidelity RGB-D sequences. Control scenes via keyboard or joystick for unmatched 3D consistency. Explore now: 🌐Project Page: 🔗GitHub: 🤗HuggingFace: 📝Technical Details:
Tencent Hy • 198,207 views • 9 months ago

🚀 Hunyuan 3D 2.1 is here! The first fully open-source, production-ready PBR 3D generative model! ✅Cinema-grade visuals: PBR material synthesis brings leather, bronze, and more to life with stunning light interactions. ✅ Fully open-source: Model weights, training/inference code, data pipelines, and architecture—yours to fine-tune! ✅ Runs on consumer GPUs, empowering creators, devs, and small teams. Be the first to shape the future of 3D! Download now and build something epic. Model: Demo: Github: Hunyuan 3D Creation Engine: #Hunyuan3D #OpenSource #3DCreation
Tencent Hy • 236,173 views • 11 months ago

We are excited to unveil HunyuanVideo 1.5, the strongest open-source video generation model. Built upon DiT architecture, it redefines the open-source SOTA for accessibility and performance.🚀🚀🚀 HunyuanVideo 1.5 delivers state-of-the-art visual quality and motion coherence while drastically lowering the entry barrier for developers and creators: ⚡️ Unmatched Accessibility: Ultra-light 8.3B parameters, deployable on consumer GPUs with only 14GB VRAM. 🖥️ HD Cinematic Quality: Natively generates 5–10 second 480p/720p HD videos, with super-resolution support for 1080p cinematic quality. By merging SOTA performance with high hardware efficiency, HunyuanVideo 1.5 sets the new technical baseline for the open-source community. 🌐Project Page: 🔗Github: 🤗Hugging Face: 📄Technical Report:
Tencent Hy • 128,008 views • 6 months ago

🚀 Introducing HunyuanVideo-Avatar, a model jointly developed by Tencent Hunyuan and Tencent Music, bringing photos to life. ✅ Upload a photo + audio — auto-detect scene context & emotion, then generate lifelike speech/singing with dynamic visuals. ✅ Supports multi-style, multi-species scenarios and excels in multi-character interactions. ✅ Built for short video creation, e-commerce, advertising, and more — already deployed in multiple apps across Tencent Music Entertainment Group. The single-character mode of HunyuanVideo-Avatar is now open-sourced and live on the Hunyuan website, supporting audio up to 14 seconds for video generation. The multi-character mode will be open-sourced soon. Try it on: Project Page: Github: Tech report: #Tencent #Hunyuan
Tencent Hy • 218,174 views • 1 year ago

🚀We are thrilled to open-source Hunyuan-GameCraft, a high-dynamic interactive game video generation framework built on HunyuanVideo. It generates playable and physically realistic videos from a single scene image and user action signals, empowering creators and developers to "direct" games with first-person or third-person perspectives. Key Advantages: 🔹High Dynamics: Unifies standard keyboard inputs into a shared continuous action space, enabling high-precision control over velocity and angle. This allows for the exploration of complex trajectories, overcoming the stiff, limited motion of traditional models. It can also generate dynamic environmental content like moving clouds, rain, snow, and water flow. 🔹Long-term Consistency: Uses a hybrid history condition to preserve the original scene information after significant movement. 🔹Significant Cost Reduction: No need for expensive modeling/rendering. PCM distillation compresses inference steps, boosting speed and lowering costs. This allows the quantized 13B model to run on consumer-grade GPUs like the RTX 4090. Project Page: Code: Technical Report: Hugging Face:
Tencent Hy • 169,533 views • 9 months ago

Meet Tencent HY 3D Studio 1.2 👋 With this major upgrade to our 3D creation pipeline, you can generate assets with sculpt-level detail and fine-grained interactive control. Starting today, the studio is officially open for Public Beta, with no application required. Tencent HY 3D Studio 1.2 introduces a suite of features built for creative precision, including: 🧩 PartGen 1.5: High-Precision Component Partitioning • 1536³ Resolution: Boosted from 1024³ for crystal-clear model splitting and ultra-fine detail retention. • Fine-Grained Interaction: Introducing intuitive, brush-based control for precise, manual component editing. • Shape Adherence: Drastically improved geometry integrity for even the most intricate objects. 🎨 Tencent HY 3D 3.1: Sculpt-Level Detail • Enhanced Geometry: Fine-grained details that adapt perfectly to stylized inputs. • Texture Fidelity: A massive leap in color accuracy and texture fidelity, reflecting your original input with stunning realism. • 8-View Control: Expanded from 4 to 8 input views for ultimate reconstruction accuracy. ✨We’re so excited to see what you create! Try it now:
Tencent Hy • 83,118 views • 4 months ago

㊗️Congrats on Lvmin Zhang’s (github@lllyasviel) latest project FramePack and thank you for using and recommending HunyuanVideo. 😀So happy to see innovations based on Hunyuan and we would like to see more. ▶️FramePack's Brief Intro and Showcases Attached: FramePack is a next-frame prediction neural network structure that generates videos progressively, compressing input contexts to a constant length so that the generation workload is invariant to video length. FramePack can process a very large number of frames with 13B models even on laptop GPUs. 🔗Project Page: 🔗Paper: 🔗Code:
Tencent HY • 210,162 views • 1 year ago

🚀 Introducing HunyuanCustom: An open-source, multimodal-driven architecture for customized video generation, powered by HunyuanVideo-13B. Outperforming existing open-source models, it rivals top closed-source solutions! 🎥 Highlights: ✅Subject Consistency: Maintains identity across single & multi-subject video generation. ✅Multimodal Inputs: Supports text, images, audio, and video for highly controlled outputs. ✅High-Quality Output: Industry-leading detail, motion smoothness, and lighting realism. Key Features: 1️⃣Single-Subject Video: Upload an image + text (e.g., “He’s walking a dog”) to create coherent videos with new actions, outfits, and scenes. 2️⃣Multi-Subject Video: Generate videos with multiple subjects (e.g., a man drinking coffee in a cozy room) from separate image inputs. 3️⃣Audio-Driven Video: Sync audio with visuals for talking or singing in any scene—perfect for digital avatars, virtual customer service, and more. 4️⃣Video-Driven Video: Seamlessly insert or replace subjects into any video for creative enhancements. Dive Deeper: 🌐Project: 📝Technical Details: 💻Code: The single-subject video capability is open-sourced and live on the Hunyuan website, with more features to be released soon! 👉 Try it now: #HunyuanCustom #VideoGeneration #AI #TencentHunyuan
Tencent HY • 191,908 views • 1 year ago

🚀Introducing Hunyuan3D-PolyGen, our newly upgraded and industry-first art-grade 3D generative model. It brings effortless intelligent retopology, making AI-generated models ready for professional art pipelines. ✅ Superior Mesh Topology: Our self-developed mesh autoregressive model ensures higher-quality mesh topology that meets stringent art standards. ✅ Complex Object Modeling: Leveraging our high-compression BPT representation, we can generate models with 10K+ faces, enabling more complex geometry, higher topology precision, and better detail. ✅ Flexible Output: Supports both tri and quad meshes, meeting diverse pipeline requirements. Hunyuan3D-PolyGen enables direct application of AI-generated 3D assets in game development and significantly boosts artist modeling efficiency. It's a robust foundation for the future of 3D content creation. 👉Try it now:
Tencent Hy • 161,030 views • 11 months ago

We're releasing HY-Embodied-0.5, a family of foundation models for real-world embodied agents. The 2B model is now open source. It strengthens spatial-temporal perception and embodied reasoning for prediction, interaction, and planning. 🤖 The suite includes: 🔹 2B for edge deployment 🔹 32B for complex reasoning Key innovations: 🔹 Mixture-of-Transformers (MoT) architecture for modality-specific computation 🔹 Latent tokens for improved perceptual representation 🔹 Self-evolving post-training 🔹 On-policy distillation from large to small models Across 22 benchmarks, the 2B model outperforms similarly sized SOTA systems on 16 tasks. The 32B model approaches frontier-level performance. 🔗 GitHub: 🤗 Hugging Face:
Tencent Hy • 34,305 views • 1 month ago

Today we're announcing the open-source release of HunyuanVideo-Foley, our new end-to-end Text-Video-to-Audio (TV2A) framework for generating high-fidelity audio.🚀 This tool empowers creators in video production, filmmaking, and game development to generate professional-grade audio that precisely aligns with visual dynamics and semantic context, addressing key challenges in V2A generation.🔊 Key Innovations: 🔹Exceptional Generalization: Trained on a massive 100k-hour multimodal dataset, the model generates contextually aware soundscapes for a wide range of scenes, from natural landscapes to animated shorts. 🔹Balanced Multimodal Response: Our innovative multimodal diffusion transformer (MMDiT) architecture ensures the model balances video and text cues, generating rich, layered sound effects that capture every detail, from the main subject to subtle background elements. 🔹High-Fidelity Audio: Using a Representation Alignment (REPA) loss function and a powerful Audio VAE, we've improved generation stability and now produce professional-grade audio, free of noise and inconsistencies. HunyuanVideo-Foley achieves SOTA on multiple benchmarks, surpassing all open-source models in audio quality, visual-semantic alignment, and temporal alignment. 👉Try it now: 🌐Project Page: 🔗Code: 📄Technical Report: 🤗Hugging Face:
Tencent Hy • 122,539 views • 9 months ago