Tencent Hy

@TencentHunyuan • 44,132 subscribers

Tencent's foundation model for text, image, video, and 3D generation.

Shorts

💡HunyuanVideo 1.5 Update: We are now releasing the 480p I2V step-distilled model, which generates videos in 8 or 12 steps (recommended)! On an RTX 4090, end-to-end generation time is reduced by 75%, and a single RTX 4090 can generate a video within 75 seconds. The step-distilled model maintains quality comparable to the original model while achieving a significant speedup. For even faster generation, you can also try 4 steps (faster, with slightly reduced quality). 🔗Check out the GitHub Repo:

39,011 views

🔥🔥🔥We’ve been listening to your feedback! Our latest world model HY-World 1.5 just got a major upgrade to make world generation more accessible than ever: 🛠️ Open Training Code: Fully customizable code for building and training your own models. ⚡ Accelerated Inference: Turbocharged speed and optimized VRAM for real-time interaction. 📉 Lite 5B Model: A new lightweight model that fits into small-VRAM GPUs. 🙌 Zero Waitlist: Our online app is now fully open to everyone—no application required. This is just the beginning. HY-World is building the future of spatial intelligence—open, accessible, and community-driven. 🕹️ Play now: ⭐ GitHub:

20,581 views

With Hunyuan3D World Model 1.0 now released and open-sourced, we're excited to showcase the technical highlights behind this impressive innovation: ✅360° Panoramic Generation: Creates complete, immersive “world scenes”, far beyond localized views. ✅Explorable 3D Scene Generation: Generates diverse, spatially consistent 3D worlds from text/image for truly immersive exploration. ✅Interactive/Editable: Achieves separation of foreground objects, background terrain, ground, and sky, for seamless secondary editing. ✅Exportable Mesh: Generated scenes can be exported as 3D meshes for direct import into mainstream game engines and modeling software. ✅Industry-Leading SOTA Evaluation: Surpasses state-of-the-art open-source models in generation quality. As the industry's first open-source model for physical simulation and explorable world generation, Hunyuan3D World Model 1.0 aims to foster a collaborative community ecosystem with developers and enthusiasts. ✨ Try it now: 🤗 Hugging Face:

23,150 views

🚀Our latest Tencent Hunyuan 3D model is going open source soon! Get ready to build. 😉 #TencentHunyuan #OpenSource #3D

25,239 views

World models are trending, so let's revisit our HunyuanWorld journey. We’ve been pioneering open-source 3D world generation over the past two months, and this ride’s only getting started. 🌍 📅 July: HunyuanWorld 1.0 📌 First open-source 3D world model compatible with CG pipelines (Unity/Unreal/Blender) 📌 Hit 2K+ GitHub stars in just two months ⭐ Thank you for the love! 📅 August: 1.0-Lite 📌 Same top-tier quality, running on consumer GPUs! 📅 September: 1.0-Voyager 📌 Direct 3D output + world memory, taking exploration further! Seamlessly integrated into CG pipelines with layered 3D modeling (assets, terrain, skybox) and fully open-sourced. We’re fully committed to building open-source spatial intelligence for all! 🚀 💡 Why does it matter? ✅ Seamless CG Pipeline Integration: Export generated 3D scenes in standard mesh formats that integrate directly into industry-standard tools like Blender, Unity, and Unreal Engine for editing, animation, and physical simulation. ✅ Hierarchical Scene Editing: Deconstruct scenes into semantic layers (sky, background, foreground objects) via instance recognition and layer decomposition, allowing fine-grained control: independently modify, relocate, or replace objects without rebuilding the entire world. Project page: Github: Amazing creations by Stijn Spanhove camenduru GENEL | AIを用いた動画制作 apolinario 🌐 とりにく Directive Creator 🪥 👇 #AI #3DGeneration #OpenSource #WorldModels #Hunyuan3D #HunyuanWorld

20,178 views

Videos

We’re open-sourcing HY-World 2.0, a multimodal world model that generates, reconstructs, and simulates interactive 3D worlds from text, images, and videos. Outputs can be integrated into game engines and embodied simulation pipelines. Key highlights: 🔹 One-click world generation: Turn text or an image into an interactive 3D world automatically. 🔹 Pipeline-ready 3D outputs: Editable 3D worlds for Unity and Unreal Engine, with standard 3D exports including mesh, 3DGS, and point clouds. 🔹 Unified world model system: One model family for world generation and reconstruction across synthetic and real-world scenes. 🔹 Interactive character mode: Explore generated 3D worlds in real time with physics-aware movement and collision support. ✨ Apply for access: 🔗 GitHub: 🤗 Hugging Face: 📄 Technical Report:

370,675 views • 3 months ago

We're thrilled to release and open-source Hunyuan3D World Model 1.0! This model enables you to generate immersive, explorable, and interactive 3D worlds from just a sentence or an image. It's the industry's first open-source 3D world generation model, compatible with CG pipelines for full editability and simulation, and it is set to transform game development, VR, digital content creation, and more. Get started now👇🏻 Project Page: Try it now: Github: Hugging Face:

1,230,418 views • 11 months ago

✨We are excited to open-source Tencent HY-Motion 1.0, a billion-parameter text-to-motion model built on the Diffusion Transformer (DiT) architecture and flow matching. Tencent HY-Motion 1.0 empowers developers and individual creators alike by transforming natural language into high-fidelity, fluid, and diverse 3D character animations, delivering exceptional instruction-following capabilities across a broad range of categories. The generated 3D animation assets can be seamlessly integrated into typical 3D animation pipelines.🎮🎥 Highlights: 🔹Billion-Scale DiT: Successfully scaled flow-matching DiT to 1B+ parameters, setting a new ceiling for instruction-following capability and generated motion quality. 🔹Full-Stage Training Strategy: The industry’s first motion generation model featuring a complete Pre-training → SFT → RL loop to optimize physical plausibility and semantic accuracy. 🔹Comprehensive Category Coverage: Features 200+ motion categories across 6 major classes—the most comprehensive in the industry, curated via a meticulous data pipeline. 🌐Project Page: 🔗Github: 🤗Hugging Face: 📄Technical report:

328,493 views • 6 months ago

We’re excited to announce the release and open-sourcing of HunyuanImage 3.0, the largest and most powerful open-source text-to-image model to date, with over 80 billion total parameters, of which 13 billion are activated per token during inference. Its output quality is comparable to that of the industry’s flagship closed-source model.🚀🚀🚀 HunyuanImage 3.0 originates from our internally developed native multimodal large language model, with fine-tuning and post-training focused on text-to-image generation. This foundation gives the model a powerful set of capabilities: ✅Reason with world knowledge ✅Understand complex, thousand-word prompts ✅Generate precise text within images Unlike traditional DiT-based image generation models, HunyuanImage 3.0 uses a Mixture-of-Experts (MoE) architecture with a Transfusion-based approach that deeply couples diffusion and LLM training in a single, powerful system. Built on Hunyuan-A13B, HunyuanImage 3.0 was trained on a massive dataset: 5 billion image-text pairs, video frames, interleaved image-text data, and 6 trillion tokens of text corpora. This hybrid training across multimodal generation, understanding, and LLM capabilities allows the model to integrate multiple tasks seamlessly. Whether you're an illustrator, designer, or creator, it is built to cut your workflow from hours to minutes. HunyuanImage 3.0 can generate intricate text, detailed comics, expressive emojis, and lively, engaging illustrations for educational content. The current release focuses solely on text-to-image generation; future updates will add image-to-image, image editing, multi-turn interaction, and more. 👉🏻Try it now: 🔗GitHub: 🤗Hugging Face:

412,658 views • 9 months ago
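"80 billion total parameters, of which 13 billion are activated per token" means that for any given token only about 16% (13/80) of the weights are used. In a Mixture-of-Experts layer this typically comes from a router that scores every expert and runs only the top-k. The sketch below illustrates that general top-k routing idea under stated assumptions: the expert count, the value of k, the softmax-over-selected-experts weighting, and the toy scalar "experts" are all hypothetical and are not HunyuanImage 3.0's actual configuration or code.

```python
import math
from typing import Callable, List, Sequence, Tuple

def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts for one token and return
    (expert_index, weight) pairs; weights are a softmax over the chosen logits."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # sorted() is stable even with reverse=True, so ties keep their original order
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in chosen]

def moe_forward(x: float, logits: Sequence[float],
                experts: Sequence[Callable[[float], float]], k: int = 2) -> float:
    """Weighted sum of only the selected experts' outputs; the unselected
    experts (and their parameters) are never evaluated for this token."""
    return sum(w * experts[i](x) for i, w in top_k_route(logits, k))
```

In a real model the router logits come from a learned projection of the token's hidden state, and shared components such as attention are active for every token, which is why "activated parameters" is larger than the experts alone.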

🚀🚀🚀Introducing HY World 1.5 (WorldPlay)! We have now open-sourced the most systematic, comprehensive real-time world model framework in the industry. In HY World 1.5, we develop WorldPlay, a streaming video diffusion model that enables real-time, interactive world modeling with long-term geometric consistency, resolving the trade-off between speed and memory that limits current methods. You can generate and explore 3D worlds simply by inputting text or images. Walk, look around, and interact as if you were playing a game. Highlights: 🔹Real-Time: Generates long-horizon streaming video at 24 FPS with superior consistency. 🔹Geometric Consistency: Achieved using a Reconstituted Context Memory mechanism that dynamically rebuilds context from past frames to alleviate memory attenuation. 🔹Robust Control: Uses a Dual Action Representation for robust response to user keyboard and mouse inputs. 🔹Versatile Applications: Supports both first-person and third-person perspectives, enabling applications like promptable events and infinite world extension. 👉🏻Try it now: 🌐Project Page: 🔗Github: 🤗Hugging Face: 📄Technical Report:

189,802 views • 7 months ago

Today, we introduce HunyuanImage 3.0-Instruct, a native multimodal model focusing on image-editing by integrating visual understanding with precise image synthesis! 🚀 It understands input images and reasons before generating images. Built on an 80B-parameter MoE architecture (13B activated), it natively unifies deep multimodal comprehension and high-fidelity generation. 🧠 A "Thinking" Model with Native CoT & MixGRPO: The model doesn’t just execute commands, it processes them through a Native Chain-of-Thought (CoT) schema. Enhanced by our self-developed MixGRPO algorithm, it reasons through complex instructions to achieve flawless intent alignment and human-preference consistency. 🎨 Precise Editing & Multi-Image Fusion: The model enables accurate image editing by adding, removing, or modifying elements while keeping non-target areas perfectly intact. It also excels at seamless multi-image fusion, synthesizing complex scenes by extracting and blending elements from multiple sources into a unified, consistent output. 🏆 SOTA Performance: HunyuanImage 3.0-Instruct sets a new benchmark in visual quality and alignment, delivering performance that matches leading proprietary models. We aim to enable the community to explore new ideas with a state-of-the-art foundation model, fostering a dynamic and vibrant image generation ecosystem. 🛠️🎨 💻Try it at (PC only):

125,803 views • 5 months ago

HunyuanWorld-Voyager is here and fully open-source! The world’s first ultra-long-range world model with native 3D reconstruction, redefining AI-driven spatial intelligence for VR, gaming, and simulations. ✅Direct 3D Output: Exports point cloud videos to 3D formats without tools like COLMAP, enabling immediate use in 3D applications. ✅Innovative 3D Memory: Introduces a scalable world caching mechanism, ensuring geometric consistency across any camera trajectory. ✅Top-Ranked Performance: #1 on Stanford’s WorldScore, excelling in video generation and 3D reconstruction benchmarks. Built on HunyuanWorld 1.0, Voyager blends video generation with 3D modeling, delivering camera-controlled, high-fidelity RGB-D sequences. Control scenes via keyboard or joystick for unmatched 3D consistency. Explore now: 🌐Project Page: 🔗GitHub: 🤗HuggingFace: 📝Technical Details:

198,289 views • 10 months ago
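Voyager outputs camera-controlled RGB-D sequences and exports point clouds without running a structure-from-motion tool like COLMAP. One plausible reading is that, because each frame comes with a depth map and a known camera, points can be lifted into 3D directly. The sketch below shows standard pinhole back-projection plus a camera-to-world transform; it is a generic illustration, not Voyager's code. The intrinsics (fx, fy, cx, cy), the z-depth convention, and the row-major camera-to-world pose format are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def backproject_depth(depth: Sequence[Sequence[float]], fx: float, fy: float,
                      cx: float, cy: float) -> List[Point]:
    """Lift a row-major depth map (z-depth per pixel) into camera-space 3D points
    with the pinhole model; pixels with depth <= 0 are treated as invalid."""
    points: List[Point] = []
    for v, row in enumerate(depth):        # v: pixel row index
        for u, z in enumerate(row):        # u: pixel column index
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

def camera_to_world(points: Sequence[Point], rotation: Sequence[Sequence[float]],
                    translation: Sequence[float]) -> List[Point]:
    """Apply p_world = R @ p_cam + t so points from different frames of an
    RGB-D sequence land in one shared coordinate frame."""
    return [
        tuple(sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
              for r in range(3))
        for p in points
    ]
```

Running both functions over every frame and concatenating the results yields a single world-space point cloud; colors from the RGB channels can be attached per pixel in the same loop.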

Today, we are open-sourcing Hunyuan World 1.1 (WorldMirror), a universal feed-forward 3D reconstruction model. 🚀🚀🚀 While our previously released Hunyuan World 1.0 (open-sourced, with a lite version deployable on consumer GPUs) focused on generating 3D worlds from text or single-view images, Hunyuan World 1.1 significantly expands the input scope by unlocking video-to-3D and multi-view-to-3D world creation. Highlights: 🔹Any Input, Maximized Flexibility and Fidelity: Flexibly integrates diverse geometric priors (camera poses, intrinsics, depth maps) to resolve structural ambiguities and ensure geometrically consistent 3D outputs. 🔹Any Output, SOTA Results: This architecture simultaneously generates multiple 3D representations: dense point clouds, multi-view depth maps, camera parameters, surface normals, and 3D Gaussian Splats. 🔹Single-GPU & Fast Inference: As an all-in-one, feed-forward model, Hunyuan World 1.1 runs on a single GPU and delivers all 3D attributes in a single forward pass, within seconds. 🌐Project Page: 🔗Github: 🤗Hugging Face: ✨Demo: 📄Technical Report:

168,451 views • 9 months ago

🚀 Hunyuan 3D 2.1 is here! The first fully open-source, production-ready PBR 3D generative model! ✅Cinema-grade visuals: PBR material synthesis brings leather, bronze, and more to life with stunning light interactions. ✅ Fully open-source: Model weights, training/inference code, data pipelines, and architecture—yours to fine-tune! ✅ Runs on consumer GPUs, empowering creators, devs, and small teams. Be the first to shape the future of 3D! Download now and build something epic. Model: Demo: Github: Hunyuan 3D Creation Engine: #Hunyuan3D #OpenSource #3DCreation

236,318 views • 1 year ago

🚀🚀🚀Hunyuan 3D Studio just leveled up to 1.1! We've integrated the art-grade 3D generative model, Hunyuan 3D-PolyGen 1.5, to deliver the industry's most advanced mesh quality directly to your workflow. 🎨 🖌️ Art-Grade Quad Mesh: We've pioneered an end-to-end native quad mesh generation method. Unlike older methods that generated only tri-meshes, PolyGen 1.5 directly learns quad topology to produce cleaner, continuous edge loops and superior wireframe quality. 🎮 Professional Viability: Achieving this topology standard makes models instantly production-ready for game artists, 3D designers, and developers across game development, animation, and VR content creation. ⚙️ Flexible Output: PolyGen 1.5 supports both Quad and Triangular Topology, ensuring viability for both soft-surface and hard-surface models in professional pipelines. PolyGen 1.5 sets a new SOTA in stability, detail, and wireframe quality. Explore Hunyuan 3D Studio 1.1 and see the results:

146,221 views • 7 months ago

🚀 Introducing HunyuanVideo-Avatar, a model jointly developed by Tencent Hunyuan and Tencent Music, bringing photos to life. ✅ Upload a photo + audio — auto-detect scene context & emotion, then generate lifelike speech/singing with dynamic visuals. ✅ Supports multi-style, multi-species scenarios and excels in multi-character interactions. ✅ Built for short video creation, e-commerce, advertising, and more — already deployed in multiple apps across Tencent Music Entertainment Group. The single-character mode of HunyuanVideo-Avatar is now open-sourced and live on the Hunyuan website, supporting audio up to 14 seconds for video generation. The multi-character mode will be open-sourced soon. Try it on: Project Page: Github: Tech report: #Tencent #Hunyuan

218,408 views • 1 year ago

We are excited to unveil HunyuanVideo 1.5, the strongest open-source video generation model. Built on the DiT architecture, it redefines the open-source SOTA for accessibility and performance.🚀🚀🚀 HunyuanVideo 1.5 delivers state-of-the-art visual quality and motion coherence while drastically lowering the entry barrier for developers and creators: ⚡️ Unmatched Accessibility: Ultra-light 8.3B parameters, deployable on consumer GPUs with only 14GB VRAM. 🖥️ HD Cinematic Quality: Natively generates 5–10 second 480p/720p HD videos, with super-resolution support for 1080p cinematic quality. By merging SOTA performance with high hardware efficiency, HunyuanVideo 1.5 sets a new technical baseline for the open-source community. 🌐Project Page: 🔗Github: 🤗Hugging Face: 📄Technical Report:

128,008 views • 8 months ago

🚀We are thrilled to open-source Hunyuan-GameCraft, a high-dynamic interactive game video generation framework built on HunyuanVideo. It generates playable and physically realistic videos from a single scene image and user action signals, empowering creators and developers to "direct" games with first-person or third-person perspectives. Key Advantages: 🔹High Dynamics: Unifies standard keyboard inputs into a shared continuous action space, enabling high-precision control over velocity and angle. This allows for the exploration of complex trajectories, overcoming the stiff, limited motion of traditional models. It can also generate dynamic environmental content like moving clouds, rain, snow, and water flow. 🔹Long-term Consistency: Uses hybrid history conditioning to preserve the original scene information after significant movement. 🔹Significant Cost Reduction: No need for expensive modeling/rendering. PCM distillation compresses inference steps, boosting speed and lowering costs. This allows the quantized 13B model to run on consumer-grade GPUs like the RTX 4090. Project Page: Code: Technical Report: Hugging Face:

169,638 views • 11 months ago
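
A "shared continuous action space" can be illustrated with a minimal Python sketch. It assumes a hypothetical WASD-plus-arrow-keys binding. Pressed keys are summed into normalized translation and rotation directions, and each direction is paired with a free-valued speed. The key tables, the six-element vector layout, and `encode_action` are illustrative assumptions, not Hunyuan-GameCraft's actual encoding.

```python
import math

# Hypothetical key bindings (an assumption, not GameCraft's actual mapping).
MOVE_KEYS = {"w": (0.0, 1.0), "s": (0.0, -1.0), "a": (-1.0, 0.0), "d": (1.0, 0.0)}
TURN_KEYS = {"left": (-1.0, 0.0), "right": (1.0, 0.0), "up": (0.0, 1.0), "down": (0.0, -1.0)}


def _unit(x: float, y: float) -> tuple[float, float]:
    """Normalize a 2D vector; a zero vector stays zero (no effective key pressed)."""
    n = math.hypot(x, y)
    return (0.0, 0.0) if n == 0.0 else (x / n, y / n)


def _sum_dirs(pressed: set[str], table: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Add up the direction vectors of every pressed key found in `table`, then normalize."""
    x = sum(table[k][0] for k in pressed if k in table)
    y = sum(table[k][1] for k in pressed if k in table)
    return _unit(x, y)


def encode_action(pressed: set[str], move_speed: float = 1.0, turn_speed: float = 1.0) -> list[float]:
    """Return [move_x, move_z, move_speed, yaw, pitch, turn_speed] as one continuous vector.

    Opposite keys cancel out, diagonal combinations are normalized, and the speeds
    are continuous values rather than fixed per-key steps.
    """
    mx, mz = _sum_dirs(pressed, MOVE_KEYS)
    yaw, pitch = _sum_dirs(pressed, TURN_KEYS)
    speed = move_speed if (mx, mz) != (0.0, 0.0) else 0.0
    turn = turn_speed if (yaw, pitch) != (0.0, 0.0) else 0.0
    return [mx, mz, speed, yaw, pitch, turn]


if __name__ == "__main__":
    print(encode_action({"w", "d"}, move_speed=0.5))           # diagonal move at half speed
    print(encode_action({"w", "s", "left"}, turn_speed=0.3))   # moves cancel, slow left turn
```

Because opposite keys cancel and diagonals are normalized, W+D at speed 0.5 gives a forward-right unit direction scaled by a continuous speed, not one of a few fixed discrete moves.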

We are thrilled to announce a 30x speedup in white model (untextured mesh) generation across the entire Hunyuan3D 2.0 family, reducing processing time from 30 seconds to just 1 second.

229,733 views • 1 year ago

㊗️Congratulations to Lvmin Zhang (github@lllyasviel) on the latest project, FramePack, and thank you for using and recommending HunyuanVideo. 😀We're so happy to see innovations built on Hunyuan and look forward to seeing more.
▶️FramePack's brief intro and showcases are attached: FramePack is a next-frame prediction neural network structure that generates videos progressively. It compresses input contexts to a constant length, so the generation workload does not grow with video length. FramePack can process a very large number of frames with 13B models, even on laptop GPUs.
🔗Project Page: 🔗Paper: 🔗Code:

210,162 views • 1 year ago
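
A geometric token budget illustrates how a context can stay at a constant length while history grows. This minimal Python sketch assumes each history frame would cost a hypothetical `base_tokens` at full resolution, and that the k-th most recent frame is compressed by a factor of 2^k. The halving schedule and the 1536-token base are illustrative assumptions, not FramePack's actual compression kernels.

```python
def context_budget(num_history_frames: int, base_tokens: int = 1536) -> list[int]:
    """Token budget for each history frame, newest first (hypothetical schedule).

    The k-th most recent frame gets base_tokens >> k tokens (halved per step).
    Frames whose budget reaches 0 are dropped, so the total stays below
    2 * base_tokens regardless of video length.
    """
    budgets = []
    for k in range(num_history_frames):
        tokens = base_tokens >> k
        if tokens == 0:
            break  # older frames contribute nothing in this sketch
        budgets.append(tokens)
    return budgets


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        # Total context grows at first, then saturates at a constant length.
        print(n, sum(context_budget(n)))
```

Since base + base/2 + base/4 + ... < 2 × base, the total saturates at 3070 tokens here, whether the history holds 100 or 1000 frames. Per-step cost therefore stops depending on video length.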

🚀 Introducing HunyuanCustom: an open-source, multimodal-driven architecture for customized video generation, powered by HunyuanVideo-13B. It outperforms existing open-source models and rivals top closed-source solutions!
🎥 Highlights:
✅Subject Consistency: Maintains identity across single- and multi-subject video generation.
✅Multimodal Inputs: Supports text, image, audio, and video inputs for highly controllable outputs.
✅High-Quality Output: Industry-leading detail, motion smoothness, and lighting realism.
Key Features:
1️⃣Single-Subject Video: Upload an image plus text (e.g., “He’s walking a dog”) to create coherent videos with new actions, outfits, and scenes.
2️⃣Multi-Subject Video: Generate videos with multiple subjects (e.g., a man drinking coffee in a cozy room) from separate image inputs.
3️⃣Audio-Driven Video: Sync audio with visuals for talking or singing in any scene, ideal for digital avatars, virtual customer service, and more.
4️⃣Video-Driven Video: Seamlessly insert or replace subjects in any video for creative enhancements.
Dive Deeper: 🌐Project: 📝Technical Details: 💻Code:
The single-subject video capability is open-sourced and live on the Hunyuan website, with more features to be released soon!
👉 Try it now: #HunyuanCustom #VideoGeneration #AI #TencentHunyuan

191,908 views • 1 year ago

🚀Introducing Hunyuan3D-PolyGen, our newly upgraded, industry-first art-grade 3D generative model. Its intelligent retopology makes AI-generated models ready for professional art pipelines with little effort.
✅ Superior Mesh Topology: Our in-house autoregressive mesh model produces higher-quality topology that meets stringent art standards.
✅ Complex Object Modeling: Leveraging our high-compression BPT representation, we can generate models with 10K+ faces, enabling more complex geometry, higher topology precision, and better detail.
✅ Flexible Output: Supports both triangle and quad meshes to meet diverse pipeline requirements.
Hunyuan3D-PolyGen lets AI-generated 3D assets be used directly in game development and significantly boosts artists' modeling efficiency. It's a robust foundation for the future of 3D content creation.
👉Try it now:

161,189 views • 1 year ago
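
Supporting both triangle and quad output matters because many engines and tools consume triangles only. This minimal Python sketch shows the standard diagonal split. `quads_to_tris` is a hypothetical helper, not part of Hunyuan3D-PolyGen. Each quad becomes two triangles with the same winding order, so a 10K-face all-quad mesh turns into 20K triangles once triangulated.

```python
def quads_to_tris(faces: list[tuple[int, ...]]) -> list[tuple[int, int, int]]:
    """Triangulate a mixed tri/quad face list (hypothetical helper).

    Each quad (a, b, c, d) is split along the a-c diagonal into (a, b, c) and
    (a, c, d), which preserves the original winding order. Triangles pass through.
    """
    tris = []
    for face in faces:
        if len(face) == 3:
            tris.append(tuple(face))
        elif len(face) == 4:
            a, b, c, d = face
            tris.extend([(a, b, c), (a, c, d)])
        else:
            raise ValueError(f"expected 3 or 4 vertex indices, got {len(face)}")
    return tris


if __name__ == "__main__":
    mesh = [(0, 1, 2, 3), (2, 3, 4)]  # one quad and one triangle
    print(quads_to_tris(mesh))
```

For a non-planar quad, the a-c split and the b-d split produce different surfaces. This is one reason native quad output can be preferable when artists plan to keep editing the mesh.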

Meet Tencent HY 3D Studio 1.2 👋 With this major upgrade to our 3D creation pipeline, you can generate assets with sculpt-level detail and fine-grained interactive control. Starting today, the studio is officially open for Public Beta, with no application required.
Tencent HY 3D Studio 1.2 introduces a suite of features built for creative precision, including:
🧩 PartGen 1.5: High-Precision Component Partitioning
- 1536³ Resolution: Boosted from 1024³ for crystal-clear model splitting and ultra-fine detail retention.
- Fine-Grained Interaction: Intuitive, brush-based control for precise, manual component editing.
- Shape Adherence: Drastically improved geometry integrity, even for the most intricate objects.
🎨 Tencent HY 3D 3.1: Sculpt-Level Detail
- Enhanced Geometry: Fine-grained details that adapt closely to stylized inputs.
- Texture Fidelity: A major leap in color accuracy and texture fidelity that reflects your original input with stunning realism.
- 8-View Control: Expanded from 4 to 8 input views for more accurate reconstruction.
✨We’re so excited to see what you create! Try it now:

83,314 views • 6 months ago
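
This assumes "1536³" and "1024³" denote dense voxel grids; the post does not say how the resolution is represented internally. Under that assumption, the PartGen 1.5 upgrade is 1.5x finer per axis and about 3.4x more cells overall:

```latex
$$
\frac{1536}{1024} = 1.5 \text{ per axis}, \qquad
\left(\frac{1536}{1024}\right)^{3} = 1.5^{3} = 3.375, \qquad
1536^{3} = 3{,}623{,}878{,}656 \;\text{vs.}\; 1024^{3} = 1{,}073{,}741{,}824
$$
```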

Today we're announcing the open-source release of HunyuanVideo-Foley, our new end-to-end Text-Video-to-Audio (TV2A) framework for generating high-fidelity audio.🚀 This tool empowers creators in video production, filmmaking, and game development to generate professional-grade audio that precisely aligns with visual dynamics and semantic context, addressing key challenges in video-to-audio (V2A) generation.🔊
Key Innovations:
🔹Exceptional Generalization: Trained on a massive 100k-hour multimodal dataset, the model generates context-aware soundscapes for a wide range of scenes, from natural landscapes to animated shorts.
🔹Balanced Multimodal Response: Our multimodal diffusion transformer (MMDiT) architecture balances video and text cues, generating rich, layered sound effects that capture every detail, from the main subject to subtle background elements.
🔹High-Fidelity Audio: A Representation Alignment (REPA) loss and a powerful Audio VAE improve generation stability and yield professional-grade audio free of noise and inconsistencies.
HunyuanVideo-Foley achieves SOTA results on multiple benchmarks, surpassing all open-source models in audio quality, visual-semantic alignment, and temporal alignment.
👉Try it now: 🌐Project Page: 🔗Code: 📄Technical Report: 🤗Hugging Face:

122,706 views • 10 months ago
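
In its general form, a REPA-style loss aligns intermediate hidden states of a diffusion transformer with features from a frozen pretrained encoder. It does so by maximizing their cosine similarity after a small projection head. The post does not say which layer, encoder, or loss weight HunyuanVideo-Foley uses, so this stdlib-only Python sketch shows only the generic form. `project`, `repa_loss`, and the toy inputs are hypothetical. The sketch writes the loss as mean(1 - cosine), which differs from the negative-cosine form only by a constant.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def project(h: list[float], weight: list[list[float]]) -> list[float]:
    """Hypothetical linear projection head: weight has shape (out_dim, in_dim)."""
    return [sum(w * x for w, x in zip(row, h)) for row in weight]


def repa_loss(hidden: list[list[float]], target: list[list[float]],
              weight: list[list[float]]) -> float:
    """Generic REPA-style alignment loss: mean of (1 - cosine) over tokens.

    hidden: intermediate DiT hidden states, one vector per token.
    target: frozen pretrained-encoder features for the same tokens.
    """
    if len(hidden) != len(target) or not hidden:
        raise ValueError("hidden and target must be non-empty and token-aligned")
    sims = [cosine(project(h, weight), t) for h, t in zip(hidden, target)]
    return sum(1.0 - s for s in sims) / len(sims)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    aligned = repa_loss([[1.0, 0.0], [0.0, 2.0]], [[3.0, 0.0], [0.0, 1.0]], identity)
    orthogonal = repa_loss([[1.0, 0.0]], [[0.0, 1.0]], identity)
    print(f"{aligned:.4f} {orthogonal:.4f}")
```

In training, a term like this would be added with some weight to the usual diffusion objective. A value near 0 means the projected hidden states point the same way as the encoder features, and values up to 2 mean they point in opposite directions.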

Can AI truly edit audio, not just generate it? 🎧 Tencent Hy, together with SJTU, SII, NTU, TJU, ZODA, PKU, FDU, and other collaborators, introduces MMAE (Massive Multitask Audio Editing Benchmark), the first comprehensive evaluation benchmark for "Banana🍌"-style speech and audio editing. Instead of simply asking the AI to "generate" audio, it requires the AI to understand an existing audio clip and modify it precisely according to natural language instructions, changing what needs to be changed while leaving the rest untouched. Current models achieve an Exact Match Rate (EMR) below 5%, revealing a major gap in reliable audio editing.
MMAE includes:
✅ 2,000 high-fidelity samples from real-world scenarios
✅ 17,741 fine-grained rubric evaluation items
✅ 7 modality settings across sound, music, speech, and their mixtures
✅ 6 task complexity levels, from basic modifications to multi-hop reasoning and multi-round editing
✅ 8 operation types across local and global granularities
How to use: arXiv: GitHub: HuggingFace: Demo:

21,088 views • 1 month ago
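
The post does not define how MMAE computes EMR. This minimal Python sketch assumes a sample counts as an exact match only when every one of its rubric items passes. Under that assumption, it shows why EMR can stay very low even when most individual rubric items pass. The data layout and both helper functions are hypothetical.

```python
def exact_match_rate(results: dict[str, list[bool]]) -> float:
    """Share of samples whose rubric items all pass (assumed EMR definition)."""
    if not results:
        return 0.0
    hits = sum(1 for items in results.values() if items and all(items))
    return hits / len(results)


def item_pass_rate(results: dict[str, list[bool]]) -> float:
    """Share of individual rubric items that pass, pooled over all samples."""
    total = sum(len(items) for items in results.values())
    passed = sum(sum(items) for items in results.values())
    return passed / total if total else 0.0


if __name__ == "__main__":
    # Toy rubric outcomes for four hypothetical edited clips.
    toy = {
        "clip_1": [True, True, False],
        "clip_2": [True, True, True],
        "clip_3": [True, False, True],
        "clip_4": [True, True, False],
    }
    print(exact_match_rate(toy), item_pass_rate(toy))
```

In this toy data, 75% of rubric items pass but only one clip in four is an exact match. A single missed detail, such as an unintended change to untouched audio, is enough to fail a sample.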