HunyuanImage 3.0 is officially here. It's the first open-source model with an industrial-grade native multimodal architecture, and it delivers instant, high-quality visual content for every designer and content creator. Watch this quick showcase to see its performance on generating intricate text, detailed comics, and more. 👉🏻 Try it now: 🔗 GitHub: 🤗 Hugging Face:

Tencent Hy

39,672 subscribers

31,048 views • 8 months ago • via X (Twitter)

Science, Technology & Education

Related videos

We're excited to announce the release and open-sourcing of HunyuanImage 3.0, the largest and most powerful open-source text-to-image model to date, with over 80 billion total parameters, of which 13 billion are activated per token during inference. Its results are fully comparable to the industry's flagship closed-source model. 🚀🚀🚀

HunyuanImage 3.0 originates from our internally developed native multimodal large language model, with fine-tuning and post-training focused on text-to-image generation. This unique foundation gives the model a powerful set of capabilities:
✅ Reason with world knowledge
✅ Understand complex, thousand-word prompts
✅ Generate precise text within images

Unlike image generation models built on the traditional DiT architecture, HunyuanImage 3.0's MoE architecture uses a Transfusion-based approach to deeply couple diffusion and LLM training in a single, powerful system. Built on Hunyuan-A13B, HunyuanImage 3.0 was trained on a massive dataset: 5 billion image-text pairs, video frames, interleaved image-text data, and 6 trillion tokens of text corpora. This hybrid training across multimodal generation, understanding, and LLM capabilities allows the model to seamlessly integrate multiple tasks.

Whether you're an illustrator, designer, or creator, this is built to slash your workflow from hours to minutes. HunyuanImage 3.0 can generate intricate text, detailed comics, expressive emojis, and lively, engaging illustrations for educational content.

The current release focuses solely on text-to-image generation; future updates will include image-to-image, image editing, multi-turn interaction, and more.

👉🏻 Try it now: 🔗 GitHub: 🤗 Hugging Face:

Tencent Hy

412,523 views • 8 months ago

We've officially released and open-sourced HunyuanImage 2.1, our latest text-to-image model. The new model delivers on our commitment to balancing performance and quality. With native 2K image generation, HunyuanImage 2.1 is an advanced open-source text-to-image model. 🎨

✨ New in 2.1:
🔹 Advanced Semantics: Supports ultra-long and complex prompts of up to 1000 tokens, and precisely controls the generation of multiple subjects in a single image.
🔹 Precise Chinese and English Text Rendering with seamless image-text integration: The model naturally integrates text into images, making it suitable for a wide range of applications such as product covers, illustrations, and poster design to meet the needs of various fields.
🔹 Rich Styles and High Aesthetic: Capable of generating images in various styles, including photorealistic portraits, comics, and vinyl figures, it delivers outstanding visual appeal and artistic quality.
🔹 High-Quality Generation: Efficiently produces ultra-high-definition (2K) images in the same time other models take to generate a 1K image.

HunyuanImage 2.1 uses two text encoders: a multimodal large language model (MLLM) to improve the model's image and text alignment capabilities, and a multi-language character-aware encoder to improve text rendering capabilities. The model is a single- and double-stream diffusion transformer with 17B parameters.

We've also open-sourced the weights of the accelerated version with meanflow, which reduces inference steps from 100 to just 8, and PromptEnhancer, the first industrial-grade rewriting model that enhances your prompts for more nuanced and expressive image generation.

Now, creators can turn complex ideas, like posters with slogans or multi-panel comics, into visuals faster than ever. We're just getting started. Stay tuned for our native multimodal image generation model coming soon.

🌐 Website: 🔗 GitHub: 🤗 Hugging Face: ✨ Hugging Face Demo:

Tencent Hy

89,257 views • 9 months ago

StarVector is out on Hugging Face. StarVector is a foundation model for generating Scalable Vector Graphics (SVG) code from images and text. It uses a Vision-Language Modeling architecture to understand both visual and textual inputs, enabling high-quality vectorization and text-guided SVG creation.

AK

254,259 views • 1 year ago

Today we're announcing the open-source release of HunyuanVideo-Foley, our new end-to-end Text-Video-to-Audio (TV2A) framework for generating high-fidelity audio. 🚀 This tool empowers creators in video production, filmmaking, and game development to generate professional-grade audio that precisely aligns with visual dynamics and semantic context, addressing key challenges in V2A generation. 🔊

Key Innovations:
🔹 Exceptional Generalization: Trained on a massive 100k-hour multimodal dataset, the model generates contextually aware soundscapes for a wide range of scenes, from natural landscapes to animated shorts.
🔹 Balanced Multimodal Response: Our innovative multimodal diffusion transformer (MMDiT) architecture ensures the model balances video and text cues, generating rich, layered sound effects that capture every detail, from the main subject to subtle background elements.
🔹 High-Fidelity Audio: Using a Representation Alignment (REPA) loss function and a powerful Audio VAE, we've improved generation stability and now produce professional-grade audio, free of noise and inconsistencies.

HunyuanVideo-Foley achieves SOTA on multiple benchmarks, surpassing all open-source models in audio quality, visual-semantic alignment, and temporal alignment.

👉 Try it now: 🌐 Project Page: 🔗 Code: 📄 Technical Report: 🤗 Hugging Face:

Tencent Hy

122,539 views • 9 months ago

We're thrilled to release & open-source Hunyuan3D World Model 1.0! This model enables you to generate immersive, explorable, and interactive 3D worlds from just a sentence or an image. It's the industry's first open-source 3D world generation model, compatible with CG pipelines for full editability & simulation. It is set to transform game development, VR, digital content creation, and more. Get started now 👇🏻 Project Page: Try it now: GitHub: Hugging Face:

Tencent Hy

1,230,042 views • 11 months ago

We are excited to unveil HunyuanVideo 1.5, the strongest open-source video generation model. Built on a DiT architecture, it redefines the open-source SOTA for accessibility and performance. 🚀🚀🚀

HunyuanVideo 1.5 delivers state-of-the-art visual quality and motion coherence while drastically lowering the entry barrier for developers and creators:
⚡️ Unmatched Accessibility: Ultra-light 8.3B parameters, deployable on consumer GPUs with only 14GB VRAM.
🖥️ HD Cinematic Quality: Natively generates 5–10 second 480p/720p HD videos, with super-resolution support for 1080p cinematic quality.

By merging SOTA performance with high hardware efficiency, HunyuanVideo 1.5 sets the new technical baseline for the open-source community.

🌐 Project Page: 🔗 GitHub: 🤗 Hugging Face: 📄 Technical Report:

Tencent Hy

128,008 views • 7 months ago

Meet Tencent HY-MT1.5 — our latest open-source translation models, featuring 1.8B and 7B models designed for seamless on-device and cloud deployment. 🌍 The 1.8B version is already trending at #1 on Hugging Face! 🥇It’s optimized for consumer hardware, delivering 0.18s latency (50 tokens) with just a 1GB memory footprint. Meanwhile, the 7B model achieves exceptional performance, surpassing many mid-sized models and rivaling the 90th percentile of Gemini 3.0 Pro. 🚀🧠 👉🏻 Try it now: 🔗 GitHub: 🤗 Hugging Face:

Tencent HY

17,105 views • 5 months ago

The Future of Content Creation is Here. Dreamina AI Video 3.0 Model and Image 3.0 with Smart Image Reference are now officially live 👏 Say goodbye to creative limits. Let's see what's new. 👉 Try it here:

Dreamina AI

12,804 views • 1 year ago

🚀 Introducing Wan2.2-S2V, a 14B parameter model designed for film-grade, audio-driven human animation. 🎬 Going beyond basic talking heads to deliver professional-level quality for film, TV, and digital content. And it's open-source!

✨ Key features:
🔹 Long-video dynamic consistency
🔹 Cinema-quality audio-to-video generation
🔹 Advanced motion and environment control via instruction

Perfect for filmmakers, content creators, and developers crafting immersive AI-powered stories.

Try it now: GitHub: Project: Hugging Face Demo: ModelScope Demo: Hugging Face Weights: ModelScope Weights:

Wan

128,320 views • 9 months ago

Dive into Hunyuan-A13B, our latest open-source LLM built on an MoE architecture for optimal resource efficiency and high performance. Watch the video to discover its core strengths and how it provides a robust foundation for advancement across academic research, cost-effective AI solution development, and innovative application exploration. Try it on: API Address: GitHub: Hugging Face: C3-Bench Dataset: ArtifactsBench Dataset:

Tencent HY

44,270 views • 11 months ago

🚀 LongCat-Video Now Open-Source: Text/Image-to-Video + Video Continuation in One Model 🏆 Text/Image-to-Video Performance Hits Open-Source SOTA 🎬 Minutes-Long High-Quality Videos: No Color Drift/Quality Loss (Industry-Standout) ⚙ 13.6B Params | Strong Open-Source DiT-Based Unified Multitask Video Base Model ⚡ C2F Pipeline + Block Sparse Attention: 720p/30fps Video in Minutes 🤗 Open-Source Links: GitHub： Hugging Face： Project Page：

Meituan LongCat

43,731 views • 7 months ago

Today, every Nomic-Embed-Text embedding becomes multimodal. Introducing Nomic-Embed-Vision:
- a high quality, unified embedding space for image, text, and multimodal tasks
- outperforms both OpenAI CLIP and text-embedding-3-small
- open weights and code to enable indie hacking, research, and experimentation
- released in collaboration with MongoDB, LlamaIndex 🦙, , Hugging Face, Amazon Web Services, DigitalOcean, Lambda

CalCo

103,205 views • 2 years ago

LTX-2 is the first truly open-source audio-video generation model, and it's extremely impressive. Production-grade, native 4K 50 FPS output generated entirely on your own hardware. 2,382,172 Hugging Face downloads in the last month says it all. LTX 👏

Roberto Nickson

39,431 views • 4 months ago

Today, we introduce HunyuanImage 3.0-Instruct, a native multimodal model focusing on image-editing by integrating visual understanding with precise image synthesis! 🚀 It understands input images and reasons before generating images. Built on an 80B-parameter MoE architecture (13B activated), it natively unifies deep multimodal comprehension and high-fidelity generation. 🧠 A "Thinking" Model with Native CoT & MixGRPO: The model doesn’t just execute commands, it processes them through a Native Chain-of-Thought (CoT) schema. Enhanced by our self-developed MixGRPO algorithm, it reasons through complex instructions to achieve flawless intent alignment and human-preference consistency. 🎨 Precise Editing & Multi-Image Fusion: The model enables accurate image editing by adding, removing, or modifying elements while keeping non-target areas perfectly intact. It also excels at seamless multi-image fusion, synthesizing complex scenes by extracting and blending elements from multiple sources into a unified, consistent output. 🏆 SOTA Performance: HunyuanImage 3.0-Instruct sets a new benchmark in visual quality and alignment, delivering performance that matches leading proprietary models. We aim to enable the community to explore new ideas with a state-of-the-art foundation model, fostering a dynamic and vibrant image generation ecosystem. 🛠️🎨 💻Try it at (PC only):

Tencent Hy

125,803 views • 4 months ago

1/5 MiniCPM-V 4.6 (1.3B) is now live 🚀🚀 High-res visual processing, optimized for consumer-grade and mobile hardware. We’ve leveraged the latest LLaVA-UHD v4 technique to cut vision encoding costs by 55%, enabling native edge deployment with extreme efficiency. 🔥 Beats Gemma4-E2B-it and Qwen3.5-0.8B across key multimodal and Artificial Analysis benchmarks — scoring higher than Qwen3.5-0.8B using just 2.5% of its token budget. ⚡ TTFT (75.7ms) 2.2x Faster than Qwen3.5-0.8B even with 3136² high-res images. 🏗️ ~1.5x Token Throughput compared with Qwen3.5-0.8B on a single RTX 4090. Try the model here: 🤗 Hugging Face: 💻 GitHub: 🔭 Modelscope: 🌐 Web Demo: 📱 App Demo:

OpenBMB

351,399 views • 1 month ago

Zhipu AI just released GLM-4.6V on Hugging Face. This new multimodal model achieves SOTA visual understanding, features native function calling for agents, and handles 128k context for documents. Perception to action!

DailyPapers

16,629 views • 6 months ago

🎬 Create cinematic-quality videos with the Wan2.2 Series, the world’s first open-source MoE video gen model! Now showcasing richer motion, enhanced lighting, and vibrant colors, driven by foundational upgrades. One click, endless creativity 🪄 Explore on GitHub, Hugging Face and ModelScope!

Alibaba Group

96,092 views • 10 months ago

Atomic Chat is now on Hugging Face 🤗 We're officially a Local App on the world's biggest AI hub. Run 200,000+ open-weight models from Hugging Face directly on your device - private, local, and open source!

atomic.chat

27,837 views • 10 days ago

Just launched at #CES2026: the new open-source NVIDIA Nemotron Speech ASR model is here to solve latency drift and redundant compute. Its cache-aware streaming architecture eliminates the need for buffered inference, giving you stable, sub-100ms latency (24ms median T-T-F) and up to 3x more throughput on your GPU. 🤗 Read the technical blog with real-world results from Daily and Modal on Hugging Face:

NVIDIA AI Developer

138,370 views • 5 months ago

Introducing Chatterbox Multilingual 🌎 23 languages. One model. Open source. On GitHub , Hugging Face , and Let's dive in 🧵

Resemble AI

24,082 views • 9 months ago