🎉 Introducing Sonic: Shifting Focus to Global Audio Perception in Portrait Animation 🎶

👉 What's New?
1️⃣ Breathe life into static images! Single image + any audio → speeches, singing, & beyond!
2️⃣ Temporal Audio Learning harnesses global audio context for precise lip-sync & natural expressions
3️⃣ Decoupled Motion Control... disentangles head/expression movements for vivid adaptability
4️⃣ Time-aware Fusion ensures seamless video continuity across frames

👉 Why It Matters?
✅ First method to balance audio precision, motion diversity, & temporal stability
✅ Empowers animators to create long-form videos (e.g., ads/vlogs) with 1-click workflows 🖱
✅ Industry-level quality: Enhanced head poses, authentic emotions, & identity consistency

👉 Game-changing for:
▶️ Advertising & media 🎥
▶️ Virtual influencers 📱
▶️ Interactive entertainment 🎮
▶️ AI-driven education 🎓

📚 Explore Sonic’s Magic: 🔗 Project: 🔗 Code: 🔗 Paper:

Tencent HY

33,333 subscribers

39,079 views • 1 year ago • via X (Twitter)

Art, Science, Technology, Education

11 comments

neb • 1 year ago

I'm confused. Is this the same model as 3 months ago, or am I wrong?

AssemblyAI • 1 year ago

Announcing: Our most advanced speech-to-text model goes beyond accuracy to capture the real-world complexity of human conversation and deliver reliable, source-of-truth audio data. Explore Universal-2 updates 👇

MR BIZARRO • 1 year ago

Tencent > openai.

DataEatsWorld • 1 year ago

I don’t even have time to breathe and there’s already some new crazy open-source tech dropped by a big Chinese corp.

James • 1 year ago

Non-commercial license means it's not quite game-changing in those verticals. Impressive nonetheless. Sonic has been around for months now, what's new? The git repo hasn't been touched in 3 months. Just a hype tweet?

Newman • 1 year ago

Finally! Lip sync that doesn’t look like a dubbed kung fu movie

Chinmaya Kumar Behera • 1 year ago

Is the Hugging Face demo not working?

Gadgetify • 1 year ago

This guy is stepping up his game 😅

Maskai • 1 year ago

Great work guys 🙌🏼

Emily • 1 year ago

Looks impressive 👏

Hasan • 1 year ago

Well done

Related videos

Wan2.2-S2V is now natively supported in ComfyUI! Transform static images and audio into cinematic-quality videos with synchronized lip-sync and natural movements. 🔹 Audio-driven video generation 🔹 Cinematic-grade quality output 🔹 Minute-level long-form videos 🔹 Multi-format support (full/half-body) 🔹 Enhanced motion control via text 🧵👇

ComfyUI

49,251 views • 9 months ago

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Paper page:

With the introduction of diffusion-based video generation techniques, audio-conditioned human video generation has recently achieved significant breakthroughs in both the naturalness of motion and the synthesis of portrait details. Due to the limited control of audio signals in driving human motion, existing methods often add auxiliary spatial signals to stabilize movements, which may compromise the naturalness and freedom of motion. In this paper, we propose an end-to-end audio-only conditioned video diffusion model named Loopy. Specifically, we designed an inter- and intra-clip temporal module and an audio-to-latents module, enabling the model to leverage long-term motion information from the data to learn natural motion patterns and improving audio-portrait movement correlation. This method removes the need for manually specified spatial motion templates used in existing methods to constrain motion during inference. Extensive experiments show that Loopy outperforms recent audio-driven portrait diffusion models, delivering more lifelike and high-quality results across various scenarios.

AK

128,797 views • 1 year ago

🎉 InstantCharacter is NOW Open-Source! Personalize Any Character with High Consistency One image + text → custom poses, styles & scenes 1️⃣ First framework to balance character consistency, image quality, & open-domain flexibility/generalization 2️⃣ Compatible with Flux, delivering high-fidelity, text-controllable results 3️⃣ Comparable to industry leaders like GPT-4o in precision & adaptability Try it yourself on： 🔗Hugging Face Demo： Dive Deep into InstantCharacter: 🔗Project Page: 🔗Code: 🔗Paper： #TencentHunyuan #HY #InstantCharacter #OpenSource #AI #CharacterCustomization

Tencent HY

77,790 views • 1 year ago

🚀 Introducing HunyuanCustom: An open-source, multimodal-driven architecture for customized video generation, powered by HunyuanVideo-13B. Outperforming existing open-source models, it rivals top closed-source solutions! 🎥 Highlights: ✅Subject Consistency: Maintains identity across single & multi-subject video generation. ✅Multimodal Inputs: Supports text, images, audio, and video for highly controlled outputs. ✅High-Quality Output: Industry-leading detail, motion smoothness, and lighting realism. Key Features: 1️⃣Single-Subject Video: Upload an image + text (e.g., “He’s walking a dog”) to create coherent videos with new actions, outfits, and scenes. 2️⃣Multi-Subject Video: Generate videos with multiple subjects (e.g., a man drinking coffee in a cozy room) from separate image inputs. 3️⃣Audio-Driven Video: Sync audio with visuals for talking or singing in any scene—perfect for digital avatars, virtual customer service, and more. 4️⃣Video-Driven Video: Seamlessly insert or replace subjects into any video for creative enhancements. Dive Deeper: 🌐Project: 📝Technical Details: 💻Code: The single-subject video capability is open-sourced and live on the Hunyuan website, with more features to be released soon! 👉 Try it now: #HunyuanCustom #VideoGeneration #AI #TencentHunyuan

Tencent HY

191,908 views • 1 year ago

Holy moly, this is awesome, Tencent Hunyuan just announced Sonic: Audio-driven Portrait Animation System - More natural facial expressions - Better lip sync and alignment with the input - More dynamic head poses - 1 to 10 min videos. 6 wild examples and details below 👇

AshutoshShrivastava

26,721 views • 1 year ago

In this week’s Stability Seconds, we’re showing you how to generate custom, commercially safe audio for your next video project with Stable Audio 2.5 🎵 Here’s how you can do it: ▶️ Find a short video clip that you want to add sound to. ▶️ Prompt Stable Audio 2.5 for genre, overall mood, and instruments that match your video. In our prompt, we used terms like “cinematic,” “awe-inspiring,” and “dramatic horn section.” ▶️ Generate a 3-minute track, then scrub through to find the best cut. ▶️ Add that audio to your clip to create a polished soundtrack. You can learn more and start using Stable Audio 2.5 here 👉

Stability AI

12,229 views • 8 months ago

New model from Meituan LongCat 🚀 LongCat-Video-Avatar🔥 Audio driven character animation with text, image, and video inputs, all in one! ✨ MIT license ✨ Audio > talking video (single & multi-person) ✨ Natural motion and lip sync ✨ Fewer repeats, stable identity ✨ Available on Hugging Face

Adina Yakup

13,881 views • 6 months ago

Today we’re launching Stable Audio 2.5: The first audio model built for enterprise-grade sound production 🔊 Audio influences brand engagement by 86%, but few enterprises are leveraging audio as an extension of their brand, making customized sound an untapped differentiator. Stable Audio 2.5 is purpose-built for this opportunity to create customizable, high-quality audio at scale, with capabilities that include: ▶️ Improved musical composition: Generate full songs with multi-part structure, meaning a clear intro, middle, and outro. ▶️ Audio inpainting: Input audio, select where the track should start, and the model uses the context to generate the rest of the track. ▶️ Customization: Our team can fine-tune Stable Audio 2.5 to help enterprises create the right sound for their brand. ▶️ Faster inference: The model can generate up to three-minute long tracks in under two seconds on a GPU, outputting in just eight steps (compared to ~50 in the previous model). You can learn more here 👉

Stability AI

67,457 views • 9 months ago

Excited to share our latest work on 🎧spatial audio-driven human motion generation. We aim to tackle a largely underexplored yet important problem of enabling virtual humans to move naturally in response to spatial audio—capturing not just what is heard, but also where the sound is coming from. To this end, we introduce the Spatial Audio-Driven Human Motion (SAM) dataset—the first comprehensive dataset featuring paired high-quality human motion and spatial audio recordings. For benchmarking, we develop a generative framework for human MOtion generation driven by SPAtial audio, termed MOSPA, which learns to synthesize realistic and diverse human motions conditioned on spatial audio input. We hope this research could provide a foundation for future research in spatial perception, virtual characters, and embodied AI. The dataset and model will be open-sourced soon. A big thank you to our intern, Shuyang Xu, for the wonderful collaboration! Congratulations, Shuyang! Project page: Paper: Video: #Animation #CG #CV #AIGC #DL #Deeplearning #Motion #Graphics #AI #GenerativeAI

Zhiyang (Frank) Dou

14,603 views • 11 months ago

Udio v1.5 is here with advanced tools for creators! Make extraordinary music with AI & tag #UdioMusic on your creations.🤘 🔊 Improved Audio Quality 🎛️ Stem Downloads 🔑 Key Guidance 🎨 New Creation Workflows 🌍 Global Language Support 🎥 Shareable Lyric Videos 🎶 Audio to Audio

udio

102,585 views • 1 year ago

Introducing Wan 2.5 🚀 Create immersive AI videos with fluid motion, built-in audio and voice. Supporting text-to-video and start images Available now for all users

Magnific (formerly Freepik)

170,178 views • 8 months ago

ByteDance just unveiled HuMo, a unified framework for human-centric video generation. With HuMo you can:
• Generate videos from text + image
• Create audio-synced videos from text + audio
• Combine text, image, and audio for maximum control
• Preserve consistent subjects across frames
• Achieve natural audio-visual synchronization
This isn’t just another release; it’s a big step toward fully controllable, coherent, and cinematic AI video.

DStudioproject

29,408 views • 9 months ago

Introducing Hedra Avatars 🔥 5-minute uncut videos. Any audio you drop. Any character you create. Perfect lip-sync, natural motion, full expressions. Your custom talking head that never stops talking. First 1,000 followers get the official Avatar Guide. Reply “HEDRA AVATAR” and get it dropped in DMs instantly. GO

Hedra

374,334 views • 4 months ago

🚀Introducing Wan2.2-S2V — a 14B parameter model designed for film-grade, audio-driven human animation. 🎬Going beyond basic talking heads to deliver professional-level quality for film, TV, and digital content. And it’s open-source! ✨ Key features: 🔹 Long-video dynamic consistency 🔹 Cinema-quality audio-to-video generation 🔹 Advanced motion and environment control via instruction Perfect for filmmakers, content creators, and developers crafting immersive AI-powered stories. Try it now : Github: Project: Hugging Face Demo: Modelscope Demo: Hugging Face Weights: ModelScope Weights:

Wan

128,320 views • 9 months ago

Wan2.7 is now live as a Partner Node in ComfyUI — a comprehensive upgrade over 2.6 with better image quality, audio, motion dynamics, stylization, and consistency. More workflows. More control. Here's what's new 👇

ComfyUI

32,808 views • 2 months ago

Seamless play anywhere with all-day comfort. Introducing the G325 LIGHTSPEED: ✅ LIGHTSPEED Wireless and Bluetooth ✅ 212 grams ✅ 24-bit audio 🔗

Logitech G

22,230 views • 4 months ago

Loopy: New Audio-to-Video Lipsyncing Model Looks Insane It generates lifelike facial expressions and movements from audio alone. It captures subtle details like sighs, expressive eyebrows, and natural head gestures, making your videos incredibly realistic. Sample videos 1/5 Rapping statue

el.cine

473,421 views • 1 year ago

Today we're announcing the open-source release of HunyuanVideo-Foley, our new end-to-end Text-Video-to-Audio (TV2A) framework for generating high-fidelity audio.🚀 This tool empowers creators in video production, filmmaking, and game development to generate professional-grade audio that precisely aligns with visual dynamics and semantic context, addressing key challenges in V2A generation.🔊 Key Innovations: 🔹Exceptional Generalization: Trained on a massive 100k-hour multimodal dataset, the model generates contextually aware soundscapes for a wide range of scenes, from natural landscapes to animated shorts. 🔹Balanced Multimodal Response: Our innovative multimodal diffusion transformer (MMDiT) architecture ensures the model balances video and text cues, generating rich, layered sound effects that capture every detail, from the main subject to subtle background elements. 🔹High-Fidelity Audio: Using a Representation Alignment (REPA) loss function and a powerful Audio VAE, we've improved generation stability and now produce professional-grade audio, free of noise and inconsistencies. HunyuanVideo-Foley achieves SOTA on multiple benchmarks, surpassing all open-source models in audio quality, visual-semantic alignment, and temporal alignment. 👉Try it now: 🌐Project Page: 🔗Code: 📄Technical Report: 🤗Hugging Face:

Tencent Hy

122,539 views • 9 months ago

🔉 Introducing SAM Audio, the first unified model that isolates any sound from complex audio mixtures using text, visual, or span prompts. We’re sharing SAM Audio with the community, along with a perception encoder model, benchmarks and research papers, to empower others to explore new forms of expression and build applications that were previously out of reach. 🔗 Learn more:

AI at Meta

1,248,668 views • 6 months ago

Create anime videos on BudgetPixel AI with synchronized native audio, high-fidelity motion for difficult movements, and strong generation for complex scenes. Bring fast action, expressive characters, and rich anime worlds to life with more control.

BudgetPixel AI

121,551 views • 3 days ago