🔊New NVIDIA paper: Audio-SDS🔊 We repurpose Score Distillation Sampling (SDS) for audio, turning any pretrained audio diffusion model into a tool for diverse tasks, including source separation, impact synthesis & more. 🎧 Demos, audio examples, paper:

Jonathan Lorraine

7,087 subscribers

39,375 views • 1 year ago • via X (Twitter)

Science & Technology

17 comments

Jonathan Lorraine · 1 year ago

Intuitively, our update moves the audio in a direction to increase its probability given the prompt, by noising and denoising with our diffusion model, then “nudging” our audio towards it by propagating the update through our differentiable rendering to our audio parameters.
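The noise, denoise, and nudge loop described above can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: the "renderer" is a sine wave with a single amplitude parameter, and `toy_eps_hat` is a hypothetical stand-in for the pretrained, prompt-conditioned diffusion model (it is the exact noise prediction when the target distribution is a single known waveform). The names `render`, `toy_eps_hat`, `sds_step`, and `optimize`, the noise schedule, and the uniform weighting over noise levels are all assumptions made for this example.

```python
import math
import random

def render(theta, n=64):
    """Toy differentiable renderer: a 4-cycle sine scaled by amplitude theta."""
    return [theta * math.sin(2 * math.pi * 4 * i / n) for i in range(n)]

def toy_eps_hat(x_t, alpha, sigma, target):
    """Hypothetical stand-in for the pretrained denoiser. It is the exact noise
    prediction when the prompt-conditioned data is a point mass at `target`."""
    return [(xt - alpha * m) / sigma for xt, m in zip(x_t, target)]

def sds_step(theta, target, lr, rng):
    """One SDS-style update: noise the render, predict the noise, nudge theta."""
    n = len(target)
    x = render(theta, n)
    t = rng.uniform(0.1, 0.9)                # random noise level
    alpha = math.cos(0.5 * math.pi * t)      # signal scale (assumed schedule)
    sigma = math.sin(0.5 * math.pi * t)      # noise scale (assumed schedule)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x_t = [alpha * xi + sigma * e for xi, e in zip(x, eps)]
    eps_hat = toy_eps_hat(x_t, alpha, sigma, target)
    dx_dtheta = render(1.0, n)               # Jacobian of the toy renderer
    # SDS-style gradient: (predicted noise - injected noise) pushed through the renderer.
    grad = sum((eh - e) * d for eh, e, d in zip(eps_hat, eps, dx_dtheta)) / n
    return theta - lr * grad

def optimize(target_theta=0.8, steps=500, lr=0.1, seed=0):
    """Run repeated SDS-style steps starting from amplitude 0."""
    rng = random.Random(seed)
    target = render(target_theta)
    theta = 0.0
    for _ in range(steps):
        theta = sds_step(theta, target, lr, rng)
    return theta
```

In this toy setting the injected noise cancels in the difference, so each step moves the amplitude toward the value the stand-in denoiser prefers. With a real diffusion model the update is noisy and only improves the parameters on average.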

Jonathan Lorraine · 1 year ago

We propose three novel audio tasks: ① FM Synthesis, ② Physical Impact Synthesis, and ③ Prompt-Guided Source Separation. This image briefly summarizes the use case, optimizable parameters, rendering function, and parameter update.

Jonathan Lorraine · 1 year ago

① FM Synthesis: A toy setup where we generate settings aligning with prompts like “kick drum, bass, reverb” using sine oscillators modulating each other’s frequency as in a synthesizer. We visualize the final optimized parameters as the dial settings on a synthesizer instrument's user interface.
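For readers unfamiliar with FM synthesis, a minimal two-oscillator sketch is shown here. It assumes the common phase-modulation form of FM with one modulator and one carrier. The post describes several oscillators modulating each other, so this is a simplification, and the function name `fm_synth` and its parameters are illustrative rather than taken from the paper.

```python
import math

def fm_synth(carrier_hz, mod_hz, mod_index, amp=1.0, seconds=0.01, sr=16000):
    """Render a two-operator FM tone: a sine carrier whose phase is
    modulated by a second sine oscillator."""
    n = round(seconds * sr)
    out = []
    for i in range(n):
        t = i / sr
        modulator = mod_index * math.sin(2 * math.pi * mod_hz * t)
        out.append(amp * math.sin(2 * math.pi * carrier_hz * t + modulator))
    return out

# Hypothetical dial settings: in an Audio-SDS setup these numbers would be the
# optimizable parameters rather than hand-picked values.
tone = fm_synth(carrier_hz=220.0, mod_hz=110.0, mod_index=2.0)
```

Because every operation is a smooth function of `carrier_hz`, `mod_hz`, `mod_index`, and `amp`, the same computation written in an autodiff framework would let gradients flow from the audio back to the dial settings.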

Jonathan Lorraine · 1 year ago

② Physical Impact Synthesis: We generate impacts consistent with prompts like “hitting pot with wooden spoon” by convolving an impact with a learned object and reverb impulse. We learn the parametrized forms of the object and reverb impulses.
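The convolution chain described above can be sketched as follows. The post does not give the parametrized forms, so this example assumes a sum of damped sinusoids for the object impulse and exponentially decaying noise for the reverb impulse; `modal_impulse`, `reverb_impulse`, `impact_sound`, and all numeric values are hypothetical choices for illustration.

```python
import math
import random

def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def modal_impulse(freqs_hz, decays, gains, n, sr):
    """Object impulse response as a sum of exponentially damped sinusoids."""
    return [
        sum(g * math.exp(-d * i / sr) * math.sin(2 * math.pi * f * i / sr)
            for f, d, g in zip(freqs_hz, decays, gains))
        for i in range(n)
    ]

def reverb_impulse(decay, n, sr, seed=0):
    """Reverb impulse response: a direct path followed by decaying noise."""
    rng = random.Random(seed)
    tail = [0.1 * rng.uniform(-1.0, 1.0) * math.exp(-decay * i / sr)
            for i in range(1, n)]
    return [1.0] + tail

def impact_sound(strike, object_ir, reverb_ir):
    """Impact convolved with the object impulse, then with the reverb impulse."""
    return convolve(convolve(strike, object_ir), reverb_ir)

sr = 8000
strike = [1.0, 0.5]  # a short excitation, chosen arbitrarily
obj = modal_impulse([440.0, 1130.0], [30.0, 60.0], [1.0, 0.4], n=200, sr=sr)
rev = reverb_impulse(20.0, n=100, sr=sr)
sound = impact_sound(strike, obj, rev)
```

In this sketch the frequencies, decay rates, and gains play the role of the learned parameters; the prompt would steer them through the SDS-style update.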

Jonathan Lorraine · 1 year ago

③ Prompt-Guided Source Separation: Prompt-conditioned source separation for a given audio clip, such as separating “sax …” and “cars …” from a music recording made on a road, by applying the Audio-SDS update to each channel while forcing the sum of the channels to reconstruct the audio.
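One way to picture the sum constraint is as a reconstruction penalty added to each channel's prompt-driven gradient. The sketch below assumes a squared-error penalty and takes the per-channel prompt gradients as plain lists; in the real method those would come from the Audio-SDS update. The function names and the `recon_weight` parameter are assumptions for this example, and the post does not say whether the constraint is enforced as a penalty or in some other way.

```python
def reconstruction_residual(sources, mixture):
    """Per-sample difference between the sum of the sources and the mixture."""
    return [sum(vals) - m for vals, m in zip(zip(*sources), mixture)]

def separation_step(sources, mixture, prompt_grads, lr=0.1, recon_weight=1.0):
    """One gradient step per source: the prompt gradient (placeholder) plus the
    gradient of 0.5 * ||sum of sources - mixture||^2, which is the residual."""
    resid = reconstruction_residual(sources, mixture)
    return [
        [s - lr * (g + recon_weight * r) for s, g, r in zip(src, grads, resid)]
        for src, grads in zip(sources, prompt_grads)
    ]

# Two channels, no prompt guidance: only the reconstruction term is active.
mixture = [0.5, -1.0, 0.25, 2.0]
sources = [[0.0] * 4, [0.0] * 4]
no_guidance = [[0.0] * 4, [0.0] * 4]
for _ in range(100):
    sources = separation_step(sources, mixture, no_guidance)
```

Without prompt gradients the channels simply split the mixture evenly; the prompt-specific term is what would push each channel toward its own sound.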

Jonathan Lorraine · 1 year ago

Modifications to SDS for Audio Diffusion: We use 🅰 an augmented Decoder-SDS in audio space, 🅱 a spectrogram emphasis to better weight transients, and 🅲 multiple denoising steps to increase fidelity. This image highlights these in red in the detailed overview of our update.
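The post does not spell out the spectrogram emphasis, so the following is only a guess at the general idea: compare two waveforms through magnitude spectrograms at several window sizes, so that short windows localize transients in time. The naive DFT, the window sizes, and the names `stft_mag` and `multiscale_spec_distance` are assumptions for illustration, not the paper's formulation.

```python
import cmath
import math

def stft_mag(x, win, hop):
    """Magnitude spectrogram via a naive DFT with a Hann window."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win) for n in range(win)]
    frames = []
    for start in range(0, len(x) - win + 1, hop):
        seg = [x[start + n] * window[n] for n in range(win)]
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / win)
                    for n in range(win)))
            for k in range(win // 2 + 1)
        ])
    return frames

def multiscale_spec_distance(x, y, wins=(16, 32, 64)):
    """Mean absolute spectrogram difference, summed over window sizes.
    Both signals must have equal length, at least max(wins) samples."""
    total = 0.0
    for win in wins:
        a = stft_mag(x, win, win // 4)
        b = stft_mag(y, win, win // 4)
        count = sum(len(frame) for frame in a)
        diff = sum(abs(p - q) for fa, fb in zip(a, b) for p, q in zip(fa, fb))
        total += diff / count
    return total

# A single click (a transient) compared against silence.
click = [0.0] * 128
click[40] = 1.0
silence = [0.0] * 128
distance = multiscale_spec_distance(click, silence)
```

A production version would use an FFT-based STFT from an autodiff library so the distance can be differentiated with respect to the audio.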

Jonathan Lorraine · 1 year ago

Results on Fully-Automatic In-the-Wild Source Separation: We demonstrate a pipeline that takes a video from the internet, captions the audio with a model (like AudioCaps), and passes the caption to an LLM assistant that suggests source decompositions. We then run our method on the suggested decompositions.
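The three stages of that pipeline can be wired together as below. Every component is passed in as a callable because the post does not name concrete APIs: `caption_fn`, `propose_sources_fn`, and `separate_fn` are hypothetical placeholders for the captioning model, the LLM assistant, and the prompt-guided separation method.

```python
from typing import Callable, Dict, List, Sequence

Audio = Sequence[float]

def auto_separate(
    audio: Audio,
    caption_fn: Callable[[Audio], str],
    propose_sources_fn: Callable[[str], List[str]],
    separate_fn: Callable[[Audio, List[str]], List[Audio]],
) -> Dict[str, Audio]:
    """Caption the audio, ask for a source decomposition, then separate."""
    caption = caption_fn(audio)             # e.g. "a saxophone plays near traffic"
    prompts = propose_sources_fn(caption)   # e.g. ["saxophone", "cars passing"]
    stems = separate_fn(audio, prompts)     # one estimated stem per prompt
    return dict(zip(prompts, stems))
```

Keeping the stages as injected callables makes it easy to swap the captioner or the assistant without touching the separation step.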

Jonathan Lorraine · 1 year ago

Results on Tuning FM Synthesizers & Impact Synthesis: CLAP scores for the prompts improve over the course of training, and we also show qualitative results. Impact synthesis shows improved performance on impact-oriented prompts.

Jonathan Lorraine · 1 year ago

Results on Prompt-Guided Source Separation: We report an improved SDR to ground-truth sources when available and show improved CLAP scores after training.
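For reference, the basic signal-to-distortion ratio can be computed as below. This is the plain energy-ratio definition; the post does not say which SDR variant was used (scale-invariant versions are also common), so treat the function `sdr_db` as an illustrative assumption.

```python
import math

def sdr_db(reference, estimate):
    """Signal-to-distortion ratio in dB:
    10 * log10(||reference||^2 / ||reference - estimate||^2)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if signal == 0.0:
        raise ValueError("reference has zero energy")
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(signal / noise)

reference = [1.0, -2.0, 3.0, 0.5]
estimate = [0.9 * r for r in reference]   # a 10% amplitude error
score = sdr_db(reference, estimate)       # about 20 dB
```

Higher is better: every 10 dB corresponds to a tenfold reduction in error energy relative to the reference.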

Jonathan Lorraine · 1 year ago

This project was led by the great work of @jrichterpowell, along with Antonio Torralba. See more work from the @NVIDIA Spatial Intelligence Lab: Work supported indirectly by @MIT_CSAIL @VectorInst #nvidia #mit

Jonathan Lorraine · 1 year ago

⚠️ Limitations ⚠️
Audio-Model Bias: We rely on Stable Audio Open, so when it struggles (e.g., on rare instruments, speech, audio without silence at the end, or out-of-domain SFX), our method can have difficulties. Other diffusion models can help here.
Clip-Length Budget: We optimized on ≤10 s clips; minute-scale audio may produce artifacts or exhaust memory. A hierarchical/windowed Audio-SDS could help here.

Jonathan Lorraine · 1 year ago

🔭 Next stops for Audio-SDS
① Working with longer, >minute-scale audio
② Non-text conditioning: tempo, spatial information, etc.
③ Leveraging stereo generation
④ New tasks: learning physical parameters, VR SFX, and beyond
⑤ Drop in other pretrained backbones

Jonathan Lorraine · 1 year ago

🚀 Vision of the Future: Content designers easily use one video + audio diffusion backbone with SDS-style updates to nudge any differentiable task—impacts, lighting, cloth, fluids—until the joint model says “looks & sounds right” given powerful user controls, like text.

Jonathan Lorraine · 1 year ago

💡 SDS treats any differentiable parameter set as optimizable from a prompt. Prompt-guided source separation emerged when we brainstormed novel uses. We hope similarly practical tasks (e.g., automatic Foley layering?) surface as the community experiments.

Jonathan Lorraine · 1 year ago

Our work is inspired by and builds on the SDS update of DreamFusion ( @poolio, @ajayjain, @jon_barron, @BenMildenHall), and related updates (VSD @zhengyiWang, SDI @ottogin1, @ocariz__, @vincesitzmann, SJC @DuXiaodan, @RaymondYeh, many more!)

Jonathan Lorraine · 1 year ago

We find a new set of use-cases for Stable Audio Open (@jordiponsdotme, @StabilityAI, @huggingface) or other pretrained audio models (AudioLDM @LiuHaohe, @ZehuaChenICL, @markplumbley, and more)
