introducing lipsync-1.9-beta, setting a new standard for lipsync quality. it’s zero-shot: no training data needed. generate and edit natural speech seamlessly in any live-action, animated, or AI-generated video. this is the biggest update we’ve ever released, and it’s the most natural lipsyncing model in the world 🔥 available now, try...
129,075 views • 1 year ago • via X (Twitter)
11 comments

how do you design an intuitive product to expose a capability that’s never existed before? you iterate. try out the new platform here 👇 tell us what you like and what you don’t. lots of exciting updates incoming :)

we’ve slowly rolled out early versions of this model to some of you. even with a limited release in beta, the response has been overwhelming :) across a small segment of users, we’ve already seen this model become the most popular choice, generating hundreds of hours in just a few days. here’s a side-by-side comparison with 1.8.0 and 1.7.1:

Here are some examples showing how we’ve benchmarked against the latest in open source (lipsync-1.9-beta vs latentsync vs musetalk)

see some more examples of what lipsync-1.9 empowers you to do:

it works well across live-action, animated, and even AI-generated video (this video is completely AI-generated)

video dubbing: you can now follow the @lexfridman podcast with @ZelenskyyUa in fluent english, as if it were his native tongue, with no distracting mismatch between the audio and video

or you can replace dialogue in any scene (original vs resynced with two different dialogues)

this model is special. our old pipelines accumulated errors as the video passed from one stage to the next. lipsync-1.9 is an end-to-end monolith that operates in a single shot, which helps it make very few mistakes across a wide range of videos. it marks a profound shift in how we design our models. trained across millions of speakers and tens of thousands of hours of video, this new approach will pave the way to a future where any content can be made in a single take.

ps. we moved a large part of our company to be irl over the last few weeks to bring 1.9-beta into production, this time in India :) here’s a little bts look at how we did it:

check it out on yt:

