Microsoft just dropped VASA-1. This AI can make a single image sing and talk expressively from an audio reference, similar to EMO from Alibaba. 10 wild examples:

1. Mona Lisa rapping Paparazzi

7,298,891 views • 2 years ago • via X (Twitter)

13 Comments

Min Choi • 2 years ago

2. Realism and liveliness - example 1

3. Realism and liveliness - example 2

4. Out-of-distribution generalization - singing audios

5. Controllability of generation 1: example of eye gaze direction, head distance, and emotion offsets

6. Controllability of generation 2: example of different emotion offsets

7. Power of disentanglement: example of the same motion sequence with different photos

8. Power of disentanglement: pose and expression editing

9. Out-of-distribution generalization - singing audios

10. Realism and liveliness - example 2

READ MORE: Official Microsoft Research blog at

If you enjoyed this thread, follow me @minchoi and please bookmark, like, comment on, and repost the first post below to share with your friends:

Also check out wild new AI Music Videos 👇

Also check out my series "AI will disrupt Hollywood (Part 36)" 👇
