Microsoft just dropped VASA-1. This AI can make a single image sing and talk expressively from an audio reference, similar to EMO from Alibaba.

10 wild examples:

1. Mona Lisa rapping Paparazzi
7,298,891 views • 2 years ago • via X (Twitter)

2. Realism and liveliness - example 1

3. Realism and liveliness - example 2

4. Out-of-distribution generalization - singing audios

5. Controllability of generation 1: example of eye gaze direction, head distance, and emotion offsets

6. Controllability of generation 2: example of different emotion offsets

7. Power of disentanglement: example of the same motion sequence with different photos

8. Power of disentanglement: pose and expression editing

9. Out-of-distribution generalization - singing audios

10. Realism and liveliness - example 2

READ MORE: Official Microsoft Research blog at

If you enjoyed this thread, follow me @minchoi and please bookmark, like, comment, and repost the first post below to share with your friends:

