Microsoft just dropped VASA-1. This AI can make a single image sing and talk expressively from an audio reference, similar to EMO from Alibaba.

10 wild examples:

1. Mona Lisa rapping Paparazzi

7,298,891 views • 2 years ago • via X (Twitter)

13 comments

Min Choi • 2 years ago

2. Realism and liveliness - example 1

Min Choi • 2 years ago

3. Realism and liveliness - example 2

Min Choi • 2 years ago

4. Out-of-distribution generalization - singing audios

Min Choi • 2 years ago

5. Controllability of generation 1: examples of eye gaze direction, head distance, and emotion offsets

Min Choi • 2 years ago

6. Controllability of generation 2: examples of different emotion offsets

Min Choi • 2 years ago

7. Power of disentanglement: the same motion sequence applied to different photos

Min Choi • 2 years ago

8. Power of disentanglement: pose and expression editing

Min Choi • 2 years ago

9. Out-of-distribution generalization - singing audios

Min Choi • 2 years ago

10. Realism and liveliness - example 2

Min Choi • 2 years ago

READ MORE: Official Microsoft Research blog at

Min Choi • 2 years ago

If you enjoyed this thread, follow me @minchoi, and please bookmark, like, comment on, and repost the first post below to share it with your friends:

Min Choi • 2 years ago

Also check out wild new AI Music Videos 👇

Min Choi • 2 years ago

Also check out my series "AI will disrupt Hollywood (Part 36)" 👇
