It’s finally here! 🎊 We open sourced our #vtuber motion capture solution at #GoogleIO! Our new MediaPipe model predicts 478 face landmarks + 52 blendshapes from your webcam and is compatible with any ARKit rigged avatar! 😺🧵

507,958 views • 3 years ago • via X (Twitter)

11 Comments

Rich 🍈 • 3 years ago

Try out our web demo at: Ready for Android, JavaScript, C++ or Python developers. We’re gonna see a lot more new Vtuber apps 👀

Rich 🍈 • 3 years ago

So glad to finally be able to share this. It’s been amazing to work with such talented teammates at Google AI to bring this into reality. Super cathartic moment. 😊

Rich 🍈 • 3 years ago

Oh hey, you can find my beginner example on the IO website but here's a direct link to the CodePen. It's a super minimal example showcasing MediaPipe's 52 blendshapes + transformation matrix for AR pinning. A great starting point to understand the API!
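For readers who prefer Python to the CodePen, here is a rough sketch of requesting the same two outputs (blendshapes and the facial transformation matrix) through the MediaPipe Tasks Python API. It is not the example from the comment above. The model file name `face_landmarker.task`, the image path, and the helper names `blendshapes_to_dict` and `detect_face` are assumptions for illustration, and the model bundle has to be downloaded separately.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, Optional


def blendshapes_to_dict(categories: Iterable[Any]) -> Dict[str, float]:
    """Turn blendshape categories (objects with .category_name and .score)
    into a plain {name: score} dict."""
    return {c.category_name: float(c.score) for c in categories}


def detect_face(image_path: str,
                model_path: str = "face_landmarker.task") -> Optional[dict]:
    """Run the Face Landmarker on one image (hypothetical helper).

    Assumes the `mediapipe` package is installed and `model_path` points
    at a downloaded Face Landmarker model bundle.
    """
    # Imported lazily so blendshapes_to_dict works without mediapipe.
    import mediapipe as mp
    from mediapipe.tasks import python as mp_python
    from mediapipe.tasks.python import vision

    options = vision.FaceLandmarkerOptions(
        base_options=mp_python.BaseOptions(model_asset_path=model_path),
        output_face_blendshapes=True,
        output_facial_transformation_matrixes=True,
        num_faces=1,
    )
    with vision.FaceLandmarker.create_from_options(options) as landmarker:
        result = landmarker.detect(mp.Image.create_from_file(image_path))

    if not result.face_landmarks:
        return None  # no face found in the image
    return {
        "landmarks": result.face_landmarks[0],
        "blendshapes": blendshapes_to_dict(result.face_blendshapes[0]),
        "matrix": result.facial_transformation_matrixes[0],  # 4x4 pose matrix
    }


if __name__ == "__main__":
    # Stand-in categories so the helper can be tried without a webcam or model.
    fake = [SimpleNamespace(category_name="jawOpen", score=0.5)]
    print(blendshapes_to_dict(fake))
```

The blendshape dict is what you would feed to an ARKit-rigged avatar; the matrix is the part used for pinning AR content to the head.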

🥉 Pipkin Pippa 🔌🐰 Phase-Connect • 3 years ago

This looks amazing!!

Rich 🍈 • 3 years ago

Whoa it’s the real Pippa 👀! Thanks!

Fireproof 🐱 VTuber • 3 years ago

This is insanely cool! Is there a way to get it to work with .vrm models from VRoid or would I have to add ARKit support to that avatar?

Rich 🍈 • 3 years ago

One recommended way is to follow a tutorial on VRoid to Perfect Sync blendshape conversion. If the avatar was created in VRoid Studio, Perfect Sync can autorig it to ARKit spec.

Butz Yung • 3 years ago

Checking the demo right now. Blink is a bit off when I have my glasses on. Things are fine when glasses are off. Now I need to think about how to map blendshapes to MMD morphs lol
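One possible way to approach the blendshape-to-morph question is a weighted lookup table: each avatar morph is driven by one or more ARKit-style blendshape scores, and the result is clamped to the 0..1 range. The sketch below assumes the scores arrive as a plain `{name: score}` dict. The morph names (`blink`, `mouth_a`, `smile`) and the weights are made-up placeholders, not real MMD morph names or tuned values.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping: avatar morph -> [(ARKit blendshape name, weight), ...].
# Replace the keys with the morph names your model actually defines.
MORPH_MAP: Dict[str, List[Tuple[str, float]]] = {
    "blink": [("eyeBlinkLeft", 0.5), ("eyeBlinkRight", 0.5)],
    "mouth_a": [("jawOpen", 1.0)],
    "smile": [("mouthSmileLeft", 0.5), ("mouthSmileRight", 0.5)],
}


def blendshapes_to_morphs(
    scores: Dict[str, float],
    morph_map: Dict[str, List[Tuple[str, float]]] = MORPH_MAP,
) -> Dict[str, float]:
    """Combine blendshape scores into morph weights clamped to [0, 1]."""
    morphs: Dict[str, float] = {}
    for morph, sources in morph_map.items():
        # Missing blendshapes count as 0 so partial input still works.
        value = sum(scores.get(name, 0.0) * weight for name, weight in sources)
        morphs[morph] = min(1.0, max(0.0, value))
    return morphs


if __name__ == "__main__":
    frame = {"eyeBlinkLeft": 1.0, "eyeBlinkRight": 0.5, "jawOpen": 0.25}
    print(blendshapes_to_morphs(frame))
```

Averaging the left and right blink scores into a single morph is a simplification; a model with separate wink morphs could map each eye one-to-one instead.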

TigerHix • 3 years ago

Amazing work!! Congrats on the release. Surely integrating this into @hakuyalabs soon!

bell ᶘ •̀ ᴥ•́ ᶅ✧ • 3 years ago

this feels like an overly paranoid question but I just want to be sure, is any of the image/cam data being sent back to a server on google's end, or is it kept entirely client-side? with the state of AI datasets being non-con by default I'm wary of anything with the AI label

Rich 🍈 • 3 years ago

Yup, it’s all done locally. These types of prediction models are all local on your device. There’s no need to run a server for this type of AI. Generative stuff is what typically uses a server because it requires much more processing power.
