It’s finally here! 🎊 We open-sourced our #vtuber motion capture solution at #GoogleIO! Our new MediaPipe model predicts 478 face landmarks + 52 blendshapes from your webcam and is compatible with any ARKit-rigged avatar! 😺🧵
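Because the 52 blendshapes follow ARKit naming, driving an ARKit-rigged avatar can be close to a name-for-name copy. The minimal sketch below assumes the scores arrive as a plain `{name: score}` dict (with MediaPipe's Python Face Landmarker you might build it from each blendshape category's `category_name` and `score`) and that the avatar's morph targets are exposed as a hypothetical `{name: weight}` dict. Names the avatar lacks are skipped, and scores are clamped to [0, 1].

```python
from typing import Dict


def apply_blendshapes(scores: Dict[str, float], morphs: Dict[str, float]) -> Dict[str, float]:
    """Copy ARKit-named blendshape scores onto an avatar's morph weights.

    `morphs` is a hypothetical stand-in for the avatar's morph-target table;
    it is updated in place and also returned.
    """
    for name, score in scores.items():
        if name not in morphs:
            # The avatar has no morph with this ARKit name; skip it.
            continue
        # Clamp so a noisy score never over-drives a morph.
        morphs[name] = min(max(score, 0.0), 1.0)
    return morphs


# One made-up frame of scores; the avatar below has no tongueOut morph.
frame = {"eyeBlinkLeft": 0.92, "jawOpen": 0.35, "mouthSmileLeft": 1.04, "tongueOut": 0.1}
avatar = {"eyeBlinkLeft": 0.0, "jawOpen": 0.0, "mouthSmileLeft": 0.0}
print(apply_blendshapes(frame, avatar))
# → {'eyeBlinkLeft': 0.92, 'jawOpen': 0.35, 'mouthSmileLeft': 1.0}
```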
507,958 views • 3 years ago • via X (Twitter)

Try out our web demo at:
It’s ready for Android, JavaScript, C++, or Python developers. We’re gonna see a lot more new VTuber apps 👀

So glad to finally be able to share this. It’s been amazing to work with such talented teammates at Google AI to bring this into reality. Super cathartic moment. 😊

Oh hey, you can find my beginner example on the IO website, but here's a direct link to the CodePen. It's a super minimal example showcasing MediaPipe's 52 blendshapes + transformation matrix for AR pinning. A great starting point for understanding the API!
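As a rough, library-free illustration of how a transformation matrix pins AR content: take a point defined relative to the face (say, where a hat should sit) and push it through the 4x4 matrix every frame. The flattened 16-value layout, the column-major default (the convention three.js's `Matrix4.fromArray` reads), and the `transform_point` helper are all assumptions for this sketch; check which layout your MediaPipe binding actually returns.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def transform_point(m: Sequence[float], p: Vec3, column_major: bool = True) -> Vec3:
    """Apply a flattened 4x4 homogeneous transform to a 3D point."""
    if len(m) != 16:
        raise ValueError("expected 16 matrix values")

    def at(row: int, col: int) -> float:
        # Element (row, col) under the chosen memory layout.
        return m[col * 4 + row] if column_major else m[row * 4 + col]

    x, y, z = p
    # Multiply the homogeneous point (x, y, z, 1) by the matrix.
    out = [at(r, 0) * x + at(r, 1) * y + at(r, 2) * z + at(r, 3) for r in range(4)]
    w = out[3]
    if w == 0:
        raise ValueError("point maps to infinity (w == 0)")
    # Divide by w; for a rigid face pose w stays 1.
    return (out[0] / w, out[1] / w, out[2] / w)


# A made-up "face pose" that only translates by (1, 2, 3), stored column-major.
face_matrix = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 2, 3, 1]
# Hypothetical anchor half a unit above the face origin.
print(transform_point(face_matrix, (0.0, 0.5, 0.0)))  # → (1.0, 2.5, 3.0)
```

Keeping the layout as a flag rather than hard-coding it makes it easy to flip if pinned objects appear to drift the wrong way when you turn your head.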

This looks amazing!!

Whoa it’s the real Pippa 👀! Thanks!

This is insanely cool! Is there a way to get it to work with .vrm models from VRoid, or would I have to add ARKit support to that avatar?

One recommended way is to follow a tutorial on converting VRoid blendshapes to Perfect Sync. If the avatar was created in VRoid Studio, a Perfect Sync conversion can auto-rig it to the ARKit spec.
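Whether an avatar is ready for this kind of tracking mostly comes down to having a morph target for each ARKit blendshape name. A quick way to see how far a VRoid export is from that is to diff its morph names against the 52 ARKit names. In this sketch the list uses the camelCase spelling that MediaPipe's blendshape categories appear to use, matching is case-insensitive because tools don't always agree on capitalization (an assumption; tighten it if your pipeline is strict), and `missing_arkit_shapes` is a hypothetical helper.

```python
from typing import Iterable, List

# ARKit's 52 face blendshapes, camelCase spelling.
ARKIT_BLENDSHAPES = [
    "browDownLeft", "browDownRight", "browInnerUp", "browOuterUpLeft", "browOuterUpRight",
    "cheekPuff", "cheekSquintLeft", "cheekSquintRight",
    "eyeBlinkLeft", "eyeBlinkRight", "eyeLookDownLeft", "eyeLookDownRight",
    "eyeLookInLeft", "eyeLookInRight", "eyeLookOutLeft", "eyeLookOutRight",
    "eyeLookUpLeft", "eyeLookUpRight", "eyeSquintLeft", "eyeSquintRight",
    "eyeWideLeft", "eyeWideRight",
    "jawForward", "jawLeft", "jawOpen", "jawRight",
    "mouthClose", "mouthDimpleLeft", "mouthDimpleRight", "mouthFrownLeft", "mouthFrownRight",
    "mouthFunnel", "mouthLeft", "mouthLowerDownLeft", "mouthLowerDownRight",
    "mouthPressLeft", "mouthPressRight", "mouthPucker", "mouthRight",
    "mouthRollLower", "mouthRollUpper", "mouthShrugLower", "mouthShrugUpper",
    "mouthSmileLeft", "mouthSmileRight", "mouthStretchLeft", "mouthStretchRight",
    "mouthUpperUpLeft", "mouthUpperUpRight",
    "noseSneerLeft", "noseSneerRight",
    "tongueOut",
]


def missing_arkit_shapes(avatar_morphs: Iterable[str]) -> List[str]:
    """Return the ARKit names the avatar has no morph for, in list order."""
    # Case-insensitive comparison, since exporters vary in capitalization.
    have = {name.lower() for name in avatar_morphs}
    return [name for name in ARKIT_BLENDSHAPES if name.lower() not in have]


# Hypothetical morph names from a partially converted avatar.
my_avatar = ["jawOpen", "EyeBlinkLeft", "mouthA"]
missing = missing_arkit_shapes(my_avatar)
print(f"{len(missing)} ARKit shapes missing, e.g. {missing[:3]}")
```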

Checking the demo right now. Blink tracking is a bit off when I have my glasses on, but it’s fine when they’re off. Now I need to think about how to map blendshapes to MMD morphs lol
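One way to approach the MMD question is a weighted-sum table: each MMD morph is driven by a weighted combination of ARKit-named scores, clamped to [0, 1]. The table below is entirely hypothetical: it uses a few common MMD morph names (まばたき for blink, あ/う/お for mouth shapes) with illustrative weights to tune by eye, not any standard mapping.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping: each MMD morph is a weighted sum of ARKit-named scores.
# Weights are illustrative starting points, not a standard.
MMD_MAP: Dict[str, List[Tuple[str, float]]] = {
    "まばたき": [("eyeBlinkLeft", 0.5), ("eyeBlinkRight", 0.5)],  # blink, both eyes
    "あ": [("jawOpen", 1.0)],         # open "a" mouth
    "う": [("mouthPucker", 1.0)],     # pursed "u" mouth
    "お": [("mouthFunnel", 1.0)],     # rounded "o" mouth
}


def to_mmd_morphs(
    scores: Dict[str, float],
    mapping: Dict[str, List[Tuple[str, float]]] = MMD_MAP,
) -> Dict[str, float]:
    """Convert ARKit-named blendshape scores into MMD morph weights."""
    out: Dict[str, float] = {}
    for morph, terms in mapping.items():
        # Missing blendshapes count as 0 so partial input still works.
        value = sum(weight * scores.get(name, 0.0) for name, weight in terms)
        out[morph] = min(max(value, 0.0), 1.0)
    return out


frame = {"eyeBlinkLeft": 0.75, "eyeBlinkRight": 0.25, "jawOpen": 0.5}
print(to_mmd_morphs(frame))
# → {'まばたき': 0.5, 'あ': 0.5, 'う': 0.0, 'お': 0.0}
```

If blink scores come out weaker with glasses on, one hypothetical tweak is raising the まばたき weights above 0.5 so a partial blink still closes the eyes; the clamp keeps the result from overshooting.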

Amazing work!! Congrats on the release. Surely integrating this into @hakuyalabs soon!

this feels like an overly paranoid question, but I just want to be sure: is any of the image/cam data being sent back to a server on Google's end, or is it kept entirely client-side? with the state of AI datasets being non-consensual by default, I'm wary of anything with the AI label

Yup, it’s all done locally. Prediction models like this one run entirely on your device, so there’s no need for a server for this type of AI. Generative stuff is what typically uses a server, because it requires much more processing power.

