Introducing TAPIR & RoboTAP, our latest research from Google DeepMind. This work focuses on spatial intelligence via point tracking and shows how it enables applications from robotics to video generation to augmented reality, and more!

Our robotic system can learn industry-relevant tasks from 4-6 demonstrations. Above, at each moment, the system automatically identifies which points must move (red) and where they must move to (cyan) to complete the task. Below, we show points as discovered from demos.

In video generation, we demonstrate a system which first generates motions and then generates pixels to match those motions, leading to generated videos containing complex motions while keeping textures consistent over time.

Powering it all is TAPIR, our open-source model, which tracks points with high quality and in real time. We have also newly released our unsupervised clustering code, which lets you segment moving objects automatically from videos. Try it at:

Joint work with @yangyi02, Mel Vecerik, @joaocarreira, @tdavchev, @JonathanScholz2, Andrew Zisserman, @yusufaytar, Stannis Zhou, @dilaragoekay, Ankush Gupta, @LourdesAgapito, @RaiaHadsell

@GoogleDeepMind This « points need to move » is a pretty cool way of formalizing the task, congrats!

@GoogleDeepMind

@GoogleDeepMind This is a great video visualization! The moving points immediately made me think about algorithms classes. 😁

@GoogleDeepMind Will the code for RoboTAP be open-sourced as well?

@DynamicWebPaige @GoogleDeepMind New GPU architecture when? Lol


