Introducing TAPIR & RoboTAP, our latest research from Google DeepMind. It focuses on spatial intelligence via point tracking, outlining how it enables applications from robotics to video generation to augmented reality, and more!

Carl Doersch

2,457 subscribers

47,833 views • 2 years ago • via X (Twitter)

Science & Technology • News & Politics • Education

9 comments

Carl Doersch • 2 years ago

Our robotic system can learn industry-relevant tasks from 4-6 demonstrations. Above, at each moment, the system automatically identifies which points must move (red) and where they must move to (cyan) to complete the task. Below, we show points as discovered from demos.

Carl Doersch • 2 years ago

In video generation, we demonstrate a system which first generates motions and then generates pixels to match those motions, leading to generated videos containing complex motions while keeping textures consistent over time.

Carl Doersch • 2 years ago

Powering it all is TAPIR, our open-source model which can track with high quality and in real time. Newly-released is our unsupervised clustering code, which lets you segment moving objects automatically from videos. Try it at:

Carl Doersch • 2 years ago

Joint work with @yangyi02, Mel Vecerik, @joaocarreira, @tdavchev, @JonathanScholz2, Andrew Zisserman, @yusufaytar, Stannis Zhou, @dilaragoekay, Ankush Gupta, @LourdesAgapito, @RaiaHadsell

Lucas Beyer (bl16) • 2 years ago

@GoogleDeepMind This « points need to move » is a pretty cool way of formalizing the task, congrats!

Get off X! @ChuckBaggett Chuck Baggett • 2 years ago

@GoogleDeepMind

Marcel Hussing • 2 years ago

@GoogleDeepMind This is a great video visualization! The moving points immediately made me think about algorithms classes. 😁

We'llmakeitbrahs • 2 years ago

@GoogleDeepMind will code for RoboTAP be open-sourced as well?

Sam • 2 years ago

@DynamicWebPaige @GoogleDeepMind New GPU architecture when? Lol
