We just released TAVI -- a robotics framework that combines touch and vision to solve challenging dexterous tasks in under 1 hour. The key? Use human demonstrations to initialize a policy, followed by tactile-based online learning with vision-based rewards. Details in 🧵 (1/7)
138,536 views • 2 years ago • via X (Twitter)
10 Comments

TAVI works in three steps. 1. Collect a few (<6) demonstrations of the task, including both vision and tactile information 2. Learn a reward function from the visual information using tools from Optimal Transport 3. Use RL to train a tactile policy w/ vision rewards (2/7)
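To make step 2 concrete, here is a minimal sketch of how an Optimal Transport reward can be computed from visual features. This is an illustration under stated assumptions, not the released TAVI code: the cosine cost, the Sinkhorn solver, the uniform marginals, and the names `cosine_cost`, `sinkhorn_plan`, `ot_rewards`, and `eps` are all hypothetical choices. The idea is that a rollout whose frames match the demonstration's frames cheaply gets a reward close to zero, while a mismatched rollout gets a more negative one.

```python
import numpy as np

def cosine_cost(x, y):
    """Pairwise cosine distance between rows of x (T, d) and y (T', d)."""
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    y = y / (np.linalg.norm(y, axis=1, keepdims=True) + 1e-8)
    return 1.0 - x @ y.T

def sinkhorn_plan(cost, eps=0.05, n_iters=200):
    """Entropy-regularized transport plan between uniform marginals."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # uniform mass over rollout frames
    b = np.full(m, 1.0 / m)   # uniform mass over demonstration frames
    K = np.exp(-cost / eps)   # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):  # alternate scaling of rows and columns
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def ot_rewards(agent_feats, demo_feats, eps=0.05):
    """Per-timestep reward: negative transport cost paid by each rollout frame.

    agent_feats and demo_feats are assumed to be visual embeddings of shape
    (T, d) and (T', d) from some image encoder (not shown here).
    """
    cost = cosine_cost(agent_feats, demo_feats)
    plan = sinkhorn_plan(cost, eps)
    return -np.sum(plan * cost, axis=1)
```

In step 3, an RL algorithm would receive these per-timestep rewards in place of a hand-designed or human-labeled reward.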

Here is a fun visualization of the RL training process for the 'sponge flipping' task. The robot starts off failing, and then over time, gets closer to succeeding. The measurement of 'success' is done by OT matching and requires no human labeling. (3/7)

To improve visual representations, both for rewards and policy learning, we introduce a new time-contrastive SSL technique that combines contrastive learning with robot state prediction. This simple technique improves success rate by 56% vs. prior SSL objectives. (4/7)
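As a rough illustration of what combining contrastive learning with robot state prediction could look like, here is a minimal sketch. The thread does not give the exact objective, so everything here is an assumption: the InfoNCE form of the contrastive term, the linear state-prediction head `W`, the weighting `alpha`, and the function names are hypothetical and not taken from the TAVI code. Embeddings of temporally close frames are pulled together, other frames in the batch act as negatives, and a second term asks the embedding to predict the robot state.

```python
import numpy as np

def info_nce(anchor, positive, temperature=0.1):
    """Contrastive loss: row i of `positive` is the match for row i of `anchor`;
    every other row in the batch serves as a negative."""
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def state_prediction_loss(embeddings, states, W):
    """Mean squared error of a linear head W predicting robot state."""
    pred = embeddings @ W
    return np.mean((pred - states) ** 2)

def combined_loss(z_t, z_near, states, W, alpha=1.0):
    """z_t: embeddings of frames at time t, shape (B, d).
    z_near: embeddings of temporally nearby frames, shape (B, d).
    states: robot states at time t, shape (B, s).
    alpha: hypothetical weight on the state-prediction term."""
    return info_nce(z_t, z_near) + alpha * state_prediction_loss(z_t, states, W)
```

In practice both terms would be minimized with respect to the image encoder's parameters using an autodiff framework; NumPy is used here only to show the shape of the objective.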

One fundamental finding is that while touch is crucial to solving tasks, vision is usually more indicative of success compared to touch. (5/7)

TAVI also generalizes quite well to new objects (~53%). Most failures are when the shape or the mass drifts significantly from the demonstrated one – see examples in the video below. (6/7)

All of our code and data is public! Project page: Code: TAVI was led by @irmakkguzey w/ @yinlongdai @justsomecsnerd and @soumithchintala (7/7)

this is probably ahead of tesla but it won't get any attention

This is great!

MORE, MORE, MORE, MORE, MUCH MORE ... TIME IS NOW

Impressive!
