We just released TAVI -- a robotics framework that combines touch and vision to solve challenging dexterous tasks in under 1 hour. The key? Use human demonstrations to initialize a policy, followed by tactile-based online learning with vision-based rewards. Details in🧵(1/7)

Lerrel Pinto

9,550 subscribers

138,931 views • 2 years ago • via X (Twitter)

Health & Wellness • Science & Technology • Education

10 comments

Lerrel Pinto • 2 years ago

TAVI works in three steps. 1. Collect a few (<6) demonstrations for the task to solve including vision and tactile information 2. Learn a reward function using the visual information using tools in Optimal Transport 3. Use RL to train tactile policy w/ vision rewards (2/7)
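Step 2 can be illustrated with a minimal sketch of an optimal-transport reward, assuming visual features have already been extracted for each frame. The thread does not specify the cost function, solver, or hyperparameters, so the cosine cost, the Sinkhorn iteration, and the names `cosine_cost`, `sinkhorn_plan`, and `ot_rewards` are illustrative assumptions rather than TAVI's actual implementation.

```python
import numpy as np

def cosine_cost(x, y):
    """Pairwise cosine distance between rows of x (T, d) and y (T', d)."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T

def sinkhorn_plan(cost, eps=0.05, n_iters=200):
    """Entropy-regularised OT plan between two uniform distributions.

    eps and n_iters are assumed values, not taken from the thread.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # uniform mass over rollout frames
    b = np.full(m, 1.0 / m)   # uniform mass over demo frames
    K = np.exp(-cost / eps)   # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):  # alternate scaling to match both marginals
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def ot_rewards(rollout_feats, demo_feats, eps=0.05):
    """Per-timestep reward: negative transport cost of each rollout frame."""
    cost = cosine_cost(rollout_feats, demo_feats)
    plan = sinkhorn_plan(cost, eps)
    return -(plan * cost).sum(axis=1)

# Hypothetical usage with random stand-ins for visual embeddings.
rng = np.random.default_rng(0)
demo = rng.normal(size=(10, 8))      # 10 demo frames, 8-dim features
rollout = rng.normal(size=(10, 8))   # 10 rollout frames
rewards = ot_rewards(rollout, demo)  # shape (10,), one reward per frame
```

A rollout whose frames match the demonstration closely incurs a low transport cost and therefore a reward near zero, while a dissimilar rollout is penalised more heavily. No human labels are involved in this computation.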

Lerrel Pinto • 2 years ago

Here is a fun visualization of the RL training process for the 'sponge flipping' task. The robot starts off failing, and then over time, gets closer to succeeding. The measurement of 'success' is done by OT matching and requires no human labeling. (3/7)

Lerrel Pinto • 2 years ago

To improve visual representations, both for rewards and policy learning, we introduce a new time-contrastive SSL technique that combines contrastive learning with robot state prediction. This simple technique improves success rate by 56% vs. prior SSL objectives. (4/7)
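One way to read "contrastive learning combined with robot state prediction" is as a weighted sum of two losses. The sketch below is an assumption about the general shape of such an objective, not the loss used in TAVI: it pairs an InfoNCE term over temporally close frames with a mean-squared error on predicted robot state. The names `info_nce` and `combined_loss`, the temperature, and the weight are all hypothetical.

```python
import numpy as np

def info_nce(anchor, positive, temperature=0.1):
    """InfoNCE loss over a batch.

    Row i of `positive` is the positive for row i of `anchor`
    (e.g. a temporally nearby frame); all other rows act as negatives.
    """
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def combined_loss(z_t, z_near, state_pred, state_true, weight=1.0):
    """Contrastive term plus a state-prediction term (assumed weighting)."""
    contrastive = info_nce(z_t, z_near)
    prediction = np.mean((state_pred - state_true) ** 2)
    return contrastive + weight * prediction

# Hypothetical usage with random stand-ins for embeddings and robot states.
rng = np.random.default_rng(0)
z_t = rng.normal(size=(16, 8))                       # embeddings at time t
z_near = z_t + 0.01 * rng.normal(size=(16, 8))       # embeddings of nearby frames
state_true = rng.normal(size=(16, 4))                # e.g. joint positions
state_pred = state_true + 0.1 * rng.normal(size=(16, 4))
loss = combined_loss(z_t, z_near, state_pred, state_true)
```

In a real training setup the embeddings and state predictions would come from a learned encoder and prediction head, and the loss would be minimised with an autodiff framework. Plain NumPy is used here only to show the arithmetic.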

Lerrel Pinto • 2 years ago

One fundamental finding is that while touch is crucial to solving tasks, vision is usually more indicative of success compared to touch. (5/7)

Lerrel Pinto • 2 years ago

TAVI also generalizes quite well to new objects (~ 53%). Most failures are when the shape or the mass drifts significantly from the demonstrated one – see examples in the video below. (6/7)

Lerrel Pinto • 2 years ago

All of our code and data is public! Project page: Code: TAVI was led by @irmakkguzey w/ @yinlongdai @justsomecsnerd and @soumithchintala (7/7)

Furkan Gözükara • 2 years ago

this is probably ahead of tesla but it won't get any attention

Akash Bansal • 2 years ago

This is great!

Maurus Balmer • 2 years ago

MORE, MORE, MORE, MORE, MUCH MORE ... TIME IS NOW

TuringPost • 2 years ago

Impressive!

Related videos

Tactile feedback is one of the most important modalities in manipulation, but has been underutilized in dexterous hands. T-Dex is a framework for learning dexterous policies from tactile play data, beating vision and torque-based methods by 1.7x. 🧵👇

Lerrel Pinto

84,611 views • 3 years ago

Learning dexterous policies from human videos is challenging due to differences between human and robot hands. We present HuDOR, a method that learns dexterous policies within the robot's physical constraints using just one human video and an hour of online interactions! [1/n]

Irmak Guzey

65,121 views • 1 year ago

Can we bring human-like Touch to robots🤖? Introducing our CoRL work on 3D-ViTac. Humans rely on both vision 👁️ and touch 🫳 for complex tasks. With combined visual-tactile sensing, robots can now tackle challenging tasks, like precise in-hand reorientation, fragile objects grasping. Website: #Robotics #CoRL2024 #Touch #tactile #AI #ML

Binghao Huang

49,544 views • 1 year ago

Sim2Real RL for Vision-Based Dexterous Manipulation on Humanoids. TL;DR: we train a humanoid robot with two multifingered hands to perform a range of dexterous manipulation tasks, with robust generalization and high performance, without human demonstrations :D

Toru

49,561 views • 1 year ago

A touch-aware humanoid manipulation policy that cleans the lab for you🧹🧪 Introducing Humanoid Touch Dream: a real-world system for dexterous, contact-rich humanoid loco-manipulation. Our key idea is simple: the policy predicts future hand forces and tactile latents alongside actions, within a single-stage training framework. 1/7

Yaru Niu

69,938 views • 3 months ago

Vision-language models perform diverse tasks via in-context learning. Time for robots to do the same! Introducing In-Context Robot Transformer (ICRT): a robot policy that learns new tasks by prompting with robot trajectories, without any fine-tuning. [1/N]

Max Fu

40,466 views • 1 year ago

Introducing HiLMa-Res: a hierarchical RL framework for quadrupeds to tackle loco-manipulation tasks with sustained mobility! Designed for general learning tasks (vision-based, state-based, real-world data, etc), the robot now can step over stones🐾/navigate boxes📦/dribble⚽.

Zhongyu Li

11,720 views • 1 year ago

Humans learn and improve from failures. Similarly, foundation models adapt based on human feedback. Can we leverage this failure understanding to enhance robotics systems that use foundation models? Introducing AHA—a vision-language model for detecting and reasoning over failures in robotic manipulation. Project page: 🧵Thread👇 Aha!

Jiafei Duan

48,777 views • 1 year ago

Today at Meta FAIR we’re announcing three new cutting-edge developments in robotics and touch perception — and releasing a collection of artifacts to empower the community to build on this work. Details on all of this new work ➡️ 1️⃣ Meta Sparsh is the first general-purpose encoder for vision-based tactile sensing that works across many tactile sensors and many tasks. Trained on 460K+ tactile images using self-supervised learning. 2️⃣ Meta Digit 360 is a breakthrough artificial fingertip-based tactile sensor, equipped with 18+ sensing features to deliver detailed touch data with human-level precision and touch-sensing capabilities. 3️⃣ Meta Digit Plexus is a standardized platform for robotic sensor connections and interactions. It provides a hardware-software solution to integrate tactile sensors on a single robot hand and enables seamless data collection, control and analysis over a single cable. The potential impact of expanding capabilities and components like these for the open source community ranges from medical research to supply chain, manufacturing and much more. We’re excited to continue this work with the broader community.

AI at Meta

453,260 views • 1 year ago

How far can a very simple eye go in solving vision tasks? Like a 1-pixel camera? Humans have one of the greatest eyes in nature, while many animals have significantly simpler eyes and visual systems yet show complex perceptual behavior. In an interesting project, we find that many computer vision tasks can be solved without a typical camera and with such simple 1-pixel sensors (photoreceptors). We also find that proper design (e.g., where to place the photoreceptors strategically) makes a big difference, so we developed a computational design method to find them. 🌐 👁️[Solving Vision Tasks with Simple Photoreceptors Instead of Cameras] 🧵1/n

Amir Zamir

75,892 views • 2 years ago

This is a single uncut video, showing a robot learning several tasks instantly, after just one demonstration each ... This is possible because we've now been able to achieve in-context learning for everyday robotics tasks, and I'm very excited to announce our latest paper: 🎆 Instant Policy: In-Context Imitation Learning via Graph Diffusion 🎆 (1/6) 🧵👇

Edward Johns

74,683 views • 1 year ago

🤖ROBOTS ARE GETTING SMARTER AT TOUCHING THE REAL WORLD Researchers from UC Berkeley, NVIDIA and Stanford introduced T-Rex, a framework that combines vision, language and tactile sensing. Instead of relying on cameras alone, robots can now respond to physical contact in real time. Robots are no longer just seeing objects. They’re learning how to FEEL them.

Coin Bureau

28,275 views • 1 month ago

Robots are the bottleneck in scaling robotics, and learning from human video promises to solve it. But how can chaotic human data ever measure up to sanitized, lab-made teleoperation data? Introducing Do as I Do: establishing a much needed correspondence between human videos and dexterous robot data. Some fun insights below: 🧵

Mahi Shafiullah 🏠🤖

93,934 views • 1 month ago

After two years in stealth mode, we're thrilled to unveil Mentee Robotics and our humanoid robot, Mentee Robotics! With AI integration at every layer, from Sim2Real machine learning to NeRF-based algorithms and LLMs, we've achieved a complete end-to-end cycle for tasks. This is just the beginning and we can’t wait to share more

Amnon Shashua

223,875 views • 2 years ago

1/🧠Humans are the best robot data source — but video alone misses one thing: force. 2/🙁Tactile gloves capture force — but they're costly and block the real touch manipulation depends on. 3/💪Maybe the future of touch lives on your wrist: surface EMG reads the muscles that cause force — tactile sensing without ever touching a tactile sensor. 4/🔥Want a fully open-source framework — hardware + software — to train your own force-aware learn-from-human-data robot policy? 🚀We introduce ForceBand: Learning Forceful Manipulation with sEMG -- bring force into human videos with sEMG, for force-aware manipulation ⬇️ ✦ Zero-Shot Human-to-Robot Transfer ✦ Force Beyond Vision ✦ Free-Hand Force Sensing ✦ Collect by Anyone, Anytime, Anywhere ✦ Deploy on Any Robot, Any Camera, Any Environment ✦ Open-Source & Low-Cost & Easy-to-Implement Let's squeeze every bit of signal out of human data, and let robots feel the force! 🌐 Website: 📄 Paper: 💻 Code: 🎥 Video: 🧵 1/n

Zhi (Leo) Wang

50,525 views • 1 month ago

We are back. After one year of quiet building. Introducing GENE-26.5, our first robotic brain that takes a major step toward human-level capability. For years, robotics has struggled to learn from the world’s largest and valuable data source: Humans. Solving it means rethinking the whole stack from the ground up: - A robotics-native foundation model. - A 1:1 human-like robotic hand. - A noninvasive data collection glove for motion, force, and touch. - A simulator that turns weeks of experiments into minutes. GENE-26.5 is trained across language, vision, proprioception, tactile, and action. We designed a set of tasks to test how far we can go with this new paradigm. Fully autonomous, 1x speed, one model, same weights. (Enjoy with sound on) We are approaching the endgame for robotics. And this is just a beginning.

Genesis AI

2,719,293 views • 2 months ago

Touch: A Game-Changer in Dexterity for Robots You can’t be responsive to the world if you can’t feel it. That’s true for people—and it's just as true for robots. We’re zeroed in on a practical, yet innovative, approach to creating general purpose robots designed to perform physical work. And that means hands, powered not just by vision, but by touch. Vision alone falls short. There are many instances where our vision is obstructed while performing work tasks–like grabbing items out of a bin–you lose line of sight. You need tactile intelligence—texture, pressure, contact—with real-time feedback to move faster, smarter, and with confidence. Watch a full explainer video on Sanctuary AI’s latest touch sensors on our blog: #Robotics #AI #HumanoidRobots #Dexterity #TactileSensing #SanctuaryAI

Sanctuary AI

53,269 views • 1 year ago

Synchronize Dual Hands for Physics-Based Dexterous Guitar Playing discuss: We present a novel approach to synthesize dexterous motions for physically simulated hands in tasks that require coordination between the control of two hands with high temporal precision. Instead of directly learning a joint policy to control two hands, our approach performs bimanual control through cooperative learning where each hand is treated as an individual agent. The individual policies for each hand are first trained separately, and then synchronized through latent space manipulation in a centralized environment to serve as a joint policy for two-hand control. By doing so, we avoid directly performing policy learning in the joint state-action space of two hands with higher dimensions, greatly improving the overall training efficiency. We demonstrate the effectiveness of our proposed approach in the challenging guitar-playing task. The virtual guitarist trained by our approach can synthesize motions from unstructured reference data of general guitar-playing practice motions, and accurately play diverse rhythms with complex chord pressing and string picking patterns based on the input guitar tabs that do not exist in the references. Along with this paper, we provide the motion capture data that we collected as the reference for policy training.

AK

26,855 views • 1 year ago

After Xiao Wu, meet Xiao Liu. 🤖 Tencent Robotics X’s latest robot isn’t just learning massage moves. It’s learning the touch behind them — where to press and exactly how hard. A custom data system captures vision, touch, and force from human demonstrations. Using reinforcement learning, Xiao Liu learns to reproduce both the trajectory and the pressure. Getting to the right spot is one thing. Getting the pressure right is another. My shoulders volunteer as tribute.

RoboHub🤖

15,397 views • 15 days ago

JARVIS-VLA just dropped on Hugging Face Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse obtain VLA models in Minecraft that can follow human instructions on over 1k different atomic tasks, including crafting, smelting, cooking, mining, and killing. experiments demonstrate that post-training on non-trajectory tasks leads to a significant 40% improvement over the best agent baseline on a diverse set of atomic tasks. Furthermore, demonstrate that approach surpasses traditional imitation learning-based policies in Minecraft, achieving state-of-the-art performance.

AK

60,243 views • 1 year ago