Xiao Ma

@yusufma555 • 1,632 subscribers

Research Scientist @ ByteDance Seed. Prev: @SeaAIL @NUSingapore @sjtu1896. All views are my own.

Videos

I've been working on deformable object manipulation since my PhD. It was a total nightmare years ago, and my PhD advisor told me not to work on it for my own good.

Today, at ByteDance Seed, we are dropping GR-RL, a new VLA+RL system that handles long-horizon, precise, dexterous manipulation of deformable objects. This is probably the first real-world RL system to make a robot:

✅ Lace up your shoes end to end
✅ Hit millimeter tolerance repeatedly
✅ Recover from mistakes (see video!)
✅ Complete continuous shoelace threading on a real bimanual platform

📈 Success rate: up from 45.7% to 83.3%

Yes, robots can now actually do this.

Project page: ArXiv:

109,462 views • 6 months ago

Scaling vision-language-action (VLA) models to high-DoF dexterous hands has long been a "holy grail" challenge because of the high-dimensional action space and data scarcity. To wrap up 2025, we are releasing GR-Dexter, a holistic hardware-model-data framework for generalist manipulation on a bimanual dexterous-hand robot. This is the first VLA system to achieve:

✅ High-DoF Control: Managing a 56-DoF bimanual system (21-DoF per hand).
✅ Long-Horizon Tasks with Tool Use: Vacuuming, bread serving with tongs, and table decluttering.
✅ Open-World Generalization: Robust performance with unseen objects and abstract instructions.

Project page: ArXiv:

93,692 views • 5 months ago

🚀🚀🚀 Ever wondered what it takes for robots to handle real-world household tasks? Long-horizon execution, deformable object dexterity, and unseen object generalization. Meet GR-3, ByteDance Seed’s new Vision-Language-Action (VLA) model!

GR-3 is a generalizable VLA model with strong capabilities in complex long-horizon tasks. It understands unseen abstract concepts, manipulates deformable objects robustly, and adapts to novel settings with minimal human data.

✨ Generalization: Generalizes well to unseen objects, environments, and even instructions with abstract concepts.
✨ Long-Horizon Manipulation: Completes long-horizon tasks with strong instruction-following capabilities.
✨ Deformable Object Manipulation: Manipulates deformable objects robustly.

Project Page: Arxiv:

#ByteDance #ByteDanceSeed #GR3 #VLA #Robotics #FoundationModels

46,260 views • 11 months ago
