Excited to share my final PhD project😀 We show how simple, yet elegant changes enable diffusion transformers to learn SOTA robotic policies on real robots. Our method improves performance by 20% across a wide range of highly dexterous tasks - like cutting sushi! 1/n

Sudeep Dasari

1,482 subscribers

20,536 views • 1 year ago • via X (Twitter)

Education · Science & Technology

4 comments

Sudeep Dasari • 1 year ago

Our method, DiT-Block Policy, works by adding AdaLN layers to the decoder of a standard transformer diffusion policy. This significantly outperformed standard cross-attention blocks, especially when using fewer DDIM iterations during inference. 2/n
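For readers unfamiliar with AdaLN (adaptive layer normalization), here is a minimal sketch of the general idea: the features are layer-normalized, then scaled and shifted by values predicted from a conditioning vector. This is not the authors' implementation. The function names, the single linear projection, and the `(1 + scale)` form are assumptions chosen for illustration, and the thread does not say exactly where the layers sit inside the decoder.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(cond, weights):
    """Project cond (length C) with weights (C rows of length D) to a length-D vector."""
    dim = len(weights[0])
    return [sum(c * row[j] for c, row in zip(cond, weights)) for j in range(dim)]

def adaln(tokens, cond, w_scale, w_shift):
    """Adaptive LayerNorm (illustrative): scale and shift come from the conditioning.

    tokens:  list of token vectors, each of length D
    cond:    conditioning vector of length C (hypothetical, e.g. a timestep
             embedding combined with observation features)
    w_scale, w_shift: hypothetical C x D projection weights
    """
    scale = linear(cond, w_scale)
    shift = linear(cond, w_shift)
    out = []
    for tok in tokens:
        normed = layer_norm(tok)
        out.append([n * (1.0 + s) + b for n, s, b in zip(normed, scale, shift)])
    return out

tokens = [[1.0, 2.0, 3.0, 4.0], [0.5, -0.5, 1.5, -1.5]]
cond = [1.0, 0.0]

# With all-zero projections, AdaLN reduces to plain layer normalization.
zero = [[0.0] * 4 for _ in range(2)]
identity_out = adaln(tokens, cond, zero, zero)

# A non-zero shift projection moves every normalized feature by the same offset.
w_shift = [[1.0] * 4, [0.0] * 4]
shifted_out = adaln(tokens, cond, zero, w_shift)
```

In contrast to cross-attention, where tokens attend over the conditioning sequence, this kind of layer injects the conditioning through per-feature scale and shift terms only.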

Sudeep Dasari • 1 year ago

We release all data and code from our project. This includes BiPlay - a more diverse bi-manual manipulation dataset. Each episode in BiPlay consists of randomized objects, tasks, and settings with accompanied language annotations for scalable learning. 3/n

Sudeep Dasari • 1 year ago

Finally, I’d like to give a shoutout to my collaborators @oier_mees, Sebastian Zhao, @mohansrirama, and @svlevine who made this project possible! For more information, check out our website: n/n

Sabeer Saeed • 1 year ago

Superb Work!

Related videos

Diffusion models make great images. But can they drive robots? Usually that gets complicated really fast. We figured out how to get a Stable Diffusion model (based on Instruct pix2pix) to drive robotic instruction following. Simple recipe, works on a wide range of tasks. Thread👇

Sergey Levine

126,523 views • 2 years ago

Advancing dexterous manipulation through scalable visual sim-to-real transfer. We are excited to share our RSS paper, “ViserDex: Visual Sim-to-Real for Robust Dexterous In-hand Reorientation.” 🌐 Project page: 1/N 🧵

Robotic Systems Lab

39,143 views • 1 month ago

Excited to finally share Generative Value Learning (GVL), my Google DeepMind project on extracting universal value functions from long-context VLMs via in-context learning! We discovered a simple method to generate zero-shot and few-shot values for 300+ robot tasks and 50+ datasets using SOTA VLMs like Gemini (Try out the demo on our website on your robot video today!) I worked a lot on leveraging foundation models as guidance for robots in my PhD, and to me, this result forges a new frontier in how we can use foundation models for robot learning, given its broad applicability independent of embodiment and task types. Quite excited about how we can build on this work as a community!

Jason Ma

98,090 views • 1 year ago

🧵1/n LLMs significantly improve Evolutionary Algorithms for molecular discovery! For 18 different molecular optimization tasks, we demonstrate how to achieve SOTA performance by incorporating different LLMs! Learn more in our new paper! Website: Code)

Yuanqi Du

17,895 views • 2 years ago

I'm excited to share our new work Diffusion as Shader (DaS), a versatile controllable video generation method for various tasks: object manipulation, camera control, mesh-to-video, and motion transfer. Project page: Github:

Yuan Liu

33,363 views • 1 year ago

1/n: We are excited to share that our paper on Chroma, a general purpose diffusion model for proteins, is out today in Nature! A couple of my favorite highlights in the 🧵below 👇

Andrew Beam

362,063 views • 2 years ago

Our new virtual try-on feature uses a technique called diffusion to show you what clothes look like on a wide range of people. Learn more about the tech that's making it easier for you to get a better sense of what clothes will look like on you →

Google

803,423 views • 3 years ago

We’ve all seen videos of humanoid robots performing single tasks that are very impressive, like dancing or karate. But training humanoid robots to perform a wide range of complex motions is difficult. GMT is a general-purpose policy which can learn a wide range of robot motions. Watch Episode #32 of RoboPapers, with Zixuan Chen, co-hosted by Michael Cho - Rbt/Acc and Chris Paxton, to learn more.

RoboPapers

16,605 views • 9 months ago

Sim2Real RL for Vision-Based Dexterous Manipulation on Humanoids. TLDR - we train a humanoid robot with two multifingered hands to perform a range of dexterous manipulation tasks with robust generalization and high performance, without human demonstration :D

Toru

49,554 views • 1 year ago

How far can a very simple eye go in solving vision tasks? Like a 1-pixel camera? Humans have one of the greatest eyes in nature, while many animals have significantly simpler eyes and visual systems yet show complex perceptual behavior. In an interesting project, we find that many computer vision tasks can be solved without a typical camera and with such simple 1-pixel sensors (photoreceptors). We also find that proper design (e.g., where to place the photoreceptors strategically) makes a big difference, so we developed a computational design method to find them. 🌐 👁️[Solving Vision Tasks with Simple Photoreceptors Instead of Cameras] 🧵1/n

Amir Zamir

75,870 views • 2 years ago

Presto! Distilling Steps and Layers for Accelerating Music Generation Despite advances in diffusion-based text-to-music (TTM) methods, efficient, high-quality generation remains a challenge. We introduce Presto!, an approach to inference acceleration for score-based diffusion transformers via reducing both sampling steps and cost per step. To reduce steps, we develop a new score-based distribution matching distillation (DMD) method for the EDM-family of diffusion models, the first GAN-based distillation method for TTM. To reduce the cost per step, we develop a simple, but powerful improvement to a recent layer distillation method that improves learning via better preserving hidden state variance. Finally, we combine our step and layer distillation methods together for a dual-faceted approach. We evaluate our step and layer distillation methods independently and show each yield best-in-class performance. Our combined distillation method can generate high-quality outputs with improved diversity, accelerating our base model by 10-18x (230/435ms latency for 32 second mono/stereo 44.1kHz, 15x faster than comparable SOTA) -- the fastest high-quality TTM to our knowledge.

AK

30,430 views • 1 year ago

(1/10) 🔥Thrilled to introduce OneDiffusion—our latest work in unified diffusion modeling! 🚀 This model bridges the gap between image synthesis and understanding, excelling in a wide range of tasks: T2I, conditional generation, image understanding, identity preservation, multiview generation, and even camera pose estimation. Learn more at: Project: arXiv: Code (on the way):

Jiasen Lu

33,383 views • 1 year ago

Robotic intelligence requires dexterous tool use, but generalizing across tools is hard. Our CoRL23 paper combines semantics (affordances) with low-level control (sim2real) to show functional grasping that generalizes to hammers, drills and more! 1/n

Ananye Agarwal

27,759 views • 2 years ago

Very excited to share that our DELiVR method is now open access published @NatureMethods. We created a simple, brain-wide cell analysis deep learning tool, no coding needed! Fiji Plugin makes it accessible to all. by Doris Kaltenecker Rami @moritz_negwer

Ali Max Erturk

56,851 views • 2 years ago

Teaching robots real dexterity has always been a challenge. But what if they could handle tools like a human? DexterityGen (DexGen) is a new system that helps robots use their hands better. It improves how they grip, move, and handle objects… from holding a pen to using a screwdriver. DexGen learns in simulation and refines its skills in the real world, making robotic hands much more useful. What makes DexGen special? ✅ Smarter movements that refine rough actions into precise skills ✅ Trained on a massive collection of dexterous tasks for better learning ✅ Better teleoperation that makes robotic hand control easier and safer ✅ Handles real-world challenges like small objects, tricky angles, and gravity This moves robots closer to real dexterity. It makes tool use more natural, improves stability, and brings robotic hands one step closer to human-level skill. Seen at Zhao-Heng Yin 🫶 Github: Paper:

Ilir Aliu

51,490 views • 1 year ago

Excited to share 🎨🖌️ 3D Paintbrush - a method for generating local stylized textures on meshes using text as input! Our method predicts a localization map & a highly detailed texture map which conforms to it (1/3)

Rana Hanocka

48,987 views • 2 years ago

Excited to share our #ICML2023 paper ✨LIV✨! Extending VIP, LIV is at once a pre-training, fine-tuning, and (zero-shot!) multi-modal reward method for (real-world!) language-conditioned robotic control. Project: Code & Model: 🧵:

Jason Ma

55,045 views • 3 years ago

How to learn dexterous manipulation for any robot hand from a single human demonstration? Check out DexMachina, our new RL algorithm that learns long-horizon, bimanual dexterous policies for a variety of dexterous hands, articulated objects, and complex motions.

Mandi Zhao

120,304 views • 1 year ago

Excited to share our #CVPR2023 paper on synthesizing new views along a camera trajectory from a **single image**! How? 💡 The good old epipolar constraints in a pose-guided diffusion model! Paper: Project:

Jia-Bin Huang

94,196 views • 3 years ago

Excited to show a glimpse of the next generation of AlphaFold from our teams at Isomorphic Labs and Google DeepMind. This model expands beyond proteins to include other molecules like small molecules and nucleic acids, and improves accuracy on proteins 1/n

Max Jaderberg

117,438 views • 2 years ago