
Wenlong Huang @ CVPR
@wenlong_huang • 5,610 subscribers
PhD Student @StanfordSVL. Previously @MIT_CSAIL @Berkeley_AI @GoogleDeepMind @NVIDIARobotics. Robotics, Foundation Models, Spatial Intelligence.
Videos

What structural task representation enables multi-stage, in-the-wild, bimanual, reactive manipulation? Introducing ReKep: LVM to label keypoints & VLM to write keypoint-based constraints, solve w/ optimization for diverse tasks, w/o task-specific training or env models. 🧵👇
Wenlong Huang @ CVPR • 190,836 views • 1 year ago

How to scale visual affordance learning that is fine-grained, task-conditioned, works in-the-wild, in dynamic envs? Introducing Unsupervised Affordance Distillation (UAD): distills affordances from off-the-shelf foundation models, *all without manual labels*. Very excited this is nominated as Best Paper Finalist at #ICRA2025! 🧵👇
Wenlong Huang @ CVPR • 93,490 views • 1 year ago

Very cool to see how robots are now mastering these tasks, but as I read Benjie's blog, I’m even more impressed by our human ingenuity in translating human dexterity into robot-specific behaviors (by the demonstrators):
- wrapping sock on gripper to invert it
- mashing plastic bag on table to open it
- key reorientation by multiple bimanual handovers, etc.
Perhaps the next “Olympics” should shift from doing the most challenging tasks to replicating such human ingenuity at test time?
Wenlong Huang • 20,552 views • 4 months ago