
Wenlong Huang @ CVPR
@wenlong_huang • 5,610 subscribers
PhD Student @StanfordSVL. Previously @MIT_CSAIL @Berkeley_AI @GoogleDeepMind @NVIDIARobotics. Robotics, Foundation Models, Spatial Intelligence.
Videos

What if we can simulate an *interactive 3D world*, from a single image, in the wild, in real time? Introducing PointWorld-1B: a large pre-trained 3D world model that predicts env dynamics given RGB-D capture and robot actions. 🌐 From Stanford University and NVIDIA.
Wenlong Huang @ CVPR • 246,330 views • 5 months ago

What structural task representation enables multi-stage, in-the-wild, bimanual, reactive manipulation? Introducing ReKep: LVM to label keypoints & VLM to write keypoint-based constraints, solve w/ optimization for diverse tasks, w/o task-specific training or env models. 🧵👇
Wenlong Huang @ CVPR • 190,836 views • 1 year ago

How to scale visual affordance learning that is fine-grained, task-conditioned, works in-the-wild, in dynamic envs? Introducing Unsupervised Affordance Distillation (UAD): distills affordances from off-the-shelf foundation models, *all without manual labels*. Very excited this is nominated as Best Paper Finalist at #ICRA2025! 🧵👇
Wenlong Huang @ CVPR • 93,490 views • 1 year ago

Very cool to see how robots are now mastering these tasks, but as I read Benjie's blog, I’m even more impressed by our human ingenuity in translating human dexterity into robot-specific behaviors (by the demonstrators):
- wrapping sock on gripper to invert it
- mashing plastic bag on table to open it
- key reorientation by multiple bimanual handovers, etc.
Perhaps the next “Olympics” should shift from doing the most challenging tasks to replicating such human ingenuity at test time?
Wenlong Huang • 20,552 views • 4 months ago