Wenlong Huang

@wenlong_huang • 5,771 subscribers

PhD Student @StanfordSVL. Previously @MIT_CSAIL @Berkeley_AI @GoogleDeepMind @NVIDIARobotics. Robotics, Foundation Models, Spatial Intelligence.

Videos

What if we can simulate an *interactive 3D world*, from a single image, in the wild, in real time? Introducing PointWorld-1B: a large pre-trained 3D world model that predicts env dynamics given RGB-D capture and robot actions. 🌐 From Stanford University and NVIDIA.

274,820 views • 6 months ago

What representation enables open-world robot manipulation from generated videos? Introducing Dream2Flow, our recent work that bridges video generation and robot control with 3D object flow. Stanford University #ICRA2026 1/N

106,300 views • 4 months ago

How to harness foundation models for *generalization in the wild* in robot manipulation? Introducing VoxPoser: use LLM+VLM to label affordances and constraints directly in 3D perceptual space for zero-shot robot manipulation in the real world! 🌐 🧵👇

293,910 views • 3 years ago

What structural task representation enables multi-stage, in-the-wild, bimanual, reactive manipulation? Introducing ReKep: LVM to label keypoints & VLM to write keypoint-based constraints, solve w/ optimization for diverse tasks, w/o task-specific training or env models. 🧵👇

190,914 views • 1 year ago

How to scale visual affordance learning that is fine-grained, task-conditioned, works in-the-wild, in dynamic envs? Introducing Unsupervised Affordance Distillation (UAD): distills affordances from off-the-shelf foundation models, *all without manual labels*. Very excited this is nominated as Best Paper Finalist at #ICRA2025! 🧵👇

93,662 views • 1 year ago

Very cool to see how robots are now mastering these tasks, but as I read Benjie's blog, I’m even more impressed by our human ingenuity in translating human dexterity into robot-specific behaviors (by the demonstrators):
- wrapping sock on gripper to invert it
- mashing plastic bag on table to open it
- key reorientation by multiple bimanual handovers, etc.

Perhaps the next “Olympics” should shift from doing the most challenging tasks to replicating such human ingenuity at test time?

20,552 views • 6 months ago
