Yunzhu Li

@YunzhuLiYZ · 9,003 subscribers

Assistant Professor of Computer Science @Columbia @ColumbiaCompSci, Postdoc from @Stanford @StanfordSVL, PhD from @MIT_CSAIL. #Robotics #Vision #Learning

Shorts

You can actually interact with the world simulator directly in the browser. 🤖 Here is a quick screen recording (8x speed) of me playing with it: real-time action-conditioned video prediction across rigid objects, deformable objects, rope, and object piles. Try it yourself (no install required): Huge kudos to my student Yixuan Wang for making the interactive demo happen!

41,513 views

📢 Our lab has been exploring 3D world models for years, and we’re thrilled to share **PhysTwin**: a milestone that reconstructs object appearance, geometry, and dynamics from just a few seconds of interaction! Led by the amazing Hanxiao Jiang 👉

PhysTwin combines **Gaussian splatting** with **inverse dynamics optimization** based on simple **spring-mass** systems.

⚙️ The result? Real-time, action-conditioned 3D video prediction under novel interactions (i.e., 3D world models).

🔑 A few key takeaways:

1. Having the right structure (e.g., particles/masses) helps navigate the trade-off between sample efficiency, generalization, and broad applicability.
2. Visual foundation models (VFMs) have matured to the point where they can provide rich supervision for world modeling (e.g., tracking, shape completion).
3. Beyond VFMs, many crucial components have come together in recent years: Gaussian splats for rendering, NVIDIA Warp for high-performance simulation, and scene/asset generation from a wide range of labs and companies. The future of 3D world models is looking bright! ✨
4. The resulting digital twin supports a wide range of downstream applications, especially data generation and policy evaluation, thanks to its realistic rendering and simulation capabilities.

🎥 All code and data to reproduce the results, along with interactive demos, are available on the website. Check out the following visualizations of: (1) observations, (2) reconstructed state/actions, (3) interactive digital twins, and (4) the overlays between real-world robot teleoperation and our model’s open-loop predictions.

25,279 views
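
To make the "inverse dynamics optimization based on simple spring-mass systems" idea from the PhysTwin post concrete, here is a minimal sketch. It is not the PhysTwin implementation: the 1D single-spring setup, the function names (`simulate`, `trajectory_error`, `fit_stiffness`), the synthetic "observed" trajectory, and the grid search used in place of a real optimizer are all assumptions made for illustration.

```python
import math

def simulate(stiffness, damping, actions, dt=0.01, mass=1.0, rest_length=1.0):
    """Roll out a 1D mass attached by a spring to a controlled handle.

    `actions` holds the handle position at each step (the action-conditioning).
    Returns the mass position after each step, using semi-implicit Euler.
    """
    pos, vel = -rest_length, 0.0  # mass starts at rest, one rest length from the handle at 0
    trajectory = []
    for handle in actions:
        stretch = (handle - pos) - rest_length       # > 0 when the spring is extended
        force = stiffness * stretch - damping * vel  # spring force plus viscous damping
        vel += dt * force / mass                     # update velocity first
        pos += dt * vel                              # then position, with the new velocity
        trajectory.append(pos)
    return trajectory

def trajectory_error(stiffness, damping, actions, observed):
    """Mean squared error between a simulated rollout and the observed positions."""
    predicted = simulate(stiffness, damping, actions)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)

def fit_stiffness(actions, observed, damping, candidates):
    """Pick the candidate stiffness whose rollout best matches the observation."""
    return min(candidates, key=lambda k: trajectory_error(k, damping, actions, observed))

# Synthetic stand-in for a few seconds of interaction (hypothetical data):
# the handle moves sinusoidally and the "observation" comes from a known stiffness.
actions = [0.5 * math.sin(0.02 * t) for t in range(300)]
true_stiffness = 40.0
observed = simulate(true_stiffness, 2.0, actions)

candidates = [10.0 + 5.0 * i for i in range(15)]  # 10, 15, ..., 80
estimated = fit_stiffness(actions, observed, damping=2.0, candidates=candidates)
```

Once the parameters are identified, calling `simulate` with a new action sequence gives an action-conditioned prediction under a novel interaction, which is the role the post describes for the learned digital twin. A real system would optimize many spring parameters at once with a gradient-based or sampling-based optimizer rather than a one-dimensional grid.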

I want to call out one of our most important references: SIMPLER, by Xuanlin Li (Simon), Kyle Hsu, Jiayuan Gu, Jiajun Wu, Hao Su, Quan Vuong, Ted Xiao, and colleagues, which laid the foundation for using simulation for policy evaluation through a systematic study of appearance and dynamics alignment and of metrics for measuring sim–real correlation. (It took many nights of grinding by Kaifeng Zhang and Shuo Sha for that correlation to appear, and once it did, extending to new tasks worked like a charm!)

11,836 views

🚀 Excited to share our #ICLR2025 work on planning with neural dynamics models! While our lab has developed diverse neural dynamics models for manipulating rigid, deformable, and granular objects, having the model alone doesn’t solve the problem: planning with it remains a challenge.

💡 Enter BaB-ND, led by Keyi and Jiangwei! We propose a scalable, GPU-accelerated branch-and-bound algorithm, inspired by neural network verification, to enable effective planning for diverse objects modeled with neural dynamics.

🔗 Project page (open-source + detailed docs!):

🎥 Watch the video to see the T being pushed around obstacles, and check out Keyi’s thread for more details!

10,561 views
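
For readers unfamiliar with branch-and-bound planning, the sketch below shows the general pattern on a toy problem. It is not BaB-ND: the one-dimensional action, the `toy_dynamics` stand-in for a learned model, and the Lipschitz-constant lower bound (used here in place of bounds derived from neural network verification) are all assumptions chosen to keep the example short and runnable on a CPU.

```python
import heapq
import math

def toy_dynamics(state, action):
    # Hypothetical stand-in for a learned neural dynamics model.
    return state + action + 0.5 * math.sin(3.0 * action)

def cost(action, state=0.0, goal=1.2):
    # Distance between the predicted next state and the goal.
    return abs(toy_dynamics(state, action) - goal)

# For the toy model, |d cost / d action| <= 1 + 1.5 = 2.5 everywhere.
LIPSCHITZ = 2.5

def lower_bound(lo, hi):
    """Valid lower bound on the cost over [lo, hi] from the Lipschitz constant."""
    mid = 0.5 * (lo + hi)
    return cost(mid) - LIPSCHITZ * 0.5 * (hi - lo)

def branch_and_bound(lo, hi, tol=1e-4):
    """Search [lo, hi] for an action whose cost is within `tol` of the global minimum."""
    best_action = 0.5 * (lo + hi)
    best_cost = cost(best_action)
    queue = [(lower_bound(lo, hi), lo, hi)]  # min-heap ordered by lower bound
    while queue:
        bound, a, b = heapq.heappop(queue)
        if bound >= best_cost - tol:
            break  # no remaining interval can improve the incumbent by more than tol
        mid = 0.5 * (a + b)
        for sub_lo, sub_hi in ((a, mid), (mid, b)):  # branch: split the interval in half
            candidate = 0.5 * (sub_lo + sub_hi)
            candidate_cost = cost(candidate)
            if candidate_cost < best_cost:           # update the incumbent solution
                best_action, best_cost = candidate, candidate_cost
            sub_bound = lower_bound(sub_lo, sub_hi)
            if sub_bound < best_cost - tol:          # bound: keep only promising intervals
                heapq.heappush(queue, (sub_bound, sub_lo, sub_hi))
    return best_action, best_cost

best_action, best_cost = branch_and_bound(-2.0, 2.0)
```

The key design choice is that intervals are discarded only when a provable lower bound shows they cannot beat the current best action, so the search avoids getting stuck in the local minimum that this toy cost has near `action = 0.75`. In the setting the post describes, the bounds come from verification-style analysis of the network and the branching is batched on a GPU.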

Videos

📢 **PhysTwin** announcement (full text under Shorts above)

Yunzhu Li

25,279 views • 1 year ago

I was really impressed by the UMI gripper (Cheng Chi et al.), but a key limitation is that **force-related data wasn’t captured**: humans feel haptic feedback through the mechanical springs, but the robot couldn’t leverage that info, limiting the data’s value for fine-grained manipulation tasks.

Led by my amazing students Yolanda Zhu and Binghao Huang, we designed a **portable visuo-tactile gripper** by integrating our dense, flexible tactile arrays with the UMI gripper to enable large-scale in-the-wild data collection. 🔗

We demonstrate **cross-modal representation learning** and **downstream policy learning** on tasks requiring in-hand state estimation (e.g., test tube reorientation) and fine-grained force sensing (e.g., pipette fluid transfer).

Key takeaways:

- Our flexible tactile arrays store the rich haptic information humans perceive as dense tactile signals.
- Portability and robustness are key for in-the-wild data collection; our portable gripper is compact, lightweight, and durable.
- Touch provides precise, robust measurements of in-hand object pose, invariant to lighting and viewpoint.
- Cross-modal pretraining on large-scale in-the-wild data significantly improves policy robustness and sample efficiency (as shown many times before, and verified again here!).

Also check out our previous investigations of dense, flexible tactile grids for understanding human-robot-environment interactions:

- Dense tactile glove (Nature ’19):
- 3D-ViTac (CoRL ’24):

Yunzhu Li

13,188 views • 10 months ago