Siyuan Huang

@siyuanhuang95 • 5,038 subscribers

Research Scientist at BIGAI, Director of Center for Embodied AI and Robotics. Ph.D. in Statistics from @UCLA. Former intern at @DeepMind and @MetaAI.

Shorts

Watching from the side as the G1 robot does a Webster flip 🙀. Moments like this make humanoid robot research so much fun 😃😃

245,160 views

Videos

The CEO of Unitree, XingXing Wang, posted a dance video on Rednote to counter claims that the previous dance video was AI- or CG-generated. The dance is performed in front of a mirror and with sound, which makes it 100% real. Really cool demo! #Unitree #Humanoid #RobotDance

1,317,240 views • 1 year ago

You might have seen the WuBOT performing at the 2026 Spring Festival Gala; however, most of the highly dynamic extreme motions you see are executed by overfitted tracking policies. Until now, training a unified policy capable of performing various extreme motions with a high success rate remained an unsolved challenge. We spent an entire year digging into the barrier between general tracking and extreme physical behaviors. After burning through dozens of G1 robots, we finally identified the bottlenecks in learning and physical executability. With these discoveries, we developed OmniXtreme: the first general policy that can execute diverse extreme motions, including consecutive flips, extreme balancing, and even breakdancing with rapid contact switches! This capability is achieved by pre-training a flow-based generative control policy and then post-training with actuation-aware residual RL for complex physical dynamics, a step we found critical for successful real-world transfer. This work is a collaboration with Unitree. Together, we are pushing the physical limits of humanoid robots. It is incredibly exciting to see a general "robot gymnast" and "robot breakdancer" come to life! It was also our first time publishing a paper with XingXing, which was an enlightening experience. The model checkpoints are now released, and we welcome you to play with them! 📦 📄 Paper: 🌐 Project: 💻 Code:

107,008 views • 4 months ago

🎉🎉🎉 We won the championship in the solo dance contest at the first World Humanoid Robot Games, in partnership with Unitree! Here is the full video! Training the robot to perform a long dance routine (2:30 min) with stability, smoothness, and agility was much more challenging than we expected. The robot needs to dance to the rhythm, hold its global position, move dynamically, and never fall. You cannot cherry-pick on the playing field. More technical details will be released in the future.

117,176 views • 11 months ago

🤖 Ever dreamed of controlling a humanoid robot to perform complex, long-horizon tasks — using just a single Vision Pro? 🎉 Meet CLONE: a holistic, closed-loop, whole-body teleoperation system for long-horizon humanoid control! 🏃‍♂️🧍 CLONE enables rich and coordinated interactive tasks: 🥊 boxing 🏓 table tennis 🤲 object pickup 📦 room arrangement 🤝 handover … and more! 🌀 Our closed-loop error correction powered by LiDAR odometry ensures precision, while motion-captured demonstrations supercharge policy learning — unlocking the full potential of the G1 robot. 🎥 It’s hard to squeeze the magic into 1 minute — check out the full video demo and project page here: 🔗

66,737 views • 1 year ago

🥰 Super excited that SceneWeaver won the best paper award at the IROS25 RoboGen workshop. SceneWeaver provides an agentic framework for tool-based 3D scene generation: given a language description as input, you can generate or edit a corresponding scene with lots of detail.

43,681 views • 9 months ago

Scaling 3D scene data is a long-standing challenge in scene understanding, spatial reasoning, and robotics. Since scanning, reconstruction, and labeling are so labor-intensive, data scarcity has remained a major bottleneck. 🛑 To solve this, we propose SceneVerse++: Lifting Unlabeled Internet-level Data for 3D Scene Understanding (CVPR 2026). By reconstructing internet videos and annotating 3D scenes automatically, we’ve created a massive real-world dataset for end-to-end understanding. 🌐📐 SceneVerse++ makes it easy to scale "in-the-wild" 3D scenes toward more capable spatial reasoning systems. This significantly promotes progress in 3D VQA, visual navigation, and broader tasks in Embodied AI and Robotics. 🤖🦾 We are fully open-sourced! Check out the paper, code, and data here: 🌐 Project: 📄 Paper: 📊 Dataset: Code:

12,612 views • 2 months ago

Excited to introduce COLA: Learning Human–Humanoid Coordination for Collaborative Object Carrying 🤝🤖 COLA makes humanoids truly helpful in human collaboration — capable of carrying objects, pushing carts, or responding to human push commands. It provides a proprioception-only policy for compliant human–humanoid coordination across diverse movement patterns. The core idea is simple yet effective: 👉 Fine-tune a collaborative policy from a locomotion policy using a residual teacher. 👉 Train in simulation and distill to a real-world student policy for deployment. Paper: Project:

21,932 views • 9 months ago

🎉🎉🎉 Super excited to announce that we launched a joint lab on Embodied AI and Humanoid Robots between Unitree and BIGAI! I will direct the lab to foster research on integrating 3D scene understanding capabilities into humanoid robots. Hope to see more generalist humanoid robots in our homes very soon 😁

42,142 views • 2 years ago

📢📢📢 Excited to release ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning (CVPR25). 🤏🤙✌️With ManipTrans, we can transfer dexterous manipulation skills into robotic hands in simulation and deploy them on a real robot, using a residual policy learned for dex manipulation. 🤖🤖🤖The video below illustrates how the MoCap data can be transferred to Inspire, Shadow, Xhand, Allegro, and Mano. With ManipTrans, we can scale up dex manip data greatly with minimal effort. For more details, please check our -webpage: -code: -huggingface:

20,918 views • 1 year ago

Humanoid robots shouldn't just follow pre-defined movements—they should perform! 🤖✨ Introducing UniAct: A unified model for multimodal motion generation and action streaming. Most humanoids are limited to pre-designed moves. UniAct changes the game by allowing robots to generate live action sequences from: 📝 Text instructions 🎶 Music rhythms 📉 Spatial trajectories 🔄 Cross-modal signals Whether it’s dancing to a beat or following a complex path, UniAct brings humanoid robots to life in real-time. 🚀 🔗 Project: 📄 Paper:

10,569 views • 6 months ago

🎤🎤 Excited to introduce COME-robot🤖🤖, Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V. It is the first closed-loop framework utilizing the vision-language foundation model for open-ended reasoning and adaptive planning in real-world scenarios. COME-robot demonstrates a significant improvement in task success rate (~25%) compared to SOTA methods. Project: Arxiv:

22,291 views • 2 years ago

Big thanks to AK for highlighting our work! LEO marks our pioneering step towards building an embodied generalist agent that can really comprehend the 3D world! 🚀 Leveraging LLMs, we train LEO with real and synthetic 3D data across a diverse spectrum of tasks. It's thrilling to see LEO surpass current state-of-the-art (SOTA) methods in most benchmarked tasks, all under a single, unified model. 🔥 #Generalist_Agent

22,710 views • 2 years ago

Getting up and walking around, it seems he is very familiar with indoor environments and human interaction.

15,577 views • 1 year ago

📢📢📢 Excited to share our ICLR25 work, ArtGS, which can reconstruct articulated objects and scenes from human-scene interactions. 💻🖱️🧱🪑📖 Understanding physics and articulation is extremely important. With ArtGS, we can build a digital and simulatable replica of the real 3D world with minimal human effort!

12,819 views • 1 year ago

🤖🤖🤖 Following RoboVerse, we introduce another work focused on Robotic Tactile Simulation - Taccel Simulator. Taccel is a high-performance simulation platform for vision-based tactile sensors and robots. 🚀🚀🚀 Boosted by Nvidia Warp, we optimize Taccel with highly parallelized simulations and support 900fps simulation with 4k+ parallel training envs. 🤝🤝🤝 Taccel is designed with user-friendly APIs and is easy to use. We open-sourced all the code and documentation. Feel free to try! Project: Preprint: Code:

10,668 views • 1 year ago

📢📢📢 Excited to share our new work *Autonomous Character-Scene Interaction Synthesis from Text Instruction* (Siggraph Asia 24). It presents a unified model for flexible scene-conditioned motion generation given text, scene, and trajectory conditions. The results, with smooth interactions, look very impressive! 📰Paper: Project: Code and data will be released soon.

11,340 views • 1 year ago
