Boyi Li

@Boyiliee • 2,542 subscribers

Efficient Multimodality for Embodied Intelligence

Shorts

Super excited to announce our new work: Synthesizing Moving People with 3D Control (3DHM)💡 Why is 3DHM unique? With 3D Control, 3DHM can animate a 𝗿𝗮𝗻𝗱𝗼𝗺 human photo with 𝗮𝗻𝘆 poses in a 𝟯𝟲𝟬-𝗱𝗲𝗴𝗿𝗲𝗲 camera view and 𝗮𝗻𝘆 camera azimuths from 𝗮𝗻𝘆 video!

74,471 views

Videos

Introducing FoundationMotion: a large-scale, video-derived motion annotation dataset and auto-labeling pipeline, plus advanced models for motion understanding. Fully open-source: code, datasets, and models, free to use and build on.

Understanding motion is core to physical reasoning, yet today’s leading models still struggle with simple spatial actions like “turn right”, “move up”, or “flip the toast”, mainly due to the lack of large, fine-grained motion datasets.

We present FoundationMotion, a fully automated pipeline that:

• detects & tracks objects in videos
• extracts trajectories
• uses LLMs + frames to generate rich motion captions & QA pairs

→ creating large-scale, high-quality motion datasets.

After fine-tuning the open-source models Qwen and NVILA on our annotations, these models now outperform the closed-source Gemini-3-Flash and GPT-5.1 on spatial understanding tasks across autonomous driving, robotics, and everyday scenarios.

📜Paper: 🌐Webpage: 💻 Code: 🕸️Model: 📊 Dataset: 👉 Interactive Demo:

Let’s move research forward together. FoundationMotion is also referred to as Wolf V2 🐺, the second chapter in the Wolf series:

66,999 views • 7 months ago

I’ve dreamt of creating a tool that could animate anyone with any motion from just ONE image… and now it’s a reality! 🎉 Super excited to introduce updated 3DHM: Synthesizing Moving People with 3D Control. 🕺💃3DHM can generate human videos from a single real or synthetic human image. #Animation #GenAI #AI #3DHM ✨ The magic of 3D control? Turning 2D pixels into lifelike, animated humans. 🎥 Check out our demo (and Merry Christmas)! Paper: Github: Webpage: Proudly working with the great Junming (Leo) Chen, , Yossi Gandelsman, Alyosha Efros and Jitendra MALIK😃 Kindly note: This video is intended solely for research purposes and is not authorized for commercial use.

52,482 views • 1 year ago

🚘Excited to share LLaDA @cvpr #CVPR2024, featured in #GTC2024! LLaDA is a simple yet powerful tool that enables human drivers and autonomous vehicles alike to 𝐃𝐫𝐢𝐯𝐞 𝐄𝐯𝐞𝐫𝐲𝐰𝐡𝐞𝐫𝐞 by adapting their tasks and motion plans to traffic rules.

30,582 views • 2 years ago