
Boyi Li
@Boyiliee • 2,535 subscribers
Efficient Multimodality for Embodied Intelligence
Shorts
Videos

Introducing FoundationMotion. A large-scale, video-derived motion annotation dataset & auto-labeling pipeline + advanced models for motion understanding. Fully open-source: code, datasets, and models, free to use and build on. Understanding motion is core to physical reasoning, yet today’s leading models still struggle with simple spatial actions like “turn right” or “move up” or “flip the toast” - mainly due to the lack of large, fine-grained motion datasets. We present FoundationMotion, a fully automated pipeline that: • detects & tracks objects in videos • extracts trajectories • uses LLMs + frames to generate rich motion captions & QA pairs → creating large-scale, high-quality motion datasets at scale. After fine-tuning the open-source models Qwen and NVILA on our annotations, these models now outperform the closed-source Gemini-3-Flash and GPT-5.1 on spatial understanding tasks across autonomous driving, robotics, and everyday scenarios. 📜Paper: 🌐Webpage: 💻 Code: 🕸️Model: 📊 Dataset: 👉 Interactive Demo: Let’s move research forward together. FoundationMotion is also referred to as Wolf V2 🐺, the second chapter in the Wolf series:
Boyi Li66,307 görüntüleme • 5 ay önce

I’ve dreamt of creating a tool that could animate anyone with any motion from just ONE image… and now it’s a reality! 🎉 Super excited to introduce updated 3DHM: Synthesizing Moving People with 3D Control. 🕺💃3DHM can generate human videos from a single real or synthetic human image. #Animation #GenAI #AI #3DHM ✨ The magic of 3D control? Turning 2D pixels into lifelike, animated humans. 🎥 Check out our demo (and Merry Christmas)! Paper: Github: Webpage: Proudly working with the great Junming (Leo) Chen, , Yossi Gandelsman @CVPR, Alyosha Efros and Jitendra MALIK😃 Kindly note: This video is intended solely for research purposes and is not authorized for commercial use.
Boyi Li52,428 görüntüleme • 1 yıl önce
Daha fazla içerik yok.