Introducing TraceVLA: a fully open-source Vision-Language-Action model reimagining spatial-temporal awareness:
✨ 3.5x gains on real robots, SOTA in simulation
💡 Fine-tunes on just 150K trajectories
⚡ Compact 4B model = 7B performance
39,378 views • 1 year ago • via X (Twitter)
11 comments

We introduce visual trace prompting:
🔹 Track the robot's movement via point tracking (Co-Tracker)
🔹 Overlay traces on observations
The model processes:
1️⃣ Original view (preserves full info)
2️⃣ View with traces as prompts
A simple yet powerful technique to boost a VLA's spatial understanding
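The overlay step described above can be sketched in a few lines. This is a minimal illustration, not the TraceVLA implementation: it assumes the point tracks have already been produced by a tracker such as Co-Tracker and are given as lists of integer pixel coordinates. The names `draw_segment` and `visual_trace_prompt`, the list-of-lists image format, and the red trace color are all hypothetical choices made for the example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]        # image[y][x] -> (r, g, b)
Track = List[Tuple[int, int]]    # one tracked point's (x, y) position per frame


def draw_segment(image: Image, p0: Tuple[int, int], p1: Tuple[int, int],
                 color: Pixel) -> None:
    """Rasterize the line p0 -> p1 in place (Bresenham), skipping out-of-bounds pixels."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    height, width = len(image), len(image[0])
    while True:
        if 0 <= x0 < width and 0 <= y0 < height:
            image[y0][x0] = color
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def visual_trace_prompt(image: Image, tracks: List[Track],
                        color: Pixel = (255, 0, 0)) -> Tuple[Image, Image]:
    """Return (original view, view with traces overlaid) as the two model inputs."""
    overlaid = [row[:] for row in image]  # copy so the original view stays untouched
    for track in tracks:
        # Connect consecutive positions of each tracked point into a polyline.
        for p0, p1 in zip(track, track[1:]):
            draw_segment(overlaid, p0, p1, color)
    return image, overlaid


# Usage: a 6x6 black frame and one hand-written track (assumed tracker output).
frame = [[(0, 0, 0)] * 6 for _ in range(6)]
original, traced = visual_trace_prompt(frame, [[(1, 2), (4, 2), (4, 4)]])
```

Returning both images mirrors the two inputs listed in the post: the untouched view keeps the full scene information, while the overlaid view carries the motion history as a visual prompt.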

TraceVLA in action: Watch it excel at diverse manipulation tasks on a real WidowX-250 robot! From soft-object handling to precision pick-and-place, TraceVLA consistently outperforms OpenVLA in both in-distribution and out-of-distribution tasks.

Superior simulation results: On Google’s SimplerEnv robot tasks, TraceVLA outshines OpenVLA across all metrics in both 7B and 4B versions!
🚀 20% boost in handling:
▪️ Camera changes
▪️ Distractors
▪️ Varied visual backgrounds

Efficient and lightweight:
🔸 TraceVLA requires <10GB memory on 8 H100 GPUs
🔸 Adds only 0.036s per timestep
A powerful VLA upgrade with minimal overhead!

Available resources include:
▫️ 7B TraceVLA checkpoints
▫️ Lightweight 4B Phi3V-OpenVLA & TraceVLA models
▫️ Fine-tuned TraceVLA models
💻 Code:
🤗 Models:
Try the TraceVLA family of models today!

Check out our project page: ArXiv: Joint work with @ruijie_zheng12 @ShuaiyiH @JianfengGao0217 @haldaume3 @Andrey__Kolobov @furongh @jw2yang4ai

Congratulations! Really interesting work on applying visual prompts on VLA tasks!

Thanks!!!

thank you very much for sharing, great idea and rather impressive results!
