Introducing TraceVLA: a fully open-source Vision-Language-Action model reimagining spatial-temporal awareness: ✨ 3.5x gains on real robots, SOTA in simulation 💡 Fine-tunes on just 150K trajectories ⚡ Compact 4B model = 7B performance

Yongyuan Liang

2,821 subscribers

39,378 views • 1 year ago • via X (Twitter)

Science & Technology

Comments: 11

Yongyuan Liang • 1 year ago

We introduce visual trace prompting:🔹Track robot's movement via point-tracking (Co-Tracker) 🔹Overlay traces on observations Model processes: 1️⃣ Original view (preserve full info) 2️⃣ View with traces as prompts A simple yet powerful technique to boost VLA's spatial understanding
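The overlay step described above can be sketched roughly as follows. This is a minimal illustration, not the TraceVLA implementation: it assumes the point tracks have already been produced by a tracker such as Co-Tracker, represents a frame as a plain nested list of RGB tuples, and uses hypothetical helper names (`draw_segment`, `overlay_traces`, `build_model_inputs`) that do not come from the post.

```python
from copy import deepcopy


def draw_segment(image, start, end, color):
    """Color the pixels along a straight segment via linear interpolation."""
    (x0, y0), (x1, y1) = start, end
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    height, width = len(image), len(image[0])
    for i in range(steps + 1):
        x = round(x0 + (x1 - x0) * i / steps)
        y = round(y0 + (y1 - y0) * i / steps)
        # Skip points that fall outside the frame.
        if 0 <= x < width and 0 <= y < height:
            image[y][x] = color


def overlay_traces(image, tracks, color=(255, 0, 0)):
    """Return a copy of the frame with each track drawn as a polyline."""
    overlaid = deepcopy(image)
    for track in tracks:
        # Connect consecutive tracked positions (x, y) over time.
        for start, end in zip(track, track[1:]):
            draw_segment(overlaid, start, end, color)
    return overlaid


def build_model_inputs(image, tracks):
    """Pair the untouched frame with its trace-annotated copy (assumed layout)."""
    return {"original": image, "with_traces": overlay_traces(image, tracks)}


# Toy 8x8 black frame and one hypothetical track of (x, y) positions.
frame = [[(0, 0, 0) for _ in range(8)] for _ in range(8)]
tracks = [[(1, 1), (4, 1), (4, 5)]]
inputs = build_model_inputs(frame, tracks)
```

Keeping the original frame alongside the annotated copy mirrors the two views listed in the post: the overlay can hide pixels under the trace, so the unmodified view preserves the full observation.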

Yongyuan Liang • 1 year ago

TraceVLA in action: Watch it excel at diverse manipulation tasks on a real WidowX-250 robot! From soft-object handling to precision pick-and-place, TraceVLA consistently outperforms OpenVLA in both in-distribution and out-of-distribution tasks.

Yongyuan Liang • 1 year ago

Superior simulation results: On Google’s SimplerEnv robot tasks, TraceVLA outshines OpenVLA across all metrics in both 7B and 4B versions! 🚀 20% boost in handling: ▪️ Camera changes ▪️ Distractors ▪️ Varied visual backgrounds

Yongyuan Liang • 1 year ago

Efficient and lightweight: 🔸 TraceVLA requires <10GB memory on 8 H100 GPUs 🔸 Adds only 0.036s per timestep A powerful VLA upgrade with minimal overhead!

Yongyuan Liang • 1 year ago

Available resources include: ▫️7B TraceVLA checkpoints ▫️Lightweight 4B Phi3V-OpenVLA & TraceVLA models ▫️Fine-tuned TraceVLA models 💻 Code: 🤗 Models: Try TraceVLA family models today!

Yongyuan Liang • 1 year ago

Check out our project page: ArXiv: Joint work with @ruijie_zheng12 @ShuaiyiH @JianfengGao0217 @haldaume3 @Andrey__Kolobov @furongh @jw2yang4ai

Mu Cai @ Industry Job Market • 1 year ago

Congratulations! Really interesting work on applying visual prompts on VLA tasks!

Yongyuan Liang • 1 year ago

Thanks!!!

Dmytro Kuzmenko • 1 year ago

Thank you very much for sharing. Great idea and rather impressive results!

Related videos

✨ Introducing 𝐎𝐩𝐞𝐧𝐕𝐋𝐀 — an open-source vision-language-action model for robotics! 👐 - SOTA generalist policy - 7B params - outperforms Octo, RT-2-X on zero-shot evals 🦾 - trained on 970k episodes from OpenX dataset 🤖 - fully open: model/code/data all online 🤗 🧵👇

Moo Jin Kim

226,922 views • 2 years ago

DynamicVLA A compact 0.4B Vision-Language-Action model that finally lets robots manipulate *moving* objects in real-time, closing the perception-execution gap with Continuous Inference and Latent-aware Action Streaming.

DailyPapers

16,291 views • 4 months ago

Everyone is sleeping on this new OCR model! dots-ocr is a new 1.7B vision-language model that achieves SOTA performance on multilingual document parsing. - Supports 100+ languages - Works with both images and PDFs - Handles text, tables, formulas seamlessly 100% open-source.

Akshay 🚀

251,941 views • 9 months ago

Everyone is sleeping on this new OCR model! - 85.9% (sota) on olmocr bench - 90+ language support w/benchmarks - 4B model (down from 9B) - Full layout information - Extracts + captions images and diagrams - Strong handwriting, math, form, table support 100% open-source.

Akshay 🚀

166,315 views • 2 months ago

Apple just released and open-sourced FastVLM! FastVLM is a lightning-fast vision-language model that combines rapid image and text understanding with efficient on-device performance. 100% Open Source

Sumanth

43,685 views • 9 months ago

📢 First contact between a frontier model and robots! Gemini Robotics is a SOTA generalist Vision-Language-Action model bringing frontier model intelligence to the physical world. It's an extremely capable model enabling dexterous, steerable, and general robot control. 🧵⬇️

Ted Xiao

152,395 views • 1 year ago

🚀 First step to unlocking Generalist Robots! Introducing 🤖LAPA🤖, a new SOTA open-sourced 7B VLA pretrained without using action labels. 💪SOTA VLA trained with Open X (outperforming OpenVLA on cross and multi embodiment) 😯LAPA enables learning from human videos, unlocking potential for robotic foundation model ❗Over 30x pretraining efficiency for VLA training 🤗Code and checkpoints are all open-sourced!

Seonghyeon Ye

33,232 views • 1 year ago

SpatialLM just dropped on Hugging Face Large Language Model for Spatial Understanding

AK

673,493 views • 1 year ago

CMU Vision-Language-Autonomy update: The team just released SORT3D, the first general spatial relation toolbox for autonomous vision-language navigation that is fully integrated into real-robot systems! 🤖👀 Simulation and real-robot data is provided!:

CMU Robotics Institute

18,234 views • 1 year ago

Introducing Meta Perception Language Model (PLM): an open & reproducible vision-language model tackling challenging visual tasks. Learn more about how PLM can help the open source community build more capable computer vision systems. Read the research paper, and download the code and dataset:

AI at Meta

93,811 views • 1 year ago

Cohere transcribe Sota open source transcription model running in the browser :) Weights on Hugging Face link below

Nick Frosst

190,668 views • 2 months ago

Introducing Jan-Code-4B 💻 A compact coding model tuned for practical day-to-day tasks. Generation, refactors, debugging, tests — all runnable locally in Jan. Download Jan: Model:

👋 Jan

109,873 views • 3 months ago

No words. Just wow. LingBot-World: A playable open-source world model built on Wan2.2+Qwen3-VL-2B; - real-time interactive simulation at 720p@16fps, <1s latency; - minute-long contextual memory. - open source!

Wildminder

36,604 views • 4 months ago

Sheepy-T: A fully open-source instruction-tuned language model based on GPT-J running locally on iPhone 14. Reply for beta access via TestFlight.

Kevin Kwok

102,354 views • 3 years ago

Introducing OFT—an Optimized Fine-Tuning recipe for VLAs! Fine-tuning OpenVLA w/ OFT, we see: -25-50x faster inference ⚡️ -SOTA 97.1% avg SR in LIBERO 💪 -high-freq control w/ 7B model on real bimanual robot -outperforms π₀, RDT-1B, DiT Policy, MDT, Diffusion Policy, ACT 🧵👇

Moo Jin Kim

84,133 views • 1 year ago

OpenAI just released GPT-OSS: An Open Source Language Model on Hugging Face Open source meaning: 💸 Free 🔒 Private 🔧 Customizable

dylan

21,568 views • 10 months ago

Microsoft just dropped MineWorld on Hugging Face a Real-Time and Open-Source Interactive World Model on Minecraft

AK

95,035 views • 1 year ago

Introducing Jan-v1: 4B model for web search, an open-source alternative to Perplexity Pro. In our evals, Jan v1 delivers 91% SimpleQA accuracy, slightly outperforming Perplexity Pro while running fully locally. Use cases: - Web search - Deep Research Built on the new version of Qwen's Qwen3-4B-Thinking (up to 256k context length), fine-tuned for reasoning and tool use in Jan. You can run the model in Jan, llama.cpp, or vLLM. To enable search in Jan, go to Settings → Experimental Features → On, then Settings → MCP Servers → enable a search-related MCP such as Serper. Use the model: - Jan-v1-4B: - Jan-v1-4B-GGUF: Credit to the Qwen team for Qwen3 4B Thinking & Georgi Gerganov for llama.cpp.

👋 Jan

689,722 views • 10 months ago

We just dropped a new SoTA lipsync model on fal: Hummingbird-0 Available now as a research preview, it's the most accurate zero-shot lipsync model we’ve tested, open or closed source.

Tavus

460,422 views • 1 year ago