Introducing TraceVLA: a fully open-source Vision-Language-Action model reimagining spatial-temporal awareness: ✨ 3.5x gains on real robots, SOTA in simulation 💡 Fine-tunes on just 150K trajectories ⚡ Compact 4B model = 7B performance

Yongyuan Liang

2,857 subscribers

39,500 views • 1 year ago • via X (Twitter)

Science & Technology

Comments: 11

Yongyuan Liang • 1 year ago

We introduce visual trace prompting:🔹Track robot's movement via point-tracking (Co-Tracker) 🔹Overlay traces on observations Model processes: 1️⃣ Original view (preserve full info) 2️⃣ View with traces as prompts A simple yet powerful technique to boost VLA's spatial understanding
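The overlay step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it does not call Co-Tracker, and it assumes the point tracks are already available as an array of pixel coordinates. The function names `overlay_traces` and `build_model_inputs`, the track layout `(N, T, 2)`, and the trace color are all hypothetical choices made for this example.

```python
import numpy as np


def overlay_traces(frame, tracks, color=(255, 0, 0)):
    """Draw each point track as a polyline on a copy of `frame`.

    frame:  (H, W, 3) uint8 image.
    tracks: (N, T, 2) array of (x, y) pixel positions for N points over
            T timesteps (assumed to come from some point tracker).
    """
    out = frame.copy()
    h, w = frame.shape[:2]
    for track in tracks:
        for (x0, y0), (x1, y1) in zip(track[:-1], track[1:]):
            # Sample enough points along the segment to leave no gaps.
            n = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
            xs = np.rint(np.linspace(x0, x1, n)).astype(int)
            ys = np.rint(np.linspace(y0, y1, n)).astype(int)
            # Drop samples that fall outside the image.
            keep = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
            out[ys[keep], xs[keep]] = color
    return out


def build_model_inputs(frame, tracks):
    """Return the two views named in the post: original and trace-overlaid."""
    return frame, overlay_traces(frame, tracks)


# Toy data: a black frame and one hypothetical track with three positions.
frame = np.zeros((64, 64, 3), dtype=np.uint8)
tracks = np.array([[[5, 5], [20, 10], [40, 30]]], dtype=float)
original, traced = build_model_inputs(frame, tracks)
```

Keeping the untouched frame alongside the overlaid one mirrors the post's point that the original view preserves full information, since the drawn traces overwrite some pixels.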

Yongyuan Liang • 1 year ago

TraceVLA in action: Watch it excel at diverse manipulation tasks on a real WidowX-250 robot! From soft-object handling to precision pick-and-place, TraceVLA consistently outperforms OpenVLA in both in-distribution and out-of-distribution tasks.

Yongyuan Liang • 1 year ago

Superior simulation results: On Google’s SimplerEnv robot tasks, TraceVLA outshines OpenVLA across all metrics in both 7B and 4B versions! 🚀 20% boost in handling: ▪️ Camera changes ▪️ Distractors ▪️ Varied visual backgrounds

Yongyuan Liang • 1 year ago

Efficient and lightweight: 🔸 TraceVLA requires <10GB memory on 8 H100 GPUs 🔸 Adds only 0.036s per timestep A powerful VLA upgrade with minimal overhead!
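To put the quoted 0.036 s per-timestep overhead in context, the arithmetic below shows how a fixed added latency changes the achievable control rate. Only the 0.036 s figure comes from the post; the baseline inference latency of 0.2 s is a hypothetical value chosen for illustration, and the function names are made up for this sketch.

```python
TRACE_OVERHEAD_S = 0.036  # per-timestep overhead quoted in the post


def control_rate_hz(base_latency_s, overhead_s=TRACE_OVERHEAD_S):
    """Control steps per second when each step costs base latency plus overhead."""
    if base_latency_s <= 0:
        raise ValueError("base_latency_s must be positive")
    return 1.0 / (base_latency_s + overhead_s)


def relative_slowdown(base_latency_s, overhead_s=TRACE_OVERHEAD_S):
    """Overhead expressed as a fraction of the baseline step latency."""
    if base_latency_s <= 0:
        raise ValueError("base_latency_s must be positive")
    return overhead_s / base_latency_s


if __name__ == "__main__":
    base = 0.2  # hypothetical baseline latency in seconds, not from the post
    print(f"rate without traces: {1.0 / base:.2f} Hz")
    print(f"rate with traces:    {control_rate_hz(base):.2f} Hz")
    print(f"relative slowdown:   {relative_slowdown(base):.0%}")
```

The faster the baseline policy, the larger the share of each step the fixed overhead takes, so the impact depends on the model it is attached to.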

Yongyuan Liang • 1 year ago

Available resources include: ▫️7B TraceVLA checkpoints ▫️Lightweight 4B Phi3V-OpenVLA & TraceVLA models ▫️Fine-tuned TraceVLA models 💻 Code: 🤗 Models: Try TraceVLA family models today!

Yongyuan Liang • 1 year ago

Check out our project page: ArXiv: Joint work with @ruijie_zheng12 @ShuaiyiH @JianfengGao0217 @haldaume3 @Andrey__Kolobov @furongh @jw2yang4ai

Yang • 1 year ago

Want to learn how practical AI skills and automations for your business and work? Check out our 50+ step-by-step video tutorials 100% FREE 20+ hours of Ai and Automation goodness absolutely free 🥳

Mu Cai @ Industry Job Market • 1 year ago

Congratulations! Really interesting work on applying visual prompts on VLA tasks!

Yongyuan Liang • 1 year ago

Thanks!!!

Dmytro Kuzmenko • 1 year ago

thank you very much for sharing, great idea and rather impressive results!

Ray | AI marketer - Social Media Assistant • 1 year ago

real-time engagement is key. we help brands connect with their audience 24/7, no burnout.

Related videos

✨ Introducing 𝐎𝐩𝐞𝐧𝐕𝐋𝐀 — an open-source vision-language-action model for robotics! 👐 - SOTA generalist policy - 7B params - outperforms Octo, RT-2-X on zero-shot evals 🦾 - trained on 970k episodes from OpenX dataset 🤖 - fully open: model/code/data all online 🤗 🧵👇

Moo Jin Kim

226,991 views • 2 years ago

DynamicVLA A compact 0.4B Vision-Language-Action model that finally lets robots manipulate *moving* objects in real-time, closing the perception-execution gap with Continuous Inference and Latent-aware Action Streaming.

DailyPapers

16,357 views • 5 months ago

Everyone is sleeping on this new OCR model! dots-ocr is a new 1.7B vision-language model that achieves SOTA performance on multilingual document parsing. - Supports 100+ languages - Works with both images and PDFs - Handles text, tables, formulas seamlessly 100% open-source.

Akshay 🚀

252,074 views • 11 months ago

Apple just released and open-sourced FastVLM! FastVLM is a lightning-fast vision-language model that combines rapid image and text understanding with efficient on-device performance. 100% Open Source

Sumanth

43,704 views • 10 months ago

🚀 First step to unlocking Generalist Robots! Introducing 🤖LAPA🤖, a new SOTA open-sourced 7B VLA pretrained without using action labels. 💪SOTA VLA trained with Open X (outperforming OpenVLA on cross and multi embodiment) 😯LAPA enables learning from human videos, unlocking potential for robotic foundation model ❗Over 30x pretraining efficiency for VLA training 🤗Code and checkpoints are all open-sourced!

Seonghyeon Ye

33,239 views • 1 year ago

Molmo by Ai2 - Open source SoTA Multimodal (Vision) Language model, beating Claude 3.5 Sonnet, GPT4V and comparable to GPT4o 🔥 They release four model checkpoints: 1. MolmoE-1B, a mixture of experts model with 1B (active) 7B (total) 2. Molmo-7B-O, most open 7B model 3. Molmo-7B-D, demo model 4. Molmo-72B, best model System Architecture > Input: Multi-scale, multi-crop images generated from the original image. > Vision Encoder: OpenAI's ViT-L/14 336px CLIP model, a powerful ViT, encodes images into vision tokens. > Connector: MLP projects tokens to LLM input space, followed by pooling for dimensionality reduction. > LLM: Decoder-only Transformer, various options (OLMo, OLMoE, Qwen2, Mistral, Gemma2, Phi) with diverse scales and openness. Model Variants > Vision Encoder: Consistent ViT-L/14 CLIP model across variants. > LLM: OLMo-7B-1024, OLMoE-1B-7B-0924, Qwen2 (7B, 72B), Mistral 7B, Gemma2 9B, Phi 3 Medium, offering different capacities and openness levels. Training Strategy > Stage 1: Multimodal pre-training for caption generation with new captioning data. > Stage 2: Supervised fine-tuning on a dataset mixture, updating all parameters. > No RLHF involved, Learning rates adjusted based on component types and pre-training status. > All the weights are available on Hugging Face Hub 🤗 > Compatible with Transformers (Remote Code) Kudos Ai2 for such a brilliant and open work! 🐐 Video credits: Allen AI YT Channel

Vaibhav (VB) Srivastav

80,474 views • 1 year ago

No words. Just wow. LingBot-World: A playable open-source world model built on Wan2.2+Qwen3-VL-2B; - real-time interactive simulation at 720p@16fps, <1s latency; - minute-long contextual memory. - open source!

Wildminder

36,604 views • 5 months ago

Introducing OFT—an Optimized Fine-Tuning recipe for VLAs! Fine-tuning OpenVLA w/ OFT, we see: -25-50x faster inference ⚡️ -SOTA 97.1% avg SR in LIBERO 💪 -high-freq control w/ 7B model on real bimanual robot -outperforms π₀, RDT-1B, DiT Policy, MDT, Diffusion Policy, ACT 🧵👇

Moo Jin Kim

84,206 views • 1 year ago

Introducing Jan-v1: 4B model for web search, an open-source alternative to Perplexity Pro. In our evals, Jan v1 delivers 91% SimpleQA accuracy, slightly outperforming Perplexity Pro while running fully locally. Use cases: - Web search - Deep Research Built on the new version of Qwen's Qwen3-4B-Thinking (up to 256k context length), fine-tuned for reasoning and tool use in Jan. You can run the model in Jan, llama.cpp, or vLLM. To enable search in Jan, go to Settings → Experimental Features → On, then Settings → MCP Servers → enable a search-related MCP such as Serper. Use the model: - Jan-v1-4B: - Jan-v1-4B-GGUF: Credit to the Qwen team for Qwen3 4B Thinking & Georgi Gerganov for llama.cpp.

👋 Jan

690,311 views • 11 months ago

The new open-source Text to Speech model: Fish Speech 1.4 is brilliant! Trained on a massive 700K hours of multilingual speech data in 8 languages - Instant voice cloning 🗣️ - Ultra-low latency ⚡ - Compact model (~1GB weights) 🏋️‍♂️

Rohan Paul

228,836 views • 1 year ago

🕹️ Is this the smallest language model in the world? I just managed to squeeze JAM, real artificial intelligence into 30 kilobytes, running on a 1979 Atari 800. Just A Model. Fully generative, deterministic language model, powered by a neural network and built to run on 8‑bit hardware. Atari Forever.

Marek Spanel

113,623 views • 3 months ago

Microsoft just dropped VITRA-VLA, a new Vision-Language-Action model for robotics on Hugging Face. It learns dexterous manipulation from over 1 million real-life human hand activity videos.

DailyPapers

19,177 views • 7 months ago

Introducing Voxtral WebGPU: Real-time speech transcription entirely in your browser. This demo runs Voxtral-Mini-4B, a powerful streaming ASR model from Mistral AI, locally on WebGPU. The model supports 13 languages and is capable of <500 ms latency. Fully private. Zero cost.

Xenova

94,356 views • 4 months ago

Excited to announce GR00T N1, the world’s first open foundation model for humanoid robots! We are on a mission to democratize Physical AI. The power of general robot brain, in the palm of your hand - with only 2B parameters, N1 learns from the most diverse physical action dataset ever compiled and punches above its weight: - Real humanoid teleoperation data. - Large-scale simulation data: we are open-sourcing 300K+ trajectories! - Neural trajectories: we apply SOTA video generation models to “hallucinate” new synthetic data that features accurate physics in pixels. Using Jensen’s words, “systematically infinite data”! - Latent actions: we develop novel algorithms to extract action tokens from in-the-wild human videos and neural generated videos. GR00T N1 is a single end-to-end neural net, from photons to actions: - Vision-Language Model (System 2) that interprets the physical world through vision and language instructions, enabling robots to reason about their environment and instructions, and plan the right actions. - Diffusion Transformer (System 1) that “renders” smooth and precise motor actions at 120 Hz, executing the latent plan made by System 2. We deploy N1 on GR1 robot, 1X Neo robot, and a large collection of simulation benchmarks. N1 achieves up to +30% boost in diverse manipulation tasks for household and industrial settings. While humanoid robots are the main focus of N1, our model also supports cross-embodiment. We finetune it to work on the $110 HuggingFace LeRobot SO100 robot arm! Open robot brain runs on open hardware. Sounds just right. Let’s solve robotics, together, one token at a time. Links to our Whitepaper, Github repo, HuggingFace model, and open dataset page in the thread: 🧵

Jim Fan

465,968 views • 1 year ago

💥 A 450M model just beat bigger VLAs on real robot tasks, and it’s 100% open source [📍 bookmark for later] Came across SmolVLA, a new vision-language-action model for robotics that’s compact, fast, and trained entirely on open community datasets from LeRobot via Hugging Face. What stood out to me is how it matches or outperforms much larger models like ACT using noisy, real-world community data instead of giant private datasets. Why it’s worth a look ✅ 26% performance boost from pretraining on open-source data ✅ Runs on consumer hardware, even a MacBook ✅ 30% faster responses with async inference and smart architecture tweaks ✅ Strong results across Meta-World, LIBERO, SO100, and SO101 ✅ Fully open source: weights, code, training pipeline, eval stack They also introduced smart efficiency tricks like using fewer visual tokens, pulling outputs from mid-layer, and separating perception from action to make it all run fast. SmolVLA is a strong case for what can happen when the robotics community shares data and builds in the open. Definitely worth keeping an eye on.

Ilir Aliu - eu/acc

17,353 views • 10 months ago

🔥 #ICRA2026 Best Paper Finalist The era of "robot VLA = single-arm gripper" is ending. Introducing Dexora — the first open-source Vision-Language-Action system for dual-arm, dual-hand, 36-DoF dexterous manipulation. 🦾 Dual Arms 🖐️ Dual Hands 🎯 36 DoF Control 🌍 Open Source Trained on: • 100K simulated trajectories • 10K real-world demonstrations Dexora achieves: ✓ 90%+ success on basic manipulation ✓ Strong dexterous manipulation performance ✓ Cross-embodiment generalization Our key hypothesis: Train on the hardest embodiment. Transfer to simpler robots later. Instead of scaling up gripper policies, we train directly in the most expressive action space and project downward to simpler embodiments. This may be a practical path toward universal robot controllers. 🎥 Demos: 📄 Paper:

Hao Zhao

17,048 views • 1 month ago

Robbyant just open-sourced LingBot-VLA 2.0 and it's much more than a bigger Vision-Language-Action model. Behind the 60,000-hour pretraining dataset run are several clever engineering ideas that push robots closer to reliable real-world deployment. Here are 6 things that stood out to me 👇

Md Riyazuddin

54,251 views • 9 days ago

Introducing UI-TARS-1.5, a vision-language model that beats OpenAI Operator and Claude 3.7 on GUI Agent and Game Agent tasks. We've open-sourced a small-size version model for research purposes, more details can be found in our blog. TARS learns solely from a screen, but generalizes beyond a screen! Blog: Model: App:

Yujia Qin

85,174 views • 1 year ago

Jan v1 is now more reliable when handling searches. It's an open-source alternative to Perplexity Pro, and it now avoids infinite loops with Jan v1 2509. Performance updates: - Gains on reasoning & creativity benchmarks - Small drop on SimpleQA (91.1 -> 90.7), still competitive with Perplexity Pro The updated version is live on Jan Hub. You can run the model in Jan, llama.cpp, or vLLM. Use the model: - Model: - GGUF: Shoutout to the Qwen team for Qwen3-4B Thinking and Georgi Gerganov for llama.cpp.

👋 Jan

17,759 views • 10 months ago

🚀 Introducing AgentCPM-Explore: The First Open-Source 4B-Agent Model to Conquer GAIA & Complex Real-World Tasks! 🤗 Hugging Face: 🔗 GitHub: ✨ Key Highlights: ✅ SOTA Agentic Performance: Sets a new benchmark for 4B-scale agent models, outperforming all peers, surpassing 8B models, and rivaling select 30B+ and closed-source LLMs. 🧠 Deep Research Capability: Excels at long-horizon reasoning, supports 100+ turns of autonomous interaction with multi-source cross-validation, human-like self-correction, and dynamic tool use + strategy adaptation, just like a real researcher! 🔓 Full-Stack Open Source: We’re open-sourcing the entire end-to-end agent stack, not just the model! Empower your own innovations with - AgentRL: Asynchronous reinforcement learning framework - AgentDock: Secure, extensible tool sandbox - AgentToLeaP: A one-click evaluation platform for agent tool-learning capabilities - Full training data pipeline & reproducible workflows #AgentCPM #OpenSourceAI #AgenticAI #AI #GAIA #LLM #OpenBMB #AIAgents #HuggingFace

OpenBMB

13,996 views • 6 months ago