Reasoning is central to purposeful action. Today we introduce MolmoAct — a fully open Action Reasoning Model (ARM) for robotics. Grounded in large-scale pre-training with action reasoning data, every predicted action is interpretable and user-steerable via visual trace. We are open-sourcing everything!

Jiafei Duan

6,199 subscribers

99,944 views • 10 months ago • via X (Twitter)

Science, technology, and education

Related videos

🤖✨ What if models that take action in the physical world could think through your instructions? Meet MolmoAct, our new fully open Action Reasoning Model (ARM) that does just that. 🧵

Ai2

180,346 views • 10 months ago

Most capable generalist robotics models today are closed or, at best, open-weights. But robotics won’t reach its ChatGPT moment without real openness. That GPT moment was built on years of open tools and datasets, such as Python, PyTorch, and ImageNet, that let researchers inspect, reproduce, and build. Today, we’re introducing MolmoAct 2: a fully open-source action reasoning model for real-world robotics. We rethought and reshaped everything! 🧵👇

Jiafei Duan

105,282 views • 1 month ago

Today we’re releasing K2 Think V2, our most capable open-source reasoning model to date. This is a fully sovereign model: trained end-to-end on IFM-curated and synthesized data, with complete transparency from pre-training through final reasoning alignment.

MBZUAI

287,725 views • 5 months ago

Introducing Cosmos 3: Our latest frontier model for Physical AI Cosmos 3 is the world’s first fully open omnimodel with native vision reasoning, world and action generation. Today we’re releasing Super (32B) and Nano (8B) variants.

NVIDIA AI

421,345 views • 29 days ago

#AGIBOTAIWeek Day 3: Introducing GO-2: the next-generation foundation model for embodied AI, built to unify reasoning and action. To truly bridge “thinking” and “doing,” embodied AI must solve two challenges at once: • It must generate executable action plans through deep spatial reasoning • It must deliver stable execution in real-world environments GO-2 tackles both with a comprehensive architecture: Action Chain-of-Thought for action reasoning, and an Asynchronous Dual-System for robust execution. #AGIBOT #AGIBOTAIWeek #Foundation #model #EmbodiedAI Official account tags X: AGIBOT LinkedIn: agibot

Hasan Toor

36,181 views • 2 months ago

Google DeepMind introduced two foundational models for embodied reasoning, enabling robots to comprehend, react, and take action in the physical world: ⦿ Gemini Robotics – built on Gemini 2.0. Integrates vision, language, and action for real-world dexterity. ⦿ Gemini Robotics-ER – Enhances spatial reasoning for advanced robotic control. They are working with Apptronik to develop the next generation of humanoid robots.

The Humanoid Hub

73,097 views • 1 year ago

MolmoAct 2 builds on MolmoAct, our first Action Reasoning Model (ARM). Like its predecessor, MolmoAct 2 reasons about the world in 3D before taking actions. It now runs up to 37x faster & handles two-armed tasks out of the box without per-task fine-tuning.

Ai2

25,563 views • 1 month ago

NVIDIA Cosmos 3 launches with a full model family — Cosmos Super for highest-accuracy robotics and AV post-training, Cosmos Nano for high-speed video and action reasoning, and Cosmos Edge for real-time edge inference. Read the release ➡️

NVIDIA Newsroom

46,581 views • 29 days ago

The next frontier of autonomous driving is unlocked by reasoning models. NVIDIA Alpamayo brings together open AI models with reasoning capabilities, closed-loop simulation tools, and massive real-world driving datasets. Alpamayo 1 is a vision–language–action model that explains its own decisions through explicit reasoning traces, enabling trustworthy, humanlike decision-making. Together with NVIDIA’s Physical AI dataset and AlpaSim simulation, Alpamayo provides the tools and scale required to enable level 4 autonomous vehicles. ▶️ Watch now:

NVIDIA DRIVE

35,324 views • 5 months ago

Introducing Huntress's Tale, an action fantasy short film. It was made entirely with Seedance 2.0 in 4K. We are open-sourcing every keyframe and prompt.

Higgsfield AI 🧩

35,485 views • 4 days ago

1/5 🚀 Thrilled to open-source OSCAR 🤖 — an action-conditioned world model for robotics, led by the visiting student in my group Zhuoyuan Wu! It generalizes across different robot embodiments with precise action controllability. All trained on a single GH200 GPU, and outperforms existing open-sourced baselines, which have larger model capacity and need more compute. Everything is public, including training data. 📄 Paper: 🌐 Project: 💻 Code: 🤗 Robot data: 🤗 Human data: 🤗 Weights: #Robotics #WorldModels #AI #OpenSource

Jun Gao

103,300 views • 20 days ago

Today, we are introducing RFM-1, our Robotics Foundation Model giving robots human-like reasoning capabilities.

Covariant

118,306 views • 2 years ago

Want to see Gemini 2.0 Flash Thinking in action? Check out this demo where the model solves a physics problem and explains its reasoning.

Jeff Dean

105,751 views • 1 year ago

OpenAI, “make flappy bird,” one year apart: o3 mini (a model for coding tasks and reasoning) vs. GPT 5.4 thinking (a general reasoning model). I don’t think we hit a wall.

Chris

249,664 views • 3 months ago

We are pleased to introduce our sixth and newest Prize category, Climate Action. Nominate an organisation that is providing an innovative solution that promotes climate action for a cleaner planet today and help them scale their impact. #ZayedSustainabilityPrize #andCounting

Zayed Sustainability Prize

808,444 views • 3 years ago

Want to see our open models in action? Watch how gpt-oss builds a video game—using tools step-by-step within chain-of-thought reasoning 👾🍓

OpenAI

488,957 views • 10 months ago

AI agents are about to redefine the internet. The mistake we made with Large Language Models? We let a handful of corporations capture all the value. Action Model is building a different path. By training through our extension, users gain fractional ownership in the Large Action Model, giving them a real stake in the future of AI. When LLMs emerged, the upside flowed to Big Tech. This time, it doesn’t have to. They’re building AI on our data, and keeping the upside for themselves. Community-owned Large Action Model is how we take it back.

Action Model

76,623 views • 4 months ago

One core bottleneck in VLA models is action representation. Discrete tokens scale beautifully with VLM pretraining—but lose precision. Continuous actions are precise—but often break VLM reasoning. In our new work, we resolve this tension at the representation level. 🧵

Jianlan Luo

47,954 views • 6 months ago

We introduce HumanOmniV2, an omni-modal model designed to address two core problems in multimodal reasoning: insufficient global context understanding and the shortcut problem. By analyzing visual, auditory, and textual signals, the model performs deep reasoning on complex human intentions, emotions, and social interactions.

Tongyi Lab

1,221,672 views • 11 months ago

NEWS: NVIDIA just announced Alpamayo, what CEO Jensen Huang calls the world’s first thinking, reasoning autonomous vehicle AI, launching on U.S. roads later this year, starting with the Mercedes CLA. Jensen: "It's trained end-to-end. Literally from camera in to actuation out; It reasons what action it is about to take, the reason by which it came about that action, and the trajectory." Alpamayo introduces Vision-Language-Action (VLA) models, which enable self-driving systems to interpret what they see, reason about complex driving scenarios, and generate driving actions. The platform includes large reasoning models, simulation tools for testing rare and edge-case scenarios, and open datasets for training and validation. NVIDIA says the approach improves transparency, safety, and robustness in autonomous systems, particularly in complex real-world environments, and supports progress toward higher levels of vehicle autonomy: "With a 10-billion-parameter architecture, Alpamayo 1 uses video input to generate trajectories alongside reasoning traces, showing the logic behind each decision. Developers can adapt Alpamayo 1 into smaller runtime models for vehicle development, or use it as a foundation for AV development tools such as reasoning-based evaluators and auto-labeling systems. Alpamayo 1 provides open model weights and open-source inferencing scripts. Future models in the family will feature larger parameter counts, more detailed reasoning capabilities, more input and output flexibility, and options for commercial usage."

Sawyer Merritt

1,603,163 views • 5 months ago