Announcing a new Memory system for robots on Dimensional. Robots in production generate thousands of hours of video, lidar, and odometry, far too large to fit into your Agent context. SpatialMemory2 builds a multimodal data store in latent space for your Agents. Fully open source.

stash

8,571 subscribers

228,452 views • 3 months ago • via X (Twitter)

News & Politics, Education, Science & Technology

Similar videos

Vibecode your robots on Dimensional. Simple demo of our physical agent stack running on the Unitree go2 quadruped. Developers can now program physical space & build dimensional applications via natural language. Fully open source.

stash

32,476 views • 6 months ago

Announcing the Dimensional Residency in Shenzhen. Deploy agents into the physical world. Real customers. Real deployments. We provide robots, $10,000 of LLM credits, free housing, office space, and customer intros to jumpstart your robotics company.

stash

32,771 views • 4 months ago

Someone just put OpenClaw inside a Unitree G1 humanoid — and it walks. Same agent framework running on your phone. Now integrated with lidar, stereo + RGB cameras, understanding 3D space and temporal context. Works on drones and quadrupeds too. Fully open source. Cool work! Video Source: stash

Bo Wang

102,582 views • 5 months ago

3rd Place: GrokWorld uses Grok Imagine as a world model to generate synthetic training data for robots — augmenting or replacing months of manual collection in hours. Aptura AI

SpaceXAI

846,237 views • 6 months ago

We’re launching Judgment Labs today and announcing $32M in funding. As AI agents take on more of the work that creates economic value, they generate massive amounts of production data: the clearest record of how they behave with users, software, and the real world. Judgment builds infrastructure for improving AI agents from production data.

Alex Shan

3,567,279 views • 2 months ago

Announcing Agent Bricks: auto-optimize agents for your domain tasks. Provide a high-level description of the agent’s task, and connect your enterprise data — Agent Bricks handles the rest. Agent Bricks builds out an agent system that automatically optimizes against your goals and generates domain-specific synthetic datasets, accelerating agent development without relying on manual labeling or external sources.

Databricks

47,044 views • 1 year ago

My team is hiring in multimodal representation learning. We are working on tactile representations to equip robots with a physical understanding of the world. Bored of scraping disembodied internet data and looking for a new challenge? Apply here:

Mustafa Mukadam

56,704 views • 2 years ago

Your OpenClaw / Agent can now control multiple robots in realtime. We’ve abstracted and standardized all hardware and control interfaces so your agent can run our spatial tool calls on ANY robot. Fleet command of humanoid, quadruped, xARM, and Piper arm in unison. Fully open source.

stash

17,190 views • 4 months ago

Excited to announce RoboCasa, a large-scale simulation framework of everyday tasks! We use generative AI tools to create diverse objects, scenes, and tasks. Simulation plays a pivotal role in our Data Pyramid for training generalist robots. Open-source at

Yuke Zhu

141,486 views • 2 years ago

New short course: Multimodal RAG: Chat with Videos, developed with Intel and taught by vasudevlal!

In this course, you’ll work with LLaVA (Large Language and Vision Assistant), a Large Vision Language Model (LVLM) that can process both images and text. For example, given an image of a person doing a handstand on a skateboard at the beach, LLaVA doesn't just caption the scene, it’s able to predict possible outcomes, like the person losing balance or falling off. By understanding not just what's in a video frame, but what might happen next, your application can provide more insightful answers to questions about video.

You'll build a full multimodal RAG pipeline that can chat about video content:
- Use the BridgeTower model to create joint text-image embeddings in a 512-dimensional multimodal semantic space.
- Learn video processing techniques to extract keyframes, generate transcripts using Whisper, and create captions.
- Use the LanceDB vector database to store and retrieve high-dimensional multimodal embeddings.
- Integrate the LLaVA model, combining CLIP's (Contrastive Language Image Pretraining) vision transformer with Llama, for advanced visual-textual reasoning.

Your final system will ingest video data, generate embeddings for frames and text, perform similarity searches for relevant content, and use the retrieved multimodal context to inform LVLM-based response generation. The result is a system capable of answering nuanced questions about video content, effectively chatting about the video it has processed. Please sign up here!
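The similarity-search step described in the post can be illustrated with a toy sketch. It is not the course's code: the 3-dimensional vectors, the frame labels, and the helper names `cosine_similarity` and `top_k` are all made up for illustration, and a plain dictionary stands in for a vector database such as LanceDB.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Return the labels of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), label) for label, vec in store.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored[:k]]


# Hypothetical embeddings: a real pipeline would get these from an
# embedding model (the post mentions 512-dimensional BridgeTower vectors).
frame_store = {
    "frame_001: skateboard handstand": [0.9, 0.1, 0.0],
    "frame_002: empty beach": [0.1, 0.9, 0.0],
    "frame_003: person falling": [0.8, 0.2, 0.1],
}

query_embedding = [1.0, 0.0, 0.0]  # stand-in for an embedded user question
results = top_k(query_embedding, frame_store, k=2)
print(results)
```

In a full pipeline, the retrieved frames and their captions would then be passed to the vision-language model as context for the answer.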

Andrew Ng

107,825 views • 1 year ago

New short course: LLMs as Operating Systems: Agent Memory, created with Letta, and taught by its founders Charles Packer and Sarah Wooders.

An LLM's input context window has limited space. Using a longer input context also costs more and results in slower processing. So, managing what's stored in this context window is important.

In the innovative paper MemGPT: Towards LLMs as Operating Systems, its authors (which include the instructors) proposed using an LLM agent to manage this context window. Their system uses a large persistent memory that stores everything that could be included in the input context, and an agent decides what is actually included.

Take the example of building a chatbot that needs to remember what's been said earlier in a conversation (perhaps over many days of interaction with a user). As the conversation's length grows, the memory management agent will move information from the input context to a persistent searchable database; summarize information to keep relevant facts in the input context; and restore relevant conversation elements from further back in time. This allows a chatbot to keep what's currently most relevant in its input context memory to generate the next response.

When I read the original MemGPT paper, I thought it was an innovative technique for handling memory for LLMs. The open-source Letta framework, which we'll use in this course, makes MemGPT easy to implement. It adds memory to your LLM agents and gives them transparent long-term memory.

In detail, you’ll learn:
- How to build an agent that can edit its own limited input context memory, using tools and multi-step reasoning
- What is a memory hierarchy (an idea from computer operating systems, which use a cache to speed up memory access), and how these ideas apply to managing the LLM input context (where the input context window is a "cache" storing the most relevant information; and an agent decides what to move in and out of this to/from a larger persistent storage system)
- How to implement multi-agent collaboration by letting different agents share blocks of memory

This course will give you a sophisticated understanding of memory management for LLMs, which is important for chatbots having long conversations, and for complex agentic workflows. Please sign up here!
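The "context window as a cache" idea can be sketched in a few lines. This is a minimal illustration, not Letta's or MemGPT's API: the class name `TieredMemory`, its methods, and the sample messages are hypothetical, and a fixed oldest-first eviction rule with keyword search stands in for the LLM agent that decides what to move in MemGPT.

```python
from collections import deque


class TieredMemory:
    """Toy two-tier memory: a small 'context' backed by a larger archive."""

    def __init__(self, context_limit=3):
        self.context_limit = context_limit
        self.context = deque()  # what would be sent to the LLM
        self.archive = []       # persistent, searchable storage

    def add(self, message):
        """Add a message; evict the oldest context entries to the archive when full."""
        self.context.append(message)
        while len(self.context) > self.context_limit:
            self.archive.append(self.context.popleft())

    def recall(self, keyword):
        """Move archived messages containing the keyword back into the context."""
        hits = [m for m in self.archive if keyword.lower() in m.lower()]
        for m in hits:
            self.archive.remove(m)
            self.add(m)  # may push older context entries out to the archive
        return hits


mem = TieredMemory(context_limit=2)
mem.add("User's dog is named Biscuit")
mem.add("User likes hiking")
mem.add("User asked about weather")   # evicts the dog fact to the archive
restored = mem.recall("dog")          # brings it back, evicting the hiking fact
print(list(mem.context), mem.archive)
```

The design choice that matters here is the split itself: the context stays bounded no matter how long the conversation runs, while nothing is lost because evicted items remain retrievable from the archive.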

Andrew Ng

200,788 views • 1 year ago

ELON: GROK TO ORCHESTRA-CONDUCT THOUSANDS OF OPTIMUS ROBOTS BUILDING FACTORIES & REFINERIES Elon just laid out the vision: Grok will act as the "orchestra conductor" directing massive teams of Optimus robots, potentially thousands, to construct huge projects like factories or rare earth refineries. “For managing a large team of Optimus robots to build a factory or a refinery, a hypothetical rare earth refinery which we desperately need in America, you start asking who’s going to manage all of that. That’s where AI comes in.” Source: Tesla

Mario Nawfal

87,502 views • 6 months ago

Can robots self-improve by collecting data autonomously? 🤖 Introducing SOAR: a system for large-scale autonomous data collection 🚀 and autonomous improvement 📈 of a multi-task language-conditioned policy in diverse scenes without human intervention.

Paul Zhou

47,667 views • 2 years ago

What if robots could think longer on harder problems without saying a single word?🤔 We introduce RD-VLA (Recurrent-Depth VLA): a latent, iterative reasoning architecture for robot control. ❌No Chain-of-Thought tokens. ❌No extra memory overhead. ✅Just reasoning—directly in latent space. 🧠🤖 Project page: 👇🧵

Jiafei Duan

13,888 views • 5 months ago

This is Repo Prompt 1.5. The new Context Builder connects to your agent of choice, using your existing subscriptions, to fully automate the process of building the perfect context for a given token budget. All of Repo Prompt's power, fully automated, with 1 click.

eric provencher

195,235 views • 9 months ago

This is the most insane application of MCP I've used. Context is king! Your coding agents thrive on high-quality context. Qodo Aware is a production-ready deep research agent that solves this. This is huge! Watch how I use it in Codex to retrieve context from large codebases.

elvis

80,225 views • 10 months ago

🔗 Announcing LangChain OSS Skills. LangChain has the most popular frameworks for building AI agents, and now your coding agent can be an expert in them. We're excited to release the first iteration of LangChain OSS Skills, giving your agent expertise in our open source frameworks. The skills include guidance on how to use langchain, langgraph, and deepagents to effectively build agents. ➡️ Install our OSS skills for your coding agent here: ➡️ Read more:

LangChain OSS

65,840 views • 5 months ago

Because of this video, this may be the single thread that allows modern robots in 2040 to weave a basket. I have encoded this into AI models and if the wind is at my back, it will be in your robot open source.

Brian Roemmele

47,966 views • 7 months ago

A new paper by Disney Research introduces a 2-stage technique for motion tracking in physics-based character animation, using a pretrained latent space and RL to control diverse and unseen motions on both virtual and real-world humanoid robots. Paper:

The Humanoid Hub

20,083 views • 1 year ago