Exploration is key for robots to generalize, especially in open-ended environments with vague goals and sparse rewards. BUT, how do we go beyond random poking? Wouldn't it be great to have a robot that explores an environment just like a kid? Introducing Imagine, Verify, Execute (IVE)! IVE leverages Vision-Language...

Jia-Bin Huang

79,137 subscribers

45,340 views • 1 year ago • via X (Twitter)

Science & Technology, Education

5 comments

Jia-Bin Huang • 1 year ago

Brought to you by the amazing @umdcs students Seungjae Lee @JayLEE_0301, Daniel Ekpo (@daniekpo7), Haowen Liu, and my colleagues @furongh and @abhi2610 Check out the project page for more visual results!

The Rundown AI • 2 years ago

AI won't replace you, but a person using AI will. Join 500,000+ readers and learn how to use AI in just 5 minutes a day (for free).

Wenhu Chen • 1 year ago

Great work! Congrats!

Jia-Bin Huang • 1 year ago

Thanks, @WenhuChen !

Roei Herzig • 1 year ago

Very cool!

Related videos

💡Can robots autonomously design their own tools and figure out how to use them? We present VLMgineer 🛠️, a framework that leverages Vision Language Models with Evolutionary Search to automatically generate and refine physical tool designs alongside corresponding robot action plans. ✨ VLMgineer can fully automate tool and action design with AI-driven physical creativity. No human intervention. No pre-defined templates or few-shot examples. ✨ VLMgineer outperforms human-specified designs and existing everyday tools. ✨ We let the VLM fully decide how to evolve designs. Deep dive with me: 🧵

Junyao Shi

29,764 views • 1 year ago

Watters: I predict space exploration is going to be a problem for liberals and here’s why…

Acyn

44,746 views • 10 months ago

Spatial reasoning is a major challenge for the foundation models today, even in simple tasks like arranging objects in 3D space. #CVPR2025 Introducing LayoutVLM, a differentiable optimization framework that uses VLM to spatially reason about diverse scene layouts from unlabeled assets and open-ended language instructions 1/n

Fan-Yun Sun

92,545 views • 1 year ago

I'd like to introduce what I've been working on the last few months at Hello Robot: Stretch AI, a set of open-source tools for language-guided autonomy, exploration, navigation, and learning from demonstration. The goal is to allow researchers and developers to quickly build and deploy AI-enabled home robot applications. A thread ->

Chris Paxton

69,271 views • 1 year ago

Vision-language models perform diverse tasks via in-context learning. Time for robots to do the same! Introducing In-Context Robot Transformer (ICRT): a robot policy that learns new tasks by prompting with robot trajectories, without any fine-tuning. [1/N]

Max Fu

40,392 views • 1 year ago

Vision-language models can control robots, but what if the prompt is too complex for the robot to follow directly? We developed a way to get robots to “think through” complex instructions, feedback, and interjections. We call it the Hierarchical Interactive Robot (Hi Robot).

Physical Intelligence

116,845 views • 1 year ago

Wow. This is a massive transformation Do you have the vision to buy a house that looks bad and imagine how it could be?

Julie Chang

196,537 views • 4 months ago

Rewinding Cadence | Exploration Gameplay Exploration looks fun! I love how interactive the environment is—you can even attack NPCs and they'll fight back. The mount system looks great too. Honestly, it feels more like an MMO than a typical gacha. #RewindingCadence #归环

GEMA

64,752 views • 1 year ago

From intelligent driving to robotics, from land to sky — XPENG’s exploration goes beyond technology. It’s about innovation that connects, inspires, and brings warmth to life. The future we imagine is taking shape — this is Emergence. $XPEV

XPENG

785,356 views • 7 months ago

Cooking in kitchens is fun. BUT doing it collaboratively with two robots is even more satisfying! We introduce MOSAIC, a modular framework that coordinates multiple robots to closely collaborate and cook with humans via natural language interaction and a repository of skills.

Sanjiban Choudhury

26,373 views • 2 years ago

WorldExplorer: Towards Generating Fully Navigable 3D Scenes Contributions: • We introduce the first method for generating 3D scenes from text that supports high-quality view synthesis while enabling exploration across a wide range of camera poses. • We propose an iterative scene expansion strategy using video diffusion models, driven by trajectory sampling and adaptive collision detection. • We design a scene memory mechanism that conditions each video generation step on relevant past frames, improving view consistency and overall scene coherence.

MrNeRF

23,814 views • 1 year ago

Excited to announce GR00T N1, the world’s first open foundation model for humanoid robots! We are on a mission to democratize Physical AI. The power of general robot brain, in the palm of your hand - with only 2B parameters, N1 learns from the most diverse physical action dataset ever compiled and punches above its weight: - Real humanoid teleoperation data. - Large-scale simulation data: we are open-sourcing 300K+ trajectories! - Neural trajectories: we apply SOTA video generation models to “hallucinate” new synthetic data that features accurate physics in pixels. Using Jensen’s words, “systematically infinite data”! - Latent actions: we develop novel algorithms to extract action tokens from in-the-wild human videos and neural generated videos. GR00T N1 is a single end-to-end neural net, from photons to actions: - Vision-Language Model (System 2) that interprets the physical world through vision and language instructions, enabling robots to reason about their environment and instructions, and plan the right actions. - Diffusion Transformer (System 1) that “renders” smooth and precise motor actions at 120 Hz, executing the latent plan made by System 2. We deploy N1 on GR1 robot, 1X Neo robot, and a large collection of simulation benchmarks. N1 achieves up to +30% boost in diverse manipulation tasks for household and industrial settings. While humanoid robots are the main focus of N1, our model also supports cross-embodiment. We finetune it to work on the $110 HuggingFace LeRobot SO100 robot arm! Open robot brain runs on open hardware. Sounds just right. Let’s solve robotics, together, one token at a time. Links to our Whitepaper, Github repo, HuggingFace model, and open dataset page in the thread: 🧵

Jim Fan

465,704 views • 1 year ago

I think what makes IVE popular with kids is that they are the epitome of a group that just makes you happy to look at them. And I think, as the world seems to get more chaotic, more and more adults need the pure uncomplicated happiness that is IVE.

Cassie ୨୧

19,145 views • 11 months ago

and now its KiiiKiii turn to celebrating IVE win on music core today, looks like kiiikiii prepared a cake for ive and they came to the stage after ive done with their encore 🥹

안녕zz 2.0 | 𝙁𝙊𝙍𝘾𝙀 𝘽𝙇𝘼𝘾𝙆𝙃𝙊𝙇𝙀 💙

87,242 views • 3 months ago

I found a tool to extract topical graphs with Wiki entities This tool is a hammer! Great for Semantic Topical Maps and Semantic SEO Comment "SEO" & I will DM the tool to you (Must be following)

Michal Barus

38,764 views • 2 years ago

Can robots leverage their entire body to sense and interact with their environment, rather than just relying on a centralized camera and end-effector? Introducing RoboPanoptes, a robot system that achieves whole-body dexterity through whole-body vision.

Xiaomeng Xu

76,351 views • 1 year ago

Introducing an approach to directly ground video generation models to policy execution without needing any action labels! Our approach uses a generic goal-conditioned exploration procedure to learn a policy that works across robots / embodiments!

Yilun Du

21,392 views • 1 year ago

Excited to share RoCo: Dialectic Multi-Robot Collaboration with Large Language Models. We propose a novel approach to multi-robot collaboration that leverages LLMs for both high-level communication and low-level path planning. w/ Shreeya Jain, Shuran Song

Mandi Zhao

88,704 views • 2 years ago

World models hold a lot of promise for robotics, but they're data hungry and often struggle with long horizons. We learn models from a few (< 10) human demos that enable a robot to plan in completely novel scenes! Our key idea is to model *symbols* not pixels 👇

Nishanth Kumar

82,931 views • 9 months ago

Our CoRL 2024 paper shows Reinforcement Learning can allow robots to learn skills via real-world practice, without any demonstrations or simulation engineering. Rewards are provided using language/vision models, and mobility of robots enables autonomous exploration. 1/N

Russell Mendonca

38,454 views • 1 year ago