Exploration is key for robots to generalize, especially in open-ended environments with vague goals and sparse rewards. BUT, how do we go beyond random poking? Wouldn't it be great to have a robot that explores an environment just like a kid? Introducing Imagine, Verify, Execute (IVE)! IVE leverages Vision-Language...

45,340 views • 1 year ago • via X (Twitter)

5 comments

Jia-Bin Huang, 1 year ago

Brought to you by the amazing @umdcs students Seungjae Lee (@JayLEE_0301), Daniel Ekpo (@daniekpo7), Haowen Liu, and my colleagues @furongh and @abhi2610. Check out the project page for more visual results!

Wenhu Chen, 1 year ago

Great work! Congrats!

Jia-Bin Huang, 1 year ago

Thanks, @WenhuChen!

Roei Herzig, 1 year ago

Very cool!

Related videos

Excited to announce GR00T N1, the world’s first open foundation model for humanoid robots! We are on a mission to democratize Physical AI.

The power of general robot brain, in the palm of your hand - with only 2B parameters, N1 learns from the most diverse physical action dataset ever compiled and punches above its weight:

- Real humanoid teleoperation data.
- Large-scale simulation data: we are open-sourcing 300K+ trajectories!
- Neural trajectories: we apply SOTA video generation models to “hallucinate” new synthetic data that features accurate physics in pixels. Using Jensen’s words, “systematically infinite data”!
- Latent actions: we develop novel algorithms to extract action tokens from in-the-wild human videos and neural generated videos.

GR00T N1 is a single end-to-end neural net, from photons to actions:

- Vision-Language Model (System 2) that interprets the physical world through vision and language instructions, enabling robots to reason about their environment and instructions, and plan the right actions.
- Diffusion Transformer (System 1) that “renders” smooth and precise motor actions at 120 Hz, executing the latent plan made by System 2.

We deploy N1 on GR1 robot, 1X Neo robot, and a large collection of simulation benchmarks. N1 achieves up to +30% boost in diverse manipulation tasks for household and industrial settings.

While humanoid robots are the main focus of N1, our model also supports cross-embodiment. We finetune it to work on the $110 HuggingFace LeRobot SO100 robot arm! Open robot brain runs on open hardware. Sounds just right.

Let’s solve robotics, together, one token at a time. Links to our Whitepaper, Github repo, HuggingFace model, and open dataset page in the thread: 🧵

Jim Fan

465,704 views • 1 year ago