We’ve seen humanoid robots walk around for a while, but when will they actually help with useful tasks in daily life? The challenge here is the diversity and complexity of real-world scenes. Our new work tackles this problem via 3D visuomotor policy learning. Using data from only 1 scene,...

Yanjie Ze

5,040 subscribers

75,194 views • 1 year ago • via X (Twitter)

Science & Technology, Art, Education

Related videos

Elon Musk: Current humanoid robots are sort of gimmicks. Tesla will make the first actually useful humanoid robots.

ELON CLIPS

20,003 views • 3 months ago

Static 3D generation isn't enough. We need assets ready for animation. Our new #SIGGRAPH work, AniGen, takes a single image and generates the 3D shape, skeleton, and skinning weights all at once. Code is fully open-sourced! Kudos to Yihua and VAST AI Research 🧵(1/4)

Yanpei Cao

144,220 views • 2 months ago

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion TL;DR: Create 3/4DGS from Video Diffusion Note: Some first inference code released (not all yet). Contributions (cited): • We present DimensionX, a novel framework for generating photorealistic 3D and 4D scenes from only a single image using controllable video diffusion. • We propose ST-Director, which decouples the spatial and temporal priors in video diffusion models by learning (spatial and temporal) dimension-aware modules with our curated datasets. We further enhance the hybrid-dimension control with a training-free composition approach according to the essence of video diffusion denoising process. • To bridge the gap between video diffusion and real-world scenes, we design a trajectory-aware mechanism for 3D generation and an identity-preserving denoising approach for 4D generation, enabling more realistic and controllable scene synthesis. • Extensive experiments manifest that our DimensionX delivers superior performance in video, 3D, and 4D generation compared with baseline methods.

MrNeRF

17,037 views • 1 year ago

(1/6) X-Humanoid 🤖: Scaling up data for Humanoid Robots. We convert human daily activity videos (from Ego-Exo4D) into humanoid videos (i.e., Tesla Optimus) performing tasks like cooking or fixing a bike. This data can be potentially used to train robot policies and world models. 🔥 Project page: Paper link:

Mike Shou

87,622 views • 6 months ago

We are excited to announce our company to the world - We are working on affordable humanoid robots for the world, designed and made in India.

Cosine Robots

30,174 views • 2 years ago

The two humanoid robots (Niya) at the Zhongguancun Forum Information Desk are in a dormant state. When the meeting starts, they will continue to receive participants. This robot, with its humanoid design, is specially made for consultation, reception, welcoming, and social companionship. It is driven by AI and provides information services and emotional value.

CyberRobo

34,137 views • 1 year ago

Humanoid's humanoid robots are undergoing training at Schaeffler, Siemens, and Ford factories. By 2026, the value of humanoid robots will no longer be judged by data or models, but by their real-world reliability and usefulness at the customer's site.

CyberRobo

16,225 views • 5 months ago

NVIDIA DROPPED A MOTION DIFFUSION MODEL FOR HUMANOID ROBOTS, trained on 700 hours of mocap data. Kimodo generates high-quality 3D human and robot motions from text prompts. You control it with: → full-body pose keyframes → end-effector positions/rotations → 2D paths and waypoints. Works on human skeletons and the Unitree G1 robot. Plug the outputs directly into MuJoCo or retarget to other robots using GMR. Has a web-based interactive demo with a timeline editor. Runs locally. Needs ~17GB VRAM to run inference. Open source under Apache 2.0.

Vaishnavi

17,520 views • 2 months ago

Humanoid teleoperation is difficult to scale. A new paper explores a more scalable data source: first-person videos of humans doing tasks with their bare hands. The human-humanoid behavior policy unifies the state-action space, enabling effective retargeting to robot actions.

The Humanoid Hub

21,660 views • 1 year ago

NVIDIA announces the first open humanoid robot reference design built for robotics research. The NVIDIA Isaac GR00T Reference Humanoid Robot combines the Unitree H2 humanoid robot, Sharpa Wave five-fingered hands for dexterous manipulation, Jetson Thor onboard compute, and Isaac GR00T open software and models, giving researchers a full-stack platform from data capture to model deployment. Read the #NVIDIAGTC Taipei announcement:

NVIDIA Robotics

163,386 views • 25 days ago

1/ NVIDIA just open-sourced Cosmos 3 at GTC Taipei! It's the first fully open "omnimodel" for physical AI - one model that understands the real world, predicts what happens next, and generates the actions a robot should take. Weights, code, datasets. All open. And this is really big. Let's dig into everything: 🧵

Chubby♨️

18,067 views • 25 days ago

Placing objects sounds simple… until robots have to do it. This method makes it simple, fast & reliable. [GitHub ⬇️] Robotic object placement is tough, especially with stacking, hanging, or insertion. AnyPlace is a new two-stage method that uses only synthetic data and a vision-language model to teach robots where and how to place objects, even in the real world. Why this works ✅ Finds the right spot with help from vision-language models ✅ Handles stacking, insertion, and hanging with no real-world training ✅ Trained on synthetic data using Blender and IsaacSim ✅ Works in the real world without fine-tuning It shows that smart use of simulation and language models can make robotic placement tasks easier, faster, and more reliable. GitHub: Paper: Thank you for sharing, Animesh Garg!

Ilir Aliu - eu/acc

22,843 views • 1 year ago

You can't 3D reconstruct glass from images... ...WRONG! Thanks to video diffusion, now just about anything is possible! Introducing...Diffusion Knows Transparency (DKT) Transparent and reflective objects usually break robot vision and photogrammetry pipelines because they don't follow the "solid object" rules standard cameras expect. DKT is a new AI model that repurposes the "internal physics engine" found in video generation models to solve this problem. Researchers took a massive video diffusion model (WAN) and fine-tuned it using a custom-built synthetic dataset to turn it into a high-precision depth sensor. To train the AI, they built the first massive synthetic video library of transparent objects, 1.32 million frames of perfectly labeled glass and metal objects in motion. Without ever seeing a "real" labeled video of glass during training, the model (DKT) outperformed all previous specialized systems on real-world benchmarks (ClearPose, DREDS). They created a "lightweight" 1.3B parameter version that runs fast enough (0.17s per frame) to be used on actual robot hardware. Two reasons I find this project important: 1. It further proves that synthetic data will be essential for training the next-generation vision models. 2. In real-world robotic tests, using DKT's depth maps nearly doubled the success rate of robot arms trying to pick up objects on tricky reflective or translucent surfaces. At-home robots will need to interact with these types of objects on a daily basis. Check out the project page here: Code is LIVE! #Computervision #Robotics #AI

Jonathan Stephens

17,712 views • 5 months ago

Learning bimanual, contact-rich robot manipulation policies that generalize over diverse objects has long been a challenge. Excited to share our work: Planning-Guided Diffusion Policy Learning for Generalizable Contact-Rich Bimanual Manipulation! 🧵1/n

Xuanlin Li (Simon)

20,877 views • 1 year ago

How can robots reliably place objects in diverse real-world tasks? 🤖🔍 Placement is tough—objects vary in shape and placement modes (such as stacking, hanging, and insertion), making it a challenging problem. We introduce AnyPlace, a two-stage method trained purely on synthetic data to predict diverse placement poses of unseen objects for real-world tasks. Read on for more👇

Animesh Garg

24,662 views • 1 year ago

Space objects in 3D. NASA created 3D models of stars and supernova remnants by combining data from the Chandra X-ray Observatory with computer calculations. These models can not only be viewed on the screen, but also printed on a 3D printer. Here they are, from left to right: the supernova remnant Cassiopeia A, the young star BP Tauri, the supernova remnant Cygnus Loop, and the supernova remnant G292.0+1.8.

Black Hole

13,055 views • 1 year ago

📢 Our lab has been exploring 3D world models for years — and we’re thrilled to share PhysTwin: a milestone that reconstructs object appearance, geometry, and dynamics from just a few seconds of interaction! Led by the amazing Hanxiao Jiang 👉 PhysTwin combines Gaussian splatting with inverse dynamics optimization based on simple spring-mass systems. ⚙️ The result? Real-time, action-conditioned 3D video prediction under novel interactions (i.e., 3D world models). 🔑 A few key takeaways: 1. Having the right structure (e.g., particles/masses) helps navigate the trade-off between sample efficiency, generalization, and broad applicability. 2. Visual foundation models (VFMs) have matured to the point where they can provide rich supervision for world modeling (e.g., tracking, shape completion). 3. Beyond VFMs, many crucial components have come together in recent years: Gaussian splats for rendering, NVIDIA Warp for high-performance simulation, and scene/asset generation from a wide range of labs and companies. The future of 3D world models is looking bright! ✨ 4. The resulting digital twin supports a wide range of downstream applications—especially in data generation and policy evaluation, thanks to its realistic rendering and simulation capabilities. 🎥 All code and data to reproduce the results, along with interactive demos, are available on the website. Check the following visualizations of: (1) observations, (2) reconstructed state/actions, (3) interactive digital twins, and (4) the overlays between real-world robot teleoperation and our model’s open-loop predictions.

Yunzhu Li

25,279 views • 1 year ago

HOPEJr is an open-source, DIY humanoid robot built by Hugging Face and The Robot Studio. It features 3D-printed parts, dexterous hands, and costs under $3,000. Here's the first fully assembled HOPEJr. Blueprint:

The Humanoid Hub

34,380 views • 1 year ago

Can we leverage VLMs for robot manipulation in the open world? Check out our new work MOKA, a simple and effective visual prompting method!

Fangchen Liu

81,140 views • 2 years ago