Converting video models ➡️ ego-centric world models capable of humanoids manipulating objects, robots navigating through paintings, and beyond! Huge effort led by Anurag Bagchi over a year! Special thanks to 1X for releasing so much data!

Homanga Bharadhwaj

3,014 subscribers

10,478 views • 4 months ago • via X (Twitter)

Related videos

Humanoid's humanoid robots are undergoing training at Schaeffler, Siemens, and Ford factories. By 2026, the value of humanoid robots will no longer be judged by data or models, but by their real-world reliability and usefulness at the customer's site.

CyberRobo

16,225 views • 5 months ago

(1/6) X-Humanoid 🤖: Scaling up data for Humanoid Robots. We convert human daily activity videos (from Ego-Exo4D) into humanoid videos (i.e., Tesla Optimus) performing tasks like cooking or fixing a bike. This data can be potentially used to train robot policies and world models. 🔥 Project page: Paper link:

Mike Shou

87,622 views • 6 months ago

NEW RELEASE: Today we're releasing CortexMAE: a family of fMRI foundation models trained on 2.1K hours of open fMRI data. We're also releasing Brainmarks: an open benchmark suite for evaluating fMRI foundation models. Full paper is on arXiv (accepted to ICML 2026) A thread:

Sophont

40,600 views • 25 days ago

Grok models are now available on Databricks Agent Bricks. Bring SpaceXAI's latest models to your enterprise data to power capable AI agents.

xAI

8,620,279 views • 4 days ago

🚀 Excited to share our #ICLR2025 work on planning with neural dynamics models! While our lab has developed diverse neural dynamics models for manipulating rigid, deformable, and granular objects, having the model alone doesn’t solve the problem—planning with it remains a challenge. 💡 Enter BaB-ND, led by Keyi and Jiangwei! We propose a scalable, GPU-accelerated branch-and-bound algorithm, inspired by neural network verification, to enable effective planning for diverse objects modeled with neural dynamics. 🔗 Project page (open-source + detailed docs!): 🎥 Watch the video to see T being pushed around obstacles, and check out Keyi’s thread for more details!

Yunzhu Li

10,561 views • 1 year ago

Placing objects sounds simple… until robots have to do it. This method makes it simple, fast & reliable. [Github ⬇️] Robotic object placement is tough, especially with stacking, hanging, or insertion. AnyPlace is a new two-stage method that uses only synthetic data and a vision-language model to teach robots where and how to place objects; even in the real world. Why this works ✅ Finds the right spot with help from vision-language models ✅ Handles stacking, insertion, and hanging with no real-world training ✅ Trained on synthetic data using Blender and IsaacSim ✅ Works in the real world without fine-tuning It shows that smart use of simulation and language models can make robotic placement tasks easier, faster, and more reliable. Github: Paper: Thank you for sharing Animesh Garg !

Ilir Aliu - eu/acc

22,843 views • 1 year ago

Latest models and animations for STEAL THE BRAINROT 🔥 Huge thanks to the GOAT FeRinS for the opportunity! #UEFN

Nasty

15,008 views • 6 months ago

This pack of 3D models is releasing tomorrow! Comes with walls, objects, animated characters, wheelchair, animals, vehicles and more. Guess the price? 👀

Kenney

37,940 views • 1 year ago

These models will learn from a huge set of extremely diverse data from the Tesla fleet

Tesla AI

196,273 views • 3 years ago

Here's a video of Bertie running, and a few other photos worth sharing. Thanks for reading through this thread, I want to do more of this where I explain all the work that goes into these models.

Covey

19,221 views • 1 year ago

(1/n) 🚀 With FastVideo, you can now generate a 5-second video in 5 seconds on a single H200 GPU! Introducing FastWan series, a family of fast video generation models trained via a new recipe we term as “sparse distillation”, to speed up video denoising time by 70X! 🖥️ Live demo: (Thanks to @gmicloud for the support!) 🔗 Blog: 🔓 We fully open-source our models, code, and data with Apache-2.0 licenses

Hao AI Lab

78,660 views • 10 months ago

How can robots reliably place objects in diverse real-world tasks? 🤖🔍 Placement is tough—objects vary in shape and placement modes (such as stacking, hanging, and insertion), making it a challenging problem. We introduce AnyPlace, a two-stage method trained purely on synthetic data to predict diverse placement poses of unseen objects for real-world tasks. Read on for more👇

Animesh Garg

24,662 views • 1 year ago

Some random thoughts reading through the new RobbyAnt VLA paper: - 20,000 hours of data across 9 robots!! - damn, Chinese companies are going to trivially outscale the American ones on real robot data - It's really cool they train on depth; it means they can handle transparent objects really well for example - You can never tell how good these models are without trying them, since everyone trains on different robots, but from the results they show it does seem to clean up - cross embodiment scaling laws are really cool - if you have lots of robots do you need human video??

Chris Paxton

19,915 views • 4 months ago

Artificial Intelligence (#AI) models that process and interpret visual data are called large vision models. These models, like large language models, are trained on large amounts of data. #midjourney #dalle #stablediffusion (1/8) 🧵 ⤵️

Nuklai

13,614 views • 2 years ago

So our 45-person team developed entirely new AI capabilities, enabling them to: 🎨 Fine-tune custom Veo and Imagen models on their paintings and artwork 📹 Provide a desired look through rough animations, which the models transformed into stylized videos 🎭 Edit specific regions without regenerating entire shots from scratch

Google DeepMind

93,938 views • 4 months ago

Robots are learning from data like this, at massive scale. Better models start with better data. Better data needs humans scoring it. Coming soon.

PrismaX

26,741 views • 13 days ago

With so many AI models out there, which ones will last? Check out this 17-second clip to see how Grass’s Live Context Retrieval (LCR) engine can give models a unique edge through access to real-time, unbiased web data.

Grass

347,793 views • 1 year ago

happy Good Friday.. can we get more underwear models like this all over the world? not asking for much really

Muscle Gallery

21,004 views • 3 years ago

📢 Our lab has been exploring 3D world models for years — and we’re thrilled to share PhysTwin: a milestone that reconstructs object appearance, geometry, and dynamics from just a few seconds of interaction! Led by the amazing Hanxiao Jiang 👉 PhysTwin combines Gaussian splatting with inverse dynamics optimization based on simple spring-mass systems. ⚙️ The result? Real-time, action-conditioned 3D video prediction under novel interactions (i.e., 3D world models). 🔑 A few key takeaways: 1. Having the right structure (e.g., particles/masses) helps navigate the trade-off between sample efficiency, generalization, and broad applicability. 2. Visual foundation models (VFMs) have matured to the point where they can provide rich supervision for world modeling (e.g., tracking, shape completion). 3. Beyond VFMs, many crucial components have come together in recent years: Gaussian splats for rendering, NVIDIA Warp for high-performance simulation, and scene/asset generation from a wide range of labs and companies. The future of 3D world models is looking bright! ✨ 4. The resulting digital twin supports a wide range of downstream applications—especially in data generation and policy evaluation, thanks to its realistic rendering and simulation capabilities. 🎥 All code and data to reproduce the results, along with interactive demos, are available on the website. Check the following visualizations of: (1) observations, (2) reconstructed state/actions, (3) interactive digital twins, and (4) the overlays between real-world robot teleoperation and our model’s open-loop predictions.

Yunzhu Li

25,279 views • 1 year ago

In November 2023, China issued a 9-page call to action for advancing humanoid robots, targeting major tech breakthroughs by 2025 and making humanoids a key economic driver by 2027. How much progress did Chinese companies make in 2024? Major highlights🧵

The Humanoid Hub

626,425 views • 1 year ago