This also shows up in the representations learned by the model. We plot the model’s representations of human and robot images. As pre-training is scaled up, the representations of humans and robots become more aligned: to a scaled-up model, human videos "look" like robot demos.

Physical Intelligence

44,596 subscribers

120,595 views • 7 months ago • via X (Twitter)

Education, Science & Technology

Related videos

How can we leverage diverse human videos to improve robot manipulation? Excited to introduce EgoVLA — a Vision-Language-Action model trained on egocentric human videos by explicitly modeling wrist & hand motion. We build a shared action space between humans and robots, enabling seamless transfer. With some robot demos, EgoVLA becomes a powerful, generalizable robot policy.

Ruihan Yang

58,712 views • 1 year ago

Text-to-image diffusion transformer models learn to align text and image representations as a byproduct of their conditional denoising task. By taking the dot product between the text and image representations of a DiT model (like Flux 2), you can create rich saliency maps.

Alec Helbling

94,095 views • 7 months ago
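The dot-product saliency idea in Alec Helbling's post can be illustrated with a toy sketch. It assumes the text and image-patch representations have already been extracted from some model as plain vectors. The function name `saliency_map`, the min-max normalization, and the tiny 2x2 example are illustrative assumptions, not the pipeline used in the post or any Flux 2 API.

```python
def saliency_map(text_vec, patch_vecs, height, width):
    """Score each image patch by its dot product with one text vector.

    text_vec:   list of floats, one text-token representation (assumed given)
    patch_vecs: list of height*width vectors, one per image patch, row-major
    Returns a height x width grid of scores rescaled to [0, 1].
    """
    if len(patch_vecs) != height * width:
        raise ValueError("expected height * width patch vectors")

    # Dot product between the text vector and every patch vector.
    scores = [sum(t * p for t, p in zip(text_vec, patch)) for patch in patch_vecs]

    # Min-max normalize so the map can be rendered as a heatmap
    # (an illustrative choice, not something the post specifies).
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0
    norm = [(s - lo) / span for s in scores]

    # Reshape the flat list back into image-grid rows.
    return [norm[r * width:(r + 1) * width] for r in range(height)]


# Toy example: 2-dimensional representations on a 2x2 patch grid.
text_vec = [1.0, 0.0]
patch_vecs = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [2.0, 0.0]]
grid = saliency_map(text_vec, patch_vecs, height=2, width=2)
```

Patches whose representation points in the same direction as the text vector get the highest scores, which is what makes the resulting grid usable as a saliency map.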

How smooth is a human-robot table tennis match? Intelligent robots play table tennis accurately, making for fun human-machine competitions. The combination of sports and technology shows the infinite vitality of scientific innovation. #ICIF #Shenzhen #tennis

Qingdao Feel

83,474 views • 2 months ago

Came to the office and ended up training a robot 🤖

Katja Sirazitdinova

140,506 views • 25 days ago

(1/2) MonoNPHM will be presented as a #CVPR2024 Highlight! Our Neural Parametric Head Model parametrizes both geometry and appearance. With the learned model, we can then 3D reconstruct and track human heads from images or videos.

Matthias Niessner

17,209 views • 2 years ago

TESLA’S NEXT-GEN OPTIMUS V3 HAND IS GETTING CLOSE TO HUMAN-LEVEL Tesla engineers just shared new details on the Optimus V3 hand — and the progress is striking. They’re now saying that as they move into Gen-3 and mass production, the hand is getting very close to human functionality and form factor. One engineer described it clearly: “It won’t even look like a robot. It will look like a human in a superhero suit. It will be something revolutionary.” This level of dexterity and human-like design is a critical milestone. The hand is one of the hardest parts of building a truly useful humanoid robot — and Tesla is iterating extremely fast.

Tesla Owners Silicon Valley

235,984 views • 25 days ago

In the future, it might just be humanoid robots doing the refueling! 🤖⛽️ Sinopec plans to introduce humanoid robot workers across its gas stations. They will handle tasks like fueling, picking up items, restocking, and inspections, working alongside human staff, especially during peak holiday periods in China to ease service pressure. With more than 30,000 gas stations across the country (most of them equipped with convenience stores), this implies a deployment of at least 30,000 humanoid robots. The robot is a wheeled humanoid developed by Beijing-based company FIVEAGES.

CyberRobo

23,536 views • 3 months ago

So… the 1X NEO home robot is not actually autonomous. Behind the scenes, it’ll often be teleoperated by humans, meaning someone, somewhere, could literally remote-control a robot inside your living room. I wanted the dawn of embodied AI. Instead, I’m apparently paying $499/month for a robot avatar with a human pilot. It’s impressive tech. But also… kind of dystopian? A robot that looks alive, yet secretly puppeteered, the uncanny valley just got a new basement level. Feels less like the “post-labor future,” and more like we just outsourced physical presence itself.

VraserX e/acc

723,711 views • 9 months ago

This isn’t a robot, it’s a Dominican Papibot $tsla. Boston Dynamics just raised the bar again. The new Atlas moves with fluid, human-like coordination, then goes past human limits. Balance, recovery, and whole-body control are happening in real time, not pre-scripted demos. $nvda

Special Situations 🌐 Research Newsletter (Jay)

21,803 views • 4 months ago

The context size of video world models is only a few frames. Like a human with severe memory loss! We design a long-term memory for world models based on explicit 3D representations inspired by the human mind. This enables long-term consistency. 1/3

Gordon Wetzstein

35,075 views • 1 year ago

Turns out you can train humanoid hands without any robot data. The idea in HUG is quite simple: (a) collect human data with smart glasses, (b) train a human manipulation model, (c) retarget to multi-fingered robot hands.

Lerrel Pinto

32,446 views • 1 month ago

New startup out of stealth in Cambridge MA: Eka Robotics. The company is building intelligence for the physical world in its native language: FORCE. Their core solution to mastering fast, reliable adaptive robots: Vision-Force-Action (VFA) model:
- sim-only reinforcement learning
- no human teleop data
- robot practices thousands of hours in a physics-rich simulator (mass, inertia) and comes up with its own solutions, AlphaZero-style.
- custom grippers add touch sensing; the model maps pixels + felt force to actions.

The Humanoid Hub

12,719 views • 2 months ago

The Hidden Language of Diffusion Models paper page: tackle the challenge of understanding concept representations in text-to-image models by decomposing an input text prompt into a small set of interpretable elements. This is achieved by learning a pseudo-token that is a sparse weighted combination of tokens from the model's vocabulary, with the objective of reconstructing the images generated for the given concept. Applied over the state-of-the-art Stable Diffusion model, this decomposition reveals non-trivial and surprising structures in the representations of concepts. For example, we find that some concepts such as "a president" or "a composer" are dominated by specific instances (e.g., "Obama", "Biden") and their interpolations. Other concepts, such as "happiness" combine associated terms that can be concrete ("family", "laughter") or abstract ("friendship", "emotion"). In addition to peering into the inner workings of Stable Diffusion, our method also enables applications such as single-image decomposition to tokens, bias detection and mitigation, and semantic image manipulation

AK

41,746 views • 3 years ago

🔥WATCH: HUMANOID ROBOT ALMOST THROWS HANDS A humanoid robot appeared ready to square up after a man interrupted its play. The reaction was way too human! This robot is basically a kid with metal joints. 🤖

Coin Bureau

36,938 views • 1 month ago

Your model can't zoom up as much as your friends? Send this to your rigger❗️
- In the video, left side is adjusted exported param, right side is the default.
- The left model has a more close up default state and can zoom up more than the right
The export parameters that you should pay attention to
- Center of model Y vertically affects the center of exported model
- Canvas scale unit: How big the default state of your model is
Personally I would recommend having
- Center of model Y at 0.25 - 0.15 (to focus on the face)
- Canvas scale unit: 4.0
#live2d #live2dtip #vtubestudio

ALKANimate | 2d rigging comm (closed for 2025) |

160,779 views • 2 years ago

As the world gets more digital, it should also be more human. That's why we're making banking at TD feel more simple, more empathetic, and more intuitive. Watch this new chapter of our story about a little robot going on a big journey. The very same robot we’re casting here!

TD (Canada)

26,196 views • 5 months ago

we open sourced the code to transform human videos into robot trajectories, so you can train robots with your hands 👐🏻 we used it in our recent paper R+X: Retrieval and Execution from Everyday Human Videos (ICRA 2025 🇺🇸) link and details in thread 🧵

Norman Di Palo

10,578 views • 1 year ago

It’s unreal to see this El Niño continue to astonish. Take a look at the subsurface heat build up in the last few frames of this. The anomalies are backing up, and heating up, like a traffic jam, in the Central Pacific. There’s a ton more fuel left in this tank! And it seems evident it will end up living up to the hype and seasonal model forecasts. Thanks Alex Boreham for the great visual.

Jeff Berardelli

73,684 views • 4 days ago