Domain Randomization (DR) is a key component of the data augmentation pipeline at Axis Robotics. By applying DR, we are able to scale verified, high-quality human trajectories by 10x to 100x. During training, we systematically introduce variation in environmental parameters. This prevents the model from relying on spurious visual... correlations. The objective is to ensure the policy learns rather than overfits.

To demonstrate the necessity and effectiveness of this approach, we evaluated both DR and No-DR models on Task 74 (pour_water_into_mug). The empirical results show a definitive impact on real-world deployment reliability: integrating DR into the pipeline increased the success rate from 0% to 90% (Fig. 1).

This divergence stems from how the respective policies process visual observations (Fig. 2). The baseline (No-DR) model overfits to the static visual background. It essentially memorizes the poses from the training dataset but fails to generalize when subjected to the inevitable variation of real-world deployment. Consequently, it cannot execute the correct manipulation on the target object. Conversely, the DR-trained model learns to extract essential geometric features and physical constraints, filtering out superficial visual noise. This leads to significantly higher robustness in dynamic environments.

The structural difference in execution is clearly reflected in the end-effector trajectory data. The real-world deployment recordings further illustrate this difference (Videos 1 and 2).

Scaling Physical AI requires turning raw trajectory data into robust policies, and a rigorously engineered DR infrastructure is an essential bridge to close the Sim2Real gap.

Axis Robotics

24,564 subscribers

27,125 views • 3 months ago • via X (Twitter)
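The post above describes multiplying each verified trajectory by sampling new environmental parameters. A minimal sketch of that idea follows. It is not Axis Robotics' pipeline: the parameter names (`light_intensity`, `background_id`, `camera_yaw_deg`), their ranges, and the function names are all assumptions made for illustration, since the post does not list which parameters are randomized.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneParams:
    """Hypothetical visual parameters; the post does not name the real ones."""
    light_intensity: float  # relative brightness multiplier
    background_id: int      # index into an assumed pool of backgrounds
    camera_yaw_deg: float   # small camera rotation offset, in degrees


def sample_scene(rng: random.Random, n_backgrounds: int = 50) -> SceneParams:
    """Draw one random scene configuration from assumed ranges."""
    return SceneParams(
        light_intensity=rng.uniform(0.5, 1.5),
        background_id=rng.randrange(n_backgrounds),
        camera_yaw_deg=rng.uniform(-10.0, 10.0),
    )


def randomize(trajectory: list, copies: int, seed: int = 0) -> list:
    """Pair one verified trajectory with `copies` independently sampled scenes.

    The actions are left untouched; only the scene each copy would be
    rendered in changes, so the background stops being a reliable cue.
    """
    rng = random.Random(seed)
    return [(sample_scene(rng), list(trajectory)) for _ in range(copies)]
```

Calling `randomize(actions, copies=10)` yields ten (scene, trajectory) pairs from one demonstration, which corresponds to the low end of the 10x to 100x range quoted in the post. In a real setup each pair would still have to be re-rendered in a simulator to produce training observations.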

Related videos

World Labs CEO Dr. Fei-Fei Li & SceniX Co-Founder Yunzhu Li on the data bottleneck in robotics and how world models help: "The lack of data in training, the lack of data in evaluation, this is very, very different from language models, where data is abundant on the internet." "And we know that in order for robotics to work, we have to somehow unlock the power of scaling law. But where does that come from?" "It's a profound problem that everybody's battling with in robotics." "We see a lot of unlock in being able to do this whole process through the modeling of the environments." "Being able to create these digital worlds... that's just going to unlock so much more potential for being able to replace all the costly and the unsafe data in the real environments with the data generated from the worlds for robots to be able to do scalable learning and evaluations." Fei-Fei Li Yunzhu Li martin_casado

a16z

62,817 views • 4 days ago

Robot policies must be both reliable and highly capable to be useful; the best way to achieve this level of performance is with reinforcement learning. However, for reinforcement learning you are usually stuck between two difficult options: reinforcement in the real world is often risky and expensive, while reinforcement learning in a traditional simulator takes a lot of engineering work and has a persistent sim-to-real gap. What if instead you could train your robot purely in a world model? RISE by Jiazhi Yang et al. uses a compositional world model to predict the future and evaluate progress. This allows for a self-improving pipeline, which learns a world model from real data and then learns how the robot should perform different tasks. This pipeline results in a data-driven way to improve policy performance from real data but without real-world reinforcement learning. Watch Episode #86 of RoboPapers, with Chris Paxton and Jiafei Duan, to learn more!

RoboPapers

38,334 views • 1 month ago

One of the most followed cancer researchers in the world, Dr. William Makis, is honored and proud to be part of a large research study spearheaded by yours truly. A study of this scale, involving over 4,000 patients, requires strong support. We are counting on your help so we can gather and publish the data in a peer-reviewed journal and bring this important research to the world.

Marivic Villa, MD, FCCP

17,436 views • 9 months ago

At Axis, every trajectory submitted by our community undergoes a strict replay validation process. We run each submission through a checker to verify whether the target task was successfully completed. To see how strict it is, check this demo (Task: Place The Toy Train On The Board Game Box). Real human data passes smoothly (Video 1). However, bots or manually altered data will fail (Video 2).

Why? Faking numbers breaks the physical causality of the simulation. Even tiny tweaks cause error accumulation, resulting in failed movements. This invalid data is automatically rejected. Because of this mechanism, data submitted via bots will ultimately fail our replay verification. Invalid data is strictly excluded from model training, and the task slot is reopened to the community to collect genuine, high-quality trajectories.

Furthermore, we actively monitor for duplicated data. Identical trajectories lack the diversity required for robot learning and will not be credited by our scoring system. If we detect accounts submitting a massive volume of identical trajectories, all associated addresses will be permanently banned.

For Axis, the quality and diversity of data are the only ways to solve the robotics generalization gap. They will always be our absolute top priorities.

Axis Robotics

30,876 views • 3 months ago
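The replay check and duplicate filter described in the post above can be illustrated with a toy example. This is a sketch under stated assumptions, not Axis' checker: the "simulator" here is a one-dimensional integrator, and the submission layout (`actions`, `states`), the tolerances, and all function names are hypothetical.

```python
import hashlib
import json


def replay(start: float, actions: list[float]) -> list[float]:
    """Toy deterministic simulator: each action is a position delta."""
    states = [start]
    for a in actions:
        states.append(states[-1] + a)
    return states


def passes_replay(start, actions, claimed_states, goal, tol=1e-6, goal_tol=0.05):
    """Accept only if replaying the actions reproduces the claimed states
    and the final state reaches the goal."""
    states = replay(start, actions)
    if len(states) != len(claimed_states):
        return False
    consistent = all(abs(s - c) <= tol for s, c in zip(states, claimed_states))
    reached = abs(states[-1] - goal) <= goal_tol
    return consistent and reached


def fingerprint(actions) -> str:
    """Hash of the rounded action sequence, used to spot identical submissions."""
    payload = json.dumps([round(a, 6) for a in actions]).encode()
    return hashlib.sha256(payload).hexdigest()


def filter_submissions(submissions, start, goal):
    """Keep submissions that pass replay and are not duplicates of earlier ones."""
    seen = set()
    accepted = []
    for sub in submissions:
        if not passes_replay(start, sub["actions"], sub["states"], goal):
            continue  # rejected: replay disagrees or the task was not completed
        fp = fingerprint(sub["actions"])
        if fp in seen:
            continue  # rejected: identical to an earlier accepted trajectory
        seen.add(fp)
        accepted.append(sub)
    return accepted
```

The key property is that a submission whose recorded states were edited by hand no longer matches what the simulator produces from its own actions, so it is rejected without any manual review. An exact-hash fingerprint only catches identical trajectories; detecting near-duplicates would need a distance measure instead.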

Tencent presents GameGen-O: Open-world Video Game Generation

We introduce GameGen-O, the first diffusion transformer model tailored for the generation of open-world video games. This model facilitates high-quality, open-domain generation by simulating a wide array of game engine features, such as innovative characters, dynamic environments, complex actions, and diverse events. Additionally, it provides interactive controllability, thus allowing for gameplay simulation. The development of GameGen-O involves a comprehensive data collection and processing effort from scratch. We collect and build the first Open-World Video Game Dataset (OGameData), amassing extensive data from over a hundred next-generation open-world games and employing a proprietary data pipeline for efficient sorting, scoring, filtering, and decoupled captioning. This robust and extensive OGameData forms the foundation of our model's training process. GameGen-O undergoes a two-stage training process, consisting of foundation model pretraining and instruction tuning. In the first phase, the model is pre-trained on OGameData via text-to-video and video continuation, endowing GameGen-O with the capability for open-domain video game generation. In the second phase, the pre-trained model is frozen and fine-tuned using a trainable InstructNet, which enables the production of subsequent frames based on multimodal structural instructions. This whole training process gives the model the ability to generate and interactively control content. In summary, GameGen-O represents a notable initial step forward in the realm of open-world video game generation via generative models. It underscores the potential of generative models to serve as an alternative to rendering techniques, which can efficiently combine creative generation with interactive capabilities.

AK

367,108 views • 1 year ago

Google presents Still-Moving: Customized Video Generation without Customized Video Data

Customizing text-to-image (T2I) models has seen tremendous progress recently, particularly in areas such as personalization, stylization, and conditional generation. However, expanding this progress to video generation is still in its infancy, primarily due to the lack of customized video data. In this work, we introduce Still-Moving, a novel generic framework for customizing a text-to-video (T2V) model, without requiring any customized video data. The framework applies to the prominent T2V design where the video model is built over a text-to-image (T2I) model (e.g., via inflation). We assume access to a customized version of the T2I model, trained only on still image data (e.g., using DreamBooth or StyleDrop). Naively plugging in the weights of the customized T2I model into the T2V model often leads to significant artifacts or insufficient adherence to the customization data. To overcome this issue, we train lightweight Spatial Adapters that adjust the features produced by the injected T2I layers. Importantly, our adapters are trained on "frozen videos" (i.e., repeated images), constructed from image samples generated by the customized T2I model. This training is facilitated by a novel Motion Adapter module, which allows us to train on such static videos while preserving the motion prior of the video model. At test time, we remove the Motion Adapter modules and leave in only the trained Spatial Adapters. This restores the motion prior of the T2V model while adhering to the spatial prior of the customized T2I model. We demonstrate the effectiveness of our approach on diverse tasks including personalized, stylized, and conditional generation. In all evaluated scenarios, our method seamlessly integrates the spatial prior of the customized T2I model with a motion prior supplied by the T2V model.

AK

40,474 views • 2 years ago

Exciting updates on Project GR00T! We discovered a systematic way to scale up robot data, tackling the most painful pain point in robotics. The idea is simple: a human collects demonstrations on a real robot, and we multiply that data 1000x or more in simulation. Let’s break it down:

1. We use Apple Vision Pro (yes!!) to give the human operator first-person control of the humanoid. Vision Pro parses human hand pose and retargets the motion to the robot hand, all in real time. From the human’s point of view, they are immersed in another body like the Avatar. Teleoperation is slow and time-consuming, but we can afford to collect a small amount of data.
2. We use RoboCasa, a generative simulation framework, to multiply the demonstration data by varying the visual appearance and layout of the environment. In Jensen’s keynote video below, the humanoid is now placing the cup in hundreds of kitchens with a huge diversity of textures, furniture, and object placement. We only have 1 physical kitchen at the GEAR Lab in NVIDIA HQ, but we can conjure up infinite ones in simulation.
3. Finally, we apply MimicGen, a technique to multiply the above data even more by varying the motion of the robot. MimicGen generates a vast number of new action trajectories based on the original human data, and filters out failed ones (e.g. those that drop the cup) to form a much larger dataset.

To sum up, given 1 human trajectory with Vision Pro -> RoboCasa produces N (varying visuals) -> MimicGen further augments to NxM (varying motions). This is the way to trade compute for expensive human data by GPU-accelerated simulation. A while ago, I mentioned that teleoperation is fundamentally not scalable, because we are always limited by 24 hrs/robot/day in the world of atoms. Our new GR00T synthetic data pipeline breaks this barrier in the world of bits. Scaling has been so much fun for LLMs, and it's finally our turn to have fun in robotics!

We are building tools to enable everyone in the ecosystem to scale up with us. Links in thread:

Jim Fan

364,380 views • 2 years ago
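The 1 -> N -> NxM multiplication described in the post above can be sketched in a few lines. This is only a toy illustration of the counting and filtering logic. It does not use or imitate the RoboCasa or MimicGen APIs, and the function names, the Gaussian motion noise, and the sum-based success check are all assumptions.

```python
import random


def vary_visuals(demo, n, rng):
    """Stage 1 (assumed): N copies of the demo, each tagged with a new scene seed."""
    return [{"scene_seed": rng.randrange(10**6), "actions": list(demo)} for _ in range(n)]


def vary_motion(sample, m, rng, noise):
    """Stage 2 (assumed): M perturbed action sequences per visual variant."""
    out = []
    for _ in range(m):
        actions = [a + rng.gauss(0.0, noise) for a in sample["actions"]]
        out.append({"scene_seed": sample["scene_seed"], "actions": actions})
    return out


def succeeded(sample, goal, tol=0.1):
    """Toy success check: the summed displacement must land near the goal."""
    return abs(sum(sample["actions"]) - goal) <= tol


def multiply(demo, n, m, goal, noise=0.02, seed=0):
    """Expand one demo into N x M candidates, then drop the failed ones."""
    rng = random.Random(seed)
    candidates = [
        variant
        for sample in vary_visuals(demo, n, rng)
        for variant in vary_motion(sample, m, rng, noise)
    ]
    kept = [c for c in candidates if succeeded(c, goal)]
    return candidates, kept
```

With `n=100` and `m=10`, one demonstration produces 1,000 candidates before filtering, which matches the "1000x or more" order of magnitude in the post. The filter step is what keeps the motion perturbations from polluting the dataset with failed rollouts.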

Who are “THEY”? “We have to name the names” in the worst miscarriage of medical science in history. Is it the World Health Organization? Dr. David Martin says Tedros is just a puppet with a “giant stick up his ass.” “But who’s moving the stick for the puppet?” The answer is, according to Dr. David Martin: • Bill Gates • The Wellcome Trust • The Rockefeller Foundation “By 2023, which is kind of where we are right now, Gates represents 88% of the donations to the World Health Organization from donor organizations and agencies. By any definition, that’s a controlling interest.” Bill Gates is no philanthropist; he’s laundering money into the World Health Organization. And when Gates and others state that having vaccinations where they are needed could potentially lead to as much as a 20% reduction in the Earth’s population, according to Dr. Martin, “those are not allegations; those are stated objectives.” Much more was revealed in this 4-minute clip. Watch and listen to Dr. David Martin.

The Vigilant Fox 🦊

551,023 views • 2 years ago

Excited to announce GR00T N1, the world’s first open foundation model for humanoid robots! We are on a mission to democratize Physical AI. The power of a general robot brain, in the palm of your hand - with only 2B parameters, N1 learns from the most diverse physical action dataset ever compiled and punches above its weight:

- Real humanoid teleoperation data.
- Large-scale simulation data: we are open-sourcing 300K+ trajectories!
- Neural trajectories: we apply SOTA video generation models to “hallucinate” new synthetic data that features accurate physics in pixels. Using Jensen’s words, “systematically infinite data”!
- Latent actions: we develop novel algorithms to extract action tokens from in-the-wild human videos and neural generated videos.

GR00T N1 is a single end-to-end neural net, from photons to actions:

- Vision-Language Model (System 2) that interprets the physical world through vision and language instructions, enabling robots to reason about their environment and instructions, and plan the right actions.
- Diffusion Transformer (System 1) that “renders” smooth and precise motor actions at 120 Hz, executing the latent plan made by System 2.

We deploy N1 on GR1 robot, 1X Neo robot, and a large collection of simulation benchmarks. N1 achieves up to +30% boost in diverse manipulation tasks for household and industrial settings. While humanoid robots are the main focus of N1, our model also supports cross-embodiment. We finetune it to work on the $110 HuggingFace LeRobot SO100 robot arm! Open robot brain runs on open hardware. Sounds just right. Let’s solve robotics, together, one token at a time. Links to our Whitepaper, Github repo, HuggingFace model, and open dataset page in the thread: 🧵

Jim Fan

466,333 views • 1 year ago

Disturbing data from an Italian study of nearly 300,000 individuals reveals a significant increase in cancer risk post-COVID vaccination. Dr. John Campbell breaks down the alarming findings: • The rate of first cancer hospitalization in the UNvaccinated cohort was 0.85%. • In the VACCINATED cohort, the rate jumped to 1.15%—a concerning and statistically significant increase. This wasn't a fluke. The results have a p-value of <0.001, meaning there's less than a 1 in 1000 chance this happened by random chance. Why is this happening? Dr. Campbell outlines plausible biological mechanisms: • Ongoing spike protein production driving chronic inflammation, a known precursor to cancer. • DNA contamination in the vaccines, potentially inhibiting tumor suppressor genes. • Frameshifting from pseudouridine, creating rogue proteins that can cause cellular damage and mutation. The most damning part? This crucial data is being withheld from the public in the UK and other Western nations. Our governments are depriving scientists of the vaxxed vs. unvaxxed data needed to perform this exact analysis. This lack of transparency is, in Dr. Campbell's view, "quite outrageous" and compromises a potential cover-up. This Italian province's data is a canary in the coal mine. If these figures are extrapolated to populations of 50 million or more, the implications are staggering. We are being kept in the dark. It is time to demand transparency. It is time to demand the data.

Camus

62,673 views • 10 months ago

Historically, RL policies for robots have been trained in synthetic, untextured environments, limiting perceptive policies to depth images where the sim-to-real gap is manageable. RGB has always had more potential, but leveraging it to train policies in simulation remained an open problem. Partnering with Niantic Spatial 🌎 and NVIDIA Robotics, we built a pipeline that addresses exactly that. We can now scan a real deployment site with off-the-shelf hardware, reconstruct it into a photorealistic Gaussian splat, and run massively parallel RL training. The policies trained in our Gym environment then transfer zero-shot to the real robot and environments they were trained for. This enables faster deployment of more capable and robust policies for the end user. The new resulting capabilities are a big step towards solving sim-to-real and also apply well beyond navigation. Read the full technical breakdown on our blog; link in the comments. #HumanoidRobots #Flexion #NianticSpatial #NVIDIA

Flexion

69,847 views • 12 days ago

This is THE moment of Physical AI! We are officially announcing Cosmos 3: Omnimodal World Models for Physical AI 🚀

- Cosmos 3 is an omnimodal world model: within a unified architecture, it can understand and generate language, images, video, audio, and actions.
- It is not just a VLM, not just a video generator, not just an audio-visual generative model, and not just a physics simulator / world-action model. It can understand images and videos, generate images, videos, and audio, simulate future worlds, predict actions, and generate robot policies, enabling models to truly begin to “touch the world.”
- Cosmos 3 is the #1 open-weight reasoner / T2I / I2V / robot policy across many benchmarks.

Huge thanks to every teammate who fought side by side on this journey, from architecture, data, training, infra, serving, and evaluation to post-training. Every part of this project carries an incredible amount of hard work. This was my first time leading a project as Tech Lead, and I feel truly fortunate. The future of Physical AI needs models that can not only “see” and “describe” the world, but also “imagine,” “simulate,” and “act,” and eventually close the loop with the real world. I hope Cosmos 3 can become an important starting point for this direction, and I’m excited to push Physical AI into its next stage together with the open-source community. Welcome to the era of Physical AI. HuggingFace: Project Website: Code:

Max Zhaoshuo Li 李赵硕

1,078,282 views • 2 months ago

Robora x Lemorele
We are excited to announce the integration of the Robora P300 (R) model. This exclusive model, developed in partnership with Lemorele, features a custom Robora-exclusive SKU and hardware configuration designed to support high-performance, wireless video streaming for AI training and perception. We will integrate this specialized unit into our VLA Module. The Robora P300 (R) enables seamless, real-time capture of high-definition HDMI video from external cameras, allowing users to wirelessly feed visual data directly into the VLA training pipeline. This enhances capabilities such as multi-view learning, remote annotation, and real-world AI fine-tuning, all without requiring tethered camera setups. While the full Robora robotics platform is still in development, users will be able to use the Robora P300 (R) hardware with their existing iOS or Android devices, enabling them to begin collecting and streaming visual data for VLA from mobile platforms. As part of our commitment to the Robora community, we will be offering exclusive giveaways of the Robora P300 (R) hardware to token holders, giving early supporters the opportunity to contribute directly to Robora’s growing AI training ecosystem and to gain early access to the tools that power the future of robotics and intelligent vision.

Robora

19,276 views • 10 months ago

We trained a robot dog to balance and walk on top of a yoga ball purely in simulation, and then transferred it zero-shot to the real world. No fine-tuning. Just works. I’m excited to announce DrEureka, an LLM agent that writes code to train robot skills in simulation, and writes more code to bridge the difficult simulation-reality gap. It fully automates the pipeline from new skill learning to real-world deployment. The yoga ball task is particularly hard because it is not possible to accurately simulate the bouncy ball surface. Yet DrEureka has no trouble searching over a vast space of sim-to-real configurations, and enables the dog to steer the ball on various terrains, even walking sideways! Traditionally, sim-to-real transfer is achieved by domain randomization, a tedious process that requires expert human roboticists to stare at every parameter and adjust it by hand. Frontier LLMs like GPT-4 have tons of built-in physical intuition for friction, damping, stiffness, gravity, etc. We are (mildly) surprised to find that DrEureka can tune these parameters competently and explain its reasoning well. DrEureka builds on our prior work Eureka, the algorithm that teaches a 5-finger robot hand to do pen spinning. It takes one step further on our quest to automate the entire robot learning pipeline with an AI agent system. One model that outputs strings will supervise another model that outputs torque control. We open-source everything! You are all welcome to check out the paper, watch more videos, and try the codebase today: Code:

Jim Fan

908,894 views • 2 years ago
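
The DrEureka post above describes domain randomization over physical parameters such as friction, damping, stiffness, and gravity. As a rough illustration of the idea only, the sketch below draws a fresh set of parameters for each training episode from uniform ranges. The names `ParamRange`, `DR_CONFIG`, and `sample_episode_params`, and all numeric ranges, are hypothetical and are not taken from DrEureka or any simulator.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamRange:
    """Closed interval from which one physical parameter is sampled."""
    low: float
    high: float

    def sample(self, rng: random.Random) -> float:
        # Uniform sampling is the simplest choice; other distributions are possible.
        return rng.uniform(self.low, self.high)


# Hypothetical ranges, chosen for illustration only.
DR_CONFIG = {
    "friction": ParamRange(0.3, 1.2),
    "damping": ParamRange(0.5, 2.0),
    "stiffness": ParamRange(20.0, 60.0),
    "gravity_z": ParamRange(-10.3, -9.3),
}


def sample_episode_params(config: dict, rng: random.Random) -> dict:
    """Draw one value per parameter; call once at the start of each episode."""
    return {name: prange.sample(rng) for name, prange in config.items()}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    for episode in range(3):
        params = sample_episode_params(DR_CONFIG, rng)
        print(episode, {k: round(v, 3) for k, v in params.items()})
```

In a real pipeline the sampled dictionary would be written into the simulator before each reset. Choosing the ranges is the part the post calls tedious when done by hand.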

Foundation models are enough to solve robotics! Unfortunately, this is not true. We keep hearing that Vision-Language-Action (VLA) models struggle because of the gap between static training and the dynamic real world. A German startup (Sereact) just released a solution that bridges this gap perfectly. They are introducing a new paradigm called Interactive RL Policy Patching. It's a distributed framework that allows robots to self-learn from human corrections without needing full retraining. When a robot fails, a human operator provides a brief "patch" or demonstration. The system then uses online off-policy reinforcement learning to update the behavior instantly. This is powered by a massive foundation model trained on hundreds of millions of interactions from over 100 deployed robot stations. The best part is the distributed parameter synchronization... When one robot learns a fix, the update is published fleet-wide... meaning the entire swarm gets smarter from a single human intervention. They are already proving this on complex manipulation tasks like shoe unboxing and screw sorting, drastically reducing the data needed to handle edge cases. Real-world environments are unforgiving, and I love seeing systems that can actually adapt on the fly! 📍 More info:

Ilir Aliu

18,210 views • 6 months ago
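
The fleet-wide synchronization mentioned in the post above (one robot learns a fix, the update is published, every robot picks it up) can be pictured with a toy versioned parameter store. This is a simplified sketch under stated assumptions, not Sereact's system: `ParameterServer`, `Robot`, and the `grip_offset` parameter are hypothetical, and the reinforcement learning step that would produce the update is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterServer:
    """Holds the shared policy parameters and a monotonically increasing version."""
    params: dict[str, float]
    version: int = 0

    def publish(self, delta: dict[str, float]) -> int:
        # Apply an additive update produced by one robot's correction.
        for key, change in delta.items():
            self.params[key] = self.params.get(key, 0.0) + change
        self.version += 1
        return self.version


@dataclass
class Robot:
    name: str
    params: dict[str, float] = field(default_factory=dict)
    version: int = -1  # -1 means "never synced"

    def sync(self, server: ParameterServer) -> None:
        # Pull only when the server holds something newer than the local copy.
        if self.version < server.version:
            self.params = dict(server.params)
            self.version = server.version


if __name__ == "__main__":
    server = ParameterServer({"grip_offset": 0.0})
    fleet = [Robot("station-a"), Robot("station-b")]
    for robot in fleet:
        robot.sync(server)
    # One robot receives a human correction and publishes the resulting update.
    server.publish({"grip_offset": 0.02})
    for robot in fleet:
        robot.sync(server)
    print([(r.name, r.version, r.params) for r in fleet])
```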

Synthetic data will provide the next trillion tokens to fuel our hungry models. I'm excited to announce MimicGen: massively scaling up the data pipeline for robot learning! We multiply high-quality human data in simulation with digital twins, generating 50,000 training episodes across 18 tasks, multiple simulators, and even in the real world! The idea is simple:
1. Humans tele-operate the robot to complete a task. The resulting data is extremely high-quality, but collecting it is very slow and expensive.
2. We create a digital twin of the robot and the scene in high-fidelity, GPU-accelerated simulation.
3. We can now move objects around, replace them with new assets, and even change the robot hand. Basically, we augment the training data with procedural generation.
4. Export the successful episodes, and feed them to a neural network!
You now have a near-infinite stream of data. One of the key reasons that robotics lags far behind other AI fields is the lack of data: you cannot scrape control signals from the internet. They simply don't exist in the wild. MimicGen shows the power of synthetic data and simulation to keep our scaling laws alive. I believe this principle applies beyond robotics. We are quickly exhausting the high-quality, real tokens from the web. Artificial intelligence from artificial data will be the way forward. We are big fans of the OSS community. As usual, we open-source everything, including the generated dataset!
- Website:
- Paper:
- Dataset is hosted on HuggingFace (thanks AK!!):
- Code:
MimicGen is led by Ajay Mandlekar, deep dive in the thread:

Jim Fan

332,199 views • 2 years ago
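
The four steps in the MimicGen post above (collect a human demonstration, rebuild the scene in simulation, move objects around, keep only successful episodes) can be illustrated with a toy 2D example. This is a deliberately simplified sketch, not MimicGen's algorithm or API: `Episode`, `augment`, `is_success`, and `generate` are hypothetical names, the "simulator" is a bounds check, and the whole trajectory is rigidly shifted with the object, which a real system would not do.

```python
import random
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class Episode:
    object_pos: Point        # where the target object sits in the scene
    trajectory: list[Point]  # end-effector waypoints, ending at the object


def augment(demo: Episode, new_object_pos: Point) -> Episode:
    """Re-target a demonstration to a new object position by a rigid shift."""
    dx = new_object_pos[0] - demo.object_pos[0]
    dy = new_object_pos[1] - demo.object_pos[1]
    shifted = [(x + dx, y + dy) for x, y in demo.trajectory]
    return Episode(new_object_pos, shifted)


def is_success(episode: Episode, workspace: float = 1.0) -> bool:
    """Stand-in for a simulator rollout: every waypoint must stay in the workspace."""
    return all(abs(x) <= workspace and abs(y) <= workspace
               for x, y in episode.trajectory)


def generate(demo: Episode, attempts: int, rng: random.Random) -> list[Episode]:
    """Sample new object positions and keep only the episodes that succeed."""
    kept = []
    for _ in range(attempts):
        new_pos = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        candidate = augment(demo, new_pos)
        if is_success(candidate):
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    human_demo = Episode((0.0, 0.0), [(-0.5, 0.0), (-0.2, 0.0), (0.0, 0.0)])
    dataset = generate(human_demo, attempts=100, rng=random.Random(0))
    print(f"kept {len(dataset)} of 100 generated episodes")
```

The filtering step matters: generated episodes that fail the check are discarded, so only trajectories that complete the task reach the training set.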

Dr. McCullough Names Four Key Conspirators in the Greatest Crime in Human History 1.) Dr. Anthony Fauci • “Fauci was in on it the entire time. He was pretending like he was responding to something that was brand new, but he knew the entire time. In fact, he was part of a conspiracy to cover up where the virus came from.” 2.) Dr. Ralph Baric, Professor of Epidemiology and researcher at the University of North Carolina at Chapel Hill. • “He’s the one who engineered taking part of a bat coronavirus onto a human coronavirus and making it invade and be lethal.” 3.) Dr. Peter Daszak, President of the EcoHealth Alliance. • “He carried the plans from Baric over to China.” 4.) Dr. Shi Zhengli, Senior scientist and researcher at the Wuhan Institute of Virology. “The bat lady.” • “She went into the bat caves and handled the bats and harvested the virus, and they published papers.” Peter A. McCullough, MD, MPH® ended on this note: “They telegraphed it. They said, ‘It [COVID] is ready to go.’ That’s what these papers say, ‘It’s ready to go. We’ve got it. We created the virus.’ They announced it in broad daylight.”

The Vigilant Fox 🦊

236,324 views • 2 years ago

For robots to be useful, they must be able to interact with a wide variety of environments; and yet, scaling interaction data is difficult, expensive, and time-consuming. Instead, much research revolves around sim-to-real manipulation, but mostly this has not been mobile manipulation. Recently, though, this has begun to change. Two recent papers from Tairan He and Haoru Xue show us how to unlock the potential of this technique, building policies which, without any real data at all, can move objects around and open doors in the real world with a humanoid robot. Watch Episode #60 of RoboPapers now to learn more, hosted by Chris Paxton and Jiafei Duan. In this episode, we cover two papers. First is VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation; and second is DoorMan: Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy Transfer.

RoboPapers

30,767 views • 6 months ago