Uploaded: 2024-06-05T00:49:07.000Z
Duration: PT10.009S
Channel: Animesh Garg

Model-Free Reinforcement Learning (MFRL) has been alluring, especially with supercharged compute and physics simulation on GPU. However, these methods rely on zeroth-order gradient estimates and are often not the best optimizers. Can we do better than PPO in continuous control for robotics? Turns out yes! 🥳 tl;dr: Faster, better RL than PPO in continuous control 💪

The answer lies in using more information from the simulation. We are already running the simulation on GPU, so why not use it for gradients as well? This has been a driving question in a series of our works.

We first studied this problem in our ICLR 2022 paper on Short-Horizon Actor-Critic (SHAC). Naive gradient-based methods get stuck in local minima and suffer from exploding/vanishing gradients. SHAC solved this problem with truncated rollouts and model-based value estimation, where the model is the differentiable simulator. This boosted sample efficiency and wall-clock time immensely, especially in high-dimensional systems such as humanoids. Yet, given enough compute, PPO often caught up.

Our follow-up paper on Adaptive Horizon Actor-Critic (AHAC) at ICML 2024 discovers the cause and provides a fix. We find that even when given ground-truth dynamics, not all gradients are useful due to sample error. First-order model-based reinforcement learning methods employing differentiable simulation provide gradients with reduced variance but are susceptible to bias in scenarios involving stiff dynamics, such as physical contact. Back-propagating through contact and long trajectories drastically reduces gradient accuracy. Using this insight, AHAC dynamically adapts its rollout horizon to avoid differentiating through stiff contact. AHAC is a first-order model-based RL algorithm that learns high-dimensional tasks in minutes (wall clock) and outperforms PPO by 40%, even in the limit of data provided to PPO.
This work is led by Ignat Georgiev alongside Krishnan Srinivasan, Jie Xu, and Eric Heiden, with ample assistance from the Warp team at NVIDIA Robotics (Miles Macklin).

Animesh Garg

52,279 views • 2 years ago
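The first post contrasts the zeroth-order (score-function) gradient estimates behind policy-gradient methods like PPO with first-order gradients taken through a differentiable simulator. Below is a minimal sketch of that variance gap, assuming a smooth 1-D quadratic "return" and Gaussian exploration noise; `objective`, `d_objective`, and the estimator functions are hypothetical names, and nothing here reproduces SHAC or AHAC.

```python
import random
import statistics

SIGMA = 0.5  # std of the Gaussian exploration noise (assumption)

def objective(x: float) -> float:
    # Toy smooth "return" with its maximum at x = 2 (assumption, not a real task).
    return -(x - 2.0) ** 2

def d_objective(x: float) -> float:
    # Analytic derivative; stands in for a gradient from a differentiable simulator.
    return -2.0 * (x - 2.0)

def zeroth_order_grad(theta: float, rng: random.Random, sigma: float = SIGMA) -> float:
    # Score-function estimate of d/dtheta E[f(theta + sigma * eps)]:
    # uses only function values, as policy-gradient methods do.
    eps = rng.gauss(0.0, 1.0)
    return objective(theta + sigma * eps) * eps / sigma

def first_order_grad(theta: float, rng: random.Random, sigma: float = SIGMA) -> float:
    # Pathwise estimate: differentiate through the sampled point itself.
    eps = rng.gauss(0.0, 1.0)
    return d_objective(theta + sigma * eps)

def summarize(estimator, theta: float = 0.0, n: int = 20000, seed: int = 0):
    # Returns (sample mean, sample variance) of n gradient estimates at theta.
    rng = random.Random(seed)
    samples = [estimator(theta, rng) for _ in range(n)]
    return statistics.fmean(samples), statistics.pvariance(samples)

if __name__ == "__main__":
    print("true gradient at theta=0:", d_objective(0.0))
    for name, est in [("zeroth-order", zeroth_order_grad), ("first-order", first_order_grad)]:
        mean, var = summarize(est)
        print(f"{name}: mean={mean:.3f}, variance={var:.3f}")
```

On this smooth objective both estimators are unbiased (true gradient 4.0 at theta = 0), but their theoretical variances are 1.0 for the first-order estimator versus about 124 for the zeroth-order one at sigma = 0.5. The sketch deliberately leaves out stiff contact, which is where the post says first-order gradients become biased.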

Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? Yes, we can. We propose a scalable framework for full-parameter fine-tuning using Evolution Strategies (ES). By skipping gradients and optimizing directly in parameter space, ES achieves more accurate, efficient, and stable fine-tuning. Paper: Code:

Yulu Gan

414,967 views • 8 months ago
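To make "exploring in parameter space" concrete, here is a minimal sketch of a generic Evolution Strategies update with antithetic (mirrored) Gaussian perturbations, assuming a toy 3-parameter black-box reward. `TARGET`, `reward`, `es_step`, and `run_es` are hypothetical; this is not the post's LLM fine-tuning framework, only the underlying estimator.

```python
import random

TARGET = [0.5, -1.0, 2.0]  # hypothetical optimum of the toy reward

def reward(params):
    # Hypothetical black-box reward: only its value is used, never its gradient.
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

def es_step(params, rng, pop_size=32, sigma=0.1, lr=0.05):
    # One ES update with mirrored Gaussian perturbations of the parameters.
    dim = len(params)
    grad = [0.0] * dim
    for _ in range(pop_size // 2):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = reward([p + sigma * e for p, e in zip(params, eps)])
        minus = reward([p - sigma * e for p, e in zip(params, eps)])
        for i in range(dim):
            grad[i] += (plus - minus) * eps[i]
    # Monte Carlo estimate of the gradient of the Gaussian-smoothed reward.
    scale = 1.0 / (pop_size * sigma)
    return [p + lr * scale * g for p, g in zip(params, grad)]

def run_es(steps=300, seed=0):
    rng = random.Random(seed)
    params = [0.0] * len(TARGET)
    for _ in range(steps):
        params = es_step(params, rng)
    return params

if __name__ == "__main__":
    final = run_es()
    print([round(p, 3) for p in final], reward(final))
```

Each update needs only forward evaluations of the reward, which is why the approach skips backpropagation entirely; the cost moves into the number of perturbed evaluations per step (`pop_size`).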

ICML 2026: Latent Reasoning in TRMs is Secretly a Policy Improvement Operator. Why does recursive reasoning, especially latent reasoning, actually work? The theory is still young, and even mechanistic explanations are limited. We close part of this gap by showing that latent reasoning is secretly doing policy improvement: each recursion pushes the model steadily toward the target. Based on this view, we propose an algorithm that boosts learning and inference efficiency by up to 18x.

Arip

23,733 views • 7 days ago

1/ Gemini 2.5 is here, and it's our most intelligent AI model ever. Our first 2.5 model, Gemini 2.5 Pro Experimental, is a state-of-the-art thinking model, leading in a wide range of benchmarks – with impressive improvements in enhanced reasoning and coding and now #1 on Arena by a significant margin. With a model this intelligent, we wanted to get it to people as quickly as possible. Find it on Google AI Studio and in the Gemini app for Gemini Advanced users now – and in Vertex in the coming weeks. This is the start of a new era of thinking models – and we can't wait to see where things go from here.

Sundar Pichai

864,057 views • 1 year ago

How robust can model predictive control be if we...

Heng Yang

28,660 views • 1 year ago

Robora Sim: A PyBullet-Powered Environment for Learning Robotic Physical Intelligence. We are currently building our Robora simulation environment for sim-based learning, leveraging PyBullet, an industry-standard physics engine widely used in AI-driven robotics research and development. The environment is optimized with GPU-accelerated learning algorithms, enabling high-speed imitation learning and reinforcement learning within a safe and controlled virtual setup before shipping to the real world. This simulation platform allows our models to learn, adapt, and generalize across different robot morphologies, terrain types, and task objectives, all before deployment to the real world. At its core, the system combines a VLA-powered high-level planner with low-level motion control algorithms, working together to produce emergent, physically intelligent behaviors. This synergy between simulation, learning, and real-world transfer marks a major step forward in our pursuit of adaptive and intelligent robotic systems. Through advanced domain randomization and synthetic data generation, the Robora Simulation Environment ensures that policies trained in simulation transfer effectively to real-world robots, minimizing the sim-to-real gap. Moreover, users will be able to test and integrate their own hardware kits within selected simulation environments in the Robora Dapp, ensuring seamless compatibility and safer real-world implementation.

Robora

23,489 views • 8 months ago

PPO has long dominated robot locomotion training in simulation...

Robotic Systems Lab

41,757 views • 12 days ago

Building on the previous paper, in this study we compare a continuous "smooth return" S2>S1 model with an event-driven one, where long periods of relative calm are punctuated by short, intense episodes of global reorganisation. Both models cover the same time window. Neither uses archaeological data in its construction. When compared against where early humans and early civilizations actually appear and persist, the difference is statistically robust. The smooth model behaves like background noise. The event-driven model lines up with where and when early civilizations appear far better than chance allows, even after aggressive randomization tests that shuffle both timing and location to see what could arise by chance. The event timeline itself was built independently from well-known late-glacial disruptions, such as Heinrich events, meltwater pulses, and abrupt deglacial transitions, rather than from any archaeological data. Nothing here claims that specific events caused specific cultures. It does suggest that history may not unfold on a smooth clock. Human societies seem to flourish during recovery phases between disruptions, not during the disruptions themselves. The animation contrasts the two return models. Draft paper: Source & Results: (coming soon)

Craig Stone

10,899 views • 5 months ago
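The post does not specify its test statistic, so the following is only a minimal sketch of a generic temporal randomization test, assuming both the model timeline and the site record are binned into equal time steps. `alignment_score` and `circular_shift_pvalue` are hypothetical. Circularly shifting the model timeline keeps its internal structure (run lengths, spacing of episodes) while breaking its alignment with the observed record.

```python
import random

def alignment_score(model_signal, site_counts):
    # Hypothetical score: site counts weighted by the model's signal in each time bin.
    return sum(m * c for m, c in zip(model_signal, site_counts))

def circular_shift_pvalue(model_signal, site_counts, n_shifts=999, seed=0):
    # Null distribution: random circular shifts of the model timeline.
    rng = random.Random(seed)
    n = len(model_signal)
    observed = alignment_score(model_signal, site_counts)
    exceed = 0
    for _ in range(n_shifts):
        k = rng.randrange(1, n)  # nonzero offset
        shifted = model_signal[k:] + model_signal[:k]
        if alignment_score(shifted, site_counts) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (exceed + 1) / (n_shifts + 1)

if __name__ == "__main__":
    signal = [1 if 10 <= i < 15 or 35 <= i < 42 else 0 for i in range(60)]
    sites = [3 if s else 0 for s in signal]  # sites concentrated in the model's windows
    print(circular_shift_pvalue(signal, sites))
```

With sites concentrated exactly in the model's two (asymmetric) windows, no nonzero shift matches the observed score, so the p-value is 0.001, the smallest value 999 shifts allow; with uniform site counts every shift ties and the p-value is 1.0. A spatial test would permute locations in the same spirit.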

A viral paper, "Language Models Represent Space and Time", recently claims that LLMs learn "world models". As much as I like Max Tegmark's work, I disagree with their definition of a world model. World models are a core concept in AI agents and decision making. A world model is our mental simulation of how the world works given interventions (or lack thereof). It captures causality and intuitive physics, telling the agent what is likely and what is impossible. It can and should be used for counterfactual reasoning, i.e. "what ifs": what would happen if I knock over a cup of water? Where would I have been if I had not taken that bus? Yann LeCun says it well in his position paper. I quote: "Using such world models, animals can learn new skills with very few trials. They can predict the consequences of their actions, they can reason, plan, explore, and imagine new solutions to problems. Importantly, they can also avoid making dangerous mistakes when facing an unknown situation." The first use of the term World Model in deep policy learning is attributed to David Ha (hardmaru) & Jürgen Schmidhuber: in their seminal paper, an agent masters shooting skills in the popular game Doom (demo below) by learning in imagination, using an internal world model as a "physics simulator". To put it in a simple Python-style formula, a world model learns a function F(s[0:t-1], a) -> s[t:], which takes as input the observed past and the current action, and outputs plausible future states. Now, the definition of a world model in Tegmark's paper seems to be about predicting GPS coordinates and time eras. I see this as just a classification task with no causal learning or simulation going on. You cannot make meaningful interventions against that model, nor can you optimize any decision making in a closed feedback loop.
As for the "space & time neurons", I think they are most similar to the "sentiment neuron" that OpenAI published in 2017. Predicting GPS is conceptually no different from predicting sentiment, in my opinion. I don't think their experimental results are wrong, just that their conclusion is on shaky ground. I welcome any debate! Paper link:

Jim Fan

593,943 views • 2 years ago
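The formula F(s[0:t-1], a) -> s[t:] in the post above can be made concrete with a minimal sketch of that signature used for counterfactual "what if" queries. Hand-written point-mass physics stands in for a learned model here (an assumption), and `world_model`, `State`, and `dt` are hypothetical names. The contrast the post draws is visible in the signature: the model takes an action as input, so different interventions yield different futures.

```python
from typing import List, Sequence

State = List[float]  # [position, velocity] of a point mass (assumption)

def world_model(past: Sequence[State], actions: Sequence[float], dt: float = 0.1) -> List[State]:
    # F(s[0:t-1], a) -> s[t:]: map observed history plus planned actions to future states.
    # Hand-written physics stands in for a learned model; this toy version only
    # conditions on the most recent observed state.
    pos, vel = past[-1]
    future: List[State] = []
    for a in actions:
        vel = vel + a * dt  # the action is an acceleration command
        pos = pos + vel * dt
        future.append([pos, vel])
    return future

if __name__ == "__main__":
    history = [[-0.1, 1.0], [0.0, 1.0]]  # observed past: moving at 1 unit/s
    # Counterfactual queries: same history, different interventions.
    push = world_model(history, [1.0] * 10)
    brake = world_model(history, [-1.0] * 10)
    print("final position if we push:", round(push[-1][0], 3))    # 1.55
    print("final position if we brake:", round(brake[-1][0], 3))  # 0.45
```

Because the predicted future depends on the action argument, a planner could compare candidate action sequences in a closed loop, which a model that only maps inputs to coordinates cannot support.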

World Models are the path for some AI models in the future. But how can we efficiently train these models not only to see the world the way humans do, but to see it in a new and unique way? By visualizing what are normally sequenced audio patterns, we can derive many more insights. Here we see Paganini in a visual form that can then be described and transcribed into a World Model. We can observe connections in a way that may not have been clear before music and sound were digitized like this. The company with the most valuable potential in building a World Model is Tesla. Not that this type of visualization is being used, but the mechanisms and the technology are in place for the company to thrive in this new form of AI.

Brian Roemmele

57,424 views • 7 months ago

Qualia has been selected for the Google DeepMind Robotics Program. We train embodied models that put a robot on a real manual task and make it work, on the floor, not in a demo. Foundation models and reasoning are where robotics is heading, and doing that work alongside DeepMind, who are pushing this frontier, is exactly where we want to be. If you are a company looking to see how a new generation of robots can help with your manual tasks, contact us at [email protected]. More soon.

Qualia

86,918 views • 13 days ago

In flow matching, a coupling determines how noise and data samples are paired during training. The choice of coupling is important because it influences the geometry of trajectories at inference time. The simplest choice is the independent coupling, where noise and data points are paired arbitrarily. This can lead to curved trajectories, as the model averages over many conflicting pairings. However, if we use optimal transport on batches of pairs, there are fewer ambiguous intersections for the model to resolve, which leads to straighter trajectories at inference time.

Alec Helbling

65,060 views • 1 month ago
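The coupling argument above is easiest to check in one dimension, where minibatch optimal transport with squared-distance cost has a closed form: match sorted noise to sorted data. The minimal sketch below compares that with an independent (random) pairing by counting how many straight-line paths x_t = (1 - t) x0 + t x1 cross. The 1-D setting, the bimodal toy data, and all function names are assumptions for illustration.

```python
import random
from itertools import combinations

def pair_independent(noise, data, rng):
    # Independent coupling: pair each noise sample with a random data sample.
    shuffled = data[:]
    rng.shuffle(shuffled)
    return list(zip(noise, shuffled))

def pair_ot_1d(noise, data):
    # Minibatch OT coupling in 1-D: for squared-distance cost, matching sorted
    # noise to sorted data (the monotone rearrangement) is optimal.
    return list(zip(sorted(noise), sorted(data)))

def transport_cost(pairs):
    # Mean squared displacement between paired points.
    return sum((x1 - x0) ** 2 for x0, x1 in pairs) / len(pairs)

def crossings(pairs):
    # Two straight paths cross at some t in (0, 1) exactly when their
    # ordering at t = 0 and at t = 1 disagree.
    return sum(
        1 for (a0, a1), (b0, b1) in combinations(pairs, 2)
        if (a0 - b0) * (a1 - b1) < 0
    )

if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(64)]
    data = [rng.choice([-2.0, 2.0]) + 0.1 * rng.gauss(0.0, 1.0) for _ in range(64)]
    for name, pairs in [("independent", pair_independent(noise, data, rng)),
                        ("minibatch OT", pair_ot_1d(noise, data))]:
        print(f"{name}: cost={transport_cost(pairs):.3f}, crossings={crossings(pairs)}")
```

The OT pairing has zero crossings by construction, so a flow model trained on it has no conflicting pairings to average over. In higher dimensions there is no sorting shortcut, and the minibatch coupling must come from an assignment or approximate OT solver instead.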

MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers. Paper page: Recent advances in generative AI have significantly enhanced image and video editing, particularly in the context of text prompt control. State-of-the-art approaches predominantly rely on diffusion models to accomplish these tasks. However, the computational demands of diffusion-based methods are substantial, often necessitating large-scale paired datasets for training, which makes deployment in practical applications challenging. This study addresses this challenge by breaking down the text-based video editing process into two separate stages. In the first stage, we leverage an existing text-to-image diffusion model to simultaneously edit a few keyframes without additional fine-tuning. In the second stage, we introduce an efficient model called MaskINT, which is built on non-autoregressive masked generative transformers and specializes in frame interpolation between the keyframes, benefiting from structural guidance provided by intermediate frames. Our comprehensive set of experiments illustrates the efficacy and efficiency of MaskINT when compared to other diffusion-based methodologies. This research offers a practical solution for text-based video editing and showcases the potential of non-autoregressive masked generative transformers in this domain.

25,449 views • 2 years ago

There is only one reason to share truth. And no, it has nothing to do with money and everything to do with helping God move. Peace is the prize; a better world for everyone is the prize. We make SACRIFICES for the things we want most in this world. Christ made the ultimate sacrifice. If He can do that for us FREE of charge, we owe it to each other to work TOGETHER to make the world a better place, despite the toll and sacrifice. A level playing field, where we can all work hard and do our best to succeed: that is the payout. A better world for all. We're not here for just ourselves; otherwise the second commandment wouldn't be like the first: love your neighbor as yourself. This world gets better the sooner we all wake up. Information is free; we collect and share it for free in hopes that it keeps spreading. Only by working TOGETHER can we win. We are meant to move these messages for God so that humanity can wake up; this plan is designed that way, working together. The Lord is ONE. We are bound as a family by Our Father in Heaven. We are a part of Him and He is a part of all of us. We are billions of little pieces of the Lord's gift of life. We've been separated and pushed apart because it makes us weak. We were designed to be powerful by the love we draw from Creator and push through creation. We are ONE with Our Holy Father 🙏💗 Be EXCELLENT to each other frens

🐸🐸🐸 🇺🇸

23,726 views • 1 year ago

A Letter to Our Community: The Road Ahead for Robotics

To our Community and Partners,

As we step into 2026, our mission at Axis is clearer than ever: constructing the definitive End-to-End Scaling Layer for Robotics. Our goal is to accelerate the transfer of diverse human intelligence into Robotics General Intelligence (RGI). By owning the critical path of intelligence creation, we are turning the physical limitations of robotics into a scalable, software-driven future. Here is our strategic outlook and roadmap for the year ahead.

The Core Thesis: Simulation is the Only Way Out
The path to RGI is currently blocked by Data Scarcity, Generalization Fragility, and Hardware Fragmentation. At Axis, we believe Simulation is the only way out. Our Simulation Data Platform and Data Augmentation Engine transform raw data into "Synthetic Gold". Backed by academic milestones like Roboverse, Skill Blending, and GraspVLA, we have proven that pure simulation can achieve the generalization required for the real world. We don't just collect data; we architect it.

The Engine: Why Crypto?
We believe RGI should come from all, not a few. Crypto is not just a feature; it is the primitive that powers our entire ecosystem flywheel:
- Incentive Mechanism: Democratizing contribution and rewarding trainers and developers.
- Assetization: Turning proprietary data and refined models into liquid, ownable assets.
- Verifiable Workflow: We are opening the "Black Box" of AI. By bringing total transparency to the Task Generation → Data Collection → Model Training pipeline, we ensure every byte of intelligence is verifiable, traceable, and secure.

2026 Strategic Deliverables
This year, we are committed to delivering three foundational pillars:
- The World's Largest Training Dataset for Robots: diverse, high-quality interaction data at an unprecedented scale.
- A Robotics Foundation Model: A universal robotic brain trained on our pure simulation and synthetic data, capable of robust cross-embodiment transfer and open-world adaptability.
- Evolvable Robot Hardware: Robots deployed with Axis models that autonomously evolve through continuous interaction, turning every deployment into a self-improving node within our RGI network.

The Ultimate Vision
We are building more than models; we are architecting the Distributed Machine Economy: a future where every dataset, model, and robotic embodiment is a verifiable asset in a global, autonomous network. Thank you for building the future of intelligence with us ✌️📷

Axis Robotics

27,858 views • 5 months ago

Depth Any Video with Scalable Synthetic Data. AI physicists and chemists continue to make strides in depth estimation from video. Check out this new paper featuring some impressive examples. See the thread for more details (unfortunately no code yet). Abstract: Video depth estimation has long been hindered by the scarcity of consistent and scalable ground truth data, leading to inconsistent and unreliable results. In this paper, we introduce Depth Any Video, a model that tackles the challenge through two key innovations. First, we develop a scalable synthetic data pipeline, capturing real-time video depth data from diverse game environments, yielding 40,000 video clips of 5-second duration, each with precise depth annotations. Second, we leverage the powerful priors of generative video diffusion models to handle real-world videos effectively, integrating advanced techniques such as rotary position encoding and flow matching to further enhance flexibility and efficiency. Unlike previous models, which are limited to fixed-length video sequences, our approach introduces a novel mixed-duration training strategy that handles videos of varying lengths and performs robustly across different frame rates, even on single frames. At inference, we propose a depth interpolation method that enables our model to infer high-resolution video depth across sequences of up to 150 frames. Our model outperforms all previous generative depth models in terms of spatial accuracy and temporal consistency.

MrNeRF

27,428 views • 1 year ago

we are officially working on opentui - a library...

dax

286,895 views • 10 months ago

Current Vision-Language-Action (VLA) paradigms in autonomous driving primarily rely on Imitation Learning (IL), which introduces inherent challenges such as distribution shift and causal confusion. Online Reinforcement Learning offers a promising pathway to address these issues through trial-and-error learning. However, applying online reinforcement learning to VLA models in autonomous driving is hindered by inefficient exploration in continuous action spaces. MindDrive is a VLA framework comprising a large language model (LLM) with two distinct sets of LoRA parameters. With one set, the LLM serves as a Decision Expert for scenario reasoning and driving decision-making; with the other, it acts as an Action Expert that dynamically maps linguistic decisions into feasible trajectories. Paper Title: MindDrive: A Vision-Language-Action Model for Autonomous Driving via Project: Link:

AI Bites | YouTube Channel

43,451 views • 4 months ago
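The MindDrive post describes one LLM with two distinct sets of LoRA parameters but gives no layer-level details. As a minimal sketch of the generic LoRA mechanism only, here is a single linear layer with one frozen base weight W and several switchable low-rank adapters, computing y = W x + (alpha / r) B (A x). The class name, the adapter names "decision" and "action", and the toy matrices are all hypothetical, and plain Python lists stand in for tensors.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]

def matvec(m: Matrix, x: List[float]) -> List[float]:
    # Plain-Python matrix-vector product.
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]

class MultiAdapterLinear:
    """Frozen base weight W plus several low-rank adapters (A, B), one chosen per call.

    Computes y = W x + (alpha / r) * B (A x), the standard LoRA update, where r is
    the adapter rank. Only A and B would be trained; W stays frozen and shared.
    """

    def __init__(self, weight: Matrix, adapters: Dict[str, Tuple[Matrix, Matrix]], alpha: float = 1.0):
        self.weight = weight
        self.adapters = adapters  # name -> (A with shape r x d_in, B with shape d_out x r)
        self.alpha = alpha

    def __call__(self, x: List[float], adapter: str) -> List[float]:
        a_mat, b_mat = self.adapters[adapter]
        rank = len(a_mat)
        base = matvec(self.weight, x)
        delta = matvec(b_mat, matvec(a_mat, x))
        return [y + (self.alpha / rank) * d for y, d in zip(base, delta)]

if __name__ == "__main__":
    layer = MultiAdapterLinear(
        weight=[[1.0, 0.0], [0.0, 1.0]],  # toy frozen weight (identity)
        adapters={
            # Hypothetical rank-1 adapters for the two roles described in the post.
            "decision": ([[1.0, 0.0]], [[1.0], [0.0]]),
            "action": ([[0.0, 1.0]], [[0.0], [1.0]]),
        },
    )
    x = [1.0, 2.0]
    print(layer(x, "decision"))  # [2.0, 2.0]
    print(layer(x, "action"))    # [1.0, 4.0]
```

Sharing W means both roles reuse the same pretrained knowledge, while each adapter adds only r x (d_in + d_out) trainable numbers; switching roles is just a matter of selecting a different (A, B) pair.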

An update for everyone who is not in our Discord: obviously we are still early in development, and our team has been working hard to release updates for everyone to see. We have a peek at the terrain, which in this picture is based on the North Dakota drift prairie. Great care, time, and effort were put into the terrain to create smooth rolling hills; the road network in this region will be challenging, just as it is in real life. From Will. The current state of the TIV2 model, which will include a lot more in-depth details. From Jlkillen03. The stages of damage to a Blender-made house. From Kunahic. And how texturing will be used for assets like the interiors! From BSP

Severity

18,883 views • 3 months ago

We are already at war. Not with rifles or... tanks, but with replacement. This is conquest by other means, through the slow erasure of a people who no longer recognize they are being conquered. That is why I write—to remind my people that we are not living in peace, but in the midst of a war waged without banners. The invasion is not declared with armies but with flights and boats, birthrates and welfare rolls. It is demographic warfare, calculated, continuous, and increasingly irreversible. A people, and a civilization, does not need to be burned to the ground to fall. It only needs to be replaced. Throughout the Western world, we are witnessing not mere immigration but a deliberate population transformation, one that has been rationalized by moral cowardice and enforced by political elites who have long since abandoned the idea that their nations belong to their people. What you mock as conquest is already underway, and unlike the conquests of old, it comes with the full consent of those in power. But I do not write in surrender. I write as a warning, as an act of resistance. My writing is meant to exhort and to enliven, to reawaken what has been buried beneath shame and silence. It is a summons to remember, to reclaim, and to rebuild. We are in an existential struggle, not only for our land, but for our survival, and thus for the future itself. Those who sneer at the loss will one day find there is nothing left to sneer at. A people who forget that they exist will be replaced by those who do not. You may call this natural. So be it. Then let nature return, red in tooth and claw, and let the sons of Europe remember who they are.show more

Chad Crowley

37,093 views • 1 year ago
