Video yükleniyor...

Video Yüklenemedi

Ana Sayfaya Dön

Building on the previous paper, in this study we compare a continuous “smooth return” S2>S1 model with an event-driven one, where long periods of relative calm are punctuated by short, intense episodes of global reorganisation. Both models cover the same time window. Neither uses archaeological data in its construction....

10,899 görüntüleme • 4 ay önce •via X (Twitter)

0 Yorum

Yorum bulunmuyor

Orijinal gönderinin yorumları burada görünecek

Benzer Videolar

Model-Free Reinforcement Learning (MFRL) has been alluring, especially with supercharged compute with physics on GPU. However, the methods use 0-th order gradients, and are often not the best optimizers. Can we do better than PPO in continuous control for robotics? Turns out yes! 🥳 tl;dr: Faster, better RL than PPO in continuous control 💪 The answer lies in using more information from the simulation. We are juicing the simulation on GPU as it is, why not use it for gradients as well? This has been a driving question in a series of our works. We first studied this problem in ICLR 2022 paper on Short Horizon Actor Critic Naive gradient based methods are stuck in local minima and have exploding/vanishing gradients. SHAC solved this problem truncated rollouts and model based value estimation, where the model is Differentiable Sim. This boosted sample efficiency and wall-clock time immensely especially in high dimensional systems such as humanoids Yet, given enough compute PPO often caught up. Our follow up paper on on Adaptive Horizon Actor Critic at ICML 2024 discovers the cause and provides a fix. However, we find that even when given ground-truth dynamics, not all gradients are useful due to sample error. 1st-Order Model-Based Reinforcement Learning methods employing differentiable simulation provide gradients with reduced variance but are susceptible to bias in scenarios involving stiff dynamics, such as physical contact. We find that back-propagating through contact and long trajectories drastically reduces gradient accuracy. Using this insight, we propose AHAC to dynamically adapt its roll-out horizon to avoid differentiating through stiff contact. AHAC is a first-order model-based RL algorithm that learns high-dimensional tasks in minutes (wall clock) and outperforms PPO by 40%, even in the limit of data provided to PPO. This work is led by Ignat Georgiev alongside Krishnan Srinivasan, Jie Xu, Eric Heiden and ample assistance from warp team at NVIDIA Robotics (Miles Macklin)

Animesh Garg

52,279 görüntüleme • 2 yıl önce

Depth Any Video with Scalable Synthetic Data AI physicists and chemists continue to make strides in depth estimation from video. Check out this new paper featuring some impressive examples. See the thread for more details (unfortunately no code yet). Abstract: Video depth estimation has long been hindered by the scarcity of consistent and scalable ground truth data, leading to inconsistent and unreliable results. In this paper, we introduce Depth Any Video, a model that tackles the challenge through two key innovations. First, we develop a scalable synthetic data pipeline, capturing real-time video depth data from diverse game environments, yielding 40,000 video clips of 5-second duration, each with precise depth annotations. Second, we leverage the powerful priors of generative video diffusion models to handle real-world videos effectively, integrating advanced techniques such as rotary position encoding and flow matching to further enhance flexibility and efficiency. Unlike previous models, which are limited to fixed-length video sequences, our approach introduces a novel mixed-duration training strategy that handles videos of varying lengths and performs robustly across different frame rates 0 - even on single frames. At inference, we propose a depth interpolation method that enables our model to infer high-resolution video depth across sequences of up to 150 frames. Our model outperforms all previous generative depth models in terms of spatial accuracy and temporal consistency.

MrNeRF

27,428 görüntüleme • 1 yıl önce

A viral paper "Language Model Represents Space and Time" recently claims that LLMs learn "world models". As much as I like Max Tegmark's works, I disagree with their definition of world model. World model is a core concept in AI agent and decision making. It is our mental simulation of how the world works given interventions (or lack thereof). A world model captures causality and intuitive physics, telling the agent what is likely and what is impossible. It can and should be used for counterfactual reasoning, i.e. "what ifs": what would happen if I knock over a cup of water? Where would I have been if I had not taken that bus? Yann LeCun Yann LeCun says it well in his position paper ( I quote: "Using such world models, animals can learn new skills with very few trials. They can predict the consequences of their actions, they can reason, plan, explore, and imagine new solutions to problems. Importantly, they can also avoid making dangerous mistakes when facing an unknown situation." The first use of the term World Model in deep policy learning is attributed to hardmaru & Jürgen Schmidhuber: In their seminal paper, an agent masters shooting skills in the popular game Doom (demo below) by learning in imagination, using an internal world model as a "physics simulator". To put in a simple Python math formula, world model learns a function F(s[0:t-1], a) -> s[t:], which takes as input the observed past and current action, and outputs plausible future states. Now the definition of World Model in Tegmark's paper seems to be about predicting GPS coordinates and time eras. I see this as just a classification task with no causal learning and simulation going on. You cannot make meaningful interventions against that model, nor can you optimize any decision making in a closed feedback loop. As for the "space & time neurons", I think they are most similar to the "sentiment neuron" that OpenAI published in 2017: Predicting GPS is conceptually no different from predicting sentiment in my opinion. I don't think their experimental results are wrong - just that their conclusion is on shaky grounds. I welcome any debate! Paper link:

Jim Fan

593,909 görüntüleme • 2 yıl önce