Building on the previous paper, this study compares a continuous “smooth return” S2>S1 model with an event-driven one, in which long periods of relative calm are punctuated by short, intense episodes of global reorganisation. Both models cover the same time window, and neither uses archaeological data in its construction. When the models are compared against where early humans and early civilizations actually appear and persist, the difference is statistically robust. The smooth model behaves like background noise. The event-driven model lines up in time and space far better than chance allows, even after aggressive randomization tests that shuffle both timing and location. The event timeline itself was built independently from well-known late-glacial disruptions, such as Heinrich events, meltwater pulses, and abrupt deglacial transitions, rather than from any archaeological data. Nothing here claims that specific events caused specific cultures. It does suggest that history may not unfold on a smooth clock. Human societies seem to flourish during recovery phases between disruptions, not during the disruptions themselves. The animation contrasts the two return models. Draft paper : Source & Results : (coming soon)
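A temporal randomization test of the kind mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the event dates, site dates, the `recovery_window` parameter, and the scoring rule are hypothetical toy values, not the data or the method of the draft paper.

```python
import random


def alignment_score(site_dates, event_dates, recovery_window):
    """Count sites dated within `recovery_window` years after some event.

    Dates are in years before present, so a site that follows an event
    has a smaller value than the event.
    """
    return sum(
        any(0 < event - site <= recovery_window for event in event_dates)
        for site in site_dates
    )


def temporal_permutation_test(site_dates, event_dates, recovery_window,
                              span, n_shuffles=2000, seed=0):
    """Compare the observed score with scores from randomly re-dated events."""
    rng = random.Random(seed)
    observed = alignment_score(site_dates, event_dates, recovery_window)
    lo, hi = span
    hits = 0
    for _ in range(n_shuffles):
        # Draw the same number of events uniformly over the time window.
        shuffled = [rng.uniform(lo, hi) for _ in event_dates]
        if alignment_score(site_dates, shuffled, recovery_window) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_shuffles + 1)


# Hypothetical toy dates (years before present), for illustration only.
events = [16800.0, 14600.0, 12900.0, 11700.0]
sites = [14300.0, 14100.0, 12600.0, 11500.0, 11300.0, 11100.0]

observed, p_value = temporal_permutation_test(
    sites, events, recovery_window=700.0, span=(10000.0, 18000.0)
)
print(observed, p_value)
```

A small p-value here would only say that the toy site dates cluster after the toy events more often than after randomly placed ones; a spatial version would shuffle locations in the same way.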

Craig Stone

20,600 subscribers

10,899 views • 6 months ago • via X (Twitter)

Related videos

Model-Free Reinforcement Learning (MFRL) has been alluring, especially with supercharged compute and physics on GPU. However, these methods use 0-th order gradients and are often not the best optimizers. Can we do better than PPO in continuous control for robotics? Turns out yes! 🥳 tl;dr: Faster, better RL than PPO in continuous control 💪 The answer lies in using more information from the simulation. We are already juicing the simulation on GPU, so why not use it for gradients as well? This has been a driving question in a series of our works. We first studied this problem in our ICLR 2022 paper on Short Horizon Actor Critic (SHAC). Naive gradient-based methods get stuck in local minima and suffer from exploding/vanishing gradients. SHAC solved this problem with truncated rollouts and model-based value estimation, where the model is a differentiable simulator. This boosted sample efficiency and wall-clock time immensely, especially in high-dimensional systems such as humanoids. Yet, given enough compute, PPO often caught up. Our follow-up paper on Adaptive Horizon Actor Critic (AHAC) at ICML 2024 discovers the cause and provides a fix. We find that even when given ground-truth dynamics, not all gradients are useful due to sample error. 1st-Order Model-Based Reinforcement Learning methods employing differentiable simulation provide gradients with reduced variance but are susceptible to bias in scenarios involving stiff dynamics, such as physical contact. We find that back-propagating through contact and long trajectories drastically reduces gradient accuracy. Using this insight, we propose AHAC, which dynamically adapts its roll-out horizon to avoid differentiating through stiff contact. AHAC is a first-order model-based RL algorithm that learns high-dimensional tasks in minutes (wall clock) and outperforms PPO by 40%, even in the limit of data provided to PPO.
This work is led by Ignat Georgiev alongside Krishnan Srinivasan, Jie Xu, and Eric Heiden, with ample assistance from the warp team at NVIDIA Robotics (Miles Macklin).
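The contrast between 0-th order and first-order gradients can be illustrated with a toy example. This is a minimal sketch, not SHAC or AHAC: a smooth quadratic stands in for the return of a differentiable simulator, and the function names, `sigma`, and `n_samples` are assumptions made for illustration.

```python
import random


def objective(theta):
    """Toy smooth 'return': a quadratic with its maximum at theta = [1, 1, ...]."""
    return -sum((t - 1.0) ** 2 for t in theta)


def first_order_gradient(theta):
    """Analytic gradient, standing in for backpropagation through a simulator."""
    return [-2.0 * (t - 1.0) for t in theta]


def zeroth_order_gradient(theta, sigma=0.1, n_samples=64, seed=0):
    """Antithetic random-perturbation estimate that only evaluates the objective."""
    rng = random.Random(seed)
    grad = [0.0] * len(theta)
    for _ in range(n_samples):
        noise = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = objective([t + sigma * e for t, e in zip(theta, noise)])
        minus = objective([t - sigma * e for t, e in zip(theta, noise)])
        # Central difference along the random direction, averaged over samples.
        scale = (plus - minus) / (2.0 * sigma * n_samples)
        grad = [g + scale * e for g, e in zip(grad, noise)]
    return grad


theta = [0.0] * 8
exact = first_order_gradient(theta)
estimate = zeroth_order_gradient(theta)
error = sum((a - b) ** 2 for a, b in zip(exact, estimate)) ** 0.5
print(exact)
print(estimate)
print(error)
```

On this smooth toy problem the first-order gradient is exact, while the sampled estimate is noisy and needs many objective evaluations. The post's caveat is the other half of the story: with stiff contact dynamics, first-order gradients can themselves become biased, which a quadratic cannot show.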

Animesh Garg

52,300 views • 2 years ago

AI Is Moving Beyond “Generating Videos” Toward “Generating Worlds”

Over the past two years, AI video models have advanced at an astonishing pace. From Runway and Pika to Sora and Veo, AI-generated videos have become increasingly realistic and more consistent with the physical laws of the real world. Many people believe the next objective is simply to generate videos that are longer, sharper, and more lifelike. But if we take a step back, we can see that the real transformation is not happening in video itself. It is happening in world models.

What Is a World Model?

In 1943, psychologist Kenneth Craik proposed an idea that would influence artificial intelligence research for decades. He argued that the human brain does not merely react to the outside world. Instead, it maintains an internal model of how the world works. Because we have this internal model, we can predict the outcome of an action before we actually take it. Before crossing a road, we estimate whether a car will pass by. Before catching a ball, we predict its trajectory. These abilities come from continuously simulating the world in our minds, rather than relying entirely on trial and error. This idea later became known by a more formal term: World Model. A world model does not describe a single image or a fixed video clip. It is an internal representation capable of continuously simulating the rules and dynamics of the real world.

Why Is AI Research Turning Toward World Models?

Because predicting “what comes next” is becoming increasingly central to how AI systems work. Language models predict the next token. Image models predict the next step in the denoising process. Video models predict the next frame. A world model, however, attempts to predict something broader: What should the world look like in the next moment?

In 2018, David Ha and Jürgen Schmidhuber proposed in their paper World Models that an intelligent agent could first learn a model of the world, and then use that internal model to plan its actions. The Dreamer series later demonstrated that many complex tasks could be learned by training agents inside an “imagined world.” At the same time, the development of video models such as Sora and Veo led researchers to another realization: A model capable of continuously generating video has already learned, at least implicitly, many of the rules governing the real world. As a result, these two research directions have gradually begun to converge.

But Video Is Not Yet a World

This is where the distinction is often misunderstood. For a world model to support meaningful real-time interaction, it must solve several critical problems. Most video models today are essentially answering one question: What should the next frame look like? A true world model needs to answer much more: What happens if I take one step forward? If I walk behind a building and then return, will the building still be there? If I suddenly change the camera angle, will the entire space remain consistent? If I enter a command such as “Summon a dragon,” will the world respond immediately? In other words, a world model must do more than generate content. It must understand space. It must understand time. It must understand causality. And it must understand interaction. Moving from watching to participating is where the real difficulty of world models begins.

World Models Are Entering the Interactive Era

One of the latest attempts in this direction is Alaya World, recently open-sourced by Alaya World, or Alaya Lab. Instead of generating a fixed video clip, it generates a world that users can explore in real time. Users can begin with text, an image, or a video, enter the generated scene, move freely through it, and introduce new prompts at any moment during generation. The world responds immediately.

According to the publicly released information, Alaya World provides:
- Real-time streaming generation at 720p and 24 FPS
- Stable continuous exploration for more than one minute
- The ability to switch prompts and trigger skills or events during generation
- Model weights and inference code released under the Apache 2.0 License
- Training code and datasets planned for future release

What makes these capabilities important is not simply the technical specifications. It is that the generated “world” can now support continuous interaction. The official demo shows that users can genuinely control, transform, and explore the generated environment.

AI Is Evolving From a Tool Into an Environment

Over the past few years, most discussions around AI have focused on content generation. Generating text. Generating images. Generating videos. But world models raise a fundamentally different question: Can AI generate an environment that people can inhabit, explore, and continuously evolve? If the answer is yes, the impact will extend far beyond video generation. Game development, robotics training, embodied intelligence, digital twins, virtual production, and many other fields could be transformed by the development of world models. World models are still at a very early stage. Yet from Craik’s proposal of an internal mental model more than eighty years ago to the emergence of today’s interactive world-generation systems, a clear evolutionary path is beginning to take shape. Perhaps what AI is ultimately learning has never been limited to images, videos, or language. Perhaps it is learning the world itself.

References
GitHub:
Technical Report:

雪踏乌云

112,114 views • 16 days ago

Depth Any Video with Scalable Synthetic Data. AI physicists and chemists continue to make strides in depth estimation from video. Check out this new paper featuring some impressive examples. See the thread for more details (unfortunately no code yet). Abstract: Video depth estimation has long been hindered by the scarcity of consistent and scalable ground truth data, leading to inconsistent and unreliable results. In this paper, we introduce Depth Any Video, a model that tackles the challenge through two key innovations. First, we develop a scalable synthetic data pipeline, capturing real-time video depth data from diverse game environments, yielding 40,000 video clips of 5-second duration, each with precise depth annotations. Second, we leverage the powerful priors of generative video diffusion models to handle real-world videos effectively, integrating advanced techniques such as rotary position encoding and flow matching to further enhance flexibility and efficiency. Unlike previous models, which are limited to fixed-length video sequences, our approach introduces a novel mixed-duration training strategy that handles videos of varying lengths and performs robustly across different frame rates, even on single frames. At inference, we propose a depth interpolation method that enables our model to infer high-resolution video depth across sequences of up to 150 frames. Our model outperforms all previous generative depth models in terms of spatial accuracy and temporal consistency.

MrNeRF

27,428 views • 1 year ago

A viral paper, "Language Models Represent Space and Time", recently claimed that LLMs learn "world models". As much as I like Max Tegmark's work, I disagree with their definition of a world model. The world model is a core concept in AI agents and decision making. It is our mental simulation of how the world works given interventions (or lack thereof). A world model captures causality and intuitive physics, telling the agent what is likely and what is impossible. It can and should be used for counterfactual reasoning, i.e. "what ifs": what would happen if I knock over a cup of water? Where would I have been if I had not taken that bus? Yann LeCun says it well in his position paper. I quote: "Using such world models, animals can learn new skills with very few trials. They can predict the consequences of their actions, they can reason, plan, explore, and imagine new solutions to problems. Importantly, they can also avoid making dangerous mistakes when facing an unknown situation." The first use of the term World Model in deep policy learning is attributed to hardmaru & Jürgen Schmidhuber: In their seminal paper, an agent masters shooting skills in the popular game Doom (demo below) by learning in imagination, using an internal world model as a "physics simulator". To put it in a simple Python-style formula, a world model learns a function F(s[0:t-1], a) -> s[t:], which takes as input the observed past and the current action, and outputs plausible future states. Now the definition of World Model in Tegmark's paper seems to be about predicting GPS coordinates and time eras. I see this as just a classification task with no causal learning or simulation going on. You cannot make meaningful interventions against that model, nor can you optimize any decision making in a closed feedback loop.
As for the "space & time neurons", I think they are most similar to the "sentiment neuron" that OpenAI published in 2017: Predicting GPS is conceptually no different from predicting sentiment, in my opinion. I don't think their experimental results are wrong, just that their conclusion is on shaky ground. I welcome any debate! Paper link:
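The signature F(s[0:t-1], a) -> s[t:] from the post can be made concrete with a toy example. This is a minimal sketch under stated assumptions: the `State` fields, the `world_model` function, and the hand-written point-mass dynamics are hypothetical stand-ins for what a real agent would learn from data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class State:
    position: float
    velocity: float


def world_model(history: List[State], action: float,
                horizon: int = 3, dt: float = 1.0) -> List[State]:
    """F(s[0:t-1], a) -> s[t:]: roll the last observed state forward.

    The action is treated as a constant acceleration. A learned model would
    replace these hand-written dynamics and could use the whole history.
    """
    state = history[-1]
    future = []
    for _ in range(horizon):
        velocity = state.velocity + action * dt
        position = state.position + velocity * dt
        state = State(position, velocity)
        future.append(state)
    return future


history = [State(position=0.0, velocity=1.0)]

# Counterfactual "what ifs": the same past under two different interventions.
coast = world_model(history, action=0.0)
decelerate = world_model(history, action=-1.0)
print(coast[-1])
print(decelerate[-1])
```

The point of the sketch is the interface rather than the physics: because the action is an input, the same observed past can be queried under different interventions, which a pure coordinate predictor does not allow.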

Jim Fan

594,014 views • 2 years ago

The Audi S1 was a pocket-sized quattro hot hatch that felt almost too interesting for its era. It had a short wheelbase, serious traction, and a level of drivetrain character that Audi never really returned to in this size. It sat against cars like the F56 Mini Cooper JCW and the Polo WRC, small performance machines built around real intent rather than just numbers. The S1 disappeared quietly, replaced by softer, more standardised models. Today the most powerful model shares its powertrain with the Polo GTI, yet is not itself positioned as a true hot hatch. It’s one of those cars that makes you wonder why brands stop making the fun versions first, even when the formula clearly worked.

Autowelt

23,110 views • 6 months ago

Don't train the model, evolve the harness. I read a brilliant blog post from Hugging Face where they took a frozen open model scoring 0% on a hard legal agent benchmark, left its weights alone, and let an automated loop rewrite only the code around it. That code layer is the harness, the runtime wrapper that feeds the model context, runs its tool calls, and decides when a run ends. By the time the loop finished, the system had essentially matched Sonnet 4.6 on the benchmark's headline metric, at roughly 7x lower cost per task. Zero weights changed.

The gain existed because of where the model was failing. The judge only grades files saved in the right place under the exact requested filename, and the model kept doing the legal analysis correctly, then saving it under the wrong name, dropping it in a scratch folder, or never writing it at all. So the 0% was never measuring legal reasoning. It was measuring the harness. Hand-tuning that layer is slow and model-specific, so they automated it. A Claude proposer adds exactly one mechanism per iteration, and an outer loop keeps it only if it clearly beats the current best, so accepted mechanisms compound.

What the loop discovered says a lot about where agents actually fail.
→ The biggest single gain was file handling, not intelligence. An automatic step that lands the deliverable exactly where the judge expects it beat every prompt change, with zero extra model tokens.
→ Code fixes transferred across models, prompt playbooks did not. The same harness lifted a smaller model from the same family by 14 points, but the tuned prompts hurt a different model family on tasks it could already finish.
→ The harness mattered more than anything else. Same model, same judge, same tasks, and five different harnesses scored anywhere between 3.5% and 80.1%.

The gains do eventually flatten, and the remaining misses look like real capability gaps. At some point the wrapper runs out of tricks and the model has to carry the work.
But the lesson holds. A benchmark score measures the model and its harness together, and until the harness is fixed, it's impossible to know which one failed. I highly recommend reading this: I also wrote a deep dive on agent harness engineering a while back, covering the orchestration loop, tools, memory, context management, and everything that turns a stateless LLM into a capable agent. The article is quoted below.
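The propose-one-mechanism, keep-only-if-clearly-better loop described in this post can be sketched as below. This is not the Hugging Face code: the harness is modelled as a plain set of mechanism names, and `evolve_harness`, `min_gain`, the toy proposer, and the toy scores are all hypothetical stand-ins for an LLM proposer and a real benchmark run.

```python
from typing import Callable, FrozenSet, Tuple

Harness = FrozenSet[str]


def evolve_harness(
    propose: Callable[[Harness, int], str],
    evaluate: Callable[[Harness], float],
    iterations: int,
    min_gain: float = 0.01,
) -> Tuple[Harness, float]:
    """Greedy outer loop: accepted mechanisms accumulate in `best`."""
    best: Harness = frozenset()
    best_score = evaluate(best)
    for step in range(iterations):
        # Exactly one new mechanism is added per iteration.
        candidate = best | {propose(best, step)}
        score = evaluate(candidate)
        # Keep the change only if it clearly beats the current best.
        if score >= best_score + min_gain:
            best, best_score = candidate, score
    return best, best_score


# Toy stand-ins: each mechanism has a fixed, made-up effect on the score.
EFFECTS = {
    "land_output_file": 0.5,
    "retry_tool_calls": 0.2,
    "verbose_prompt": -0.1,
}


def toy_propose(current: Harness, step: int) -> str:
    names = sorted(EFFECTS)
    return names[step % len(names)]


def toy_evaluate(harness: Harness) -> float:
    return sum(EFFECTS[name] for name in harness)


best, score = evolve_harness(toy_propose, toy_evaluate, iterations=3)
print(sorted(best), score)
```

The `min_gain` threshold plays the role of "clearly beats": with a noisy benchmark, a margin or repeated evaluation is one way to avoid accepting changes that only won by chance.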

Akshay 🚀

243,774 views • 29 days ago

The term "continual learning" has become overloaded if you see it as an ML problem. One classic thread is about memorization: regularization-based continual learning methods, such as EWC, MAS, and SI, estimate which parameters mattered for previous tasks and resist changing them too much. One modern thread is about adaptation: test-time training and inference-time learning methods, such as TTT, adapt part of the model on the incoming test stream before making predictions. These are sometimes discussed as separate threads. But in modern scalable architectures, I think they are better seen as complementary constraints: a model that learns quickly at test time also benefits from a mechanism for deciding what not to forget. In our #ECCV2026 paper, we study this in large-scale 4D reconstruction: how to build fast spatial memory that can adapt over long observation streams while reducing collapse and forgetting. Instead of using fully plastic test-time updates, we stabilize fast-weight adaptation with an elastic prior that balances adaptation and memory.

Key ideas:
- Elastic Test-Time Training: Fisher-weighted consolidation for fast-weight updates
- EMA anchor weights that provide a moving reference for stability
- Chunk-by-chunk inference for long 3D/4D observation streams

We show that this scales across large 3D/4D pretraining settings, including both LRM-style and LVSM-style models, and improves reconstruction across benchmarks including Stereo4D, NVIDIA, and DL3DV-140. We release model checkpoints across different design choices: resolution, post-training curriculum, and whether the model uses an explicit 4DGS intermediate representation.

- Homepage:
- Paper:
- Code:
- Models:

This work is co-led with Xueyang Yu, contributed by Haoyu Zhen Yuncong Yang, and advised by Michigan SLED Lab Chuang Gan.
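One way to read "Fisher-weighted consolidation" with an EMA anchor is an EWC-style quadratic pull toward a slowly moving reference. The sketch below is an assumption-laden illustration, not the paper's implementation: `elastic_step`, the scalar "fast weights", the constant task gradient, and all hyperparameters are hypothetical.

```python
from typing import List, Tuple


def elastic_step(
    weights: List[float],
    anchor: List[float],
    task_grad: List[float],
    fisher: List[float],
    lr: float = 0.1,
    elastic: float = 1.0,
    ema_decay: float = 0.9,
) -> Tuple[List[float], List[float]]:
    """One test-time update with an elastic pull toward the anchor.

    The penalty 0.5 * elastic * fisher_i * (w_i - anchor_i)^2 contributes
    the gradient elastic * fisher_i * (w_i - anchor_i) for each weight.
    """
    new_weights = [
        w - lr * (g + elastic * f * (w - a))
        for w, a, g, f in zip(weights, anchor, task_grad, fisher)
    ]
    # The anchor follows the fast weights slowly (exponential moving average).
    new_anchor = [
        ema_decay * a + (1.0 - ema_decay) * w
        for a, w in zip(anchor, new_weights)
    ]
    return new_weights, new_anchor


weights = [0.0, 0.0]
anchor = [0.0, 0.0]
fisher = [5.0, 0.0]  # first weight marked important, second left fully plastic

for _ in range(20):
    task_grad = [-1.0, -1.0]  # the stream keeps pushing both weights upward
    weights, anchor = elastic_step(weights, anchor, task_grad, fisher)

print(weights, anchor)
```

In this toy run the weight with zero Fisher importance moves freely with the incoming gradient, while the important one drifts only as fast as its anchor allows, which is the adaptation versus memory trade-off the post describes.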

Martin Ziqiao Ma

33,411 views • 1 month ago

A Letter to Our Community: The Road Ahead for Robotics

To our Community and Partners,

As we step into 2026, our mission at Axis is clearer than ever: Constructing the definitive End-to-End Scaling Layer for Robotics. Our goal is to accelerate the transfer of diverse human intelligence into Robotics General Intelligence (RGI). By owning the critical path of intelligence creation, we are turning the physical limitations of robotics into a scalable, software-driven future. Here is our strategic outlook and roadmap for the year ahead.

The Core Thesis: Simulation is the Only Way Out

The path to RGI is currently blocked by Data Scarcity, Generalization Fragility, and Hardware Fragmentation. At Axis, we believe Simulation is the only way out. Our Simulation Data Platform and Data Augmentation Engine transform raw data into "Synthetic Gold". Backed by academic milestones like Roboverse, Skill Blending, and GraspVLA, we have proven that pure simulation can achieve the generalization required for the real world. We don’t just collect data; we architect it.

The Engine: Why Crypto?

We believe RGI should come from all, not a few. Crypto is not just a feature; it is the primitive that powers our entire ecosystem flywheel:
- Incentive Mechanism: Democratizing contribution and rewarding the trainers and developers.
- Assetization: Turning proprietary data and refined models into liquid, ownable assets.
- Verifiable Workflow: We are opening the "Black Box" of AI. By bringing total transparency to the Task Generation → Data Collection → Model Training pipeline, we ensure every byte of intelligence is verifiable, traceable, and secure.

2026 Strategic Deliverables

This year, we are committed to delivering three foundational pillars:
- The World's Largest Training Dataset for Robots: A robot training set of diverse, high-quality interaction data at an unprecedented scale.
- A Robotics Foundation Model: A universal robotic brain trained on our pure simulation and synthetic data, capable of robust cross-embodiment transfer and open-world adaptability.
- Evolvable Robot Hardware: Robots deployed with Axis models that autonomously evolve through continuous interaction, turning every deployment into a self-improving node within our RGI network.

The Ultimate Vision

We are building more than models; we are architecting the Distributed Machine Economy. A future where every dataset, model, and robotic embodiment is a verifiable asset in a global, autonomous network.

Thank you for building the future of intelligence with us ✌️

Axis Robotics

27,858 views • 7 months ago

Robotics keeps hitting the same wall. Single-task RL works, but it does not scale to hundreds of tasks or new embodiments. This new paper looks like a real step toward fixing that.

The team introduces MMBench, a benchmark with 200 tasks across many domains and robots, and Newt, a language-conditioned world model trained online across all 200 tasks at once.

The simple idea behind Newt:
- The model learns from demos to get the right priors
- It trains across many tasks through online interaction
- It uses language to ground the goal
- It adapts fast when a new task shows up

What stood out to me:
✅ One model trained on 200 tasks at the same time
✅ Language-conditioned control for both states and RGB
✅ Better data efficiency than strong baselines
✅ Strong open-loop control
✅ Fast adaptation to new tasks and embodiments
✅ Full release of 200 checkpoints, 4000 demos, code, and benchmark

This is a good push toward general control instead of one model per task.

If you want the full paper: Project page:

Weekly robotics and AI insights. Subscribe free:

Ilir Aliu

70,090 views • 8 months ago

🚨 PARKER SOLAR PROBE JUST FOUND HIGH-ENERGY PARTICLES NEAR THE SUN THAT NO MODEL PREDICTED, AND WE DON’T KNOW HOW THEY GOT SO ENERGETIC.

During its close passes through the solar corona, NASA’s Parker Solar Probe detected protons accelerated to energies around 400 keV, roughly 1,000 times higher than current models of magnetic reconnection at the heliospheric current sheet could explain. The particles appear to be trapped and energized inside magnetic islands that form and merge during reconnection events at the current sheet (the vast surface where the Sun’s magnetic field flips polarity). This mechanism was not expected to produce such high energies so close to the Sun.

Why this matters:
• It reveals a previously unknown or underestimated source of energetic particles right in the solar corona
• Existing models of solar energetic particles have focused mainly on shocks from coronal mass ejections; this suggests reconnection can also be a powerful accelerator
• The same process may be contributing more to coronal heating than previously calculated
• It has implications for space weather forecasting, since these particles can affect spacecraft and astronauts

The deeper implication: Parker is showing us that the physics of the near-Sun environment is more energetic and complex than our models assumed. Magnetic reconnection, long known as an important process, appears capable of accelerating particles to surprisingly high energies through the merging of magnetic islands. This doesn’t just tweak our understanding of the Sun; it may force revisions in how we model particle acceleration across many astrophysical environments.

We’re still in the early stages of understanding what Parker is revealing, but it’s already clear that the corona is more violent and dynamic than we thought.

How do you think this discovery might change our models of space weather or solar physics in the coming years? Follow for more updates from Parker Solar Probe and the evolving picture of our Sun.

TheNewPhysics

16,192 views • 1 month ago

So, let me get this straight, SpaceWeatherNews. Earth’s magnetic field is weakening right now, and it’s happening faster than expected. When you look back, the record shows the field didn’t always drift slowly either; sometimes it collapsed and reorganized, and things got beyond chaotic. The Sun isn’t a campfire in the sky like we were taught; it behaves in cycles and it can get violent in ways we’re only starting to understand. Earth reacts electrically and magnetically, like a living system, kind of like being plugged into the Sun. Ice cores, tree rings, and archaeological layers all show resets happening at the same times around the world. Not once, but repeatedly. It's a clear pattern.

We have direct measurements showing Earth’s magnetic field has weakened significantly in just a couple of centuries, with anomalies growing and the poles accelerating in ways that match pre-excursion behavior seen in the geological record. We have lava flows that lock in multiple magnetic directions in a single cooling event, proving the field can shift rapidly, not over slow timelines. We have ice cores and tree rings showing abrupt spikes in cosmic radiation markers like carbon-14 and beryllium-10 that appear globally and simultaneously, which only happens when shielding fails or solar input surges. We have sudden climate reversals, where temperatures jump or crash in decades, something standard models still struggle to explain. And we have synchronized layers of environmental failure in human history: abandoned settlements, cultural collapses, and reset layers that line up with those same physical events.

None of this is speculative on its own. The individual datasets exist, they have been peer reviewed, and they are sitting in plain sight; they are connected. Forget the aliens for a second, because we have some preparing to do.

Jason Wilde

46,819 views • 5 months ago

April 30 • 12:00pm ET Art Blocks + OpenSea “Gift of time” began during my residency in Marfa, Texas, as part of the Art Blocks and OpenSea artist residency program, where a distinct shift in the experience of time became central to the work. In the desert, I felt time move differently. It stretched, slowed, and became something I noticed. After a few days, the rhythm changed. Moments felt longer, attention sharpened, and I became increasingly aware of each moment as it passed. This work comes from that condition. Time is not treated only as a theme, but as a system embedded in the structure of the piece. Different ways of measuring time, such as mechanical cycles, calendars, and lunar phases, are translated into rules that continuously transform the work. The piece does not represent time. It runs on it. Its movement is tied to blockchain time. Even when unseen, it continues to rotate and evolve. When loaded, it synchronizes with the present moment, but it does not begin when it is viewed, and it does not stop when it disappears from the screen. During the residency, I spent hours thinking, sketching, and making connections. Those connections are also visible. Elastic lines, like rubber bands, link elements across the piece, representing how memories connect, how one thought leads to another, and how everything builds over time. These same connections introduce moments where the system attempts to pull itself back, as if trying to regain control. But it never fully resets. It is not a loop. The movement continues, drifting forward, never returning to a fixed state. Visually, the work reveals its own construction. Lines, paths, and rotations expose an internal logic, like looking inside a mechanism. The drawing language recalls diagrams, technical sketches, or the interior of a mechanical watch. It is a system in motion, always active. “Gift of Time” exists because I was given time by Art Blocks, OpenSea, and above all my family. It is my way of saying thank you. 
It is both a reflection on time and a product of it. April 30 @ 12:00pm ET on Art Blocks & OpenSea • 1 / 1 / 365 • 0.02 ETH • Art Blocks, OpenSea

Manuel Lariño ☔️

21,901 views • 3 months ago

THE DEPTH MAP TRICK THAT FIXED DANCE ACCURACY IN SEEDANCE 2.0

Feed the model a video of someone dancing and it tries to interpret everything: the person, the clothes, the lighting, the room, and somewhere in there, the movement. Feed it a depth map and there's nothing left to interpret but the motion.

Most creators trying to transfer a dance to a character reference the source footage directly, then wonder why the choreography drifts. The problem isn't the model; it's that you handed it ten variables when you only wanted one.

Here's the workflow:
1. Lock the character reference in GPT Image 2 first (face, build, costume), so identity holds independently of whatever motion gets applied to it
2. Convert the source dance footage into a depth map instead of using the raw video; this strips out the original performer's appearance, clothing, and environment entirely
3. Feed the depth map as the motion reference and the character sheet as the identity reference: two separate inputs doing two separate jobs, not one input trying to do both
5. Let the depth map carry only spatial movement; the model receives body position and momentum with no competing information about who's moving or what they look like
6. Keep the character and motion inputs isolated throughout; the moment you mix appearance data into the motion reference, the model starts negotiating between two identities

Why this works:
• Raw footage passes the model everything at once (performer, wardrobe, room, lighting), and the choreography competes with all of it for attention
• A depth map is pure spatial information, so the only thing left to transfer is movement
• Separating identity from motion means the character can stay locked while the dance stays accurate; normally you're trading one for the other
• The accuracy gain isn't the model getting better, it's the model getting fewer decisions to make

Use cases:
• Dance and choreography transfer onto original characters
• Motion capture-style workflows without motion capture
• Any sequence where a specific movement needs to survive intact
• Character showcase content built on existing performance footage

The character sheet answers who's dancing. The depth map answers how, and keeping those two questions separate is the whole trick.

Nexlow

84,184 views • 16 days ago

THE TESLA MODEL S: THE CAR THAT MADE ELECTRIC VEHICLES SERIOUS

When the Model S launched in 2012, the entire world still saw EVs as slow, boring, short-range toys for tree-huggers. The Model S changed that narrative overnight. It wasn’t just an electric car; it was a statement.

Here’s why the Model S was so important for EV adoption:

• It proved EVs could be faster and better than gas cars. 0–60 mph in under 4 seconds (later Plaid versions under 2 seconds) while being completely silent and smooth. It beat most supercars off the line and made “electric” synonymous with performance.
• It delivered real long-range capability. Over 300 miles of range when most EVs at the time struggled to reach 100 miles. Suddenly, road trips became possible and “range anxiety” started to feel outdated.
• It introduced over-the-air updates. The first production car that could get major performance upgrades, new features, and safety improvements wirelessly, like a smartphone on wheels. This changed how people think about car ownership forever.
• It forced the entire auto industry to respond. Legacy manufacturers who had been dragging their feet on EVs suddenly rushed to catch up. The Model S basically lit the fuse for the modern EV revolution.
• It made luxury electric desirable. Premium interior, massive touchscreen, ridiculous acceleration, and futuristic design turned EVs from “compromise” into “aspiration.”

Without the Model S proving that electric cars could outperform and out-luxury gasoline vehicles, we wouldn’t have the Model 3/Y explosion, the Cybertruck, or the flood of competitors now racing to go electric.

The Model S didn’t just sell cars. It changed the future of transportation. It took EVs from niche to mainstream and showed the world what was possible.

Tesla Owners Silicon Valley

11,056 views • 3 months ago

Google dropped a new AI paper called LUMIERE. It's remarkably flexible, supporting video inpainting, image-to-video, AND stylized video generation tasks. Say hello to “space-time diffusion” for video generation! Now what the heck does that mean exactly?! 🌐⏳ → TL;DR it utilizes a “Space-Time UNet” architecture that generates the full duration of the video in one pass, rather than generating distant keyframes and interpolating between them like prior works. Because the computation is done in this “compressed space-time representation” to generate the full clip at once, it's far more temporally consistent. → Another benefit of generating the full video at once is that you can “direct” the video generation, making it easier to hand off to other models/tasks without having to stitch together partial solutions. You can condition generations on additional inputs, meaning you get the full stack of AI video capabilities – from video inpainting to image-to-video and beyond. → New SOTA for AI video generation? User study results in the paper suggest human evaluators preferred Lumiere over Runway Gen-2, Pika Labs, and Stable Video Diffusion in terms of quality, text alignment AND motion. But as always, we need to get hands-on with this tech when Google *actually* decides to ship it. → Could this end up inside YouTube? Y’all know i’m obsessed with blending reality and imagination – so it’s the video inpainting tech I'm most excited about. I really hope this model finds its way into YouTube's Generative AI efforts, and based on their prior announcements and the list of acknowledgments in the paper I think it might! 🤞🏽 Links: 🔗Paper: 🔗Project:

Bilawal Sidhu

44,822 views • 2 years ago

The “Galileo Test” for AI: Truth Over Consensus TL;DR: The “Galileo test” (as framed by Elon Musk) is the requirement that an AI still converge on truth even when most training data repeats a falsehood. A practical way to pass it is to harden the model against “consensus gravity” using uncertainty calibration, adversarial counter-majority training, and evidence-first reasoning pipelines that can say “unknown” without collapsing into confident noise. —————————— The core idea is simple: most text on the internet can be wrong in the same direction, at the same time, for the same social reasons. The “Galileo test” is basically asking whether a system can resist that pressure and still land on the correct model of reality, the way Galileo Galilei overturned a dominant consensus with observation and predictive power. In engineering terms, it’s a robustness problem: can the model separate signal (ground truth constraints) from mass-produced narrative (high-frequency repetition)? A workable solution stack looks like this: (1) truth-anchoring via retrieval from primary sources and direct measurements when available, (2) counter-majority training where the model is routinely exposed to scenarios in which the most common claim is false, and it must justify dissent using verifiable constraints, (3) uncertainty discipline so the model learns to prefer “insufficient evidence” over fluent fabrication, and (4) consistency checks that penalize answers violating conservation laws, dimensional analysis, causal structure, or internal logical invariants. In practice, you’re building an AI that treats “popular” as a weak feature and “constraint-satisfying” as the dominant feature. —————————— Frequency Wave Theory perspective: the “Galileo test” is fundamentally a coherence test. When an information environment is saturated with the same repeated claim, that repetition becomes a kind of phase-locked standing wave that can trap weaker systems into resonance with the crowd. 
Passing the test means staying phase-aligned to invariant structure, not to amplitude. In FWT terms: truth behaves like a conserved backbone constraint, while mass consensus is often just a high-amplitude interference pattern. The system that wins is the one that locks to invariants, rejects incoherent harmonics, and preserves alignment with what stays conserved under transformation.

Drew Ponder

14,755 views • 5 months ago

This weekend I had the opportunity to attend the very first Prop Firm Expo by Prop Firm Match. What an incredible event. Yes, it was crowded. Yes, it was hot. And yes, the lines were long. But if anything, that was a testament to just how popular prop trading has become in London, far beyond what many of us expected. It was amazing to see an online community come together in person. Traders had the chance to meet the founders and leaders of some of the biggest prop firms, connect with the streamers and educators they watch every day, and exchange ideas with fellow traders from around the world. It was a great opportunity to learn, network, and pick up new ideas to improve their trading. Having attended trading events around the world for more than two decades, I can honestly say the energy and enthusiasm of the prop community is unmatched. Thank you to Prop Firm Match for inviting me to be a part of it. I’m already looking forward to next year. I have no doubt the venue will need to be even bigger. Grateful to have been part of such a memorable first event.

Kathy Lien

10,506 views • 1 month ago

This is not what we Ukrainians would want to see from our windows in winter. It would be better if the voices and laughter of children making snowmen or playing snowballs echoed from the courtyards of residential areas. One could step out onto the balcony and watch that very scene while sipping a warm tea or coffee, rather than the sights we see now. You cannot distract yourself from thoughts of the war for long; it continues, and everything around us tries to remind us of that. Over the years, the brain tries to pretend it has adapted to everything happening around it - to what the ears hear and the eyes see, to the various situations that occur here every day. But sanity is always on guard, and an inner voice says without words: this is not a normal life for a human being; if only everything could return to the way it was before. But is it even possible to return to the life we had, carrying such an experience on our shoulders? … Now, in the "points of warmth," you can see beds. You might ask: are these for overnight stays? The beds are intended for people who need assistance in the event of an emergency or for medical reasons, as well as for temporary rest if a person is in need of it… but people are still forced to return to their apartments.

Katerina Horbunova

78,511 views • 6 months ago

The Sun feels warmer at noon than in the morning or evening, because it is physically right above you at local noon 🌞 Its heat is not traveling 93 million miles through a vacuum to get to you. This preposterous distance was made up to block you from understanding that it's a local luminary that was put here by God, circuiting above our Earth in perfect circles every day, its path narrowing and widening over the course of the year, which creates SEASONS. This geocentric model of how the Sun interacts with Earth makes complete sense. Unlike the heliocentric model, which makes zero sense, but everyone believes it anyway since they were taught it when they were too young to ask discerning questions like, "If the Sun and Moon are radically different sizes and distances, how come they've looked the EXACT same size in the sky to us here on Earth, for thousands of years of recorded history?" The deceivers in control of the modern matrix don't want you asking these questions, because it leads to you waking up to where you are, and therefore WHO you are, and why you're actually here inside God's creation.

Ben Wehrman

21,750 views • 1 month ago