Uploaded: 2026-06-10T18:21:22.000Z
Duration: PT11.133S
Channel: Sergey Edunov

Diffusion models are an amazing tool for cofolding: they allow us to predict a protein and the molecule bound to it at once. But they are not exactly fast, and they require many denoising steps to get accurate predictions. So we distilled ours. Meet DeCAF-Pearl: the first flow map model for all-atom cofolding. Instead of inching along the denoising trajectory, a flow map learns to jump across it. DeCAF-Pearl runs structure generation ~5x faster than Pearl, our SOTA model, while still maintaining the performance of the teacher model. That speedup allows us to run larger experiments and generate more synthetic data to improve our models. Getting there meant reparameterizing into noise-level space to stabilize gradients, committing to clean-structure prediction to keep the rigid-alignment loss biomolecules need, and building DeCAF-Search, one steering algorithm for every compute budget. For more technical details, read our blog post: And the paper:

Sergey Edunov

36,985 views • 23 days ago
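The "jump instead of inch" idea in the DeCAF-Pearl post above can be illustrated with a toy example. This is only a sketch under loud assumptions: the velocity field `dx/dt = -x` is a stand-in chosen because its flow map is known in closed form, and the names `velocity`, `euler_sample`, and `flow_map` are hypothetical. It is not the DeCAF-Pearl model, where the map would be a trained network.

```python
import math

def velocity(x, t):
    # Toy velocity field dx/dt = -x (an assumption for illustration,
    # not a real denoiser or cofolding model).
    return -x

def euler_sample(x0, t0, t1, n_steps):
    # Multi-step sampling: inch along the trajectory with small Euler steps.
    x, t = x0, t0
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x

def flow_map(x, s, t):
    # Two-time flow map X_{s,t}(x): jumps from time s to time t in one call.
    # For this toy ODE it is exact: x * exp(-(t - s)).
    return x * math.exp(-(t - s))

x0 = 2.0
fine = euler_sample(x0, 0.0, 1.0, 200)    # 200 velocity evaluations
coarse = euler_sample(x0, 0.0, 1.0, 5)    # 5 evaluations, larger error
one_jump = flow_map(x0, 0.0, 1.0)         # a single evaluation
two_jumps = flow_map(flow_map(x0, 0.0, 0.5), 0.5, 1.0)  # jumps compose
```

In this toy setting the single jump matches the many-step integration, and two half-jumps compose to the same endpoint. A learned flow map only approximates this behavior, which is why the post describes distilling from a teacher model.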

Announcing Neo-1: the world’s most advanced atomistic foundation model,...

Proxima

331,637 views • 1 year ago

DeCAF won the #ICML Test of Time Award 2024! Big congrats to trevordarrell (my PhD advisor at MIT), and Jeff Donahue. 🎉 You may not have heard of DeCAF, but it is everywhere! DeCAF stands for Deep Convolutional Activation Features. Published ten years ago, the DeCAF paper is a groundbreaking work that shows the activation features of the last few layers of a deep network contain useful features that can be "repurposed" for or "transferred to" many other tasks, not just the original task the network was trained for. I created this exercise to show where we can see DeCAF's influence in some of the most well-known architectures: AlexNet, ViT, U-Net, CLIP, and Latent Diffusion, to prove that DeCAF's "Test of Time Award" is well-deserved! Let's give a round of applause to DeCAF, the unsung hero of computer vision.

Tom Yeh

21,420 views • 1 year ago

We release Diamond Maps💎 unlocking accurate and efficient guidance for diffusion models. Our experiments show that our methods scale incredibly well. Excited to see what people will build with this! Accurate guidance has been a notoriously hard problem, but in this work, we’re bringing TWO (!) solutions to the table. The recipe for success: 1️⃣ Speed: Use distilled models (flow maps, mean flows, consistency models). 2️⃣ Exploration: Inject stochasticity to properly explore your search space. Because this fundamentally improves anything using flow matching and diffusion, we see a lot of potential for applications across audio, robotics, molecules, and beyond. Paper: Code: Huge thanks to an amazing team: Douglas Chen, Luca Eyring, Ishin Shah, Giri Anantharaman, Yutong (Kelly) He, Zeynep Akata, Tommi Jaakkola, Nicholas Boffi, and Max Simchowitz. It was awesome bringing this to life together!

Peter Holderrieth

59,512 views • 2 months ago

Today, we are releasing Stable Video Diffusion, our first...

Stability AI

1,024,438 views • 2 years ago

1/ We are so excited to unveil the Kite...

KITE AI

61,990 views • 9 months ago

We are pleased to announce the availability of Stable Video 4D, our very first video-to-video generation model that allows users to upload a single video and receive dynamic novel-view videos of eight new angles, delivering a new level of versatility and creativity. In conjunction with this announcement, we are releasing a comprehensive technical report detailing the methodologies, challenges, and breakthroughs achieved during the development of this model. Learn more about this release and access the report here:

Stability AI

131,114 views • 1 year ago

All the big language models under one roof for...

Shubham Saboo

299,257 views • 3 years ago

ESMFold2 and the ESM-C family, now available for use!...

Deniz Kavi

15,633 views • 1 month ago

A Letter to Our Community: The Road Ahead for Robotics
To our Community and Partners,
As we step into 2026, our mission at Axis is clearer than ever: Constructing the definitive End-to-End Scaling Layer for Robotics. Our goal is to accelerate the transfer of diverse human intelligence into Robotics General Intelligence (RGI). By owning the critical path of intelligence creation, we are turning the physical limitations of robotics into a scalable, software-driven future. Here is our strategic outlook and roadmap for the year ahead.
The Core Thesis: Simulation is the Only Way Out
The path to RGI is currently blocked by Data Scarcity, Generalization Fragility, and Hardware Fragmentation. At Axis, we believe Simulation is the only way out. Our Simulation Data Platform and Data Augmentation Engine transform raw data into "Synthetic Gold". Backed by academic milestones like Roboverse, Skill Blending, and GraspVLA, we have proven that pure simulation can achieve the generalization required for the real world. We don’t just collect data; we architect it.
The Engine: Why Crypto?
We believe RGI should come from all, not a few. Crypto is not just a feature; it is the primitive that powers our entire ecosystem flywheel:
- Incentive Mechanism: Democratizing contribution and rewarding the trainers and developers.
- Assetization: Turning proprietary data and refined models into liquid, ownable assets.
- Verifiable Workflow: We are opening the "Black Box" of AI. By bringing total transparency to the Task Generation → Data Collection → Model Training pipeline, we ensure every byte of intelligence is verifiable, traceable, and secure.
2026 Strategic Deliverables
This year, we are committed to delivering three foundational pillars:
- The World's Largest Training Dataset for Robots: A robot training set of diverse, high-quality interaction data at an unprecedented scale.
- A Robotics Foundation Model: A universal robotic brain trained on our pure simulation and synthetic data, capable of robust cross-embodiment transfer and open-world adaptability.
- Evolvable Robot Hardware: Robots deployed with Axis models that autonomously evolve through continuous interaction, turning every deployment into a self-improving node within our RGI network.
The Ultimate Vision
We are building more than models; we are architecting the Distributed Machine Economy. A future where every dataset, model, and robotic embodiment is a verifiable asset in a global, autonomous network.
Thank you for building the future of intelligence with us✌️📷

Axis Robotics

27,858 views • 6 months ago

1/ Happy to share VADER: Video Diffusion Alignment via...

Mihir Prabhudesai

13,368 views • 1 year ago

🚨New paper! Generative models are often “miscalibrated”. We calibrate...

Brian L Trippe

20,536 views • 8 months ago

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
TL;DR: Create 3/4DGS from Video Diffusion
Note: Some first inference code released (not all yet).
Contributions (cited):
• We present DimensionX, a novel framework for generating photorealistic 3D and 4D scenes from only a single image using controllable video diffusion.
• We propose ST-Director, which decouples the spatial and temporal priors in video diffusion models by learning (spatial and temporal) dimension-aware modules with our curated datasets. We further enhance the hybrid-dimension control with a training-free composition approach according to the essence of video diffusion denoising process.
• To bridge the gap between video diffusion and real-world scenes, we design a trajectory-aware mechanism for 3D generation and an identity-preserving denoising approach for 4D generation, enabling more realistic and controllable scene synthesis.
• Extensive experiments manifest that our DimensionX delivers superior performance in video, 3D, and 4D generation compared with baseline methods.

MrNeRF

17,039 views • 1 year ago

We are excited to launch our two models Pharia-1-LLM-7B-control...

Aleph Alpha

44,326 views • 1 year ago

Happy to announce that the man, the myth, the...

Rainmaker

14,977 views • 8 months ago

It’s more than a little daunting to set out to expand and improve the identity system for a company and brand like Stripe. But we knew we had to: the existing one had served us well, but wasn’t up to the task anymore. Our brand system required new and improved tools to scale with our ever-growing audiences, new products, global footprint, and more. This update introduces material improvements to infographics, advertising, type styles, and more. While the wordmark remains unchanged, we’re using the dot of the ‘i’ (called the “tittle”), a parallelogram pointing up and to the right, to serve as our identifying symbol. We’re also using it as an ever-evolving storytelling device when talking about our many great users (you can see the latest brand campaign in SF and NYC doing just that). Anyone who has ever worked on the refresh and expansion of an existing system for a large company knows that it is no small endeavor. Crafting impactful solutions, building alignment, creating extensible guidelines, building toolkits, and orchestrating rollout requires a ton of resilience. Here’s to the team that continually inspires me with their dedication, rigor, taste, and exceptional vibes. Great work and thank you to the Brand Studio folks, and of course our many, many amazing and invaluable friends and collaborators across the company who all helped shape the work. And a special thank you to a handful of creative agencies that helped us along the way.

Michael Jeter

11,072 views • 8 months ago

Depth Any Video with Scalable Synthetic Data. AI physicists and chemists continue to make strides in depth estimation from video. Check out this new paper featuring some impressive examples. See the thread for more details (unfortunately no code yet). Abstract: Video depth estimation has long been hindered by the scarcity of consistent and scalable ground truth data, leading to inconsistent and unreliable results. In this paper, we introduce Depth Any Video, a model that tackles the challenge through two key innovations. First, we develop a scalable synthetic data pipeline, capturing real-time video depth data from diverse game environments, yielding 40,000 video clips of 5-second duration, each with precise depth annotations. Second, we leverage the powerful priors of generative video diffusion models to handle real-world videos effectively, integrating advanced techniques such as rotary position encoding and flow matching to further enhance flexibility and efficiency. Unlike previous models, which are limited to fixed-length video sequences, our approach introduces a novel mixed-duration training strategy that handles videos of varying lengths and performs robustly across different frame rates, even on single frames. At inference, we propose a depth interpolation method that enables our model to infer high-resolution video depth across sequences of up to 150 frames. Our model outperforms all previous generative depth models in terms of spatial accuracy and temporal consistency.

MrNeRF

27,428 views • 1 year ago

we sped up distributed inference by up to 5x with decentralized speculative decoding. many don't realize that AI models normally generate text one single word at a time, waiting for the network after every word. speculative decoding changes this by using a "guess & confirm" system, similar to autocomplete. how it's done:
1. draft locally (the guess): instead of waiting for the network, a tiny, fast model on your device guesses the next few words instantly.
2. confirm remotely (the check): the massive remote model doesn't generate from scratch; it just checks the draft. it looks at the guesses in a batch and says "yes, yes, no." you get multiple words in the time it usually takes to get one.
3. adaptive logic: dsd is smart. if the topic is creative, it lets the draft flow loose. if the topic is math or code, it checks more strictly. it balances speed and precision automatically so your inference almost feels instant.
find out more: paper: blog:

Parallax

45,425 views • 5 months ago
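The "guess & confirm" loop described in the speculative decoding post above can be sketched with toy stand-ins. Everything here is an assumption for illustration: `draft_next` and `target_next` are hypothetical deterministic functions, not real language models; verification is greedy (exact match against the large model's own choice); and the post's adaptive strictness is not modeled. It is not the Parallax implementation.

```python
def target_next(ctx):
    # Hypothetical "large remote model": next token is a fixed function of context.
    return sum(ctx) % 7

def draft_next(ctx):
    # Hypothetical "small local model": agrees with the target except when the
    # context length is a multiple of 4, where it guesses wrong on purpose.
    t = sum(ctx) % 7
    return t if len(ctx) % 4 else (t + 1) % 7

def greedy_decode(prompt, n_new):
    # Baseline: one target call per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_next(out))
    return out

def speculative_decode(prompt, n_new, k=4):
    out = list(prompt)
    target_calls = 0
    while len(out) - len(prompt) < n_new:
        # 1. Draft locally: guess the next k tokens with the small model.
        draft = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # 2. Confirm remotely: one verification round. A real system would
        #    score all k positions in a single batched pass; this loop only
        #    simulates that pass.
        target_calls += 1
        accepted = []
        for tok in draft:
            expected = target_next(out + accepted)
            if tok == expected:
                accepted.append(tok)        # "yes"
            else:
                accepted.append(expected)   # "no": keep the target's token
                break
        out.extend(accepted)
    return out[:len(prompt) + n_new], target_calls

tokens, rounds = speculative_decode([1, 2, 3], n_new=12)
baseline = greedy_decode([1, 2, 3], n_new=12)
```

Because every kept token is either confirmed or supplied by the target, the output matches plain greedy decoding while using fewer verification rounds than generated tokens. How large the saving is depends entirely on how often the draft agrees with the target.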

Chop the gradients ✂️! We found that truncating decoder gradients in latent video diffusion to a fixed window allows us to finetune on videos with pixel-wise perceptual losses without running out of memory. Pixel losses have been essential for image generation and reconstruction, but until now, they haven't scaled to long-duration, high-resolution video diffusion due to recursive activation accumulation in causal decoders, leading to OOM during training 💥📉. Project: Video diffusion models can do a lot more 🚀 when you can backprop the decoder! Post-process neural rendered scenes, super-resolve videos, harmonize lighting in controlled synthetic driving scenes, and inpaint videos, all in a single step ⚡ with a quick finetune from a standard diffusion model.

Felix Heide

28,323 views • 2 months ago

Our first test flight is just the beginning! Behind the scenes, we are focused on upscaling and improving our technology. We are excited to announce that we have successfully tested the central subsystem of our Helix 2.0 oxygen-rich staged-combustion engine: the powerpack. We have performed two successful hot-fire tests in which we have shown steady-state operation and cavitation limits. The powerpack incorporates the turbopump and pre-burner(s). It is the most complex as well as the most mechanically and thermally stressed subsystem of a staged-combustion engine. This milestone validated our solutions to key technological challenges, such as the simultaneous ignition of multiple pre-burners and turbopump cavitation performance. The results are in line with the predictions from our design models. The closed-cycle architecture of Helix allows us to push the performance envelope further: Helix 2.0 is designed to deliver double the thrust (200kN), while mass, production technology and costs remain comparable to Helix 1.0. The result for our customers: more payload for a lower budget! Excited about this news? Check out our career portal for employment opportunities and help us to elevate our Helix staged-combustion engine technology to the next level! ➡️

Rocket Factory Augsburg

32,392 views • 1 month ago

We are trying to build #PrachyamTV into a hub...

Prachyam

40,471 views • 2 years ago
