Diffusion models are an amazing tool for cofolding: they allow us to predict a protein and the molecule bound to it at once. But they are not exactly fast and require a lot of denoising steps to get accurate predictions. So we distilled ours. Meet DeCAF-Pearl: the first flow map model for all-atom cofolding. Instead of inching along the denoising trajectory, a flow map learns to jump across it. DeCAF-Pearl runs structure generation ~5x faster than Pearl, our SOTA model, while still maintaining the performance of the teacher model. That speedup allows us to run larger experiments and generate more synthetic data to improve our models. Getting there meant reparameterizing into noise-level space to stabilize gradients, committing to clean-structure prediction to keep the rigid-alignment loss biomolecules need, and building DeCAF-Search, one steering algorithm for every compute budget. For more technical details, read our blog post: And the paper:

Sergey Edunov
36,985 views • 23 days ago
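The "jump instead of inch" idea in the DeCAF-Pearl post above can be illustrated on a toy problem. The sketch below is not DeCAF-Pearl or its training procedure: it assumes a one-dimensional ODE, dx/dt = -x, chosen only because its two-time flow map is known in closed form. The names `velocity`, `euler_sample`, and `flow_map` are hypothetical. A learned flow map would approximate `flow_map` with a network; here it is exact.

```python
import math


def velocity(x: float, t: float) -> float:
    """Toy velocity field dx/dt = -x (an assumption made for illustration)."""
    return -x


def euler_sample(x0: float, t0: float, t1: float, n_steps: int) -> float:
    """Inch along the trajectory: integrate the ODE with many small Euler steps."""
    x, t = x0, t0
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        x += dt * velocity(x, t)
        t += dt
    return x


def flow_map(x: float, s: float, t: float) -> float:
    """Jump across the trajectory: the exact map from time s to time t.

    For dx/dt = -x the solution is x(t) = x(s) * exp(-(t - s)).
    """
    return x * math.exp(-(t - s))


x0 = 2.0
many_steps = euler_sample(x0, 0.0, 1.0, n_steps=200)  # 200 velocity evaluations
one_jump = flow_map(x0, 0.0, 1.0)                      # 1 map evaluation
# Flow maps compose: jumping 0 -> 0.4 -> 1 lands where jumping 0 -> 1 does.
two_jumps = flow_map(flow_map(x0, 0.0, 0.4), 0.4, 1.0)

print(f"Euler, 200 steps: {many_steps:.4f}")
print(f"Flow map, 1 jump: {one_jump:.4f}")
print(f"Flow map, 2 jumps: {two_jumps:.4f}")
```

The composition property in the last step is what lets a flow map model trade steps for accuracy: the same map can be applied once for speed or a few times over shorter intervals.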
Announcing Neo-1: the world’s most advanced atomistic foundation model, unifying structure prediction and all-atom de novo generation for the first time - to decode and design the structure of life 🧵(1/10)

Proxima
331,637 views • 1 year ago
DeCAF won the #ICML Test of Time Award 2024! Big congrats to trevordarrell (my PhD advisor at MIT) and Jeff Donahue. 🎉 You may not have heard of DeCAF, but it is everywhere! DeCAF stands for Deep Convolutional Activation Features. Published ten years ago, the DeCAF paper is a groundbreaking work showing that the activations of the last few layers of a deep network contain useful features that can be "repurposed" for or "transferred to" many other tasks, not just the original task the network was trained for. I created this exercise to show where we can see DeCAF's influence in some of the most well-known architectures: AlexNet, ViT, U-Net, CLIP, and Latent Diffusion, to prove that DeCAF's "Test of Time Award" is well-deserved! Let's give a round of applause to DeCAF, the unsung hero of computer vision.

Tom Yeh
21,420 views • 1 year ago
We release Diamond Maps 💎, unlocking accurate and efficient guidance for diffusion models. Our experiments show that our methods scale incredibly well. Excited to see what people will build with this! Accurate guidance has been a notoriously hard problem, but in this work, we’re bringing TWO (!) solutions to the table. The recipe for success: 1️⃣ Speed: Use distilled models (flow maps, mean flows, consistency models). 2️⃣ Exploration: Inject stochasticity to properly explore your search space. Because this fundamentally improves anything using flow matching and diffusion, we see a lot of potential for applications across audio, robotics, molecules, and beyond. Paper: Code: Huge thanks to an amazing team: Douglas Chen, Luca Eyring, Ishin Shah, Giri Anantharaman, Yutong (Kelly) He, Zeynep Akata, Tommi Jaakkola, Nicholas Boffi, and Max Simchowitz. It was awesome bringing this to life together!

Peter Holderrieth
59,512 views • 2 months ago
Today, we are releasing Stable Video Diffusion, our first foundation model for generative AI video based on the image model, Stable Diffusion. As part of this research preview, the code, weights, and research paper are now available. Additionally, today you can sign up for our waitlist to access a new upcoming web experience featuring a Text-To-Video interface. To access the model & sign up for our waitlist, visit our website here:

Stability AI
1,024,438 views • 2 years ago
1/ We are so excited to unveil the Kite AI Ecosystem Map. Kite AI is only as strong as the ecosystem behind it, and ours has enabled us to quickly become the leading base layer for the agentic web. From Google, Shopify and PayPal to Coinbase 🛡️ and Chainlink, our ecosystem comprises 100+ powerhouses across Web2 and Web3 that are building the next generation of autonomous AI. Learn more about who’s building with us here:

KITE AI
61,990 views • 9 months ago
We are pleased to announce the availability of Stable Video 4D, our very first video-to-video generation model that allows users to upload a single video and receive dynamic novel-view videos of eight new angles, delivering a new level of versatility and creativity. In conjunction with this announcement, we are releasing a comprehensive technical report detailing the methodologies, challenges, and breakthroughs achieved during the development of this model. Learn more about this release and access the report here:

Stability AI
131,114 views • 1 year ago
All the big language models under one roof for the very first time 🤯 Compare the output of OpenAI ChatGPT, Anthropic's Claude and Cohere's language model in a single playground!! Check out this amazing tool to get the best of large language models 👇

Shubham Saboo
299,257 views • 3 years ago
ESMFold2 and the ESM-C family, now available for use! We’ve partnered with biohub (the ESM team’s new home) to provide day 1 access to their newly open-sourced series of models. The family of models shows best-in-class results for structure prediction, de novo design, and protein-language model tasks.

Deniz Kavi
15,633 views • 1 month ago
A Letter to Our Community: The Road Ahead for Robotics
To our Community and Partners,
As we step into 2026, our mission at Axis is clearer than ever: Constructing the definitive End-to-End Scaling Layer for Robotics. Our goal is to accelerate the transfer of diverse human intelligence into Robotics General Intelligence (RGI). By owning the critical path of intelligence creation, we are turning the physical limitations of robotics into a scalable, software-driven future. Here is our strategic outlook and roadmap for the year ahead.
The Core Thesis: Simulation is the Only Way Out
The path to RGI is currently blocked by Data Scarcity, Generalization Fragility, and Hardware Fragmentation. At Axis, we believe Simulation is the only way out. Our Simulation Data Platform and Data Augmentation Engine transform raw data into "Synthetic Gold". Backed by academic milestones like Roboverse, Skill Blending, and GraspVLA, we have proven that pure simulation can achieve the generalization required for the real world. We don’t just collect data; we architect it.
The Engine: Why Crypto?
We believe RGI should come from all, not a few. Crypto is not just a feature; it is the primitive that powers our entire ecosystem flywheel:
- Incentive Mechanism: Democratizing contribution and rewarding the trainers and developers.
- Assetization: Turning proprietary data and refined models into liquid, ownable assets.
- Verifiable Workflow: We are opening the "Black Box" of AI. By bringing total transparency to the Task Generation → Data Collection → Model Training pipeline, we ensure every byte of intelligence is verifiable, traceable, and secure.
2026 Strategic Deliverables
This year, we are committed to delivering three foundational pillars:
- The World's Largest Training Dataset for Robots: A robot training set of diverse, high-quality interaction data at an unprecedented scale.
- A Robotics Foundation Model: A universal robotic brain trained on our pure simulation and synthetic data, capable of robust cross-embodiment transfer and open-world adaptability.
- Evolvable Robot Hardware: Robots deployed with Axis models that autonomously evolve through continuous interaction, turning every deployment into a self-improving node within our RGI network.
The Ultimate Vision
We are building more than models; we are architecting the Distributed Machine Economy. A future where every dataset, model, and robotic embodiment is a verifiable asset in a global, autonomous network. Thank you for building the future of intelligence with us ✌️📷

Axis Robotics
27,858 views • 6 months ago
1/ Happy to share VADER: Video Diffusion Alignment via Reward Gradients. We adapt foundational video diffusion models using pre-trained reward models to generate high-quality, aligned videos for various end-applications. Below is a short movie we generated using VADER 😀; we used ChatGPT to write the script and an off-the-shelf AI music generator to generate the sound. Our code & weights are open-sourced:

Mihir Prabhudesai
13,368 views • 1 year ago
🚨New paper! Generative models are often “miscalibrated”. We calibrate diffusion models, LLMs, and more to meet desired distributional properties. E.g. we finetune protein models to better match the diversity of natural proteins.

Brian L Trippe
20,536 views • 8 months ago
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
TL;DR: Create 3/4DGS from Video Diffusion
Note: Some first inference code released (not all yet).
Contributions (cited):
• We present DimensionX, a novel framework for generating photorealistic 3D and 4D scenes from only a single image using controllable video diffusion.
• We propose ST-Director, which decouples the spatial and temporal priors in video diffusion models by learning (spatial and temporal) dimension-aware modules with our curated datasets. We further enhance the hybrid-dimension control with a training-free composition approach according to the essence of video diffusion denoising process.
• To bridge the gap between video diffusion and real-world scenes, we design a trajectory-aware mechanism for 3D generation and an identity-preserving denoising approach for 4D generation, enabling more realistic and controllable scene synthesis.
• Extensive experiments manifest that our DimensionX delivers superior performance in video, 3D, and 4D generation compared with baseline methods.

MrNeRF
17,039 views • 1 year ago
We are excited to launch our two models Pharia-1-LLM-7B-control and Pharia-1-LLM-7B-control-aligned. Both models and the code used to train them are now publicly available and open-sourced for non-commercial research and educational use. Read our model blog post here: Learn more about our open-source codebase Scaling: #writtenbyalephalpha

Aleph Alpha
44,326 views • 1 year ago
Happy to announce that the man, the myth, the legendary former RB of the Pittsburgh Steelers Le'Veon Bell will be on stream for Rainmaker soon! More details to come. We are building the best sniper for AI sports prediction markets and what better way to introduce it to the world than to have one of the NFL's best running backs join us!

Rainmaker
14,977 views • 8 months ago
It’s more than a little daunting to set out to expand and improve the identity system for a company and brand like Stripe. But we knew we had to: the existing one had served us well, but wasn’t up to the task anymore. Our brand system required new and improved tools to scale with our ever-growing audiences, new products, global footprint, and more. This update introduces material improvements to infographics, advertising, type styles, and more. While the wordmark remains unchanged, we’re using the dot of the ‘i’ (called the “tittle”), a parallelogram pointing up and to the right, to serve as our identifying symbol. We’re also using it as an ever-evolving storytelling device when talking about our many great users (you can see the latest brand campaign in SF and NYC doing just that). Anyone who has ever worked on the refresh and expansion of an existing system for a large company knows that it is no small endeavor. Crafting impactful solutions, building alignment, creating extensible guidelines, building toolkits, and orchestrating rollout require a ton of resilience. Here’s to the team that continually inspires me with their dedication, rigor, taste, and exceptional vibes. Great work and thank you to the Brand Studio folks, and of course our many, many amazing and invaluable friends and collaborators across the company who all helped shape the work. And a special thank you to a handful of creative agencies that helped us along the way.

Michael Jeter
11,072 views • 8 months ago
Depth Any Video with Scalable Synthetic Data
AI physicists and chemists continue to make strides in depth estimation from video. Check out this new paper featuring some impressive examples. See the thread for more details (unfortunately no code yet).
Abstract: Video depth estimation has long been hindered by the scarcity of consistent and scalable ground truth data, leading to inconsistent and unreliable results. In this paper, we introduce Depth Any Video, a model that tackles the challenge through two key innovations. First, we develop a scalable synthetic data pipeline, capturing real-time video depth data from diverse game environments, yielding 40,000 video clips of 5-second duration, each with precise depth annotations. Second, we leverage the powerful priors of generative video diffusion models to handle real-world videos effectively, integrating advanced techniques such as rotary position encoding and flow matching to further enhance flexibility and efficiency. Unlike previous models, which are limited to fixed-length video sequences, our approach introduces a novel mixed-duration training strategy that handles videos of varying lengths and performs robustly across different frame rates, even on single frames. At inference, we propose a depth interpolation method that enables our model to infer high-resolution video depth across sequences of up to 150 frames. Our model outperforms all previous generative depth models in terms of spatial accuracy and temporal consistency.

MrNeRF
27,428 views • 1 year ago
we sped up distributed inference by up to 5x with decentralized speculative decoding. many don't realize that AI models normally generate text one word at a time, waiting for the network after every word. speculative decoding changes this by using a "guess & confirm" system, similar to autocomplete. how it's done:
1. draft locally (the guess): instead of waiting for the network, a tiny, fast model on your device guesses the next few words instantly.
2. confirm remotely (the check): the massive remote model doesn't generate from scratch; it just checks the draft. it looks at the guesses in a batch and says "yes, yes, no." you get multiple words in the time it usually takes to get one.
3. adaptive logic: dsd is smart. if the topic is creative, it lets the draft flow loose. if the topic is math or code, it checks more strictly. it balances speed and precision automatically so your inference feels almost instant.
find out more: paper: blog:

Parallax
45,425 views • 5 months ago
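The guess-and-confirm loop described in the speculative decoding post above can be sketched in a few lines. This is a minimal greedy variant under stated assumptions, not the dsd implementation: `target_model` and `draft_model` are hypothetical toy functions standing in for the remote and local models, the "batched" check is simulated with a loop, and the adaptive strictness from step 3 is left out.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]  # maps a context to its greedy next token


def target_model(ctx: List[Token]) -> Token:
    """Hypothetical stand-in for the large remote model."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_model(ctx: List[Token]) -> Token:
    """Hypothetical small local model: agrees with the target most of the time.

    It deliberately guesses wrong whenever the context length is a multiple of 4.
    """
    guess = target_model(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 50


def greedy_decode(prompt: List[Token], n_new: int, target: Model) -> List[Token]:
    """Baseline: one target call (one network round trip) per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out


def speculative_decode(
    prompt: List[Token], n_new: int, k: int, draft: Model, target: Model
) -> Tuple[List[Token], int]:
    """Draft k tokens locally, then let the target confirm them in one pass."""
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < n_new:
        # 1. Draft locally: guess the next k tokens without the target.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. Confirm remotely: a real system scores all positions in one batch.
        target_passes += 1
        accepted: List[Token] = []
        for i in range(k):
            expected = target(out + proposal[:i])
            if proposal[i] == expected:
                accepted.append(proposal[i])   # "yes"
            else:
                accepted.append(expected)      # "no": keep the target's token
                break
        else:
            # Every guess was right, so the same pass yields one extra token.
            accepted.append(target(out + proposal))
        out.extend(accepted)
    return out[: len(prompt) + n_new], target_passes


prompt = [1, 2, 3]
baseline = greedy_decode(prompt, 12, target_model)
fast, passes = speculative_decode(prompt, 12, 4, draft_model, target_model)
print(fast == baseline, passes)
```

In this greedy variant the output matches plain decoding token for token; the saving comes from needing fewer round trips to the target than tokens generated.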
Chop the gradients ✂️! We found that truncating decoder gradients in latent video diffusion to a fixed window allows us to finetune on videos with pixel-wise perceptual losses without running out of memory. Pixel losses have been essential for image generation and reconstruction, but until now, they haven't scaled to long-duration, high-resolution video diffusion due to recursive activation accumulation in causal decoders, leading to OOM during training 💥📉. Project: Video diffusion models can do a lot more 🚀 when you can backprop through the decoder! Post-process neural rendered scenes, super-resolve videos, harmonize lighting in controlled synthetic driving scenes, and inpaint videos, all in a single step ⚡ with a quick finetune from a standard diffusion model.

Felix Heide
28,323 views • 2 months ago
Our first test flight is just the beginning! Behind the scenes, we are focused on scaling up and improving our technology. We are excited to announce that we have successfully tested the central subsystem of our Helix 2.0 oxygen-rich staged-combustion engine: the powerpack. We have performed two successful hot-fire tests in which we have shown steady-state operation and cavitation limits. The powerpack incorporates the turbopump and pre-burner(s). It is the most complex as well as the most mechanically and thermally stressed subsystem of a staged-combustion engine. This milestone validated our approach to key technological challenges, such as the simultaneous ignition of multiple pre-burners and turbopump cavitation performance. The results are in line with the predictions from our design models. The closed-cycle architecture of Helix allows us to push the performance envelope further: Helix 2.0 is designed to deliver double the thrust (200 kN), while mass, production technology and costs remain comparable to Helix 1.0. The result for our customers: more payload for a lower budget! Excited about this news? Check out our career portal for employment opportunities and help us to elevate our Helix staged-combustion engine technology to the next level! ➡️

Rocket Factory Augsburg
32,392 views • 1 month ago
We are trying to build #PrachyamTV into a hub of content for Hindu Kids. Our Itihaas, Values, Dharma - all taught in a slow, meaningful, non-ADHD style. We need more subscriptions to speed things up - and to validate the model. PrachyamTV Plans 👉

Prachyam
40,471 views • 2 years ago