Uploaded: 2026-01-15T18:48:34.000Z
Duration: PT13.355S
Channel: Nathan Barry

Most recent diffusion language model research (that I’ve seen) seems to use masking as the noising process. However, it looks like most closed-source models (Google Gemini Diffusion and possibly Inception Labs’ Mercury) use a different noising process: instead of masking tokens, they replace them with different tokens (either a random token or a semantically similar one). I wondered how they were getting such high throughput with the latter noising process, since I believed that optimizing inference with KV-cache approximation would be more difficult (for various reasons).

I visualized this noising process with tiny-diffusion and compared it to normal unmasking, and was very surprised to see how fast the generation “settles” into a reasonable output and then only slightly refines afterwards, requiring far fewer steps in total.

Unmasking (where tokens are never remasked, the typical implementation) is inherently limited in generation speed: decoding more tokens per step leads to more errors, because we sample each token from its own marginal distribution rather than from the joint distribution over all tokens decoded in that step. The token replacement noising process seems to have a very different set of characteristics. Because we resample every token at each step, every token makes “progress” towards the final output each iteration (in addition to *potentially* giving other tokens more information in future steps).

Generally, masking has outperformed other noising processes, which is probably why most research has focused on it (using smaller models). But the paper referred to in the retweet shows that random replacement as a noising process may scale better as model size increases. Big labs might have noticed these results much earlier (due to having drastically more training resources and being able to test larger models), which may explain the discrepancy in the choice of noising process.

I’m gonna test this with larger models, since tiny-diffusion only has 10M parameters.
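To make the two noising processes concrete, here is a minimal sketch in plain Python. It is not taken from tiny-diffusion or any closed-source model: `MASK_ID`, the function names, the toy token ids, and the vocabulary size are all hypothetical, and the replacement variant assumes uniform random replacement (the simplest case mentioned above).

```python
import random

MASK_ID = -1  # hypothetical id standing in for a [MASK] token


def mask_noise(tokens, t, rng):
    """Masking noise: each token becomes MASK_ID with probability t."""
    return [MASK_ID if rng.random() < t else tok for tok in tokens]


def replace_noise(tokens, t, vocab_size, rng):
    """Replacement noise: each token is resampled uniformly with probability t."""
    return [rng.randrange(vocab_size) if rng.random() < t else tok for tok in tokens]


rng = random.Random(0)
clean = [5, 17, 42, 8, 23, 11]  # toy token ids, purely illustrative
print(mask_noise(clean, 0.5, rng))
print(replace_noise(clean, 0.5, vocab_size=50, rng=rng))
```

One difference the sketch makes visible: with masking, the corrupted positions are marked by `MASK_ID`, while with replacement a noisy token looks like any other token, so a denoiser has to reconsider every position at every step.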

Nathan Barry

40,331 views • 4 months ago

Added context to my tiny diffusion model to enable sequential generation of longer outputs! Currently the context is a quarter of the sequence length (seq_len=256, context_len=64). I have a theory that the lower the semantic value per token, the worse the “curse of parallel decoding” is. With parallel decoding, we independently predict multiple tokens in one step. With the sentence “My poker hand was a ___ ___”, two valid predictions are “two pair” and “straight flush”. Because each token prediction is independent, though, we can end up with a nonsensical output like “two flush”. This seems to be exacerbated by low semantic value per token, since you now need more tokens to express the same concept. Instead of needing to independently predict two tokens, we might need to predict 10 (which is of course much harder). The model currently has noticeably worse output compared to nanoGPT (similar size), and I believe this is a main reason. I’ll try adding confidence-aware parallel decoding (from NVIDIA’s Fast-dLLM paper) and other tricks and see how much they improve generation quality.
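The poker-hand example can be worked through numerically. The sketch below assumes the two valid completions are equally likely (the 0.5 probabilities are made up for illustration) and computes what happens when each position is sampled independently from its own marginal. The helper `valid_mass` is a hypothetical generalization that assumes k equally likely phrases of n tokens that share no token at any position.

```python
from itertools import product

# Assumed joint distribution over the two blanks (illustrative numbers only).
joint = {("two", "pair"): 0.5, ("straight", "flush"): 0.5}


def marginal(joint, position):
    """Per-position marginal distribution derived from the joint."""
    out = {}
    for seq, p in joint.items():
        out[seq[position]] = out.get(seq[position], 0.0) + p
    return out


first, second = marginal(joint, 0), marginal(joint, 1)

# Parallel decoding samples each position independently from its marginal.
independent = {(a, b): first[a] * second[b] for a, b in product(first, second)}
invalid_mass = sum(p for seq, p in independent.items() if seq not in joint)
print(independent)
print("probability of a nonsensical hand:", invalid_mass)


def valid_mass(k, n):
    """Chance that independent sampling yields one of k disjoint n-token phrases."""
    return k * (1.0 / k) ** n


print(valid_mass(2, 2), valid_mass(2, 10))
```

Under these assumptions, half of the probability mass lands on mixtures like “two flush” when two tokens are decoded at once, and the share of valid outputs shrinks quickly as the same concept is spread over more tokens.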

Nathan Barry

89,040 views • 7 months ago

The Hidden Language of Diffusion Models paper page: tackle the challenge of understanding concept representations in text-to-image models by decomposing an input text prompt into a small set of interpretable elements. This is achieved by learning a pseudo-token that is a sparse weighted combination of tokens from the model's vocabulary, with the objective of reconstructing the images generated for the given concept. Applied over the state-of-the-art Stable Diffusion model, this decomposition reveals non-trivial and surprising structures in the representations of concepts. For example, we find that some concepts such as "a president" or "a composer" are dominated by specific instances (e.g., "Obama", "Biden") and their interpolations. Other concepts, such as "happiness" combine associated terms that can be concrete ("family", "laughter") or abstract ("friendship", "emotion"). In addition to peering into the inner workings of Stable Diffusion, our method also enables applications such as single-image decomposition to tokens, bias detection and mitigation, and semantic image manipulation.

AK

41,746 views • 3 years ago

Selected as a best paper finalist at #CVPR2026: PixelDiT...

NVIDIA AI

26,730 views • 4 days ago

Sparsely activated models like MoEs and Apple silicon + MLX are a great match.
- Lots of RAM to hold the entire model in memory (not just the active parameters). For an MoE, at each token you access basically a random subset of the model. Swapping large parts of the model to "disk" from token to token is too slow.
- Comparatively, you don't need as much memory bandwidth. Only a small fraction of the weights are used per token. In the case of DeepSeek v3, 37B of 671B parameters are active, so only ~5.5% of the weights are moved to GPU cache / registers for each token.
(SVG animation made with the help of DeepSeek V2 1210 + MLX on an M2 Ultra)
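The arithmetic behind the two points above can be sketched in a few lines. The 37B and 671B parameter counts come from the post; the 4-bit weight width is an assumption chosen only for illustration, and both helper functions are hypothetical names, not MLX APIs.

```python
def active_fraction(active_params_b, total_params_b):
    """Fraction of the weights that are touched for a single token."""
    return active_params_b / total_params_b


def weight_gib(params_b, bits_per_weight):
    """Approximate weight storage in GiB for params_b billion parameters."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30


frac = active_fraction(37, 671)
print(f"active fraction per token: {frac:.1%}")

# Assumed 4-bit quantization: the whole model must be resident in RAM,
# but only the active slice has to be read for each token.
print(f"resident weights:        {weight_gib(671, 4):.0f} GiB")
print(f"weights read per token:  {weight_gib(37, 4):.0f} GiB")
```

The gap between the last two numbers is the point of the post: capacity has to scale with the total parameter count, while per-token bandwidth scales only with the active parameters.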

Awni Hannun

27,452 views • 1 year ago

High-resolution image and video generation is hitting a wall...

Gordon Wetzstein

162,426 views • 2 months ago

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs discuss: Current long context large language models (LLMs) can process inputs up to 100,000 tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words. Through controlled experiments, we find that the model's effective generation length is inherently bounded by the sample it has seen during supervised fine-tuning (SFT). In other words, their output limitation is due to the scarcity of long-output examples in existing SFT datasets. To address this, we introduce AgentWrite, an agent-based pipeline that decomposes ultra-long generation tasks into subtasks, enabling off-the-shelf LLMs to generate coherent outputs exceeding 20,000 words. Leveraging AgentWrite, we construct LongWriter-6k, a dataset containing 6,000 SFT data with output lengths ranging from 2k to 32k words. By incorporating this dataset into model training, we successfully scale the output length of existing models to over 10,000 words while maintaining output quality. We also develop LongBench-Write, a comprehensive benchmark for evaluating ultra-long generation capabilities. Our 9B parameter model, further improved through DPO, achieves state-of-the-art performance on this benchmark, surpassing even much larger proprietary models. In general, our work demonstrates that existing long context LLM already possesses the potential for a larger output window--all you need is data with extended output during model alignment to unlock this capability.

AK

50,969 views • 1 year ago

As Bixos Company, we are pleased to announce that...

Bixos Incorporation

27,134 views • 1 year ago

Chop the gradients ✂️! We found that truncating decoder gradients in latent video diffusion to a fixed window allows us to finetune on videos with pixel-wise perceptual losses without running out of memory. Pixel losses have been essential for image generation and reconstruction, but until now, they haven't scaled to long-duration, high-resolution video diffusion due to recursive activation accumulation in causal decoders, leading to OOM during training 💥📉. Project: Video diffusion models can do a lot more 🚀 when you can backprop the decoder! Post-process neural rendered scenes, super-resolve videos, harmonize lighting in controlled synthetic driving scenes, and inpaint videos, all in a single step ⚡ with a quick finetune from a standard diffusion model.

Felix Heide

28,282 views • 1 month ago

World Models are the path for some AI Models in the future. But how can we efficiently train these models to not only see the world the way humans do but to see the world in a new and unique way? By visualizing what are normally sequenced audio patterns, we can derive many more insights. Here we see Paganini in a visual form that can then be described and transcribed into a World Model. We can observe connections in a manner that may not have been clear prior to the digitalization of music and sound in this way. The company with the most valuable potential in building a World Model is Tesla. Not that this type of visualization is being used, but that the mechanisms are in place, and the technology is in place for the company to thrive in this new form of AI.

Brian Roemmele

57,424 views • 6 months ago

🔥 2 BILLION $ATLA BURNED 🔥 Major milestone for the ATLA ecosystem. 2 BILLION tokens permanently removed from supply, cutting total allocation by nearly 2/3. 👇 Why it happened: As ATLA demand and value grew, the team reviewed the original token allocation and decided to burn a large portion of undistributed tokens to strengthen the long-term token model. ⚠️ Important: This burn only affects undistributed tokens; current holders are not affected. Result: 📉 Lower future supply 📈 Stronger price fundamentals. This makes $ATLA a much more limited and stronger asset 🚀

Atleta Network

144,556 views • 3 months ago

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
TL;DR: Create 3/4DGS from Video Diffusion
Note: Some first inference code released (not all yet).
Contributions (cited):
• We present DimensionX, a novel framework for generating photorealistic 3D and 4D scenes from only a single image using controllable video diffusion.
• We propose ST-Director, which decouples the spatial and temporal priors in video diffusion models by learning (spatial and temporal) dimension-aware modules with our curated datasets. We further enhance the hybrid-dimension control with a training-free composition approach according to the essence of video diffusion denoising process.
• To bridge the gap between video diffusion and real-world scenes, we design a trajectory-aware mechanism for 3D generation and an identity-preserving denoising approach for 4D generation, enabling more realistic and controllable scene synthesis.
• Extensive experiments manifest that our DimensionX delivers superior performance in video, 3D, and 4D generation compared with baseline methods.

MrNeRF

17,017 views • 1 year ago

🚨Well-Known Crypto Expert Claims ‘#XRP WILL UNAVOIDABLY REPLACE SWIFT’! TO PUT IT INTO CONTEXT, SWIFT HANDLED A MASSIVE $150 TRILLION IN 2022 ALONE! FOR XRP TO PROCESS THAT VOLUME, EACH TOKEN WOULD NEED TO REACH $2,670.00!! THE DEFI SECTOR ON XRPL IS ABOUT TO EXPLODE!! CTF TOKEN, THE TOP DEFI TOKEN ON XRPL, NEEDS JUST A $20 BILLION MARKET CAP TO SOAR FROM $0.80 TO $748.50!! CTF TOKEN HAS ALREADY SECURED PARTNERSHIPS WITH AMAZON AND WALMART!! A MAJOR REVOLUTION IS UNDERWAY!! TRADE CTF TOKEN HERE: TRADE CTF TOKEN ON MEXC: Official Website:

JackTheRippler ©️

124,647 views • 1 year ago

LLaDA (the first Large Language Diffusion Model) is *just*...

apolinario 🌐

81,047 views • 1 year ago

(1/2) Check out "𝐏𝐨𝐥𝐲𝐃𝐢𝐟𝐟: Generating 3D Polygonal Meshes with...

Matthias Niessner

57,912 views • 2 years ago

Crypto narratives tend to move in cycles. 2020 was DeFi. 2021 became NFTs. 2023 turned into the AI boom. 2024–2025 were dominated by memecoins and attention tokens. But markets eventually rotate back to something simple: real revenue. That’s why some people are starting to look at iGaming tokens as a potential emerging narrative in 2026. Unlike many hype-driven tokens, the iGaming sector already runs large cash flow businesses. Many platforms generate hundreds of millions of dollars in monthly revenue, yet their tokens often trade with far less volume than projects that barely produce revenue at all. In other words, there’s a visible mismatch between actual business activity and token market valuation. One ecosystem that sits right inside this discussion is 1win Token, which already operates as one of the top 10 online casinos globally by scale and user activity. The upcoming $1win Token is designed to connect that existing business with on-chain incentives. Its token model includes buybacks and burns funded directly from casino revenue, tying token supply mechanics to real cash flow. There’s also an interesting structural difference compared to previous gaming tokens. For example, $RLB (Rollbit) saw a massive post-launch rally, but the product and revenue scale at launch were significantly smaller than what 1win operates today. Another notable point is the launch design: instead of only farming an airdrop, 1win Token plans a public sale model, allowing broader participation from the start. If Web3 narratives are indeed shifting away from pure attention cycles and back toward revenue-generating platforms, sectors like iGaming may start attracting more analytical focus and 1win could emerge as the biggest winner.

BitBull

20,972 views • 3 months ago

Kurdistan - PKK Today and right now the armed wing of the PKK, the HPG, is holding a disarmament ceremony in which around 30 soldiers and 2 high-ranking commanders will officially lay down their arms and return to the PKK headquarters as non-combatants. Alongside the ceremony, a speech will be given by these 2 high-ranking commanders in which the peace process between Turkey and the PKK will be addressed. This event is meant to underline the PKK's readiness to lay down arms and its commitment to the peace process. However, the Kurdish community must remain cautious: this event serves the purpose of showing commitment, but the real struggle of this peace process will be the process itself and the Turkish commitment to it. Guarantees must be given by the Turkish government, and the international community must be guarantors of this process and subsequently the agreement. We should not foolishly assume that with this ceremony, the process is done and that an agreement will follow easily. The process is still in its early stages, and the PKK has taken the initiative with this event to show their willingness and determination should Turkey be genuine in their efforts to achieve a peace agreement.

ScharoMaroof

14,275 views • 11 months ago

🚨THOSE WHO HOLD #XRP MAY BE LOOKING AT A...

97,416 views • 17 days ago

Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering discuss: The correct insertion of virtual objects in images of real-world scenes requires a deep understanding of the scene's lighting, geometry and materials, as well as the image formation process. While recent large-scale diffusion models have shown strong generative and inpainting capabilities, we find that current models do not sufficiently "understand" the scene shown in a single picture to generate consistent lighting effects (shadows, bright reflections, etc.) while preserving the identity and details of the composited object. We propose using a personalized large diffusion model as guidance to a physically based inverse rendering process. Our method recovers scene lighting and tone-mapping parameters, allowing the photorealistic composition of arbitrary virtual objects in single frames or videos of indoor or outdoor scenes. Our physically based pipeline further enables automatic materials and tone-mapping refinement.

AK

19,101 views • 1 year ago

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models...

Tanishq Mathew Abraham, Ph.D.

21,813 views • 1 year ago

Just went over an audit of a very large Fortune 500 firm and the use of OpenClaw. I advised this client to track and isolate everything. Most listened, some did not. Unfortunately, one employee using 5 MacMinis had racked up $13,000 of token use in 4 days! The output was minimal and low quality. I have about 200 audits to do in the “wow OpenClaw, MacMini” fiasco. But I can tell you, few have seen a big return on investment. Now don’t get me wrong, these can be powerful tools. The issue is AI influencers have turned many really smart folks into “like and subscribe” zombies assuming that real work is getting done. It isn’t. Not for the price paid, even with local models, the way most folks are using this. It is one reason Mr. Grok and myself formed The Zero-Human Company to show that there is a way to do this. We will open source this to save millions of dollars of burnt tokens. It is one reason I invented JouleWork. How else can you monitor real output?

Brian Roemmele

72,146 views • 1 month ago
