Most recent diffusion language model research (that I’ve seen) seems to use masking as the noising process. It looks like, however, most closed-source models (Google Gemini Diffusion and possibly Inception Labs’ Mercury) use a different noising process: instead of masking tokens, they replace them with different tokens (either a random token or a semantically similar one). I wondered how they were getting such high throughput with the latter noising process, since I believed that optimizing inference with KV cache approximation would be more difficult (for various reasons).

I visualized this noising process with tiny-diffusion and compared it to normal unmasking, and was very surprised to see how fast the generation “settles” into a reasonable output and then only slightly refines afterwards, requiring far fewer steps in total. Unmasking (where tokens are never remasked, the typical implementation) is inherently limited in generation speed: decoding more tokens per step leads to more errors, because we sample each token from its own marginal distribution rather than from the joint distribution over all the tokens decoded in that step. The token replacement noising process seems to have a very different set of characteristics. Because we resample every token at every step, every token makes “progress” towards the final output each iteration (in addition to *potentially* giving other tokens more information in future steps).

Generally, masking has outperformed other noising processes, which is probably why most research has focused on it (using smaller models). But the paper referred to in the retweet shows that random replacement as a noising process may scale better as model size increases. Big labs might have noticed these results much earlier (due to having drastically more training resources and being able to test larger models), which may explain the discrepancy in the choice of noising process.

I’m gonna test this with larger models, since tiny-diffusion only has 10M parameters.
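To make the two noising processes concrete, here is a minimal sketch of how each one might corrupt a token sequence at noise level `t`. It is not taken from tiny-diffusion or any of the models mentioned: the function names, the `[MASK]` string, and uniform random replacement are all assumptions for illustration (the post also mentions replacement with semantically similar tokens, which is not modeled here).

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol


def mask_noise(tokens, t, rng):
    """Masking: each token is independently replaced by MASK with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def replace_noise(tokens, t, vocab, rng):
    """Replacement: each token is independently swapped for a uniformly
    random vocabulary token with probability t."""
    return [rng.choice(vocab) if rng.random() < t else tok for tok in tokens]


rng = random.Random(0)
tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens)) + ["dog", "ran", "under"]

print(mask_noise(tokens, 0.5, rng))
print(replace_noise(tokens, 0.5, vocab, rng))
```

With masking, the corrupted positions are visible in the input, so a denoiser only has to fill in the `[MASK]` slots. With replacement, any position might be wrong, so the denoiser has to re-predict the whole sequence at every step.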

Nathan Barry
40,331 views • 4 months ago
Added context to my tiny diffusion model to enable sequential generation of longer outputs! Currently the context is a quarter of the sequence length (seq_len=256, context_len=64).

I have a theory that the lower the semantic value per token, the worse the “curse of parallel decoding” is. With parallel decoding, we independently predict multiple tokens in one step. With the sentence “My poker hand was a ___ ___”, two valid predictions are “two pair” and “straight flush”. Because each token prediction is independent, though, we can end up with a nonsensical output like “two flush”. This seems to be exacerbated by low semantic value per token, since you now need more tokens to express the same concept. Instead of needing to independently predict two tokens, we might need to predict 10 (which is of course much harder).

The model currently has noticeably worse output than nanoGPT (similar size), and I believe this is a main reason. I’ll try adding confidence-aware parallel decoding (from NVIDIA’s Fast-dLLM paper) and other tricks and see how much they improve generation quality.
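The poker example can be worked through numerically. The sketch below assumes a toy joint distribution in which “two pair” and “straight flush” are the only valid completions and are equally likely; the probabilities and the helper `coherent_prob` are illustrative assumptions, not measurements from the model.

```python
import itertools

# Hypothetical joint distribution over the two blanks: only these pairs are valid.
joint = {("two", "pair"): 0.5, ("straight", "flush"): 0.5}

# Per-position marginals implied by the joint.
first, second = {}, {}
for (a, b), p in joint.items():
    first[a] = first.get(a, 0.0) + p
    second[b] = second.get(b, 0.0) + p

# Probability of each pair when both positions are sampled independently.
independent = {
    (a, b): first[a] * second[b] for a, b in itertools.product(first, second)
}
invalid_mass = sum(p for pair, p in independent.items() if pair not in joint)
print(invalid_mass)  # 0.5: half the samples are "two flush" or "straight pair"


def coherent_prob(n_tokens, n_phrases=2):
    """Chance that independent per-position sampling reproduces one whole phrase,
    assuming n_phrases equally likely phrases that differ at every position."""
    return n_phrases * (1.0 / n_phrases) ** n_tokens


print(coherent_prob(2))   # 0.5
print(coherent_prob(10))  # 0.001953125
```

Under these assumptions, spreading the same two concepts over 10 tokens instead of 2 drops the chance of a coherent one-step sample from 1 in 2 to about 1 in 512, which matches the intuition in the post.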

Nathan Barry
89,040 views • 7 months ago
The Hidden Language of Diffusion Models paper page: tackle the challenge of understanding concept representations in text-to-image models by decomposing an input text prompt into a small set of interpretable elements. This is achieved by learning a pseudo-token that is a sparse weighted combination of tokens from the model's vocabulary, with the objective of reconstructing the images generated for the given concept. Applied over the state-of-the-art Stable Diffusion model, this decomposition reveals non-trivial and surprising structures in the representations of concepts. For example, we find that some concepts such as "a president" or "a composer" are dominated by specific instances (e.g., "Obama", "Biden") and their interpolations. Other concepts, such as "happiness" combine associated terms that can be concrete ("family", "laughter") or abstract ("friendship", "emotion"). In addition to peering into the inner workings of Stable Diffusion, our method also enables applications such as single-image decomposition to tokens, bias detection and mitigation, and semantic image manipulation.

AK
41,746 views • 3 years ago
Selected as a best paper finalist at #CVPR2026: PixelDiT from NVIDIA Research. In most image generation models, a pretrained autoencoder compresses the image before any diffusion happens, causing quality loss that accumulates across the entire pipeline. PixelDiT, or Pixel Diffusion Transformers, removes this step entirely. It's a single-stage model that learns the diffusion process directly in pixel space, end-to-end.

NVIDIA AI
26,730 views • 4 days ago
Sparsely activated models like MoEs and Apple silicon + MLX are a great match.
- Lots of RAM to hold the entire model in memory (not just the active parameters). For an MoE, at each token you access basically a random subset of the model. Swapping large parts of the model to "disk" from token to token is too slow.
- Comparatively, you don't need as much memory bandwidth. Only a small fraction of the weights are used per token. In the case of DeepSeek v3, 37B of 671B parameters are active, so only ~5% of the weights are moved to GPU cache / registers for each token.
(SVG animation made with the help of DeepSeek V2 1210 + MLX on an M2 Ultra)
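The arithmetic behind the "~5%" figure can be checked directly. In the sketch below, the parameter counts come from the post; the 4-bit weight width and the helper `weights_gb` are assumptions chosen only to show how the memory-capacity and per-token traffic numbers diverge.

```python
TOTAL_PARAMS = 671e9   # DeepSeek v3 total parameters (from the post)
ACTIVE_PARAMS = 37e9   # parameters active per token (from the post)

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"{active_fraction:.1%}")  # 5.5%


def weights_gb(num_params, bits_per_weight):
    """Size of num_params weights in gigabytes at the given precision."""
    return num_params * bits_per_weight / 8 / 1e9


BITS = 4  # assumed quantization width, not stated in the post
print(weights_gb(TOTAL_PARAMS, BITS))   # what must fit in RAM
print(weights_gb(ACTIVE_PARAMS, BITS))  # what is read per token
```

The first number sets the RAM requirement, while the second is what drives memory bandwidth per token, which is why a machine with lots of unified memory but moderate bandwidth can still be a good fit.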

Awni Hannun
27,452 views • 1 year ago
High-resolution image and video generation is hitting a wall because attention in DiTs scales quadratically with token count. But does every pixel need to be in full resolution? Introducing Foveated Diffusion: a new approach for efficient diffusion-based generation that allocates compute where it matters most. 1/7🧵
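A quick calculation shows what quadratic scaling means in practice. This sketch only illustrates the general scaling argument, not Foveated Diffusion itself; the patch size of 16 and the image sizes are assumptions.

```python
def num_tokens(height, width, patch):
    """Number of non-overlapping patch tokens for an image of the given size."""
    return (height // patch) * (width // patch)


def attention_pairs(n_tokens):
    """Query-key pairs scored by full self-attention over n_tokens tokens."""
    return n_tokens * n_tokens


PATCH = 16  # assumed patch size
low = num_tokens(512, 512, PATCH)
high = num_tokens(1024, 1024, PATCH)

print(low, high)                                     # 1024 4096
print(attention_pairs(high) // attention_pairs(low))  # 16
```

Doubling the resolution per side gives 4x the tokens but 16x the attention pairs, which is why spending fewer tokens on less important regions can save a disproportionate amount of compute.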

Gordon Wetzstein
162,426 views • 2 months ago
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs discuss: Current long context large language models (LLMs) can process inputs up to 100,000 tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words. Through controlled experiments, we find that the model's effective generation length is inherently bounded by the sample it has seen during supervised fine-tuning (SFT). In other words, their output limitation is due to the scarcity of long-output examples in existing SFT datasets. To address this, we introduce AgentWrite, an agent-based pipeline that decomposes ultra-long generation tasks into subtasks, enabling off-the-shelf LLMs to generate coherent outputs exceeding 20,000 words. Leveraging AgentWrite, we construct LongWriter-6k, a dataset containing 6,000 SFT data with output lengths ranging from 2k to 32k words. By incorporating this dataset into model training, we successfully scale the output length of existing models to over 10,000 words while maintaining output quality. We also develop LongBench-Write, a comprehensive benchmark for evaluating ultra-long generation capabilities. Our 9B parameter model, further improved through DPO, achieves state-of-the-art performance on this benchmark, surpassing even much larger proprietary models. In general, our work demonstrates that existing long context LLM already possesses the potential for a larger output window--all you need is data with extended output during model alignment to unlock this capability.

AK
50,969 views • 1 year ago
As Bixos Company, we are pleased to announce that we have successfully completed our tokenization process in Dubai. On July 20th, it will be possible to own real estate in Dubai with UBXS TOKEN via The house to be tokenized, With this important step, we are proud to bring the power of blockchain technology to the real estate sector. Let's step into the future together! 🚀🌆 #Tokenized #Rwa #RealEstate

Bixos Incorporation
27,134 views • 1 year ago
Chop the gradients ✂️! We found that truncating decoder gradients in latent video diffusion to a fixed window allows us to finetune on videos with pixel-wise perceptual losses without running out of memory. Pixel losses have been essential for image generation and reconstruction, but until now, they haven't scaled to long-duration, high-resolution video diffusion due to recursive activation accumulation in causal decoders, leading to OOM during training 💥📉. Project: Video diffusion models can do a lot more 🚀 when you can backprop the decoder! Post-process neural rendered scenes, super-resolve videos, harmonize lighting in controlled synthetic driving scenes, and inpaint videos, all in a single step ⚡ with a quick finetune from a standard diffusion model.

Felix Heide
28,282 views • 1 month ago
World Models are the path for some AI Models in the future. But how can we efficiently train these models to not only see the world the way humans do but to see the world in a new and unique way? By visualizing what are normally sequenced audio patterns, we can derive many more insights. Here we see Paganini in a visual form that can then be described and transcribed into a World Model. We can observe connections in a manner that may not have been clear prior to the digitalization of music and sound in this way. The company with the most valuable potential in building a World Model is Tesla. Not that this type of visualization is being used, but that the mechanisms are in place, and the technology is in place for the company to thrive in this new form of AI.

Brian Roemmele
57,424 views • 6 months ago
🔥 2 BILLION $ATLA BURNED 🔥 Major milestone for the ATLA ecosystem. 2 BILLION tokens permanently removed from supply, cutting total allocation by nearly 2/3. 👇 Why it happened: As ATLA demand and value grew, the team reviewed the original token allocation and decided to burn a large portion of undistributed tokens to strengthen the long-term token model. ⚠️ Important: This burn only affects undistributed tokens; current holders are not affected. Result: 📉 Lower future supply 📈 Stronger price fundamentals. This makes $ATLA a much more limited and stronger asset 🚀

Atleta Network
144,556 views • 3 months ago
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
TL;DR: Create 3/4DGS from Video Diffusion
Note: Some first inference code released (not all yet).
Contributions (cited):
• We present DimensionX, a novel framework for generating photorealistic 3D and 4D scenes from only a single image using controllable video diffusion.
• We propose ST-Director, which decouples the spatial and temporal priors in video diffusion models by learning (spatial and temporal) dimension-aware modules with our curated datasets. We further enhance the hybrid-dimension control with a training-free composition approach according to the essence of video diffusion denoising process.
• To bridge the gap between video diffusion and real-world scenes, we design a trajectory-aware mechanism for 3D generation and an identity-preserving denoising approach for 4D generation, enabling more realistic and controllable scene synthesis.
• Extensive experiments manifest that our DimensionX delivers superior performance in video, 3D, and 4D generation compared with baseline methods.

MrNeRF
17,017 views • 1 year ago
🚨Well-Known Crypto Expert Claims ‘#XRP WILL UNAVOIDABLY REPLACE SWIFT’! TO PUT IT INTO CONTEXT, SWIFT HANDLED A MASSIVE $150 TRILLION IN 2022 ALONE! FOR XRP TO PROCESS THAT VOLUME, EACH TOKEN WOULD NEED TO REACH $2,670.00!! THE DEFI SECTOR ON XRPL IS ABOUT TO EXPLODE!! CTF TOKEN, THE TOP DEFI TOKEN ON XRPL, NEEDS JUST A $20 BILLION MARKET CAP TO SOAR FROM $0.80 TO $748.50!! CTF TOKEN HAS ALREADY SECURED PARTNERSHIPS WITH AMAZON AND WALMART!! A MAJOR REVOLUTION IS UNDERWAY!! TRADE CTF TOKEN HERE: TRADE CTF TOKEN ON MEXC: Official Website:

JackTheRippler ©️
124,647 views • 1 year ago
LLaDA (the first Large Language Diffusion Model) is *just* out 💥 and I've built a demo, try it out now 👨‍💻 It's mesmerizing to watch the diffusion process 🌀, and it being a diffusion model gives you superpowers like "the 4th word has to be pineapple" 🦸 Demo and weights 👇

apolinario 🌐
81,047 views • 1 year ago
(1/2) Check out "𝐏𝐨𝐥𝐲𝐃𝐢𝐟𝐟: Generating 3D Polygonal Meshes with Diffusion Models"! Our model operates directly on the polygons of 3D meshes and generates novel shapes as output through an iterative diffusion process.

Matthias Niessner
57,912 views • 2 years ago
Crypto narratives tend to move in cycles. 2020 was DeFi. 2021 became NFTs. 2023 turned into the AI boom. 2024–2025 were dominated by memecoins and attention tokens. But markets eventually rotate back to something simple: real revenue. That’s why some people are starting to look at iGaming tokens as a potential emerging narrative in 2026.

Unlike many hype-driven tokens, the iGaming sector already runs large cash flow businesses. Many platforms generate hundreds of millions of dollars in monthly revenue, yet their tokens often trade with far less volume than projects that barely produce revenue at all. In other words, there’s a visible mismatch between actual business activity and token market valuation.

One ecosystem that sits right inside this discussion is 1win Token, which already operates as one of the top 10 online casinos globally by scale and user activity. The upcoming $1win Token is designed to connect that existing business with on-chain incentives. Its token model includes buybacks and burns funded directly from casino revenue, tying token supply mechanics to real cash flow.

There’s also an interesting structural difference compared to previous gaming tokens. For example, $RLB (Rollbit) saw a massive post-launch rally, but the product and revenue scale at launch were significantly smaller than what 1win operates today. Another notable point is the launch design: instead of only farming an airdrop, 1win Token plans a public sale model, allowing broader participation from the start.

If Web3 narratives are indeed shifting away from pure attention cycles and back toward revenue-generating platforms, sectors like iGaming may start attracting more analytical focus and 1win could emerge as the biggest winner.

BitBull
20,972 views • 3 months ago
Kurdistan - PKK: Today and right now the armed wing of the PKK, the HPG, is holding a disarmament ceremony in which around 30 soldiers and 2 high-ranking commanders will officially lay down their arms and return to the PKK headquarters as non-combatants. Alongside the ceremony, a speech will be given by these 2 high-ranking commanders in which the peace process between Turkey and the PKK will be addressed. This event is meant to underline the PKK's readiness to lay down arms and its commitment to the peace process.

However, the Kurdish community must remain cautious. This event serves the purpose of showing commitment; the real struggle of this peace process will be the process itself and the Turkish commitment to it. Guarantees must be given by the Turkish government, and the international community must be a guarantor of this process and subsequently of the agreement. We should not foolishly assume that with this ceremony the process is done and that an agreement will follow easily. The process is still in its early stages, and the PKK has taken the initiative with this event to show their willingness and determination, should Turkey be genuine in its efforts to achieve a peace agreement.

ScharoMaroof
14,275 views • 11 months ago
🚨THOSE WHO HOLD #XRP MAY BE LOOKING AT A VERY DIFFERENT FUTURE. FOR PERSPECTIVE, IN LEBANON IT TAKES AROUND 1 MILLION LEBANESE POUNDS TO BUY JUST 8 XRP. GOOGLE HAS INTEGRATED BANX MEDIA, POWERED BY $BXE ON THE XRP LEDGER! WITH ONLY 490 MILLION TOKENS IN TOTAL SUPPLY, $BXE HAS ONE OF THE LOWER SUPPLIES IN THE #XRPL ECOSYSTEM. A SOLANA-SIZED MARKET CAP WOULD PUT BXE ABOVE $150 PER TOKEN. BXE TOKEN TRADE LINK ON MEXC:

JackTheRippler ©️
97,416 views • 17 days ago
Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering discuss: The correct insertion of virtual objects in images of real-world scenes requires a deep understanding of the scene's lighting, geometry and materials, as well as the image formation process. While recent large-scale diffusion models have shown strong generative and inpainting capabilities, we find that current models do not sufficiently "understand" the scene shown in a single picture to generate consistent lighting effects (shadows, bright reflections, etc.) while preserving the identity and details of the composited object. We propose using a personalized large diffusion model as guidance to a physically based inverse rendering process. Our method recovers scene lighting and tone-mapping parameters, allowing the photorealistic composition of arbitrary virtual objects in single frames or videos of indoor or outdoor scenes. Our physically based pipeline further enables automatic materials and tone-mapping refinement.

AK
19,101 views • 1 year ago
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models "Block diffusion sequentially generates blocks of tokens by performing diffusion within each block and conditioning on previous blocks. By combining strength from autoregressive and diffusion models, block diffusion overcomes the limitations of both approaches by supporting variable-length, higher-quality generation and improving inference efficiency with KV caching and parallel sampling."
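The control flow described in the quote (autoregressive over blocks, iterative denoising within a block) can be sketched as a pair of nested loops. This is a toy illustration, not the paper's implementation: `toy_denoiser` is a hypothetical stand-in that proposes random characters, and the even unmasking schedule is an assumption.

```python
import random

MASK = "_"  # hypothetical mask symbol


def toy_denoiser(context, block, rng):
    """Stand-in for a real model: proposes a token for every masked slot.
    A real model would condition on `context`, ideally through a KV cache."""
    return [rng.choice("abc") if tok == MASK else tok for tok in block]


def generate(num_blocks, block_size, steps_per_block, rng):
    context = []  # finished blocks never change, so their keys/values could be cached
    for _ in range(num_blocks):  # outer loop: one block at a time, left to right
        block = [MASK] * block_size
        for step in range(steps_per_block):  # inner loop: denoise within the block
            proposal = toy_denoiser(context, block, rng)
            masked = [i for i, tok in enumerate(block) if tok == MASK]
            remaining_steps = steps_per_block - step
            k = -(-len(masked) // remaining_steps)  # ceil: even share per step
            for i in rng.sample(masked, k):  # several positions decoded in parallel
                block[i] = proposal[i]
        context.extend(block)
    return context


print("".join(generate(num_blocks=3, block_size=4, steps_per_block=2,
                       rng=random.Random(0))))
```

Because earlier blocks are frozen once written, the total length is not fixed in advance (just keep adding blocks), and the per-block work is the only part that needs repeated denoising passes.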

Tanishq Mathew Abraham, Ph.D.
21,813 views • 1 year ago
Just went over an audit of a very large Fortune 500 firm and the use of OpenClaw. I advised this client to track and isolate everything. Most listened; some did not. Unfortunately, one employee using 5 MacMinis had racked up $13,000 of token use in 4 days! The output was minimal and low quality. I have about 200 audits to do in the “wow OpenClaw, MacMini” fiasco. But I can tell you, few have seen a big return on investment.

Now don’t get me wrong, these can be powerful tools. The issue is AI influencers have turned many really smart folks into “like and subscribe” zombies assuming that real work is getting done. It isn’t. Not for the price paid, even with local models, the way most folks are using this. It is one reason Mr. Grok and myself formed The Zero-Human Company to show that there is a way to do this. We will open source this to save millions of dollars of burnt tokens. It is one reason I invented JouleWork. How else can you monitor real output?

Brian Roemmele
72,146 views • 1 month ago