Most recent diffusion language model research (that I’ve seen) seems to be using masking as the noising process. It looks like, however, most closed-source models (Google Gemini Diffusion and possibly Inception Labs’ Mercury) use a different noising process, where instead of masking tokens, they replace them with different tokens...
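To make the distinction concrete, here is a minimal sketch of the two forward noising processes on a toy token sequence. It illustrates only the general idea of masking versus token replacement. It is not the method used by Gemini Diffusion or Mercury, whose details are not public. The names `MASK`, `mask_noise`, `replace_noise`, the toy vocabulary, and the per-token corruption probability `t` are all assumptions made for illustration.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for this sketch


def mask_noise(tokens, t, rng):
    """Masking (absorbing-state) noise: each token is independently
    replaced by MASK with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def replace_noise(tokens, t, vocab, rng):
    """Replacement (uniform) noise: each token is independently swapped
    for a token drawn uniformly from vocab with probability t.
    The draw may happen to return the original token."""
    return [rng.choice(vocab) if rng.random() < t else tok for tok in tokens]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are repeatable
    vocab = ["the", "cat", "sat", "on", "a", "mat"]  # toy vocabulary
    sentence = ["the", "cat", "sat", "on", "the", "mat"]
    print("masked:  ", mask_noise(sentence, 0.5, rng))
    print("replaced:", replace_noise(sentence, 0.5, vocab, rng))
```

One practical difference the sketch exposes: with masking, a denoiser can see exactly which positions were corrupted, whereas with replacement every position looks like an ordinary token, so the model also has to infer which tokens are wrong.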

40,331 views • 4 months ago • via X (Twitter)

