
Inception
@_inception_ai • 17,714 subscribers
Pioneering a new generation of LLMs.

Today's autoregressive models generate one token at a time. Mercury 2 generates tokens in parallel. Over 1,000 tok/sec on standard GPUs, at comparable quality to speed-optimized models. Since launch, the community has been showing what diffusion LLMs can unlock. Thanks to the team at Clyep for the breakdown.
Inception • 21,021 views • 19 days ago

What if language models didn't have to generate one token at a time? Our CEO Stefano Ermon joined @TBPN to break down how Mercury 2's diffusion LLM hits 1,000+ tok/s on standard NVIDIA GPUs — and why speed changes the product for coding, voice agents, and search.
Inception • 26,928 views • 3 months ago