Inception

@_inception_ai • 17,148 subscribers

Pioneering a new generation of LLMs.

Shorts

We are excited to introduce Mercury, the first commercial-grade diffusion large language model (dLLM)! dLLMs push the frontier of intelligence and speed with parallel, coarse-to-fine text generation.

1,914,989 views

Mercury is refreshed – with across-the-board improvements in coding, instruction following, math, and knowledge recall. Start building responsive, in-the-flow AI solutions! Read more:

54,856 views

Videos

We’re excited to launch Mercury, the first commercial-scale diffusion LLM tailored for chat applications! Ultra-fast and efficient, Mercury brings real-time responsiveness to conversations, just like Mercury Coder did for code.

90,673 views • 1 year ago

Today's autoregressive models generate one token at a time. Mercury 2 generates tokens in parallel. Over 1,000 tok/sec on standard GPUs, at comparable quality to speed-optimized models. Since launch, the community has been showing what diffusion LLMs can unlock. Thanks to the team at Clyep for the breakdown.

21,231 views • 2 months ago

What if language models didn't have to generate one token at a time? Our CEO Stefano Ermon joined @TBPN to break down how Mercury 2's diffusion LLM hits 1,000+ tok/s on standard NVIDIA GPUs — and why speed changes the product for coding, voice agents, and search.

27,108 views • 4 months ago

Inception's founding team came together at Stanford. Aditya Grover was Stefano Ermon's second PhD student. Volodymyr Kuleshov 🇺🇦 joined the group as a postdoc. Aditya and Volo shared an office. Years later, Stefano's lab hit a breakthrough on diffusion for language. Volo's group at Cornell was publishing adjacent work. Aditya's research at UCLA overlapped with the direction. Part 3 of our founder story series with Tim Tully at Menlo Ventures ↓

16,653 views • 2 months ago

Listen to Samar Khanna explain why parallel generation, rather than sequential, raises the performance ceiling for language models. Learn more about diffusion LLMs. → We're hiring:

18,515 views • 4 months ago

Try Mercury Coder on our playground at

47,097 views • 1 year ago
