Cerebras

@cerebras • 62,588 subscribers

The world's fastest AI inference and training. Try the latest open models at: https://t.co/jREGhLI2nj

Shorts

Google's new fast model 3.5 Flash vs Cerebras

🟪 Qwen3-235B 2507 Instruct is live on Cerebras: 1,400 TPS • 131K context • 230 ms TTFT • $0.6 | $1.2 per M tokens. Try the chat or get an API key; pay-as-you-go is available.

Cerebras Code just got an UPGRADE. It's now powered by GLM 4.6.
Pro Plans ($50): 300k ▶️ 1M TPM @ 24M tokens/day
Max Plans ($200): 400k ▶️ 1.5M TPM @ 120M tokens/day
Fastest GLM provider on the planet at 1,000 tokens/s and 131K context. Get yours before we run out 👇
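For a sense of scale, the two quoted limits can be combined with simple arithmetic: at the full per-minute rate, how long would a plan's daily allowance last? This is a minimal sketch, assuming the TPM and tokens/day figures count the same tokens and that usage is sustained at the cap; `minutes_to_daily_cap` is a hypothetical helper, not part of any Cerebras API.

```python
# Hypothetical helper (not a Cerebras API): how long a plan's daily token cap
# lasts if tokens are consumed continuously at the plan's full per-minute limit.
def minutes_to_daily_cap(tpm: int, daily_cap: int) -> float:
    """Return minutes of sustained max-rate usage before the daily cap is hit."""
    if tpm <= 0:
        raise ValueError("tpm must be positive")
    return daily_cap / tpm


# Upgraded figures quoted in the post: (tokens per minute, tokens per day).
plans = {
    "Pro ($50)": (1_000_000, 24_000_000),
    "Max ($200)": (1_500_000, 120_000_000),
}

for name, (tpm, cap) in plans.items():
    print(f"{name}: {minutes_to_daily_cap(tpm, cap):.0f} minutes at full TPM")
```

Under those assumptions the Pro cap lasts about 24 minutes and the Max cap about 80 minutes of continuous full-rate use; in practice the per-minute limit matters for burst speed and the daily cap for total volume.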

No more waitlist: the Cerebras Inference API is open to all! 1M free tokens/day • 20x GPU speed • reasoning in ~1 second. It's time to build!

Cerebras Code Plans are open for business and massively upgraded.
Pro Plans ($50): 165k ▶️ 300k TPM @ 24M tokens/day
Max Plans ($200): 300k ▶️ 400k TPM @ 120M tokens/day
More vibing, more tokens. Keep sending us feedback!

Perplexity Pro is Now Powered by Cerebras. Perplexity Sonar, now running on Cerebras Inference, delivers answers at an unprecedented 1,200 tokens/s – 10x faster than comparable models.

We built Death by Diet Coke in less than 30 seconds using Cerebras Code Pro, which gives you higher rate limits and more power for Qwen3-Coder. We are opening as many Cerebras Code Pro/Max plans as there are Diet Coke cans in the office. First come, first served.

Videos

Gemma 4 31B is now available in Public Preview on Cerebras. Our first multimodal model runs at over 1,800 tokens/s for ultra-fast image and text workflows. Give it a try!

NVIDIA paid $20B for Groq. AWS partnered with Cerebras for the same purpose. A quick breakdown of why disaggregated inference is the next thing in AI infrastructure.

We gave two agents the same task: “Find images matching this description.” Both use Gemma 4 31B. One runs on Cerebras. The other runs on GPUs. You can see the difference. Speed changes the product experience. What would you build if you didn't have to wait?

OpenAI Codex-Spark, powered by Cerebras: you can now just build things faster, at 1,000 tokens/s.

After 9 years at NVIDIA, James Wang left and joined Cerebras. In this Big Chip Club episode, James Wang breaks down the bottlenecks of NVIDIA GPUs, and what's keeping them OUT OF first place... Drop your follow-up questions below 👇

Everyone talks about our hardware @Cerebras. Few notice the software. Ryan Loney breaks down the hidden optimizations that make our LLM inference 20× faster than GPUs, including speculative decoding and token reuse, and why we're just getting started. Watch the full story.

Introducing Cerebras Inference
‣ Llama3.1-70B at 450 tokens/s – 20x faster than GPUs
‣ 60¢ per M tokens – a fifth the price of hyperscalers
‣ Full 16-bit precision for full model accuracy
‣ Generous rate limits for devs
Try it now!

Pick the best model > improve your prompts > catch bugs, with Braintrust and Cerebras inference. Avoid the AI deleting your entire codebase this Halloween, and remember: our free tier gets you 1M+ free tokens/day per model.

"The hardware lottery got worse." Six years after Sara Hooker's landmark essay, she's even more convinced that our current chips are constraining which AI ideas succeed — and which never get a chance.

"I used a billion tokens this week. I'm not even in the top 100 Codex users at OpenAI." We sat down with jason (creator of Instructor, now on OpenAI's Developer Experience team) to talk about how zero-latency inference is changing the way engineers work.

GLM-4.7 from Z.ai is live on Cerebras!
- Frontier intelligence for coding, tool-driven agents, and multi-turn reasoning
- Record coding speed: ~1,000 tokens per second (up to 1,700 TPS for other uses)
- Strong price-performance: ~10x higher than Sonnet 4.5

🚨 Cerebras Inference is now 3x faster: Llama3.1-70B just broke 2,100 tokens/s
- 16x faster than the fastest GPU solution
- 8x faster than GPUs running Llama *3B*
- It's like the perf of a new hardware generation in a single software release
Available now.
