
Cerebras
@cerebras • 54,933 subscribers
The world's fastest AI inference and training. Try the latest open models at: https://t.co/jREGhLI2nj

GLM 4.7 is one of the strongest open-source coding models available, but most developers aren't prompting it correctly. We put together 10 rules to help you get the most out of it:
- Front-load instructions (it has a strong recency bias)
- Use firm language: "must" and "strictly" > soft suggestions
- Break complex tasks into smaller steps
- Disable reasoning for simple tasks, enable it for hard ones
- Use critic agents for code review, QA, and validation
- Pair it with a frontier model for the hardest 10% of workloads
- and more…
GLM 4.7 hits 96% on Tau² Bench and 86% on GPQA Diamond. At 1,500 tokens/sec on Cerebras, it's 20x faster than closed-source alternatives on GPUs.
Cerebras • 633,565 views • 4 months ago

🎁 We're giving away 5 Windsurf plans ($250 credit each)! Try SWE-1.6, Cognition’s latest fast and intelligent agentic coding model, powered by Cerebras. In a side-by-side with Claude, the speed difference is clear. More iterations, faster fixes, better code.
💬 Comment why you want access to enter. Five winners will be selected at random within 48 hours.
Cerebras • 104,019 views • 27 days ago

OpenAI Codex-Spark, powered by Cerebras. You can now just build things faster, at 1,000 tokens/s.
Cerebras • 287,124 views • 3 months ago

Multi-LoRA is in private preview on Cerebras Inference. Deploy one base model alongside a library of LoRA adapters. Switch between them per request, with no reloading, no separate deployments, and no latency cost. Available now for dedicated endpoint users. Reach out to your account rep to get access.
Cerebras • 20,208 views • 7 days ago

Some of our top customers are still choosing Llama 3.1 8B. For a while, we jumped to whatever the hottest, latest model taking up our Twitter feed was. 🙈 But as we are quickly realizing, to create a SOTA product, you need a model that fits your exact use case. Here’s what our customers tell us:
> a lot of the legwork is actually around prompting
> there’s an art to selecting and combining multiple models
> benchmarks only show part of the picture. you have to understand the unique quirks of each model.
Especially as model releases become more and more frequent, we need a clear way to evaluate new models. We have to break free of the naive trend of migrating to the ‘latest and greatest’. And you can easily achieve this using tools like Cerebras and Braintrust to swap models safely (without breaking production).
Cerebras • 346,446 views • 5 months ago

Fully homomorphic encryption was invented in the 1980s. Why wasn't it adopted sooner? A 100,000x slowdown, driven by memory boundedness. Ajay Joshi from CipherSonic AI explains how his team got it down to less than 2x. (if this pattern sounds familiar... LLM inference is memory-bound too. It's why wafer-scale exists.)
Cerebras • 57,072 views • 28 days ago

Cerebras Code: 20x faster than Claude, 1x the price
Today we are launching two monthly coding plans:
➡️ Cerebras Code Pro: $50/m – for indie developers
➡️ Cerebras Code Max: $200/m – for power users with 5x rate limits
Both plans get: Qwen3-Coder at 2,000 tokens/s, 131K context, and no weekly limits. Sign up now:
Cerebras • 460,888 views • 10 months ago

Let's talk about MoE:
🔶 How many experts should you use?
🔶 How does dynamic routing actually behave in production?
🔶 How do you debug a model that won’t train?
🔶 What does 8x7B actually mean for memory and compute?
🔶 What hardware optimizations matter for sparse models?
Mixture of Experts (MoE) is changing how the biggest AI models are built, but it’s still hard to get right. That's why we are launching a new MoE 101 series, led by Daria Soboleva, to bridge the gap between theory and practice. Dive into our MoE guide:
Cerebras • 344,469 views • 10 months ago

what can you build with gpt-5.3-codex-spark? jason liu from OpenAI demos 3 real workflows, ones you can set up yourself inside the Codex app to help you spend less time on overhead and more time building.
00:09 – what is gpt-5.3-codex-spark?
00:25 – workflow 1: multi-agent daily briefing from slack, drive & meets
01:06 – workflow 2: automated PR review
01:31 – workflow 3: real-time interactive coding
02:56 – what speed changes, and what's coming next
Cerebras • 23,633 views • 2 months ago

📣 ANNOUNCEMENT DAY AT CEREBRAS 📣
Today, we are thrilled to share some of the biggest announcements in our company’s history.
📢 Cerebras announces CS-3, the world’s fastest AI Chip with a whopping 4 trillion transistors
📢 Cerebras selects Qualcomm to deliver unprecedented performance in AI Inference
📢 Cerebras and G42 break ground on Condor Galaxy 3, an 8 exaFLOPs AI Supercomputer
Read all about it!
📰 CS-3 Press Release:
📰 Cerebras + Qualcomm Press Release:
📰 Condor Galaxy 3 Press Release:
#AI #Supercomputer #ExaFLOPs #ML #Training #Inference
Cerebras • 128,510 views • 2 years ago

GLM 4.7 is one of the top open-source models on LM Arena, and it's going toe-to-toe with Claude Opus 4.5 and Gemini Pro. We sat down with Anastasios Nikolas Angelopoulos, co-founder and CEO of Arena.ai, to break down 8,000+ developer votes:
→ Within 30 points of Gemini Pro in math & coding
→ Frontier-level multi-turn & instruction following
→ The open-source model devs are actually switching to
The best part? You can run it at 1,500+ tokens/sec on Cerebras, for free.
Cerebras • 28,515 views • 4 months ago