
Cerebras
@cerebras • 54,933 subscribers
The world's fastest AI inference and training. Try the latest open models at: https://t.co/jREGhLI2nj

GLM 4.7 is one of the strongest open-source coding models available, but most developers aren't prompting it correctly. We put together 10 rules to help you get the most out of it:
- Front-load instructions (it has a strong recency bias)
- Use firm language: "must" and "strictly" > soft suggestions
- Break complex tasks into smaller steps
- Disable reasoning for simple tasks, enable it for hard ones
- Use critic agents for code review, QA, and validation
- Pair it with a frontier model for the hardest 10% of workloads
- and more…
GLM 4.7 hits 96% on Tau² Bench and 86% on GPQA Diamond. At 1,500 tokens/sec on Cerebras, it's 20x faster than closed-source alternatives on GPUs.
Cerebras • 633,565 views • 4 months ago
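As a rough illustration of the first few rules, here is a minimal Python sketch that assembles a prompt with the rules placed first, phrased with "must", and the task broken into numbered steps. The `build_prompt` helper and its wording are hypothetical, not an official GLM 4.7 or Cerebras template; how reasoning is switched on or off depends on the serving API, so it is not shown.

```python
def build_prompt(rules: list[str], steps: list[str], task: str) -> str:
    """Hypothetical helper: firm rules first, then numbered steps, then the task."""
    # Rules go at the very top, each phrased as a firm requirement ("must").
    lines = ["You must strictly follow these rules:"]
    lines += [f"- You must {rule}" for rule in rules]
    # Break the work into small, explicit steps.
    lines += ["", "Complete the task in these steps:"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    # The task itself comes last.
    lines += ["", f"Task: {task}"]
    return "\n".join(lines)


prompt = build_prompt(
    rules=["reply only with a unified diff", "not modify files outside src/"],
    steps=["Read the failing test", "Locate the bug", "Write the fix"],
    task="Fix the off-by-one error in pagination.",
)
print(prompt)
```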

🎁 We're giving away 5 Windsurf plans ($250 credit each)! Try SWE-1.6 — Cognition’s latest fast and intelligent agentic coding model, powered by Cerebras. In a side-by-side with Claude, the speed difference is clear. More iterations, faster fixes, better code. 💬Comment why you want access to enter. Five winners will be selected at random within 48 hours.
Cerebras • 104,019 views • 27 days ago

OpenAI Codex-Spark, powered by Cerebras. You can now just build things faster, at 1,000 tokens/s.
Cerebras • 287,124 views • 3 months ago

Multi-LoRA is in private preview on Cerebras Inference. Deploy one base model alongside a library of LoRA adapters. Switch between them per request, with no reloading, no separate deployments, and no latency cost. Available now for dedicated endpoint users. Reach out to your account rep to get access.
Cerebras • 20,208 views • 7 days ago

Some of our top customers are still choosing Llama 3.1 8B.
For a while, we jumped to whichever model was the hottest and latest on our Twitter feed. 🙈 But as we are quickly realizing, to build a SOTA product you need a model that fits your exact use case.
Here’s what our customers tell us:
> a lot of the legwork is actually around prompting
> there’s an art to selecting and combining multiple models
> benchmarks only show part of the picture. you have to understand the unique quirks of each model.
Especially as model releases become more and more frequent, we need a clear way to evaluate new models. We have to break free of the naive habit of migrating to the ‘latest and greatest’. Tools like Cerebras and Braintrust make it easy to swap models safely, without breaking production.
Cerebras • 346,446 views • 5 months ago
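One way to make "swap models safely" concrete is a small regression gate: score the current and candidate models on the same test cases and switch only when the candidate does at least as well. The sketch below is a hypothetical, minimal version with stub functions standing in for real model calls; it does not use the Braintrust or Cerebras SDKs, and the substring-match scorer is a deliberately simple assumption.

```python
from typing import Callable

Model = Callable[[str], str]  # prompt in, completion out (stub for a real API call)


def score(model: Model, cases: list[tuple[str, str]]) -> float:
    """Fraction of test cases whose output contains the expected answer."""
    hits = sum(1 for prompt, expected in cases if expected in model(prompt))
    return hits / len(cases)


def should_swap(current: Model, candidate: Model, cases: list[tuple[str, str]],
                min_gain: float = 0.0) -> tuple[bool, float, float]:
    """Switch only if the candidate matches or beats the current model by min_gain."""
    cur, cand = score(current, cases), score(candidate, cases)
    return cand >= cur + min_gain, cur, cand


# Hypothetical stand-ins for two deployed models.
def current_model(prompt: str) -> str:
    return "42" if "answer" in prompt else "unknown"


def candidate_model(prompt: str) -> str:
    return "42"


cases = [
    ("what is the answer?", "42"),
    ("what is six times seven?", "42"),
    ("what is the capital of France?", "Paris"),
]
swap, cur, cand = should_swap(current_model, candidate_model, cases)
print(swap, round(cur, 2), round(cand, 2))  # True 0.33 0.67
```

Raising `min_gain` makes the gate stricter, so a new model has to clearly win on your own cases rather than just on public benchmarks.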

Fully homomorphic encryption was invented in the 1980s. Why wasn't it adopted sooner? A 100,000x slowdown, driven by memory boundedness. Ajay Joshi from CipherSonic AI explains how his team got it down to less than 2x. (If this pattern sounds familiar: LLM inference is memory-bound too. It's why wafer-scale exists.)
Cerebras • 57,072 views • 28 days ago
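Why "memory-bound" matters for LLM inference can be shown with back-of-the-envelope arithmetic: when generating one token at a time for a single request, every weight is read from memory once per token, so throughput is capped at memory bandwidth divided by model size in bytes. The numbers below (8B parameters, 16-bit weights, 3.35 TB/s of bandwidth) are illustrative assumptions, and the bound ignores KV-cache traffic and batching.

```python
def max_decode_tokens_per_sec(n_params: float, bytes_per_param: float,
                              bandwidth_bytes_per_sec: float) -> float:
    """Upper bound on single-stream decode speed when every weight is read once per token."""
    bytes_per_token = n_params * bytes_per_param
    return bandwidth_bytes_per_sec / bytes_per_token


# Illustrative assumptions: 8B parameters, 2 bytes each (16-bit), 3.35 TB/s bandwidth.
bound = max_decode_tokens_per_sec(8e9, 2, 3.35e12)
print(f"{bound:.0f} tokens/s")  # 209 tokens/s
```

Under this model, raising the ceiling takes more memory bandwidth or fewer bytes per token, not more arithmetic throughput.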

Cerebras Code: 20x faster than Claude, 1x the price
Today we are launching two monthly coding plans:
➡️ Cerebras Code Pro: $50/month, for indie developers
➡️ Cerebras Code Max: $200/month, for power users, with 5x rate limits
Both plans get Qwen3-Coder at 2,000 tokens/s, 131K context, and no weekly limits.
Sign up now:
Cerebras • 460,888 views • 10 months ago

Everyone talks about our hardware at @Cerebras. Few notice the software. Ryan Loney breaks down the hidden optimizations behind LLM inference that runs 20× faster than on GPUs, such as speculative decoding and token reuse, and explains why we’re just getting started. Watch the full story here
Cerebras • 222,782 views • 4 months ago
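Speculative decoding can be sketched in a few lines: a cheap draft model proposes k tokens, the large target model checks them, and every token up to the first disagreement is kept, plus the target's own token at that point. This minimal Python version uses the greedy variant with toy next-token functions (both assumptions), so its output is identical to greedy decoding with the target alone; it is not Cerebras's implementation and does not cover token reuse.

```python
from typing import Callable

NextToken = Callable[[list[int]], int]  # context in, greedy next token out


def speculative_decode(target: NextToken, draft: NextToken, prompt: list[int],
                       k: int, max_new: int) -> list[int]:
    """Greedy speculative decoding: same output as greedy decoding with `target`."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The draft model proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. The target verifies each proposed position.
        accepted, ctx = [], list(out)
        for t in proposal:
            expected = target(ctx)
            if expected != t:
                accepted.append(expected)  # target's token replaces the first mismatch
                break
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(target(ctx))  # all k accepted: one bonus token from the target
        out.extend(accepted)
    return out[: len(prompt) + max_new]


def target(ctx: list[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft(ctx: list[int]) -> int:
    # Toy draft model that agrees with the target except after token 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


result = speculative_decode(target, draft, [0], k=3, max_new=8)
print(result)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Here the target is called once per position for clarity; in a real system all k positions are verified in a single forward pass, which is where the speedup comes from.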

Let's talk about MoE:
🔶 How many experts should you use?
🔶 How does dynamic routing actually behave in production?
🔶 How do you debug a model that won’t train?
🔶 What does 8x7B actually mean for memory and compute?
🔶 What hardware optimizations matter for sparse models?
Mixture of Experts (MoE) is changing how the biggest AI models are built, but it’s still hard to get right. That's why we are launching a new MoE 101 series, led by Daria Soboleva, to bridge the gap between theory and practice. Dive into our MoE guide:
Cerebras • 344,469 views • 10 months ago
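To make dynamic routing concrete, here is a minimal Python sketch of top-2 routing for a single token: pick the experts with the highest gate logits, renormalize their weights with a softmax over just those experts, and mix only their outputs. The gate logits are passed in directly and the experts are toy scaling functions (both assumptions); in a real layer the logits come from a learned gating projection of the token's hidden state. Because only k experts run per token, compute per token tracks the active experts, while memory still has to hold all of them.

```python
import math
from typing import Callable

Expert = Callable[[list[float]], list[float]]


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def moe_forward(x: list[float], gate_logits: list[float], experts: list[Expert],
                k: int = 2) -> tuple[list[float], list[int]]:
    """Route one token to its top-k experts and mix their outputs."""
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])  # renormalize over chosen experts only
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)  # only k experts run for this token
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top


# 8 toy experts: expert i multiplies its input by (i + 1).
experts = [lambda v, s=i + 1: [s * t for t in v] for i in range(8)]
gate_logits = [0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
out, top = moe_forward([1.0, 2.0], gate_logits, experts, k=2)
print(top, out)  # [1, 5] [4.0, 8.0]
```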

GLM-4.7 from Z.ai is live on Cerebras!
- Frontier intelligence for coding, tool-driven agents, and multi-turn reasoning
- Record coding speed: ~1,000 tokens per second (up to 1,700 TPS for other uses)
- Strong price-performance: ~10x higher than Sonnet 4.5
Cerebras • 134,655 views • 4 months ago

what can you build with gpt-5.3-codex-spark? jason liu from OpenAI demos 3 real workflows, ones you can set up yourself inside the Codex app to help you spend less time on overhead and more time building.
00:09 – what is gpt-5.3-codex-spark?
00:25 – workflow 1: multi-agent daily briefing from slack, drive & meets
01:06 – workflow 2: automated PR review
01:31 – workflow 3: real-time interactive coding
02:56 – what speed changes, and what's coming next
Cerebras • 23,633 views • 2 months ago

📣 ANNOUNCEMENT DAY AT CEREBRAS 📣
Today, we are thrilled to share some of the biggest announcements in our company’s history.
📢 Cerebras announces CS-3, the world’s fastest AI chip, with a whopping 4 trillion transistors
📢 Cerebras selects Qualcomm to deliver unprecedented performance in AI inference
📢 Cerebras and G42 break ground on Condor Galaxy 3, an 8 exaFLOPs AI supercomputer
Read all about it!
📰 CS-3 Press Release:
📰 Cerebras + Qualcomm Press Release:
📰 Condor Galaxy 3 Press Release:
#AI #Supercomputer #ExaFLOPs #ML #Training #Inference
Cerebras • 128,510 views • 2 years ago

GLM 4.7 is one of the top open-source models on LM Arena, and it's going toe-to-toe with Claude Opus 4.5 and Gemini Pro. We sat down with Anastasios Nikolas Angelopoulos, co-founder and CEO of Arena.ai, to break down 8,000+ developer votes:
→ Within 30 points of Gemini Pro in math & coding
→ Frontier-level multi-turn & instruction following
→ The open-source model devs are actually switching to
The best part? You can run it at 1,500+ tokens/sec on Cerebras, for free.
Cerebras • 28,515 views • 4 months ago
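For a sense of what "within 30 points" means: assuming Arena scores sit on the standard Elo scale (a base-10 logistic with a 400-point spread), a rating gap converts to an expected head-to-head preference rate as below. This is a sketch of the standard Elo formula, not Arena's exact methodology.

```python
def expected_win_rate(rating_gap: float) -> float:
    """Elo expected score of the higher-rated model over one rated `rating_gap` points lower."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400))


print(f"{expected_win_rate(30):.3f}")  # 0.543
```

Under that assumption, a 30-point gap means the higher-rated model would be preferred in roughly 54% of head-to-head votes.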