Baseten

@baseten • 15,365 subscribers

Inference is everything.

Shorts

We've launched the fastest GLM 5 API available at 190 TPS and 0.79 sec TTFT with the Baseten Inference Stack. Ready for your coding and agentic workflows.

19,549 views

Videos

Today, we're introducing GLM-5.2 Fast: our GLM-5.2 Model API designed for the most demanding real-time use cases. The Fast tier delivers 2-3x higher TPS than the standard GLM-5.2 Model API.

110,924 views • 2 days ago

Nobody knows what inference means but it's provocative

496,940 views • 1 year ago

let there be inference

332,307 views • 1 year ago

DeepSeek-V3 dropped today and the LLM world just got turned upside down. Again. Early indicators are that this model completely transforms the closed and open-source model landscapes. TL;DR: OSS is now SOTA/Top 3 again. Here are the key details to know:
- Open source and licensed for commercial use
- Beats Llamas, Qwens, GPT-4o, Sonnet 3.5
- MoE w/ 671B params, 37B active per token
- 128K-token context window
- Distilled o3-style reasoning
Deeper dive in 🧵
This is one of the first models that needs the horsepower of H200 GPUs, so we're getting them ready to go. If you're interested in running DeepSeek-V3, reach out to us about a dedicated deployment on H200s.
h/t zhyncs for putting us on this early, Dhruv Singal for getting it running on H200s, and Philip Kiely for the demo!

69,598 views • 1 year ago

DeepSeek-R1 is blowing up right now, but we're not surprised. And not just because we’ve been working closely with DeepSeek to bring these models to production. We've been betting on powerful, open-source models like DeepSeek-R1 from day one. 1/n 🧵

59,332 views • 1 year ago

🚀 Our "technical" marketer might not be looped in, but today is our biggest launch day yet. We're introducing two new products to serve the inference lifecycle: Model APIs and Training. Model APIs are frontier models running on the Baseten Inference Stack, purpose-built for production. Baseten Training (Beta) provides infra and tooling without limitations for AI models destined for production. Huge shoutout to the many partners and customers we've worked with as we built these two new products—more details below.

35,207 views • 1 year ago

🚀 New Generally Available Whisper drop: the fastest, most accurate, and most cost-effective transcription, with over 1000x real-time factor for production AI workloads. Our new Generally Available Whisper implementation delivers:
🏎️ Over 1000x real-time factor
✨ The lowest word error rate
💪 Production-grade reliability
🧩 Custom scaling and hardware per processing step
👉 See how in our blog. Reach out to get record-breaking performance for your mission-critical AI workloads!

15,489 views • 1 year ago
