Charles 🎉 Frye

@charles_irl • 18,850 subscribers

memer of technical staff at @modal. he/him. ex @full_stack_dl, @weights_biases (acq. @CoreWeave), phd Berkeley @Redwood_Neuro. try https://t.co/SYWVMCb7OB

Shorts

Low-precision floats are weird. I have been building up my intuition by playing with them outside of inference/training. Adam Azzam and I cooked up this visualizer for micro-scaling/block quant formats like NVFP4, MXFP4, and friends. Try it:

13,028 views

Added a fun lil widget to the LLM Engineer's Almanac -- a "Token Timing Simulator" so you can get a visceral feel for what a benchmark perf number means. Here's David Wang's latest work with Zhijian Liu's DFlash technique in SGLang -- ~1k TPS!

18,764 views

Step 4 to achieve truly serverless GPUs for AI inference: skip over unserializable inference engine setup steps like CUDA graph capture and Torch compilation by stacking GPU snapshots and CPU snapshots.

17,452 views

nother banger in the pipeline btw

12,600 views

Videos

GLM 5.2 runs pretty fast on Modal.

Charles 🎉 Frye

20,342 views • 1 month ago

Dozens of teams have asked my advice on running LLMs. How fast is DeepSeek V3 with vLLM on 8 GPUs? What's the max throughput of Qwen 2.5 Coder with SGLang on one H100? Running & sharing benchmarks ad hoc was too slow. So we built a tiny app, the LLM Engine Advisor.

Charles 🎉 Frye

86,412 views • 1 year ago

Announcing Twitter '95, an AI simulation of Twitter, if it had existed in 1995.
- LLaMA 3.1 405B + FastAPI on Modal ✅
- Next.js app on Vercel ✅
- PostgreSQL on Supabase ✅
- Jordan dunking on Clinton ✅
- MVP written in 26 hours at crossover hackathon/marathon ✅

Charles 🎉 Frye

55,168 views • 1 year ago

We've run thousands of LLM inference serving benchmarks at Modal. We're releasing the results so you don't have to. We're releasing the code so that you can. Introducing: The LLM Engineer's Almanac. Just in time for the AI Engineer World's Fair.

Charles 🎉 Frye

21,976 views • 1 year ago