Charles 🎉 Frye

@charles_irl · 18,293 subscribers

memer of technical staff at @modal. he/him. ex @full_stack_dl, @weights_biases (acq. @CoreWeave), phd Berkeley @Redwood_Neuro. try https://t.co/SYWVMCb7OB

Shorts

Added a fun lil widget to the LLM Engineer's Almanac -- a "Token Timing Simulator" so you can get a visceral feel for what a benchmark perf number means. Here's David Wang's latest work with Zhijian Liu's DFlash technique in SGLang -- ~1k TPS!

18,561 views

Step 4 to achieve truly serverless GPUs for AI inference: skip over unserializable inference engine setup steps like CUDA graph capture and Torch compilation by stacking GPU snapshots and CPU snapshots.

17,384 views

nother banger in the pipeline btw

12,600 views
