Charles 🎉 Frye

@charles_irl · 18,202 subscribers

memer of technical staff at @modal. he/him. ex @full_stack_dl, @weights_biases (acq. @CoreWeave), phd Berkeley @Redwood_Neuro. try https://t.co/SYWVMCb7OB

Shorts

Added a fun lil widget to the LLM Engineer's Almanac -- a "Token Timing Simulator" so you can get a visceral feel for what a benchmark perf number means. Here's David Wang's latest work with Zhijian Liu's DFlash technique in SGLang -- ~1k TPS!

18,561 views
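For a rough sense of what a number like ~1k TPS (tokens per second) means in wall-clock terms, here's a minimal Python sketch. It is not the Almanac widget's actual code, and `token_arrival_times` is a hypothetical helper. It assumes a constant decode rate and ignores time-to-first-token.

```python
def token_arrival_times(tokens_per_second: float, num_tokens: int) -> list[float]:
    """Hypothetical helper: seconds at which each token arrives, assuming a
    constant decode rate and zero time-to-first-token."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    interval = 1.0 / tokens_per_second  # seconds between consecutive tokens
    # Token i (0-indexed) finishes after (i + 1) intervals.
    return [(i + 1) * interval for i in range(num_tokens)]


if __name__ == "__main__":
    # Compare a typical-feeling decode rate with ~1k TPS for a 500-token reply.
    for tps in (50, 1000):
        times = token_arrival_times(tps, 500)
        print(f"{tps:>5} TPS: {1000 / tps:.1f} ms per token, "
              f"500 tokens in {times[-1]:.2f} s")
```

Under these assumptions, ~1k TPS means about one token per millisecond, so a 500-token answer streams in roughly half a second instead of about ten seconds at 50 TPS.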

Step 4 to achieve truly serverless GPUs for AI inference: skip over unserializable inference engine setup steps like CUDA graph capture and Torch compilation by stacking GPU snapshots and CPU snapshots.

17,384 views

nother banger in the pipeline btw

12,600 views
