DFlash v0.1.4: custom Metal verify kernels for quantized Qwen3 hybrid models, plus significant peak memory reduction at long context.

M5 Max 40-core GPU, 64GB, stock mlx_lm baseline:

Qwen3.6-35B-A3B-4bit:
► @ 1024 · 138.3 → 300.3 tok/s (2.20x)
► @ 2048 · 135.6 → 246.4 tok/s (1.81x)
► @...
23,120 views • 2 months ago • via X (Twitter)
