Running GLM 4.7 Flash (8-bit) with Tensor Parallel / RDMA on 2 M4 Pro Mac Minis at 60 tok/sec. mlx-lm 0.30.5 brings huge long-context speedups for GLM 4.7 Flash (h/t N8 Programs & Awni Hannun). M5 Pro (~28 Jan) will have ~4x faster prefill and ~1.3x faster decode.
56,555 views • 4 months ago • via X (Twitter)
