Running GLM 4.7 Flash (8-bit) with tensor parallelism over RDMA on two M4 Pro Mac Minis at 60 tok/sec. mlx-lm 0.30.5 brings large long-context speedups for GLM 4.7 Flash (h/t N8 Programs and Awni Hannun). The M5 Pro (~28 Jan) will bring ~4x faster prefill and ~1.3x faster decode.
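As a rough illustration of what the quoted speedup factors mean, the sketch below projects decode throughput and prefill time from the 60 tok/sec figure. It assumes the approximate M5 Pro factors (~1.3x decode, ~4x prefill) apply uniformly to this two-machine setup, which the post does not claim. The 20-second prefill baseline is a hypothetical value chosen only for the example.

```python
# Back-of-envelope projection from the speedup factors quoted in the post.
# Assumption: the ~1.3x decode and ~4x prefill factors apply uniformly here.

M4_DECODE_TOK_S = 60.0  # decode rate reported for the 2x M4 Pro setup
DECODE_SPEEDUP = 1.3    # "~1.3x faster decode" (approximate)
PREFILL_SPEEDUP = 4.0   # "~4x faster prefill" (approximate)


def projected_decode_rate(base_tok_s: float, speedup: float) -> float:
    """Decode throughput scales up directly with the speedup factor."""
    return base_tok_s * speedup


def projected_prefill_seconds(base_seconds: float, speedup: float) -> float:
    """Prefill wall-clock time shrinks by the speedup factor."""
    return base_seconds / speedup


if __name__ == "__main__":
    rate = projected_decode_rate(M4_DECODE_TOK_S, DECODE_SPEEDUP)
    print(f"projected decode: {rate:.0f} tok/sec")
    # Hypothetical 20 s prefill baseline for a long prompt.
    prefill = projected_prefill_seconds(20.0, PREFILL_SPEEDUP)
    print(f"projected prefill: {prefill:.1f} s")
```

Under these assumptions, decode would land around 78 tok/sec, and a prompt that takes 20 s to prefill today would take about 5 s. Real numbers depend on interconnect overhead, context length, and quantization.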

56,555 views • 4 months ago • via X (Twitter)