Loading video...

Video Failed to Load

Go Home

Got continuous batching working with SSMs in mlx-lm. Here's four OpenCode agents simultaneously running Nvidia's Nemotron Nano on 64GB M4 Max. This is a nice model for smaller machines since it's MoE + hybrid attention (small cache).

35,078 views • 5 months ago •via X (Twitter)

0 Comments

No comments available

Comments from the original post will appear here

Related Videos