Qwen 32B (4-bit) generates at >40 toks/sec on an M4 Max with assisted decoding and Qwen 0.5B as the draft model. Coming soon to mlx-lm. Compare regular decoding (left) to assisted decoding (right):

50,353 views • 1 year ago • via X (Twitter)

11 Comments

N8 Programs • 1 year ago

WOW! How does this differ from my speculative decoding impl - what makes it so much faster? Cause this is awesome.

Lab4crypto • 1 year ago

🚀 Don't gamble with your portfolio! Use our advanced hybrid quant risk tool using on/off-chain data daily and make informed decisions. 📈 Access to 1000+ charts for your crypto journey. 📚 Receive free weekly quant analysis. 📊 +21 projects supported. 🏗️ Beginners and experts.

Ivan Fioravanti ᯅ • 1 year ago

Super fast! 💪

Tay • 1 year ago

Assisted decoding?

Awni Hannun • 1 year ago

A small draft model generates candidate tokens, which the main model then accepts or rejects according to some criterion. In this case the criterion is an exact match.
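To make the accept/reject step concrete, here is a minimal sketch of assisted decoding with an exact-match criterion. It is not the mlx-lm implementation: the `Model` type, the toy models, and the `num_draft` parameter are all hypothetical stand-ins, and a real implementation would score every drafted position with the main model in a single batched forward pass rather than one call per token.

```python
from typing import Callable, List

# Hypothetical stand-in for a language model: maps a token prefix
# to its greedy (argmax) next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], max_new: int) -> List[int]:
    """Regular decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(model(tokens))
    return tokens[len(prompt):]


def assisted_decode(target: Model, draft: Model, prompt: List[int],
                    max_new: int, num_draft: int = 4) -> List[int]:
    """Assisted decoding with an exact-match acceptance criterion."""
    tokens = list(prompt)
    n_prompt = len(prompt)
    while len(tokens) - n_prompt < max_new:
        # 1. The small draft model cheaply proposes num_draft tokens.
        proposal: List[int] = []
        for _ in range(num_draft):
            proposal.append(draft(tokens + proposal))
        # 2. The main model checks each proposed token in order.
        #    (Assumption: done sequentially here for clarity only.)
        for drafted in proposal:
            expected = target(tokens)
            tokens.append(expected)  # always keep the main model's choice
            if expected != drafted:
                break  # first mismatch: discard the rest of the draft
        else:
            # Every drafted token matched, so the main model adds one more.
            tokens.append(target(tokens))
    return tokens[n_prompt:n_prompt + max_new]


# Toy models for illustration only: the draft agrees with the target
# except after token 4, where it guesses wrong.
def toy_target(tokens: List[int]) -> int:
    return (3 * tokens[-1] + 1) % 7


def toy_draft(tokens: List[int]) -> int:
    return 0 if tokens[-1] == 4 else toy_target(tokens)
```

Because only tokens the main model would have chosen anyway are kept, this sketch produces the same sequence as `greedy_decode(toy_target, ...)`; any speedup in a real system comes from verifying several drafted tokens per main-model pass.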

Caleb • 1 year ago

Super cool 🤩

DS • 1 year ago

Apple intelligence so far: "siri can set an alarm even faster now!"

Mark Lord • 1 year ago

Try with the 2b model, set draft tokens to 31, and modify the wording of the prompt to “Write me a quick sort in C++. Don’t give me a preamble, just immediately write the code.” If it’s anything like my tests, I reckon you’ll squeeze a few more tokens/second 😁

SM • 1 year ago

Impressive! But do you think one can run diffusion model inference on phones?

Sohaib • 1 year ago

Awesome!

Unclecode (Hossein) • 1 year ago

Interesting. It makes sense that it's faster given how assisted decoding works, but did you try any evals? I wonder what the unpredictable effects of such decoding are.
