Llama3.1 on a Raspberry Pi. Thanks for Llamafile 😍
126,760 views • 1 year ago • via X (Twitter)
10 Comments

I used the @Raspberry_Pi 5 with the M.2 Hat with the Hailo AI module

@JustineTunney Now do 405B

@JustineTunney

@JustineTunney My Raspi5 runs Llama3.1 8B (Q4) at just over 1.8 toks/sec, painful if you're waiting for an answer. Qwen2 1.5B (Q4) gave me a pretty good answer for the same example at over 8 toks/sec. We're getting there!

@JustineTunney Incremental progress!

@JustineTunney That’s pretty cool! Btw, if you have a Mac, you can stream the same model (4-bit quant) much faster using fastMLX (100+ tokens/s on an M3 Max 96GB). You can even connect multiple Pis to the same server and run parallel requests :)

@JustineTunney Definitely will experiment! Thanks Prince!

@JustineTunney Coolest thing on internet today.

@JustineTunney 🫡

@JustineTunney I love to see SLMs opening up more and more great possibilities. To think that an SLM as excellent as Llama 3.1 8B is already running on a Raspberry Pi, I can't imagine where we'll be in a year's time. Great work!
