Quantized Gemma 2B runs pretty fast on my iPhone 15 pro in MLX Swift. code & docs: Comparable to GPT 3.5 turbo and Mixtral 8x7B in LMSYS Org benchmarks but runs efficiently on an iPhone. Pretty wild.

Awni Hannun

35,004 subscribers

79,702 views • 1 year ago • via X (Twitter)

Science & Technology

10 comments

Logan Kilpatrick • 1 year ago

@lmsysorg Cost of intelligence takes another hit today : )

Christian Schoppe • 1 year ago

@lmsysorg I have the 6 bit quantized version running on my Pixel. Not quite as fast as yours but still quite usable. After a few initial tests, I still prefer Phi-3-mini.

Eric Hartford • 1 year ago

@lmsysorg that's awesome!

Kirito (e/acc) 🏴‍☠️ • 1 year ago

@lmsysorg Great work we all saw it coming - privacy and intelligence at the palm of your hand

Rami El-Masri • 1 year ago

@lmsysorg Running advanced models like Gemma 2B efficiently on mobile devices is a game-changing milestone.

Tris Warkentin • 1 year ago

@lmsysorg What an incredible demo -- speed and quality are very impressive. Now to work on accessibility =)

NFTPerks 🇵🇹 • 1 year ago

@lmsysorg awesome

Stavros Kassinos • 1 year ago

@lmsysorg 🚀🚀

Mani • 1 year ago

@lmsysorg Is it 4bit quantized?

Awni Hannun • 1 year ago

@lmsysorg Yes

Related videos

Running Qwen3 8B thinking on an iPhone Air with MLX. The model is quantized to 4-bit and runs pretty well.

Awni Hannun

215,529 views • 9 months ago

The new Qwen 3.5 4B runs incredibly well on M5. The model is close to GPT-4o in benchmarks. Running fully on-device with MLX.

Adrien Grondin

230,383 views • 3 months ago

QLoRA fine-tuning 4-bit Gemma 2B on iPhone 15 Pro with MLX Swift. A nice size for fine-tuning on device, getting 70-100 toks/sec depending on the batch. Guide here:

Awni Hannun

19,530 views • 2 years ago

You can now vibecode your own WisprFlow or Monologue alternative that runs completely locally on Apple Silicon using MLX-Audio-Swift 🔥 Check out this live transcription of the Dwarkesh Patel interview with Andrej Karpathy using Qwen3-ASR-0.6B quantized to 4-bit on an M3 Max. It also runs in real time on an iPhone 15 Pro and iPad Pro M1. No cloud. No API keys.

Prince Canuma

60,974 views • 4 months ago

A perfect coding model for MLX on Apple silicon.. Qwen delivered again. Runs quite fast on an M3 Ultra. Running the 4-bit quantized with mlx-lm:

Awni Hannun

186,641 views • 11 months ago

GLM 4.6 runs quite fast on an M3 Ultra with mlx-lm even at higher precision. Pretty remarkable that it benchmarks competitively with the just-released Sonnet 4.5. Hope those benchmarks hold up in day-to-day use. Here's a run using a 5.5 bpw quantized model, generating 5.3k tokens at 17+ tok/sec using 244 GB. What prompts should I test?

Awni Hannun

68,539 views • 9 months ago

Pretty cool that with the new Qwen 2.5 models you can ask questions / generate using a reasonably sized code-base as context, all running on a laptop with mlx-lm. The 7B runs pretty fast on an M4 Max using the mlx-lm code base (~16k lines) as context:

Awni Hannun

27,442 views • 1 year ago

Next level: QLoRA fine-tuning 4-bit Llama 3 8B on iPhone 15 Pro. Incoming (Q)LoRA MLX Swift example by David Koski: works with lots of models (Mistral, Gemma, Phi-2, etc.)

Awni Hannun

581,723 views • 2 years ago

GLM 5.2 runs pretty fast on Modal.

Charles 🎉 Frye

20,251 views • 5 days ago

On-device realtime transcription on iPhone 15 Pro Max 🚀 Using MLX-Audio-Swift + Qwen3-ASR-0.6B by Qwen. It’s much faster and more consistent with the latest adjustments. Almost ready to push to GH.

Prince Canuma

44,103 views • 4 months ago

Fortnite running at “Max” settings on iPhone 17 Pro. It is VERY well optimized but runs hot (to be expected).

🧊 Dub 🧊

726,333 views • 1 month ago

Gemma 4 E2B on iPhone 17 Pro Max in AI Edge Gallery!

Max Weinbach

177,545 views • 2 months ago

GPT-4o level intelligence running on your phone! MiniCPM-V 4.5 delivers enterprise-grade AI performance in just 8B parameters, outperforming models like GPT-4o, Gemini-2.0 Pro on vision and language tasks.
- 30+ language support
- Runs smoothly on iPhone/iPad
100% open-source!

Akshay 🚀

84,288 views • 10 months ago

Check out this video on how to run Gemma 4 locally on an iPhone! It runs completely offline and handles long context, meaning no data plan, no API calls, and no monthly fees required.

Google Gemma

398,433 views • 2 months ago

Running Qwen 3.5 4B on my iPhone 17 Pro Max. Very smart & capable model for how small it is. Very fast as well.

Ahmad

107,459 views • 3 months ago

Neural motion synthesis in Threejs/WebGPU, even runs on my 7-year-old iPhone! try it here: sauce:

Erik

106,696 views • 2 months ago

Alibaba’s Qwen 3.5 is now running fully on-device on the iPhone 17 Pro. It outperforms models 4x its size, delivers strong visual understanding, and lets you switch reasoning on or off. This demo uses the 2B 6-bit version, optimized with MLX for Apple Silicon.

Hugging Models

202,723 views • 3 months ago

Moondream 2B running on an iPhone

vik

33,412 views • 1 year ago

MLX Swift example can also QLoRA fine-tune Llama 3.2. Here's the 1B fine-tuning on my iPhone 15 Pro at > 150 toks/sec. At this rate it only takes a few minutes to learn some decent adapters fully on-device.

Awni Hannun

91,632 views • 1 year ago