Quantized Gemma 2B runs pretty fast on my iPhone 15 Pro in MLX Swift. Code & docs: Comparable to GPT-3.5 Turbo and Mixtral 8x7B in LMSYS Org benchmarks but runs efficiently on an iPhone. Pretty wild.

Awni Hannun

35,004 subscribers

79,702 views • 1 year ago • via X (Twitter)

Science & Technology

10 Comments

Logan Kilpatrick • 1 year ago

@lmsysorg Cost of intelligence takes another hit today : )

Christian Schoppe • 1 year ago

@lmsysorg I have the 6-bit quantized version running on my Pixel. Not quite as fast as yours but still quite usable. After a few initial tests, I still prefer Phi-3-mini.

Eric Hartford • 1 year ago

@lmsysorg that's awesome!

Kirito (e/acc) 🏴‍☠️ • 1 year ago

@lmsysorg Great work, we all saw it coming: privacy and intelligence in the palm of your hand

Rami El-Masri • 1 year ago

@lmsysorg Running advanced models like Gemma 2B efficiently on mobile devices is a game-changing milestone.

Tris Warkentin • 1 year ago

@lmsysorg What an incredible demo -- speed and quality are very impressive. Now to work on accessibility =)

NFTPerks 🇵🇹 • 1 year ago

@lmsysorg awesome

Stavros Kassinos • 1 year ago

@lmsysorg 🚀🚀

Mani • 1 year ago

@lmsysorg Is it 4-bit quantized?

Awni Hannun • 1 year ago

@lmsysorg Yes

Related Videos

Running Qwen3 8B thinking on an iPhone Air with MLX. The model is quantized to 4-bit and runs pretty well.

Awni Hannun

215,529 views • 9 months ago

The new Qwen 3.5 4B runs incredibly well on M5. The model is close to GPT-4o in benchmarks. Running fully on-device with MLX.

Adrien Grondin

230,383 views • 3 months ago

QLoRA fine-tuning 4-bit Gemma 2B on iPhone 15 Pro with MLX Swift. A nice size for fine-tuning on device, getting 70-100 toks/sec depending on the batch. Guide here:

Awni Hannun

19,530 views • 2 years ago

You can now vibecode your own WisprFlow or Monologue alternative that runs completely locally on Apple Silicon using MLX-Audio-Swift 🔥 Check out this live transcription of Dwarkesh Patel's interview with Andrej Karpathy using Qwen3-ASR-0.6B quantized to 4-bit on an M3 Max. It also runs in real time on an iPhone 15 Pro and iPad Pro M1. No cloud. No API keys.

Prince Canuma

60,974 views • 4 months ago

A perfect coding model for MLX on Apple silicon. Qwen delivered again. Runs quite fast on an M3 Ultra. Running the 4-bit quantized model with mlx-lm:

Awni Hannun

186,641 views • 11 months ago

GLM 4.6 runs quite fast on an M3 Ultra with mlx-lm even at higher precision. Pretty remarkable that it benchmarks competitively with the just-released Sonnet 4.5. Hope those benchmarks hold up in day-to-day use. Here's a run using a 5.5 bpw quantized model, generating 5.3k tokens at 17+ tok/sec using 244 GB. What prompts should I test?

Awni Hannun

68,539 views • 9 months ago

Pretty cool that with the new Qwen 2.5 models you can ask questions / generate using a reasonably sized code base as context, all running on a laptop with mlx-lm. The 7B runs pretty fast on an M4 Max using the mlx-lm code base (~16k lines) as context:

Awni Hannun

27,442 views • 1 year ago

Next level: QLoRA fine-tuning 4-bit Llama 3 8B on iPhone 15 Pro. Incoming (Q)LoRA MLX Swift example by David Koski: works with lots of models (Mistral, Gemma, Phi-2, etc.)

Awni Hannun

581,723 views • 2 years ago

GLM 5.2 runs pretty fast on Modal.

Charles 🎉 Frye

20,342 views • 8 days ago

On-device real-time transcription on iPhone 15 Pro Max 🚀 Using MLX-Audio-Swift + Qwen3-ASR-0.6B by Qwen. It's much faster and more consistent with the latest adjustments. Almost ready to push to GH.

Prince Canuma

44,103 views • 4 months ago

Fortnite running at “Max” settings on iPhone 17 Pro. It is VERY well optimized but runs hot (to be expected).

Dub

726,883 views • 1 month ago

Gemma 4 E2B on iPhone 17 Pro Max in AI Edge Gallery!

Max Weinbach

177,545 views • 2 months ago

GPT-4o level intelligence running on your phone! MiniCPM-V 4.5 delivers enterprise-grade AI performance in just 8B parameters, outperforming models like GPT-4o and Gemini-2.0 Pro on vision and language tasks. 30+ language support. Runs smoothly on iPhone/iPad. 100% open-source!

Akshay 🚀

84,288 views • 10 months ago

Check out this video on how to run Gemma 4 locally on an iPhone! It runs completely offline and handles long context, meaning no data plan, no API calls, and no monthly fees required.

Google Gemma

398,433 views • 2 months ago

Running Qwen 3.5 4B on my iPhone 17 Pro Max. Very smart & capable model for how small it is. Very fast as well.

Ahmad

107,463 views • 3 months ago

Neural motion synthesis in Three.js/WebGPU, even runs on my 7-year-old iPhone! Try it here: sauce:

Erik

106,860 views • 2 months ago

Alibaba’s Qwen 3.5 is now running fully on-device on the iPhone 17 Pro. It outperforms models 4x its size, delivers strong visual understanding, and lets you switch reasoning on or off. This demo uses the 2B 6-bit version, optimized with MLX for Apple Silicon.

Hugging Models

202,723 views • 3 months ago

Moondream 2B running on an iPhone

vik

33,412 views • 1 year ago

MLX Swift example can also QLoRA fine-tune Llama 3.2. Here's the 1B fine-tuning on my iPhone 15 Pro at > 150 toks/sec. At this rate it only takes a few minutes to learn some decent adapters fully on-device.

Awni Hannun

91,632 views • 1 year ago