Quantized Gemma 2B runs pretty fast on my iPhone 15 Pro in MLX Swift. code & docs: Comparable to GPT 3.5 turbo and Mixtral 8x7B in LMSYS Org benchmarks but runs efficiently on an iPhone. Pretty wild.

Awni Hannun

35,004 subscribers

79,702 views • 2 years ago • via X (Twitter)

10 comments

Logan Kilpatrick • 2 years ago

@lmsysorg Cost of intelligence takes another hit today : )

Christian Schoppe • 2 years ago

@lmsysorg I have the 6 bit quantized version running on my Pixel. Not quite as fast as yours but still quite usable. After a few initial tests, I still prefer Phi-3-mini.

Eric Hartford • 2 years ago

@lmsysorg that's awesome!

Kirito (e/acc) 🏴‍☠️ • 2 years ago

@lmsysorg Great work we all saw it coming - privacy and intelligence at the palm of your hand

Rami El-Masri • 2 years ago

@lmsysorg Running advanced models like Gemma 2B efficiently on mobile devices is a game-changing milestone.

Tris Warkentin • 2 years ago

@lmsysorg What an incredible demo -- speed and quality are very impressive. Now to work on accessibility =)

NFTPerks 🇵🇹 • 2 years ago

@lmsysorg awesome

Stavros Kassinos • 2 years ago

@lmsysorg 🚀🚀

Mani • 2 years ago

@lmsysorg Is it 4bit quantized?

Awni Hannun • 2 years ago

@lmsysorg Yes
