Prince Canuma

@Prince_Canuma • 22,525 subscribers

Apple MLX King 🤴🏽 • Creator of mlx-audio & mlx-vlm • Working on something new • Ex-@arcee_ai • @neptune_ai • https://t.co/iZnxoefJBU

Shorts

Sam 3 by Facebook now on MLX 🚀 Here is realtime object tracking running on an M3 Max 96GB.

180,321 views

You can now track single or multiple objects in videos completely locally using Sam3 + MLX

63,364 views

Quick update on the water situation 💦 M3 Ultra and Titan (RTX6000 Pro) seem to have recovered with little to no visible damage. The main issues are my MacBook, which is in for service, and Titan's CPU temperature, which is above average when idling (58°C, up from 35°C before the water incident). Anyway, here is a video of MLX-VLM serving Qwen3-4B-Instruct on Titan (~300 tok/s) to do autocomplete and git commit message generation completely locally via the Zed IDE.

17,065 views

LLaVA Llama-3 and Phi-3 now on MLX 🎉🚀 You can now run inference locally on your Mac. pip install -U mlx-vlm I’m getting ~50 tokens/s on an M3 Max. Model cards 👇🏾

74,448 views

Chatterbox Turbo by Resemble AI now on MLX 🚀🎉 You can now run it locally on your Mac and it supports voice cloning and emotion control. I'm getting 3.8x faster than real-time. > pip install -U mlx-audio Model collection 👇🏽

24,953 views
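
A figure like "3.8x faster than real-time" is usually read as seconds of audio produced per second of wall-clock generation time. The sketch below shows that arithmetic only; the function name and the example numbers are assumptions for illustration, not measurements from the post.

```python
def realtime_speedup(audio_seconds: float, wall_seconds: float) -> float:
    """Return seconds of audio produced per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


# Hypothetical example: 19 s of speech generated in 5 s of wall-clock time.
speedup = realtime_speedup(audio_seconds=19.0, wall_seconds=5.0)
print(f"{speedup:.1f}x faster than real-time")
```

Any value above 1.0 means the model produces audio faster than it takes to play it back.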

Qwen2.5-VL now on MLX 🚀 You can now run inference or train it locally on your Mac using MLX. > pip install -U mlx-vlm Note: Video support is coming shortly :) Model cards 👇🏾

29,952 views

FastMLX: Turn your powerful Mac into an AI home server 🚀 Using my M3 Max (96GB URAM) to run a VLM, streaming responses to my M1 MacBook Air over WiFi.🔥 > pip install -U fastmlx

36,558 views

NariLabs Dia-1.6B by Toby Kim now on MLX 🚀 Get started: > pip install -U mlx-audio Model Card 👇🏾

24,057 views

Videos

Excited to introduce Nativ 🚀 Run frontier open models locally on your Mac. No accounts, no subscriptions, no cloud. ⚡ Built on mlx-vlm — multimodal + fastest on Apple Silicon 🔒 100% private — every token generated on your machine 🛠 Plug your coding agents into a local endpoint 📊 Live telemetry — tokens/sec, memory, usage 🆓 100% open source, MIT licensed, free forever Your Mac is more capable than you think. Stop renting intelligence. Download for macOS 👇 Github repo 🌟

102,076 views • 1 day ago

DeepSeek-V4-Flash powering 4 parallel agents on Pi (by Mario Zechner) 🚀 Running on M3 Ultra at ~30-34 tok/s and 160-187GB peak URAM using MLX-LM. Special shoutout to clandestine.eth 🦇🔊, Pedro Cuenca, Tarjei Mandt, Ivan Fioravanti ᯅ and others for helping optimize and shape this PR. PR:

109,941 views • 2 months ago

Today we're shipping our biggest MLX-VLM release yet: v0.6.0 ...and we are raising 💸
This one's about turning your Apple devices into real local agent machines. From your desk to your pocket.
What's new:
⚡ Speculative decoding everywhere: Gemma 4 EAGLE3 + DFlash, Qwen MTP, DeepSeek V4 MTP. Faster tokens, less waiting.
🤖 Agent-ready server: native Anthropic /v1/messages API, stateful /v1/responses, tool calls, Codex context budgets. Plug Claude Code & Codex straight into local models.
👁️ New models galore: DeepSeek V4, ZAYA1-VL, MiniCPM-V 4.6, LFM2 MoE, Step-3.7 Flash, Laguna + more.
🎨 Image gen & editing: FLUX.2 (base + klein), PrismML Bonsai.
🔊 Audio in: Qwen3 Omni, Gemma 4 audio, base64 chat audio.
🧮 TurboQuant KV cache: RHT-correct fast paths for leaner memory.
📦 Modular server, better metrics, cleaner streaming.
Run real agents on the hardware already in your hands.
Github:

65,678 views • 1 month ago

Day 1 of 3 days of MLX: Introducing MLX-Audio-Swift SDK 🚀 A modular Swift SDK for voice agents and tasks on Apple Silicon built by Lucas Newman and yours truly. iOS, macOS, and visionOS developers can now build native apps with real-time, on-device audio intelligence: 🗣️ Text-to-Speech (TTS) 👂 Speech-to-Text (STT) 🔄 Speech-to-Speech (STS) 🎙️ Voice Activity Detection (VAD) and more. Only import the capabilities you need, nothing extra. Get started today and leave us a star ⭐️

160,927 views • 4 months ago

Congratulations to the Cohere team on the release of Cohere Transcribe Arabic! 🎉 Runs natively on mlx-audio (Python + Swift) from day-0 🚀
What's inside:
→ 2B params, Conformer encoder-decoder (audio-in, text-out)
→ Large Conformer encoder for acoustic representations + lightweight Transformer decoder for token generation
→ Built for Arabic dialects and Arabic-English code-switching
→ Auto-resampling to 16kHz + stereo→mono handling built into preprocessing
→ Apache 2.0: fully open for community research
Currently topping the Open Universal Arabic ASR Leaderboard 🥇
Get started today:
🐍 Python
uv pip install -U mlx-audio
🍎 Swift

16,454 views • 14 days ago
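
The preprocessing step mentioned in the post (stereo→mono plus resampling to 16kHz) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the model's or mlx-audio's actual preprocessing: the function names are hypothetical, and linear interpolation stands in for the band-limited resampling filters that real audio pipelines normally use.

```python
from typing import List, Sequence, Tuple

TARGET_RATE = 16_000  # Hz, the rate named in the post


def stereo_to_mono(frames: Sequence[Tuple[float, float]]) -> List[float]:
    """Average the left and right channels into a single channel."""
    return [(left + right) / 2.0 for left, right in frames]


def resample_linear(
    samples: Sequence[float], src_rate: int, dst_rate: int = TARGET_RATE
) -> List[float]:
    """Resample by linear interpolation (simple, not band-limited)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate            # position in source samples
        lo = min(int(pos), len(samples) - 1)     # left neighbour
        hi = min(lo + 1, len(samples) - 1)       # right neighbour (clamped)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# Hypothetical input: one second of 48 kHz stereo audio.
stereo = [(1.0, 0.0)] * 48_000
mono_16k = resample_linear(stereo_to_mono(stereo), src_rate=48_000)
print(len(mono_16k))
```

Downmixing before resampling halves the number of samples the resampler has to touch.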

Next mlx-vlm release will ship with continuous batching support on the server 🚀
What's coming:
→ Continuous batching: new requests join the active batch immediately, no waiting. Mixed image + text batches supported
→ OpenAI-compatible API: field-for-field match with mlx-lm, reasoning/content split for thinking models, tag-aware streaming
→ Multi-turn tool calling: full tool use support across streaming and non-streaming, works with Gemma4 and other templates
→ Vision feature caching: cache image embeddings across turns. Gemma4: 228x speedup, Qwen3.5: 23x on cache hit
All running locally on Apple Silicon.
Check out this demo running 4 concurrent requests (mixed image + text) to gemma-4-26B-A4B-IT by Google Gemma in bf16 using Pi + MLX-VLM server on my M3 Ultra. One of the requests ingests an 8K resolution image!

82,169 views • 3 months ago
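
The idea behind vision feature caching (reusing image embeddings across turns instead of re-running the vision encoder) can be sketched as follows. This is a toy illustration, not mlx-vlm's implementation: the class, the SHA-256 cache key, and `fake_encoder` are all hypothetical stand-ins.

```python
import hashlib
from typing import Callable, Dict, List


class VisionFeatureCache:
    """Cache image features keyed by a hash of the raw image bytes."""

    def __init__(self, encoder: Callable[[bytes], List[float]]):
        self._encoder = encoder
        self._cache: Dict[str, List[float]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, image_bytes: bytes) -> List[float]:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._cache:
            self.hits += 1          # same image seen before: skip the encoder
            return self._cache[key]
        self.misses += 1            # new image: run the encoder once and store
        features = self._encoder(image_bytes)
        self._cache[key] = features
        return features


def fake_encoder(image_bytes: bytes) -> List[float]:
    """Hypothetical stand-in for an expensive vision encoder."""
    return [b / 255.0 for b in image_bytes[:4]]


cache = VisionFeatureCache(fake_encoder)
image = b"\x00\x7f\xff\x10 pretend these are image bytes"
cache.get(image)   # first turn: encoder runs
cache.get(image)   # later turn: served from the cache
print(cache.hits, cache.misses)
```

In a multi-turn chat the same image is typically resent with every request, so a cache hit replaces the whole encoder pass with a dictionary lookup.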

Congratulations to the Cohere team on releasing North Mini Code 1.0 🔥 We provide day-0 support on MLX thanks to our close collaboration with the Cohere team.
North Mini Code 1.0 is a 30B param MoE model with 3B active. It runs at ~66 tok/s in BF16, truly impressive speeds before any compression.
Here is a quick demo I built using Pi by Mario Zechner & Armin Ronacher ⇌ as my agent harness.
Get started today:
1. Install mlx-vlm from source:
> uv pip install git+
2. Start the server and power your favourite agents (pi, opencode, hermes, etc.)
> uv run mlx_vlm.server --model CohereLabs/North-Mini-Code-1.0
New release tomorrow

35,367 views • 1 month ago

🎉 Congrats to MOSI + OpenMOSS on the release of MOSS-Transcribe-Diarize-0.9B — a genuinely impressive end-to-end ASR model for real-world, multi-speaker conversations. We're proud to partner with them for day-0 support on mlx-audio (Python & Swift). Now running local-first on Apple Silicon: 🎙️ transcript + speaker labels + timestamps in one pass 🗣️ unlimited-speaker diarization ⏱️ up to ~90 min of audio per input 🌍 90+ languages 🧠 128K context • Whisper-Medium encoder + Qwen3-0.6B decoder 🎯 hotword boosting for names, products & domain terms Perfect for meetings, podcasts, interviews, and long-form call analysis — no cloud, no data leaving your Mac. Get started now: 🐍 Python > uv pip install -U mlx-audio 🍎 Swift .package(url: " from: "0.1.3") Go build. 🚀

11,369 views • 12 days ago

DeepSeek-v4 now runs at ~23-26 tok/s on MLX! I made some custom kernels for the sinkhorn and it took gen speeds from 17 -> 26 tok/s. The weights are also significantly smaller thanks to Pedro Cuenca's tip about keeping the experts in MXFP4! Now you can use it to power your local coding agents (PI, Open code, Hermes agent or even CC) PR:

58,451 views • 2 months ago

Local transcription running on iPad Pro M1 🔥 Qwen3-ASR-0.6B from Qwen — fully on-device, no cloud, no API calls. Built with our new MLX-Audio-Swift SDK, hitting 25 tok/s at just 1.9GB of RAM. Test audio? Found Marc Lou's old sales call from one of his awesome newsletters sitting on my device. Try it out & drop us a ⭐️

81,427 views • 5 months ago

Marvis-TTS-v0.2 is here 🚀 A local first TTS model capable of realtime performance even on older iPhones that Lucas Newman and I built. What’s new: ✨ Blazing fast — 100M (tiny) & 250M parameter models 🌍 Multilingual — English, French, German 🎭 Enhanced voice cloning — More natural & expressive ⚡ Long-form generation — Up to 90 seconds (4x improvement) Get started today: > pip install -U mlx-audio

118,352 views • 8 months ago

You can now vibecode your own WisprFlow or Monologue alternative that runs completely locally on Apple Silicon using MLX-Audio-Swift 🔥 Check out this live transcription of Dwarkesh Patel's interview with Andrej Karpathy using Qwen3-ASR-0.6B quantized to 4bit on an M3 Max. It also runs in realtime on an iPhone 15 Pro and iPad Pro M1. No cloud. No API keys.

60,974 views • 4 months ago

Day 1 of 3 MLX Releases: Introducing MLX-Audio 🚀🔥 A text-to-speech (TTS) and Speech-to-Speech (STS) library built on Apple's MLX framework, providing efficient speech synthesis on Apple Silicon. Features ⚡️Fast inference on Apple Silicon (M series chips) 🤖Multiple language support 🗣️Voice customization options 🚀Quantization support for optimized performance Supported models: 🪶Kokoro - A multilingual TTS model with 82M params that supports various languages and voice styles. With more models coming soon. Get started: > pip install mlx-audio Please leave us a star and send a PR :)

123,480 views • 1 year ago

Introducing Marvis-TTS 🔥🚀 A new local-first TTS model Lucas Newman and I built for efficiency, accessibility, and real-time performance right on consumer devices like Apple Silicon, iPhones, iPads, and more. Traditional TTS models often demand full text inputs or sacrifice real-time capabilities. Marvis flips the script. It streams audio chunks as text is processed, creating a truly conversational experience. No more awkward pauses or unnatural breaks: Marvis handles the entire text context intelligently to deliver coherent, expressive speech. Get started today: > pip install -U mlx-audio

81,345 views • 10 months ago

On-device realtime transcription on iPhone 15 Pro Max 🚀 Using MLX-Audio-Swift + Qwen3-ASR-0.6B by Qwen. It’s much faster and more consistent with the latest adjustments. Almost ready to push to GH.

44,103 views • 5 months ago

RF-DETR by Roboflow now on MLX. It can do realtime instance segmentation on-device and enable some cool use cases for visual analysis, monitoring and robotics like Reachy Mini. It can also augment VLMs and VLAs by preprocessing images and video with areas of interest. New release coming soon on mlx-vlm 🚀 For those who can’t wait, you can install mlx-vlm from source.

30,499 views • 3 months ago

Jarvis can speak! 🚀 I’m running Chatterbox-Turbo from Resemble AI on my Mac using MLX-Audio as a server. Now I’m gonna refine it and share it later 🔥 PS: don’t mind my voice, it just came back 2 days ago 😅 Repo:

47,714 views • 7 months ago

DeepSeek OCR now on MLX :) Thanks for the patience! I finally found and fixed the bug in the LM part that was stopping it from understanding the vision tokens. New release later today, probably alongside batch inference to allow you to process N documents at once.

56,286 views • 8 months ago

Introducing MLX-Audio Studio 🚀 An open-source UI for audio gen. This new UI will allow you to easily generate and transcribe audio locally using MLX-Audio, Transformers or any other backend you prefer (e.g. OpenAI). We will be adding more tasks soon, stay tuned! Get started on our GH:

51,071 views • 8 months ago

🗓️ Release Week Recap
Big week. mlx-audio and mlx-vlm are now among the fastest-growing OSS projects. Here’s what we shipped last week.
Gemma 4 on Apple Silicon
Two awesome releases by our partner Google DeepMind & Google Gemma:
> Gemma 4 12B: their new dense, unified multimodal model. It uses an encoder-free audio path and a simplified vision encoder.
> Gemma 4 QAT: quantization-aware training checkpoints, optimized to run locally on consumer GPUs and edge devices, compressing the model while preserving the quality you expect from Gemma 4.
On the audio 🎧 side we added support for 15+ new TTS, ASR & VAD models, faster long-form transcription, and an expanded OpenAI-compatible audio server. All local on Apple Silicon.
Huge thanks to every contributor and my co-maintainer Lucas Newman. 🙏🏽

11,731 views • 1 month ago