
Ivan Fioravanti ᯅ
@ivanfioravanti • 22,336 subscribers
GenAI/LLM addicted, Apple MLX, Cloud computing, Kubernetes, Technology Advisor, Investor and Co-Founder & Board Member of CoreView.
Shorts
Videos

I'm sure many already knows: you can use DeepSeek V3 in Cursor right now: - add as OpenAI Base URL - use deepseek/deepseek-chat as model - use it in chat (not in composer) Here the combo created a snake game eating AI companies. 3 turns needed to get it working 1/2 🧵 Sonnet below
Ivan Fioravanti ᯅ153,359 次观看 • 1 年前

First video about MTPLX it works! 🔥 Gonna do my context benchmarks and some coding on it ASAP! In my very first test here: 🥇 M3 Ultra on the right: 33 --> 68 tps 🥈 M5 Max on the left: 27 --> 55 tps But M5 Max has 2 external monitors connected and is doing ton of stuff in parallel. Amazing job by Youssof Altoukhi here!
Ivan Fioravanti ᯅ14,102 次观看 • 1 个月前

Kimi K2.5 - Same Prompt - OpenCode vs KIMI CLI vs Claude Code, any differences? 🤔 Create a single-page website for "PHANTOM PROTOCOL" a fictional tactical shooter video game. Design capabilities of this model are out of scale! 🥇 Final results at the end of the video. 🧵
Ivan Fioravanti ᯅ44,652 次观看 • 4 个月前

oMLX is working really well as single machine inference engine for coding agents! Caching is managed perfectly (it can use a ton of disk space, be aware!) and oQ quantization delivers great results. Behind the scenes it uses the standard MLX building blocks (75% created by Prince Canuma 🙏): - mlx-lm - mlx-vlm - mlx-embeddings - mlx-audio I tested Qwen3.6-35B-A3B-oQ6 on M5 Max with two pi instances and it was fast and furious and leveraging cache like crazy as you can see in the video. Let me try to create some oQ versions (2,4,6?) of MiniMax M2.7 now and then I'll pass to distributed inference. I must win! 💪
Ivan Fioravanti ᯅ19,258 次观看 • 1 个月前

I did it! It works! Using GLM-4.7-4bit with mlx_lm.server and opencode to fix real code locally! 🔥 Here single M3 Ultra 512GB, nex step phase will be 2 using Tensor Parallelism and then apply same changes to exo. Prefill is slow on a single machine, but generation is good.
Ivan Fioravanti ᯅ44,000 次观看 • 5 个月前
