Ivan Fioravanti ᯅ

@ivanfioravanti · 22,336 subscribers

GenAI/LLM addicted, Apple MLX, Cloud computing, Kubernetes, Technology Advisor, Investor and Co-Founder & Board Member of CoreView.

Shorts

MLX MiniMax 2.5 running LOCALLY on a single M3 Ultra 512GB! Writing a poem on LLMs at 6bit quantization! 🔥 Let's start some coding, context and distributed tests! Generation: 40.2 tokens-per-sec Peak memory: 186 GB

226,052 views

It worked! Hermes Agent + Exo + Qwen3 Coder Next 8bit to create an incredible snake game, with the model following 100% of the specifications passed in the prompt! Let's load something bigger now!

159,666 views

GLM-4.7 Flash by Z.ai running on M3 Ultra using MLX! 4bit at 81 toks/s 🔥 8bit at 64 toks/s 🔥 (video below) Model converted from transformers to MLX on the fly with Codex + GPT-5.2 High!

75,425 views

This is on M3 Ultra; M5 Ultra will be crazy!

98,227 views

First real test on M4 Max 40GPU to transcribe 3 hours of video with mlx_whisper 0.4.1 - M4 Max 2:19 mins (Max Fan, no throttling) - M4 Max 2:29 mins (System Fan, throttling towards the end) GPU freq goes 1.578 --> 952 Here is the video (8x) showing throttling kicking in at the end.

216,328 views

Qwen 3 0.6B is Ultra powerful if fine-tuned! Here I got 83.5% on classification tasks! Only Gemini 2.5 Pro does better with 85%! Video is in real time on M3 Ultra!

143,105 views

Another incredible p5js animation created by the new King: Sonnet 3.7 Thinking! It's a kind of magic!

137,251 views

Eulerian Fluid simulation test! Zero-shot! Opus 4.6 vs GPT-5.3 vs Gemini 3 Deep Think! My personal preference: 🥇 Gemini 3 Deep Think (really strong!) 🥈 Opus 4.6 🥉 GPT 5.3 High

38,277 views

GLM-4.7 and MiniMax M2.1 side by side, both powered by Claude Code! Prompt below. Trick: one settings file per provider: - claude --settings ~/.claude/minimax_settings.json - claude --settings ~/.claude/zai-settings.json

44,866 views

Playing with Qwen3-TTS and MLX-Audio locally on Mac Studio M3 Ultra 🔥 Amazing model by Qwen and great work by Prince Canuma bringing this magic to MLX! Tuning the voice with instructions feels like magic! Command and prompts to run it are in the video.

35,720 views

I'm finally entering the Google Gemini world too! I subscribed to Ultra so I could test Deep Think! But I hit a wall immediately 😢

27,606 views

Qwen-Image-Lightning on Apple Silicon! M3 Ultra 9 mins --> 42 seconds! 🔥 M4 Max 11 mins --> 1:27 mins! 🤩 1st try: MPS + float32 11 mins 2nd try: MPS + bfloat16 9 mins 3rd try: MPS + bfloat16 + Qwen-Image-Lightning LoRA 42 seconds!!! 🤯 Here is a video at 8x

51,001 views

Qwen QwQ 32B fp16 on M4 Max and M2 Ultra powered by MLX! M2 Ultra - 10.2 toks/s M4 Max - 7.6 toks/s! "Create an amazing animation using p5js" o1-mini level local model! Note: use temp 0.7-0.75 for optimal results in coding. I did some tests and this

62,377 views

DeepSeek R1 Qwen 7B 4bit M2 Ultra vs M4 Max on Apple MLX 🤫 Let them think... (video 4x in center part) M2 Ultra: 114.9 tokens per sec M4 Max (14"): 88.3 tokens per sec

59,734 views

I love ToolCall-15 so much that I'm working on it to add some features: - mlx and lm-studio providers - batch calling - config params to test temp, top-p, top-k, min-p - quantization display in UI model name Here it is running on M5 Max 128GB 🚀 Thanks stevibe, you rock!

10,916 views

🔥Apple MLX first 6bit model is on Hugging Face!🔥 Qwen2.5-Coder-32B-Instruct-6bit! 3bit conversion and test in progress! Video 8x below on M4 Max 40GPU: - Prompt: 38 tokens, 61.731 tokens-per-sec - Generation: 1181 tokens, 16.939 tokens-per-sec - Peak memory: 25.122 GB

46,494 views

MLX Step-3.5-Flash! I've reached 45 toks/s! Thanks to the Fast-MLX skill by Awni Hannun, used within Codex with GPT 5.2 High! From 13 toks/s v0 to 45 toks/s v2 🚀 but again I bet Tarjei Mandt will do even better 🙌🏻

13,933 views

Here is Qwen 3 0.6B again, on Italian wines: 82.5%! The best commercial LLM I tested in the past scored 75.6% (Sonnet 3.5)

25,367 views

Videos