Video yükleniyor...

Video Yüklenemedi

Bu video yüklenirken bir sorun oluştu. Bu geçici bir ağ sorunundan kaynaklanıyor olabilir veya video kullanılamıyor olabilir.

Ana Sayfaya Dön

I did it! It works! Using GLM-4.7-4bit with mlx_lm.server and opencode to fix real code locally! 🔥 Here single M3 Ultra 512GB, nex step phase will be 2 using Tensor Parallelism and then apply same changes to exo. Prefill is slow on a single machine, but generation is good.

Ivan Fioravanti ᯅ

18,834 subscribers

44,000 görüntüleme • 6 ay önce •via X (Twitter)

Anya Rossi• Live Now

Private livecam show

0 Yorum

Yorum bulunmuyor

Orijinal gönderinin yorumları burada görünecek

Benzer Videolar

GLM-4.7-8bit (350GB) running at 19 toks/s on two M3 Ultra 512GB using Tensor Parallelism with EXO - MLX, versus 14 toks/s with single node. 🚀 Now context benchmarking & then OpenCode tests 🔥 Note: this is from sources, I had to change things to run it.

GLM-4.7-8bit (350GB) running at 19 toks/s on two M3 Ultra 512GB using Tensor Parallelism with EXO - MLX, versus 14 toks/s with single node. 🚀 Now context benchmarking & then OpenCode tests 🔥 Note: this is from sources, I had to change things to run it.

Ivan Fioravanti ᯅ

327,687 görüntüleme • 6 ay önce

Running four high-level OpenCode agents + subagents with mlx_lm.server continuous batching and MiniMax M2.5 (6-bit). Fits easily on a 512GB M3 Ultra. Generation is quite fast. But prefill is still slow compared to cloud servers.

Running four high-level OpenCode agents + subagents with mlx_lm.server continuous batching and MiniMax M2.5 (6-bit). Fits easily on a 512GB M3 Ultra. Generation is quite fast. But prefill is still slow compared to cloud servers.

Awni Hannun

25,535 görüntüleme • 5 ay önce

First kinda working implementation of GLM 5.2 in DwarfStar. Will take some time to be good enough, but it is a promising start. 433 GB GGUF file. M3 Ultra 512GB.

First kinda working implementation of GLM 5.2 in DwarfStar. Will take some time to be good enough, but it is a promising start. 433 GB GGUF file. M3 Ultra 512GB.

antirez

69,419 görüntüleme • 1 ay önce

GLM-5 runs with mlx-lm on a single 512GB M3 Ultra in Q4. It's quite good in my initial testing and pretty fast as well. It generated a highly functional space invaders game using 7.1k tokens at 15.4 tok/s and 419GB memory. Thanks to Gökdeniz Gülmez and Tarjei Mandt for the port.

GLM-5 runs with mlx-lm on a single 512GB M3 Ultra in Q4. It's quite good in my initial testing and pretty fast as well. It generated a highly functional space invaders game using 7.1k tokens at 15.4 tok/s and 419GB memory. Thanks to Gökdeniz Gülmez and Tarjei Mandt for the port.

Awni Hannun

60,599 görüntüleme • 5 ay önce

MLX GLM 5.2 Distributed on two M3 Ultra 512GB 🔥 One M3 Ultra: 18.8 tokens/sec Two M3 Ultra: 23.4 tokens/sec Context: - PR by Pedro Cuenca is still open and probably there is room for improvement: - basic generation test to measure decoding performance here, I will do a full context benchmarking once PR is more mature - nvfp4 quantization used - Video alternates standard speed and x20, with one Mac first and distributed later. Enjoy! 🙌🏻

MLX GLM 5.2 Distributed on two M3 Ultra 512GB 🔥 One M3 Ultra: 18.8 tokens/sec Two M3 Ultra: 23.4 tokens/sec Context: - PR by Pedro Cuenca is still open and probably there is room for improvement: - basic generation test to measure decoding performance here, I will do a full context benchmarking once PR is more mature - nvfp4 quantization used - Video alternates standard speed and x20, with one Mac first and distributed later. Enjoy! 🙌🏻

Ivan Fioravanti ᯅ

87,805 görüntüleme • 1 ay önce

Marketing will never be the same I made 10 ads in 10 minutes using ChatGPT's new native image generation feature to create: • Ads • Memes • Infographics • Cheatsheets • And so much more All it takes is one prompt Here's how I made each and every single ad:

Marketing will never be the same I made 10 ads in 10 minutes using ChatGPT's new native image generation feature to create: • Ads • Memes • Infographics • Cheatsheets • And so much more All it takes is one prompt Here's how I made each and every single ad:

Zain Kahn

35,342 görüntüleme • 1 yıl önce

I asked my clawdbot to send a letter to me in the mail… and it actually did it. I gave it a crypto wallet and some USDC to make purchases with, and clawdbot went and read the agent documentation on postalform. It figured out how to draft an order with a PDF of the letter it wrote to me – then it used Stripe’s Purl cli and paid for the order using Stripe's new Machine Payments protocol. This is a huge step forward for real-world agentic task completion and commerce

I asked my clawdbot to send a letter to me in the mail… and it actually did it. I gave it a crypto wallet and some USDC to make purchases with, and clawdbot went and read the agent documentation on postalform. It figured out how to draft an order with a PDF of the letter it wrote to me – then it used Stripe’s Purl cli and paid for the order using Stripe's new Machine Payments protocol. This is a huge step forward for real-world agentic task completion and commerce

Gabriel Garrett

36,135 görüntüleme • 5 ay önce

MiniMax-M2.1 running fully local in AWQ-4Bit with full context window (170 GB VRAM w full context) - 1000~ to 16,000~ tps prefill - 100~ tps generation speeds - Opencode It’s doing real work, updating my blog with little steering or specificity. The problem with local LLMs is that they require too much steering, this means baby sitting which I don’t have the time to do MiniMax cracked the cost, intelligence, and speed challenge, I would say this is a top tier model. I run frontier models like Gemini and it just fails to call tools, in this year lol… ——————— I think glm-4.?-air is needed still. We need a viable model at each hardware entry point, a Mac M1 Ultra 192GB? is relatively cheap 5k to be able to run this model at 40 tps is a huge societal unlock. Smaller models can be good but size matters :p

MiniMax-M2.1 running fully local in AWQ-4Bit with full context window (170 GB VRAM w full context) - 1000~ to 16,000~ tps prefill - 100~ tps generation speeds - Opencode It’s doing real work, updating my blog with little steering or specificity. The problem with local LLMs is that they require too much steering, this means baby sitting which I don’t have the time to do MiniMax cracked the cost, intelligence, and speed challenge, I would say this is a top tier model. I run frontier models like Gemini and it just fails to call tools, in this year lol… ——————— I think glm-4.?-air is needed still. We need a viable model at each hardware entry point, a Mac M1 Ultra 192GB? is relatively cheap 5k to be able to run this model at 40 tps is a huge societal unlock. Smaller models can be good but size matters :p

0xSero

23,836 görüntüleme • 6 ay önce

You can now vibecode your own WisprFlow or Monologue alternative that runs completely locally on Apple Silicon using MLX-Audio-Swift 🔥 Check out this live transcription of Dwarkesh Patel interview with Andrej Karpathy using Qwen3-ASR-0.6B quantized to 4bit on a M3 Max. It also runs in realtime on a iPhone 15 Pro and iPad Pro M1. No cloud. No API keys.

You can now vibecode your own WisprFlow or Monologue alternative that runs completely locally on Apple Silicon using MLX-Audio-Swift 🔥 Check out this live transcription of Dwarkesh Patel interview with Andrej Karpathy using Qwen3-ASR-0.6B quantized to 4bit on a M3 Max. It also runs in realtime on a iPhone 15 Pro and iPad Pro M1. No cloud. No API keys.

Prince Canuma

60,974 görüntüleme • 5 ay önce

opencode making a pong game in vite+react using (4bit) qwen/qwen3-235b-a22b-2507 locally, served by lmstudio. It used like 130GB of RAM, 0 issues with tool calls. This is completely usable locally now. Whether it's at claude level or not, idk yet, but I've no doubt we'll be there very soon if not. Anthropic better run fast to the Saudis for help.

opencode making a pong game in vite+react using (4bit) qwen/qwen3-235b-a22b-2507 locally, served by lmstudio. It used like 130GB of RAM, 0 issues with tool calls. This is completely usable locally now. Whether it's at claude level or not, idk yet, but I've no doubt we'll be there very soon if not. Anthropic better run fast to the Saudis for help.

SIGKITTEN

30,470 görüntüleme • 1 yıl önce

It's a bit early but still wanted to show Starter Kit City Builder for Godot Engine! The code is super simple, it works using Godot's GridMap component (basically tilemap for 3D objects). Will be free, open source and will come with new game assets!

It's a bit early but still wanted to show Starter Kit City Builder for Godot Engine! The code is super simple, it works using Godot's GridMap component (basically tilemap for 3D objects). Will be free, open source and will come with new game assets!

Kenney

104,858 görüntüleme • 2 yıl önce

For the chest and lapels, these layers can be attached to each other using a single-needle roll-padding machine, such as you see here. This is what you'll typically see on factory-made suits (this is a Strobel KA-ED machine). Happens both on the low- and high-end.

For the chest and lapels, these layers can be attached to each other using a single-needle roll-padding machine, such as you see here. This is what you'll typically see on factory-made suits (this is a Strobel KA-ED machine). Happens both on the low- and high-end.

derek guy

257,848 görüntüleme • 1 yıl önce

A 100% open-source alternative to n8n! Sim is a drag-and-drop UI for creating powerful AI agent workflows: - Runs locally on your machine - Works with local LLMs I built a stock market research agent & connected it to Telegram in minutes. Here's a step-by-step guide:

A 100% open-source alternative to n8n! Sim is a drag-and-drop UI for creating powerful AI agent workflows: - Runs locally on your machine - Works with local LLMs I built a stock market research agent & connected it to Telegram in minutes. Here's a step-by-step guide:

Akshay 🚀

176,428 görüntüleme • 7 ay önce

Fuck it, 685B parameter, DeepSeek V3 0324 running locally on M3 Ultra, fully private 🔥 Powered by llama.cpp & dynamic quants from Unsloth AI ⚡ Step 1: brew install llama.cpp Step 2: llama-cli -hf unsloth/DeepSeek-V3-0324-GGUF:Q2_K_XL That's it! 🤗 Honestly a bit surreal to be able to chat with such a chunky model at the touch of the keyboard - future is going to be wild!!

Fuck it, 685B parameter, DeepSeek V3 0324 running locally on M3 Ultra, fully private 🔥 Powered by llama.cpp & dynamic quants from Unsloth AI ⚡ Step 1: brew install llama.cpp Step 2: llama-cli -hf unsloth/DeepSeek-V3-0324-GGUF:Q2_K_XL That's it! 🤗 Honestly a bit surreal to be able to chat with such a chunky model at the touch of the keyboard - future is going to be wild!!

Vaibhav (VB) Srivastav

168,141 görüntüleme • 1 yıl önce

I’m building an assistant from scratch using OpenCode first step is iMessage integration. I could run the OpenCode webapp on my phone, but I want a group text with my wife where we can ask it stuff together Here’s a demo, so cool that I can open the same session in the TUI (filmed by my 8 yr old, be gracious)

I’m building an assistant from scratch using OpenCode first step is iMessage integration. I could run the OpenCode webapp on my phone, but I want a group text with my wife where we can ask it stuff together Here’s a demo, so cool that I can open the same session in the TUI (filmed by my 8 yr old, be gracious)

James Long

11,433 görüntüleme • 5 ay önce

🚨Notice! I highly recommend using the updated Smart Summons with FSD 14.3.3 It works great. It is very confident. It moves faster. It also stops nice and smooth now. I think you will find yourself using it all the time as you get used to it. I’ve tested it a few times today and very impressed. Here are few examples.

🚨Notice! I highly recommend using the updated Smart Summons with FSD 14.3.3 It works great. It is very confident. It moves faster. It also stops nice and smooth now. I think you will find yourself using it all the time as you get used to it. I’ve tested it a few times today and very impressed. Here are few examples.

Scott Dembraski

11,210 görüntüleme • 1 ay önce

I think I discovered the cheapest way to code with AI 😁 I tried GLM 4.5 on Claude Code and I'm really amazed how good this model is! I usually test with my nextjs crm app benchmark so I asked it to do that. Prompt "create a nextjs app which is a crm app. use sqlite for the database and use api routes to manage connecting to the database from the frontend" My observations - Generation feels like Sonnet 4 intelligence - Speed is great - It used create-next-app so it bootstrapped nextjs v15.5.2 - It created a sqlite database and used Prisma ORM and helped me easily add seed data - I needed a few prompts to iron out minor issues but all's good. It didn't struggle or got stuck. - Spent $0.50 for this app. Could be way cheaper with the Z.ai sub plans After trying, I can say GLM 4.5 model is up there as the best open source model for coding. Will continue using it and update here. So far so good! P.S. DM me if you want my referral link, otherwise just check Not dropping the link here so it doesn’t look like I’m hyping it. Here's a video of the experience 👇(sped up a bit)

I think I discovered the cheapest way to code with AI 😁 I tried GLM 4.5 on Claude Code and I'm really amazed how good this model is! I usually test with my nextjs crm app benchmark so I asked it to do that. Prompt "create a nextjs app which is a crm app. use sqlite for the database and use api routes to manage connecting to the database from the frontend" My observations - Generation feels like Sonnet 4 intelligence - Speed is great - It used create-next-app so it bootstrapped nextjs v15.5.2 - It created a sqlite database and used Prisma ORM and helped me easily add seed data - I needed a few prompts to iron out minor issues but all's good. It didn't struggle or got stuck. - Spent $0.50 for this app. Could be way cheaper with the Z.ai sub plans After trying, I can say GLM 4.5 model is up there as the best open source model for coding. Will continue using it and update here. So far so good! P.S. DM me if you want my referral link, otherwise just check Not dropping the link here so it doesn’t look like I’m hyping it. Here's a video of the experience 👇(sped up a bit)

Melvin Vivas

45,749 görüntüleme • 10 ay önce

You can use a 100% open source and MUCH cheaper/free alternative to Claude Code and Opus 4.5 OpenCode + MiniMax M2.1 can even build a 3D website using pure vibe coding. Steps are really simple: 1. Install OpenCode using the command 'npm i -g opencode-ai' 2. Get your MiniMax API key here: 3. Configure the MiniMax (official) mode Just type "opencode connect minimax" Coding plans start at $2… 10x cheaper than Claude Code (yes). You can also invite friends so they can get 10% off and you’ll get 10% API credits. (You can also use it locally if you have the config) And you're ready to build and iterate almost endlessly since the model is both way faster and cheaper than Opus in CC.

You can use a 100% open source and MUCH cheaper/free alternative to Claude Code and Opus 4.5 OpenCode + MiniMax M2.1 can even build a 3D website using pure vibe coding. Steps are really simple: 1. Install OpenCode using the command 'npm i -g opencode-ai' 2. Get your MiniMax API key here: 3. Configure the MiniMax (official) mode Just type "opencode connect minimax" Coding plans start at $2… 10x cheaper than Claude Code (yes). You can also invite friends so they can get 10% off and you’ll get 10% API credits. (You can also use it locally if you have the config) And you're ready to build and iterate almost endlessly since the model is both way faster and cheaper than Opus in CC.

Paul Couvert

42,443 görüntüleme • 6 ay önce

Need to chamber a round on an AK with a snapped-off charging handle? No problem. This is how you rack an AK using pure inertia. I don't know how safe it actually is, but it definitely works. Apparently, this is a known battlefield fix across Russia, Pakistan, Afghanistan, and the Middle East.

Need to chamber a round on an AK with a snapped-off charging handle? No problem. This is how you rack an AK using pure inertia. I don't know how safe it actually is, but it definitely works. Apparently, this is a known battlefield fix across Russia, Pakistan, Afghanistan, and the Middle East.

Gun Lovers Club

215,207 görüntüleme • 26 gün önce