🔥 Thrilled to have worked with Google AI Developers on day-0 MLX support for Gemma 3 QAT! QAT optimizes models during training by simulating low-precision operations, delivering similar performance to FP16 and dramatic memory savings when quantised:
• Gemma 3 27B: 54GB → 14.1GB (74% reduction)
• Gemma 3...

Prince Canuma

5,996 subscribers

11,374 views • 1 year ago • via X (Twitter)
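The post above describes QAT as simulating low-precision operations during training. A minimal sketch of that idea is a "fake quantization" round trip: weights are rounded onto a small integer grid and mapped back to floats, so the rest of the computation sees the rounding error. The per-tensor symmetric scaling, the function names, and the example weights are illustrative assumptions, not the recipe used for Gemma 3.

```python
def quantize(weights, bits=4):
    """Map floats to signed integers on a symmetric grid; return (ints, scale)."""
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest grid point and clamp to the representable range.
    ints = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return ints, scale


def fake_quantize(weights, bits=4):
    """Quantize then dequantize, so later computation sees the rounding error."""
    ints, scale = quantize(weights, bits)
    return [q * scale for q in ints]


weights = [0.70, -0.33, 0.12, 0.02]       # made-up example values
ints, scale = quantize(weights)
simulated = fake_quantize(weights)
max_error = max(abs(w - s) for w, s in zip(weights, simulated))
print(ints, simulated, max_error)
```

With round-to-nearest, the error on any weight inside the grid's range is at most half a grid step (`scale / 2`). A training loop that runs its forward pass on `simulated` instead of `weights` is what lets a model adapt to that error.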

Education Science & Technology Art

10 comments

Prince Canuma • 1 year ago

Thanks @osanseviero @reach_vb and the teams behind this amazing release ❤️

Mag Mario Jembrih • 1 year ago

@googleaidevs not seeing it in LM Studio, will it show up there too?

Prince Canuma • 1 year ago

@googleaidevs Probably an update? @yagilb

Q • 1 year ago

@googleaidevs Thank you for the efforts. The Kimi thinking was also great and fast, but it had some registration problems that needed to be bypassed with custom code; if I'm not wrong, they still exist. But cheers for the efforts and speed.

Prince Canuma • 1 year ago

@googleaidevs Could you share more about it? Perhaps open an issue

Ljubomir Josifovski • 1 year ago

Excellent stuff! About double the speed of a prior model of similar size, from recollection. 🤩 Thanks for that! 🙏 Would you know if we are to expect Speculative Decoding in @lmstudio to work? I got the 27b, then downloaded the 1b and then 4b versions too. Trying to get them to show up in the "Speculative Decoding" "Draft Model" "Select a compatible draft model" dropdown. So far no luck, none of them show up in the dropdown in @lmstudio (on M2 MBP, 96GB RAM). (pic of the exact model versions below)

Prince Canuma • 1 year ago

@googleaidevs @lmstudio Not yet for VLMs; only if you use them as text models

Jikku Jose • 1 year ago

@googleaidevs Wish there was an intermediate model between 27B & 12B, lots of cards in that range!

Joe Burnett • 1 year ago

@googleaidevs I can’t wait to check it out!

Related videos

We've partnered to bring more Gemma 3 quantized models to you! 🚀 We worked with Georgi Gerganov (llama.cpp), LM Studio, MLX, and ollama to make sure you can run it using your favorite tool! Gemma models optimized with QAT reduce memory requirements while keeping quality! All model checkpoints are available on Hugging Face and Kaggle. 🤗
What does this mean?
- Gemma 3 27B (int4): Fits on NVIDIA RTX 3090 (24GB VRAM) or similar.
- Gemma 3 12B (int4): Only needs an NVIDIA RTX 4060 (8GB VRAM) or similar.
- Gemma 3 4B, 1B (int4): Run anything with more than 2.5GB Memory.
Want to see it in action? Video below shows how easy it is to get started using LMStudio:

Philipp Schmid

15,996 views • 1 year ago
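The VRAM figures in the post above can be sanity-checked with a weights-only estimate: parameter count times bits per weight, divided by eight. This is a rough sketch under stated assumptions: the parameter counts are the nominal ones from the model names, gigabytes are decimal, and the KV cache, activations, and quantization scales are ignored, so real usage will be somewhat higher.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Nominal parameter counts taken from the model names (an assumption).
models = {"Gemma 3 27B": 27e9, "Gemma 3 12B": 12e9}
# VRAM sizes of the cards named in the post.
vram_gb = {"Gemma 3 27B": 24, "Gemma 3 12B": 8}

for name, params in models.items():
    fp16 = weight_memory_gb(params, 16)
    int4 = weight_memory_gb(params, 4)
    fits = int4 < vram_gb[name]
    print(f"{name}: fp16 {fp16:.1f} GB, int4 {int4:.1f} GB, "
          f"weights fit in {vram_gb[name]} GB: {fits}")
```

The FP16 estimate for the 27B model comes out at 54 GB, matching the figure quoted in the main post; the int4 estimate of 13.5 GB is slightly under the quoted 14.1 GB, which is consistent with the overhead this sketch leaves out.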

A long time coming but new mlx-lm is here with better batching support in the server and Gemma 4.
pip install -U mlx-lm
Here is a video where a single M3 Ultra serves 5 opencode sessions with Gemma 4 26B that process ~130k tokens in ~1.5 minutes.

Angelos Katharopoulos

66,128 views • 2 months ago

I’ve been using the new MacBook Pro M5 since launch and it’s a beast, especially compared to my old M1! It’s good for running the smaller AI models locally but also everything you do is just so fast. Here’s Gemma 3 QAT 12B running with MLX:

Adrien Grondin

101,950 views • 7 months ago

Hermes Agent (Nous Research) + Gemma 4 26B (Google DeepMind), fully working on a local Mac 🤯: one prompt → full CLI app + tests + pytest passing, tool calls firing back to back. Local Claude Code vibes.
Gemma 4 MoE is fast AND smart, and tool calling just works, powered by Rapid-MLX, which natively parses Gemma 4's tool format (no other local backend can do this yet).
Two lines to try it:
pip install rapid-mlx & rapid-mlx serve gemma-4-26b
pip install hermes-agent && hermes
@teknaborat @siaborisov #HermesAgent #Gemma4 #LocalLLM

raullen

24,988 views • 2 months ago

🚨 Google Just Made OpenClaw Free (GEMMA 4):
0:00 - Why Gemma 4 matters
0:48 - #3 open model in the world
1:24 - What Gemma 4 actually does
2:01 - What this means for OpenClaw
3:03 - How to set up Gemma 4
3:58 - My honest take after running Claude for 3 months

Sharbel

368,986 views • 2 months ago

Gemma 3 27B + Ollama + Open WebUI + M3 Ultra 🔥 What a model! Great release Google! 👏🏻

Ivan Fioravanti ᯅ

59,594 views • 1 year ago

🔴 The Pain: Running local MLX models is incredibly fast and private. But let's be real - testing tool calling via terminal is clunky, and there's zero good UI for it.
🟢 The Fix: rapid-mlx share
Just ONE command gives you a polished web chat + seamless tool calling (works beautifully with gemma-4-12b-qat). We are proud to be the ONLY MLX inference engine in the community shipping this. ⚡️
👇 Try it now: brew install raullenchai/rapid-mlx/rapid-mlx

raullen

22,680 views • 25 days ago

Gemma 4 just hit 200M downloads in only 2.5 months! For context, total downloads across the entire Gemma family of models were at 100M when we launched Gemma 3. The community's acceleration is incredible. Thank you to everyone building with Gemma. Watch how developers are driving real-world impact:

Google Gemma

200,312 views • 6 days ago

Next level: QLoRA fine-tuning 4-bit Llama 3 8B on iPhone 15 Pro. Incoming (Q)LoRA MLX Swift example by David Koski: works with lots of models (Mistral, Gemma, Phi-2, etc.)

Awni Hannun

581,723 views • 2 years ago

Early WIP port of Gemma 4 multi-token prediction (MTP) on MLX Swift. With MTP, Gemma 31B is 30-40% faster on M5 Max, with zero quality degradation. A significant speedup from just adding a 900MB MTP drafter model.

Adrien Grondin

35,875 views • 1 month ago

Gemma3n + MLX 🚀 Run DeepMind's first open-source Omni Model on Your Mac! Get started: > pip install -U mlx-vlm

Prince Canuma

31,873 views • 1 year ago

have you played with Gemma 3-12B-IT on Hugging Face Spaces yet? 😏 you can also pull this and run locally if you feel like it 🤝 here's me asking Gemma 3 for styling tips (interleaved inference) 👒

merve

17,275 views • 1 year ago

See what’s new in Gemma 3, the latest generation of open models from Google with sizes ranging from 1B to 27B parameters. These versatile models run quickly and efficiently, and scale to developer needs to provide project flexibility.

Google AI Developers

11,391 views • 1 year ago

a new 8GB VRAM GPU dense Local LLM leader was born yesterday
runs on: RTX 4060 / RTX 3070 / RTX 2080. any 8GB card
Qwen 3.5 9B (dense) was the go to for 6-8GB VRAM builds.
Gemma 4 12B QAT (dense) just changed that.
same llama.cpp + cuda 13.2. i7 12700H. 16GB RAM. same -ngl 99 flags. same 48k context.
unsloth gemma-4-12b-it-Q4_K_M.gguf → 15 tok/sec @ 48k ctx
unsloth gemma-4-12B-it-qat-UD-Q4_K_XL.gguf → 32 tok/sec @ 48k ctx → 26 tok/sec @ 64k ctx
64k context is a big deal. Hermes 3 agent requires 64k minimum to run.
you're now getting full hermes compatible context on a budget consumer GPU at 26 tok/sec locally.
2.1x faster on identical hardware.
and here's the part that breaks your brain: the QAT-UD-Q4_K_XL is actually SMALLER than the Q4_K_M "XL"
why? QAT = Quantization Aware Training
Google didn't train the model first and compress it later
they trained it to be quantized from day one
the weights already know how to survive low precision
that's why you get more quality per byte
llamacpp flags: -m gemma-4-12B-it-qat-UD-Q4_K_XL.gguf -cnv -ngl 99 -c 48000 -v
fits in 8GB VRAM clean. no API. no cloud. no subscription.
and this isn't even the MTP variant yet
Gemma-4-E2B QAT runs on 3GB RAM, E4B on 5GB, 12B on 7GB, 26-A4B on 15GB and 31B on 18GB.
I have benchmarked the 26b and 31b qat as well on a single RTX 4090, check out the comments for details.
If you have a 6GB or 8GB VRAM GPU, post your numbers.
more benchmarks and configs coming soon

Alok

259,993 views • 25 days ago
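The "2.1x faster" claim in the post above follows directly from the two throughput figures it quotes at 48k context. A short sketch of that arithmetic, using only numbers from the post (the function and variable names are illustrative):

```python
def speedup(new_tok_per_sec, old_tok_per_sec):
    """Ratio of two throughput measurements."""
    return new_tok_per_sec / old_tok_per_sec


baseline_48k = 15   # Q4_K_M, tok/sec at 48k context (from the post)
qat_48k = 32        # QAT UD-Q4_K_XL, tok/sec at 48k context (from the post)
qat_64k = 26        # QAT UD-Q4_K_XL, tok/sec at 64k context (from the post)

print(f"QAT vs baseline at 48k ctx: {speedup(qat_48k, baseline_48k):.1f}x")
print(f"QAT at 64k vs baseline at 48k: {speedup(qat_64k, baseline_48k):.1f}x")
```

The post gives no baseline figure at 64k context, so the second ratio compares across different context lengths and is only indicative.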

This video is at normal speed. Gemma 4 12B MLX version running locally at 50 tokens/sec. Thank you Google DeepMind team. This model feels really solid for a lot of small local tasks and everyday AI workflows.

AshutoshShrivastava

19,782 views • 27 days ago

MARK AND GEMMA NATION UP 3

leia 🧚‍♀️

15,577 views • 1 year ago

Thank you Google for demoing how to fine-tune Gemma 3 with Unsloth, free on Colab! 🦥 #GoogleIO

Unsloth AI

49,143 views • 1 year ago

Unlock local, agentic workflows with Gemma 4 12B and Google AI Edge, directly on your laptop. Experience 100% on-device AI:
• Generate code in AI Edge Gallery (new to Mac)
• Dictate and edit text via AI Edge Eloquent (new to Mac)
• Serve Gemma 4 12B locally with LiteRT-LM
Dive in:

Google for Developers

139,264 views • 28 days ago

3️⃣ ways to run Gemma 3. Get a Gemma model up and running fast using Keras, Ollama, and the GenAI SDK in this demo from Developer Relations Engineer Marina Coelho.

Google AI Developers

25,192 views • 1 year ago

Gemma 3 + Ollama + N8N + Open Router 🔥 What a model! Great release Google! 👏🏻
- Runs locally via Ollama
- Outperforms Claude Mini & DeepSeek
- Available in 1B, 4B, 12B & 27B sizes
- 128k context window
- Free API access
Save this video; it will save you hundreds on AI costs. 🚀 Want the SOP? DM me.

Julian Goldie SEO

140,500 views • 1 year ago