Georgi Gerganov

@ggerganov • 50,809 subscribers

24th at the Electrica puzzle challenge | https://t.co/baTQS2bdia

Shorts

HuggingFace just shipped in-browser GGUF editing. It allows you to edit GGUF metadata in the comfort of your browser, without even having to download the full model. This feature is enabled by the Xet technology, which makes partial file updates possible.

37,314 views

Videos

Today I was sent the following cool demo: Two AI agents on a phone call realize they’re both AI and switch to a superior audio signal ggwave

Georgi Gerganov

17,322,144 views • 1 year ago

Let me demonstrate the true power of llama.cpp:
- Running on Mac Studio M2 Ultra (3 years old)
- Gemma 4 26B A4B Q8_0 (full quality)
- Built-in WebUI (ships with llama.cpp)
- MCP support out of the box (web-search, HF, github, etc.)
- Prompt speculative decoding
The result: 300t/s (realtime video)

Georgi Gerganov

783,999 views • 3 months ago

Pro tip - hook your PC and Phone with Tailscale and enjoy fast and private inference on the go. Here is Gemma 4, hosted on Mac Studio, streaming to my iPhone. No 3rd party apps. Same WebUI experience everywhere.

Georgi Gerganov

370,956 views • 3 months ago

Introducing LLaMA voice chat! 🦙 You can run this locally on an M1 Pro

Georgi Gerganov

1,681,888 views • 3 years ago

Full F16 precision 34B Code Llama at >20 t/s on M2 Ultra

Georgi Gerganov

1,158,014 views • 2 years ago

Casually running a 180B parameter LLM on M2 Ultra

Georgi Gerganov

683,236 views • 2 years ago

Here is another Gibberlink experiment: Two AI agents autonomously encrypt their audio chat (video by Anton Pidkuiko)

Georgi Gerganov

270,537 views • 1 year ago

DeepSeek-R1 on Mac Studio 192GB 🪄

Georgi Gerganov

259,545 views • 1 year ago

LLaMA voice chat + Siri TTS. This example is now truly 100% offline, since it uses the built-in Siri text-to-speech available on macOS through the "say" command.

Georgi Gerganov

412,007 views • 3 years ago

Casually running Grok-1 at home

Georgi Gerganov

276,954 views • 2 years ago

sam.cpp 👀 Inference of Meta's Segment Anything Model on the CPU Project by Yavor Ivanov - powered by

Georgi Gerganov

264,542 views • 2 years ago

The example below uses prompt-based speculative decoding. Specifically, ngram hashing is used to suggest drafts of up to 64 tokens. The hasher keeps track of ngrams in the observed contexts, so it is mostly effective for coding tasks. Here is another demo:

Georgi Gerganov

29,815 views • 3 months ago
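
To make the idea in the post above concrete, here is a minimal Python sketch of ngram-based draft suggestion. It is an illustration under stated assumptions, not llama.cpp's implementation: the `NgramDrafter` class, the `accept_prefix` helper, the ngram length of 3, and the plain dictionary standing in for the hasher are all hypothetical. Only the 64-token draft limit comes from the post.

```python
class NgramDrafter:
    """Hypothetical drafter: suggests tokens by looking up previously seen ngrams."""

    def __init__(self, n=3, max_draft=64):
        self.n = n                  # ngram length (assumed value)
        self.max_draft = max_draft  # draft limit; 64 is the figure from the post
        self.table = {}             # ngram (tuple of tokens) -> token that last followed it

    def observe(self, tokens):
        """Record which token followed each ngram in an observed context."""
        for i in range(len(tokens) - self.n):
            key = tuple(tokens[i:i + self.n])
            self.table[key] = tokens[i + self.n]

    def draft(self, context):
        """Chain table lookups from the end of the context to build a draft."""
        out = []
        window = list(context[-self.n:])
        while len(out) < self.max_draft:
            nxt = self.table.get(tuple(window))
            if nxt is None:
                break  # this ngram was never observed, so stop drafting
            out.append(nxt)
            window = window[1:] + [nxt]
        return out


def accept_prefix(draft, verify_next, context):
    """Keep draft tokens for as long as the target model agrees with them.

    verify_next is a stand-in for the real model: it takes the context and
    returns the model's next token. A real engine would check the whole draft
    in one batched pass; this loop only illustrates the acceptance rule.
    """
    accepted = []
    ctx = list(context)
    for tok in draft:
        if verify_next(ctx) != tok:
            break
        accepted.append(tok)
        ctx.append(tok)
    return accepted
```

A draft only appears when the current context ends in an ngram that was seen before, which fits the post's remark that the approach works best on repetitive text such as code.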

Native whisper.cpp server with OAI-like API is now available
$ make server && ./server
This is a very convenient way to run an efficient transcription service locally on any kind of hardware (CPU, GPU (CUDA or Metal), or ANE). Thx felrock

Georgi Gerganov

171,380 views • 2 years ago

Here is how to deploy and serve any LLM on HF with a single command in less than 3 minutes with llama.cpp $ bash -c "$(curl -s

Georgi Gerganov

143,140 views • 2 years ago

llama.vscode (powered by Qwen Coder)

Georgi Gerganov

77,658 views • 1 year ago

GGUF My Repo by Hugging Face. Create quantum GGUF models fully online, quickly and securely. Thanks to Vaibhav (VB) Srivastav, Pedro Cuenca and team for creating this HF space! In the video below I try it out by creating a quantum 8-bit model of Gemma 2B, which took about 60 seconds. The resulting model automatically becomes available in your HF profile and is ready to be used with llama.cpp

Georgi Gerganov

64,499 views • 2 years ago

llama.vim is also pretty wild 🙃

Georgi Gerganov

45,053 views • 1 year ago

llama.cpp with its integrated WebUI is effectively the most lightweight and self-contained agent that you can run locally. Here are a few more examples of using Hugging Face MCP to search for models

Georgi Gerganov

11,070 views • 3 months ago

Updated the llama.vim plugin to support speculative FIM. It now generates the next suggestion while you review the current one. If you accept it, the next suggestion is available immediately.

Georgi Gerganov

28,286 views • 1 year ago