
Georgi Gerganov
@ggerganov • 50,809 subscribers
24th at the Electrica puzzle challenge | https://t.co/baTQS2bdia

Let me demonstrate the true power of llama.cpp:
- Running on Mac Studio M2 Ultra (3 years old)
- Gemma 4 26B A4B Q8_0 (full quality)
- Built-in WebUI (ships with llama.cpp)
- MCP support out of the box (web-search, HF, github, etc.)
- Prompt speculative decoding
The result: 300t/s (realtime video)
Georgi Gerganov • 782,188 views • 2 months ago

Introducing LLaMA voice chat! 🦙 You can run this locally on an M1 Pro
Georgi Gerganov • 1,681,682 views • 3 years ago

Full F16 precision 34B Code Llama at >20 t/s on M2 Ultra
Georgi Gerganov • 1,156,911 views • 2 years ago

The example below uses prompt-based speculative decoding. Specifically, ngram hashing is used to suggest drafts of up to 64 tokens. The hasher keeps track of ngrams in the observed contexts, so it is mostly effective for coding tasks, where the output often repeats text already seen. Here is another demo:
Georgi Gerganov • 29,592 views • 2 months ago
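
To make the ngram-hashing idea above concrete, here is a minimal Python sketch. It is not llama.cpp's implementation: the class `NgramDrafter`, the helper `count_accepted`, and the default ngram size of 3 are all hypothetical names and values chosen for illustration. Only the 64-token draft limit comes from the post. The sketch records, for every ngram seen so far, where its continuation starts, then proposes the tokens that followed the most recent earlier occurrence of the current tail.

```python
from typing import Dict, List, Tuple


class NgramDrafter:
    """Toy prompt-lookup drafter (illustrative only, not llama.cpp's code)."""

    def __init__(self, n: int = 3, max_draft: int = 64) -> None:
        self.n = n                  # ngram length (assumed value)
        self.max_draft = max_draft  # draft limit; 64 is taken from the post
        self.tokens: List[int] = []
        # Maps an ngram to the index of the token that followed it last time.
        self.index: Dict[Tuple[int, ...], int] = {}

    def observe(self, new_tokens: List[int]) -> None:
        """Append tokens to the context and index each completed ngram."""
        for tok in new_tokens:
            pos = len(self.tokens)
            self.tokens.append(tok)
            if pos >= self.n:
                # The ngram at positions [pos - n, pos) is followed by `tok`.
                key = tuple(self.tokens[pos - self.n:pos])
                self.index[key] = pos

    def draft(self) -> List[int]:
        """Propose a continuation by looking up the last n tokens."""
        if len(self.tokens) < self.n:
            return []
        key = tuple(self.tokens[-self.n:])
        start = self.index.get(key)
        if start is None:
            return []
        return self.tokens[start:start + self.max_draft]


def count_accepted(draft: List[int], verified: List[int]) -> int:
    """Length of the draft prefix that matches the target model's tokens."""
    accepted = 0
    for d, v in zip(draft, verified):
        if d != v:
            break
        accepted += 1
    return accepted


drafter = NgramDrafter(n=3)
drafter.observe([1, 2, 3, 4, 5, 6, 1, 2, 3])
proposal = drafter.draft()  # tokens that followed the earlier "1 2 3"
```

In a real speculative decoder the target model checks the whole draft in one batched pass and keeps only the matching prefix, which `count_accepted` imitates here with plain lists. Because the drafts are copied from the context, the approach pays off when the output repeats earlier text, as it often does when editing code.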

sam.cpp 👀 Inference of Meta's Segment Anything Model on the CPU Project by Yavor Ivanov - powered by
Georgi Gerganov • 264,542 views • 2 years ago

GGUF My Repo by Hugging Face. Create quantized GGUF models fully online, quickly and securely. Thanks to Vaibhav (VB) Srivastav, Pedro Cuenca and team for creating this HF space! In the video below I try it out by creating a quantized 8-bit model of Gemma 2B. It took about 60 seconds. The resulting model automatically becomes available in your HF profile and is ready to be used with llama.cpp
Georgi Gerganov • 64,486 views • 2 years ago