llama.vscode (powered by Qwen Coder)

Georgi Gerganov

63,052 subscribers

77,668 views • 1 year ago • via X (Twitter)

Science & Technology

9 Comments

Georgi Gerganov • 1 year ago

This is a lightweight and very efficient VS Code extension using llama.cpp directly to provide local LLM-assisted code and text completions:

Georgi Gerganov • 1 year ago

The llama.cpp server provides unique context reuse techniques that allow you to efficiently use large contexts to enhance the completions based on the contents of your codebase. The setup is simple, no RAG is necessary and the performance is good even on low-end hardware. Enjoy!
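As a rough illustration of the idea in the comment above, the sketch below builds the JSON body for a fill-in-the-middle request in which chunks from other files travel alongside the code around the cursor. The field names (`input_prefix`, `input_suffix`, `input_extra`, `n_predict`, `cache_prompt`) reflect my understanding of the llama.cpp server's `/infill` endpoint and should be checked against the server README; the helper name, file names, and snippets are hypothetical, and this is not necessarily how the extension itself builds its requests.

```python
import json


def build_infill_request(prefix, suffix, extra_chunks, n_predict=64):
    """Build a JSON-serializable body for a fill-in-the-middle request.

    prefix/suffix: text before and after the cursor.
    extra_chunks: (filename, text) pairs taken from elsewhere in the codebase.
    Field names are assumed from the llama.cpp server README, not from the post.
    """
    return {
        "input_prefix": prefix,
        "input_suffix": suffix,
        # Extra context from other files, sent along with the local code.
        "input_extra": [
            {"filename": name, "text": text} for name, text in extra_chunks
        ],
        "n_predict": n_predict,
        # Ask the server to keep the processed prompt cached between requests.
        "cache_prompt": True,
    }


# Hypothetical editor state: cursor sits inside a function body.
body = build_infill_request(
    prefix="def area(r):\n    return ",
    suffix="\n",
    extra_chunks=[("utils.py", "import math\nPI = math.pi\n")],
)
print(json.dumps(body, indent=2))
```

The body would be sent as a POST to a locally running server. Keeping the extra chunks stable between keystrokes is what would give the server a chance to reuse already processed context instead of recomputing it.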

malico. • 1 year ago

nvim?? 🥹🥹

Georgi Gerganov • 1 year ago

i got you pal

Daniel Nguyen ⚡ • 1 year ago

Great work. Thanks

Neil Chudleigh • 1 year ago

Let's go! Nice work.

Nikita 🤙 • 1 year ago

Great start!

Raymond Weitekamp • 1 year ago

wow - thank you so much for giving the different suggestions for different hardware (RAM). two follow-up questions:
- can i do this on PC? (my PC happens to have way more GPU/RAM)
- from a purely personal perspective - what is the minimum "useful" RAM for this?

Georgi Gerganov • 1 year ago

Yes, this runs on Mac, Linux and Windows - see the setup instructions in the readme. The 7B models are pretty good, so if you can run one (i.e. ~8GB of VRAM), go for it. Otherwise, use the 3B model (~4GB).
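The rule of thumb in the reply above can be written as a small lookup. This is only a sketch of that guidance: the function name is hypothetical, the thresholds are the approximate figures from the reply (~8GB for 7B, ~4GB for 3B), and the thread gives no recommendation below ~4GB, so the function returns `None` there rather than guessing.

```python
from typing import Optional


def suggest_model_size(vram_gb: float) -> Optional[str]:
    """Map available VRAM (in GB) to the model size suggested in the thread."""
    if vram_gb >= 8:
        return "7B"  # ~8GB of VRAM: the 7B models are described as pretty good
    if vram_gb >= 4:
        return "3B"  # ~4GB: fall back to the 3B model
    return None      # below ~4GB the thread gives no recommendation


for vram in (16, 8, 6, 4, 2):
    print(vram, "GB ->", suggest_model_size(vram))
```

The thresholds are approximate in the original reply, so treat the boundaries as soft: a card with slightly under 8GB may or may not fit a 7B model depending on quantization and context size.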
