Fully local Code Assistant running on an NVIDIA GPU! In this tutorial, I'll show you how to run Llama3 using TensorRT and NVIDIA's Triton Inference Server, and use it as a Code Assistant in VSCode. In this thread 🧵, I'll walk you through the integration process, explaining each step simply...

Daniel San

32,935 subscribers

42,154 views • 2 years ago • via X (Twitter)

Comments: 11

Daniel San · 2 years ago

To get started, we need an @nvidia GPU 🤩 In this case, we will use the following hardware 💻

Daniel San · 2 years ago

We need to have Docker and CUDA installed. Follow the guides below to install both tools. Docker: CUDA: Then run the following commands to confirm everything is set up correctly.
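
The verification commands from the original tweet did not survive the page capture. A minimal sketch, assuming the NVIDIA Container Toolkit is installed so that Docker accepts `--gpus all`: the usual checks are `nvidia-smi` on the host, `docker --version`, and `nvidia-smi` inside a CUDA container. The image tag `nvidia/cuda:12.4.1-base-ubuntu22.04` is an assumption; pick a tag your driver supports.

```python
import shutil
import subprocess

# Assumed image tag (not from the thread); choose one compatible with your driver.
CUDA_IMAGE = "nvidia/cuda:12.4.1-base-ubuntu22.04"


def gpu_check_commands(image: str = CUDA_IMAGE) -> list[list[str]]:
    """Return the commands that confirm the GPU is visible on the host and in Docker."""
    return [
        ["nvidia-smi"],                                                   # driver sees the GPU on the host
        ["docker", "--version"],                                          # Docker CLI is installed
        ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"],  # GPU visible inside a container
    ]


def run_checks(image: str = CUDA_IMAGE) -> bool:
    """Run each check in order, stopping at the first failure. Returns True if all pass."""
    for cmd in gpu_check_commands(image):
        if shutil.which(cmd[0]) is None:
            print(f"missing executable: {cmd[0]}")
            return False
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"failed: {' '.join(cmd)}\n{result.stderr}")
            return False
        print(f"ok: {' '.join(cmd)}")
    return True
```

On the GPU machine, calling `run_checks()` prints one `ok:` line per command; it stops at the first missing executable or non-zero exit code, so the failing step is easy to spot.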

Daniel San · 2 years ago

Download the llama3-8B model from @huggingface
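
One way to script this step, assuming the `huggingface_hub` Python package and the `meta-llama/Meta-Llama-3-8B-Instruct` repository. The tweet only says "llama3-8B", so the exact repository (base or Instruct variant) is an assumption. Meta's repositories are gated: accept the license on the model page and log in with `huggingface-cli login` before downloading. `looks_complete` is a hypothetical helper that only does a rough sanity check.

```python
from pathlib import Path

# Assumed repository id; the thread only says "llama3-8B".
# The repo is gated: accept Meta's license on Hugging Face and log in first.
REPO_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
LOCAL_DIR = Path("Meta-Llama-3-8B-Instruct")  # hypothetical target folder


def download_model(repo_id: str = REPO_ID, local_dir: Path = LOCAL_DIR) -> Path:
    """Download every file of the model repo into local_dir and return its path."""
    # Imported lazily so the helpers below work without the package installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, local_dir=str(local_dir)))


def looks_complete(local_dir: Path) -> bool:
    """Rough sanity check: a config.json plus at least one .safetensors shard."""
    return (local_dir / "config.json").is_file() and any(local_dir.glob("*.safetensors"))
```

After `download_model()` finishes, `looks_complete(LOCAL_DIR)` should return `True`; if it does not, the download was probably interrupted or the license was not accepted.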

Daniel San · 2 years ago

Now, run TensorRT inside the Docker container to compile the model. Clone the TensorRT repository and move the model folder.
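
A hedged sketch of the compile step, assuming "the TensorRT repository" means NVIDIA's TensorRT-LLM and that the commands run inside its Docker container. TensorRT-LLM builds a Llama engine in two steps: `convert_checkpoint.py` turns the Hugging Face weights into a TensorRT-LLM checkpoint, then `trtllm-build` compiles that checkpoint into an engine. All folder names below are hypothetical, and the script location and flags have changed between TensorRT-LLM releases, so check the Llama example README for the version you cloned.

```python
import subprocess
from pathlib import Path

# Hypothetical paths; adjust to where you cloned the repo and moved the model.
REPO = Path("TensorRT-LLM")
MODEL_DIR = REPO / "Meta-Llama-3-8B-Instruct"
CKPT_DIR = Path("tllm_checkpoint_1gpu_fp16")
ENGINE_DIR = Path("trt_engines/llama3-8b/fp16/1-gpu")


def build_commands(dtype: str = "float16") -> list[list[str]]:
    """Two-step flow: convert the HF weights, then build the TensorRT engine."""
    convert = [
        "python", str(REPO / "examples" / "llama" / "convert_checkpoint.py"),
        "--model_dir", str(MODEL_DIR),
        "--output_dir", str(CKPT_DIR),
        "--dtype", dtype,
    ]
    build = [
        "trtllm-build",
        "--checkpoint_dir", str(CKPT_DIR),   # reads what the convert step wrote
        "--output_dir", str(ENGINE_DIR),
        "--gemm_plugin", dtype,
    ]
    return [convert, build]


def compile_engine(dtype: str = "float16") -> None:
    """Run both steps in order; raises CalledProcessError if either fails."""
    for cmd in build_commands(dtype):
        subprocess.run(cmd, check=True)
```

Passing the commands as lists keeps `subprocess.run` away from the shell, and because both steps use the same `CKPT_DIR`, the build always reads the checkpoint the conversion produced.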

Daniel San · 2 years ago

You should now be able to test the compiled model
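
For a quick smoke test, the TensorRT-LLM examples include a `run.py` script that loads an engine and generates text for a prompt. This sketch assumes that script and the same hypothetical folders used when compiling; the engine directory must match whatever was passed to `--output_dir` at build time, and the tokenizer comes from the original Hugging Face download.

```python
import subprocess
from pathlib import Path

# Hypothetical paths; they must match the folders used when building the engine.
REPO = Path("TensorRT-LLM")
ENGINE_DIR = Path("trt_engines/llama3-8b/fp16/1-gpu")
TOKENIZER_DIR = REPO / "Meta-Llama-3-8B-Instruct"


def smoke_test_command(prompt: str, max_output_len: int = 64) -> list[str]:
    """Build the run.py invocation that generates a completion for one prompt."""
    return [
        "python", str(REPO / "examples" / "run.py"),
        "--engine_dir", str(ENGINE_DIR),
        "--tokenizer_dir", str(TOKENIZER_DIR),
        "--max_output_len", str(max_output_len),
        "--input_text", prompt,
    ]


def smoke_test(prompt: str = "def fibonacci(n):") -> str:
    """Run the script and return its stdout (the generated text plus logs)."""
    result = subprocess.run(smoke_test_command(prompt), capture_output=True, text=True, check=True)
    return result.stdout
```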

Daniel San · 2 years ago

Perfect! Now that we have the model, let's deploy it on Triton Inference Server.
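
The deployment itself (model repository layout and server launch) is not shown in the thread. Once the server is up, one way to check it is Triton's HTTP generate endpoint. This sketch assumes the default HTTP port 8000 and the `ensemble` model from NVIDIA's `tensorrtllm_backend` examples, whose inputs include `text_input`, `max_tokens`, `bad_words`, and `stop_words` and whose output is `text_output`; adjust the field names if your model configuration differs. `ensemble` is also the model name the thread enters in CodeGPT.

```python
import json
import urllib.request

# Assumed defaults: Triton's HTTP port and the "ensemble" model from tensorrtllm_backend.
TRITON_URL = "http://localhost:8000"
MODEL_NAME = "ensemble"


def build_generate_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST to Triton's generate endpoint for the given prompt."""
    payload = {
        "text_input": prompt,
        "max_tokens": max_tokens,
        "bad_words": "",
        "stop_words": "",
    }
    return urllib.request.Request(
        f"{TRITON_URL}/v2/models/{MODEL_NAME}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, max_tokens: int = 64) -> str:
    """Send the request and return the model's text_output field."""
    with urllib.request.urlopen(build_generate_request(prompt, max_tokens), timeout=60) as resp:
        return json.loads(resp.read())["text_output"]
```

With the server running, `print(generate("def fibonacci(n):"))` should print a completion; a connection error usually means the server is not listening on the assumed port.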

Daniel San · 2 years ago

The server is up and ready to connect to CodeGPT via a custom connection. Open CodeGPT in VSCode, select Custom as the provider, and enter "ensemble" as the model.

Daniel San · 2 years ago

That's all! I'm sharing the link to the full article with all the details of the tutorial

Alexander Mia · 1 year ago

INTRODUCING: Agentic Security - LLM Security Scanner! 🔍 🔑 Features: Scans for prompt injections, jailbreaking & more. Provides detailed reports & options to customize attack rules. 🔗access the GitHub Link ↓

₣rancisco Trillo · 2 years ago

Or just use Continue and Ollama with whatever brand of GPU 🤷‍♂️ That's open source.

Daniel San · 2 years ago

You can also use CodeGPT with Ollama. Check this link:

Similar videos

Llama3-70b and phi-3-128k as Copilot in VSCode powered by NVIDIA AI 🤯 Now you can use these two models within VSCode using the NVIDIA AI API. In this thread 🧵, I'll walk you through the integration process, explaining each step simply and clearly👇

Daniel San

99,941 views • 2 years ago

NVIDIA AI now lets you run Deepseek R1 in VSCode as a code assistant! 😱 With the CodeGPT extension, you can connect NVIDIA AI, then choose the Deepseek R1 model. Then select your project files to use them as context 👇

Daniel San

262,504 views • 1 year ago

Llama 3.1 Now Available in VSCode as a Code Assistant via Groq Inc 🚨 You can now use this new AI at Meta model directly in VSCode using the CodeGPT extension. My first impression: The model is incredible 🚀

Daniel San

263,015 views • 2 years ago

It'll surprise you how easy this is to make using our AI Assistant and the unlimited generations of Google Veo 2. Prompts, tips, and tutorial in thread 👇🧵

Freepik

18,545 views • 1 year ago

NVIDIA just dropped free API keys for every top AI model. You don't need your own GPU and you don't pay per token. GLM-5.2, MiniMax, Kimi, DeepSeek, OpenAI, all running on NVIDIA's servers, called through a normal API. Link: How to use one: 1. Create a free NVIDIA account. 2. Pick a Free Endpoint model and open its Build tab. You'll see ready-to-copy code with the base URL. 3. Hit Generate API Key, copy it, and paste that base URL and key into Claude Code, Cursor, or Cline. Bonus: NVIDIA also dropped 237 official skills that install into Claude Code and Codex in one command. Bookmark this.

Yarchi

62,954 views • 22 days ago

This robot assistant from the NVIDIA CES Keynote on Monday is going viral. Nader Khalil🍊 explains all the hottest emerging AI trends in one demo: AI applications in 2026 will be multi-model, multi-modal, hybrid cloud/local, use open source models as well as proprietary models, control robots and embedded devices in the physical world, and have voice interfaces. (And the demo had a cute robot *and* a cute dog. Gold.) The demo was built with Pipecat AI. NVIDIA posted a really nice technical walk-through and complete code. The Reachy Mini robot from Hugging Face is open source hardware. (You can order it now, I have one!). You can run the assistant locally on your own hardware, in the cloud, or both.

kwindla

49,010 views • 6 months ago

llama3 8B (not quantized) running on a heterogeneous home cluster made of:
- iPhone 15 Pro Max
- iPad Pro (not sure which version XD)
- MacBook Pro (M1 Max)
- NVIDIA GeForce 3080 (not visible in video)
- 2x NVIDIA Titan X Pascal
Very soon also supporting Android (I *have* to also add my NVIDIA Shield GPU!!!!!). Single code base, single model format (reduced and optimally distributed to every node to save space). Everything (including iOS code) is open here ... It would be really nice, with the help of the community, to take this project to the next level in terms of optimization and support. My vision is a distributed inference server that can run any model on any backend in any cluster topology. Let's fight programmed obsolescence and democratize inference!

Simone Margaritelli

304,072 views • 2 years ago

Llama 3 as a Copilot in VSCode 🤩 Let me show you how to connect this amazing model that Meta released today! Here is a step-by-step tutorial! 🧵

Daniel San

371,539 views • 2 years ago

I built a FREE AI Agent that can browse the web, code websites, and automate tasks WITHOUT any technical setup. I literally have my own AI assistant that works 24/7. In this video, I'll show you how to easily set it up. No coding experience required. (Trust me, you want to bookmark this.)

Julian Goldie SEO

47,804 views • 1 year ago

Earlier this year we announced that telecom leaders are building AI grids using NVIDIA AI infrastructure to optimize inference on distributed networks. But what actually is an AI grid? In this video, Amogh Dendukuri takes us back to basics. Watch now to see him break down the top 5 things you need to know about AI grids.

NVIDIA

27,166 views • 1 month ago

CMU PhD who built the kernels NVIDIA now ships in TensorRT-LLM explained fast attention in 68 minutes - better than $1200 GPU programming courses. pick the attention pattern -> generate a fused CUDA kernel -> drop it into vLLM/SGLang -> same GPU, way more tokens per second. That loop is why FlashInfer now powers inference at NVIDIA, vLLM, SGLang, and half the serving stacks you use. FlashInfer + Triton + JIT-compiled kernels + paged-KV attention - that's the stack.

h100envy

32,605 views • 27 days ago

Now you can recreate any TV show with AI. I'll show you how to do it, using Flow and the prompts included 🧵👇 (save this for later)

TechHalla

69,411 views • 1 year ago

AMD might have disrupted Nvidia's entire cloud GPU rental business. In January at CES, AMD CEO Lisa Su demonstrated a $1,499 mini PC running the same class of AI model that currently costs companies $2,500 to $3,000 every month to rent from Nvidia-powered cloud servers. AMD's own branded version opened pre-orders this month at $3,999. Third party manufacturers have been selling the same chip since 2025 starting at $1,499. Here is exactly why this is dangerous for Nvidia. Nvidia's $75 billion quarterly revenue is built almost entirely on one business model, companies rent access to Nvidia GPUs through cloud providers like AWS and Lambda Labs to run AI. They pay monthly. Nvidia gets paid every time someone runs an AI model in the cloud. That recurring rental income is what turned Nvidia into a $5 trillion company. The AMD box eliminates that monthly fee permanently. One AI consultant switched from $2,800 per month in Nvidia cloud rental costs to $8 per month in electricity. The hardware paid for itself in 11 days. Over 8 months he generated $47,000 running the same AI workloads that previously left him paying Nvidia's ecosystem $2,800 every single month. Multiply that across thousands of enterprise customers and the revenue erosion becomes structural. Every business that buys this box stops paying cloud rental fees forever. Lawyers, doctors, banks, accountants, and financial advisors, businesses with sensitive data that cannot legally go to a cloud server represent billions in annual cloud GPU fees that Nvidia is now at risk of losing permanently. The threat is also closing in from the top. Google signed deals worth tens of billions with Anthropic and Meta to replace Nvidia with its own chips. Amazon built its own AI chips across AWS. Apple trained its AI on Google's chips, not Nvidia's. Custom silicon has grown from 21% of the AI chip market in 2025 to 28% in 2026. Nvidia's rental model only worked because serious AI compute had no alternative.

Bull Theory

26,668 views • 1 month ago

Deepseek running locally and privately for autocompletion in VSCode! 🙌 In less than a minute, I'll show you how to download Deepseek-coder and set it as the autocompletion model in VSCode. You’ll need to use ollama to download the model and CodeGPT to select it as the autocompletion model. Enjoy the best models running locally with :)

Daniel San

991,697 views • 1 year ago

Create compositions like a PRO by combining 3D with AI! I'll show how to take full control of your scene and generate images in any style you want. Breaking it all down for you in this thread. 🧵👇

TechHalla

158,069 views • 1 year ago