Fully local Code Assistant running on an NVIDIA GPU! In this tutorial, I'll show you how to run Llama3 using TensorRT and NVIDIA's Triton Inference Server and use it as a Code Assistant in VSCode. In this thread 🧵, I'll walk you through the integration process, explaining each step simply...

Daniel San

32,935 subscribers

42,154 views • 2 years ago • via X (Twitter)

11 comments

Daniel San • 2 years ago

To get started, we need a @nvidia GPU 🤩 In this case, we will use the following hardware 💻

Daniel San • 2 years ago

We need to have Docker and CUDA installed. Follow the guides below to install both tools (Docker: CUDA: ) and then run the following commands to confirm everything is set up correctly.
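A minimal sketch of such a setup check, written in Python. It assumes the relevant commands are `docker --version` and `nvidia-smi`; those command names and the helper `runs_ok` are illustrative assumptions, not taken from the thread.

```python
import subprocess

# Assumed commands: the Docker CLI version query and the NVIDIA driver utility.
CHECKS = (
    ["docker", "--version"],
    ["nvidia-smi"],
)


def runs_ok(command):
    """Run a command and report whether it exited with status 0."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers a missing executable; a timeout is treated as a failure too.
        return False
    return result.returncode == 0


def check_all(checks=CHECKS):
    """Map each command line (joined as text) to True/False."""
    return {" ".join(command): runs_ok(command) for command in checks}


if __name__ == "__main__":
    for name, ok in check_all().items():
        print(f"{name}: {'OK' if ok else 'FAILED'}")
```

If either line prints FAILED, the corresponding tool is probably missing from the PATH or not installed correctly.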

Daniel San • 2 years ago

Download the llama3-8B model from @huggingface

Daniel San • 2 years ago

Now run TensorRT to compile the model using the Docker container. Clone the TensorRT repository and move the model folder.

Daniel San • 2 years ago

You should now be able to test the compiled model

Daniel San • 2 years ago

Perfect! We have the model now. Let's deploy it on Triton Inference Server.

Daniel San • 2 years ago

The server is up and ready to connect with CodeGPT via the custom connection. Open CodeGPT in VSCode, select Custom as the provider, and enter "ensemble" for the model.
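Before pointing CodeGPT at the server, it can help to send one request by hand. The sketch below is an assumption-laden example rather than the author's method: it assumes Triton listens on `http://localhost:8000`, exposes its HTTP generate endpoint at `/v2/models/<model>/generate`, and that the "ensemble" model uses the `text_input`, `max_tokens`, and `text_output` field names from the TensorRT-LLM backend's example configuration. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: Triton's HTTP endpoint on its default local port.
BASE_URL = "http://localhost:8000"


def build_generate_request(prompt, model="ensemble", max_tokens=64, base_url=BASE_URL):
    """Build (but do not send) a POST request for the generate endpoint."""
    url = f"{base_url}/v2/models/{model}/generate"
    # Field names are assumed to match the deployed ensemble's inputs.
    payload = {
        "text_input": prompt,
        "max_tokens": max_tokens,
        "bad_words": "",
        "stop_words": "",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the prompt to the server and return the generated text."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.loads(response.read().decode("utf-8"))
    # "text_output" is the assumed name of the ensemble's output field.
    return body.get("text_output", "")
```

With the server running, `print(generate("Write a Python function that reverses a string."))` should print a completion. If the request fails, check the port and the input names in the deployed model's configuration.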

Daniel San • 2 years ago

That's all! I'm sharing the link to the full article with all the details of the tutorial

Alexander Mia • 1 year ago

INTRODUCING: Agentic Security - LLM Security Scanner! 🔍 🔑 Features: Scans for prompt injections, jailbreaking & more. Provides detailed reports & options to customize attack rules. 🔗 Access the GitHub link ↓

₣rancisco Trillo • 2 years ago

Or just use Continue and Ollama with whatever brand GPU 🤷‍♂️ that’s open source

Daniel San • 2 years ago

You can also use CodeGPT with Ollama. Check this link:
