Загрузка видео...

Не удалось загрузить видео

Возникла проблема при загрузке этого видео. Это может быть связано с временными проблемами сети или видео может быть недоступно.

На главную

this is a laptop running a 31b parameter model at 99% gpu autonomously through hermes agent, 15 tok/s sustained, 22.8 of 24gb vram gone, 94 watts at 50c. no api keys. no rate limits. no "your prompts are being used for training". no monthly subscription. no anthropic telling me... show more

Sudo su

26,228 subscribers

65,567 просмотров • 2 месяцев назад •via X (Twitter)

Новости и политика Образование Наука и технологии

Anya Rossi• Live Now

Private livecam show

Комментарии: 0

Нет доступных комментариев

Здесь появятся комментарии из оригинального поста

Похожие видео

Hermes Agent now runs entirely in your browser. No CLI. No VPS. No Mac mini. No 15 hours of config. Phone, laptop, TV - open a tab and your 24/7 self-evolving agent is live.

Hermes Agent now runs entirely in your browser. No CLI. No VPS. No Mac mini. No 15 hours of config. Phone, laptop, TV - open a tab and your 24/7 self-evolving agent is live.

0xMarioNawfal

59,575 просмотров • 1 месяц назад

Hermes Agent with a beautiful WebUI. No Telegram bot. No API keys. No setup. Plus, your AI agent is live in 60 seconds. This is FlyHermes.

Hermes Agent with a beautiful WebUI. No Telegram bot. No API keys. No setup. Plus, your AI agent is live in 60 seconds. This is FlyHermes.

Antoine Rousseaux

25,921 просмотров • 2 месяцев назад

HERMES AGENT CAN NOW RUN COMPLETELY FREE WITH GEMMA 4 + OLLAMA. No API bills, no limits, and a 256k context window running straight from your own machine.

HERMES AGENT CAN NOW RUN COMPLETELY FREE WITH GEMMA 4 + OLLAMA. No API bills, no limits, and a 256k context window running straight from your own machine.

0xMarioNawfal

120,841 просмотров • 2 месяцев назад

$A $351 MINI PC IS RUNNING 26-BILLION-PARAMETER AI MODELS AT 20 TOKENS/SEC AND HERMES AGENT ON TOP OF IT This is the Minisforum UM790 Pro. AMD Ryzen 9 7940HS, Radeon 780M iGPU, 48GB DDR5. The BIOS reports the GPU has 4GB of VRAM Here's the part people get wrong. The 780M has no dedicated VRAM at all it borrows from system RAM. Vulkan ignores the BIOS number and reads the full 48GB pool directly That's the whole trick. 21+ GB allocated to model weights on a "4GB" GPU. Because the models are MoE, only 4 billion of the parameters activate per token a fraction of the reads a dense model needs. That's why 20 tok/s works here Gemma 4 26B MoE holds 19.5 tok/s with 196K context. Qwen3.5-35B-A3B holds 20.8. Nemotron Cascade 2 clears 24.8. A dense 31B, by contrast, drops to 4 tok/s it reads the entire model every token, no way around it On top: Hermes Agent, full agentic workflows terminal, file ops, web, 40+ tools against local models only. No API keys. The wall between you and a usable local agent used to be a GPU you couldn't afford. Now it's a BIOS setting most people never check Bookmark this & Try it yourself ↓$

A $351 MINI PC IS RUNNING 26-BILLION-PARAMETER AI MODELS AT 20 TOKENS/SEC AND HERMES AGENT ON TOP OF IT This is the Minisforum UM790 Pro. AMD Ryzen 9 7940HS, Radeon 780M iGPU, 48GB DDR5. The BIOS reports the GPU has 4GB of VRAM Here's the part people get wrong. The 780M has no dedicated VRAM at all it borrows from system RAM. Vulkan ignores the BIOS number and reads the full 48GB pool directly That's the whole trick. 21+ GB allocated to model weights on a "4GB" GPU. Because the models are MoE, only 4 billion of the parameters activate per token a fraction of the reads a dense model needs. That's why 20 tok/s works here Gemma 4 26B MoE holds 19.5 tok/s with 196K context. Qwen3.5-35B-A3B holds 20.8. Nemotron Cascade 2 clears 24.8. A dense 31B, by contrast, drops to 4 tok/s it reads the entire model every token, no way around it On top: Hermes Agent, full agentic workflows terminal, file ops, web, 40+ tools against local models only. No API keys. The wall between you and a usable local agent used to be a GPU you couldn't afford. Now it's a BIOS setting most people never check Bookmark this & Try it yourself ↓

slash1s

35,742 просмотров • 9 дней назад

six months ago this wasn't happening on 8gb vram. running unsloth's Q4_K_XL quant of gemma 4 26b-a4b-it-qat, a sparse MoE model with only 4b active params on a single rtx 4060 laptop gpu, 8gb vram, 20+ tok/s decode. no cloud, no api, no offload hacks. just a gaming laptop on battery. what makes it fit: google's QAT (quantization aware training), plus MTP (multi token prediction) support in the latest llama.cpp builds. that combo is the single biggest unlock for local inference on low vram. rtx 3060, rtx 3070, gtx 1070, gtx 1080, rtx 4050, rtx 4060, rtx 5050, rtx 5060 — any 6-8gb consumer gpu, old or new — this model runs on it. world cup season, so i told it to build a soccer themed flappy bird clone. one shot, zero iteration, fully playable. six months ago an 8gb model could barely clone vanilla flappy bird. now it's shipping a themed game from a sparse MoE model running locally on a laptop battery. inference benchmarks: - decode throughput: 30 tok/s - context: 64k. this is the real unlock. 64k ctx is what makes a hermes agent loop viable locally on this model, not just single-turn chat. llama.cpp flags: -m gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf -c 64000 -cmoe --port 8080 game's deployed on my own site, built and shipped end to end with open source llm, zero closed source api dependency in the pipeline. link in the description. gguf weights on huggingface, link in the comments. pull it down, run it on whatever 8gb card is sitting in your rig. try the game and tell me your score and what you want in v2. local llms on consumer gpus stopped being a meme.

six months ago this wasn't happening on 8gb vram. running unsloth's Q4_K_XL quant of gemma 4 26b-a4b-it-qat, a sparse MoE model with only 4b active params on a single rtx 4060 laptop gpu, 8gb vram, 20+ tok/s decode. no cloud, no api, no offload hacks. just a gaming laptop on battery. what makes it fit: google's QAT (quantization aware training), plus MTP (multi token prediction) support in the latest llama.cpp builds. that combo is the single biggest unlock for local inference on low vram. rtx 3060, rtx 3070, gtx 1070, gtx 1080, rtx 4050, rtx 4060, rtx 5050, rtx 5060 — any 6-8gb consumer gpu, old or new — this model runs on it. world cup season, so i told it to build a soccer themed flappy bird clone. one shot, zero iteration, fully playable. six months ago an 8gb model could barely clone vanilla flappy bird. now it's shipping a themed game from a sparse MoE model running locally on a laptop battery. inference benchmarks: - decode throughput: 30 tok/s - context: 64k. this is the real unlock. 64k ctx is what makes a hermes agent loop viable locally on this model, not just single-turn chat. llama.cpp flags: -m gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf -c 64000 -cmoe --port 8080 game's deployed on my own site, built and shipped end to end with open source llm, zero closed source api dependency in the pipeline. link in the description. gguf weights on huggingface, link in the comments. pull it down, run it on whatever 8gb card is sitting in your rig. try the game and tell me your score and what you want in v2. local llms on consumer gpus stopped being a meme.

Alok

59,908 просмотров • 7 дней назад

We deployed a fully private AI agent on NuNet in under 5 minutes 🚀 OpenClaw🦞 running Qwen through ollama , one of the hottest open source model families right now, entirely on decentralized compute. No cloud. No API keys. No data leaving the machine. This is what private AI looks like when you actually build it instead of just talking about it. Your model. Your hardware. Your rules. Full walkthrough showing exactly how it works: What should we deploy next?

We deployed a fully private AI agent on NuNet in under 5 minutes 🚀 OpenClaw🦞 running Qwen through ollama , one of the hottest open source model families right now, entirely on decentralized compute. No cloud. No API keys. No data leaving the machine. This is what private AI looks like when you actually build it instead of just talking about it. Your model. Your hardware. Your rules. Full walkthrough showing exactly how it works: What should we deploy next?

NuNet 🌐

87,536 просмотров • 3 месяцев назад

no prompt engineering, no agentic harness, no tool calls. just me being lazy in llama.cpp's web ui and gemma 4 31b dense taking the task seriously. i typed "create gpu marketplace cards with hardware specs and prices per hour" and the model went and coded this ui, one shot, navy bg, glassmorphism cards, neon accent buttons, realistic pricing tiers per architecture. it even wrote a "why this looks premium" explanation under the code. for context this is a q4 quant of google's 31b dense thinking model, running on a rtx 5090 mobile 24gb in the rog scar 18 at around 15 tok/s sustained, same vram tier as a 3090 or 4090 desktop, so whatever you see here translates directly to your card at home. the whole interaction was me not trying and the model reasoning harder than the prompt deserved. that tells me more about where local ai is at in april 2026 than any leaderboard score. next test drops gemma 4 into hermes agent, autonomous tool calling, multi step reasoning, real agentic loop instead of a chat window. let's see what the same model does when it gets the right environment. more experiments coming anon. octopus invaders queued. same hardware, different tasks, all published here on x and all translatable to your 24gb card. for now the video below shows it coding live, gpu going brrr.

no prompt engineering, no agentic harness, no tool calls. just me being lazy in llama.cpp's web ui and gemma 4 31b dense taking the task seriously. i typed "create gpu marketplace cards with hardware specs and prices per hour" and the model went and coded this ui, one shot, navy bg, glassmorphism cards, neon accent buttons, realistic pricing tiers per architecture. it even wrote a "why this looks premium" explanation under the code. for context this is a q4 quant of google's 31b dense thinking model, running on a rtx 5090 mobile 24gb in the rog scar 18 at around 15 tok/s sustained, same vram tier as a 3090 or 4090 desktop, so whatever you see here translates directly to your card at home. the whole interaction was me not trying and the model reasoning harder than the prompt deserved. that tells me more about where local ai is at in april 2026 than any leaderboard score. next test drops gemma 4 into hermes agent, autonomous tool calling, multi step reasoning, real agentic loop instead of a chat window. let's see what the same model does when it gets the right environment. more experiments coming anon. octopus invaders queued. same hardware, different tasks, all published here on x and all translatable to your 24gb card. for now the video below shows it coding live, gpu going brrr.

Sudo su

28,939 просмотров • 2 месяцев назад

Open-source just won... Agent S2 is an AI agent that uses your computer like a human. → No fake demos → No terminal hallucinations → Just real results and it’s FREE. Here’s why this is the future of work: 👇

Open-source just won... Agent S2 is an AI agent that uses your computer like a human. → No fake demos → No terminal hallucinations → Just real results and it’s FREE. Here’s why this is the future of work: 👇

Shruti

206,646 просмотров • 1 год назад

OK it's a game changer You can create and run an AI agent 100% locally on your laptop. No need to connect or pay a provider like OpenAI or Anthropic. Combine the open source Google Gemma 3 model, Smolagents and LM Studio and you're ready. (Links and resources below) #gifted

OK it's a game changer You can create and run an AI agent 100% locally on your laptop. No need to connect or pay a provider like OpenAI or Anthropic. Combine the open source Google Gemma 3 model, Smolagents and LM Studio and you're ready. (Links and resources below) #gifted

Paul Couvert

78,710 просмотров • 1 год назад

i pointed hermes agent at nvidia's nemotron cascade 2 30B-A3B on a single RTX 3090 24GB. IQ4_XS quant by bartowski, 187 tok/s, 625K context. had it discover its own hardware, create an identity file, then build a full GPU marketplace UI from a single prompt. it one shotted it. first attempt no iteration. qwen 3.5 35B-A3B on the same hardware same 3090 24GB took an iteration to recover from a blank screen on the same type of build. 24 days between these two models releasing. same active parameters, completely different architectures and cascade 2 through hermes agent just keeps going. this model goes on and on. feast your eyes. more iterations and tests dropping soon. nvidia really cooked. no special flags needed. nvidia optimized this mamba MoE so well it just runs. flash attention auto enabled, context auto allocated. the model does the work not the config. but i compiled llama.cpp from source and i'm not sure how it performs on other engines. if you ran nemotron on any hardware drop your numbers below. RTX, AMD, Mac, whatever. model, quant, tok/s, engine. i want to see if it holds everywhere or just on llama.cpp.

i pointed hermes agent at nvidia's nemotron cascade 2 30B-A3B on a single RTX 3090 24GB. IQ4_XS quant by bartowski, 187 tok/s, 625K context. had it discover its own hardware, create an identity file, then build a full GPU marketplace UI from a single prompt. it one shotted it. first attempt no iteration. qwen 3.5 35B-A3B on the same hardware same 3090 24GB took an iteration to recover from a blank screen on the same type of build. 24 days between these two models releasing. same active parameters, completely different architectures and cascade 2 through hermes agent just keeps going. this model goes on and on. feast your eyes. more iterations and tests dropping soon. nvidia really cooked. no special flags needed. nvidia optimized this mamba MoE so well it just runs. flash attention auto enabled, context auto allocated. the model does the work not the config. but i compiled llama.cpp from source and i'm not sure how it performs on other engines. if you ran nemotron on any hardware drop your numbers below. RTX, AMD, Mac, whatever. model, quant, tok/s, engine. i want to see if it holds everywhere or just on llama.cpp.

Sudo su

70,741 просмотров • 3 месяцев назад

When your tool is open source and free, your creativity has no ceiling. The ComfyUI skill in Nous Research Hermes Agent lets you compose sophisticated workflows by chatting to an agent. Try it today.

When your tool is open source and free, your creativity has no ceiling. The ComfyUI skill in Nous Research Hermes Agent lets you compose sophisticated workflows by chatting to an agent. Try it today.

ComfyUI

162,951 просмотров • 1 месяц назад

Introducing Agent Cookie. 🥷🏻🍪 For anyone running OpenClaw🦞 or Nous Research's Hermes on a Mac mini: I kept finding my agent logged out of everything, and it sucked. So I fixed it. "Add this to my Amazon cart." Sorry, logged out again. "Order my usual on Instacart." Nope, not logged in anymore. The fix: your laptop's cookies, CLI tokens, and API keys sync to your Mac mini. Continuously. Encrypted end-to-end over your Tailscale tailnet. No logging in twice. 🌐

Introducing Agent Cookie. 🥷🏻🍪 For anyone running OpenClaw🦞 or Nous Research's Hermes on a Mac mini: I kept finding my agent logged out of everything, and it sucked. So I fixed it. "Add this to my Amazon cart." Sorry, logged out again. "Order my usual on Instacart." Nope, not logged in anymore. The fix: your laptop's cookies, CLI tokens, and API keys sync to your Mac mini. Continuously. Encrypted end-to-end over your Tailscale tailnet. No logging in twice. 🌐

Matt Van Horn

254,612 просмотров • 28 дней назад

HERMES AGENT NOW RUNS ON AN 8GB LAPTOP GPU JUST AS EASILY AS IT RUNS ON A 128GB MINI PC Nous Research shipped the official Hermes Agent Desktop App this week. Someone pointed it at a local llama server running on an RTX 4060 with 16GB system RAM. The integration took two minutes The model behind it: Gemma 4 26B MoE, QAT quantized, running on 8GB of VRAM. A 60k token prompt held a stable 20 tokens a second, flat, no slowdown as context grew. The flags were nothing exotic, just -cmoe -c 248000 on llama.cpp What that 8GB setup does out of the box: reads and patches its own code, runs it in a terminal, debugs errors, manages GitHub repos, spawns sub-agents for parallel work. Browses the web with vision to debug a UI. Schedules cron jobs in plain language. Connects to Notion, Google Workspace, Linear, and Obsidian to manage tasks on its own That's the same agent layer running on a Minisforum MS-S1 MAX with 128GB of unified memory, 96GB of it to the GPU, holding a 120B model at 56 tokens a second instead of a 26B model at 20. Same software, same tool execution, same zero API key. The only thing that changes between an $800 laptop and a $2,000 mini PC is how big a model you can afford to run underneath it The barrier to running a real autonomous agent locally didn't just drop. It dropped all the way down to hardware most people already own

HERMES AGENT NOW RUNS ON AN 8GB LAPTOP GPU JUST AS EASILY AS IT RUNS ON A 128GB MINI PC Nous Research shipped the official Hermes Agent Desktop App this week. Someone pointed it at a local llama server running on an RTX 4060 with 16GB system RAM. The integration took two minutes The model behind it: Gemma 4 26B MoE, QAT quantized, running on 8GB of VRAM. A 60k token prompt held a stable 20 tokens a second, flat, no slowdown as context grew. The flags were nothing exotic, just -cmoe -c 248000 on llama.cpp What that 8GB setup does out of the box: reads and patches its own code, runs it in a terminal, debugs errors, manages GitHub repos, spawns sub-agents for parallel work. Browses the web with vision to debug a UI. Schedules cron jobs in plain language. Connects to Notion, Google Workspace, Linear, and Obsidian to manage tasks on its own That's the same agent layer running on a Minisforum MS-S1 MAX with 128GB of unified memory, 96GB of it to the GPU, holding a 120B model at 56 tokens a second instead of a 26B model at 20. Same software, same tool execution, same zero API key. The only thing that changes between an $800 laptop and a $2,000 mini PC is how big a model you can afford to run underneath it The barrier to running a real autonomous agent locally didn't just drop. It dropped all the way down to hardware most people already own

NO1ennn

40,079 просмотров • 8 дней назад

small local model that falls apart in bloated agents like openclaw just runs like a wild horse in hermes agent. and that's not even my line, someone else called it that, i've just been quietly pointing people at this harness for months because it held up on everything i threw at it, 3b models all the way to one trillion params. watch this happen on my own machine. i pointed hermes agent at a local http endpoint, gemma 4 12b on my 3090 llama.cpp server, and it auto-detected the model and started working immediately. no config wrestling, no broken tool calls, no babysitting the output format, i typed in a url and it just went. the whole clip is exactly that, start to finish, no errors, no retries, butter smooth. and the tool calling, the one thing that quietly breaks most local setups, works here like it's nothing. it's not the model that's flaky, it's the harness around it. hermes agent is the first agent i've run that actually gets that right. one url, one local model on one card, and it runs like a wild horse.

small local model that falls apart in bloated agents like openclaw just runs like a wild horse in hermes agent. and that's not even my line, someone else called it that, i've just been quietly pointing people at this harness for months because it held up on everything i threw at it, 3b models all the way to one trillion params. watch this happen on my own machine. i pointed hermes agent at a local http endpoint, gemma 4 12b on my 3090 llama.cpp server, and it auto-detected the model and started working immediately. no config wrestling, no broken tool calls, no babysitting the output format, i typed in a url and it just went. the whole clip is exactly that, start to finish, no errors, no retries, butter smooth. and the tool calling, the one thing that quietly breaks most local setups, works here like it's nothing. it's not the model that's flaky, it's the harness around it. hermes agent is the first agent i've run that actually gets that right. one url, one local model on one card, and it runs like a wild horse.

Sudo su

27,339 просмотров • 23 дней назад

I thought I needed a powerful GPU to run serious AI locally. I was completely wrong. Late 2025, Ollama quietly launched something that changed everything. They built a cloud inference layer running on NVIDIA's Blackwell architecture, the most powerful GPU hardware available today. No GPU on your desk required. No expensive API subscription. No complex setup. Just one command and you're running Kimi K2.5, a 1 trillion parameter open-source model, completely free. I connected it to OpenClaw, which links my phone's messaging apps to the AI agent running at home. I sent a message from WhatsApp. My machine picked it up, ran the task, and sent me back the result. It felt like magic. It's just good engineering. Save this post, you'll want this when you're building your first workflow. Want the SOP? DM me.

I thought I needed a powerful GPU to run serious AI locally. I was completely wrong. Late 2025, Ollama quietly launched something that changed everything. They built a cloud inference layer running on NVIDIA's Blackwell architecture, the most powerful GPU hardware available today. No GPU on your desk required. No expensive API subscription. No complex setup. Just one command and you're running Kimi K2.5, a 1 trillion parameter open-source model, completely free. I connected it to OpenClaw, which links my phone's messaging apps to the AI agent running at home. I sent a message from WhatsApp. My machine picked it up, ran the task, and sent me back the result. It felt like magic. It's just good engineering. Save this post, you'll want this when you're building your first workflow. Want the SOP? DM me.

Julian Goldie SEO

11,967 просмотров • 3 месяцев назад

Spent the week with Wolffish an AI agent that runs entirely on my own laptop. No cloud, no server, nothing leaves my machine. Set it up in about 5 minutes, plugged in my own model key, and told it to be my morning news desk. It scanned the web, cross-checked every story across outlets, and handed me one clean PDF. Local-first and open source. This is the direction.

Spent the week with Wolffish an AI agent that runs entirely on my own laptop. No cloud, no server, nothing leaves my machine. Set it up in about 5 minutes, plugged in my own model key, and told it to be my morning news desk. It scanned the web, cross-checked every story across outlets, and handed me one clean PDF. Local-first and open source. This is the direction.

Nawi

89,262 просмотров • 4 дней назад

My Hermes agent can now create LIVE datasets on the fly. Just added a real-time data layer to Hermes using the open-source TinyFish BigSet. First it builds the dataset. Then it finds the signal. This is WHAT agent-native research looks like in 2026.

My Hermes agent can now create LIVE datasets on the fly. Just added a real-time data layer to Hermes using the open-source TinyFish BigSet. First it builds the dataset. Then it finds the signal. This is WHAT agent-native research looks like in 2026.

Shubham Saboo

22,042 просмотров • 17 дней назад

🚨 Claude Code just got REPLACED overnight… for $0 An open-source project is now running a 122B model locally on your MacBook 🤯 No API fees. No cloud. No limits. Just raw power on your own machine. 1 command install. Double-click to launch. If this works at scale… it’s game over for paid AI tools. GitHub ↓

🚨 Claude Code just got REPLACED overnight… for $0 An open-source project is now running a 122B model locally on your MacBook 🤯 No API fees. No cloud. No limits. Just raw power on your own machine. 1 command install. Double-click to launch. If this works at scale… it’s game over for paid AI tools. GitHub ↓

Shruti Codes

40,655 просмотров • 2 месяцев назад

happy friday. sneak peek at some new features i'm building for the bankr agent: > sandboxed filesystem (your agent gets its own secure file system in the browser. you control what it can access. no agent running loose on your actual computer) > skill uploads (plugin new capabilities so your agent learns new things) > cli download & access > secure environment variables for api keys > github integration (connect your repo for reads and writes directly from your agent) all of this runs in a secure browser environment. no desktop app. no messy configuration. no downloading agent harnesses from the terminal. just a sandbox you control. in the video: i download an audit skill, run it against a smart contract, and save the report straight to the filesystem. then i ask the agent to pull all my 2026 transactions and write a csv for my accountant.

happy friday. sneak peek at some new features i'm building for the bankr agent: > sandboxed filesystem (your agent gets its own secure file system in the browser. you control what it can access. no agent running loose on your actual computer) > skill uploads (plugin new capabilities so your agent learns new things) > cli download & access > secure environment variables for api keys > github integration (connect your repo for reads and writes directly from your agent) all of this runs in a secure browser environment. no desktop app. no messy configuration. no downloading agent harnesses from the terminal. just a sandbox you control. in the video: i download an audit skill, run it against a smart contract, and save the report straight to the filesystem. then i ask the agent to pull all my 2026 transactions and write a csv for my accountant.

deployer

45,638 просмотров • 2 месяцев назад

Local. Open weights. Native 4K. LTX-2 is now a 100% Open Source AI video model, and I tested it on my rig! Installation, VRAM usage, and the prompts I used for this video, below 👇

Local. Open weights. Native 4K. LTX-2 is now a 100% Open Source AI video model, and I tested it on my rig! Installation, VRAM usage, and the prompts I used for this video, below 👇

TechHalla

34,466 просмотров • 5 месяцев назад