Holy shit... Microsoft open sourced an inference framework that runs a 100B parameter LLM on a single CPU.
It's called BitNet. And it does what was supposed to be impossible. No GPU. No cloud. No $10K hardware setup. Just your laptop running a 100-billion parameter model at human reading speed.
Here's how it works: Most LLMs store weights in 32-bit or 16-bit floats. BitNet uses 1.58 bits. Weights are ternary: just -1, 0, or +1. That's it. No floats. No expensive matrix math. Pure integer operations your CPU was already built for.
The result:
- 100B model runs on a single CPU at 5-7 tokens/second
- 2.37x to 6.17x faster than llama.cpp on x86
- 82% lower energy consumption on x86 CPUs
- 1.37x to 5.07x speedup on ARM (your MacBook)
- Memory drops by 16-32x vs full-precision models
The wildest part: Accuracy barely moves. BitNet b1.58 2B4T, their flagship model, was trained on 4 trillion tokens and benchmarks competitively against full-precision models of the same size. The quantization isn't destroying quality. It's just removing the bloat.
What this actually means:
- Run AI completely offline. Your data never leaves your machine
- Deploy LLMs on phones, IoT devices, edge hardware
- No more cloud API bills for inference
- AI in regions with no reliable internet
The model supports ARM and x86. Works on your MacBook, your Linux box, your Windows machine. 27.4K GitHub stars. 2.2K forks. Built by Microsoft Research. 100% Open Source. MIT License.
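The "1.58 bits" figure is just log2(3), the information in one three-valued weight. Here is a minimal sketch of the idea, assuming an absmean-style rounding rule (scale by the mean absolute weight, round, clip to [-1, 1]). The function names are hypothetical and this is not BitNet's actual kernel code; it only shows why a ternary dot product needs additions and subtractions but no multiplications.

```python
import math

def absmean_ternary(weights, eps=1e-8):
    """Quantize a list of floats to {-1, 0, +1} plus one float scale (assumed rule)."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round w/scale to the nearest integer, then clip into [-1, 1].
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

def ternary_dot(q, activations):
    """Dot product with ternary weights: only additions and subtractions."""
    total = 0
    for w, a in zip(q, activations):
        if w == 1:
            total += a
        elif w == -1:
            total -= a
        # w == 0 contributes nothing, so the activation is skipped.
    return total

# Information content of one ternary weight, in bits.
BITS_PER_TERNARY_WEIGHT = math.log2(3)

q, scale = absmean_ternary([0.9, -0.05, -1.2, 0.4])
result = ternary_dot(q, [3, 5, 2, 7])
```

In a real layer the float `scale` would be applied once to the accumulated sum, so the inner loop stays integer-only.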

Guri Singh
2,180,357 views • 3 months ago
Someone just built a desktop app that generates 3D models from images and runs 100% locally. It's called Modly. It runs entirely on your GPU, no cloud, no API bills. Just drop an image and get a 3D mesh. 100% Open Source.

How To AI
222,119 views • 1 month ago
PewDiePie just hit 20K GitHub stars in under 24 hours. The project? Odysseus. A self-hosted AI workspace that runs 100% on your machine.
• Agents with tools
• MCP built in
• Persistent memory
• File handling
• Windows, macOS, Linux
Your data never leaves your device. It supports Ollama, llama.cpp, and vLLM locally with OpenAI and OpenRouter support if you want cloud models too. The crazy part? A YouTuber with 110M+ subscribers just out-shipped most AI startups. And he built half of it using AI.

Charlie Hills
15,366 views • 10 days ago
Cancelled ChatGPT -> Built JARVIS -> Pays $0 -> it works offline + it's smarter than the $20/month version. No WiFi needed, no cloud, no API keys, no rate limits, no queues, no $20/month just to ask a server in Virginia for the weather. Just a local model running directly on the laptop hardware, voice activated, system integrated, controlling apps, answering questions, doing the work. Iron Man had JARVIS embedded in his suit, this guy has it embedded in his MacBook and it works on a plane, in a basement, in a remote cabin with zero signal. OpenAI is burning $700,000 a day on infrastructure to deliver something this guy runs for free. Anthropic charges $200/month for unlimited Claude access, Microsoft built Copilot into every product they sell. This guy skipped all of it, downloaded a model and made his laptop the smartest device in the room. No subscription. No login. No internet. No data sent anywhere ever. The most powerful AI assistant on earth is now the one running locally on hardware you already own. ChatGPT charges you to think slower, he pays nothing and thinks alone, he made it himself.

Defileo🔮
153,466 views • 1 month ago
🚨 Alibaba just open sourced a GUI agent that lives inside your webpage and controls it with natural language.
It's called Page Agent and it's not a browser extension. It's pure JavaScript: no Python, no Puppeteer, no headless browser, no screenshots. Just one script tag and your web app understands natural language.
Here's what it actually does:
→ Embed it with a single tag or npm install
→ Control any web interface with plain English commands
→ Text-based DOM manipulation, no OCR, no vision models needed
→ Bring your own LLM (GPT, Claude, Qwen, anything)
→ Ships a built-in UI with human-in-the-loop support
→ Turn 20-click ERP/CRM workflows into one sentence
→ Optional Chrome extension for multi-tab agent tasks
→ Works on any web app: SaaS, admin panels, internal tools
Companies are charging $30/month for AI copilots built on this exact idea. This is 3 lines of code. Your users. Your interface. The AI copilot layer for every web app just got open sourced. 1.6K stars. 100% Open Source. (Link in the comments)

Ihtesham Ali
134,069 views • 3 months ago
this is the worst local AI will ever be. tomorrow it gets faster. next month the models get smarter. next year your GPU runs what a data center runs today. Qwen3.5-35B-A3B on a single 3090. told it to visualize its own expert routing. 256 experts, 8 active per token, rendered in 3D on the same GPU running inference. no API key. no subscription. no permission needed. closed AI isn't losing ground. it's losing the argument.
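For readers wondering what "256 experts, 8 active per token" means in practice, here is a rough sketch of top-k expert routing in a mixture-of-experts layer. It assumes a plain "pick the 8 highest router scores, softmax over those" rule; the actual router in the model may normalize differently, and `route_token` and the random logits are made up for illustration.

```python
import math
import random

def route_token(router_logits, k=8):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Subtract the peak logit before exponentiating for numerical stability.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(top, exps)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(256)]  # one score per expert
chosen = route_token(logits, k=8)  # list of (expert_index, mixing_weight)
```

Only the 8 chosen experts run for that token, which is why a model with 35B total parameters can behave like a much smaller one at inference time.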

Sudo su
106,710 views • 3 months ago
Introducing Pods
Hyperspace Pods lets a small group of people (a family, a startup, a few friends) pool their laptops and desktops into one AI cluster. Everyone installs the CLI, someone creates a pod, shares an invite link, and the machines form a mesh. Models like Qwen 3.5 32B or GLM-5 Turbo that need more memory than any single laptop has get automatically sharded across the group's devices - layers split proportionally, inference pipelined through the ring. From the outside it looks like one OpenAI-compatible API endpoint with a pk_* key that drops straight into your AI tools and products. No configuration beyond pasting the key and changing the base URL.
A team of five paying for cloud AI burns $500–2,000 a month on API calls. The same team's existing machines can serve Qwen 3.5 (competitive on SWE-bench) and GLM-5 Turbo (#1 on BrowseComp for tool-calling and web research) for free - the hardware is already on their desks. When a query genuinely needs a frontier model nobody has locally, the pod falls back to cloud at wholesale rates from a shared treasury. But for the daily work - code reviews, refactors, research, drafting - local models handle it and nobody gets billed. And when it is idle, you can rent out your pod on the compute marketplace, with fine-grained permissions for access management.
There's no central server involved in inference. Prompts go from your machine to your pod members' machines and back: all of this enabled by the fully peer-to-peer Hyperspace network. Pod state - who's a member, which API keys are valid, how much treasury is left - is replicated across members with consensus, so the whole thing works on a local network. Members behind home routers don't need port forwarding either.
The practical setup for most pods is three models covering different jobs: Qwen 3.5 32B for code and reasoning, GLM-5 Turbo for browsing and research, Gemma 4 for fast lightweight tasks. All running on hardware you already own.
Pods ship today in Hyperspace v5.19. Model sharding, API keys, treasury, and Raft coordinator are all live.
What Makes This Different
- No middleman. Your prompts travel from your IDE to your pod members' hardware and back. There is no server in between reading your data.
- No vendor lock-in. Pod membership, API keys, and treasury are replicated across your own machines using Raft consensus. If the internet goes down, your local network keeps working. There is no database in someone else's cloud that your pod depends on.
- Automatic sharding. You don't configure layer ranges or calculate VRAM budgets. Tell the pod which model you want. It figures out how to split it across whatever hardware is online.
- Real NAT traversal. Your friend behind a home router with a dynamic IP? Works. No VPN, no Tailscale, no port forwarding. The nodes handle it.
- Free when local. This is the part that matters most. Cloud AI bills scale with usage. Pod inference on local hardware scales with nothing. The marginal cost of your 10,000th prompt is the electricity your laptop was already using.
Coming soon:
- Pod federation: pods form alliances with other pods.
- Marketplace: pods with spare capacity can sell inference to other pods.
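To make "layers split proportionally" concrete, here is a small sketch of one way to assign contiguous layer ranges to devices in proportion to their memory. This is not Hyperspace's actual algorithm, which the post does not describe; `shard_layers`, the device names, and the memory sizes are all hypothetical.

```python
def shard_layers(n_layers, memory_gb):
    """Assign contiguous layer ranges to devices in proportion to their memory."""
    total = sum(memory_gb.values())
    names = list(memory_gb)
    # Ideal (fractional) share of layers for each device.
    exact = {n: n_layers * memory_gb[n] / total for n in names}
    counts = {n: int(exact[n]) for n in names}
    # Largest-remainder method: hand leftover layers to the devices that
    # were rounded down the most, so the counts add up to n_layers exactly.
    leftover = n_layers - sum(counts.values())
    by_remainder = sorted(names, key=lambda n: exact[n] - counts[n], reverse=True)
    for n in by_remainder[:leftover]:
        counts[n] += 1
    plan, start = {}, 0
    for n in names:
        plan[n] = range(start, start + counts[n])
        start += counts[n]
    return plan

# Hypothetical pod: one desktop with twice the memory of each laptop.
plan = shard_layers(64, {"desktop": 32, "laptop_a": 16, "laptop_b": 16})
```

With contiguous ranges, each device only has to pass activations to the next one, which matches the "pipelined through the ring" description.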

Varun
304,284 views • 1 month ago
Just dropped on HF: NeuTTS Air. Next-gen on-device TTS that matches cloud-level quality while staying fully open source.
> Real-time speech synthesis on CPU/GPU
> 3-second voice cloning, no cloud or data upload
> Compact: under 200 MB, runs on mobile and edge devices
> Multilingual and expressive
> Developed by Neuphonic, optimized for speed and fidelity

steven
72,273 views • 8 months ago
🔥 BREAKING: Open source just leveled up AI agents. Eigent gives you a fully local, customizable AI workforce... built to run on your laptop.
→ No vendor lock-in
→ No cloud dependency
→ 100% open source
Just fast, private, parallel agents you control (Here's how):👇

Shruti
63,497 views • 10 months ago
JENSEN HUANG UNVEILED A BOARD THAT RUNS 1 TRILLION PARAMETER AI MODELS. THE $249 NVIDIA BOX UNDER YOUR DESK KILLS A $200/MONTH AI BILL FOR $5 IN ELECTRICITY
jensen held it up on stage with one hand and called it the architecture that runs the future of ai. that same technology now ships in a $249 box smaller than your wallet
the jetson orin nano super pulls 7-25 watts and does 67 trillion ai operations per second. llama 3, mistral and deepseek run locally with no api fees and no data leaving your machine
most developers pay $2,400 a year across chatgpt, openai api, claude pro and cursor. the jetson costs $314 in year one and $60 a year after. 2 year savings hit $4,431
install ollama with one command, change one line of code to point at localhost, and every tool built for openai works identically. zero rewrites, zero rate limits
cloud subscriptions keep getting more expensive and rate limits keep getting tighter. the people who own the box in 2026 are going to look very far ahead in 2028
bookmark this and read the article below
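The savings math can be checked with a few lines, using only the dollar figures stated in the post. This is a sketch under the assumption that "year one" already includes hardware plus electricity; the names are made up. With these inputs the two-year figure comes out to $4,426, a few dollars under the $4,431 quoted, so the post presumably counts one of the inputs slightly differently.

```python
CLOUD_PER_YEAR = 2400     # stated yearly spend on subscriptions and API usage
JETSON_YEAR_ONE = 314     # stated first-year cost of the box
JETSON_LATER_YEAR = 60    # stated cost for each year after the first

def savings(years):
    """Cloud spend minus local spend over a whole number of years (years >= 1)."""
    cloud = CLOUD_PER_YEAR * years
    local = JETSON_YEAR_ONE + JETSON_LATER_YEAR * (years - 1)
    return cloud - local

two_year = savings(2)  # 4800 cloud vs 374 local
```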

starmex
54,086 views • 11 days ago
This Chinese developer launched Llama 70B locally on a MacBook on a plane and ran client projects for a full 11 hours without internet. He was sitting by the window on a transatlantic flight with a MacBook Pro M4 with 64 GB of memory. WiFi on board cost $25 for the flight. He declined. No cloud API, no connection to Anthropic or OpenAI servers, no internet at all. Just a local Llama 3.3 70B in bf16 and his own orchestrator script. The model runs through llama.cpp. Generation speed, 71 tokens per second. Context around 60,000 tokens. Memory usage, 48.6 GiB out of 64. Battery at takeoff, 3 hours 21 minutes.
And he gave the orchestrator this system prompt before takeoff: "You are an offline orchestrator running on a single MacBook. There is no network. The only resources you have are local files in /Users/dev/work, the Llama 70B inference server at localhost:8080, and a battery budget of 3 hours 21 minutes. Process the queue at /Users/dev/work/queue.jsonl (one client task per line). For each task: draft → run local evals → save artefact to /Users/dev/work/done/. Save context checkpoints every 12 tasks so you can resume after a battery swap. Stop only on empty queue or when battery drops below 5%."
So the system knows exactly what resources it is running on. It knows it has no connection to the outside world for the next 11 hours. It knows it has finite memory and a finite battery. It knows the human will not intervene until the plane lands. The system runs in 1 loop. Takes a task from the queue, runs it through inference, saves the artifact, writes a checkpoint. Task after task, just like that. And only when the battery drops below 5% does the orchestrator automatically pause, wait for the laptop to switch to the backup power bank, and continue from the last checkpoint.
Here is what the system actually writes in his log during the flight:
"saved context checkpoint 8 of 12 (pos_min = 488, pos_max = 50118, size = 62.813 MiB)"
"restored context checkpoint (pos_min = 488, pos_max = 50118)"
"prompt processing progress: n_tokens = 50 / 60 818"
"task 37016 done | tps = 71 s tokens text → /Users/dev/work/done/proposal_westside.md"
Outside the window, clouds, blue sky, and no WiFi. On the tray, 1 MacBook, an open terminal on 2 screens, and an inference server on localhost. From what I have observed, this is the cleanest offline AI workflow I have seen in the past year: 11 hours of flight, $0 for WiFi, and the entire client queue closed before landing.
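The loop described in the post (take a task, run inference, save the artifact, checkpoint every 12 tasks, stop below 5% battery) could look roughly like the sketch below. It is not the developer's script, which the post does not show. The task fields `id` and `prompt`, the `checkpoint.json` file, and the injected `infer` and `battery_percent` callables are all assumptions, and the "run local evals" step is left out. Instead of pausing in place, this version stops and resumes from the checkpoint on the next run.

```python
import json
from pathlib import Path

def run_queue(queue_path, done_dir, infer, battery_percent,
              checkpoint_every=12, min_battery=5):
    """Process one JSON task per line until the queue is empty or battery is low."""
    done_dir = Path(done_dir)
    done_dir.mkdir(parents=True, exist_ok=True)
    checkpoint = done_dir / "checkpoint.json"
    # Resume from the last checkpoint if one exists.
    start = 0
    if checkpoint.exists():
        start = json.loads(checkpoint.read_text())["next_task"]
    lines = [l for l in Path(queue_path).read_text().splitlines() if l.strip()]
    processed = start
    for index in range(start, len(lines)):
        if battery_percent() < min_battery:
            break  # stop here; a later run picks up from the checkpoint
        task = json.loads(lines[index])
        artifact = infer(task["prompt"])  # e.g. a call to a local inference server
        (done_dir / f"{task['id']}.md").write_text(artifact)
        processed = index + 1
        if processed % checkpoint_every == 0:
            checkpoint.write_text(json.dumps({"next_task": processed}))
    # Always record progress on exit so nothing is redone after a restart.
    checkpoint.write_text(json.dumps({"next_task": processed}))
    return processed
```

Passing `infer` and `battery_percent` in as functions keeps the loop testable without a model or a real battery reading.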

Blaze
1,824,020 views • 1 month ago
NVIDIA just made paying for AI feel optional. Open model, a million tokens of context, free tier with no per-token cost, runs on your own hardware. Entire codebases, whole data rooms, a year of chat logs, all swallowed in one prompt. No chunking, no RAG, no rate limit theater. The closed-AI premium has 90 days to defend itself. Bookmark this and come back. Open beat closed. Again.

shmidt
293,215 views • 5 days ago
BlackBird now runs on 8GB RAM Macs. No GPU. No cloud. Just fast, private AI agents - right on your MacBook Air. We optimized memory, speed, and thermal performance so anyone can build with AI. Try it:
Next Stop: Windows Beta Drops This Week! DM Me if you want to try it. #OnDeviceAI #BlackBird #AIforEveryone #macOS

Hina Dixit
1,233,525 views • 1 year ago
NVIDIA just dropped Nemotron-3.5-ASR: one 0.6B model, 40+ languages, streaming. parakeet.cpp already runs it. On a plain CPU, 2.5x faster than NVIDIA AI's Nemo runtime, output byte-for-byte identical (WER 0). No GPU needed. Offline or real-time. Pick a language with --lang, or auto. GPU numbers are coming to compare with the Nemo framework.

Ettore Di Giacinto
76,296 views • 6 days ago
Today we’re open-sourcing Stable Audio Open Small, a 341M-parameter text-to-audio model optimized to run entirely on Arm CPUs. This means 99% of smartphones can now generate music-production samples in seconds, right on-device with no internet required. Built for fast, on-the-go creation, it turns your next quick idea into up to 11 seconds of audio. Generate drum loops, foley, riffs, and textures right where you are. No cords 🔌 just chords 🎹 You can learn more here:

Stability AI
94,621 views • 1 year ago
Llama 3.2 is the latest open-source AI model from Meta, released only a few hours ago. Here is the 3B parameter model running on Akash Chat at 165 tokens/second, powered by NVIDIA A100s on Akash. Try Llama 3.2 for free, no sign-in required:

Akash Network
37,087 views • 1 year ago
The first phone where your AI never leaves your device. No cloud processing. No data harvesting. Complete AI sovereignty. Built on Galaxy S25 Edge hardware. Earn rewards through the Gaia network. 1,000 units now available. Additional releases planned.

Gaia 🌱
157,338 views • 9 months ago
Meet Stable Audio 3.0, the open-weight model family built for artistic experimentation. This is our open invitation to experiment with generative audio. We believe the best innovations are still waiting to be built.
The 4-1-1 on 3.0:
📣 You own your outputs, and can distribute and commercialize them under the Stability AI Community License (up to $1 million in revenue).
🎵 New and improved capabilities include variable-length generation up to six minutes, and full song composition on portable devices, no GPU required.
✅ Trained on a fully licensed dataset.
🎨 You can customize the models on your own library with support for LoRA training, which we’ve documented for the first time.
More on the models 👇

Stability AI
150,180 views • 22 days ago
Meet #DBRX: a general-purpose LLM that sets a new standard for efficient open source models. Use the DBRX model in your RAG apps or use the DBRX design to build your own custom LLMs and improve the quality of your GenAI applications.

Databricks
327,704 views • 2 years ago
GOOGLE'S GEMMA 4 12B RUNS AT 21 TOKENS PER SECOND ON A BUDGET RTX 4060 LOCALLY AND THE BENCHMARKS SHOULD NOT BE THIS GOOD FOR A 6.6GB FILE. 77.5% on MATH Olympiad, 78.8% on expert science, 72% on real code. No API. No cloud. No subscription.

0xMarioNawfal
44,387 views • 8 days ago
🚨 One photo of your face. That's all someone needs to become you on a live video call. In real time. Right now.
The tool is free and open source. It's called Deep-Live-Cam. One image. One click. You become anyone on a live webcam feed. No training. No datasets. No waiting. Instant. Your face. Your expressions. Your mouth movements. All stolen from a single photo.
Here's what this thing does:
→ Upload one photo of any face
→ Turn on your webcam
→ You are now that person. Live. In real time.
→ It matches your pose, your expressions, even your lighting
→ Mouth masking so the swapped face moves its lips when you talk
→ Multi-face mapping. Swap different faces on different people in the same call.
→ Virtual camera output. Plug it into Zoom, Google Meet, Teams. Nobody knows.
→ Works on NVIDIA, AMD, Intel, and Apple Silicon
Here's the part that should terrify you: Your boss could be on a Zoom call with someone wearing your face right now. A scammer could call your parents looking exactly like you. A stranger could take your LinkedIn photo and become you in a video meeting.
IShowSpeed's reaction when he saw it: "What the F**! This shit is crazy!" SomeOrdinaryGamers: "That's fucking freaky dude... that's so wild."
This was the #1 trending repo on GitHub the day it launched. 1,600 stars in 24 hours. 80K+ stars today. No one is ready for what this means. And it's already out there. 100% Open Source.

Nav Toor
301,215 views • 2 months ago