Your local AI just got up to 5x more memory. Same model. Same device. Nearly zero accuracy loss. QVAC SDK 0.12.0 integrates TurboQuant - Google Research's latest memory optimisation algorithm. What is TurboQuant? The KV cache is the memory your model uses to track a conversation. As context grows,...

QVAC

10,124 subscribers

15,799,748 views • 1 month ago • via X (Twitter)

Education, News & Politics, Science & Technology

Related Videos

Yesterday we announced that the QVAC SDK update unlocked up to 5x more context on your device thanks to TurboQuant. Today, we’ll go through how we got there. TurboQuant (Google Research, ICLR 2026) is a two-stage KV-cache compression algorithm. Stage 1 - PolarQuant: convert KV vectors from Cartesian (x, y, z...) to polar coordinates. Angles compress predictably down to 3-4 bits. Stage 2 - QJL: a 1-bit Johnson-Lindenstrauss correction that cleans up the residual error. Total: ~4-5 bits per value. No retraining. No calibration. QVAC ported it to Vulkan inside qvac-fabric-llm.cpp. TurboQuant is currently supported only on AMD and NVIDIA GPUs; support for iOS, Android, and Apple Silicon is coming next. Full algorithm walkthrough + benchmarks + code examples →
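To make the Stage 2 idea concrete, here is a minimal sketch of a 1-bit Johnson-Lindenstrauss inner-product estimator in isolation. It is not QVAC's Vulkan port or the paper's full pipeline: the function names, the vectors, and the projection count are all assumptions chosen for illustration, and the projection count is deliberately large so that the estimate visibly converges.

```python
import math
import random

def sign_sketch(key, projections):
    """Keep only the sign (1 bit) of each random projection of the key."""
    return [1 if sum(p_i * k_i for p_i, k_i in zip(p, key)) >= 0 else -1
            for p in projections]

def estimate_dot(query, key_signs, key_norm, projections):
    """Estimate <query, key> from the 1-bit key sketch and the stored key norm."""
    m = len(projections)
    total = sum(sum(p_i * q_i for p_i, q_i in zip(p, query)) * s
                for p, s in zip(projections, key_signs))
    # For Gaussian projections, E[(p.q) * sign(p.k)] = sqrt(2/pi) * <q, k> / |k|,
    # so rescaling by sqrt(pi/2) * |k| gives an unbiased estimate.
    return math.sqrt(math.pi / 2) * key_norm * total / m

rng = random.Random(0)  # fixed seed so the run is repeatable
dim, m = 8, 20000       # illustrative sizes, not real model dimensions
projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]
query = [0.3, -0.1, 0.5, 0.2, -0.4, 0.1, 0.0, 0.6]
key = [0.2, 0.1, 0.4, -0.3, -0.5, 0.2, 0.1, 0.5]

exact = sum(q * k for q, k in zip(query, key))
key_norm = math.sqrt(sum(k * k for k in key))
approx = estimate_dot(query, sign_sketch(key, projections), key_norm, projections)
print(f"exact={exact:.3f} approx={approx:.3f}")
```

The key is stored as one sign bit per projection plus a single norm, while the query stays in full precision. That asymmetry is what lets a 1-bit sketch still recover an attention-style dot product.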

QVAC

14,472,009 views • 1 month ago

QVAC SDK 0.12.0 is now live, bringing longer context, increased memory optimisation, new modalities, and broader ecosystem support directly to your device. Key Features and Updates: - TurboQuant KV-Cache Quantization: Fit much longer context in the same memory. TurboQuant, an algorithm from Google Research, compresses the KV cache by up to 5x, near-lossless. - Text-to-Video: Generate video from a text prompt, fully local, with the new wan2.1 model in the Diffusion addon - Apple Metal Performance for Flux2-klein: Diffusion on Apple Silicon now matches MLX performance, the native benchmark for Apple GPUs - Robot Control (new VLA addon): A GGML-based Vision-Language-Action addon brings fast, efficient robot control to edge devices - Coding Assistant / Harness Support: QVAC now works with OpenCode and OpenClaw as a local provider. A new @qvac/ai-sdk-provider package automates model registry and provider integration - Cross-Platform Voice: Text-to-speech and Parakeet transcription moved from ONNX to the GGML engine for better CPU and GPU support on macOS, iOS, Windows, Linux, and Android. Parakeet also adds long-term streaming diarization (tracking who spoke when on live audio) - Faster Lightweight Visual Classification: A new GGML-based Classification addon delivers millisecond-level classification, useful where a vision-language model (VLM) would be unnecessarily slow - Under the Hood: Fabric synced to llama.cpp v8828 (from v8189), plus GPU acceleration added to image-upscale models for faster results Full release notes:

QVAC

9,932,369 views • 1 month ago

Ready to build the future of stable private on-device AI? 🧠 Our latest tutorial shows you how to build a sovereign mobile app in minutes using the QVAC SDK and Expo. Start from a blank template and deploy local Llama 3.2 inference that runs directly on your own devices. What you’ll learn: Modular Setup: Use the QVAC CLI to tree-shake and keep your mobile bundle lean. Local-First Flow: Initialize the SDK, download weights, and run high-speed inference without a cloud uplink. Cross-Platform Power: See the smoke test in action on a physical Samsung S25. No rented clouds. No API keys. Build local, on-device, unstoppable intelligence in your pocket. Watch the full guide and start building:

QVAC

4,080,718 views • 3 months ago

Two islands. Two futures. 🏝️ One chose to trust its people with intelligence. The other turned them into the product. QVAC is the foundation for a sovereign future. No central servers, no "Department of Truth," and no surveillance. Just local-first AI that lives on your device, learns with you, and belongs to you. Your data. Your device. Your freedom. Build the right choice:

QVAC

3,599,437 views • 3 months ago

The QVAC SDK is the "LEGO block" of the next era of computing. It’s a modular, local-first framework designed to turn anything—from a simple robot to an industrial server—into a sovereign, autonomous mind. Why build with QVAC? Atomic Intelligence: AI as a raw material embedded directly into your hardware. No Cloud Dependency: 0 latency and total privacy. If the internet breaks, your world keeps thinking. Infinite Scale: A single API for local AI that runs on any device, anywhere. From a child’s toy to the fabric of the universe, if you can dream it, you can build it. Start building the future: 🚀

QVAC

4,707,146 views • 3 months ago

QVAC SDK 0.11.0 is live. 🛠️ This release focuses entirely on unlocking next-generation local compute and advanced visual workflows. What’s new: Next-Gen Models: Core engine updated to the latest version of Fabric, unlocking full support for Qwen 3.5, Qwen 3.6, and Gemma 4. Multi-GPU Support: The SDK can now split workloads across multiple graphics cards on the same machine, allowing you to run significantly larger models completely locally. Multi-Image Conditioning: Blend multiple reference images together in a single generation for advanced style mixing and composition control. On-Device Upscaling: Boost your generated images to high-quality resolutions, running securely on your own hardware. More improvements are waiting under the hood. Check the change logs, update your SDK today, and start building with

QVAC

2,006,449 views • 2 months ago

Introducing QVAC - Infinite intelligence. Local. Any Hardware. Peer-to-Peer Hyper Swarm. No cloud. No compromise. QVAC is the decentralized AI platform for humans and machines. Learn more:

QVAC

15,933 views • 1 year ago

The world of tomorrow cannot run on a rented cloud. 🚫 With 10 billion humans and 10 billion autonomous agents, intelligence must be embedded at the edge - not centralized in a server farm. The QVAC SDK is the invisible engine for this transition. We’ve built the foundational toolkit for the next era: highly efficient, fully modular, and 100% sovereign. From a single light to an industrial grid, the power to build local-first AI is now in your hands. The revolution will not be hosted. It will be local. Learn more:

QVAC

2,908,291 views • 2 months ago

QVAC SDK 0.13.0 is live, and this version brings a lot of exciting updates! Local AI now plugs into your coding agent, ships as a desktop app in one command, and runs even more models. Highlights: NEW INTEGRATIONS - OpenCode and coding agents: the new @qvac/ai-sdk-provider makes QVAC a local provider. Less setup, same-model requests queue cleanly, and managed mode starts and supervises qvac serve for you. - Broader OpenAI-compatible API, validated across supported flows so covered capabilities stay consistent and testable. - Turn your QVAC project into a real desktop app for Mac, Windows, or Linux with a single command. The new Electron plugin handles the packaging and keeps the app small by including only what it needs. NEW MODELS - New pi0.5 model support - run a vision-language "robot brain" on a single ordinary graphics card, at full accuracy. - Image-to-video, fully local, via the Wan2.1 model in the Diffusion addon. - New BCI add-on: brain-computer interface transcription, fully local. Decode recorded neural signals into text on-device via the Whisper.cpp-based BCI model. IMPROVEMENTS - Whisper GPU transcription on Android, auto-picking the best backend (OpenCL on Adreno 700+, Vulkan elsewhere), unified on one ggml engine. - Parakeet steadier on mobile, with real end-of-utterance detection for streaming. - Supertonic TTS now runs full GPU across Metal, Vulkan, and OpenCL, with native streaming.

QVAC

20,922,918 views • 1 month ago

A toy robot. A home assistant. A humanoid on a factory floor. They all need the same thing: a brain. QVAC is that brain, and it runs on the robot itself. No cloud to phone, no latency, nothing leaving your home. The intelligence is on the machine, so it keeps thinking even with the internet off. The robots are coming. This is what wakes them up:

QVAC

5,248,713 views • 1 month ago

The engine of the 21st century is here. 🧠 The QVAC SDK is the "steam engine" of the AI era—decoupling intelligence from the cloud and putting it in your hands. A single API for local-first, modular AI that runs anywhere. - Sovereign: Own your engine, don't rent it. - Local: 0 latency, no cloud dependency. - Modular: Stackable, universal building blocks. The era of Stable Intelligence has begun.

QVAC

10,663,625 views • 3 months ago

QVAC SDK 0.10.0 is now live, bringing advanced local compute capabilities and specialized hardware optimization directly to your device. Key Features and Updates: - Image-to-Image Diffusion: Transform and edit images using simple prompts with 100% local compute, with no cloud uploads or external servers required - Dynamic Tooling & KV Cache Management: Your local LLM now receives a tailored toolbox for every interaction, with automatic KV cache clearing to maintain high-speed inference - Doctor CLI: A new diagnostic tool that analyzes your hardware and memory, providing specific instructions on how to optimize your GPU for local AI - Suspend & Resume API: Specifically designed for mobile environments, this allows apps to pause P2P swarms and RAG workspaces to meet background rules without losing model state - GPT-OSS Compatibility: Added support for the latest GPT-OSS models loaded externally, expanding the range of open-source intelligence available on the platform. Build the future of private, unstoppable AI:

QVAC

34,508 views • 2 months ago

Google's Gemma 4 26B A4B QAT hits 25+ tokens/sec and 320+ tokens/sec prefill on 8 GB VRAM (RTX 4060) + 16 GB RAM using TurboQuant. Prefill just went from 200 → 320+ tok/s on the same 8GB card. 1.6x, no new hardware, no new quant, just a KV cache trick stacked on top of the Gemma 4 26B MoE setup from a few days ago. A few days ago I posted Gemma 4 26B A4B hitting 28 tok/s decode on 8GB VRAM using native MTP. Prefill was stuck around 200 tok/s. Fair callout by the community. So today I tested something I'd already been meaning to try: TheTom/llama-cpp-turboquant, the TurboQuant KV cache fork by Tom Turney (GitHub link in the comments). Thanks to him, the fork just got resynced to mainline, so MTP + TurboQuant now run together cleanly (I didn't see any meaningful gains from using MTP with this setup, but you can try). The flags (no MTP): -m gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf -cnv -c 64000 --cache-type-k q8_0 --cache-type-v turbo3 Results on the same RTX 4060 8GB, tested with a 27k token prompt at 64k context loaded: Prefill: 200 tok/s → 320+ tok/s. Decode: stayed above 25 tok/s (without MTP). Why it works: TurboQuant uses Walsh-Hadamard rotation + polar quantization on the KV cache. Keys are sensitive to compression and values much less so, so it splits the difference: K stays at q8_0, V drops to turbo3 (~3 bits). Bonus from the memory savings: the same 8GB card can now stretch to 100-120k context with minimal decode penalty. It should now be snappier with any agent harness, such as hermes agent, without compromising on intelligence. If you're already running Gemma 4 on a small card, this stacks on top for free. Try --cache-type-k q8_0 --cache-type-v turbo3 on your setup and report back what your prefill/decode split looks like. Unsloth model GGUF and llama.cpp TurboQuant fork links in the comments. What's your prefill number before vs after?
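The memory saving behind the mixed K/V setting can be estimated with simple arithmetic. The sketch below uses a hypothetical model shape (not Gemma 4's real configuration) and takes the post's "~3 bits" for turbo3 at face value; the 8.5 bits per value for q8_0 follows from llama.cpp's block layout of 32 int8 values plus one fp16 scale.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, k_bits, v_bits):
    """Total KV cache size in bytes for the given per-value bit widths."""
    values_per_token = n_layers * n_kv_heads * head_dim  # count for K, same for V
    return values_per_token * n_tokens * (k_bits + v_bits) / 8

# Hypothetical model shape, for illustration only (not Gemma 4's real config).
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_tokens=64_000)

F16_BITS = 16.0
Q8_0_BITS = 8.5    # llama.cpp q8_0: 32 int8 values + one fp16 scale per block
TURBO3_BITS = 3.0  # assumption: the post describes turbo3 as "~3 bits"

baseline = kv_cache_bytes(**shape, k_bits=F16_BITS, v_bits=F16_BITS)
mixed = kv_cache_bytes(**shape, k_bits=Q8_0_BITS, v_bits=TURBO3_BITS)
print(f"f16 K/V:           {baseline / 2**30:.2f} GiB")
print(f"q8_0 K + turbo3 V: {mixed / 2**30:.2f} GiB")
print(f"ratio:             {baseline / mixed:.2f}x")
```

Under these assumptions the ratio depends only on the bit widths (32 / 11.5), so it carries over to other model shapes. Absolute sizes will differ for real models, especially ones that use sliding-window or shared KV layers.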

Alok

119,821 views • 1 month ago

Say goodbye to fragmented data and hello to a unified wellness experience. Introducing QVAC Health - The app that brings your data together in one, encrypted, offline-capable environment. QVAC - Your Device, Your AI Download the App now:👉

QVAC

35,707 views • 7 months ago

The QVAC SDK puts the "brain" directly into your pocket. From real-time on-device translation to multimodal understanding, build apps that work everywhere, even 30,000 feet in the air. Local AI is here: 💡Offline-First: No cloud, no latency, no "Department of Truth". 💻 Universal API: One codebase for iOS, Android, macOS, and Linux. 🔍 Multimodal: Understanding text, audio, and images without a server. If you can dream it, you can build it. The era of Stable Intelligence has begun. Start building:

QVAC

36,457 views • 3 months ago

Superior methodology beats raw parameter count. 🧠 Introducing QVAC MedPsy: Local-first medical AI that redefines the possible. 1/ Unprecedented Power: MedPsy 1.7B model outperforms Google’s MedGemma 4B by 11 points and our 4B model beats MedGemma 27B on real-world health benchmarks. 2/ Extreme Efficiency: 3.2x fewer tokens means near-instant inference on your phone or wearable. 3/ Absolute Privacy: Expert-level reasoning running 100% locally. No data leaves your device. We aren’t simply shrinking models; we’re anchoring intelligence where it matters most. High-level medical logic is now a sovereign right. The future of healthcare is local. Learn more:

QVAC

2,416,342 views • 2 months ago

Google just had its DeepSeek moment, and almost nobody's talking about it. Here's the story you need to know. 🧵 When DeepSeek dropped in early 2025, it didn't just impress people. It scared them. A model competing with the biggest AI players, at a fraction of the cost, through math, not hardware. Chip stocks tanked. The industry panicked. Then on March 24th, 2026, Google published a research paper called TurboQuant. Cloudflare's CEO immediately called it Google's DeepSeek moment. Memory chip stocks for Micron and Western Digital fell on the news. Why? Because TurboQuant compresses AI memory by 6x through software alone. → No new chips needed → No retraining required → One server can now host more models than before. The era of throwing hardware at AI problems is ending. The era of mathematical efficiency is here. ✅ Save this post, you'll thank yourself when this reshapes every AI tool you use. 📌 Want the SOP? DM me.

Julian Goldie SEO

12,731 views • 4 months ago

QVAC SDK 0.15.0 is live. This release adds multi-prompt batching, brings a native AMD GPU backend to the stack, moves more vision encoders onto mobile GPUs, and adds a second local coding-agent integration. Main highlights: - Prompt batching for the LLM addon. Batch multiple prompts into one job and process them concurrently, with each answer returned the moment its generation finishes. - Native AMD GPU backend. A first-class HIP/ROCm backend in @qvac/vla-ggml, auto-selected over Vulkan with clean fallback when ROCm is absent. - A second local coding agent. OpenClaw joins OpenCode for local, cloud-free agent workflows. AGENTS - OpenCode plugin update (@qvac/opencode-plugin). Aligned with the current SDK, CLI, and AI SDK provider packages. A fresh install runs OpenCode against managed local QVAC models out of the box, from the default qvac/qwen3.5-9b, with no manual qvac serve setup. - OpenClaw plugin (@qvac/openclaw-plugin). A second coding-agent integration alongside OpenCode. A fresh setup installs the plugin, creates a local qvac provider through onboarding, and runs a QVAC model through OpenClaw🦞's local service path. LANGUAGE MODELS - Prompt batching (LLM addon). Batch multiple prompts in one job and run them concurrently; each answer returns the moment its generation finishes, with no waiting on the others. - Reasoning-context trimming on hybrid + recurrent models (@qvac/llm-llamacpp). remove_thinking_from_context now works beyond pure-attention models. Same JS API, no throw. VOICE AND SPEECH - Transcription (transcription-parakeet 0.9.0). More robust CPU fallback on GPU failure and a faster Vulkan backend on Pixel 9. - Text-to-speech features (tts-ggml 0.4.0). Adds LavaSR for noise removal and adjustable output frequency up to 48 kHz, plus Japanese via Chatterbox. - Text-to-speech fixes (tts-ggml 0.4.1). CPU fallback on GPU failure and a q8_0 KV crash fix on Metal with Chatterbox. VISION - Qwen3.5 vision encoder on GPU (Android). Image encoder moves onto the phone GPU, with a smarter tile-grid preprocessor and default image-token caps, for flagship Android: Vulkan on Mali (Pixel 9 Pro) and OpenCL on Adreno 830 (Galaxy S25). - Gemma-4 vision encoder on GPU (Android). Vision encoder runs on the phone GPU instead of CPU, same flagship Android targets. PLATFORM AND PERFORMANCE - AMD GPU backend (@qvac/vla-ggml). Native HIP/ROCm backend, auto-selected over Vulkan with clean fallback when ROCm is absent (Linux x64 only). About 23% faster than Vulkan and about 14% faster than PyTorch-ROCm, with parity preserved. - Unified code style. A cleaner, more consistent, easier-to-contribute codebase. Let's build. npm install @qvac/sdk

QVAC

29,251,774 views • 15 days ago

QVAC Health 1.1.0 is officially live! 🏥✨ Your wellness data belongs to you, not the cloud. This latest update, powered by the upgraded QVAC SDK 0.8.0, brings significant performance gains and local-first features to your sovereign health dashboard.

What's New:
- Calorie Tracking: Log meals and monitor intake directly on-device.
- Advanced Biomarkers: Weight tracking now includes automatic BMI calculations.
- Improved Vitals: Organized dashboard and critical fixes for Apple Watch blood oxygen data.
- Total Privacy: Faster performance with 100% local, encrypted processing.

Update today and experience health insights without the surveillance. Build the future:
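For readers curious about the BMI figure mentioned above, the standard formula is weight in kilograms divided by the square of height in metres. The sketch below is only an illustration of that arithmetic; the function name `bodyMassIndex` is hypothetical and is not taken from the QVAC Health app.

```typescript
// Standard BMI formula: weight (kg) / height (m)^2.
// Hypothetical helper for illustration; not QVAC Health's code.
function bodyMassIndex(weightKg: number, heightM: number): number {
  if (weightKg <= 0 || heightM <= 0) {
    throw new RangeError("weight and height must be positive");
  }
  return weightKg / (heightM * heightM);
}

// Example: 70 kg at 1.75 m gives 70 / 3.0625, roughly 22.9.
const exampleBmi = bodyMassIndex(70, 1.75);
console.log(exampleBmi.toFixed(1));
```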


QVAC

20,289 views • 3 months ago

QVAC SDK 0.14.0 is live. This release makes the on-device stack faster on mobile, ships the developer-agent path, and takes local text-to-speech to 31 languages.

Main highlights:
- OpenCode and OpenClaw. The first official OpenCode plugin, plus a maintained OpenClaw compatibility path, both built on managed mode and qvac serve. Point a coding agent at a local model with far less setup and far fewer surprises.
- Brain-computer interface transcription, on the SDK. Take recorded neural signal data and decode it into text, fully on-device, no cloud. Stream it in chunks through a simple API. In 0.14 it runs GPU-accelerated on iOS.
- Text to Speech in 31 languages with our Supertonic3 upgrade.

VOICE AND SPEECH
- Supertonic3 multilingual TTS, 5 languages to 31.
- Chatterbox and Supertonic now run on the Android GPU, with lower memory use (especially on iOS), quantized s3gen Chatterbox support, and a fix for Chatterbox occasionally emitting random speech.
- Whisper transcription now runs on the iOS GPU. Parakeet runs on the Android GPU, with steadier real-time streaming.

VISION AND OCR
- VLM multi-tile batching: high-resolution Pan and Scan images are encoded in one pass instead of tile by tile, for faster vision throughput.
- OCR on ggml (EasyOCR and DocTR) reaches full speed parity with the onnx path, across Metal, OpenCL, and Vulkan.

PLATFORM AND RELIABILITY
- Dynamic compute backends on Linux: one build picks the right backend at runtime, and opens the door to ROCm and CUDA support without per-backend builds.
- Thinking tokens are kept out of the model context, so reasoning no longer fills the KV cache.

SDK 0.14.0 is now leaner and faster to start. Let's build.
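The idea of keeping thinking tokens out of the model context can be pictured at the chat-history level. The sketch below is an illustration under stated assumptions, not the SDK's implementation: `ChatMessage` and `stripThinking` are hypothetical names, and the `<think>...</think>` wrapper is assumed as the reasoning marker (the real marker depends on the model's chat template).

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Assumption: reasoning is wrapped in <think>...</think>.
// Only closed blocks are matched; an unterminated <think> is left untouched.
const THINK_BLOCK = /<think>[\s\S]*?<\/think>\s*/g;

// Return a copy of the history in which assistant turns no longer carry
// their reasoning, so only the final answers are fed back as context.
function stripThinking(history: ChatMessage[]): ChatMessage[] {
  return history.map((message) =>
    message.role === "assistant"
      ? { ...message, content: message.content.replace(THINK_BLOCK, "").trim() }
      : message,
  );
}

// Usage: the assistant turn keeps its answer and loses its reasoning.
const trimmed = stripThinking([
  { role: "user", content: "What is 2 + 2?" },
  { role: "assistant", content: "<think>Add the two numbers.</think>\n4" },
]);
console.log(trimmed.map((m) => `${m.role}: ${m.content}`).join("\n"));
```

Because the reasoning text is dropped before the next turn, every later request carries fewer tokens, which is the effect the release note describes for the KV cache.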


QVAC

23,973,950 views • 29 days ago