
Akshay 🚀
@akshay_pachaar • 273,971 subscribers
Simplifying LLMs, AI Agents, RAG, and Machine Learning for you! • Co-founder @dailydoseofds_ • BITS Pilani • 3 Patents • ex-AI Engineer @ LightningAI

This is the DeepSeek moment for Voice AI. Chatterbox Turbo is an MIT-licensed voice model that beats ElevenLabs Turbo & Cartesia Sonic 3!
- <150ms time-to-first-sound
- Voice cloning from just 5 seconds of audio
- Paralinguistic tags for real human expression
100% open-source.
Akshay 🚀 • 467,359 views • 5 months ago

Anthropic's most viral feature is now open-source! Until now, Anthropic's Generative UI capabilities only existed inside its own products. CopilotKit🪁 just shipped Open Generative UI, an open-source implementation of Claude Artifacts that works in any app.
The agent generates HTML/SVG at runtime, and CopilotKit streams it token-by-token into a sandboxed iframe inside the app's chat. So the user can watch the UI assemble itself in real time, not after the full response is ready.
The sandbox is fully isolated with no access to the parent app, the DOM, or user data. So if the agent hallucinates broken markup or unexpected JavaScript, nothing leaks outside the iframe.
Under the hood, the agent does not select from pre-built components. Instead, it generates arbitrary visuals from scratch every time. The output is unconstrained by default, but you can shape it by defining prompt-based skills that teach the agent specific visual formats or guidelines. For instance, a skill prompt can guide the agent toward producing a Chart.js dashboard with proper axis labels and responsive sizing, or an interactive 3D model with rotation controls.
The video below shows this in action, and the output quality you see actually comes from the skills layer.
Open Generative UI runs on AG-UI, so it works out of the box with LangGraph, CrewAI, Mastra, Google ADK, AWS Strands, and more. It also ships with a standalone MCP server that plugs into Claude Code, Cursor, or any MCP-compatible client.
And the entire stack is built on top of CopilotKit, the open-source frontend framework for agents and generative UI. 30k+ GitHub stars, with SDKs for React, Next.js, Angular, and Vue.
I have shared the GitHub repo and a live playground in the replies!
Akshay 🚀 • 84,504 views • 28 days ago
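
The streaming-into-a-sandbox idea from the Open Generative UI post above can be sketched in a few lines. This is not CopilotKit's implementation, just a minimal illustration that assumes the host page re-renders an iframe as tokens arrive. The function names `render_sandboxed` and `stream` are hypothetical; the `sandbox` and `srcdoc` attributes are standard HTML.

```python
import html


def render_sandboxed(partial_markup: str) -> str:
    """Wrap agent-generated markup in an iframe that cannot reach the host page."""
    # srcdoc carries the markup inline; escaping keeps it inside the attribute value.
    escaped = html.escape(partial_markup, quote=True)
    # "allow-scripts" without "allow-same-origin" gives the frame an opaque origin,
    # so its scripts can run but cannot read the parent's DOM, cookies, or storage.
    return f'<iframe sandbox="allow-scripts" srcdoc="{escaped}"></iframe>'


def stream(tokens):
    """Yield a freshly rendered iframe after each token arrives (hypothetical helper)."""
    buffer = ""
    for token in tokens:
        buffer += token
        yield render_sandboxed(buffer)


if __name__ == "__main__":
    for frame in stream(["<svg>", "<circle r='4'/>", "</svg>"]):
        print(frame)
```

Re-rendering the whole frame per token is the simplest possible approach; a real client would more likely patch the document incrementally.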

Everyone is sleeping on this new OCR model!
- 85.9% (sota) on olmocr bench
- 90+ language support w/benchmarks
- 4B model (down from 9B)
- Full layout information
- Extracts + captions images and diagrams
- Strong handwriting, math, form, table support
100% open-source.
Akshay 🚀 • 165,995 views • 2 months ago

Software engineers are going to love this! I found an open-source error monitoring agent that scans production logs, finds the root cause, and sends a Slack message with full context before you even notice something broke. Cuts down production downtime by 95%! Check this:
Akshay 🚀 • 180,996 views • 3 months ago

I rebuilt most of OpenClaw's core in a single workflow:
- 25 blocks
- 29 connections
- Short + long-term memory
- Multi-channel (Telegram + Slack)
Didn't build it manually. Stack is fully open-source. Self-host, run local models, own it end-to-end. Full walkthrough:
Chapters:
00:00 - Intro
01:00 - SimClaw in action: planning my day, finding meetings, sending email
04:05 - Long-term memory capability
05:52 - Inside the workflow: how it's wired
12:09 - The plot twist
12:50 - Building an entire workflow using a single prompt
15:42 - Why this is an OS for your AI workforce
17:00 - Try it yourself
If you want to see the open-source stack that powers all of this, check out Sim on GitHub and drop a star if you find it useful:
Akshay 🚀 • 65,291 views • 1 month ago

Claude Skills might be the biggest upgrade to AI agents so far! Some say it's even bigger than MCP. I've been testing skills for the past 3-4 days, and they're solving a problem most people don't talk about: agents just keep forgetting everything. In this video, I'll share everything I've learned so far. It covers:
> The core idea (skills as SOPs for agents)
> Anatomy of a skill
> Skills vs. MCP vs. Projects vs. Subagents
> Building your own skill
> Hands-on example
Skills are the early signs of continual learning, and they can change how we work with agents forever! Here's everything you need to know:
Akshay 🚀 • 286,002 views • 7 months ago

Make Claude Code 10x more powerful. Claude-Mem is a free plugin to persist memory across Claude sessions. It captures tool usage, so you always start where you left off. Endless Mode allows 95% token reduction & 20x more tool use before context exhaustion. 100% open-source.
Akshay 🚀 • 184,061 views • 5 months ago

A 100% open-source alternative to n8n! Sim is a drag-and-drop UI for creating powerful AI agent workflows:
- Runs locally on your machine
- Works with local LLMs
I built a stock market research agent & connected it to Telegram in minutes. Here's a step-by-step guide:
Akshay 🚀 • 176,158 views • 5 months ago

i decided to put together all my AI engineering posts in a single pdf. it covers:
> LLM foundations
> prompt engineering
> fine-tuning
> RAG
> context engineering
> AI agents
> MCP
> optimization
> deployment
> eval and observability
375+ pages. download link in next tweet!
Akshay 🚀 • 159,732 views • 5 months ago

Everyone is sleeping on this new OCR model! dots-ocr is a new 1.7B vision-language model that achieves SOTA performance on multilingual document parsing.
- Supports 100+ languages
- Works with both images and PDFs
- Handles text, tables, formulas seamlessly
100% open-source.
Akshay 🚀 • 251,785 views • 9 months ago

This is how you make your OpenClaw server invisible to the internet. (world's most SECURE OpenClaw deployment) The security fundamentals you learn in this video directly apply to any personal AI assistant or VPS setup. Enjoy!
Chapters:
0:00 - Intro
1:00 - What we'll cover
1:58 - DigitalOcean Droplet setup + getting OpenClaw running
8:18 - Connecting your agent to Telegram
12:13 - Tailscale: making your server invisible to the internet
14:52 - Locking down SSH + creating a non-root user
19:39 - Firewall: blocking everything except Tailscale
21:17 - Summarising everything done so far
22:50 - Set up a secure tunnel: Your machine → VPS
24:50 - Execution policies: going from chatbot to full agent
26:43 - Adding custom skills
31:03 - Use cases and going from 1 to 10 agents
31:52 - Outro
Akshay 🚀 • 81,513 views • 2 months ago

Nothing beats open-source! MiniMax just dropped M2.1, and devs are calling it "Claude at 10% the cost."
- 72.5% SWE-Multilingual. Beats Sonnet 4.5
- 88.6% VIBE-bench. Beats Gemini 3 Pro
I used it to build an AI studio that turns any website into a podcast. 100% open-source.
Akshay 🚀 • 140,068 views • 5 months ago

Vector DBs can't reason. Top-k similarity ranks chunks one at a time against a query. That's fine for single-hop fact lookups, and it breaks the moment a question needs information stitched across multiple chunks. That's what the FalkorDB GraphRAG-Bench results expose. The gap is widest on Complex Reasoning (83.61) and Contextual Summarization (85.08), the exact query types where retrieval needs to traverse relations between entities, not score chunks in isolation. Worth a closer look if your workload leans long-form. GraphRAG SDK is 100% open-source:
Akshay 🚀 • 34,729 views • 1 month ago
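
The contrast drawn in the GraphRAG post above, scoring chunks in isolation versus traversing relations between entities, can be illustrated with a toy example. Everything here is assumed for illustration: the three chunks, their 2-d "embeddings", and the entity sets are made up, and `multi_hop` is a plain breadth-first walk, not the GraphRAG SDK's API.

```python
import math
from collections import deque

# Toy corpus (hypothetical): a made-up 2-d embedding and the entities each chunk mentions.
CHUNKS = {
    "c1": {"vec": (1.0, 0.0), "entities": {"Alice", "Acme"}},    # "Alice works at Acme"
    "c2": {"vec": (0.0, 1.0), "entities": {"Acme", "Berlin"}},   # "Acme is based in Berlin"
    "c3": {"vec": (0.9, 0.1), "entities": {"Alice", "chess"}},   # "Alice plays chess"
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query_vec, k):
    """Score every chunk against the query independently and keep the best k."""
    ranked = sorted(CHUNKS, key=lambda c: cosine(query_vec, CHUNKS[c]["vec"]), reverse=True)
    return ranked[:k]


def multi_hop(start_entity, hops):
    """Breadth-first walk that follows shared entities from chunk to chunk."""
    seen_entities, seen_chunks = {start_entity}, []
    frontier = deque([(start_entity, 0)])
    while frontier:
        entity, depth = frontier.popleft()
        if depth == hops:
            continue  # hop budget used up on this path
        for cid, chunk in CHUNKS.items():
            if entity in chunk["entities"] and cid not in seen_chunks:
                seen_chunks.append(cid)
                for nxt in chunk["entities"] - seen_entities:
                    seen_entities.add(nxt)
                    frontier.append((nxt, depth + 1))
    return seen_chunks


if __name__ == "__main__":
    # "Which city is Alice's employer based in?" embeds near the Alice chunks (assumed).
    print(top_k((1.0, 0.0), k=2))       # the Berlin chunk c2 is not retrieved
    print(multi_hop("Alice", hops=2))   # c2 is reached through the shared entity "Acme"
```

In this toy setup the answer lives in a chunk that looks nothing like the query, so similarity ranking skips it while the entity walk reaches it on the second hop.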

SAMURAI vs. MetaAI's SAM 2! Traditional visual object tracking struggles in crowded, fast-moving, or self-occluded scenes, as does SAM2. Meet SAMURAI: a completely open-source adaptation of the Segment Anything Model for zero-shot visual tracking! Here's why it's a game-changer:
🚫 No need for retraining or finetuning
🎯 Boosts success rate and precision
🤖 Motion-aware memory selection
💪 Zero-shot performance on diverse datasets
But that's not all:
🔬 Refines mask selection
🔮 Predicts object motion effectively
📈 Gains: 7.1% AUC on LaSOT, 3.5% AO on GOT-10k
🏆 Competes with fully supervised methods without extra training
Link to the GitHub repo in the next tweet!
_____
Find me → Akshay 🚀 ✔️ For more insights & tutorials on AI and Machine Learning.
Akshay 🚀 • 363,204 views • 1 year ago

Microsoft has launched a powerful new data analysis tool! Introducing Data Formulator, a 100% open-source LLM-powered, no-code tool that transforms data in a snap and creates stunning visualizations. Key features include:
🤖 AI-powered data transformation
🖱️ Interactive drag-and-drop UI for visualizations
💬 Seamless blend of UI & natural language inputs
But that’s not all: You can even create charts beyond your initial dataset. Data Formulator automatically identifies extra computation needs, generates fields for you, and outputs the final visualization. Find the GitHub repo in the next tweet!
_____
Find me → Akshay 🚀 ✔️ For more insights and tutorials on AI and Machine Learning.
Akshay 🚀 • 280,385 views • 1 year ago

What they don't tell you about vibe coding:
• Moltbook exposed 1.5M auth tokens. The owner hadn't written a single line of code.
• Tea App leaked 72,000 government IDs. The database was just open, no sophisticated hack needed.
• A researcher took control of a journalist's computer through her own vibe-coded game, without a single click.
The code ran fine in all three cases, tests passed, reviews looked clean, and nothing raised a flag. That's the problem nobody is talking about.
Teams are shipping faster than ever. AI writes the code. CI catches build failures. Tests catch regressions. Observability catches outages. But nobody is asking the one question that actually matters: What can an attacker do with this, right now?
Because the bottleneck is no longer writing code. It's understanding what that code actually exposes once it's live. PR reviews miss auth edge cases. Unit tests don't probe broken access control. Staging environments don't simulate adversarial behavior. And business logic flaws look completely fine until someone decides to break them on purpose.
Strix is an open-source tool that fills this gap. It reviews your running app the way an attacker would:
- Crawls the app and maps every exposed route and flow
- Probes abuse paths dynamically, not just at build time
- Returns findings with proof-of-concepts and suggested fixes
Strix was benchmarked against 200 real companies and open-source repos, where it found 600+ verified vulnerabilities including assigned CVEs.
It's designed to fit into how modern teams already work. Run it before a release, after major changes, or continuously as the app evolves. If your team is shipping AI-generated code and you don't currently have a way to answer "what does this actually expose", it's worth looking at. GitHub link in the next tweet.
Akshay 🚀 • 52,284 views • 2 months ago

Turn any workflow into an agent skill. I built a YC job finder, deployed it as MCP server & connected it to Claude Desktop. It finds matching roles & sends personalized application emails to the recruiter. If you can break a process into steps, this guide will help you automate it:
Akshay 🚀 • 61,658 views • 2 months ago

Microsoft did it again! Speech AI models have a major limitation. They slice long recordings into tiny chunks, lose track of who's speaking, and forget all context halfway through. This is exactly what Microsoft's VibeVoice solves. It's an open-source family of frontier voice AI models for both speech recognition and speech generation. Here's what it can do:
> VibeVoice-ASR processes up to 60 minutes of audio in a single pass. No chunking. It outputs structured transcriptions with who spoke, when they spoke, and what they said.
> You can feed it custom hotwords like names, technical jargon, or domain-specific terms. The model uses them to significantly improve accuracy on specialized content.
> VibeVoice-TTS generates up to 90 minutes of multi-speaker speech with up to 4 distinct speakers. Natural turn-taking, emotional expression, all in one pass.
> VibeVoice-Realtime is a 0.5B streaming TTS model with ~300ms first-audio latency. Small enough to deploy practically anywhere.
All of this is powered by continuous speech tokenizers running at just 7.5 Hz. This ultra-low frame rate preserves audio quality while making long sequences computationally feasible. I have shared the link to the GitHub repo in the replies!
Akshay 🚀 • 45,100 views • 2 months ago
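
To see why the 7.5 Hz frame rate in the VibeVoice post above matters for long audio, a quick back-of-the-envelope calculation helps. The 7.5 Hz figure and the 60 and 90 minute durations come from the post; the 50 Hz comparison rate is an assumption picked purely for contrast, and `frames_for` is a hypothetical helper.

```python
FRAME_RATE_HZ = 7.5        # tokenizer frame rate quoted in the post
COMPARISON_RATE_HZ = 50.0  # assumed higher rate, for contrast only (not from the post)


def frames_for(minutes: float, rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of tokenizer frames needed to cover `minutes` of audio."""
    return int(minutes * 60 * rate_hz)


if __name__ == "__main__":
    print(frames_for(60))                      # 60 min of ASR input at 7.5 Hz
    print(frames_for(90))                      # 90 min of TTS output at 7.5 Hz
    print(frames_for(60, COMPARISON_RATE_HZ))  # same hour at the assumed 50 Hz
```

At 7.5 Hz an hour of audio is 27,000 frames and 90 minutes is 40,500, while the assumed 50 Hz rate would need 180,000 frames for the same hour.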