
Akshay
@akshay_pachaar • 273,971 subscribers
Simplifying LLMs, AI Agents, RAG, and Machine Learning for you! • Co-founder @dailydoseofds_ • BITS Pilani • 3 Patents • ex-AI Engineer @ LightningAI

This is the DeepSeek moment for Voice AI. Chatterbox Turbo is an MIT-licensed voice model that beats ElevenLabs Turbo & Cartesia Sonic 3! - <150ms time-to-first-sound - Voice cloning from just 5-second audio - Paralinguistic tags for real human expression 100% open-source.
Akshay • 467,359 views • 5 months ago

Anthropic's most viral feature is now open-source!
Until now, Anthropic's Generative UI capabilities only existed inside its own products. CopilotKit just shipped Open Generative UI, an open-source implementation of Claude Artifacts that works in any app.
The agent generates HTML/SVG at runtime, and CopilotKit streams it token-by-token into a sandboxed iframe inside the app's chat. So the user can watch the UI assemble itself in real time, not after the full response is ready.
The sandbox is fully isolated with no access to the parent app, the DOM, or user data. So if the agent hallucinates broken markup or unexpected JavaScript, nothing leaks outside the iframe.
Under the hood, the agent does not select from pre-built components. Instead, it generates arbitrary visuals from scratch every time. The output is unconstrained by default, but you can shape it by defining prompt-based skills that teach the agent specific visual formats or guidelines.
For instance, a skill prompt can guide the agent toward producing a Chart.js dashboard with proper axis labels and responsive sizing, or an interactive 3D model with rotation controls. The video below shows this in action, and the output quality you see actually comes from the skills layer.
Open Generative UI runs on AG-UI, so it works out of the box with LangGraph, CrewAI, Mastra, Google ADK, AWS Strands, and more. It also ships with a standalone MCP server that plugs into Claude Code, Cursor, or any MCP-compatible client.
And the entire stack is built on top of CopilotKit, the open-source frontend framework for agents and generative UI. 30k+ GitHub stars, with SDKs for React, Next.js, Angular, and Vue.
I have shared the GitHub repo and a live playground in the replies!
Akshay • 84,504 views • 28 days ago
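For readers who want a feel for the "stream into a sandboxed iframe" idea in the post above, here is a minimal sketch in Python. It is not CopilotKit's implementation or API: the function names and the sample token stream are hypothetical, and re-rendering the whole frame per chunk is a simplification. It only relies on two standard HTML facts: `srcdoc` carries inline markup, and `sandbox="allow-scripts"` without `allow-same-origin` gives the frame an opaque origin with no access to the parent page.

```python
import html


def render_sandboxed_iframe(partial_markup: str) -> str:
    """Wrap agent-generated markup in an iframe that cannot reach the host page."""
    # allow-scripts lets generated JavaScript run; omitting allow-same-origin
    # keeps the frame on an opaque origin, isolated from the parent DOM.
    escaped = html.escape(partial_markup, quote=True)
    return f'<iframe sandbox="allow-scripts" srcdoc="{escaped}"></iframe>'


def stream_into_iframe(chunks):
    """Yield a fresh iframe snapshot each time a streamed chunk arrives."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        yield render_sandboxed_iframe(buffer)


# Hypothetical token stream standing in for the agent's output.
snapshots = list(stream_into_iframe(["<svg>", "<circle r='5'/>", "</svg>"]))
```

Each snapshot contains everything received so far, which is why the user sees the UI "assemble itself" rather than appear all at once.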

Everyone is sleeping on this new OCR model! - 85.9% (sota) on olmocr bench - 90+ language support w/benchmarks - 4B model (down from 9B) - Full layout information - Extracts + captions images and diagrams - Strong handwriting, math, form, table support 100% open-source.
Akshay • 165,995 views • 2 months ago

Software engineers are going to love this! I found an open-source error monitoring agent that scans production logs, finds the root cause, and sends a Slack message with full context before you even notice something broke. Cuts down production downtime by 95%! Check this:
Akshay • 180,996 views • 3 months ago

I rebuilt most of OpenClaw's core in a single workflow:
- 25 blocks
- 29 connections
- Short + long-term memory
- Multi-channel (Telegram + Slack)
Didn't build it manually. Stack is fully open-source. Self-host, run local models, own it end-to-end.
Full walkthrough:
Chapters:
00:00 - Intro
01:00 - SimClaw in action: planning my day, finding meetings, sending email
04:05 - Long-term memory capability
05:52 - Inside the workflow: how it's wired
12:09 - The plot twist
12:50 - Building an entire workflow using a single prompt
15:42 - Why this is an OS for your AI workforce
17:00 - Try it yourself
If you want to see the open-source stack that powers all of this, check out Sim on GitHub and drop a star if you find it useful:
Akshay • 65,291 views • 1 month ago

Claude Skills might be the biggest upgrade to AI agents so far! Some say it's even bigger than MCP. I've been testing skills for the past 3-4 days, and they're solving a problem most people don't talk about: agents just keep forgetting everything. In this video, I'll share everything I've learned so far. It covers: > The core idea (skills as SOPs for agents) > Anatomy of a skill > Skills vs. MCP vs. Projects vs. Subagents > Building your own skill > Hands-on example Skills are the early signs of continual learning, and they can change how we work with agents forever! Here's everything you need to know:
Akshay • 286,002 views • 7 months ago

Make Claude Code 10x more powerful. Claude-Mem is a free plugin to persist memory across Claude sessions. It captures tool usage, so you always start where you left off. Endless Mode allows 95% token reduction & 20x more tool use before context exhaustion. 100% open-source.
Akshay • 184,061 views • 5 months ago

A 100% open-source alternative to n8n! Sim is a drag-and-drop UI for creating powerful AI agent workflows: - Runs locally on your machine - Works with local LLMs I built a stock market research agent & connected it to Telegram in minutes. Here's a step-by-step guide:
Akshay • 176,158 views • 5 months ago

i decided to put together all my AI engineering posts in a single pdf. it covers: > LLM foundations > prompt engineering > fine-tuning > RAG > context engineering > AI agents > MCP > optimization > deployment > eval and observability 375+ pages. download link in next tweet!
Akshay • 159,732 views • 5 months ago

Everyone is sleeping on this new OCR model! dots-ocr is a new 1.7B vision-language model that achieves SOTA performance on multilingual document parsing. - Supports 100+ languages - Works with both images and PDFs - Handles text, tables, formulas seamlessly 100% open-source.
Akshay • 251,785 views • 9 months ago

This is how you make your OpenClaw server invisible to the internet. (world's most SECURE OpenClaw deployment)
The security fundamentals you learn in this video directly apply to any personal AI assistant or VPS setup. Enjoy!
Chapters:
0:00 - Intro
1:00 - What we'll cover
1:58 - DigitalOcean Droplet setup + getting OpenClaw running
8:18 - Connecting your agent to Telegram
12:13 - Tailscale: making your server invisible to the internet
14:52 - Locking down SSH + creating a non-root user
19:39 - Firewall: blocking everything except Tailscale
21:17 - Summarising everything done so far
22:50 - Set up a secure tunnel: Your machine → VPS
24:50 - Execution policies: going from chatbot to full agent
26:43 - Adding custom skills
31:03 - Use cases and going from 1 to 10 agents
31:52 - Outro
Akshay • 81,513 views • 2 months ago

Nothing beats open-source! MiniMax just dropped M2.1, and devs are calling it "Claude at 10% the cost." - 72.5% SWE-Multilingual. Beats Sonnet 4.5 - 88.6% VIBE-bench. Beats Gemini 3 Pro I used it to build an AI studio that turns any website into a podcast. 100% open-source.
Akshay • 140,068 views • 5 months ago

Vector DBs can't reason. Top-k similarity ranks chunks one at a time against a query. That's fine for single-hop fact lookups, and it breaks the moment a question needs information stitched across multiple chunks. That's what the FalkorDB GraphRAG-Bench results expose. The gap is widest on Complex Reasoning (83.61) and Contextual Summarization (85.08), the exact query types where retrieval needs to traverse relations between entities, not score chunks in isolation. Worth a closer look if your workload leans long-form. GraphRAG SDK is 100% open-source:
Akshay • 34,729 views • 1 month ago
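To make the single-hop vs. multi-hop point from the post above concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the chunks, the word-overlap score (a stand-in for embedding similarity), and the tiny edge dictionary. It does not use FalkorDB or the GraphRAG SDK and says nothing about how they work internally.

```python
# Toy illustration only: hypothetical data, no real vector DB or graph DB involved.

def overlap_score(query: str, chunk: str) -> int:
    """Stand-in for embedding similarity: count shared lowercase words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def top_k(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Score every chunk against the query in isolation and keep the best k."""
    return sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)[:k]


CHUNKS = [
    "alice manages the payments team",
    "the payments team owns the billing service",
    "the billing service runs on postgres",
]

# The same three facts stored as (entity, relation) -> entity edges.
EDGES = {
    ("alice", "manages"): "payments team",
    ("payments team", "owns"): "billing service",
    ("billing service", "runs_on"): "postgres",
}


def traverse(start: str, relations: list[str]) -> str:
    """Follow a chain of relations from a start entity, one hop at a time."""
    node = start
    for relation in relations:
        node = EDGES[(node, relation)]
    return node


question = "which database does the system alice manages use"
best_chunk = top_k(question, CHUNKS, k=1)[0]
answer = traverse("alice", ["manages", "owns", "runs_on"])
```

Here the top-ranked chunk mentions Alice but not the database, because the answer sits two hops away from anything the question's wording matches. The traversal reaches it by following relations instead of scoring chunks one at a time.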

SAMURAI vs. MetaAI's SAM 2!
Traditional visual object tracking struggles in crowded, fast-moving, or self-occluded scenes, as does SAM2.
Meet SAMURAI: a completely open-source adaptation of the Segment Anything Model for zero-shot visual tracking!
Here's why it's a game-changer:
- No need for retraining or finetuning
- Boosts success rate and precision
- Motion-aware memory selection
- Zero-shot performance on diverse datasets
But that's not all:
- Refines mask selection
- Predicts object motion effectively
- Gains: 7.1% AUC on LaSOT, 3.5% AO on GOT-10k
- Competes with fully supervised methods without extra training
Link to the GitHub repo in the next tweet!
_____
Find me → Akshay
For more insights & tutorials on AI and Machine Learning.
Akshay • 363,204 views • 1 year ago

Microsoft has launched a powerful new data analysis tool!
Introducing Data Formulator, a 100% open-source LLM-powered, no-code tool that transforms data in a snap and creates stunning visualizations.
Key features include:
- AI-powered data transformation
- Interactive drag-and-drop UI for visualizations
- Seamless blend of UI & natural language inputs
But that's not all: You can even create charts beyond your initial dataset. Data Formulator automatically identifies extra computation needs, generates fields for you, and outputs the final visualization.
Find the GitHub repo in the next tweet!
_____
Find me → Akshay
For more insights and tutorials on AI and Machine Learning.
Akshay • 280,385 views • 1 year ago

What they don't tell you about vibe coding:
• Moltbook exposed 1.5M auth tokens. The owner hadn't written a single line of code.
• Tea App leaked 72,000 government IDs. The database was just open, no sophisticated hack needed.
• A researcher took control of a journalist's computer through her own vibe-coded game, without a single click.
The code ran fine in all three cases, tests passed, reviews looked clean, and nothing raised a flag. That's the problem nobody is talking about.
Teams are shipping faster than ever. AI writes the code. CI catches build failures. Tests catch regressions. Observability catches outages.
But nobody is asking the one question that actually matters: What can an attacker do with this, right now?
Because the bottleneck is no longer writing code. It's understanding what that code actually exposes once it's live.
PR reviews miss auth edge cases. Unit tests don't probe broken access control. Staging environments don't simulate adversarial behavior. And business logic flaws look completely fine until someone decides to break them on purpose.
Strix is an open-source tool that fills this gap. It reviews your running app the way an attacker would:
- Crawls the app and maps every exposed route and flow
- Probes abuse paths dynamically, not just at build time
- Returns findings with proof-of-concepts and suggested fixes
Strix was benchmarked against 200 real companies and open-source repos, where it found 600+ verified vulnerabilities including assigned CVEs.
It's designed to fit into how modern teams already work. Run it before a release, after major changes, or continuously as the app evolves.
If your team is shipping AI-generated code and you don't currently have a way to answer "what does this actually expose", it's worth looking at.
GitHub link in the next tweet.
Akshay • 52,284 views • 2 months ago

Turn any workflow into an agent skill. I built a YC job finder, deployed it as MCP server & connected it to Claude Desktop. It finds matching roles & sends personalized application emails to the recruiter. If you can break a process into steps, this guide will help you automate it:
Akshay • 61,658 views • 2 months ago

Microsoft did it again!
Speech AI models have a major limitation. They slice long recordings into tiny chunks, lose track of who's speaking, and forget all context halfway through.
This is exactly what Microsoft's VibeVoice solves. It's an open-source family of frontier voice AI models for both speech recognition and speech generation.
Here's what it can do:
> VibeVoice-ASR processes up to 60 minutes of audio in a single pass. No chunking. It outputs structured transcriptions with who spoke, when they spoke, and what they said.
> You can feed it custom hotwords like names, technical jargon, or domain-specific terms. The model uses them to significantly improve accuracy on specialized content.
> VibeVoice-TTS generates up to 90 minutes of multi-speaker speech with up to 4 distinct speakers. Natural turn-taking, emotional expression, all in one pass.
> VibeVoice-Realtime is a 0.5B streaming TTS model with ~300ms first-audio latency. Small enough to deploy practically anywhere.
All of this is powered by continuous speech tokenizers running at just 7.5 Hz. This ultra-low frame rate preserves audio quality while making long sequences computationally feasible.
I have shared the link to the GitHub repo in the replies!
Akshay • 45,100 views • 2 months ago
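A quick back-of-the-envelope check of why the 7.5 Hz frame rate in the post above matters for long audio. The helper below is a hypothetical illustration, not VibeVoice code; it counts tokenizer frames only (how frames map to model tokens is not stated in the post), and the 50 Hz comparison rate is an assumption chosen purely for contrast.

```python
def frames(minutes: float, frame_rate_hz: float) -> int:
    """Number of tokenizer frames needed to cover `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


asr_frames = frames(60, 7.5)      # 60-minute ASR input at 7.5 Hz
tts_frames = frames(90, 7.5)      # 90-minute TTS output at 7.5 Hz
contrast_frames = frames(60, 50)  # assumed 50 Hz tokenizer, for contrast only

print(asr_frames, tts_frames, contrast_frames)  # 27000 40500 180000
```

Under these assumptions, an hour of audio is about 27,000 frames at 7.5 Hz versus 180,000 at the assumed 50 Hz, which is the kind of gap that makes single-pass processing of long recordings plausible.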