Avi Chawla

@_avichawla • 69,368 subscribers

Daily tutorials and insights on DS, ML, LLMs, and RAGs • Co-founder @dailydoseofds_ • IIT Varanasi • ex-AI Engineer @ MastercardAI

Shorts

A simple technique makes RAG 32x more memory-efficient!
- Perplexity uses it in its search index
- Azure uses it in its search pipeline
- HubSpot uses it in its AI assistant
(Learn how it works below, with code.)

84,940 views
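
The post above does not name the technique. A 32x reduction is what you get when each float32 embedding dimension (32 bits) is stored as a single bit, so the sketch below assumes binary quantization with Hamming-distance search. The function names, the 1024-dimension size, and the random toy data are illustrative assumptions, not code from the post.

```python
import random

def binarize(vec):
    """Keep only the sign of each dimension, packed as bits of a Python int."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of bit positions where two packed vectors differ."""
    return bin(a ^ b).count("1")

def search(query_vec, index, top_k=3):
    """Rank documents by Hamming distance between binarized vectors."""
    q = binarize(query_vec)
    ranked = sorted(index.items(), key=lambda kv: hamming(q, kv[1]))
    return [doc_id for doc_id, _ in ranked[:top_k]]

DIM = 1024                      # assumed embedding size
float32_bytes = DIM * 4         # 4 bytes per dimension
binary_bytes = DIM // 8         # 1 bit per dimension
compression = float32_bytes // binary_bytes

# Toy corpus: random vectors standing in for real embeddings.
rng = random.Random(0)
docs = {f"doc{i}": [rng.gauss(0, 1) for _ in range(DIM)] for i in range(100)}
index = {doc_id: binarize(vec) for doc_id, vec in docs.items()}

# The query is a slightly noisy copy of doc42, so doc42 should rank first.
query = [x + rng.gauss(0, 0.1) for x in docs["doc42"]]
print(compression, search(query, index))
```

Binarization throws away magnitude information, so a common design choice is to use the binary index only to shortlist candidates and then rescore that shortlist with full-precision vectors.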

Everyone is sleeping on this new OCR model! Datalab's Chandra topped independent benchmarks and beat the previous best, dots-ocr.
- Supports 40+ languages
- Extracts complex text, tables, and formulas easily
I tested it on Ramanujan's handwritten letter from 1913. 100% open-source.

158,212 views

Everyone is sleeping on MiniMax's new LLM! Devs are calling it "Claude at 10% the cost".
- 72.5% SWE-Multilingual. Beats Sonnet 4.5
- 88.6% VIBE-bench. Beats Gemini 3 Pro
I used it to build a stock analyst that generates code, executes it & returns insights. 100% open-source!

117,404 views

Wow!! You can now scrape ANY website by just writing a prompt. Using 's /extract endpoint, just describe what you want to extract in a prompt. This produces LLM-ready structured output. No more hard coding!

250,352 views

A RAG engine for deep document understanding! RAGFlow lets you build enterprise-grade RAG workflows on complex docs with well-founded citations. Supports multimodal data understanding, web search, deep research, etc. 100% local & open-source with 55k+ stars!

163,659 views

Finally! A RAG over code solution that actually works (open-source).
Naive chunking used in RAG isn't suited for code. This is because codebases have long-range dependencies, cross-file references, etc., that independent text chunks just can't capture.
Graph-Code is a graph-driven RAG system that solves this. It analyzes the Python codebase and builds knowledge graphs to enable natural language querying.
Key features:
- Deep code parsing to extract classes, functions, and relationships.
- Uses Memgraph to store the codebase as a graph.
- Parses pyproject to understand external dependencies.
- Retrieves actual source code snippets for found functions.
Find the repo in the replies!

121,819 views
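
To make the "deep code parsing" step in the Graph-Code post above concrete, here is a minimal sketch that uses Python's standard `ast` module to pull classes, functions, and call relationships out of source code. It is not Graph-Code's implementation: `build_graph`, `callers_of`, and the sample source are hypothetical, and plain dictionaries stand in for the Memgraph store the post mentions.

```python
import ast
from collections import defaultdict

SOURCE = '''
class Loader:
    def load(self, path):
        return parse(read(path))

def read(path):
    return open(path).read()

def parse(text):
    return text.split()
'''

def build_graph(source):
    """Return (nodes, edges): definitions found and caller -> callee names."""
    tree = ast.parse(source)
    nodes = []                 # (kind, qualified name)
    edges = defaultdict(set)   # qualified caller -> names it calls directly

    def visit(node, scope):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                name = f"{scope}.{child.name}" if scope else child.name
                nodes.append(("class", name))
                visit(child, name)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                name = f"{scope}.{child.name}" if scope else child.name
                nodes.append(("function", name))
                # Record only plain-name calls like foo(); method calls are skipped.
                for sub in ast.walk(child):
                    if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                        edges[name].add(sub.func.id)
                visit(child, name)
            else:
                visit(child, scope)

    visit(tree, "")
    return nodes, {caller: sorted(callees) for caller, callees in edges.items()}

def callers_of(edges, target):
    """Answer 'who calls target?', a query that isolated text chunks cannot."""
    return sorted(c for c, callees in edges.items() if target in callees)

nodes, edges = build_graph(SOURCE)
print(nodes)
print(callers_of(edges, "read"))
```

A relationship query such as `callers_of(edges, "read")` is the kind of cross-definition lookup that motivates storing code as a graph rather than as independent chunks.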

A Figma-like canvas for building AI agent workflows. Sim is a lightweight, user-friendly platform for building AI agent workflows in minutes. It natively supports all major LLMs, Vector DBs, etc. 100% open-source with 7k+ stars!

79,322 views

I just put together all my AI Agents posts in a single PDF. It covers:
- Agent fundamentals
- LLM vs RAG vs Agents
- Agentic design patterns
- Building Blocks of Agents
- Building custom tools via MCP
- 12 hands-on projects for AI Engineers
Download link in next tweet.

65,109 views

Check this!! A 100% open-source toolkit to work with LLMs. Transformer Lab is an app to experiment with LLMs:
- Train, fine-tune, or chat.
- One-click LLM download (DeepSeek, Gemma, etc.)
- Drag-n-drop UI for RAG.
- Built-in logging, and more.
100% local!

74,778 views

Deploy and run LLMs directly on your phone! Unsloth now lets you fine-tune LLMs and deploy them 100% locally on iOS/Android devices. The video shows this in action, where I ran Qwen3 on an iPhone 17 Pro at ~25 tokens/s. I have shared a guide in the replies.

24,530 views

Postman's AI-readiness Playbook is one of the most important documents you can read today as a developer!

We are headed into an era where every website must be "Agent-ready".
- Agents will make purchases, not humans.
- Agents will find the best options, not humans.
- Agents will fill out job applications, not humans.

The same applies to APIs. While human devs can hustle through poor docs and broken endpoints, most Agents can’t (yet). They need:
- Predictable structures
- Machine-readable metadata
- Standardized behavior

Postman's 90-day AI readiness playbook details how to turn your APIs into reliable, AI-ready tools. My two biggest takeaways from the Playbook:
1) Automatic documentation (Week 3): Once you standardize your API format, Postman’s Spec Hub automatically generates and validates API docs for both humans and AI agents without any manual work.
2) Seamless AI tooling (Week 9): Turn your validated specs into hosted, function-style endpoints, letting AI agents invoke your APIs like native commands.

Find the link to the Playbook in the comments. Thanks to the Postman team for partnering on today's post!

22,512 views

I just created my own LaTeX-OCR app using Llama 3.2 Vision! Upload an image of rendered LaTeX, and it gives you the corresponding LaTeX code using Llama 3.2's multimodal capabilities!
Here's what I used:
- Ollama for serving Llama 3.2 Vision locally
- Streamlit for the UI
Everything is just 50 lines of code! Find the code in the next tweet.
--
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs. Find me → Avi Chawla

15,608 views

Videos

Karpathy's prediction about RL is coming true now!

He called reward functions unreliable and argued that a single reward number is too low-dimensional to teach an agent what "good" means for complex tasks. To solve this, Agents need a knowledge-guided review as a higher-dimensional feedback channel.

Every major AI lab trains models with RL today (OpenAI, Anthropic, DeepSeek). And their key bottleneck has always been the reward functions. GRPO by DeepSeek worked well for math and code because the environment gave a binary signal. But for real agent tasks, someone still has to hand-code the scoring function. That takes days and breaks every time the pipeline changes.

RULER (implemented in OpenPipe ART, 10k stars) addresses the exact problem Karpathy identified. The reward criteria are defined in plain English, and an LLM evaluates each trajectory against that description to provide feedback for training.

I trained a Qwen3 1.4B agent that plays 2048 using GRPO with this exact workflow. In this case, the agent saw the board, picked a direction, and RULER evaluated the outcome, all from this natural language definition. You can see the full implementation on GitHub and try it yourself. Here's the ART Repo: (don't forget to star it ⭐ )

Just like RLHF replaced manual rankings and GRPO replaced the critic model, natural language rewards are replacing hand-coded scoring functions. RL reward engineering is now prompt engineering.

I wrote a full walkthrough covering RL for LLM agents, from RLHF to GRPO to RULER, in the article below.

Avi Chawla

345,818 views • 14 days ago
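
The workflow in the post above (plain-English criteria, a judge scoring each trajectory, GRPO training on the scores) can be sketched roughly as follows. This is not the ART or RULER API: `RUBRIC`, `stub_judge`, and the trajectory fields are hypothetical, and the judge is a hand-written heuristic standing in for the LLM call so the example runs offline. The part that is standard GRPO is the group-relative advantage: each reward is normalized against the mean and standard deviation of its own group, which is why no critic model is needed.

```python
import statistics

# Hypothetical plain-English reward description; a real setup would send this
# to an LLM judge together with the trajectories.
RUBRIC = "Reward trajectories that reach larger tiles in 2048; penalize invalid moves."

def stub_judge(rubric, trajectories):
    """Stand-in for the LLM judge (assumption): scores a group of trajectories."""
    best = max(t["max_tile"] for t in trajectories)
    return [t["max_tile"] / best - 0.1 * t["invalid_moves"] for t in trajectories]

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward within its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# One group: several rollouts of the same agent on the same starting board.
group = [
    {"max_tile": 64, "invalid_moves": 0},
    {"max_tile": 128, "invalid_moves": 1},
    {"max_tile": 256, "invalid_moves": 0},
    {"max_tile": 32, "invalid_moves": 2},
]
rewards = stub_judge(RUBRIC, group)
advantages = group_relative_advantages(rewards)
for traj, adv in zip(group, advantages):
    print(traj, round(adv, 3))
```

Because advantages are relative within the group, the judge only has to rank rollouts consistently against each other; its absolute scale does not matter.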

Researchers built a new RAG approach that:
- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.

And it hit 98.7% accuracy on a financial benchmark (SOTA).

Here's the core problem with RAG that this new approach solves: Traditional RAG chunks documents, embeds them into vectors, and retrieves based on semantic similarity. But similarity ≠ relevance.

When you ask "What were the debt trends in 2023?", a vector search returns chunks that look similar. But the actual answer might be buried in some Appendix, referenced on some page, in a section that shares zero semantic overlap with your query. Traditional RAG would likely never find it.

PageIndex (open-source) solves this. Instead of chunking and embedding, PageIndex builds a hierarchical tree structure from your documents, like an intelligent table of contents. Then it uses reasoning to traverse that tree.

For instance, the model doesn't ask: "What text looks similar to this query?" Instead, it asks: "Based on this document's structure, where would a human expert look for this answer?"

That's a fundamentally different approach with:
- No arbitrary chunking that breaks context.
- No vector DB infrastructure to maintain.
- Traceable retrieval to see exactly why it chose a specific section.
- The ability to follow in-document references ("see Table 5.3") the way a human would.

But here's the deeper issue that it solves. Vector search treats every query as independent. But documents have structure and logic, like sections that reference other sections and context that builds across pages. PageIndex respects that structure instead of flattening it into embeddings.

Do note that this approach may not make sense in every use case since traditional vector search is still fast, simple, and works well for many applications. But for professional documents that require domain expertise and multi-step reasoning, this tree-based, reasoning-first approach shines.

For instance, PageIndex achieved 98.7% accuracy on FinanceBench, significantly outperforming traditional vector-based RAG systems on complex financial document analysis.

Everything is fully open-source, so you can see the full implementation on GitHub and try it yourself. I have shared the GitHub repo in the replies!

Avi Chawla

970,893 views • 4 months ago
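
The tree-then-traverse idea in the PageIndex post above can be illustrated with a small sketch. This is not PageIndex's code or API: the `Section` type, `traverse`, and the sample report are hypothetical. The `pick` callable marks where a real system would ask an LLM "which child section would an expert open next?"; here a crude keyword-overlap stub stands in for that reasoning step only so the example runs without a model.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Section:
    title: str
    summary: str
    pages: tuple                      # (first_page, last_page)
    children: list = field(default_factory=list)

def keyword_pick(query, candidates):
    """Placeholder for the LLM reasoning step (assumption): keyword overlap."""
    words = set(re.findall(r"\w+", query.lower()))
    def score(section):
        text = f"{section.title} {section.summary}".lower()
        return len(words & set(re.findall(r"\w+", text)))
    return max(candidates, key=score)

def traverse(root, query, pick=keyword_pick):
    """Walk from the root to a leaf, recording the path for traceability."""
    node, path = root, [root.title]
    while node.children:
        node = pick(query, node.children)
        path.append(node.title)
    return node, path

report = Section("Annual Report 2023", "full filing", (1, 120), [
    Section("Business Overview", "products markets and strategy", (3, 40)),
    Section("Financial Statements", "balance sheet income and debt liabilities", (41, 120), [
        Section("Income Statement", "revenue and expenses", (41, 60)),
        Section("Notes on Debt", "debt maturities and trends in 2023", (88, 93)),
    ]),
])

leaf, path = traverse(report, "What were the debt trends in 2023?")
print(" > ".join(path), leaf.pages)
```

The returned `path` is what makes retrieval traceable: it records every section choice that led to the final page range.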

Researchers found a way to make LLMs 8.5x faster! (without compromising accuracy)

Speculative decoding is quite an effective way to address the single-token bottleneck in traditional LLM inference. A small "draft" model first generates the next several tokens, then the large model verifies all of them at once in a single forward pass. If a token at any position is wrong, you keep everything before it and restart from there. This never does worse than normal decoding.

But current drafters in speculative decoding still guess one token at a time. That makes the drafting step itself a bottleneck, capping real-world speedups at 2-3x.

DFlash is a new technique that swaps the autoregressive drafter with a lightweight block diffusion model that guesses all tokens in one parallel shot. Drafting cost stays flat no matter how many tokens you speculate. On top of that, the drafter is conditioned on hidden features pulled from multiple layers of the target model and injected into every draft layer, so it makes significantly better guesses than a drafter working from scratch.

In the side-by-side demo below, vanilla decoding runs at 48.5 tokens/sec. DFlash hits 415 tokens/sec on the same model, with zero quality loss.

It's already integrated with vLLM, SGLang, and Transformers, with draft models on HuggingFace for several models like Qwen3, Qwen3.5, Llama 3.1, Kimi-K2.5, gpt-oss, and many more. I have shared the GitHub repo in the replies!

KV caching is another must-know technique to boost LLM inference. I recently wrote an article about it. Read it below.

👉 Over to you: What use case are you working on that can benefit from this new technique?

Avi Chawla

155,880 views • 25 days ago
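
The draft-then-verify loop described in the post above can be shown with a toy, greedy version of standard speculative decoding (the autoregressive drafter that DFlash replaces, not DFlash itself). Both "models" here are made-up deterministic functions, and all names are hypothetical. In a real system the verification loop is a single batched forward pass of the large model; it is unrolled position by position here for readability.

```python
def target_next(prefix):
    """Toy 'large' model: next token is a fixed function of the prefix."""
    return (sum(prefix) * 31 + len(prefix)) % 50

def draft_next(prefix):
    """Toy 'small' model: agrees with the target except at every 4th position."""
    tok = target_next(prefix)
    return (tok + 1) % 50 if len(prefix) % 4 == 3 else tok

def plain_decode(prompt, n_tokens):
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target_next(out))
    return out[len(prompt):]

def speculative_decode(prompt, n_tokens, k=4):
    """Greedy speculative decoding; returns (tokens, number of target passes)."""
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < n_tokens:
        # 1) The drafter guesses k tokens, one at a time.
        draft = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # 2) The target checks every drafted position (one pass in practice).
        target_passes += 1
        accepted = []
        for tok in draft:
            expected = target_next(out + accepted)
            if tok != expected:
                accepted.append(expected)   # keep the prefix, take the target's token
                break
            accepted.append(tok)
        else:
            accepted.append(target_next(out + accepted))  # all accepted: bonus token
        out.extend(accepted)
    return out[len(prompt):len(prompt) + n_tokens], target_passes

tokens, passes = speculative_decode([1, 2, 3], 20)
print(tokens == plain_decode([1, 2, 3], 20), passes)
```

Every pass yields at least one token chosen by the target, which is why the output matches plain decoding exactly and why the scheme never needs more target passes than the baseline.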

OpenClaw meets RL!

OpenClaw Agents adapt through memory files and skills, but the base model weights never actually change. OpenClaw-RL solves this! It wraps a self-hosted model as an OpenAI-compatible API, intercepts live conversations from OpenClaw, and trains the policy in the background using RL.

The architecture is fully async. This means serving, reward scoring, and training all run in parallel. Once done, weights get hot-swapped after every batch while the agent keeps responding.

Currently, it has two training modes:
- Binary RL (GRPO): A process reward model scores each turn as good, bad, or neutral. That scalar reward drives policy updates via a PPO-style clipped objective.
- On-Policy Distillation: When concrete corrections come in like "you should have checked that file first," it uses that feedback as a richer, directional training signal at the token level.

When to use OpenClaw-RL?

To be fair, a lot of agent behavior can already be improved through better memory and skill design. OpenClaw's existing skill ecosystem and community-built self-improvement skills handle a wide range of use cases without touching model weights at all. If the agent keeps forgetting preferences, that's a memory problem. And if it doesn't know how to handle a specific workflow, that's a skill problem. Both are solvable at the prompt and context layer.

Where RL becomes interesting is when the failure pattern lives deeper in the model's reasoning itself. Things like consistently poor tool selection order, weak multi-step planning, or failing to interpret ambiguous instructions the way a specific user intends.

Research on agentic RL (like ARTIST and Agent-R1) has shown that these behavioral patterns hit a ceiling with prompt-based approaches alone, especially in complex multi-turn tasks where the model needs to recover from tool failures or adapt its strategy mid-execution. That's the layer OpenClaw-RL targets, and it's a meaningful distinction from what OpenClaw offers.

I have shared the repo in the replies!

Avi Chawla

138,182 views • 2 months ago
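
The "Binary RL" mode in the post above maps a good/bad/neutral label to a scalar and feeds it into a PPO-style clipped objective. The sketch below shows that objective for a single turn. It is not OpenClaw-RL's code: the +1/0/-1 mapping, the function names, and using the raw reward directly as the advantage are all simplifying assumptions. The clipping formula itself is the standard PPO surrogate, min(r·A, clip(r, 1-ε, 1+ε)·A), where r is the probability ratio between the new and old policy.

```python
import math

# Assumed mapping from the process reward model's label to a scalar reward.
LABEL_TO_REWARD = {"good": 1.0, "neutral": 0.0, "bad": -1.0}

def clipped_objective(logp_new, logp_old, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one sample (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)

def turn_objective(label, logp_new, logp_old, eps=0.2):
    """Objective for one scored turn, using the reward as the advantage (assumption)."""
    return clipped_objective(logp_new, logp_old, LABEL_TO_REWARD[label], eps)

# A "good" turn whose probability already rose by 50%: the gain is capped at 1 + eps.
print(turn_objective("good", math.log(1.5), 0.0))
# A "bad" turn whose probability rose by 50%: the penalty is not capped.
print(turn_objective("bad", math.log(1.5), 0.0))
```

The asymmetry is the point of the clip: the policy gets no extra credit for moving further in a direction it has already moved, but it is still fully penalized for moving the wrong way.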

Pentesting firms don't want you to see this. An open-source AI agent just replicated their $50k service.

A "normal" pentest today looks like this:
- $20k-$50k per engagement
- 4-6 weeks of scoping, NDAs, kickoff calls
- A big PDF that's outdated the moment you ship a new feature

Meanwhile, AI agents are quietly starting to perform on par with human pentesters on the stuff that actually matters day-to-day:
↳ Enumerating attack surface
↳ Fuzzing endpoints
↳ Chaining simple vulns into real impact
↳ Producing PoCs and remediation steps developers can actually use

And they do it in hours instead of weeks and at a fraction of the cost.

This approach is actually implemented in Strix, a recently trending open-source framework (14k+ stars) for AI pentesting agents. The framework spins up a team of AI "attackers" that probe your web apps, APIs, and code. It then returns validated findings with exploit evidence, remediation steps, and a full PDF report that looks exactly like what you'd get from a traditional firm, but without a $50k invoice and a month-long wait time.

You can see the full implementation on GitHub and try it yourself. Just run: `strix --target https://your-app.com` and you are good to go.

Human red teams aren't disappearing, but the routine pentest (pre-launch, post-refactor, quarterly checks) is clearly shifting to AI. Strix is one of the first tools that makes that shift feel real instead of hypothetical.

I've shared the GitHub repo in the replies.

Avi Chawla

223,841 views • 6 months ago

Big moment for Postgres!

AI coding tools have been surprisingly bad at writing Postgres code. Not because the models are dumb, but because of how they learned SQL in the first place. LLMs are trained on the internet, which is full of outdated Stack Overflow answers and quick-fix tutorials.

So when you ask an AI to generate a schema, it gives you something that technically runs but misses decades of Postgres evolution, like:
- No GENERATED ALWAYS AS IDENTITY (added in PG10)
- No expression or partial indexes
- No NULLS NOT DISTINCT (PG15)
- Missing CHECK constraints and proper foreign keys
- Generic naming that tells you nothing

But this is actually a solvable problem. You can teach AI tools to write better Postgres by giving them access to the right documentation at inference time.

This exact solution is implemented in the newly released pg-aiguide by Tiger Data (creators of TimescaleDB), an open-source MCP server that gives coding tools access to 35 years of Postgres expertise.

In short, the MCP server enables:
- Semantic search over the official PostgreSQL manual (version-aware, so it knows PG14 vs PG17 differences)
- Curated skills with opinionated best practices for schema design, indexing, and constraints.

I ran an experiment with Claude Code to see how well this works, and worked with the team to put this together.

Prompt: "Generate a schema for an e-commerce site twice, one with the MCP server disabled, one with it enabled. Finally, run an assessment to compare the generated schemas."

The run with the MCP server led to:
- 420% more indexes (including partial and expression indexes)
- 235% more constraints
- 60% more tables (proper normalization)
- 11 automation functions and triggers
- Modern PG17 patterns throughout

The MCP-assisted schema had proper data integrity, performance optimizations baked in, and followed naming conventions that actually make sense in production.

pg-aiguide works with Claude Code, Cursor, VS Code, and any MCP-compatible tool. It's free and fully open source. I have shared the repo in the replies!

Avi Chawla

186,381 views • 5 months ago
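
For readers unfamiliar with the Postgres features listed in the post above, here is a small hand-written illustration of what they look like in a schema. It is not output from pg-aiguide or from the experiment described: the table, column, and index names are hypothetical, and the DDL is held in a Python string only so it can be passed to whatever Postgres client you use. `NULLS NOT DISTINCT` needs PostgreSQL 15 or later.

```python
# Illustrative DDL only (hypothetical names); requires PostgreSQL 15+.
DDL = """
CREATE TABLE customers (
    customer_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email       text NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now()
);

-- Expression index: case-insensitive uniqueness for emails.
CREATE UNIQUE INDEX customers_email_lower_key ON customers (lower(email));

CREATE TABLE product_variants (
    variant_id  bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    product_sku text NOT NULL,
    size        text,  -- NULL means "one size"
    price_cents integer NOT NULL CHECK (price_cents >= 0),
    -- NULL sizes count as equal, so a SKU has at most one size-less variant.
    UNIQUE NULLS NOT DISTINCT (product_sku, size)
);

CREATE TABLE orders (
    order_id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id bigint NOT NULL REFERENCES customers (customer_id),
    status      text NOT NULL
                CHECK (status IN ('pending', 'paid', 'shipped', 'cancelled')),
    placed_at   timestamptz NOT NULL DEFAULT now()
);

-- Partial index: only pending orders are indexed, which keeps it small.
CREATE INDEX orders_pending_customer_idx
    ON orders (customer_id) WHERE status = 'pending';
"""

def statements(ddl):
    """Naive splitter: assumes no semicolons inside literals or comments."""
    return [s.strip() for s in ddl.split(";") if s.strip()]

for stmt in statements(DDL):
    print(stmt.splitlines()[-1])
```

Each statement can be executed in order against an empty database; the older equivalents these replace are `serial` columns, full-table indexes, and uniqueness rules enforced in application code.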