Akshay 🚀

@akshay_pachaar • 273,971 subscribers

Simplifying LLMs, AI Agents, RAG, and Machine Learning for you! • Co-founder @dailydoseofds_ • BITS Pilani • 3 Patents • ex-AI Engineer @ LightningAI

Shorts

Hermes meets SuperGrok!
xAI just made every SuperGrok subscription work inside Hermes Agent. One browser login, no API key, no separate billing.
And it doesn't just unlock text chat with Grok 4.3. The same OAuth token gives the agent access to:
→ Grok Text-to-Speech for spoken responses
→ Grok Imagine for image and video generation
→ x_search for real-time X/Twitter search
I just added a new X Research Agent profile to my Hermes. Now my agent watches X while I ship.
Setup takes about 60 seconds. Available on every SuperGrok tier, no restrictions.
I wrote a full deep dive covering Hermes Agent's architecture, memory system, self-evolving skills, GEPA optimization, and setting up multiple specialized agents. The article is quoted below.

142,907 views

Real-time object detection will never be the same. Traditional YOLO needs NMS to remove duplicate boxes; it's slow and inconsistent. YOLO26 skips it entirely: single-pass predictions, faster inference and up to 300 detections per image. Download model:

373,358 views
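For context on the NMS step that the YOLO26 post above says is skipped, here is a minimal sketch of classic greedy non-maximum suppression. It is a generic illustration, not YOLO26 or Ultralytics code. The function names `iou` and `nms`, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are all assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already-kept box above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes cover almost the same object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # indices of the boxes that survive
```

Each candidate is compared against every box kept so far, so this post-processing cost grows with the number of raw predictions. That is the extra step an NMS-free detector avoids.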

Big moment for Postgres!
Search has always been Postgres' weak spot, and everyone just accepted it. If you needed real relevance-ranked keyword search, the default answer was to spin up Elasticsearch or add Algolia and deal with the data sync headaches forever.
The problem isn't that Postgres can't do text search. It can. But the built-in `ts_rank` function uses a basic term frequency algorithm that doesn't come close to what modern search engines deliver.
So teams end up:
- Running a separate Elasticsearch cluster just for search
- Building sync pipelines that inevitably drift out of consistency
- Paying for managed search services that charge per query
- Accepting mediocre search relevance because "good enough" ships faster
But this is actually a solvable problem. You can realistically bring industry-standard search ranking directly into Postgres, which eliminates the need for external infra entirely.
This exact solution is now available with the newly open-sourced pg_textsearch by Tiger Data (creators of TimescaleDB), a Postgres extension that brings true BM25 relevance ranking into the database.
BM25 is the algorithm behind Elasticsearch, Lucene, and most modern search engines. Now it runs natively in Postgres.
Here's what pg_textsearch enables:
- True BM25 ranking with configurable parameters (the same algorithm powering production search systems)
- Simple SQL syntax: `ORDER BY content 'search terms'`
- Works with Postgres text search configurations for multiple languages
- Pairs naturally with pgvector for hybrid keyword + semantic search
That last point matters a lot for RAG apps. You can now do hybrid retrieval (combining keyword matching with vector similarity) in a single database, without stitching together multiple systems.
The video below shows this in action, and I worked with the team to put this together. The syntax is clean enough that you can add relevance-ranked search to existing queries in minutes.
pg_textsearch is fully open-source under the PostgreSQL license. You can find a link to their GitHub repo in the next tweet.

215,043 views
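To make the BM25 ranking mentioned in the Postgres post concrete, here is a small pure-Python sketch of the textbook Okapi BM25 formula. It is not pg_textsearch's implementation. The whitespace tokenizer, the function name `bm25_scores`, the sample documents, and the defaults `k1=1.2` and `b=0.75` are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates (k1) and is length-normalized (b).
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical corpus for the example.
docs = [
    "postgres full text search",
    "postgres postgres postgres replication and backup guide for busy teams",
    "cooking pasta at home",
]
scores = bm25_scores("postgres search", docs)
```

In this sketch the short document that matches both query terms outranks the longer one that only repeats "postgres". Raw term-frequency ranking has neither the saturation nor the length normalization that produces this result.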

Microsoft just changed the game! 🔥
They've open-sourced bitnet.cpp: a blazing-fast 1-bit LLM inference framework that runs directly on CPUs.
Why is this a game-changer❓
You can now run 100B parameter models on local devices with up to 6x speed improvements and 82% less energy consumption, all without a GPU!
The future we've been waiting for: fast, efficient, and private AI that works anytime, anywhere.✨
The model is also available on HuggingFace, so you can try it right away!
I've shared a link to the GitHub repo in the next tweet, where you'll find all the details.
_____
Find me → Akshay 🚀 ✔️ For more insights and tutorials on AI and Machine Learning!

511,729 views

Neural network visualized!

571,507 views

Fine-tune DeepSeek-OCR on your own language! (100% local)
DeepSeek-OCR is a 3B-parameter vision model that achieves 97% precision while using 10× fewer vision tokens than text-based LLMs. It handles tables, papers, and handwriting without killing your GPU or budget.
Why it matters: Most vision models treat documents as massive sequences of tokens, making long-context processing expensive and slow. DeepSeek-OCR uses context optical compression to convert 2D layouts into vision tokens, enabling efficient processing of complex documents.
The best part? You can easily fine-tune it for your specific use case on a single GPU.
I used Unsloth to run this experiment on Persian text and saw an 88.26% improvement in character error rate.
↳ Base model: 149% character error rate (CER)
↳ Fine-tuned model: 60% CER (57% more accurate)
↳ Training time: 60 steps on a single GPU
Persian was just the test case. You can swap in your own dataset for any language, document type, or specific domain you're working with.
I've shared the complete guide in the next tweet: all the code, notebooks, and environment setup ready to run with a single click.
Everything is 100% open-source!

125,939 views
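A CER above 100%, like the base-model figure in the DeepSeek-OCR post, can look odd at first. CER is usually defined as character-level edit distance divided by the reference length, so an output with many extra characters can exceed 100%. The sketch below shows that standard definition and then does the plain arithmetic on the rounded figures quoted in the post. The helper names and sample strings are assumptions, and the sketch does not try to reproduce the post's own 88.26% and 57% figures, which presumably come from unrounded values or a different definition.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


# A hypothesis far longer than the reference pushes CER past 1.0 (100%).
over_100 = cer("abc", "xxabcxxxx")

# Arithmetic on the rounded CER values quoted in the post.
base_cer, tuned_cer = 1.49, 0.60
absolute_drop = base_cer - tuned_cer      # drop in CER, as a fraction
relative_drop = absolute_drop / base_cer  # drop relative to the base CER
```

On these rounded inputs the absolute drop is about 89 percentage points and the relative reduction is roughly 60%.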

Drag-and-drop UI to build AI agents! Langflow is a powerful visual tool for building and deploying AI-powered agents and workflows—without writing any code. Supports all major LLMs, vector DBs, etc. 100% open-source, 62k+ stars 🌟

207,932 views

AI agents can finally talk to your frontend!
The AG-UI Protocol bridges the critical gap between AI agents and frontend apps, making human-agent collaboration seamless.
MCP: Agents to tools
A2A: Agents to agents
AG-UI: Agents to users
100% open-source.

188,684 views

Google released Gemma 3 270M, a new model for hyper-efficient local AI!
We'll fine-tune this model to make it very smart at playing chess and predicting the next move.
Tech stack:
- Unsloth AI for efficient fine-tuning.
- Hugging Face transformers to run it locally.
Let's go! 🚀

145,567 views

OpenClaw, but built for normal people.
Sim is an open-source platform that lets you build AI agent workflows on a drag-and-drop canvas. Connect them to channels like Telegram and WhatsApp and deploy without writing a single line of code.
They also have a built-in Copilot that generates entire workflows from plain English, which you can then tweak and customize in the UI.
Key features:
- Free and open-source (Apache 2.0)
- Vector store integration for RAG-grounded agents
- Self-host with one command (`npx simstudio`)
- Run fully local with Ollama, no API keys needed
- Supports vLLM for production-grade self-hosted inference
The thing I really like about Sim is the level of control you get. You can add conditional branching, parallel execution, human-in-the-loop approval gates, and even nest workflows inside other workflows. Everything is visible on the canvas, so you know exactly what your agent is doing at every step.
And you can build a workflow in Sim, deploy it as an MCP server, and plug it into any agent, including OpenClaw.
I've shared the link to Sim's GitHub repo in the next tweet.

52,291 views

Google just open-sourced the LangExtract Python library!
It uses LLMs to extract entities, attributes, and relations, with exact source grounding, from unstructured documents.
Flexible LLM support (Gemini, OpenAI, Ollama).
100% open-source.

120,343 views

K-Means is simple. Making it fast on GPU isn't.
Flash-KMeans is an IO-aware implementation of exact k-means that rethinks the algorithm around modern GPU bottlenecks.
By attacking the memory bottlenecks directly, Flash-KMeans achieves:
- 30x speedup over cuML
- 200x speedup over FAISS
Using the same exact algorithm, just engineered for today’s hardware. At the million-scale, Flash-KMeans can complete a k-means iteration in milliseconds.
Here's why this matters today: K-means has always been an offline primitive. Something you run once to preprocess data and move on. These speedups change that.
↳ Vector databases like FAISS use k-means to build search indices. Faster k-means means you can re-index dynamically as data changes, not batch it overnight.
↳ LLM quantization methods need k-means to find optimal weight codebooks, per layer, repeatedly. What takes hours could now take minutes.
↳ MoE models need fast token routing at inference time. Millisecond k-means makes it viable to run this inside the inference loop, not just in preprocessing.
The 200x over FAISS is the number to internalize. FAISS is the industry standard. Most production vector search systems sit on top of it.
Link to the paper and code in next tweet!

36,317 views
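For readers who want to see what one "k-means iteration" in the Flash-KMeans post refers to, here is a minimal CPU sketch of a single Lloyd iteration in plain Python. It shows only the textbook algorithm and says nothing about how Flash-KMeans itself is implemented on GPU. The function name `kmeans_iteration` and the sample points are assumptions for illustration.

```python
def kmeans_iteration(points, centroids):
    """One Lloyd iteration: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    k = len(centroids)
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    labels = []

    for p in points:
        # Assignment step: squared Euclidean distance to every centroid.
        dists = [sum((a - c) ** 2 for a, c in zip(p, cen)) for cen in centroids]
        label = dists.index(min(dists))
        labels.append(label)
        counts[label] += 1
        for d in range(dim):
            sums[label][d] += p[d]

    # Update step: mean of each cluster (empty clusters keep their centroid).
    new_centroids = [
        [s / counts[i] for s in sums[i]] if counts[i] else list(centroids[i])
        for i in range(k)
    ]
    return labels, new_centroids


# Hypothetical 2-D data with two obvious clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
labels, centroids = kmeans_iteration(points, [(1.0, 1.0), (9.0, 9.0)])
```

The assignment step touches every point-centroid pair, so in this sketch it dominates the work as the number of points and clusters grows.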

Turn complex docs into clean, LLM-ready data!
Every AI company I've talked to is solving the same problem: how do you build systems that don't hallucinate and back up every answer with proper citations?
Tensorlake is a tool that extracts custom-defined structured data from any unstructured document in 3 steps:
↳ Define your schema
↳ Enable citations
↳ Extract
You get RAG-ready data with precise citations and bounding boxes. Feed this to your LLM, and you'll generate responses that are citation-backed and fully auditable.
This is the difference between a demo and a production system. When your AI can show exactly where it got its information, you move from proof-of-concept to something people can actually trust and deploy.
I've shared the Tensorlake GitHub repo in the replies!

58,117 views

Transformers & LLMs cheatsheets for Stanford's CME-295! Covering tokenization, self-attention, prompting, fine-tuning, LLM-as-a-judge, RAG, AI Agents, and reasoning models. 100% free and open-source.

101,561 views

Big moment for text-to-speech.
Qwen just open-sourced a text-to-speech model that lets you clone voices, design new ones, and control speech using natural language.
Let me explain what I mean: You can literally tell it "speak in a cheerful tone with slight nervousness," and it actually does that. No complex audio engineering needed.
What makes this special:
- 3-second voice cloning
- Covers 10 languages: English, German, French, and more
- Latency as low as 97ms for real-time applications
- Supports both streaming and non-streaming generation
The model comes in two sizes (0.6B and 1.7B parameters), so you can pick based on your hardware and quality needs.
Three modes to work with:
1. Custom Voice: Use pre-built premium voices with instruction-based style control
2. Voice Design: Describe the voice you want in plain English (or Chinese), and the model creates it
3. Voice Clone: Provide a 3-second reference audio and clone that voice
The best part? It integrates with vLLM for production deployment and has a simple Python package you can pip install.
I've shared a link to the GitHub repo in the next tweet.

31,072 views

100x faster alternative to Pandas! (It can even beat GPU dataframe libraries)
While Pandas is the most popular DataFrame library, it has some major limitations:
- Pandas only uses a single CPU core.
- It often creates bulky DataFrames.
- Its eager (immediate) execution prevents global optimization.
Introducing FireDucks, a highly optimized, drop-in replacement for Pandas. Just change one line of code and you're good to go:
↳ `import fireducks.pandas as pd`
The video below shows a comparison of FireDucks with cuDF, a GPU DataFrame library. In this case, it's even faster than cuDF.
That said, the query in the video chains operations and uses all columns. After manually optimizing it to work only with the necessary columns, the run-times were:
- Pandas: 14 seconds (down from 48 seconds)
- FireDucks: 0.8 seconds (unchanged) [SIMILAR]
- cuDF: 0.9 seconds (down from 2.6 seconds)
This demonstrates that FireDucks' compiler automatically performs the same optimizations that you would need to explicitly implement in cuDF and Pandas. Most importantly, the optimization does not affect the final result.
I've shared a link to the Colab notebook in the next tweet.
_____
Find me → Akshay 🚀 ✔️ For more insights and tutorials in AI and Machine Learning.

79,631 views
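The FireDucks post contrasts eager execution with a compiler that can optimize a whole chain of operations, for example by reading only the columns a query needs. The toy class below sketches that idea in plain Python. It is not FireDucks' compiler or API. `LazyTable`, its methods, and the sample rows are hypothetical names made up for this illustration.

```python
class LazyTable:
    """Toy lazy table: records operations and runs them only on collect()."""

    def __init__(self, rows, ops=None):
        self.rows = rows      # source data as a list of dicts
        self.ops = ops or []  # recorded plan; nothing has run yet

    def filter(self, column, predicate):
        return LazyTable(self.rows, self.ops + [("filter", column, predicate)])

    def select(self, columns):
        return LazyTable(self.rows, self.ops + [("select", list(columns))])

    def needed_columns(self):
        """Walk the plan backwards to find the source columns actually used.
        Returns None when the plan never narrows the columns."""
        needed = None
        for op in reversed(self.ops):
            if op[0] == "select" and needed is None:
                needed = set(op[1])
            elif op[0] == "filter" and needed is not None:
                needed.add(op[1])
        return needed

    def collect(self):
        needed = self.needed_columns()
        rows = self.rows
        if needed is not None:
            # Projection pushdown: drop unused columns before any other work.
            rows = [{c: r[c] for c in needed} for r in rows]
        for op in self.ops:
            if op[0] == "filter":
                rows = [r for r in rows if op[2](r[op[1]])]
            else:
                rows = [{c: r[c] for c in op[1]} for r in rows]
        return rows


# Hypothetical data: "notes" is a bulky column the query never uses.
rows = [
    {"name": "a", "price": 5, "notes": "x" * 1000},
    {"name": "b", "price": 50, "notes": "y" * 1000},
]
plan = LazyTable(rows).filter("price", lambda p: p > 10).select(["name"])
result = plan.collect()
```

Because the whole plan is visible before anything runs, the toy can see that `notes` is never used and drop it up front. An eager library would carry that column through every intermediate step unless the user removed it by hand.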

OpenClaw meets RL! Most agents evolve via prompt tricks and markdown hacks. MetaClaw updates actual model weights from every failed interaction. Everything happens on the fly, without any dataset or code changes. GitHub:

17,228 views

Check this!! Now you can scrape ANY website by just writing a prompt. Using 's /extract endpoint, just describe what you want to extract in a prompt. This produces LLM-ready structured output. No more hard coding!

64,158 views

Meta's SAM 3 is a beast.
before: collect data, train custom object detector, use tracker to estimate object motion - days
now: track anything with text prompt - seconds
real-time object detection will never be the same.

31,386 views

Make Pandas 20x faster using FireDucks...
...by changing JUST ONE LINE of code.
Pandas has a few limitations:
- It computes on a single core.
- It creates bulky DataFrames.
- It always follows an eager execution mode (every op triggers immediate computation), which is why it cannot prepare a smart execution plan that optimizes the entire sequence of operations.
FireDucks is a heavily optimized alternative with exactly the same API as Pandas that addresses these.
There are three ways to use it:
1) Load the extension: `%load_ext fireducks.pandas; import pandas as pd`
2) Import FireDucks instead of Pandas: `import fireducks.pandas as pd`
3) If you have a Python script, execute it as follows: `python3 -m fireducks.pandas code.py`
Done! ✅
Check this out👇

73,967 views

Videos

This is the DeepSeek moment for Voice AI.
Chatterbox Turbo is an MIT-licensed voice model that beats ElevenLabs Turbo & Cartesia Sonic 3!
- <150ms time-to-first-sound
- Voice cloning from just 5-second audio
- Paralinguistic tags for real human expression
100% open-source.

467,359 views • 5 months ago

Anthropic's most viral feature is now open-source!
Until now, Anthropic's Generative UI capabilities only existed inside its own products. CopilotKit🪁 just shipped Open Generative UI, an open-source implementation of Claude Artifacts that works in any app.
The agent generates HTML/SVG at runtime, and CopilotKit streams it token-by-token into a sandboxed iframe inside the app's chat. So the user can watch the UI assemble itself in real time, not after the full response is ready.
The sandbox is fully isolated with no access to the parent app, the DOM, or user data. So if the agent hallucinates broken markup or unexpected JavaScript, nothing leaks outside the iframe.
Under the hood, the agent does not select from pre-built components. Instead, it generates arbitrary visuals from scratch every time. The output is unconstrained by default, but you can shape it by defining prompt-based skills that teach the agent specific visual formats or guidelines.
For instance, a skill prompt can guide the agent toward producing a Chart.js dashboard with proper axis labels and responsive sizing, or an interactive 3D model with rotation controls. The video below shows this in action, and the output quality you see actually comes from the skills layer.
Open Generative UI runs on AG-UI, so it works out of the box with LangGraph, CrewAI, Mastra, Google ADK, AWS Strands, and more. It also ships with a standalone MCP server that plugs into Claude Code, Cursor, or any MCP-compatible client.
And the entire stack is built on top of CopilotKit, the open-source frontend framework for agents and generative UI. 30k+ GitHub stars, with SDKs for React, Next.js, Angular, and Vue.
I have shared the GitHub repo and a live playground in the replies!

84,504 views • 28 days ago

Everyone is sleeping on this new OCR model!
- 85.9% (sota) on olmocr bench
- 90+ language support w/benchmarks
- 4B model (down from 9B)
- Full layout information
- Extracts + captions images and diagrams
- Strong handwriting, math, form, table support
100% open-source.

165,995 views • 2 months ago

Software engineers are going to love this! I found an open-source error monitoring agent that scans production logs, finds the root cause, and sends a Slack message with full context before you even notice something broke. Cuts down production downtime by 95%! Check this:

180,996 views • 3 months ago

I rebuilt most of OpenClaw's core in a single workflow:
- 25 blocks
- 29 connections
- Short + long-term memory
- Multi-channel (Telegram + Slack)
Didn't build it manually. Stack is fully open-source. Self-host, run local models, own it end-to-end.
Full walkthrough:
Chapters:
00:00 - Intro
01:00 - SimClaw in action: planning my day, finding meetings, sending email
04:05 - Long-term memory capability
05:52 - Inside the workflow: how it's wired
12:09 - The plot twist
12:50 - Building an entire workflow using a single prompt
15:42 - Why this is an OS for your AI workforce
17:00 - Try it yourself
If you want to see the open-source stack that powers all of this, check out Sim on GitHub and drop a star if you find it useful:

65,291 views • 1 month ago

Claude Skills might be the biggest upgrade to AI agents so far! Some say it's even bigger than MCP. I've been testing skills for the past 3-4 days, and they're solving a problem most people don't talk about: agents just keep forgetting everything. In this video, I'll share everything I've learned so far. It covers: > The core idea (skills as SOPs for agents) > Anatomy of a skill > Skills vs. MCP vs. Projects vs. Subagents > Building your own skill > Hands-on example Skills are the early signs of continual learning, and they can change how we work with agents forever! Here's everything you need to know:

286,002 views • 7 months ago

I just built my own OCR app powered by Google's Gemma 3. It runs 100% locally on my computer using Ollama and extracts text from images as structured markdown. The code is open-source, and you can find it in the next tweet.

477,439 views • 1 year ago

Make Claude Code 10x more powerful. Claude-Mem is a free plugin to persist memory across Claude sessions. It captures tool usage, so you always start where you left off. Endless Mode allows 95% token reduction & 20x more tool use before context exhaustion. 100% open-source.

184,061 views • 5 months ago

A 100% open-source alternative to n8n! Sim is a drag-and-drop UI for creating powerful AI agent workflows: - Runs locally on your machine - Works with local LLMs I built a stock market research agent & connected it to Telegram in minutes. Here's a step-by-step guide:

176,158 views • 5 months ago

I decided to put together all my AI engineering posts in a single PDF. It covers:
> LLM foundations
> Prompt engineering
> Fine-tuning
> RAG
> Context engineering
> AI agents
> MCP
> Optimization
> Deployment
> Eval and observability
375+ pages. Download link in next tweet!

159,732 views • 5 months ago

Everyone is sleeping on this new OCR model! dots-ocr is a new 1.7B vision-language model that achieves SOTA performance on multilingual document parsing. - Supports 100+ languages - Works with both images and PDFs - Handles text, tables, formulas seamlessly 100% open-source.

251,785 views • 9 months ago

This is how you make your OpenClaw server invisible to the internet. (world's most SECURE OpenClaw deployment) The security fundamentals you learn in this video directly apply to any personal AI assistant or VPS setup. Enjoy!
Chapters:
0:00 - Intro
1:00 - What we'll cover
1:58 - DigitalOcean Droplet setup + getting OpenClaw running
8:18 - Connecting your agent to Telegram
12:13 - Tailscale: making your server invisible to the internet
14:52 - Locking down SSH + creating a non-root user
19:39 - Firewall: blocking everything except Tailscale
21:17 - Summarising everything done so far
22:50 - Set up a secure tunnel: Your machine → VPS
24:50 - Execution policies: going from chatbot to full agent
26:43 - Adding custom skills
31:03 - Use cases and going from 1 to 10 agents
31:52 - Outro

81,513 views • 2 months ago

Nothing beats open-source! MiniMax just dropped M2.1, and devs are calling it "Claude at 10% the cost." - 72.5% SWE-Multilingual. Beats Sonnet 4.5 - 88.6% VIBE-bench. Beats Gemini 3 Pro I used it to build an AI studio that turns any website into a podcast. 100% open-source.

140,068 views • 5 months ago

Vector DBs can't reason. Top-k similarity ranks chunks one at a time against a query. That's fine for single-hop fact lookups, and it breaks the moment a question needs information stitched across multiple chunks. That's what the FalkorDB GraphRAG-Bench results expose. The gap is widest on Complex Reasoning (83.61) and Contextual Summarization (85.08), the exact query types where retrieval needs to traverse relations between entities, not score chunks in isolation. Worth a closer look if your workload leans long-form. GraphRAG SDK is 100% open-source:
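The difference between scoring chunks in isolation and traversing relations can be sketched with toy data. Everything here is an assumption for illustration: the chunk texts, the hand-assigned 3-dimensional vectors, the entity graph, and the helper names `top_k` and `multi_hop`. It does not use the FalkorDB GraphRAG SDK or reproduce its benchmark.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k):
    """Rank every chunk independently against the query and keep the best k."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["id"] for c in ranked[:k]]


def multi_hop(graph, start, hops):
    """Collect entities reachable from `start` within `hops` relation steps."""
    seen = {start}
    frontier = [start]
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return seen


# Toy corpus with made-up embeddings (not produced by a real model).
chunks = [
    {"id": "A", "text": "Alice founded Acme.", "vec": [1.0, 0.0, 0.0]},
    {"id": "B", "text": "Acme is headquartered in Oslo.", "vec": [0.0, 1.0, 0.0]},
    {"id": "D", "text": "Alice enjoys hiking.", "vec": [0.8, 0.0, 0.2]},
]
# Question: "Where is Alice's company based?" (embedding is also made up).
query = [0.9, 0.1, 0.0]

print(top_k(query, chunks, 2))  # chunk B, which holds the answer, is not retrieved

# Entity graph extracted from the same chunks (hand-written here).
graph = {"Alice": ["Acme"], "Acme": ["Oslo"]}
print(multi_hop(graph, "Alice", 2))  # reaches Oslo via Acme
```

In this toy setup the chunk that contains the answer looks unrelated to the query when scored alone, so similarity ranking drops it, while a two-hop walk over the entity relations reaches it.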

34,729 views • 1 month ago

SAMURAI vs. MetaAI's SAM 2! Traditional visual object tracking struggles in crowded, fast-moving, or self-occluded scenes, as does SAM2. Meet SAMURAI: a completely open-source adaptation of the Segment Anything Model for zero-shot visual tracking! Here's why it's a game-changer: 🚫 No need for retraining or finetuning 🎯 Boosts success rate and precision 🤖 Motion-aware memory selection 💪 Zero-shot performance on diverse datasets But that's not all: 🔬 Refines mask selection 🔮 Predicts object motion effectively 📈 Gains: 7.1% AUC on LaSOT, 3.5% AO on GOT-10k 🏆 Competes with fully supervised methods without extra training Link to the GitHub repo in the next tweet! _____ Find me → Akshay 🚀 ✔️ For more insights & tutorials on AI and Machine Learning.

363,204 views • 1 year ago

I decided to put together all my MCP posts in a single PDF. It covers: - The fundamentals of MCP - Explanations with visuals and code - 11 hands-on projects for AI engineers Download link in next tweet!

218,466 views • 11 months ago

Microsoft has launched a powerful new data analysis tool! Introducing Data Formulator, a 100% open-source LLM-powered, no-code tool that transforms data in a snap and creates stunning visualizations. Key features include: 🤖 AI-powered data transformation 🖱️ Interactive drag-and-drop UI for visualizations 💬 Seamless blend of UI & natural language inputs But that’s not all: You can even create charts beyond your initial dataset. Data Formulator automatically identifies extra computation needs, generates fields for you, and outputs the final visualization. Find the GitHub repo in the next tweet! _____ Find me → Akshay 🚀 ✔️ For more insights and tutorials on AI and Machine Learning.

280,385 views • 1 year ago

What they don't tell you about vibe coding: • Moltbook exposed 1.5M auth tokens. The owner hadn't written a single line of code. • Tea App leaked 72,000 government IDs. The database was just open, no sophisticated hack needed. • A researcher took control of a journalist's computer through her own vibe-coded game, without a single click. The code ran fine in all three cases, tests passed, reviews looked clean, and nothing raised a flag. That's the problem nobody is talking about. Teams are shipping faster than ever. AI writes the code. CI catches build failures. Tests catch regressions. Observability catches outages. But nobody is asking the one question that actually matters: What can an attacker do with this, right now? Because the bottleneck is no longer writing code. It's understanding what that code actually exposes once it's live. PR reviews miss auth edge cases. Unit tests don't probe broken access control. Staging environments don't simulate adversarial behavior. And business logic flaws look completely fine until someone decides to break them on purpose. Strix is an open-source tool that fills this gap. It reviews your running app the way an attacker would: - Crawls the app and maps every exposed route and flow - Probes abuse paths dynamically, not just at build time - Returns findings with proof-of-concepts and suggested fixes Strix was benchmarked against 200 real companies and open-source repos, where it found 600+ verified vulnerabilities including assigned CVEs. It's designed to fit into how modern teams already work. Run it before a release, after major changes, or continuously as the app evolves. If your team is shipping AI-generated code and you don't currently have a way to answer "what does this actually expose", it's worth looking at. GitHub link in the next tweet.

52,284 views • 2 months ago

Turn any workflow into an agent skill. I built a YC job finder, deployed it as an MCP server & connected it to Claude Desktop. It finds matching roles & sends personalized application emails to the recruiter. If you can break a process into steps, this guide will help you automate it:

61,658 views • 2 months ago

Microsoft did it again! Speech AI models have a major limitation. They slice long recordings into tiny chunks, lose track of who's speaking, and forget all context halfway through. This is exactly what Microsoft's VibeVoice solves. It's an open-source family of frontier voice AI models for both speech recognition and speech generation. Here's what it can do: > VibeVoice-ASR processes up to 60 minutes of audio in a single pass. No chunking. It outputs structured transcriptions with who spoke, when they spoke, and what they said. > You can feed it custom hotwords like names, technical jargon, or domain-specific terms. The model uses them to significantly improve accuracy on specialized content. > VibeVoice-TTS generates up to 90 minutes of multi-speaker speech with up to 4 distinct speakers. Natural turn-taking, emotional expression, all in one pass. > VibeVoice-Realtime is a 0.5B streaming TTS model with ~300ms first-audio latency. Small enough to deploy practically anywhere. All of this is powered by continuous speech tokenizers running at just 7.5 Hz. This ultra-low frame rate preserves audio quality while making long sequences computationally feasible. I have shared the link to the GitHub repo in the replies!
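A quick back-of-the-envelope calculation shows why a 7.5 Hz frame rate keeps long audio tractable. The 7.5 Hz rate and the 60 and 90 minute limits come from the post; the 50 Hz comparison rate is an assumption chosen only for contrast, and `frame_count` is a hypothetical helper.

```python
def frame_count(minutes: float, frame_rate_hz: float) -> int:
    """Number of tokenizer frames needed to cover `minutes` of audio."""
    seconds = minutes * 60
    return int(seconds * frame_rate_hz)


VIBEVOICE_RATE_HZ = 7.5    # stated in the post
COMPARISON_RATE_HZ = 50.0  # assumed higher-rate tokenizer, for contrast only

for label, minutes in [("ASR, 60 min", 60), ("TTS, 90 min", 90)]:
    low = frame_count(minutes, VIBEVOICE_RATE_HZ)
    high = frame_count(minutes, COMPARISON_RATE_HZ)
    print(f"{label}: {low} frames at 7.5 Hz vs {high} frames at 50 Hz")
```

At 7.5 Hz, 90 minutes of audio is 40,500 frames, while the assumed 50 Hz rate would need 270,000 for the same duration.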

45,100 views • 2 months ago