I just built an open NotebookLM clone!

Here's what it can do for you:
- Process multi-modal data
- Scrape websites and YouTube videos
- Create a unified knowledge base
- Do RAG over it
- Remember every conversation
- Generate a podcast 🎙️

The idea here is not to reinvent the wheel but to understand how one of the most powerful tools for learning and research actually works, by building it step-by-step!

So by the end of this video, you'll learn how to:
↳ Process multimodal data (text, audio, video, URLs, and YouTube videos) into a format ready for LLMs
↳ Store everything in a vector database for fast retrieval
↳ Add a memory layer that remembers conversations and preferences for a personalized experience
↳ Chat with your knowledge base or generate podcasts using a fully open-source, locally running text-to-speech model

The podcast generation feature is my favorite part! There's something powerful about turning written content into conversational audio that you can listen to while doing something else.

The entire code is 100% open-source. I've shared a link in the replies!
____
Don't forget to drop a like if you enjoy my videos. It shows me I should be making more content like this. Cheers! :)

Akshay 🚀

278,003 subscribers

83,187 views • 8 months ago • via X (Twitter)

Science and Technology • Education • Art
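To make the "store everything, then do RAG over it" idea from the post concrete, here is a minimal sketch of the retrieval step. It is not the author's code: the `KnowledgeBase` class, the `embed` function, and the sample sources are hypothetical, and the bag-of-words "embedding" is a toy stand-in for a real embedding model and vector database.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KnowledgeBase:
    """Hypothetical in-memory store: one (source, text, vector) record per chunk."""

    def __init__(self) -> None:
        self.records: list[tuple[str, str, Counter]] = []

    def add(self, source: str, text: str) -> None:
        # Every modality is assumed to have been converted to text upstream.
        self.records.append((source, text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[tuple[str, str]]:
        # Rank all chunks by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.records, key=lambda r: cosine(q, r[2]), reverse=True)
        return [(source, text) for source, text, _ in ranked[:k]]


def build_prompt(kb: KnowledgeBase, question: str, k: int = 2) -> str:
    """Assemble retrieved chunks, tagged by source, into a prompt for an LLM."""
    context = "\n".join(f"[{src}] {text}" for src, text in kb.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


kb = KnowledgeBase()
kb.add("notes.txt", "Vector databases store embeddings for fast similarity search.")
kb.add("video-transcript", "The podcast feature turns written content into conversational audio.")
kb.add("website", "Bananas are rich in potassium.")

print(build_prompt(kb, "How does the podcast audio feature work?", k=1))
```

Keeping the source label next to each chunk is one simple way to let the final answer cite where a passage came from; a real system would swap `embed` for a learned embedding model and `KnowledgeBase` for an actual vector database.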

Related videos

This is one of the coolest open-source AI agent projects I've seen in a while: 'Understand Anything'. It's a plugin for Claude Code, Codex, OpenCode, etc. that analyzes your codebase and turns it into a knowledge base that you can interact with. It explains the codebase to you, rather than just showing you the structure. It seems to be designed for code, but I opened my Obsidian vault of podcast highlights in Claude Code, then ran /understand. The result is a searchable knowledge graph of highlights from 888 podcast episodes and 144K lines of markdown text.

Dan McAteer

156,531 views • 5 days ago

Knowledge graphs for representing information are unbeatable. After this, you will never build a RAG system without knowledge graphs. It will take you five lines of code to build a knowledge graph with your data. I recorded a video to show you how you can do this. I used Cognee, an open-source library that outperforms any basic vector search approach in terms of retrieval relevance. They are collaborating with me on this post. Cognee is: • Easy to use • Reduces hallucinations • Open-source Here is a link to the repository: They also offer a comprehensive platform and UI with Python notebooks you can utilize to manage your data. Here is the link:

Santiago

125,928 views • 9 months ago

Here is how you can install an open-source, enterprise-grade RAG system on your server (with the best document understanding I've seen.) First, something obvious to anyone trying to sell RAG in the market: You are crazy if you think companies will let their data travel to a hosted model. No one wants to send their data anywhere (those who do haven't found an alternative.) Every single company would rather have an air-gapped system with no internet access. GroundX is an open-source RAG system that you can run on your servers (or any cloud provider, as long as you have access to GPUs) and works without a network. (If the military wants to do RAG, this is precisely what they will be looking for.) I installed GroundX on my AWS account and recorded a video to show you how to use it. There are two services you can use: 1. Ingest: This service uses a pretrained vision model to ingest and understand your knowledge base. 2. Search: This service combines text and vector search with a fine-tuned re-ranker model to retrieve information from your knowledge base. A quick note about the Ingest service: 99% of people think they need better "retrieval" mechanisms. I think they need better "ingestion." That's where this service comes in! Ingest "understands" your documents in a way I haven't seen before. After you try it, you'll realize why showing your LLM your raw documents is a bad idea. In the video, I use a free tool called X-Ray to test a document and understand how the Ingest service breaks it down. You can access this tool by signing up for a free GroundX cloud account and uploading your documents. You'll see a bit more about this in the video.

Santiago

89,624 views • 1 year ago

I created a real-time Voice RAG Agent, powered by Gemma 3.
- It lets you talk to your docs in a voice you prefer
- You can also clone your voice in just 5 seconds

The code is open-source, and here's a step-by-step guide:

Akshay 🚀

133,192 views • 1 year ago

I built a service desk agent using ElevenLabs’ new Conversational AI Agents feature. Watch the video to see how responsive it is! Previously, I used ElevenLabs to clone my voice and used the generated voice for narration in some of my YouTube videos. This feature takes ElevenLabs’ voice AI to a whole new level! It simplifies systems that used to require separate TTS (text-to-speech) and STT (speech-to-text) processes for both sides of the conversation. Now, it’s much simpler! Create your own agent here. What will you create with this?

Melvin Vivas

27,722 views • 1 year ago

Building with AI gets easier every day. Here is an open-source library that makes integrating AI into an application extremely easy: Star the repository! This library alone can make React the best front-end framework out there! There are a bunch of cool things I like about CopilotKit. Here are 3 of them: 1. It allows you to take any -powered agent and bring it into your application. (This is a brand-new feature!) 2. You can build an AI-powered chatbot in your application. The chatbot will have access to your context and can act on the application. 3. You can build a RAG workflow to process and answer questions from a real-time knowledge base. I recorded a video to show you how simple it is to make some of this happen. A few lines of code, and you are in business. Here is a link to the sample application: CopilotKit is open-source. You can self-host it. You can use it with any LLM. Thanks to the team for showing me their tool and collaborating with me on this post!

Santiago

108,824 views • 2 years ago

Be Smart as Karpathy with Teamily AI 🧠

Your Personal Knowledge Base:
✅ Built in One Chat.
📈 Compounded via Conversations.

Karpathy’s insight is spot on. It attracted 10 million views in a few days. The idea is simple: AI should build personal knowledge from everything you feed it, so it stops rediscovering things from scratch the way a Retrieval-Augmented Generation (RAG) system does.

But here’s the reality: most people aren’t Stanford PhD-level geeks like Karpathy. For the rest of us, operating a hacky collection of scripts and tools (Obsidian Web Clipper, Marp, Dataview, etc.), as seen in Karpathy’s idea file, is far too complex. The Internet needs an intuitive product where a personal knowledge base is a persistent, compounding artifact, one that grows alongside the content you consume, the contexts you inhabit, and the questions you ask.

Teamily AI is the answer. The conversation IS the knowledge base. It’s an AI-native messenger where AI teammates join your chats. They remember your past discussions, your preferences, and your team’s context, getting smarter the more you talk. No setup. No complicated workflows. Just text as you normally do.

Whether you’re saving articles and videos, brainstorming at work, or collaborating with colleagues, your AI teammates are right there. They listen, remember, and help, not from scratch every time, but by building a personal knowledge graph of everything you’re involved in. In essence, your knowledge compounds automatically. ✨

The user experience is effortless. Whenever you need a well-organized view of your data, just ask the "Personal AI" at the top of the Teamily window: "Visualize my personal knowledge base". Want to customize the style or indexes? Just chat with it. You define how you manage your knowledge.

Our co-founder Aiden has prepared a short video to show you just how easy it is. 📽️

Teamily AI

15,400 views • 2 months ago

Finally! A Text-to-SQL tool that actually works! Vanna is an open-source RAG framework for complex Text-to-SQL generation. It manages dynamic data and allows custom RAG model training for greater accuracy. 100% open-source.

Akshay 🚀

168,600 views • 1 year ago

New short course Multimodal RAG: Chat with Videos, developed with Intel and taught by vasudevlal! In this course, you’ll work with LLaVA (Large Language and Vision Assistant), a Large Vision Language Model (LVLM) that can process both images and text. For example, given an image of a person doing a handstand on a skateboard at the beach, LLaVA doesn't just caption the scene, it’s able to predict possible outcomes, like the person losing balance or falling off. By understanding not just what's in a video frame, but what might happen next, your application can provide more insightful answers to questions about video. You'll build a full multimodal RAG pipeline that can chat about video content: - Use the BridgeTower model to create joint text-image embeddings in a 512-dimensional multimodal semantic space. - Learn video processing techniques to extract keyframes, generate transcripts using Whisper, and create captions. - Use the LanceDB vector database to store and retrieve high-dimensional multimodal embeddings. - Integrate the LLaVA model, combining CLIP's (Contrastive Language Image Pretraining) vision transformer with Llama, for advanced visual-textual reasoning. Your final system will ingest video data, generate embeddings for frames and text, perform similarity searches for relevant content, and use the retrieved multimodal context to inform LVLM-based response generation. The result is a system capable of answering nuanced questions about video content, effectively chatting about the video it has processed. Please sign up here!

Andrew Ng

107,548 views • 1 year ago

karpathy just broke the internet with something called auto research

it’s basically an ai research agent that runs experiments for you 24/7

you give it a goal like
“make this model better”
“find a higher converting landing page”
“lower customer acquisition cost”

then it runs a loop:
1) plan an experiment
2) edit the code or config
3) run a short test on a gpu
4) read the metrics
5) keep the winner
6) try again

over and over while you sleep

by the morning you wake up to the best version
actual tested improvements

think of it like a robot research intern that runs hundreds of experiments and only keeps the winners

this is the link to his repo for you to mess around with

in the latest episode of The Startup Ideas Podcast (SIP) 🧃 i break down:
• what auto research actually is
• how it works step by step
• 10 business ideas you can build with it
• how to install it and start using it

this one is saucy because tools like this change how startups get built

watch

GREG ISENBERG

434,928 views • 3 months ago
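The plan / run / read metrics / keep-the-winner loop described in the auto research post can be sketched as a greedy hill climb. This is only an illustration under stated assumptions, not the repo's actual code: `run_experiment`, `propose`, and `research_loop` are hypothetical names, and the "experiment" is a toy scoring function rather than a GPU run.

```python
import random


def run_experiment(config: dict) -> float:
    """Hypothetical stand-in for 'run a short test and read the metrics'.

    A real run would train or evaluate something; this toy score simply
    peaks when learning_rate equals 0.1.
    """
    return -abs(config["learning_rate"] - 0.1)


def propose(config: dict, rng: random.Random) -> dict:
    """Plan the next experiment by perturbing the current best config."""
    candidate = dict(config)
    factor = rng.uniform(0.5, 1.5)
    candidate["learning_rate"] = max(1e-4, config["learning_rate"] * factor)
    return candidate


def research_loop(start: dict, rounds: int, seed: int = 0) -> tuple[dict, float]:
    """Greedy loop: try a variation, keep it only if the metric improves."""
    rng = random.Random(seed)
    best, best_score = start, run_experiment(start)
    for _ in range(rounds):
        candidate = propose(best, rng)       # 1-2) plan and edit the config
        score = run_experiment(candidate)    # 3-4) run the test, read the metric
        if score > best_score:               # 5) keep the winner
            best, best_score = candidate, score
    return best, best_score                  # 6) "try again" is the loop itself


start = {"learning_rate": 1.0}
best, score = research_loop(start, rounds=200)
print(best, score)
```

Because a candidate replaces the incumbent only when its score is strictly higher, the returned score can never be worse than the starting one; the trade-off is that a greedy loop like this can get stuck at a local optimum.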

NotebookLM is great for research. Claude’s browser agent is great for clicking. Combine them and you get a “researcher + executor” that works while you do something else. → Install Claude for Chrome and enable browser use. → Set “Ask before acting” for safety. → Prompt: “Open NotebookLM → create notebook → add sources → build a data table → export to Google Sheets → generate an audio overview.” → Let it run in the background while you work. This turns competitor research into a repeatable SOP instead of a one-time grind. Save this video, you’ll stop doing busywork. Want the SOP? DM me. 💬

Julian Goldie SEO

15,895 views • 5 months ago

A 100% open-source alternative to n8n! Sim is a drag-and-drop UI for creating powerful AI agent workflows: - Runs locally on your machine - Works with local LLMs I built a stock market research agent & connected it to Telegram in minutes. Here's a step-by-step guide:

Akshay 🚀

176,333 views • 6 months ago

If you could only learn one thing that will be relevant for the next 10-20 years, focus on learning how to deal with data. The future is not about faster hardware, smarter algorithms, or better ideas. The future is about DATA, and those who know how to deal with it will stay relevant much longer than anyone else. I recorded a video to show you how easy it is to get started. In the video, I'm using Kestra. For a long time, I was a fan of AirFlow. Then, I moved to AWS Step Functions. Today, I only use Kestra. Kestra is open-source (repo link below) and kind enough to sponsor my work. The video will show you how easy it is to do the following: 1. Run Kestra locally (literally, one command) 2. Build a simple flow 3. Run Python scripts as part of your flow 4. Connect to HuggingFace models If you have never built a data pipeline, open Kestra's Quick Start Guide and follow their examples. (I think it will take you one weekend to feel comfortable with the application and build the courage you need to get into more serious work.)

Santiago

51,012 views • 1 year ago

I just created an agentic-workflow to automatically write and publish content for me! It's powered by CrewAI Flows and Llama 3.2, running 100% locally. Tech stack: - CrewAI to build an agentic workflow - FireCrawl for web scraping - Typefully for scheduling Here's how it works: - You provide a link to a website. - It scrapes and saves the data as markdown. - A router triggers the desired Crew of agents. - The Crew prepares a ready-to-publish draft. - Finally, use Typefully to post it to your socials. Totally hands-off and 100% automated! In this video, I provide a deep dive into how it actually works! Find the link to all the code in the next tweet! Enjoy the video! 🥂

Akshay 🚀

98,126 views • 1 year ago
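The "router triggers the desired Crew" step in the content-writing post above can be pictured as a plain dispatch table. The sketch below deliberately avoids the CrewAI, FireCrawl, and Typefully APIs; `draft_thread`, `draft_post`, `CREWS`, and `route` are hypothetical names, and each "crew" is a trivial function standing in for a group of agents.

```python
from typing import Callable


def first_heading(markdown: str) -> str:
    """Return the first non-empty line of the scraped markdown, without '#' marks."""
    for line in markdown.splitlines():
        if line.strip():
            return line.strip().lstrip("#").strip()
    return "(untitled)"


def draft_thread(markdown: str) -> str:
    """Hypothetical crew that would turn scraped markdown into a thread draft."""
    return f"THREAD: {first_heading(markdown)}"


def draft_post(markdown: str) -> str:
    """Hypothetical crew that would turn scraped markdown into a single post."""
    return f"POST: {first_heading(markdown)}"


# The router is just a mapping from the requested output type to a crew.
CREWS: dict[str, Callable[[str], str]] = {
    "thread": draft_thread,
    "post": draft_post,
}


def route(markdown: str, target: str) -> str:
    """Pick the crew for the requested target and return its draft."""
    if target not in CREWS:
        raise ValueError(f"unknown target: {target!r}")
    return CREWS[target](markdown)


scraped = "# Hello World\n\nSome body text scraped from a website."
print(route(scraped, "thread"))
print(route(scraped, "post"))
```

A lookup table keeps the routing decision separate from the drafting logic, so adding a new output type means registering one more function rather than editing a chain of conditionals.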

Dreamina Seedance 2.0 is by far the best video model I've tried, and Dreamina makes it even better. You can try it in the link below. It's not just about the quality of the output, but about how much control you have over your video's look. Watch my video here. I'm attaching reference images and asking the model to generate a video using them. I can reference each image using the @ symbol to instruct the model which image to use. You can even upload a clip and use its camera movement, styles from an image, and audio vibe from a track. By the way, you can take an existing video and replace, remove, or add elements to it while the model preserves everything else. This is the closest we've gotten to "editing videos like photos".

Santiago

45,143 views • 3 months ago

Along with text, images, video and code, Gemini is able to process raw audio signal end-to-end. 🔊 It can listen to and understand speech, making it not only useful for transcription but a model that has a much more nuanced perception of its environment. ↓

Google DeepMind

140,150 views • 2 years ago

How to Create Music That Matches Your Video's Sound with Suno Studio I made a quick video showing how I use Suno Studio to generate background music that fits the audio and mood of a video, similar to the example in the quoted post. This is the method I use at the moment but if you have a better approach, feel free to share it. Before starting, export the original audio from your video. I usually do this in CapCut. After generating a track you like in Suno Studio, you can import it back into CapCut and add it to your video. *This is not a paid partnership.

Kōda

32,490 views • 10 days ago