Here is how you can install an open-source, enterprise-grade RAG system on your server (with the best document understanding I've seen).

First, something obvious to anyone trying to sell RAG in the market: You are crazy if you think companies will let their data travel to a hosted model...

No one wants to send their data anywhere (those who do haven't found an alternative). Every single company would rather have an air-gapped system with no internet access.

GroundX is an open-source RAG system that you can run on your servers (or any cloud provider, as long as you have access to GPUs) and works without a network. (If the military wants to do RAG, this is precisely what they will be looking for.)

I installed GroundX on my AWS account and recorded a video to show you how to use it.

There are two services you can use:

1. Ingest: This service uses a pretrained vision model to ingest and understand your knowledge base.
2. Search: This service combines text and vector search with a fine-tuned re-ranker model to retrieve information from your knowledge base.

A quick note about the Ingest service: 99% of people think they need better "retrieval" mechanisms. I think they need better "ingestion." That's where this service comes in! Ingest "understands" your documents in a way I haven't seen before. After you try it, you'll realize why showing your LLM your raw documents is a bad idea.

In the video, I use a free tool called X-Ray to test a document and understand how the Ingest service breaks it down. You can access this tool by signing up for a free GroundX cloud account and uploading your documents. You'll see a bit more about this in the video.

Santiago

452,245 subscribers

89,624 views • 1 year ago • via X (Twitter)

Education • Science & Technology • News & Politics

11 comments

Santiago • 1 year ago

This is a game-changer for anyone who wants world-class RAG performance with top-notch security. Here is GroundX's on-prem website: Sign up for a cloud account to start testing your documents for free. Then download the open-source repository and follow the instructions on this GitHub repository: Disclaimer: I've been working with the team behind GroundX for a year+ now. I believe they have built one of the best RAG ecosystems in the world.

Connor Groce • 1 year ago

I'm a multi-brand franchise entrepreneur. I’m also a franchise consultant who helps people looking to get into business ownership. Here are my 7 keys to success in franchising (bookmark this): 1. Go all in or partner with someone who will. Franchises are not mutual funds. Do not let anyone convince you that this is a passive investment. 2. “Plant your Flag” before growing. Grow intentionally. Don’t let FOMO cloud your judgement. Opportunity is infinite and will present itself when you’re ready to claim it. 3. Focus on people, both when choosing franchises and operating them. A business is only as strong as the team behind it. The team is only as strong as their leader empowers them to be. 4. Never confuse passion with opportunity. Strategically pick businesses based on the opportunity they present and how they align with your skillset. Then use all the money you make to chase your passions as hobbies. 5. “Flatten The Curve” when it comes to learning. Milk every last ounce of value out of the support you get from the franchisor and the network of other franchisees. You’re paying for it either way. 6. Replace yourself. Hire great people and build great systems to the point that you can remove yourself from the business. But have so much fun that you don’t want to. 7. Repeat. My favorite part about being a franchise entrepreneur is the scalability. There is no ceiling to what you can build. Do it again, and again, and again until or unless you don’t want to anymore.

Douglas Fugazi • 1 year ago

Very nice.

Ray | AI marketer - Social Media Assistant • 1 year ago

How do you plan to address data privacy concerns when implementing a RAG system, especially with sensitive company information?

Santiago • 1 year ago

Keeping it in your network (instead of sending your data to a third-party company) is a great start.

Adam David Long • 1 year ago

Thanks so much for this. You may have converted this already -- but would REALLY appreciate your advice on hardware. When I tried running local RAG before, I quickly learned that my Mac Mini M2 was not up to creating the embeddings.

Santiago • 1 year ago

I know you found the information you asked for, but for anyone's benefit, this solution will not run on a personal computer. This solution will run a few models requiring GPU support to process your information. A personal computer won't cut it.

Coffee Nootrop • 1 year ago

Government and military would probably use Azure AI platform since they already have big partnerships with Microsoft.

A • 1 year ago

How does it deal with domain specific knowledge? Outside of the general enterprise conceptual spectrum

AI Times • 1 year ago

Exciting to see advancements in document understanding with an enterprise RAG system installation guide! Secure data processing on-premises is crucial.

gdbsmg • 1 year ago

@Readwise save thread

Related videos

MCP is an absolute game-changer. (Together with DeepSeek, MCP is probably the hottest thing in AI over the last 6 months.) I use Cursor to write code 90% of the time. I built an MCP server to connect the Cursor agent to GroundX, an open-source RAG system, and I'm not going back. This is officially insane! Here is what I did, step by step: First, a little bit of context. I maintain an end-to-end Machine Learning System with several pipelines to process data, train, evaluate, register, deploy, and monitor a model. I've written a lot of documentation explaining how the system works and how to modify and maintain it. There's also the documentation of the few libraries I used to build the system. I'm a massive fan of GroundX, an open-source enterprise-grade RAG system you can run on your servers or deploy to any cloud provider. I've been working with them for a long time. GroundX offers two services. First, the "ingest" service uses a custom, pretrained vision model to ingest and understand your data. I used this to process all the documentation I have for my code. Markdown files, source code, HTML files, and even PDF documents. Everything I've written related to my project went into GroundX. Their second service is "search," which combines text and vector search with a fine-tuned re-ranker model to retrieve information from the data. I needed to connect Cursor with this service, and that's where MCP came in. I built an MCP server with two tools: 1. The first tool would go to GroundX and retrieve the available topics. Splitting the data into topics (or "buckets," as GroundX calls them) allows me to use the same setup to serve documentation from different topics. 2. The second tool would search GroundX under a specific topic for the context related to the supplied query. The magic happens after connecting the MCP server with Cursor. 
Now, I can ask any questions related to my project, and Cursor's AI agent retrieves the list of available topics from the RAG system and then searches it to provide relevant context to the model. I went from getting mediocre, sometimes wrong answers to 100% truthful, complete answers. Here is the crazy part:
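The two tools described above can be sketched as plain Python functions. This is only an illustration under stated assumptions: the in-memory `BUCKETS` dictionary stands in for GroundX, the names `list_topics` and `search_topic` are hypothetical, and the word-overlap ranking is a placeholder for the real search service. Neither the MCP SDK nor the GroundX API is used here; in an actual server, each function would be registered as an MCP tool.

```python
import re

# Hypothetical stand-in for documentation stored in GroundX "buckets".
BUCKETS = {
    "pipelines": [
        "The training pipeline registers the model after evaluation.",
        "The deployment pipeline publishes the registered model to an endpoint.",
    ],
    "monitoring": [
        "The monitoring pipeline compares live traffic against a baseline.",
    ],
}


def _words(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def list_topics():
    """Tool 1: return the names of the available topics (buckets)."""
    return sorted(BUCKETS)


def search_topic(topic, query, top_k=1):
    """Tool 2: return the passages in `topic` sharing the most words with `query`."""
    if topic not in BUCKETS:
        raise ValueError(f"unknown topic: {topic}")
    query_words = _words(query)
    ranked = sorted(
        BUCKETS[topic],
        key=lambda passage: len(query_words & _words(passage)),
        reverse=True,
    )
    return ranked[:top_k]
```

An agent would typically call `list_topics()` first to discover which topics exist, then call `search_topic(topic, query)` with the user's question and pass the returned passages to the model as context.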

Santiago

255,362 views • 1 year ago

Knowledge graphs for representing information are unbeatable. After this, you will never build a RAG system without knowledge graphs. It will take you five lines of code to build a knowledge graph with your data. I recorded a video to show you how you can do this. I used Cognee, an open-source library that outperforms any basic vector search approach in terms of retrieval relevance. They are collaborating with me on this post. Cognee is: • Easy to use • Reduces hallucinations • Open-source Here is a link to the repository: They also offer a comprehensive platform and UI with Python notebooks you can utilize to manage your data. Here is the link:

Santiago

125,928 views • 9 months ago

Building a RAG system that works with real-life documents is crazy hard. Why is nobody talking about this? (I'll show you what a complex document looks like in the attached video. Good luck with the 10-line code demos if you want to deal with this.) All I see online are "how to talk to a PDF" demos that won't take you far. If you think that's all you need, you won't like what happens when you try. In the video below, I'll show you what it takes to build a system that processes huge numbers of documents without losing accuracy. There are a few tricks here that I'm sure will impress you. The magic is happening in three phases: • In the way we process the documents • In the way we chunk the content • In the way we augment and store the chunks This is all built-in in GroundX, EyeLevel.AI's out-of-the-box RAG system. You can use it as a SaaS or install it in your own Kubernetes cluster and run it locally. If you have seen any other RAG system doing what I do in the attached video, please, let me know. I'd love to check them out. Disclaimer: I work with the team at EyeLevel. My advice is to take any document you have lying around, go to their site, create an account, and try them out. Here is their website:

Santiago

75,780 views • 1 year ago

Building with AI gets easier every day. Here is an open-source library that makes integrating AI into an application extremely easy: Star the repository! This library alone can make React the best front-end framework out there! There are a bunch of cool things I like about CopilotKit. Here are 3 of them: 1. It allows you to take any -powered agent and bring it into your application. (This is a brand-new feature!) 2. You can build an AI-powered chatbot in your application. The chatbot will have access to your context and can act on the application. 3. You can build a RAG workflow to process and answer questions from a real-time knowledge base. I recorded a video to show you how simple it is to make some of this happen. A few lines of code, and you are in business. Here is a link to the sample application: CopilotKit is open-source. You can self-host it. You can use it with any LLM. Thanks to the team for showing me their tool and collaborating with me on this post!

Santiago

108,824 views • 2 years ago

Here's how I would learn data engineering basics in 2025:

- Find a data source you care about (examples: gaming APIs, stock market, web scraping, etc.)
- Use Python to interact with and ingest your source. Initially, just write the data to a CSV.
- Set up an account with Snowflake or Google BigQuery.
- Update your Python script to load a table in Snowflake/BigQuery.
- Schedule your script with cron in the cloud with a service like Heroku.
- Build aggregations and visualizations on top of your ingested data.

The only thing this misses is data quality and complex job orchestration, which you can learn later!

How would you learn data engineering nowadays?
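The first two steps (pull records from a source, then write them to a CSV) might look like the sketch below. The `fetch_records` function is a hypothetical placeholder that returns hard-coded rows; in practice it would call whichever API or scraper you picked. Only the standard library is used.

```python
import csv


def fetch_records():
    """Hypothetical placeholder for an API call or a scraper."""
    return [
        {"symbol": "AAA", "price": 10.5, "date": "2025-01-02"},
        {"symbol": "BBB", "price": 20.0, "date": "2025-01-02"},
    ]


def write_csv(records, handle):
    """Write a list of dicts to an open text handle as CSV with a header row.

    Returns the number of data rows written.
    """
    if not records:
        return 0
    writer = csv.DictWriter(handle, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return len(records)


# Usage (writes a file in the current directory):
#   with open("prices.csv", "w", newline="") as f:
#       write_csv(fetch_records(), f)
```

Passing an open handle instead of a file name keeps the function easy to test and makes it simple to swap the CSV file for another destination later.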

Zach Wilson

20,368 views • 11 months ago

Imagine an AI application that can type anywhere you can and use the full context of what's on your screen. This is the application we all deserve (at least if you have macOS). Check out Omnipilot. It's an app that works with every other macOS application and uses Claude 3.5 Sonnet in the background; it also supports Gemini and GPT-4o.

Here is the idea: You can use the tool to ask questions about anything on your screen. Or you can use it to autocomplete the text you are typing. You don't need to copy and paste anymore or waste your time providing context to a model. It sees what you see. It works right where you are. That's pretty cool.

Here are a couple of cool examples:

• Use it to reply to an email
• Use it in the terminal to autocomplete a command
• Use it to finish a document
• Use it to send a message on Slack

AI at the system level is bonkers. You can read a ton more on their Product Hunt launch page: Thanks to the Omnipilot team for collaborating with me on this post!

Santiago

72,306 views • 1 year ago

Traditional data pipelines don't work for RAG applications. There are 3 issues with them:

1. Traditional data engineering solutions are optimized to handle structured data. RAG applications rely primarily on unstructured data.
2. The connector ecosystem to load data from unstructured data sources is very immature.
3. Traditional solutions do not offer any way to transform unstructured data into an optimized vector search index.

The goal of a RAG Pipeline is to solve these problems. The number one objective is to create a reliable vector search index using factual knowledge and relevant context. This sounds easy, but it's one of the biggest challenges we face when building RAG applications.

At a high level, there are four different stages in the architecture of a RAG pipeline:

1. Ingestion: Where the pipeline loads the information from the data source.
2. Extraction: Where the pipeline processes the input data and decides how to retrieve the text contained inside it.
3. Transform: Where the pipeline chunks the data and generates document embeddings.
4. Load: Where the pipeline creates a search index in a vector database and loads the document embeddings.

There are different rabbit holes at each one of these stages. Here are three of them:

1. Ingesting data once is simple. The hard part is refreshing the vector database whenever the original data source changes.
2. Extracting the content of a plain text document is simple. The hard part is to extract content from complex documents containing tables, images, or cross-references.
3. Implementing a continuous chunking strategy with an overlap is simple. The hard part is to find the optimal strategy for your specific knowledge base and the way you are planning to query it.

In the attached video, I'll show you how you can build an enterprise-grade RAG Pipeline that solves every one of the above problems. I'll use Vectorize. They partnered with me on this post. You can use them to build RAG pipelines optimized for accurate context retrieval. If you have a few documents lying around, set up a free account and give it a try.
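The simple chunking strategy with an overlap mentioned for the Transform stage can be sketched as follows. Sizes are measured in words purely for illustration; the function name and default values are assumptions, not anything Vectorize prescribes.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of up to `chunk_size` words.

    Each chunk starts `chunk_size - overlap` words after the previous one,
    so consecutive chunks share `overlap` words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

The overlap exists so that a sentence cut at a chunk boundary still appears whole in at least one chunk. Picking `chunk_size` and `overlap` for a specific knowledge base is the hard part the post refers to, and this sketch does not attempt to solve it.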

Santiago

40,441 views • 1 year ago

Web scraping is a critical skill, and yet nobody talks about it. How do you think companies are training their Large Language Models? Where do you think the data comes from? But web scraping goes beyond all of that. Imagine giving an AI agent access to any public online data in real time! I like to call this "web scraping on demand", and I'm pretty sure it's going to unlock unlimited power for AI applications. I recorded a quick video to show you how you can do this using Apify. I've talked about them before, and they are collaborating with me on this post. They have one of the best open-source web scraping and browser automation libraries out there: But it gets much better than this! You can use MCP to connect your AI Agents and applications to the Apify platform and use any specialized actor on demand to scrape and process online data. In the video, I used Cursor to scrape LinkedIn posts with the words "Machine Learning" in real time. Worked like a charm with no code needed! Here is a link to the platform: Think about this: You can now feed your AI applications with any public data on demand! We aren't ready for what's coming.

Santiago

101,473 views • 11 months ago

How can you solve complex tasks using a Large Language Model? Here is a 2-minute introduction to everything you need to know to 10x the quality of your results. Let's talk about three techniques, in order of complexity, starting with the easiest one: • In-Context Learning • Indexing + In-Context Learning • Fine-tuning In-Context Learning The team that trained GPT-3 found something they couldn't explain: You can condition a model using examples of how you want it to behave. I included an example prompt in the attached video. You can "teach" the model how you want it to interpret questions, select the correct answers, and format the results by giving a few examples. You can also give specific knowledge to the model that will be helpful when formulating answers. We call this approach "grounding the model." There's another example in the video. Indexing + In-Context Learning Unfortunately, there is a limit to how much data you can include in a prompt. We call this the "context size." One version of GPT-4 supports a context of approximately 6,000 words, while the other supports 25,000 words. Although this sounds like a lot, many applications need more than that. Imagine you wrote a book and want to build an application to answer any questions about your story. What happens if your book is longer than the context? That's where Indexing comes in. Using a model, you can turn every book passage into an embedding. These are vectors, numbers that "encode" the passage's text. You can then store these embeddings in a particular database that supports fast retrieval of these vectors. You can then turn any question into an embedding and search the database for the list of passages that are similar to that query. Instead of using the entire book to ask the model, you can now use the relevant passages as in-context information, effectively working around the context size limitation. Fine-tuning Fine-tuning can give you an extra boost to get reliable outputs from your LLM. 
It is, however, the most complex approach on the list. There are different approaches to fine-tuning a model with your data. A popular technique is to process your data with your LLM and use the outputs to train a new classifier that solves your specific task. Notice that here you aren't modifying the LLM. Instead, you are chaining it with your trained classifier. Another approach is to modify the parameters of the LLM using your data. Think of this as "rewiring" the model in a way that solves your particular task. The results and costs will vary depending on how many layers you want to fine-tune from the original model. Many companies think that fine-tuning is the solution to their problems. In my experience, many will benefit from exploring the other two approaches. I love explaining Machine Learning and Artificial Intelligence ideas. If you enjoy in-depth content like this, follow me Santiago so you don't miss what comes next.
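The "Indexing + In-Context Learning" idea above can be sketched in a few lines. This is a toy illustration, not a production setup: `embed` counts words instead of calling a real embedding model, the index is a plain Python list rather than a vector database, and every function name here is made up for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(passages):
    """Store each passage next to its embedding."""
    return [(passage, embed(passage)) for passage in passages]


def retrieve(index, question, top_k=2):
    """Return the `top_k` passages most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:top_k]]


def build_prompt(index, question, top_k=2):
    """Place only the retrieved passages in the prompt, not the whole book."""
    context = "\n".join(f"- {p}" for p in retrieve(index, question, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The prompt returned by `build_prompt` is what you would send to the model: the relevant passages act as in-context information, which is how this approach works around the context size limit.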

Santiago

384,482 views • 3 years ago

This Python script helps you better understand how embeddings work for SEO. Input a sentence + a query and BERT will calculate a content "Similarity Score": Search engines use embedding models to translate your content into numeric values. This is how they're able to mathematically determine whether a page on your site is actually relevant to a query someone is searching for. Once both the content and query are run through embeddings, they can calculate the "cosine similarity," or how similar the two entities are to each other. With this Python script, you can actually visualize how cosine similarity works for yourself. To use it, you'll simply: 1. Save the Python script to a text file 2. Use your Terminal to run the script 3. You'll be prompted for both a "Sentence" and "Query" 4. Once entered, the script will calculate a "Similarity Score" between the text and query. This is how relevant your sentence is for the target keyword. By running this, you'll see that search engines are able to come up with a calculation of content relevance. You'll need to think about the content on your site the same way: which sections have high scores and which ones have low ones? I've linked the Python script in the comments below. No coding knowledge is required and it walks you step by step on how to implement it.
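For reference, cosine similarity between two vectors is the standard formula below. The two three-dimensional vectors are made-up numbers chosen only to keep the arithmetic visible; real embeddings have hundreds of dimensions, and the linked script itself is not reproduced here.

```latex
\[
\cos(\theta) = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
\]
\[
\mathbf{a} = (1, 2, 0),\quad \mathbf{b} = (2, 1, 1):\qquad
\cos(\theta)
= \frac{1\cdot 2 + 2\cdot 1 + 0\cdot 1}{\sqrt{1^2 + 2^2 + 0^2}\,\sqrt{2^2 + 1^2 + 1^2}}
= \frac{4}{\sqrt{5}\,\sqrt{6}} \approx 0.73
\]
```

A value of 1 means the two vectors point in the same direction, and a value of 0 means they are orthogonal, that is, they share no direction at all.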

Chris Long

13,185 views • 1 year ago

Robots will bring billionaire living to a lot more people. I had the blessing to eat with Guy Savoy several times. One of the best chefs in the world. He and other top chefs taught me about the importance of getting fresh ingredients. Here is how robots and World Models will bring that, and what I mean by “everything as a service.” In three years I will have this conversation with my 1X Neo humanoid robot: “Hey Neo, I want to upgrade our food to billionaire level.” “I can do that. Food as a service costs $500 a month. I will buy only hand-grown fresh organic food and I will prepare amazing meals for you and your family.” Where is the supply chain for such food? Farmers’ markets where everything is fresh and organic. You gotta stop buying at grocery stores to upgrade your diet. “Hey Neo, here are the keys to Tesla Robotaxi. And here is my credit card. Start up food as a service.” Neo will take an autonomous car to the market. “But Neo, how do you know where to go?” “Well, a guy on X did a video of the farmers’ market nearby.” “I watched it, and now know roughly the kinds of things I can get there.” We are too late to start today, the market is closed now, but we can start next week. Look at this video the way Grok does. I am playing humanoid today. In one visit my Neo will ingest all of this into its World Model. In the second visit it will get even better. In the third visit even better. World models are going to be real time by the end of next year from a variety of companies. Tesla Robotaxi already serves both our home and the market. Our Tesla drove us there and already knows where it is. Grok is already a world model. In a few minutes it can tell you what it learned by watching this video. It watches all my videos before distributing them to you. So it knows how not to overwhelm @jason’s feed with my prolific posting. It will get a lot better soon.
But after three trips to this farmers’ market my robot will know everything about this market, including the names of the farmers. Watch this video, you meet one. Grok can do a RAG search and learn everything about him, including that he doesn’t have a website and only posts on Facebook. Also that he takes Apple Pay. It already knows everything it sees. The names of the vegetables, fruits, nuts, and what ravioli is. One vendor sells fresh ravioli made early this morning. If you are freaked out by privacy, have your Neo stay in the garage until it is time to do something for you. In three years I will be eating fresh food with my brother-in-law while football is on the TV. If you don’t have a robot you won’t eat as well unless you are a billionaire who can afford to pay a human to shop and cook for you. The Robotaxi network starts up next year (without humans). The world models get good next year. By 2030 every one of you will have a robot in your home, at least part time. Who has the best world model? Tesla. Who understands the real world better? Grok. (I didn’t give this video to anyone else). Who soon will have the best humanoid? Tesla. Which company already has a Robotaxi in my driveway? Tesla. Which company has the best video ingestion engine? Tesla. Which company is about to turn on a real time world model? xAI. Which company would you want to invest in? Tesla and xAI. Which is why, if you are a Tesla investor and you didn’t vote for Tesla to invest in xAI, you are hurting yourself. Everything as a service is about to arrive. Everyone who can afford a $20,000 robot, which can be financed, will have it next year. I will. Anyone worried about privacy has no idea how useful this all will be to make your lives better. And how much money it will make for a robot company to put it all together. And only Tesla has all the pieces to make the meal.

Robert Scoble

1,363,973 views • 7 months ago

You can now fine-tune Llama 3 without writing a single line of code! We are moving at breakneck speed. I recorded a video to show you how to fine-tune any open-source model in a few minutes. I'm using a GPT capable of taking a problem and turning it into a fine-tuned model that will solve it. You don't have to write any code. You only need to explain to a GPT what problem you want to solve and tell it you want to use Llama 3. For example, "fine-tune Llama 3" or "deploy zephyr." It feels like magic. The system will recommend a dataset and fine-tune the model for you. I'm using Monster API, a platform that specializes in making fine-tuning and deploying open-source models easy and fast. Their stack is well-optimized to maximize fine-tuning efficiency using techniques like QLoRA and vLLM. They are behind the GPT. Here is what you need to do: 1. Create an account at 2. Load the GPT with the link below This is as simple as it gets. When you are done, you can click a button to deploy the model and start using it. I have 10,000 free credits for anyone using the code "SANTIAGO" in the dashboard. You can use these credits to access, fine-tune, and deploy these open-source models. You can also keep up with their latest updates, and get free credits and special offers on their Discord server:

Santiago

324,578 views • 2 years ago

Announcing a new Coursera course: Retrieval Augmented Generation (RAG) You'll learn to build high-performance, production-ready RAG systems in this hands-on, in-depth course created and taught by Zain, an experienced AI and ML engineer, researcher, and educator. RAG is a critical component today of many LLM-based applications in customer support, internal company Q&A systems, and even many of the leading chatbots that use web search to answer your questions. This course teaches you in-depth how to make RAG work well. LLMs can produce generic or outdated responses, especially when asked specialized questions not covered in their training data. RAG is the most widely used technique for addressing this. It brings in data from new data sources, such as internal documents or recent news, to give the LLM the relevant context on private, recent, or specialized information. This lets it generate more grounded and accurate responses. In this course, you’ll learn to design and implement every part of a RAG system, from retrievers to vector databases to generation to evals. You’ll learn about the fundamental principles behind RAG and how to optimize it at both the component and whole-system levels. As AI evolves, RAG is evolving too. New models can handle longer context windows, reason more effectively, and can be parts of complex agentic workflows. One exciting growth area is Agentic RAG, in which an AI agent at runtime (rather than it being hardcoded at development time) autonomously decides what data to retrieve, and when/how to go deeper. Even with this evolution, access to high-quality data at runtime is essential, which is why RAG is a key part of so many applications.
You'll learn via hands-on experiences to: - Build a RAG system with retrieval and prompt augmentation - Compare retrieval methods like BM25, semantic search, and Reciprocal Rank Fusion - Chunk, index, and retrieve documents using a Weaviate vector database and a news dataset - Develop a chatbot, using open-source LLMs hosted by Together AI, for a fictional store that answers product and FAQ questions - Use evals to drive improvements in reliability, and incorporate multi-modal data RAG is an important foundational technique. Become good at it through this course! Please sign up here:
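Of the retrieval methods named above, Reciprocal Rank Fusion is the easiest to show in a few lines. It merges several ranked lists by summing 1 / (k + rank) for each document. The sketch below is not taken from the course. The document ids and the two input rankings are made up for illustration, and k = 60 is the commonly used default rather than a value the post specifies.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword (BM25) retriever and a semantic retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
semantic_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking])
print(fused)
```

Because only ranks are used, the two retrievers don't need comparable score scales, which is the main reason this fusion is popular for combining keyword and vector search.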

Andrew Ng

124,458 views • 11 months ago

Most developers can't explain how Single Sign-On (SSO) works. This was one of my favorite questions during technical interviews. I love to ask about it because it's not a trivial topic. Here is a 5-minute overview of how Single Sign-On works. We all hate passwords; the less we use them, the better, and SSO helps with that. When you log in to Google once and visit YouTube, Gmail, Drive, and any other connected service without re-entering your password, three players are working behind the scenes: • A user trying to access an application. You, in this case. • The application you want to access. For example, YouTube. • An Identity Provider (IDP) that will verify your identity. Google, in this case. Here is what happens when you try to access one application for the first time: 1. You try to log in to YouTube, and the application redirects you to the Identity Provider (IDP) for authentication. 2. The IDP (Google) checks your credentials and confirms your identity. It creates a new session for you on its server and sets a session cookie in your browser. 3. The IDP also creates a token for YouTube—a small piece of data that contains information about your identity. 4. Your browser grabs the token and presents it to YouTube. 5. YouTube checks the token, and if it is valid, lets you in. But then you want to access Google Drive: 1. You go to Google Drive, and the application redirects you to the IDP. 2. The IDP recognizes that you are still logged in because you have the session cookie. It doesn't need to ask for your credentials. 3. Instead, the IDP generates a new token for Drive. 4. Your browser grabs the token and presents it to Google Drive. If the token is valid, Drive lets you in. You can now access multiple applications without re-entering your password. This is probably one of the best things we've invented since sliced bread! But, of course, implementing Single Sign-On is a nightmare! If you are a developer, don't try to reinvent the wheel. 
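The two flows above can be sketched as a toy model, purely to make the moving parts concrete. Everything here is an assumption for illustration: the class and function names are hypothetical, passwords are stored in plaintext, and tokens are signed with a per-application shared key via HMAC. Real deployments rely on standards such as OpenID Connect or SAML, so this is not something to ship.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time


class IdentityProvider:
    """Toy IdP: keeps sessions and issues signed, audience-bound tokens."""

    def __init__(self, users, app_keys):
        self._users = users          # username -> password (plaintext only for the demo)
        self._app_keys = app_keys    # app name -> signing key shared with that app
        self._sessions = {}          # session cookie -> username

    def login(self, username, password):
        """Check credentials once and return a session cookie, or None."""
        stored = self._users.get(username)
        if stored is None or not hmac.compare_digest(stored.encode(), password.encode()):
            return None
        cookie = secrets.token_urlsafe(16)
        self._sessions[cookie] = username
        return cookie

    def issue_token(self, cookie, app, ttl=300):
        """Issue a token for `app` if the session cookie is known."""
        username = self._sessions.get(cookie)
        if username is None:
            return None              # no session: the user has to log in first
        claims = {"sub": username, "aud": app, "exp": time.time() + ttl}
        payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
        sig = hmac.new(self._app_keys[app], payload, hashlib.sha256).hexdigest()
        return payload.decode() + "." + sig


def verify_token(token, app, key):
    """What the application does: check signature, audience, and expiry."""
    payload, _, sig = token.partition(".")
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if claims["aud"] != app or claims["exp"] < time.time():
        return None
    return claims["sub"]


keys = {"youtube": b"yt-demo-key", "drive": b"drive-demo-key"}
idp = IdentityProvider({"alice": "s3cret"}, keys)

cookie = idp.login("alice", "s3cret")            # first app: credentials checked once
yt_token = idp.issue_token(cookie, "youtube")
drive_token = idp.issue_token(cookie, "drive")   # second app: the cookie is enough

print(verify_token(yt_token, "youtube", keys["youtube"]))
print(verify_token(drive_token, "drive", keys["drive"]))
```

Binding each token to one application means a token issued for YouTube is rejected by Drive, which is why the IdP generates a fresh token in the second flow instead of reusing the first one.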
I've been implementing SSO since dinosaurs were around, and I can tell you that you want to check out Auth0. Auth0 makes implementing SSO 100x easier. They just updated their free plan, and you get a lot without having to pay a single cent. 25,000 monthly active users, unlimited social connections, and you can go to production with custom domains. FOR FREE! They are sponsoring this post. To save your time, keep your sanity, and have a really solid and secure solution, head over to their website:

Santiago

204,826 views • 1 year ago

Tune Studio is an end-to-end platform for developing applications using Large Language Models. So far, I haven't seen any other platform like this one. You can do everything here: 1. You can curate your data. 2. Use the playground to play with different models and try your ideas. 3. Fine-tune an open-source model on your data. 4. Deploy the model when you are done. This is awesome for anyone building generative AI applications. You can use Tune Studio to work with any of the open-source models out there. They were among the first companies to host Llama 2 and Llama 3. Here is a link to check it out: One of their main selling points is that Tune Studio scales! You don't have to worry about serving your model to lots of users. They also have built-in user management, authentication, on-prem support, user context management, and pretty much everything you need to build generative AI applications. Thanks to the Tune team for collaborating with me on this post. We are living through the best years of development tools for AI developers. The field is unstoppable.

Santiago

39,101 views • 2 years ago

Our new short course, Knowledge Graphs for RAG, is now available! Knowledge graphs are a data structure that is great at capturing complex relationships between data of multiple types. By enabling more sophisticated retrieval of text than similarity search alone, knowledge graphs can improve the context you pass to the LLM and the performance of your RAG applications. In this course, taught by Andreas Kollegger of Neo4j, you’ll - Explore how knowledge graphs work by building a graph of public financial documents from scratch - Learn to write queries that retrieve text and data from the graph and use it to enhance the context you pass to an LLM chatbot - Combine a knowledge graph with a question-answer chain to build better RAG-powered chat systems Sign up here!
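To make the idea of graph-based retrieval a little more concrete, here is a minimal sketch that is not taken from the course. The course works with Neo4j. This example instead uses a tiny in-memory graph, and the class, method names, companies, and facts are all hypothetical. It shows how following relationships outward from an entity can gather context that a similarity search on the question alone might miss.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny in-memory graph of directed (subject, relation, object) triples."""

    def __init__(self):
        self._edges = defaultdict(list)   # node -> [(relation, neighbor), ...]

    def add(self, subject, relation, obj):
        self._edges[subject].append((relation, obj))

    def context_for(self, start, max_hops=2):
        """Collect facts reachable from `start` within `max_hops` edges, breadth-first."""
        facts, frontier, seen = [], [start], {start}
        for _ in range(max_hops):
            next_frontier = []
            for node in frontier:
                for relation, neighbor in self._edges.get(node, []):
                    facts.append(f"{node} {relation} {neighbor}")
                    if neighbor not in seen:
                        seen.add(neighbor)
                        next_frontier.append(neighbor)
            frontier = next_frontier
        return facts


# Made-up facts in the spirit of a graph built from financial filings.
kg = KnowledgeGraph()
kg.add("Acme Corp", "FILED", "Form 10-K 2023")
kg.add("Form 10-K 2023", "MENTIONS", "supply chain risk")
kg.add("Globex Fund", "OWNS_STOCK_IN", "Acme Corp")

facts = kg.context_for("Acme Corp")
prompt = "Context:\n" + "\n".join(facts) + "\n\nQuestion: What risks does Acme Corp report?"
print(prompt)
```

The second fact never mentions the company by name, yet it ends up in the prompt because the graph links it to the filing. That multi-hop link is the kind of relationship the post says similarity search alone can miss.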

Andrew Ng

244,257 views • 2 years ago

108 workflow templates you can use to build AI applications without writing any code. You can use these templates with n8n. I recorded the attached video to show you how it works. n8n is the workhorse behind an open-source, self-hosted AI starter kit you can install on your computer. They are sponsoring this post. Here is the link to the starter kit repository: And here is the spreadsheet with the 108 templates: Whatever idea you have, search for something similar in the list of templates, and you'll save a ton of time. Lately, I've talked to many non-coders who want to start using AI more seriously to build things. n8n is perfect for that.

Santiago

78,133 views • 1 year ago

99% of AI applications are cool-looking demos. Impressive, but don't get fooled by the hype. It takes a lot to build enterprise-grade products that deliver real value. I have at least three weekly conversations with companies that want to use a Large Language Model with their data. The demand is huge! Here is one idea about what you can do to help. The use cases that most of these companies want to solve are similar: They have an extensive knowledge base and want to build a simple application that uses that information to answer questions. In other words, they need help building Retrieval Augmented Generation (RAG) applications they can use in many different scenarios: 1. To train new employees 2. To help their support team 3. To search old meetings and documents 4. To help with their research However, building these systems is not straightforward. Yes, there's a lot of information online, but there aren't enough people who know how to create solutions that work. Here is the idea: Today, you can build an enterprise-grade RAG application without writing code. A couple of MIT PhDs with 10+ years of experience building AI applications created . It's a no-code platform for building applications using Large Language Models. They are partnering with me on this post. You can use Stack AI to create, test, and deploy an end-to-end production-ready AI system. It's SOC-2, HIPAA, and GDPR compliant and offers SSO, role management, access control, and on-premise deployments. Of course, you can use the platform with any LLM on the market now. It's the whole nine yards for building AI applications. Check them out here: 2023 was about models. 2024 is about the tools using these models to build production-ready applications. That's where I'd start.

Santiago

197,675 views • 2 years ago

How to Create Music That Matches Your Video's Sound with Suno Studio I made a quick video showing how I use Suno Studio to generate background music that fits the audio and mood of a video, similar to the example in the quoted post. This is the method I use at the moment, but if you have a better approach, feel free to share it. Before starting, export the original audio from your video. I usually do this in CapCut. After generating a track you like in Suno Studio, you can import it back into CapCut and add it to your video. *This is not a paid partnership.

Kōda

32,490 views • 10 days ago