Here is how you can install an open-source, enterprise-grade RAG system on your server (with the best document understanding I've seen.) First, something obvious to anyone trying to sell RAG in the market: You are crazy if you think companies will let their data travel to a hosted model....

89,624 views • 1 year ago • via X (Twitter)

Comments: 11

Santiago • 1 year ago

This is a game-changer for anyone who wants world-class RAG performance with top-notch security. Here is GroundX's on-prem website: Sign up for a cloud account to start testing your documents for free. Then download the open-source repository and follow the instructions on this GitHub repository: Disclaimer: I've been working with the team behind GroundX for a year+ now. I believe they have built one of the best RAG ecosystems in the world.

Connor Groce • 1 year ago

I'm a multi-brand franchise entrepreneur. I’m also a franchise consultant who helps people looking to get into business ownership. Here are my 7 keys to success in franchising (bookmark this): 1. Go all in or partner with someone who will. Franchises are not mutual funds. Do not let anyone convince you that this is a passive investment. 2. “Plant your Flag” before growing. Grow intentionally. Don’t let FOMO cloud your judgement. Opportunity is infinite and will present itself when you’re ready to claim it. 3. Focus on people, both when choosing franchises and operating them. A business is only as strong as the team behind it. The team is only as strong as their leader empowers them to be. 4. Never confuse passion with opportunity. Strategically pick businesses based on the opportunity they present and how they align with your skillset. Then use all the money you make to chase your passions as hobbies. 5. “Flatten The Curve” when it comes to learning. Milk every last ounce of value out of the support you get from the franchisor and the network of other franchisees. You’re paying for it either way. 6. Replace yourself. Hire great people and build great systems to the point that you can remove yourself from the business. But have so much fun that you don’t want to. 7. Repeat. My favorite part about being a franchise entrepreneur is the scalability. There is no ceiling to what you can build. Do it again, and again, and again until or unless you don’t want to anymore.

Douglas Fugazi • 1 year ago

Very nice.

Ray | AI marketer - Social Media Assistant • 1 year ago

How do you plan to address data privacy concerns when implementing a RAG system, especially with sensitive company information?

Santiago • 1 year ago

Keeping it in your network (instead of sending your data to a third-party company) is a great start.

Adam David Long • 1 year ago

Thanks so much for this. You may have covered this already -- but would REALLY appreciate your advice on hardware. When I tried running local RAG before, I quickly learned that my Mac Mini M2 was not up to creating the embeddings.

Santiago • 1 year ago

I know you found the information you asked for, but for anyone's benefit, this solution will not run on a personal computer. This solution will run a few models requiring GPU support to process your information. A personal computer won't cut it.

Coffee Nootrop • 1 year ago

Government and military would probably use Azure AI platform since they already have big partnerships with Microsoft.

A • 1 year ago

How does it deal with domain-specific knowledge outside of the general enterprise conceptual spectrum?

AI Times • 1 year ago

Exciting to see advancements in document understanding with an enterprise RAG system installation guide! Secure data processing on-premises is crucial.

gdbsmg • 1 year ago

@Readwise save thread

Related videos

MCP is an absolute game-changer. (Together with DeepSeek, MCP is probably the hottest thing in AI over the last 6 months.) I use Cursor to write code 90% of the time. I built an MCP server to connect the Cursor agent to GroundX, an open-source RAG system, and I'm not going back. This is officially insane! Here is what I did, step by step: First, a little bit of context. I maintain an end-to-end Machine Learning System with several pipelines to process data, train, evaluate, register, deploy, and monitor a model. I've written a lot of documentation explaining how the system works and how to modify and maintain it. There's also the documentation of the few libraries I used to build the system. I'm a massive fan of GroundX, an open-source enterprise-grade RAG system you can run on your servers or deploy to any cloud provider. I've been working with them for a long time. GroundX offers two services. First, the "ingest" service uses a custom, pretrained vision model to ingest and understand your data. I used this to process all the documentation I have for my code. Markdown files, source code, HTML files, and even PDF documents. Everything I've written related to my project went into GroundX. Their second service is "search," which combines text and vector search with a fine-tuned re-ranker model to retrieve information from the data. I needed to connect Cursor with this service, and that's where MCP came in. I built an MCP server with two tools: 1. The first tool would go to GroundX and retrieve the available topics. Splitting the data into topics (or "buckets," as GroundX calls them) allows me to use the same setup to serve documentation from different topics. 2. The second tool would search GroundX under a specific topic for the context related to the supplied query. The magic happens after connecting the MCP server with Cursor. 
Now, I can ask any questions related to my project, and Cursor's AI agent retrieves the list of available topics from the RAG system and then searches it to provide relevant context to the model. I went from getting mediocre, sometimes wrong answers to 100% truthful, complete answers. Here is the crazy part:
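The two-tool design described in this post can be sketched in plain Python. This is only an illustration under stated assumptions: `InMemoryRag`, `list_topics`, and `search_topic` are hypothetical names, the keyword-overlap scoring is a toy stand-in for a real search service, and nothing here is the GroundX API or the MCP SDK. In a real MCP server, the two functions would be registered as tools so the agent can call them.

```python
import re
from dataclasses import dataclass, field


def _tokens(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class InMemoryRag:
    """Hypothetical stand-in for a RAG backend (NOT the GroundX API)."""
    buckets: dict[str, list[str]] = field(default_factory=dict)

    def topics(self) -> list[str]:
        return sorted(self.buckets)

    def search(self, topic: str, query: str, limit: int = 3) -> list[str]:
        # Toy relevance: count shared words between query and passage.
        terms = _tokens(query)
        scored = []
        for passage in self.buckets.get(topic, []):
            overlap = len(terms & _tokens(passage))
            if overlap:
                scored.append((overlap, passage))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [passage for _, passage in scored[:limit]]


# Made-up documentation split into topics ("buckets").
RAG = InMemoryRag({
    "ml-system": [
        "The training pipeline registers the model after evaluation.",
        "The monitoring pipeline checks for data drift.",
    ],
    "libraries": [
        "The client library retries failed requests three times.",
    ],
})


def list_topics() -> list[str]:
    """Tool 1: return the topics the agent may search."""
    return RAG.topics()


def search_topic(topic: str, query: str) -> list[str]:
    """Tool 2: return context for a query within one topic."""
    if topic not in RAG.buckets:
        raise ValueError(f"unknown topic: {topic}")
    return RAG.search(topic, query)
```

An agent would typically call `list_topics()` first, pick the topic that matches the question, and then pass the result of `search_topic(...)` to the model as context.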

Santiago

255,362 views • 1 year ago

Traditional data pipelines don't work for RAG applications. There are 3 issues with them: 1. Traditional data engineering solutions are optimized to handle structured data. RAG applications rely primarily on unstructured data. 2. The connector ecosystem to load data from unstructured data sources is very immature. 3. Traditional solutions do not offer any way to transform unstructured data into an optimized vector search index. The goal of a RAG Pipeline is to solve these problems. The number one objective is to create a reliable vector search index using factual knowledge and relevant context. This sounds easy, but it's one of the biggest challenges we face when building RAG applications. At a high level, there are four different stages in the architecture of a RAG pipeline: 1. Ingestion: Here is where the pipeline loads the information from the data source. 2. Extraction: Where the pipeline processes the input data and decides how to retrieve the text contained inside it. 3. Transform: Where the pipeline chunks the data and generates document embeddings. 4. Load: Where the pipeline creates a search index in a vector database and loads the document embeddings. There are different rabbit holes at each one of these stages. Here are three of them: 1. Ingesting data once is simple. The hard part is refreshing the vector database whenever the original data source changes. 2. Extracting the content of a plain text document is simple. The hard part is to extract content from complex documents containing tables, images, or cross-references. 3. A continuous chunking strategy with an overlap is simple to implement. The hard part is to find the optimal strategy for your specific knowledge base and the way you are planning to query it. In the attached video, I'll show you how you can build an enterprise-grade RAG Pipeline that solves every one of the above problems. I'll use Vectorize. They partnered with me on this post.
You can use them to build RAG pipelines optimized for accurate context retrieval. If you have a few documents lying around, set up a free account and give it a try.
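The four stages listed in this post (ingestion, extraction, transform, load) can be sketched as a toy pipeline. This is a minimal illustration, not Vectorize's implementation: every function name is hypothetical, the "embedding" is a plain word count standing in for a real embedding model, and the "vector database" is a Python list.

```python
import re
from collections import Counter


def chunk_words(text: str, size: int, overlap: int) -> list[str]:
    """Split text into word windows of `size` that share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def ingest(source: dict[str, str]) -> list[tuple[str, str]]:
    """Stage 1: load raw documents (here, from an in-memory dict)."""
    return list(source.items())


def extract(raw: str) -> str:
    """Stage 2: pull plain text out of the raw input (strips HTML-like tags)."""
    without_tags = re.sub(r"<[^>]+>", " ", raw)
    return " ".join(without_tags.split())


def transform(doc_id: str, text: str, size: int, overlap: int) -> list[dict]:
    """Stage 3: chunk the text and attach a toy word-count 'embedding'."""
    return [
        {"doc": doc_id, "text": chunk, "vector": Counter(chunk.lower().split())}
        for chunk in chunk_words(text, size, overlap)
    ]


def load(index: list[dict], records: list[dict]) -> None:
    """Stage 4: append records to the index (a list stands in for a vector DB)."""
    index.extend(records)


def run_pipeline(source: dict[str, str], size: int = 50, overlap: int = 10) -> list[dict]:
    index: list[dict] = []
    for doc_id, raw in ingest(source):
        load(index, transform(doc_id, extract(raw), size, overlap))
    return index
```

The chunk size and overlap are exposed as parameters because, as the post notes, the right values depend on the knowledge base and on how it will be queried.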

Santiago

40,441 views • 1 year ago

How can you solve complex tasks using a Large Language Model? Here is a 2-minute introduction to everything you need to know to 10x the quality of your results. Let's talk about three techniques, in order of complexity, starting with the easiest one: • In-Context Learning • Indexing + In-Context Learning • Fine-tuning In-Context Learning The team that trained GPT-3 found something they couldn't explain: You can condition a model using examples of how you want it to behave. I included an example prompt in the attached video. You can "teach" the model how you want it to interpret questions, select the correct answers, and format the results by giving a few examples. You can also give specific knowledge to the model that will be helpful when formulating answers. We call this approach "grounding the model." There's another example in the video. Indexing + In-Context Learning Unfortunately, there is a limit to how much data you can include in a prompt. We call this the "context size." One version of GPT-4 supports a context of approximately 6,000 words, while the other supports 25,000 words. Although this sounds like a lot, many applications need more than that. Imagine you wrote a book and want to build an application to answer any questions about your story. What happens if your book is longer than the context? That's where Indexing comes in. Using a model, you can turn every book passage into an embedding. These are vectors, numbers that "encode" the passage's text. You can then store these embeddings in a particular database that supports fast retrieval of these vectors. You can then turn any question into an embedding and search the database for the list of passages that are similar to that query. Instead of using the entire book to ask the model, you can now use the relevant passages as in-context information, effectively working around the context size limitation. Fine-tuning Fine-tuning can give you an extra boost to get reliable outputs from your LLM. 
It is, however, the most complex approach on the list. There are different approaches to fine-tuning a model with your data. A popular technique is to process your data with your LLM and use the outputs to train a new classifier that solves your specific task. Notice that here you aren't modifying the LLM. Instead, you are chaining it with your trained classifier. Another approach is to modify the parameters of the LLM using your data. Think of this as "rewiring" the model in a way that solves your particular task. The results and costs will vary depending on how many layers you want to fine-tune from the original model. Many companies think that fine-tuning is the solution to their problems. In my experience, many will benefit from exploring the other two approaches. I love explaining Machine Learning and Artificial Intelligence ideas. If you enjoy in-depth content like this, follow me Santiago so you don't miss what comes next.
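The "Indexing + In-Context Learning" idea from this post can be illustrated with a small sketch. It assumes a toy setup: word-count vectors stand in for embeddings produced by a real model, a Python list stands in for the vector database, and the passages and function names are made up for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (a real system would call an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_passages(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the question."""
    query_vec = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(query_vec, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str], k: int = 2) -> str:
    """Place only the retrieved passages in the prompt as in-context information."""
    context = "\n".join(f"- {p}" for p in top_passages(question, passages, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


BOOK = [
    "Mara hid the brass key under the lighthouse stairs.",
    "The storm lasted three days and flooded the harbor.",
    "Old Tomas taught Mara how to repair fishing nets.",
]
```

Calling `build_prompt("Where did Mara hide the key?", BOOK, k=1)` yields a prompt that contains only the lighthouse passage, which is how retrieval works around the context size limit.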

Santiago

384,482 views • 3 years ago

Robots will bring billionaire living to a lot more people. I had the blessing to eat with Guy Savoy several times. One of the best chefs in the world. He, and other top chefs taught me about the importance of getting fresh ingredients. Here is how robots and World Models will bring that and what do I mean by “everything as a service?” In three years I will have this conversation with my 1X Neo humanoid robot: “Hey Neo I want to upgrade our food to billionaire level.” “I can do that. Food as a service costs $500 a month. I will buy only hand grown fresh organic food and I will prepare amazing meals for you and your family.” Where is the supply chain for such food? Farmers’ markets where everything is fresh and organic. You gotta stop buying at grocery stores to upgrade your diet. “Hey Neo here are the keys to Tesla Robotaxi. And here is my credit card. Start up food as a service.” Neo will take an autonomous car to the market. “But Neo how do you know where to go?” “Well a guy on X did a video of the farmer’s market nearby.” “I watched it, and now know roughly the kinds of things I can get there.” We are too late to start today, the market is closed now, but we can start next week. Look at this video the way Grok does. I am playing humanoid today. In one visit my Neo will ingest all of this into its World Model. In the second visit it will get even better. In the third visit even better. World models are going to be real time by the end of next year from a variety of companies. فيصل Tesla Robotaxi already serves both our home and the market. Our Tesla drove us there and already knows where it is. Grok is already a world model. In a few minutes it can tell you what it learned by watching this video. It watches all my videos before distributing them to you. So it knows how not to overwhelm @jason’s feed with my prolific posting. It will get a lot better soon. 
But after three trips to this farmer’s market my robot will know everything about this market including the names of the farmers. Watch this video, you meet one. Grok can do a RAG search and learn everything about him, including that he doesn’t have a Website, and only posts on Facebook. Also that he takes Apple Pay. It already knows everything it sees. The names of the vegetables, fruits, nuts, and what is ravioli. One vendor sells fresh ravioli made early this morning. If you are freaked out by privacy have your Neo stay in the garage until it is time to do something for you. In three years I will be eating fresh food with my brother-in-law while football is on the TV. If you don’t have a robot you won’t eat as well unless you are a billionaire who can afford to pay the human to shop and cook for you. The Robotaxi network starts up next year (without humans). The world models get good next year. By 2030 every one of you will have a robot in your home, at least part time. Who has the best world model? Tesla. Who understands the real world better? Grok. (I didn’t give this video to anyone else). Who soon will have the best humanoid? Tesla. Which company already has a Robotaxi in my driveway? Tesla. Which company has the best video ingestion engine? Tesla. Which company is about to turn on a real time world model? xAI. Which company would you want to invest in? Tesla and xAI. Which is why, if you are a Tesla investor and you didn’t vote for Tesla to invest in xAI you are hurting yourself. Everything as a service is about to arrive. Everyone who can afford a $20,000 robot, which can be financed, will have it next year. I will. Anyone worried about privacy has no idea how useful this all will be to make your lives better. And how much money it will make for a robot company to put it all together. And only Tesla has all the pieces to make the meal.

Robert Scoble

1,363,973 views • 7 months ago

Announcing a new Coursera course: Retrieval Augmented Generation (RAG) You'll learn to build high performance, production-ready RAG systems in this hands-on, in-depth course created by and taught by Zain, experienced AI and ML engineer, researcher, and educator. RAG is a critical component today of many LLM-based applications in customer support, internal company Q&A systems, even many of the leading chatbots that use web search to answer your questions. This course teaches you in-depth how to make RAG work well. LLMs can produce generic or outdated responses, especially when asked specialized questions not covered in its training data. RAG is the most widely used technique for addressing this. It brings in data from new data sources, such as internal documents or recent news, to give the LLM the relevant context to private, recent, or specialized information. This lets it generate more grounded and accurate responses. In this course, you’ll learn to design and implement every part of a RAG system, from retrievers to vector databases to generation to evals. You’ll learn about the fundamental principles behind RAG and how to optimize it at both the component and whole-system levels. As AI evolves, RAG is evolving too. New models can handle longer context windows, reason more effectively, and can be parts of complex agentic workflows. One exciting growth area is Agentic RAG, in which an AI agent at runtime (rather than it being hardcoded at development time) autonomously decides what data to retrieve, and when/how to go deeper. Even with this evolution, access to high-quality data at runtime is essential, which is why RAG is a key part of so many applications. 
You'll learn via hands-on experiences to: - Build a RAG system with retrieval and prompt augmentation - Compare retrieval methods like BM25, semantic search, and Reciprocal Rank Fusion - Chunk, index, and retrieve documents using a Weaviate vector database and a news dataset - Develop a chatbot, using open-source LLMs hosted by Together AI, for a fictional store that answers product and FAQ questions - Use evals to drive improving reliability, and incorporate multi-modal data RAG is an important foundational technique. Become good at it through this course! Please sign up here:
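Of the retrieval methods named in this announcement, Reciprocal Rank Fusion is compact enough to sketch. The version below is a generic illustration, not material from the course: each document scores the sum of 1 / (k + rank) over every ranking it appears in, with k = 60 as a commonly used default. The document IDs and the function name are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    A document's fused score is the sum of 1 / (k + rank) over all lists
    that contain it, where rank starts at 1 for the top result.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (BM25-style) and a semantic retriever.
keyword_results = ["a", "b", "c"]
semantic_results = ["b", "c", "a"]
fused = reciprocal_rank_fusion([keyword_results, semantic_results])
```

Because only ranks are used, the method can combine retrievers whose raw scores are on different scales.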

Andrew Ng

124,458 views • 11 months ago

Most developers can't explain how Single Sign-On (SSO) works. This was one of my favorite questions during technical interviews. I love to ask about it because it's not a trivial topic. Here is a 5-minute overview of how Single Sign-On works. We all hate passwords; the less we use them, the better, and SSO helps with that. When you log in to Google once and visit YouTube, Gmail, Drive, and any other connected service without re-entering your password, three players are working behind the scenes: • A user trying to access an application. You, in this case. • The application you want to access. For example, YouTube. • An Identity Provider (IDP) that will verify your identity. Google, in this case. Here is what happens when you try to access one application for the first time: 1. You try to log in to YouTube, and the application redirects you to the Identity Provider (IDP) for authentication. 2. The IDP (Google) checks your credentials and confirms your identity. It creates a new session for you on its server and sets a session cookie in your browser. 3. The IDP also creates a token for YouTube, a small piece of data that contains information about your identity. 4. Your browser grabs the token and presents it to YouTube. 5. YouTube checks the token, and if it is valid, lets you in. But then you want to access Google Drive: 1. You go to Google Drive, and the application redirects you to the IDP. 2. The IDP recognizes that you are still logged in because you have the session cookie. It doesn't need to ask for your credentials. 3. Instead, the IDP generates a new token for Drive. 4. Your browser grabs the token and presents it to Google Drive. If the token is valid, Drive lets you in. You can now access multiple applications without re-entering your password. This is probably one of the best things we've invented since sliced bread! But, of course, implementing Single Sign-On is a nightmare!
If you are a developer, don't try to reinvent the wheel. I've been implementing SSO since dinosaurs were around, and I can tell you you want to check out Auth0. Auth0 makes implementing SSO 100x easier. They just updated their free plan, and you get a lot without having to pay a single cent. 25,000 monthly active users, unlimited social connections, and you can go to production with custom domains. FOR FREE! They are sponsoring this post. To save your time, keep your sanity, and have a really solid and secure solution, head over to their website:
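The login flow described in this post can be simulated in a few lines. The sketch below is a deliberate simplification and not how Auth0 or Google implement SSO: the class and function names are hypothetical, each application shares an HMAC key with the identity provider (real deployments typically use standards such as SAML or OpenID Connect with asymmetric signatures and token expiry), and passwords are compared in plain text only to keep the example short.

```python
import base64
import hashlib
import hmac
import json
import secrets


class IdentityProvider:
    """Toy IDP: holds user sessions and signs one token per application."""

    def __init__(self, users: dict[str, str]):
        self._users = users                      # username -> password (toy only)
        self._sessions: dict[str, str] = {}      # session cookie -> username
        self._app_keys: dict[str, bytes] = {}    # application -> signing key

    def register_app(self, app: str) -> bytes:
        """Give an application the key it will use to verify tokens."""
        key = secrets.token_bytes(32)
        self._app_keys[app] = key
        return key

    def login(self, username: str, password: str) -> str:
        """Check credentials once and return a session cookie."""
        if self._users.get(username) != password:
            raise PermissionError("invalid credentials")
        cookie = secrets.token_urlsafe(16)
        self._sessions[cookie] = username
        return cookie

    def issue_token(self, cookie: str, app: str) -> str:
        """Issue a signed token for `app` using only the session cookie."""
        username = self._sessions.get(cookie)
        if username is None:
            raise PermissionError("no active session; credentials required")
        payload = json.dumps({"sub": username, "aud": app}).encode()
        signature = hmac.new(self._app_keys[app], payload, hashlib.sha256).digest()
        return (base64.urlsafe_b64encode(payload).decode()
                + "." + base64.urlsafe_b64encode(signature).decode())


def verify_token(token: str, app: str, key: bytes) -> str:
    """Application side: check signature and audience, return the username."""
    payload_b64, signature_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    signature = base64.urlsafe_b64decode(signature_b64)
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise PermissionError("bad signature")
    claims = json.loads(payload)
    if claims["aud"] != app:
        raise PermissionError("token was issued for another application")
    return claims["sub"]
```

After one call to `login`, the same cookie is enough to obtain tokens for several registered applications, which mirrors the second half of the flow where Drive lets the user in without a password prompt.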

Santiago

204,826 views • 1 year ago

99% of AI applications are cool-looking demos. Impressive, but don't get fooled by the hype. It takes a lot to build enterprise-grade products that deliver real value. I have at least three weekly conversations with companies that want to use a Large Language Model with their data. The demand is huge! Here is one idea about what you can do to help. The use cases that most of these companies want to solve are similar: They have an extensive knowledge base and want to build a simple application that uses that information to answer questions. In other words, they need help building Retrieval Augmented Generation (RAG) applications they can use in many different scenarios: 1. To train new employees 2. To help their support team 3. To search old meetings and documents 4. To help with their research However, building these systems is not straightforward. Yes, there's a lot of information online, but there aren't enough people who know how to create solutions that work. Here is the idea: Today, you can build an enterprise-grade RAG application without writing code. A couple of MIT PhDs with 10+ years of experience building AI applications created . It's a no-code platform for building applications using Large Language Models. They are partnering with me on this post. You can use Stack AI to create, test, and deploy an end-to-end production-ready AI system. It's SOC-2, HIPAA, and GDPR compliant and offers SSO, role management, access control, and on-premise deployments. Of course, you can use the platform with any LLM on the market now. It's the whole nine yards for building AI applications. Check them out here: 2023 was about models. 2024 is about the tools using these models to build production-ready applications. That's where I'd start.

Santiago

197,675 views • 2 years ago