Building a RAG system that works with real-life documents is crazy hard. Why is nobody talking about this? (I'll show you what a complex document looks like in the attached video. Good luck with the 10-line code demos if you want to deal with this.) All I see online...
10 comments

have you seen looks similar and promising?

I haven't seen it, no. Thanks for sharing it!

We agree. That's why we build @trustgraphAI. TrustGraph is a full, model-agnostic AI engine with native GraphRAG. No need to build anything. It deploys with Docker or Kubernetes.

I thought that building RAG systems was easy until I tried to build one. If you want it to work well, it is really hard.

Yeah, building a demo is very simple. You can understand the basic components in a few minutes. But doing anything serious with it is really complex.

Really amazing! How can one find the documentation to install it in your own Kubernetes cluster and run it locally? I've only been able to find the SaaS option...

The problem is that in most enterprise use cases, where the data is highly regulated, they do not approve open-source frameworks.

Actually, I've seen two different approaches: 1. Enterprises that don't approve closed systems running outside their premises. They would rather use open source. 2. Enterprises that don't approve open-source software that doesn't come with certain guarantees or is too complex to set up.

Or just pass your PDFs to the Gemini Files API. It is working very well for me with 15 documents, with 10 to 165 pages each, all in a single prompt. Gemini finds and extracts multiple small details and builds and returns a compliant JSON response.

If this is working for you, that's awesome. But in many cases, relying on the output of the model is not enough to build a reliable retriever. Notice that here, the process is not just about "understanding the document," but augmenting it with contextual information as well.
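
To make "augmenting it with contextual information" a bit more concrete, here is a minimal sketch of one possible approach, not necessarily what the author does: each chunk is prefixed with the document title and section it came from before it gets indexed. The `Chunk` fields, `build_chunks`, `contextualize`, and the sample document are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_title: str  # hypothetical metadata: title of the source document
    section: str    # hypothetical metadata: heading the text appeared under
    text: str       # raw text extracted from the document


def build_chunks(doc_title: str, sections: dict[str, str]) -> list[Chunk]:
    """Split each section into paragraph-level chunks, keeping track of their origin."""
    chunks = []
    for section, body in sections.items():
        for paragraph in body.split("\n\n"):
            paragraph = paragraph.strip()
            if paragraph:
                chunks.append(Chunk(doc_title, section, paragraph))
    return chunks


def contextualize(chunk: Chunk) -> str:
    """Return the text to index: the raw chunk prefixed with its document and section."""
    return f"Document: {chunk.doc_title}\nSection: {chunk.section}\n\n{chunk.text}"


# Made-up sample document, for illustration only.
sections = {
    "Termination": (
        "Either party may end it with 30 days notice.\n\n"
        "Fees already paid are not refunded."
    ),
}
chunks = build_chunks("Service Agreement", sections)
indexed_texts = [contextualize(c) for c in chunks]
```

On its own, a chunk like "Either party may end it with 30 days notice." does not say what "it" is. With the prefix, the indexed text carries the document and section, which is the kind of context a retriever can match against.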
