Why aren't more people talking about how difficult it is to turn documents into structured data? This is literally a problem that every single company I talk to is trying to solve. They have a buttload of documents with forms and tables, and they want to turn them into...

46,405 views • 1 year ago • via X (Twitter)

10 comments

Santiago • 1 year ago

There are two awesome things here: First, you can use @tensorlake's Document Ingestion API to process all your files. They tell me they have a 98% - 99% accuracy processing insurance and bank documents, which are usually a nightmare. Second (and this is what I love the most), you can turn those tables and forms into structured data. You start with the image of a form and end with a JSON file containing the information you wanted to extract. You should definitely try this out: 1. Go to 2. Sign up 3. Try your documents in the playground No credit card required for any of this, and you have plenty of credits to try.

ibexdream • 1 year ago

Companies sit on goldmines of PDFs, scans, and forms, but can’t extract value without serious effort. Tools like @tensorlake are game-changers for operational intelligence.

OJ • 1 year ago

My guesstimation is because most people in/around AI seem to have very little experience, in general, working at/with (big) companies. Therefore they know very little about the biz reality and the main challenges/how things actually work.

Santiago • 1 year ago

Yeah, this is accurate.

Santiago • 1 year ago

Not even close. I'm curious, why do you think this is solved by calling a model?

Piyush • 1 year ago

I work with clients on large volumes of documents in high-stakes fields (healthcare, real estate, finance, etc.), and where AWS Textract wins is its explicit provision of confidence scores, which helps ensure only high-reliability data is used in downstream processes. Will check out tensorlake too for low-stakes documents if the pricing makes sense.
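
For anyone wondering what that confidence gating looks like in practice, here is a minimal sketch. It is not any vendor's API: the `ExtractedField` type, the `split_by_confidence` helper, the 0.0 to 1.0 scale, and the 0.9 threshold are all assumptions made up for illustration. Rescale the scores if your extraction service reports them differently.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    """Hypothetical shape of one extracted form field."""
    name: str
    value: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def split_by_confidence(fields, threshold=0.9):
    """Split fields into those safe to use downstream and those needing review."""
    accepted = [f for f in fields if f.confidence >= threshold]
    needs_review = [f for f in fields if f.confidence < threshold]
    return accepted, needs_review


# Made-up sample data, purely for illustration.
fields = [
    ExtractedField("policy_number", "A-1042", 0.99),
    ExtractedField("claim_amount", "1,250.00", 0.72),
]

accepted, needs_review = split_by_confidence(fields)
print("use downstream:", [f.name for f in accepted])
print("send to human review:", [f.name for f in needs_review])
```

The point of the split is that low-confidence fields are not dropped; they are routed to a person, so only high-reliability values flow into downstream processes automatically.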

λthugg-huh? • 1 year ago

do you do much besides shilling stuff? seriously all i see is you recommending a tool thats gonna change the whole game and obsolete the tool you shilled last week

Santiago • 1 year ago

I'd suggest you stop following me if my posts bother you too much.

Neel Das • 1 year ago

This problem is more or less solved now with a simple API call to any of the reasoning models

Michael | Æ • 1 year ago

So true. This is such a huge challenge for so many teams. Thanks for sharing a solution that works.

Related videos

Here is how you can install an open-source, enterprise-grade RAG system on your server (with the best document understanding I've seen.)

First, something obvious to anyone trying to sell RAG in the market: You are crazy if you think companies will let their data travel to a hosted model. No one wants to send their data anywhere (those who do haven't found an alternative.) Every single company would rather have an air-gapped system with no internet access.

GroundX is an open-source RAG system that you can run on your servers (or any cloud provider, as long as you have access to GPUs) and works without a network. (If the military wants to do RAG, this is precisely what they will be looking for.)

I installed GroundX on my AWS account and recorded a video to show you how to use it. There are two services you can use:

1. Ingest: This service uses a pretrained vision model to ingest and understand your knowledge base.
2. Search: This service combines text and vector search with a fine-tuned re-ranker model to retrieve information from your knowledge base.

A quick note about the Ingest service: 99% of people think they need better "retrieval" mechanisms. I think they need better "ingestion." That's where this service comes in! Ingest "understands" your documents in a way I haven't seen before. After you try it, you'll realize why showing your LLM your raw documents is a bad idea.

In the video, I use a free tool called X-Ray to test a document and understand how the Ingest service breaks it down. You can access this tool by signing up for a free GroundX cloud account and uploading your documents. You'll see a bit more about this in the video.

Santiago

89,624 views • 1 year ago