Why aren't more people talking about how difficult it is to turn documents into structured data? This is literally a problem that every single company I talk to is trying to solve. They have a buttload of documents with forms and tables, and they want to turn them into...

46,405 views • 1 year ago • via X (Twitter)

10 comments

Santiago • 1 year ago

There are two awesome things here:

First, you can use @tensorlake's Document Ingestion API to process all your files. They tell me they have 98% to 99% accuracy processing insurance and bank documents, which are usually a nightmare.

Second (and this is what I love the most), you can turn those tables and forms into structured data. You start with the image of a form and end with a JSON file containing the information you wanted to extract.

You should definitely try this out:

1. Go to
2. Sign up
3. Try your documents in the playground

No credit card required for any of this, and you have plenty of credits to try.

ibexdream • 1 year ago

Companies sit on goldmines of PDFs, scans, and forms, but can’t extract value without serious effort. Tools like @tensorlake are game-changers for operational intelligence.

OJ • 1 year ago

My guesstimation is because most people in/around AI seem to have very little experience, in general, working at/with (big) companies. Therefore they know very little about the biz reality and the main challenges/how things actually work.

Santiago • 1 year ago

Yeah, this is accurate.

Santiago • 1 year ago

Not even close. I'm curious, why do you think this is solved by calling a model?

Piyush • 1 year ago

I work with clients on large volumes of documents in high-stakes fields (healthcare, real estate, finance, etc.), and where AWS Textract wins is its explicit provision of confidence scores, which helps ensure only high-reliability data is used in downstream processes. Will check out tensorlake too for low-stakes documents if the pricing makes sense.

λthugg-huh? • 1 year ago

Do you do much besides shilling stuff? Seriously, all I see is you recommending a tool that's gonna change the whole game and obsolete the tool you shilled last week.

Santiago • 1 year ago

I'd suggest you stop following me if my posts bother you too much.

Neel Das • 1 year ago

This problem is more or less solved now with a simple API call to any of the reasoning models

Michael | Æ • 1 year ago

So true. This is such a huge challenge for so many teams. Thanks for sharing a solution that works.

Related videos

Here is how you can install an open-source, enterprise-grade RAG system on your server (with the best document understanding I've seen).

First, something obvious to anyone trying to sell RAG in the market: You are crazy if you think companies will let their data travel to a hosted model. No one wants to send their data anywhere (those who do haven't found an alternative). Every single company would rather have an air-gapped system with no internet access.

GroundX is an open-source RAG system that you can run on your servers (or any cloud provider, as long as you have access to GPUs) and works without a network. (If the military wants to do RAG, this is precisely what they will be looking for.)

I installed GroundX on my AWS account and recorded a video to show you how to use it. There are two services you can use:

1. Ingest: This service uses a pretrained vision model to ingest and understand your knowledge base.
2. Search: This service combines text and vector search with a fine-tuned re-ranker model to retrieve information from your knowledge base.

A quick note about the Ingest service: 99% of people think they need better "retrieval" mechanisms. I think they need better "ingestion." That's where this service comes in! Ingest "understands" your documents in a way I haven't seen before. After you try it, you'll realize why showing your LLM your raw documents is a bad idea.

In the video, I use a free tool called X-Ray to test a document and understand how the Ingest service breaks it down. You can access this tool by signing up for a free GroundX cloud account and uploading your documents. You'll see a bit more about this in the video.

Santiago

89,624 views • 1 year ago