Why aren't more people talking about how difficult it is to turn documents into structured data? This is literally a problem that every single company I talk to is trying to solve. They have a buttload of documents with forms and tables, and they want to turn them into...
10 comments

There are two awesome things here:

First, you can use @tensorlake's Document Ingestion API to process all your files. They tell me they have 98%-99% accuracy processing insurance and bank documents, which are usually a nightmare.

Second (and this is what I love the most), you can turn those tables and forms into structured data. You start with the image of a form and end with a JSON file containing the information you wanted to extract.

You should definitely try this out:

1. Go to
2. Sign up
3. Try your documents in the playground

No credit card required for any of this, and you have plenty of credits to try.

Companies sit on goldmines of PDFs, scans, and forms, but can’t extract value without serious effort. Tools like @tensorlake are game-changers for operational intelligence.

My guess is that most people in and around AI seem to have very little experience working at or with (big) companies. As a result, they know very little about the business reality, the main challenges, and how things actually work.

Yeah, this is accurate.

Not even close. I'm curious, why do you think this is solved by calling a model?

I work with clients on large volumes of documents in high-stakes fields (healthcare, real estate, finance, etc.), and where AWS Textract wins is its explicit provision of confidence scores, which helps ensure only high-reliability data is used in downstream processes. Will check out tensorlake too for low-stakes documents if the pricing makes sense.
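
For anyone wondering what gating on confidence scores can look like, here is a minimal sketch. It assumes blocks shaped like the ones Textract returns (each with `BlockType`, `Text`, and a `Confidence` between 0 and 100). The sample data, the function name `split_by_confidence`, and the 90.0 threshold are all made up for illustration, not anyone's production setup.

```python
from typing import Any


def split_by_confidence(
    blocks: list[dict[str, Any]], threshold: float = 90.0
) -> tuple[list[str], list[str]]:
    """Split LINE blocks into trusted text and text that needs human review."""
    trusted: list[str] = []
    review: list[str] = []
    for block in blocks:
        # Only text lines are considered; other block types are skipped.
        if block.get("BlockType") != "LINE":
            continue
        # A missing score is treated as 0.0, so it always goes to review.
        if block.get("Confidence", 0.0) >= threshold:
            trusted.append(block["Text"])
        else:
            review.append(block["Text"])
    return trusted, review


# Hypothetical sample data, hand-written to mimic the block shape.
sample_blocks = [
    {"BlockType": "LINE", "Text": "Policy number: 12345", "Confidence": 99.2},
    {"BlockType": "LINE", "Text": "Insured: J. Doe", "Confidence": 71.5},
    {"BlockType": "PAGE", "Confidence": 99.9},
]

trusted, review = split_by_confidence(sample_blocks)
```

Only the lines in `trusted` would flow into downstream processes; everything in `review` would go to a person first. The right threshold depends on the document type and how costly a wrong value is.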

do you do much besides shilling stuff? seriously, all i see is you recommending a tool that's gonna change the whole game and obsolete the tool you shilled last week

I'd suggest you stop following me if my posts bother you too much.

This problem is more or less solved now with a simple API call to any of the reasoning models.

So true. This is such a huge challenge for so many teams. Thanks for sharing a solution that works.

