we are excited to launch an experimental API focused on data extraction today at Induced.
- send a URL and natural language query
- get structured data back
no custom scraping scripts required. supports csv, json and markdown, with more to come. free to use. examples below 👇

87,513 views • 2 years ago • via X (Twitter)

11 Comments

aryan sharma • 2 years ago

1/ our extraction API docs are live on [link]. you can receive your API key on [link]. every extraction request includes a URL and natural language query. you can optionally pass column names, output format and count of rows to be captured.
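A request like the one described in 1/ might be assembled roughly as follows. This is only a sketch: the helper `build_extraction_request` and the field names (`url`, `query`, `columns`, `format`, `count`) are assumptions made for illustration, not the documented schema, so the real names should be taken from the API docs.

```python
import json
from typing import Optional, Sequence

# Output formats named in the launch post.
SUPPORTED_FORMATS = {"csv", "json", "markdown"}


def build_extraction_request(
    url: str,
    query: str,
    columns: Optional[Sequence[str]] = None,
    output_format: Optional[str] = None,
    count: Optional[int] = None,
) -> dict:
    """Build a request body. All field names here are hypothetical."""
    if not url or not query:
        raise ValueError("url and query are both required")
    body = {"url": url, "query": query}
    # The three optional parameters are only included when provided.
    if columns:
        body["columns"] = list(columns)
    if output_format is not None:
        if output_format not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {output_format}")
        body["format"] = output_format
    if count is not None:
        if count < 1:
            raise ValueError("count must be at least 1")
        body["count"] = count
    return body


if __name__ == "__main__":
    example = build_extraction_request(
        "https://www.producthunt.com",
        "all products with their name, maker and upvotes",
        columns=["name", "maker", "upvotes"],
        output_format="json",
        count=10,
    )
    print(json.dumps(example, indent=2))
```

Keeping the optional fields out of the body unless they are set mirrors the "optionally pass" wording in the post; whether the real API treats omitted fields this way is not stated.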

aryan sharma • 2 years ago

2/ once a request is sent, the API returns an ID for your extraction job. you can use the ID to poll status and get structured data output back when the job is completed. example: extracting all products on @producthunt with their name, maker and upvotes.
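The submit-then-poll flow in 2/ could look something like this on the client side. It is a sketch under stated assumptions: the status values (`"completed"`, `"failed"`) and the shape of the job record are hypothetical, and `fetch_status` is a caller-supplied function standing in for whatever HTTP call the real API requires.

```python
import time
from typing import Callable


def wait_for_extraction(
    job_id: str,
    fetch_status: Callable[[str], dict],
    interval_s: float = 5.0,
    timeout_s: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll a job until it finishes, fails, or the timeout passes.

    fetch_status(job_id) must return a dict with a "status" key.
    The status strings below are assumed, not taken from the docs.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch_status(job_id)
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"extraction job {job_id} failed")
        if clock() >= deadline:
            raise TimeoutError(f"extraction job {job_id} still {status!r}")
        # Wait before asking again so the API is not hammered.
        sleep(interval_s)
```

Injecting `sleep` and `clock` is a design choice for testability: the loop can be exercised with a fake status function and no real waiting.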

aryan sharma • 2 years ago

3/ this API is great for extracting structured data from unstructured web pages.
- extract trending repositories from github.
- extract most active stocks on google finance.
- extract top 5 videos from youtube trending.
40-60 seconds per task on average.

aryan sharma • 2 years ago

4/ we don't handle pagination or authenticated pages yet - but we'll be releasing a more configurable version soon. browser agents are super powerful for data extraction tasks and we want to help more devs use them. please share feedback! discord:

Saurabh Kumar • 2 years ago

Really nice work. But how do you do data validation, meaning validating that it actually got the data you requested? Here there's a "name" field with "trending repos list". What if it fetched something like "popular repos" instead of trending, despite trending being available (but on a separate route or behind a click event)? Data extraction has to be deterministic, because the one thing you have to be absolutely sure about is the data. N runs of the same script shouldn't return N different scraping outputs, as they can with stochastic embeddings.
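The repeatability concern in this comment can at least be measured from the caller's side by running the same extraction several times and comparing the results. The sketch below is not how Induced validates output (the thread does not say); `fingerprint` and `runs_agree` are hypothetical helpers, and each run is assumed to be a list of row dicts.

```python
import hashlib
import json


def fingerprint(rows: list) -> str:
    """Hash one run's rows in a canonical form.

    Keys are sorted so {"a": 1, "b": 2} and {"b": 2, "a": 1} hash alike.
    Row order is kept, so reordered rows count as a different result.
    """
    canonical = json.dumps(
        rows, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def runs_agree(runs: list) -> bool:
    """Return True when every run produced identical rows."""
    return len({fingerprint(rows) for rows in runs}) <= 1
```

Agreement across runs only shows the output is stable; it does not show that the stable output is the right data (for example "popular" versus "trending"), which still needs a check against the page itself.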

Alessio Fanelli • 2 years ago

@inducedai @AlexReibman

Musthaq • 2 years ago

@inducedai I can see the video is clipped to cut out the time it takes to process the request. Assuming you are launching a headless browser, capturing a screenshot, parsing the HTML, and making an AI request to query it, how much time does it usually take to complete this request?

aryan sharma • 2 years ago

@inducedai 30-60s on avg, sometimes more depending on the data. but we run this as an async process so you can poll for completion status instead of waiting.

Harsh Agrawal | itsharshag.com • 2 years ago

@inducedai can we do this with PDFs?

calix • 2 years ago

@inducedai love this

aryan sharma • 2 years ago

@inducedai thanks calix!
