we are excited to launch an experimental API focused on data extraction today at Induced.
- send a URL and natural language query
- get structured data back
no custom scraping scripts required. supports csv, json and markdown, with more to come. free to use. examples below 👇

87,513 views • 2 years ago • via X (Twitter)

11 comments

aryan sharma • 2 years ago

1/ our extraction API docs are live on [link] and you can receive your API key on [link]. every extraction request includes a URL and natural language query. you can optionally pass column names, output format and count of rows to be captured.
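A rough sketch of what such a request body might look like. The key names (`url`, `query`, `columns`, `format`, `rows`) and the helper `build_extraction_request` are assumptions for illustration only, not the documented schema; the real field names are in the API docs.

```python
from typing import Any, Dict, List, Optional


def build_extraction_request(
    url: str,
    query: str,
    columns: Optional[List[str]] = None,
    output_format: Optional[str] = None,
    row_count: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a request body. All key names below are hypothetical."""
    if not url or not query:
        raise ValueError("url and query are both required")

    # Required fields: the page to read and what to pull out of it.
    body: Dict[str, Any] = {"url": url, "query": query}

    # Optional fields are only included when the caller sets them.
    if columns is not None:
        body["columns"] = list(columns)
    if output_format is not None:
        body["format"] = output_format  # e.g. "csv", "json" or "markdown"
    if row_count is not None:
        if row_count < 1:
            raise ValueError("row_count must be at least 1")
        body["rows"] = row_count
    return body
```

For the Product Hunt example, a call could be `build_extraction_request("https://www.producthunt.com", "all products with their name, maker and upvotes", columns=["name", "maker", "upvotes"], output_format="json")`.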

aryan sharma • 2 years ago

2/ once a request is sent, the API returns an ID for your extraction job. you can use the ID to poll status and get structured data output back when the job is completed. example: extracting all products on @producthunt with their name, maker and upvotes.
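The submit-then-poll flow could be sketched like this. The status values (`"completed"`, `"failed"`) and the shape of the status response are assumptions, and `fetch_status` is a hypothetical callable you would implement with your HTTP client against the real endpoint.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], Dict[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a job until it finishes, fails, or the timeout passes.

    fetch_status is a caller-supplied function that looks up the job by ID.
    The "status" key and its values are assumed, not taken from the docs.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status(job_id)
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"extraction job {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"extraction job {job_id} did not finish in time")
        # Wait before asking again so the API is not hammered.
        sleep(interval_s)
```

Passing `fetch_status` and `sleep` in as arguments keeps the loop independent of any particular HTTP library and makes it easy to exercise without a network.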

aryan sharma • 2 years ago

3/ this API is great for extracting structured data from unstructured web pages.
- extract trending repositories from github.
- extract most active stocks on google finance.
- extract top 5 videos from youtube trending.
40-60 seconds per task on average.

aryan sharma • 2 years ago

4/ we don't handle pagination or authenticated pages yet - but we'll be releasing a more configurable version soon. browser agents are super powerful for data extraction tasks and we want to help more devs use them. please share feedback! discord:

Saurabh Kumar • 2 years ago

Really nice work. But how do you do data validation, meaning, validating that it actually got the data you requested? I mean, here there's a "name" field with "trending repos list"; what if it fetched something like "popular repos" instead of trending, despite trending being available (but on a separate route or behind a click event)? Data extraction has to be deterministic, because the only thing you have to be absolutely sure about is the data. N runs of the same script shouldn't return N different scraping outputs, as they can with stochastic embedding.
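One way a caller might probe the repeatability concern above is to run the same extraction several times and measure how often the outputs agree. This is only a sketch: it assumes each run yields a list of JSON-serializable row dicts, the helper names are hypothetical, and it detects disagreement between runs rather than proving the extracted data is correct.

```python
import hashlib
import json
from collections import Counter
from typing import Any, Dict, List

Rows = List[Dict[str, Any]]


def fingerprint(rows: Rows) -> str:
    """Hash a run's rows so that row order and key order do not matter."""
    canonical_rows = sorted(json.dumps(row, sort_keys=True) for row in rows)
    payload = json.dumps(canonical_rows).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def agreement(runs: List[Rows]) -> float:
    """Fraction of runs whose output matches the most common output."""
    if not runs:
        raise ValueError("need at least one run")
    counts = Counter(fingerprint(rows) for rows in runs)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(runs)
```

An agreement of 1.0 means every run returned the same rows; anything lower flags a query worth inspecting by hand.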

Alessio Fanelli • 2 years ago

@inducedai @AlexReibman

Musthaq • 2 years ago

@inducedai I can see the video is actually clipped from the time it takes to process the request. Assuming you are launching a headless browser, capturing a screenshot, parsing the HTML, AI request to query it, how much time does it usually take to complete this request?

aryan sharma • 2 years ago

@inducedai 30-60s on avg, sometimes more depending on the data. but we run this as an async process so you can poll for completion status instead of waiting.

Harsh Agrawal | itsharshag.com • 2 years ago

@inducedai can we do this with PDFs?

calix • 2 years ago

@inducedai love this

aryan sharma • 2 years ago

@inducedai thanks calix!
