Run evals—directly from the OpenAI dashboard. Use your test data to compare model performance, iterate on prompts, and improve outputs. Here's a quick walkthrough:

85,906 views • 1 year ago • via X (Twitter)

10 Comments

alex fazio • 1 year ago

just a chill guy k-lling thousands of startups with a single post

Markus Odenthal • 1 year ago

This is pretty cool! Most people miss the evaluation steps. But it's actually crucial.

Quintus 🏛️ • 1 year ago

Can you let us export the eval to a .csv?

Prompt Perfect • 1 year ago

this is a healthy post.

william • 1 year ago

eval api wen?

Diego | AI 🚀 - e/acc • 1 year ago

🧪 What Are OpenAI Evals? OpenAI Evals is a tool to test and measure how well AI models perform specific tasks. Think of it like a report card for AI—researchers and developers use it to ensure the model gives accurate and helpful answers before it’s used in the real world. 📊🤖

TestingCatalog News 🗞 • 1 year ago

I think now we will need a way to run these on CI 😋

Jim Hull • 1 year ago

This is the same as it has been for two months, right? Took me a bit to wrap my head around it, but now I want to eval EVERYTHING! Great feature.

Chase Brower • 1 year ago

The factuality passing grades seem to be mixed up somewhere in the pipeline: it gives a result (A) matching what I selected, but then it says fail. I found that if I check 'Response disagrees with the ground truth' (and only that), then it says success.

Soumyajit • 1 year ago

This is a great feature.
