Run evals—directly from the OpenAI dashboard. Use your test data to compare model performance, iterate on prompts, and improve outputs. Here's a quick walkthrough:
85,906 views • 1 year ago • via X (Twitter)
10 Comments

just a chill guy k-lling thousands of startups with a single post

This is pretty cool! Most people miss the evaluation steps. But it's actually crucial.

Can you let us export the eval to a .csv?

this is a healthy post.

eval api wen?

🧪 What Are OpenAI Evals? OpenAI Evals is a tool to test and measure how well AI models perform specific tasks. Think of it like a report card for AI—researchers and developers use it to ensure the model gives accurate and helpful answers before it’s used in the real world. 📊🤖

I think now we will need a way to run these on CI 😋

This is the same as it has been for two months, right? Took me a bit to wrap my head around it, but now I want to eval EVERYTHING! Great feature.

The factuality passing grades seem to be mixed up somewhere in the pipeline: it gives a result (A) matching what I selected, but then it says fail. I found that if I check 'Response disagrees with the ground truth' (and only that), then it says success.

This is a great feature.

