The best programmers don't write code anymore. They write a spec. A test. An eval. Then they turn the AI on and walk away for hours. It runs. It ships. They review. Evals are the unlock most devs are sleeping on. Here's how top agent teams build them right.

690,326 views • 2 months ago • via X (Twitter)

Related videos

I don't think most PMs realize the PRD is becoming obsolete.

For the last decade, the PM's core artifact was a qualitative spec. Clear requirements, user stories, acceptance criteria. The engineering team interpreted it, built something close, and the PM spent two weeks reconciling what shipped with what they wrote.

The best AI companies replaced that entire loop with evals. A set of inputs your product needs to handle. A task that generates outputs. A scoring function that produces a number between 0 and 1. No ambiguity. No interpretation gap.

Ankur Goyal built the eval platform behind Vercel, Replit, Ramp, Notion, and Airtable. An $800M company. He walked through building an eval from zero on this episode and the score went from 0 to 0.75 in under 20 minutes. That's a PM shipping a measurable quality bar before a single line of product code exists.

Here's the part that changes the PM role permanently. When the product passes the eval and users still hate it, the eval is wrong. That's on the PM. Evals make PM judgment quantifiable in a way PRDs never did. You can't hide behind "the spec was ambiguous." There's a number now.

Six months ago, PM interviews asked "how do you use AI in your workflow." The next wave of interviews is going to ask you to write an eval. The PMs who can encode user intent as a scoring function are building the one skill that survives every model change, every framework swap, every agent rewrite.

Write the eval.

Aakash Gupta

78,139 views • 2 months ago
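
The three parts of an eval named in the post above (a set of inputs, a task that generates outputs, and a scoring function that returns a number between 0 and 1) can be sketched in a few lines of Python. This is a minimal illustration, not the API of any particular eval platform: `Case`, `exact_match`, `run_eval`, `toy_task`, and the sample inputs are all hypothetical names and data invented for the example, and `toy_task` is a keyword rule standing in for a real model or product call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Case:
    """One eval example: an input the product must handle and the wanted output."""
    input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Scoring function: 1.0 if the normalized strings match, otherwise 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    cases: Sequence[Case],
    task: Callable[[str], str],
    scorer: Callable[[str, str], float],
) -> float:
    """Run the task on every input and return the mean score, between 0 and 1."""
    if not cases:
        raise ValueError("an eval needs at least one case")
    scores = [scorer(task(case.input), case.expected) for case in cases]
    return sum(scores) / len(scores)


def toy_task(text: str) -> str:
    """Hypothetical stand-in for the product: a naive keyword classifier."""
    return "refund" if "money back" in text.lower() else "other"


# Hypothetical inputs; the last one is phrased in a way the naive rule misses.
CASES = [
    Case("I want my money back", "refund"),
    Case("Please give me my MONEY BACK now", "refund"),
    Case("How do I reset my password?", "other"),
    Case("I'd like a refund for my order", "refund"),
]

score = run_eval(CASES, toy_task, exact_match)
print(f"eval score: {score:.2f}")  # prints "eval score: 0.75"
```

The failing fourth case is the useful part: it turns "handle refund requests" into a number that a better task implementation can raise. If the score reached 1.0 and users were still unhappy, the fix would belong in `CASES` or the scorer, which is the point the post makes about the eval being wrong.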