Leonard Tang

@leonardtang_ • 3,654 subscribers

co-founder & ceo @haizelabs

Shorts

First came pre-training scaling; then came inference-time scaling. Now comes judge-time scaling. Despite progress in AI through scaled inference-time compute, AI remains unreliable in open-ended, non-verifiable domains. The key limitation is not generation—it is evaluation. Therefore, the next big leap for AI comes from better judging. In service of this future, today we release Verdict, a library for scaling judge-time compute.

111,291 views

Videos

How do we understand and evaluate the fuzzy space of LLM outputs? We clone your Subject Matter Expert annotator into a Judge. Introducing EVALS EVALS EVALS: create a custom Judge that works for you.

35,871 views • 1 year ago
