Mathematics offers a unique window into AI's reasoning capabilities. Discover why we've launched FrontierMath—a benchmark of hundreds of unpublished, expert-level math problems—to understand the frontier of artificial intelligence.

394,050 views • 1 year ago • via X (Twitter)

10 comments

Epoch AI • 1 year ago

Learn more about FrontierMath, explore sample problems with detailed solutions, and see how current AI systems perform on FrontierMath:

Daniel Paleka • 1 year ago

How many problems are there in the dataset? I guess about 290?

Tilman Bayer • 1 year ago

SOTA among math benchmarks... I see what you did there 😉

Alfred Wahlforss • 1 year ago

Go Elliot!

Fred Zhang • 1 year ago

are problem statements formalized?

Charlie Snell • 1 year ago

Banger

Luis Garicano 🇪🇺🇺🇦 • 1 year ago

You did not try o1-preview?

João Augusto • 1 year ago

That is awesome guys!!

YJxAI – e/acc • 1 year ago

If humans can take weeks to achieve the result, shouldn't o1 be given the same amount of time to think, given that its performance improves with more test-time compute?

Jason Rute @ JMM 2025 • 1 year ago

What is the process by which researchers can submit models to be evaluated on this benchmark? Or are you only interested in leading foundation models? (But even then, you have to consider prompting strategies, no?)
