Mathematics offers a unique window into AI's reasoning capabilities. Discover why we've launched FrontierMath—a benchmark of hundreds of unpublished, expert-level math problems—to understand the frontier of artificial intelligence.

Epoch AI

47,066 subscribers

394,050 views • 1 year ago • via X (Twitter)

Science, Technology, Education

10 comments

Epoch AI • 1 year ago

Learn more about FrontierMath, explore sample problems with detailed solutions, and see how current AI systems perform on FrontierMath:

Daniel Paleka • 1 year ago

How many problems are there in the dataset? I guess about 290?

Tilman Bayer • 1 year ago

SOTA among math benchmarks... I see what you did there 😉

Alfred Wahlforss • 1 year ago

Go Elliot!

Fred Zhang • 1 year ago

are problem statements formalized?

Charlie Snell • 1 year ago

Banger

Luis Garicano 🇪🇺🇺🇦 • 1 year ago

You did not try o1-preview?

João Augusto • 1 year ago

That is awesome guys!!

YJxAI – e/acc • 1 year ago

If humans can take weeks to achieve the result, shouldn't o1 be given the same amount of time to think, given that its performance improves with more test-time compute?

Jason Rute @ JMM 2025 • 1 year ago

What is the process by which researchers can submit models to be evaluated on this benchmark? Or are you only interested in leading foundation models? (But even then, you have to consider prompting strategies, no?)

Related videos

This week, OpenAI launched the o3 and o4-mini reasoning models, pushing the boundaries of logic, math, and coding capabilities in AI. Learn more about o3's strengths, and its potential to transform unstructured enterprise data, from Box AI's Sidharth Srinivasan.

Box

615,609 views • 1 year ago

"It’s not AI; it’s AC; Artificial Cleverness!" Sir Roger Penrose explains why current "Artificial Intelligence" is missing the most critical ingredient: consciousness. By delving into the limits of mathematical computability, he argues that while computers are masters of "computational mathematics," true intelligence requires a conscious understanding that exists beyond any algorithm. At the fundamental level, what we call AI is just high-speed calculation. To Penrose, intelligence and consciousness are inseparable; and until a machine can "understand," it’s simply being clever.

Cosmos Archive

82,942 views • 3 months ago

Does AI have a plausibility problem? Or rather, do we have a problem with AI's plausibility? Watch Terry Tao's full Oxford Mathematics Public Lecture (supported by XTX Markets) as he suggests new ways of doing mathematics in an artificial intelligence future.

Oxford Mathematics

31,780 views • 1 year ago

Bret Baier tests out the many creative capabilities of artificial intelligence, including creating a rap song

Fox News

84,839 views • 11 months ago

A month ago, we invited 30 of the world’s top mathematicians to Berkeley for a weekend to finish a very hard math exam. The 2025 FrontierMath Symposium wrapped up the hardest tier of FrontierMath, our benchmark for AI’s math abilities. The mathematicians tested AI models on their most challenging math problems, and discussed AI’s future in math. Watch our closing ceremony footage for a glimpse behind the scenes.
-Timestamps-
(00:00) - Intro by Elliot Glazer - Epoch AI
(03:38) - Topology | Sergei Gukov - Caltech
(06:53) - Algebraic Geometry | Ravi Vakil - Stanford
(09:10) - Number Theory | Ken Ono - University of Virginia
(13:19) - Combinatorics | Igor Pak - UCLA
(16:54) - Analysis | Paata Ivanisvili - UC Irvine
(18:18) - Closing Remarks

Epoch AI

21,379 views • 1 year ago

Here's why the future of artificial intelligence may be less frightening than we've been led to believe, despite the uncertainty.

Dinesh D'Souza

73,126 views • 1 year ago

📁 Yann LeCun, an expert in artificial intelligence and a pioneer of deep learning, explains that current systems do not understand the physical world and lack real memory or reasoning. He adds that, despite the vast resources spent on training and refining them, they remain far behind the learning ability of a four-year-old child.

Jon Hernandez

24,677 views • 8 months ago

Terence Tao says AI could help rework mathematics from the ground up, doing math problems at a scale never done before and leading to a new age of discovery

Tsarathustra

63,759 views • 1 year ago

“AI can not crack bitcoin”, says @ElonMusk “We see no way for artificial intelligence to subvert the fundamentals of mathematics and hash bitcoin easily. AI can not defy fundamental math. It can improve the efficiency of hashing, but not crack it”

Documenting ₿itcoin 📄

308,066 views • 1 year ago

I launched my High-Level Advisory Body on Artificial Intelligence. This is the starting point for a global conversation on the governance of AI, so that its benefits to all of humanity are maximized, and the risks contained & diminished.

António Guterres

345,925 views • 2 years ago

We’re pushing the boundaries of intelligence even further with Gemini 3 Deep Think. 🧠 This mode meaningfully improves reasoning capabilities by exploring many hypotheses simultaneously to solve problems. Here’s how it coded a simulated dominoes game from a single prompt ⬇️

Google DeepMind

247,871 views • 7 months ago

🇯🇵 MASAYOSHI SON: AGI IS SAME LEVEL AS A HUMAN BRAIN “AGI definition is same level as a human brain. That's AGI. Artificial general intelligence. But people have a different point of view definition of artificial super intelligence.” Source: Visionary Minds

Mario Nawfal

62,169 views • 1 year ago

#WATCH | Maharashtra: In a bid to improve the nutrition level of tribal children of Gadchiroli, a unique Artificial Intelligence-based machine has been installed at Todsa Ashram School of Etapalli. The machine takes a photo of the student with her/his plate of food and within a few seconds, without any human intervention, identifies whether the quality of the food is good.

ANI

1,494,674 views • 3 years ago

A beautiful proof of the area of a circle 🚀😍 #math #mathematics #mathematician #circle #science #geometry #mathvideos

Matematickcom

26,917 views • 2 years ago

Visual Proof of (a+b)² = a² + b² + 2ab #math #mathematics #mathskills

Infinite Logiz

960,162 views • 1 month ago

Pope Leo XIV calls for artificial intelligence to be immediately disarmed and compares it to a nuclear missile. He says artificial intelligence is turning into an instrument of death that seeks to dominate the lives of all humans. “An instrument of domination.”

Shadow of Ezra

130,922 views • 1 month ago

"For it to NOT be a bubble... requires that the AI companies make hundreds of billions of dollars a year, in the next 24 months. That would be the fastest growing business in history." —Derek Thompson on why "there's no off-ramp" for the impact of artificial intelligence

Pablo Torre Finds Out

73,775 views • 4 months ago

Owners of some iPhones are in line to get cash payments of up to $95 from Apple after the company on Tuesday reached a $250 million settlement in a class-action lawsuit for false advertising of its artificial intelligence capabilities.

The Associated Press

135,714 views • 1 month ago