When Ilya Sutskever once explained why next-word prediction leads to intelligence, he used a metaphor: if you can piece together the clues and deduce the criminal’s name on the last page, you have a real understanding of the story. 🕵️‍♂️ Inspired by that idea, we turned to Ace Attorney...

Hao AI Lab

6,280 subscribers

999,231 views • 1 year ago • via X (Twitter)

Gaming • Science & Technology • Education

12 comments

Hao AI Lab • 1 year ago

Phoenix Wright: Ace Attorney is a popular visual novel known for its complex storytelling and courtroom drama. Like a detective novel, it challenges players to connect clues and evidence to expose contradictions and reveal the true culprit. In our setup, models are tested on the intense cross-examination stage. They must spot contradictions and present the correct evidence to challenge witness testimony. Each level grants 5 lives, so only a limited number of mistakes is tolerated.
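The life budget described above can be illustrated with a minimal sketch. It is not the lab's actual harness: `LevelState` and `play_level` are hypothetical names, and the level is simplified to a single contradiction with one correct piece of evidence.

```python
from dataclasses import dataclass


@dataclass
class LevelState:
    # From the post: each level grants 5 lives.
    lives: int = 5


def play_level(guesses, correct_evidence):
    """Replay a sequence of evidence guesses against one contradiction.

    Hypothetical simplification: a real level has several testimonies and
    contradictions. Returns (solved, lives_left).
    """
    state = LevelState()
    for guess in guesses:
        if guess == correct_evidence:
            return True, state.lives
        state.lives -= 1  # a wrong presentation costs one life
        if state.lives == 0:
            return False, 0  # out of lives: the level is failed
    return False, state.lives  # ran out of guesses before solving


solved, lives_left = play_level(["badge", "autopsy report"], "autopsy report")
print(solved, lives_left)  # True 4
```

One wrong guess followed by the right one clears the level with 4 lives left; five wrong guesses in a row end the attempt.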

Hao AI Lab • 1 year ago

🔍 Interesting Findings: We tested 4 top multimodal AI models: O1, Gemini 2.5 Pro, Claude 3.7-thinking, and Llama-4 Maverick.
1. O1 and Gemini 2.5 Pro performed the best, both reaching Level 4 🏅. While neither managed to crack Level 4, O1 had a slight edge over Gemini 2.5 Pro in tackling the toughest cases.
2. GPT-4.1 showed similar performance to Claude 3.5. Despite reported gains over GPT-4o, in this task it’s only on par with older models.

Hao AI Lab • 1 year ago

🧠 Task Analysis: Why It’s Hard
1. Long-context Reasoning - Spot contradictions by cross-referencing with prior dialogue and evidence.
2. Visual Understanding - Identify the exact image that disproves false claims with precise grounding.
3. Strategic Decision-Making (Game Design) - Decide when to press, present evidence, or hold back - it’s not just about answers, but making the right move in a dynamic, evolving case.
Thoughts: Game design pushes AI beyond pure textual and visual tasks by requiring it to convert understanding into context-aware actions. It is harder to overfit because success here demands reasoning over a context-aware action space, not just memorization.
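To make the "context-aware action space" concrete, here is a small sketch of how press, present, and hold back could be modeled. This is an illustration under assumptions, not the benchmark's real interface: `Action`, `Move`, and `legal_moves` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    PRESS = "press"      # question the witness further on a statement
    PRESENT = "present"  # show evidence that contradicts a statement
    HOLD = "hold"        # hold back and move on without acting


@dataclass(frozen=True)
class Move:
    action: Action
    statement_index: int
    evidence: Optional[str] = None

    def __post_init__(self):
        # Only PRESENT moves carry evidence, and they must carry one item.
        if self.action is Action.PRESENT and self.evidence is None:
            raise ValueError("presenting requires an evidence item")
        if self.action is not Action.PRESENT and self.evidence is not None:
            raise ValueError("only PRESENT moves carry evidence")


def legal_moves(num_statements, evidence_items):
    """Enumerate every move available for the current testimony."""
    moves = []
    for i in range(num_statements):
        moves.append(Move(Action.PRESS, i))
        moves.append(Move(Action.HOLD, i))
        moves.extend(Move(Action.PRESENT, i, e) for e in evidence_items)
    return moves


print(len(legal_moves(3, ["badge", "autopsy report"])))  # 12
```

The move set grows with both the testimony length and the evidence collected so far, which is why the right choice depends on the state of the case rather than on a fixed answer key.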

Hao AI Lab • 1 year ago

When it comes to cost-efficiency, Gemini 2.5 Pro redefines value. ⚡️ With comparable performance, it’s 6 to 15 times cheaper than O1-2024-12-17, depending on the case. 💸 Gemini 2.5 Pro is even slightly cheaper than GPT-4.1 ($1.25 vs $2.00 per 1M input tokens). In our table for models that passed Level 1, O1 made the fewest API calls but still had the highest cost. The call count reflects strategy, not reasoning strength, as models that dig deeper into testimony naturally trigger more requests. Beyond Level 1, as conversations get longer, O1’s cost skyrockets. 🚀 In Level 2, which is a really long case, O1 cost over $45.75, while Gemini 2.5 Pro handled it for $7.89. That’s a massive gap! 💸
Note: Gemini uses a built-in token counting method that treats every image as 258 tokens for the gemini-2.5-pro model, so actual costs may be slightly higher. O1’s output cost may also be underestimated due to variability in its hidden reasoning content.
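The arithmetic behind these figures can be sketched as follows. The prices, the 258-tokens-per-image rule, and the Level 2 totals come from the post; the function name `input_cost_usd` and the request sizes in the usage line are hypothetical.

```python
# Figures quoted in the post.
GEMINI_INPUT_USD_PER_M = 1.25   # USD per 1M input tokens, Gemini 2.5 Pro
GPT41_INPUT_USD_PER_M = 2.00    # USD per 1M input tokens, GPT-4.1
GEMINI_TOKENS_PER_IMAGE = 258   # flat per-image count used by the token counter
O1_LEVEL2_USD = 45.75           # the post says O1 cost "over" this amount
GEMINI_LEVEL2_USD = 7.89


def input_cost_usd(text_tokens, num_images, usd_per_million, tokens_per_image):
    """Estimate input cost when every image is billed as a flat token count."""
    total_tokens = text_tokens + num_images * tokens_per_image
    return total_tokens * usd_per_million / 1_000_000


# Hypothetical request: 50,000 text tokens plus 20 evidence images.
print(input_cost_usd(50_000, 20, GEMINI_INPUT_USD_PER_M, GEMINI_TOKENS_PER_IMAGE))

# Level 2 gap from the quoted totals.
print(round(O1_LEVEL2_USD / GEMINI_LEVEL2_USD, 1))  # 5.8
```

Because the O1 figure is stated as a lower bound, the quoted Level 2 totals give a ratio of at least about 5.8, which sits at the low end of the "6 to 15 times" range.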

Hao AI Lab • 1 year ago

We’re committed to building more transparent, robust, and innovative AI benchmarks and would love to hear your ideas. Drop your thoughts about games and evaluations below. We’re always open to new suggestions for advancing AI evaluation! 💡📊 Leaderboard: GitHub Repo: Official Website:

Parroted Words • 1 year ago

If the fear of Yahweh is the beginning of wisdom, then what is its end? Cut through the abundant nonsense, empty platitudes, and conventional musings of the world and get straight to the heart of what it means to be wise.

Janek Mann • 1 year ago

That’s a fun benchmark! Would be interesting to RL-tune a model on it.

Hao AI Lab • 1 year ago

We are on it, stay tuned! 😃

Shailesh • 1 year ago

Is the code for the implementation available somewhere?

kfant • 1 year ago

code?

Seth Stafford • 1 year ago

Clever idea. But what I really need is an AI that can explain the plot of “The Big Sleep”. 😉

Hao AI Lab • 1 year ago

Interesting, curious to see what kinds of stories each model would come up with 🤔
