Phoenix Wright: Ace Attorney is a popular visual novel known for its complex storytelling and courtroom drama. Like a detective novel, it challenges players to connect clues and evidence to expose contradictions and reveal the true culprit. In our setup, models are tested on the intense cross-examination stage. It...

Hao AI Lab

3,333 subscribers

29,983 views • 1 year ago • via X (Twitter)

Science & Technology, Education

7 comments

Hao AI Lab • 1 year ago

When Ilya Sutskever once explained why next-word prediction leads to intelligence, he made a metaphor: if you can piece together the clues and deduce the criminal’s name on the last page, you have a real understanding of the story. 🕵️‍♂️ Inspired by that idea, we turned to Ace Attorney to test AI's reasoning. It’s the perfect stage: the AI plays as a detective to collect clues, expose contradictions, and uncover the truth. We put the latest top AI models—GPT-4.1, Gemini 2.5 Pro, Llama-4 Maverick, and more—to the test in Ace Attorney, to see if they could shout Objection! ⚖️, turn the case around, and uncover the truth behind the lies.

Hao AI Lab • 1 year ago

🔍 Interesting Findings: We tested 4 top AI multimodal models: O1, Gemini 2.5 Pro, Claude 3.7-thinking, and Llama-4 Maverick.
1. O1 and Gemini 2.5 Pro performed the best, both reaching Level 4 🏅. While neither managed to crack it, O1 had a slight edge over Gemini 2.5 in tackling the toughest cases.
2. GPT-4.1 showed similar performance to Claude 3.5. Despite reported gains over GPT-4o, in this task it’s only on par with older models.

Hao AI Lab • 1 year ago

🧠 Task Analysis: Why It’s Hard
1. Long-context Reasoning - Spot contradictions by cross-referencing with prior dialogue and evidence.
2. Visual Understanding - Identify the exact image that disproves false claims with precise grounding.
3. Strategic Decision-Making (Game Design) - Decide when to press, present evidence, or hold back. It’s not just about answers, but about making the right move in a dynamic, evolving case.
Thoughts: Game design pushes AI beyond pure textual and visual tasks by requiring it to convert understanding into context-aware actions. It is harder to overfit because success here demands reasoning over a context-aware action space, not just memorization.
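To make the "context-aware action space" idea concrete, here is a minimal sketch of how the three moves named above (press, present evidence, hold back) might be represented and validated. All names here (`Action`, `Move`, `is_valid`, the sample evidence string) are hypothetical illustrations, not the benchmark's actual code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class Action(Enum):
    PRESS = "press"      # ask the witness to elaborate on a statement
    PRESENT = "present"  # show evidence that contradicts a statement
    HOLD = "hold"        # take no action on this statement


@dataclass(frozen=True)
class Move:
    action: Action
    statement_index: int            # which testimony statement is targeted
    evidence: Optional[str] = None  # only meaningful for PRESENT


def is_valid(move: Move, num_statements: int, court_record: Set[str]) -> bool:
    """Check a move against the current context (testimony length, evidence held)."""
    if not 0 <= move.statement_index < num_statements:
        return False
    if move.action is Action.PRESENT:
        # Presenting requires evidence that is actually in the court record.
        return move.evidence in court_record
    # PRESS and HOLD must not carry evidence.
    return move.evidence is None
```

The set of legal moves depends on the state of the case: the same `PRESENT` move is valid or invalid depending on which evidence has been collected so far, which is one way to read "context-aware" here.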

Hao AI Lab • 1 year ago

When it comes to cost-efficiency, Gemini 2.5 Pro redefines value. ⚡️ With comparable performance, it’s 6 to 15 times cheaper than O1-2024-12-17, depending on the case. 💸 Gemini 2.5 Pro is even slightly cheaper than GPT-4.1 ($1.25 vs. $2.00 per 1M input tokens).

In our table of models that passed Level 1, O1 made the fewest API calls but still had the highest cost. The call count reflects strategy, not reasoning strength, since models that dig deeper into testimony naturally trigger more requests. Beyond Level 1, as conversations get longer, O1’s cost skyrockets. 🚀 In Level 2, which is a really long case, O1 cost over $45.75, while Gemini 2.5 Pro handled it for $7.89. That’s a massive gap! 💸

Note: Gemini uses a built-in token counting method that treats every image as 258 tokens for the gemini-2.5-pro model, so actual costs may be slightly higher. O1’s output may also be underestimated due to variability in its hidden reasoning content.
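The arithmetic behind these figures can be sketched as follows. This is a rough illustration only: the function names and the sample token counts are assumptions, while the 258-tokens-per-image figure, the $1.25 per 1M input tokens price, and the Level 2 totals come from the post above.

```python
TOKENS_PER_IMAGE = 258  # flat per-image count mentioned in the note above


def estimate_input_cost(text_tokens: int, num_images: int,
                        usd_per_million: float,
                        tokens_per_image: int = TOKENS_PER_IMAGE) -> float:
    """Estimate input cost in USD, counting each image as a fixed token amount."""
    total_tokens = text_tokens + num_images * tokens_per_image
    return total_tokens * usd_per_million / 1_000_000


def cost_ratio(expensive_usd: float, cheap_usd: float) -> float:
    """How many times more one run cost than another."""
    return expensive_usd / cheap_usd


# Hypothetical request: 100,000 text tokens plus 10 images at $1.25 per 1M tokens.
print(estimate_input_cost(100_000, 10, 1.25))

# Level 2 totals quoted in the post: over $45.75 (O1) vs. $7.89 (Gemini 2.5 Pro).
print(round(cost_ratio(45.75, 7.89), 1))
```

Because the O1 figure is stated as "over $45.75", the ratio computed from it is a lower bound for that case.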

Hao AI Lab • 1 year ago

We’re committed to building more transparent, robust, and innovative AI benchmarks and would love to hear your ideas. Drop your thoughts about games and evaluations below; we’re always open to new suggestions for advancing AI evaluation! 💡📊 Leaderboard: GitHub Repo: Official Website:

ASTROMEDA - Wishlist on Steam! • 2 years ago

A small teaser of Astromeda! It's a game inspired by Pokemon, Undertale, and OneShot. Wishlist on Steam! #TrailerTuesday #indiegame #IndieGameDev #GamingNews #Steam

Inspect Element Capital • 1 year ago

Isn't most of the game already in the training data?
