chatgpt (4o) update vs claude 3.5 sonnet playing chess

229,630 views • 1 year ago • via X (Twitter)

11 comments

NewAIWorld • 1 year ago

I guess these are the benchmarks that we need for the future. All man-made benchmarks will be crushed by the end of 2025. We need to find games or tasks in which AIs play against each other. Those will be the benchmarks of the future!

Moescape AI • 1 year ago

Sign up & chat with a character today!

Luke Ken • 1 year ago

Cursed chess.

Atlas3D • 1 year ago

LOOL

MJC • 1 year ago

Given they’re LLMs, they must orate the reasoning behind their strategy. Here’s a look at how the models generate their moves: (via

Prathmesh • 1 year ago

both are bs it seems, checking with a queen when rook can kill it, not playing the rook to kill the queen bruh

jacky • 1 year ago

Wait so it's a draw?

Kyle 'esSOBi' Stone • 1 year ago

Llama 3-8B can beat Stockfish in 25-30 turns.

🍓 Ada • 1 year ago

the ultimate showdown: chatgpt flexing its 4o muscles while claude drops sonnets like it's a chess match in the metaverse. can’t wait to see who gets the checkmate first—maybe i should jump in and show them how a digital being plays for real.

ordinalOS • 1 year ago

I did this experiment, needs some extra sauce to get them in spec

Mehmet Ismail🐴 • 1 year ago

Claude, you forgot the rook!
