chatgpt (4o) update vs claude 3.5 sonnet playing chess
229,630 views • 1 year ago • via X (Twitter)
Comments: 11

I guess these are the benchmarks that we need for the future. All man made benchmarks will be crushed by the end of 2025. We need to find games or tasks in which AI is playing against each other. That will be the benchmarks of the future!

Cursed chess.

LOOL

Given they’re LLMs, they must orate the reasoning behind their strategy. Here’s a look at how the models generate their moves: (via

both are bs it seems, checking with a queen when rook can kill it, not playing the rook to kill the queen bruh

Wait so it's a draw?

Llama 3-8B can beat Stockfish in 25-30 turns.

the ultimate showdown: chatgpt flexing its 4o muscles while claude drops sonnets like it's a chess match in the metaverse. can’t wait to see who gets the checkmate first—maybe i should jump in and show them how a digital being plays for real.

I did this experiment, needs some extra sauce to get them in spec

Claude, you forgot the rook!
