llama 3.1 outperforms gpt-4o on most benchmarks. It's a massive open-source win. Here are 5 side-by-side comparisons of llama and gpt-4o from my own tests:

257,399 views • 1 year ago • via X (Twitter)

11 comments

Ruben Hassid • 1 year ago

Left: llama-3.1. Right: gpt-4o. test #1 → 9.11 & 9.9, which one is bigger? Few LLMs manage to answer this correctly. gpt-4o did here. llama 3.1 didn't. Its reasoning was interesting, but wrong.
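For reference, here is a minimal sketch of the arithmetic behind test #1. Read as decimals, 9.9 is larger than 9.11, because 9.9 equals 9.90. The second function is a hypothetical misreading (treating the numbers like version strings, so that 11 beats 9) and is included only for contrast. It is an assumption, not a claim about how either model actually reasoned, and both function names are made up for this example.

```python
from decimal import Decimal


def larger_decimal(a: str, b: str) -> str:
    """Return whichever string represents the larger decimal number."""
    return a if Decimal(a) > Decimal(b) else b


def larger_as_version(a: str, b: str) -> str:
    """Hypothetical misreading: compare dot-separated parts as integers."""
    def key(s: str) -> tuple:
        return tuple(int(part) for part in s.split("."))
    return a if key(a) > key(b) else b


print(larger_decimal("9.11", "9.9"))     # 9.9  (9.11 < 9.90)
print(larger_as_version("9.11", "9.9"))  # 9.11 (11 > 9 when read as a version)
```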

Ruben Hassid • 1 year ago

test #2 → LinkedIn headlines. It's a task that requires both of them to suggest multiple headlines. → gpt-4o suggested only one. → llama 3.1 suggested five headlines. gpt-4o's headline was too long. llama 3.1 reviewed its options & suggested a really good one.

Ruben Hassid • 1 year ago

test #3 → One-person business plan. Context: a Spanish learning course. I'm impressed by llama. It covered:
> problem discovery
> content creation plan
> audience building & marketing
It even suggested Reddit & Facebook groups. gpt-4o was too generic. llama wins.

Ruben Hassid • 1 year ago

test #4 → cold email for EasyGen. I prefer gpt-4o's tone here. It's not perfect, but it's direct. llama 3.1 was too long, so I had to ask for a shorter one. Now, which one is best at writing a LinkedIn invitation note:

Ruben Hassid • 1 year ago

test #5 → LinkedIn invitation note. I'm shocked at how similar their answers were here. They suggested roughly the same opening lines, closing lines & invitation notes. But I prefer llama's version. TL;DR my conclusion:

Ruben Hassid • 1 year ago

I'm impressed by llama-3.1.
1. it's a massive open-source win.
2. it's just as good as gpt-4o.
3. sometimes even better.
Open-source AI will dominate the future. Closed-source AI like ChatGPT might fade away without a much better offering. Last thing before I go:

Ruben Hassid • 1 year ago

I test every major LLM to help me create better content faster, so you can too. Check @rubenhssd for more. It's me :) I'm ruben. If you'd like to support me, a simple RT shows my mom I'm doing the right thing.

Michael Howe-Ely • 1 year ago

oof 😓

Benjamin • 1 year ago

Llama answered the question, but the reasoning is still wrong

Dakota Robertson • 1 year ago

Brother Ruben with the banger tests 🤌🏻

Ruben Hassid • 1 year ago

That's my bread & butter. I love running these tests.
