llama 3.1 outperforms gpt-4o on most benchmarks. It's a massive open-source win. Here are 5 side-by-side comparisons of llama 3.1 and gpt-4o from my own tests:
257,399 views • 1 year ago • via X (Twitter)
11 comments

Left: llama 3.1. Right: gpt-4o.
test #1 → 9.11 & 9.9, which one is bigger? Few LLMs manage to answer this right. gpt-4o did here. llama 3.1 didn't. Its reasoning was interesting, but wrong.
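
For reference, here's a minimal Python sketch of the comparison in test #1. As decimal numbers, 9.9 is bigger than 9.11 (9.90 vs 9.11). The second half shows one plausible way to land on the wrong answer: reading the two values like version numbers. That part is my own assumption for illustration, not something either model's output confirmed, and the helper names `bigger` and `as_version` are made up for this sketch.

```python
from decimal import Decimal


def bigger(a: str, b: str) -> str:
    """Return whichever decimal string is numerically larger (b on a tie)."""
    return a if Decimal(a) > Decimal(b) else b


# Numeric reading: 9.9 is the same as 9.90, which is larger than 9.11.
print(bigger("9.11", "9.9"))


def as_version(s: str) -> tuple:
    """Hypothetical 'version number' reading: compare the parts as integers."""
    return tuple(int(part) for part in s.split("."))


# Version-style reading (an assumed failure mode): (9, 11) sorts after (9, 9),
# so 9.11 looks "bigger" even though it is the smaller decimal.
print(as_version("9.11") > as_version("9.9"))
```

The prompt asks about numbers, so the first reading is the right one. The second only shows how the same digits can sort the other way under a different convention.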

test #2 → LinkedIn headlines
The task requires both of them to suggest multiple headlines.
→ gpt-4o suggested only one.
→ llama 3.1 suggested five headlines.
gpt-4o's headline was too long. llama 3.1 reviewed its own & suggested a really good one.

test #3 → One-person business plan
Context: a Spanish learning course. I'm impressed by llama.
> problem discovery
> content creation plan
> audience building & marketing
It even suggested Reddit & Facebook groups. gpt-4o was too generic. llama wins.

test #4 → cold email for EasyGen
I prefer gpt-4o's tone here. It's not perfect, but it's direct. llama 3.1 was too long, so I had to ask for a shorter one. Now, which one is best at writing a LinkedIn invitation note:

test #5 → LinkedIn invitation note
I'm shocked at how similar their answers were here. They suggested roughly the same opening lines, closing lines & invitation notes. But I prefer llama's version. TL;DR my conclusion:

I'm impressed by llama 3.1.
1. it's a massive open-source win.
2. it's just as good as gpt-4o.
3. sometimes even better.
Open-source AI will dominate the future. Closed-source AI like ChatGPT might fade away without a much better offering. Last thing before I go:

I test every major LLM to help me create better content faster, so you can too. Check @rubenhssd for more. It's me :) I'm Ruben. If you'd like to support me, a simple RT shows my mom I'm doing the right thing.

oof 😓

Llama answered the question, but the reasoning is still wrong

Brother Ruben with the banger tests 🤌🏻

That's my bread & butter. I love running these tests.
