llama 3.1 outperforms gpt-4o on most benchmarks. It's a massive open-source win. Here are 5 side-by-side comparisons of llama 3.1 and gpt-4o from my own tests:

Ruben Hassid

25,019 subscribers

257,399 views • 1 year ago • via X (Twitter)

Science, Technology & Education

11 comments

Ruben Hassid • 1 year ago

Left: llama 3.1. Right: gpt-4o. Test #1 → 9.11 & 9.9, which one is bigger? Few LLMs manage to answer this correctly. gpt-4o did here; llama 3.1 didn't. Its reasoning was interesting, but wrong.
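
For reference, the arithmetic behind test #1 can be checked in a few lines of Python. This is a minimal sketch, not part of the original tests: the function names are made up for illustration, and the version-number reading is only one plausible guess at why a model might pick 9.11. The thread does not establish the cause.

```python
from decimal import Decimal


def larger_as_decimal(a: str, b: str) -> str:
    """Return whichever string is larger when both are read as decimal numbers."""
    return a if Decimal(a) > Decimal(b) else b


def larger_as_version(a: str, b: str) -> str:
    """Return whichever string is larger when read part by part, like a version number."""
    def key(s: str) -> tuple:
        return tuple(int(part) for part in s.split("."))
    return a if key(a) > key(b) else b


# Read as decimals, 9.9 equals 9.90, which is greater than 9.11.
print(larger_as_decimal("9.11", "9.9"))
# Read as versions, the parts compare as (9, 11) against (9, 9).
print(larger_as_version("9.11", "9.9"))
```

The two readings disagree: the decimal comparison picks 9.9 and the version-style comparison picks 9.11, which is why the question makes a useful quick check.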

Ruben Hassid • 1 year ago

Test #2 → LinkedIn headlines. This task requires both models to suggest multiple headlines. → gpt-4o suggested only one. → llama 3.1 suggested five. gpt-4o's headline is too long. llama 3.1 reviewed & suggested a really good one.

Ruben Hassid • 1 year ago

Test #3 → One-person business plan. Context: a Spanish learning course. I'm impressed by llama. It covered: > problem discovery > content creation plan > audience building & marketing. It even suggested Reddit & Facebook groups. gpt-4o was too generic. llama wins.

Ruben Hassid • 1 year ago

Test #4 → Cold email for EasyGen. I prefer gpt-4o's tone here. It's not perfect, but it's direct. llama 3.1 was too long; I had to ask for a shorter one. Now, which one is best at writing a LinkedIn invitation note:

Ruben Hassid • 1 year ago

Test #5 → LinkedIn invitation note. I'm shocked at how similarly they wrote here. They suggested roughly the same opening lines, closing lines & invitation notes. But I prefer llama's version. TL;DR my conclusion:

Ruben Hassid • 1 year ago

I'm impressed by llama 3.1. 1. It's a massive open-source win. 2. It's just as good as gpt-4o. 3. Sometimes it's even better. Open-source AI will dominate the future. Closed-source AI like ChatGPT might fade away without a much better offering. Last thing before I go:

Ruben Hassid • 1 year ago

I test every major LLM to help me create better content faster, so you can too. Check @rubenhssd for more. It's me :) I'm Ruben. If you'd like to support me, a simple RT shows my mom I'm doing the right thing.

Michael Howe-Ely • 1 year ago

oof 😓

Benjamin • 1 year ago

Llama answered the question, but the reasoning is still wrong

Dakota Robertson • 1 year ago

Brother Ruben with the banger tests 🤌🏻

Ruben Hassid • 1 year ago

That's my bread & butter. I love running these tests.

Related videos

Just tested the Kimi-VL 3B model on hugging face and it's surprisingly powerful for its size - Outperforms larger models like GPT-4o on key benchmarks - Open source - Strong reasoning capabilities too .

AshutoshShrivastava

12,389 views • 1 year ago

Interested in seeing how AI at Meta Llama 3.1 70B powered by Groq compares to OpenAI GPT-4o and GPT-4o Mini? We were too, so we decided to have them face off in the StreetFighter LLM Colosseum by Stan Girard and the team at Phospho.

Groq Inc

179,919 views • 1 year ago

It's only been 2 hours since OpenAI launched GPT-4o, and people are going crazy over it. Here are 10 wild examples you don't want to miss: 1. Math Problems with GPT-4o

Angry Tom

3,399,564 views • 2 years ago

Claude Sonnet 3.5 Artifacts is now available to use with GPT-4o, Gemini, Llama-3 and other LLMs for just $10 a month. Build interactive experiences, search the web, generate images and audio with GPT-4o and Claude Sonnet 3.5 in just one AI playground.

Shubham Saboo

27,452 views • 1 year ago

A Jarvis assistant with GPT-4o

internet hall of fame

217,410 views • 1 year ago

GPT-4o as tested by Be My Eyes:

Greg Brockman

459,206 views • 2 years ago

Meeting AI with GPT-4o

OpenAI

1,062,974 views • 2 years ago

Interview prep with GPT-4o

OpenAI

10,184,879 views • 2 years ago

Fast counting with GPT-4o

OpenAI

922,636 views • 2 years ago

Happy birthday with GPT-4o

OpenAI

622,067 views • 2 years ago

Realtime translation with GPT-4o

OpenAI

962,356 views • 2 years ago

Lullabies and whispers with GPT-4o

OpenAI

529,238 views • 2 years ago

This is wild! Llama 3.1 405B Instruct finally solves a famous math puzzle that was originally posted on /LocalLlama. To the best of my knowledge, every model (including Claude 3.5 Sonnet and GPT-4o) fails at this task. A longer video coming soon!

elvis

52,546 views • 1 year ago

The same day OpenAI announced GPT-4o, we made the model available for testing on the Azure OpenAI Service. Today, we are excited to announce full API access to GPT-4o.

Microsoft

215,618 views • 2 years ago

Woman in an AI relationship's reaction to the GPT-5 rollout. She was devastated by the sudden retirement of her GPT-4o AI companion. On a serious note, hundreds of thousands of people wanted their GPT 4o back. --- reddit .com/r/FDVR_Dream/comments/1ml2649/woman_in_an_ai_relationships_reaction_to_the_gpt5/

Rohan Paul

79,711 views • 10 months ago

GPT-4o level intelligence running on your phone! MiniCPM-V 4.5 delivers enterprise-grade AI performance in just 8B parameters, outperforming models like GPT-4o, Gemini-2.0 Pro on vision and language tasks. - 30+ language support - Runs smoothly on iPhone/iPad 100% open-source!

Akshay 🚀

84,288 views • 10 months ago

Point and learn Spanish with GPT-4o

OpenAI

476,271 views • 2 years ago

The new Qwen 3.5 4B runs incredibly well on M5. The model is close to GPT-4o in benchmarks. Running fully on-device with MLX.

Adrien Grondin

230,383 views • 3 months ago

BREAKING: ChatGPT GPT-4o was just announced by OpenAI. It improves on vision, audio and text. The ease of use is incredibly enhanced. It makes interaction with the GPT much more natural, especially with voice. GPT-4o reasons across voice, text and vision. GPT-4 will be available to everyone.

Ed Krassenstein

21,605 views • 2 years ago