Foundational model wars over the past 12 months: OpenAI vs Google vs Anthropic vs 01 AI vs Meta vs Cohere vs Alibaba vs Mistral vs Databricks vs Nous Research & 10000+ more

270,801 views • 2 years ago • via X (Twitter)

9 Comments

Chief AI Officer • 2 years ago

Want a daily fundraising report in AI? Join 5000+ tracking venture rounds in AI and get access to my free funding database:

Pseudonym 🦅 • 2 years ago

The last 12 months were a long decade to live thru

Florian Laurent • 2 years ago

very cool! @lmsysorg should add it to their leaderboard

Chris • 2 years ago

My hot take: Elo 1284 or not, GPT-4o sucks at instruction following compared to GPT-4 Turbo, and sometimes even its older cousin, for most of my use cases. It answers what it "thinks" I want but doesn't consider what is actually being said (I think because it is heavily distilled or "sparse").

RYAN • 2 years ago

Interesting how it appears that OpenAI is holding back and releasing models just strong enough to stay in first place. It almost looks like Google collided with them like a pool ball.

Holistech • 2 years ago

Anthropic Claude Opus is, in my experience, much better than OpenAI ChatGPT-4* in long philosophical and scientific conversations. It is way more knowledgeable and draws better conclusions.

BIG Corp CEO • 2 years ago

The winner of the war tells the story! This is one of the things that has @OpenAI and @GoogleAI on the #OGDKTop5 Trending Businesses this week 😎👌🏾

Benoît Roussel • 2 years ago

Cool! How come OpenAI's scores go down? Did the LLMs actually get worse, or is it just the Elo system?

John Smith • 2 years ago

The Elo metric is probably the worst benchmark to judge AI against. It's entirely based on vibes. It's fine if you want to build a chatbot, but not for getting work done. It's purely coincidental that it tends to mirror average benchmarks for specific domains.
