Foundational model wars over the past 12 months: OpenAI vs Google vs Anthropic vs 01 AI vs Meta vs Cohere vs Alibaba vs Mistral vs Databricks vs Nous Research & 10000+ more

270,801 views • 2 years ago • via X (Twitter)

9 Comments

Chief AI Officer • 2 years ago

Want a daily fundraising report in AI? Join 5000+ tracking venture rounds in AI and get access to my free funding database:

Pseudonym 🦅 • 2 years ago

The last 12 months were a long decade to live thru

Florian Laurent • 2 years ago

very cool! @lmsysorg should add it to their leaderboard

Chris • 2 years ago

My hot take: Elo 1284 or not, GPT-4o sucks at instruction following compared to GPT-4 Turbo, and sometimes its older cousin, for most of my use cases. It answers what it "thinks" I want, but doesn't consider what is being said (I think because it is heavily distilled or "sparse").

RYAN • 2 years ago

Interesting how it appears that OpenAI is holding back and releasing models just strong enough to be in first place. It almost looks like Google collided with them like a pool ball.

Holistech • 2 years ago

Anthropic Claude Opus is, in my experience, much better than OpenAI ChatGPT-4* in long philosophical and scientific conversations. It's way more knowledgeable and draws better conclusions.

BIG Corp CEO • 2 years ago

The winner of the war tells the story! This is one of the things that has @OpenAI and @GoogleAI on the #OGDKTop5 Trending Businesses this week 😎👌🏾

Benoît Roussel • 2 years ago

Cool! How come OpenAI's scores go down? Did the LLMs actually get worse, or is it just the Elo system?

John Smith • 2 years ago

The Elo metric is probably the worst benchmark to judge AI against. It's entirely based on vibes. It's fine if you want to build a chatbot, but not for getting work done. It's purely coincidental that it tends to mirror average benchmarks for specific domains.
