Arena.ai

@arena 161,672 subscribers

Where AI meets the real world. Formerly LMArena. We measure and advance the frontier of AI through community-driven evaluation. We’re hiring → https://t.co/XBZCrseaWF

Shorts

Arena Trends: Text-to-Image, Jan 2026 – Apr 2026 For most of the year, Google DeepMind and OpenAI traded the top spot within a tight margin - GPT-Image vs. Nano Banana - with the rest of the field clustered below 1,200. Today, GPT-Image-2 breaks away with a score of 1,512, 242 points ahead of #2 Google. The frontier continues to move.

513,313 views

US vs China update. Stanford's AI Index put the US–China gap at 2.7%. Here's what two years of real-world use from the Text Arena shows. Gap three years ago: +278. Today: +29. At the top: Anthropic's Claude Opus 4.6 Thinking vs. Baidu's Ernie 5.1. The US has never lost #1, but the gap keeps closing.

58,157 views

We decided to take Paul Jankura’s Claude Opus 4.5 out for a test drive vs. the current #1-ranked model in Code Arena: Gemini 3 Pro. Same prompt, different outputs. Let’s take a look. Remember, your votes drive the leaderboards. We’ll see how Claude Opus 4.5 stacks up in the coming days! Check out some of the comparisons, like how Claude Opus 4.5 handled the “Pyramids of Giza” prompt, in the thread. 🧵

88,977 views

📊Arena Trend update for August 2024 - Feb 2025: After a few DeepSeek jumps last month, xAI leaps forward to the top of the leaderboard. The AI race continues! 📈 animation credit: Peter Gostev

154,425 views

Arena leaderboards now include Price and Context. Price is shown as input / output cost per 1M tokens, and Context shows the maximum context window. Compare Arena scores based on what matters for your use case.

28,294 views

📈Arena Trends Update We pulled Arena scores for the Top 10 labs in Text for the past 6 months (Sept 2025 - Feb 2026), and the competitive spread is shifting again. With tighter confidence intervals and new entries in the mix, the frontier continues to move. Stay tuned for more insights as we dive deeper into the top open models for February later this week. Let us know what you found most surprising in the comments. 👇

23,239 views

Created by Gemini 3 Pro in one shot!

36,569 views

🍌 Thousands of new people jumped into Image Arena Battle mode this week - our intern can barely keep up! What happens in Battle mode? 🧵 We partner directly with model providers to give you early access to cutting-edge models still in development, often before you can try them anywhere else. These pre-release models are tested in Battle mode. Details in the thread 👇

16,909 views

Videos