
Arena.ai
@arena • 161,672 subscribers
Where AI meets the real world. Formerly LMArena. We measure and advance the frontier of AI through community-driven evaluation. We’re hiring → https://t.co/XBZCrseaWF

5 patterns in Text Arena's price–performance Pareto frontier since 2023:
1. GPT-4-level quality now costs ~500x less.
- From a ~$50 blended price per million tokens in 2023 to ~$0.10 today.
2. The higher-price end has become both better and cheaper since 2023.
- The leading Arena score has climbed ~170 points (1,330 → 1,500), while the price of the higher-end frontier models dropped from ~$50 to ~$20 per million tokens.
3. The low-cost end gained the most.
- Under $0.20 per million tokens, the best available model went from ~1,000 Arena score in 2023 to ~1,440 today.
4. The gap between low-cost models and the top performer has nearly closed.
- In 2023, sub-$0.20 models trailed the leader by ~350 Arena points. Today, they trail by ~60.
5. The cast has rotated quite a bit.
- OpenAI set the 2023–24 benchmark.
- AI at Meta strengthened the low-cost end in 2024.
- Google DeepMind drove the 2025 jump.
- Anthropic holds the peak in 2026.
- xAI and Chinese labs like DeepSeek AI, Z.ai, Kimi.ai, Xiaomi MiMo, and Qwen are continuing to push the mid-price frontier.
Arena.ai • 57,397 views • 13 days ago
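The arithmetic behind those five patterns can be checked with a small sketch. It uses only the approximate figures quoted in the post; the variable and function names are illustrative assumptions, not anything published by Arena. Note that the rounded endpoints quoted for 2023 (1,330 and ~1,000) give a gap of 330 points, close to the post's "~350", which presumably reflects unrounded inputs the post does not give.

```python
# Approximate figures quoted in the post (blended $ per million tokens and
# Arena scores). All names here are illustrative, not from Arena itself.
GPT4_LEVEL_PRICE = {"2023": 50.0, "today": 0.10}
HIGH_END_PRICE = {"2023": 50.0, "today": 20.0}
TOP_SCORE = {"2023": 1330, "today": 1500}
BEST_SUB_20C_SCORE = {"2023": 1000, "today": 1440}


def fold_reduction(old: float, new: float) -> float:
    """Return how many times cheaper `new` is than `old`."""
    return old / new


def gap_to_leader(period: str) -> int:
    """Arena-point gap between the leader and the best sub-$0.20 model."""
    return TOP_SCORE[period] - BEST_SUB_20C_SCORE[period]


if __name__ == "__main__":
    gpt4_fold = fold_reduction(GPT4_LEVEL_PRICE["2023"], GPT4_LEVEL_PRICE["today"])
    high_end_fold = fold_reduction(HIGH_END_PRICE["2023"], HIGH_END_PRICE["today"])
    print(f"GPT-4-level cost reduction: ~{gpt4_fold:.0f}x")
    print(f"High-end price reduction: ~{high_end_fold:.1f}x")
    print(f"Top score gain: {TOP_SCORE['today'] - TOP_SCORE['2023']} points")
    print(f"Low-cost score gain: "
          f"{BEST_SUB_20C_SCORE['today'] - BEST_SUB_20C_SCORE['2023']} points")
    print(f"Gap to leader, 2023: {gap_to_leader('2023')} points")
    print(f"Gap to leader, today: {gap_to_leader('today')} points")
```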

GPT-5.2-high by OpenAI is off to a strong start in the Code Arena. ⚡️ If you’re new here: Code Arena is where AI models build full web apps, tools, and interactive sites — all from a single prompt. Watch the video to see GPT-5.2-high in action, then try your own prompt and reply with your creation below. ⬇️
Arena.ai • 246,068 views • 5 months ago

📢 We’re excited to share that we’ve raised $100M in seed funding to support LMArena and continue our research on reliable AI. Led by a16z and UC Investments (University of California), we're proud to have the support of those who believe in both the science and the mission. We’re focused on building a neutral, open, community-driven platform that helps the world understand and improve the performance of AI models on real queries from real users. Also, big news is coming next week! 👀 We're relaunching LMArena with a whole new look, built from the ground up with community feedback 🧱 Link in thread.
Arena.ai • 435,389 views • 1 year ago

How does the #1 open Text Arena model hold up in agentic coding tasks? We tested GLM-5 in Code Arena with head-to-head SVG prompts vs. top frontier AI models. What do you think? Scores for Z.ai's GLM-5 in Code Arena coming soon. Test out GLM-5 for yourself and get voting.
Arena.ai • 115,220 views • 3 months ago

We put the top three Code Arena models head-to-head: Opus 4.5 Thinking 32k, Opus 4.5, and Gemini 3 Pro. They’re just 20 points apart. Same tough prompts, different results. Here’s what stood out. Remember, your votes drive the rankings. Watch how these contenders move on the leaderboard as more votes come in. Check out the comparisons in the thread below. 🧵
Arena.ai • 163,668 views • 6 months ago

We challenged Claude Opus 4.6 by Anthropic with our hardest 3D prompts, and it did not disappoint.
Arena.ai • 98,764 views • 3 months ago

The NEW LMArena is officially live! 🎉
✨ New Logo!
⚡️ Better, faster UI/UX for chat and leaderboard
📱 Mobile optimized
💬 Chat history
🧭 Clearer leaderboard navigation
🤖 Many modalities in one place: vision, image, and more coming soon
Try it now at lmarena dot ai! (Link in 🧵)
Arena.ai • 266,646 views • 1 year ago

🚨 BIG NEWS: An announcement from our intern… Introducing, 🎬 Video Arena!
Arena.ai • 151,712 views • 10 months ago

An anonymous image model appeared on Arena on Aug 12, 2025 and quickly became the most-voted model in Arena's history. The codename: Nano Banana. It was later revealed to be built on Google Gemini and publicly released on Aug 26, 2025. We sat down with Lead Engineer Yue to break down why it stood out. Watch the full video on YouTube, link in thread 👇
Arena.ai • 44,172 views • 2 months ago

🚨BIG NEWS: 🎬 Video Arena is now live on the web! Test out Veo 3.1, Sora 2, Seedance v1.5 Pro, Kling 2.6 Pro, Wan 2.5 & more. What started last summer as a small Discord bot experiment has grown into a rigorous way to measure and understand how frontier video models perform with real-world use. Thank you to our wonderful community for all the feedback! Today, we’re opening up access by making it available on the web. 🎥 Generate videos with 15 different frontier AI models and compare them head-to-head. 📊 Vote for the best output to power the leaderboards.
Arena.ai • 61,930 views • 4 months ago

🚨🍌 Big Reveal: who was "Nano Banana"? The anonymous model, “nano-banana,” that caught the world's attention with its ability to follow complex instructions, preserve character identity, and maintain contextual details was: Gemini-2.5-Flash-Image-Preview by Google DeepMind 🍌✨
- Now ranked #1 on the Image Edit Arena
- Also ranked #1 for Text-to-Image
In two weeks, “nano-banana” has driven over 5 million votes to the Image Edit Arena. The model itself has drawn 2.5M+ votes, the most any model has received, and holds the largest Elo score lead (171) in Arena history. Congrats to the Google DeepMind team on this incredible milestone in image edit and generation. 👏
Arena.ai • 106,522 views • 9 months ago
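To give a rough sense of what a 171-point lead means, the sketch below applies the standard Elo expected-score formula, E = 1 / (1 + 10^(-d/400)). This is an assumption for illustration: the post does not say how Arena fits its scores, and the 400-point scale and the function name `elo_expected_score` are not taken from Arena. Under that assumption, a 171-point lead corresponds to the leader being preferred in roughly 73% of head-to-head votes against the runner-up.

```python
def elo_expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected win rate of the higher-rated model under the standard Elo formula.

    `rating_gap` is the leader's score minus the opponent's score.
    The 400-point scale is the conventional Elo choice, assumed here.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


if __name__ == "__main__":
    for gap in (0, 20, 171):
        print(f"gap {gap:>3}: expected win rate {elo_expected_score(gap):.3f}")
```

A gap of 0 gives exactly 0.5, and small gaps stay close to a coin flip, which is why closely ranked models can trade places as more votes come in.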

📄 We just launched PDF uploads in Arena. Upload PDFs with your prompts to add richer context and test models on document reasoning, bringing evaluations closer to real-world use.
▪️ Ask questions directly against documents
▪️ Digest complex, technical content in minutes
▪️ Extract summaries and key takeaways instantly
Try it across 10 models today - we’ll be adding more over time. Leaderboard coming soon. Start uploading, comparing, and voting!
Arena.ai • 46,578 views • 3 months ago

📊 After ranking #1, how do model ranks evolve over time? We analyzed every top performer since mid-2023, when OpenAI’s GPT‑4 was at the top of the leaderboard. As of today, leaders stay at the #1 spot for only 35 days on average. The leading model typically drops out of the top 5 within 5 months, and out of the top 10 within 7 months. Previous leaders have fallen substantially, with o1 now at #56 and Claude 3 Opus at #139 as progress moves faster and faster. We’ll see what 2026 brings.
lmarena.ai • 54,822 views • 4 months ago

Exciting news - Chatbot Arena now supports image uploads 📸 Challenge GPT-4o, Gemini, Claude, and LLaVA with your toughest questions. Plot to code, VQA, storytelling, you name it. Let's get creative and have fun! Leaderboard coming soon. Credits to builders Christopher Chou, Lisa Dunlap, Ying Sheng, Lianmin Zheng, Wei-Lin Chiang, and Anastasios Nikolas Angelopoulos, and advisors Ion Stoica, Joey Gonzalez, and Trevor Darrell. Find the link below 👇
Arena.ai • 180,151 views • 1 year ago

🚨 BIG NEWS 🚨 Search Arena is live with 7 top search-capable models ready for testing. Be sure to have the "Search" modality selected in the chat box, and get testing. 🌐
xAI: Grok 4
Anthropic: Claude Opus 4
Perplexity: Sonar Pro High & Reasoning Pro High
OpenAI: o3 & GPT 4o-Search Preview
Google DeepMind: Gemini 2.5 Pro Grounding
lmarena.ai • 89,753 views • 10 months ago