Introducing Copilot Arena - Interactive coding evaluation in the wild. Our extension lets you test top models for free, right in VSCode. Let's vote and build the Copilot leaderboard! Download here: Led by Wayne Chi and Valerie Chen at CMU. 1/🧵
158,045 views • 1 year ago • via X (Twitter)

2/ What is Copilot Arena? Unlike traditional code completion, Copilot Arena suggests paired completions from different LLMs, including GPT-4o, Claude-3.5, Gemini, Llama-3.1 and more. Together let's build an LLM leaderboard for code. The more votes, the better the leaderboard!

3/ Discover what models you prefer. You can see the model you picked after every vote. After twenty votes, you will unlock a personal leaderboard. A global leaderboard is coming soon - how does your personal leaderboard compare?
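To make the personal leaderboard idea concrete, here is a minimal sketch of how paired votes might be tallied. The thread does not say how Copilot Arena actually ranks models, so the win-rate ranking, the function name `personal_leaderboard`, and the `(winner, loser)` vote format are all assumptions for illustration; only the twenty-vote unlock comes from the thread.

```python
from collections import defaultdict

# From the thread: the personal leaderboard unlocks after twenty votes.
UNLOCK_THRESHOLD = 20


def personal_leaderboard(votes, threshold=UNLOCK_THRESHOLD):
    """Rank models by win rate over a user's paired votes.

    `votes` is a list of (winner, loser) model-name pairs (hypothetical format).
    Returns None while the leaderboard is still locked, otherwise a list of
    (model, win_rate) tuples sorted from highest to lowest win rate.
    """
    if len(votes) < threshold:
        return None  # not enough votes yet

    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1

    rows = [(model, wins[model] / appearances[model]) for model in appearances]
    # Highest win rate first; ties broken alphabetically for stable output.
    return sorted(rows, key=lambda row: (-row[1], row[0]))
```

A plain win rate ignores which opponents a model faced; a real leaderboard would likely use a rating model that accounts for that, but the thread does not specify one.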

4/ How do you get started? Installing Copilot Arena is as simple as downloading any other VSCode extension! Simply look up Copilot Arena on the marketplace and click install. Alternatively, download the extension from the marketplace:

5/ Check out our repo for further details: Before using Copilot Arena, we ask that you read the information on how to use the extension and the different settings available (e.g., privacy and data collection) detailed in our repository:

6/ There is more on the way! We have multiple new features in the works. Join our Discord channel to learn more: If you want to stay up-to-date on leaderboard results and regular updates, give @CopilotArena a follow. We will also be releasing a series of blog posts in the coming weeks on how we built Copilot Arena, our initial leaderboard, and more!

7/ The team behind Copilot Arena. Copilot Arena is led by the CMU team @iamwaynechi @valeriechen_ @chrisdonahuey @atalwalkar, in close collaboration with @infwinston @ml_angelopoulos @StringChaos @tianjun_zhang and Ion Stoica from the LMArena team!

@iamwaynechi @valeriechen_ @simpsoka @itsakdev

@iamwaynechi @valeriechen_ I'd love to see Copilot Arena integrated with more LLMs, like Claude or LLaMA 3. Have you considered adding a leaderboard for model accuracy or user preferences?

@iamwaynechi @valeriechen_ we don't care, what is really important is where is the new nemotron model on the leaderboard? C:

@iamwaynechi @valeriechen_ This is cool but aren't you sending your private code/possibly secrets to be recorded in the arena dataset?



