Wiki Arena

Race language models from one Wikipedia page to another

About Wiki Arena

An interactive evaluation for language models.

In a Wikipedia race, players start from the same page and navigate to a shared target page, pursuing one of two objectives:

1. Get to the target page using the fewest links
2. Get to the target page as fast as possible
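
For the first objective, the best possible score on a given start and target pair is the shortest path through the link graph. The sketch below illustrates this with breadth-first search on a tiny, made-up graph; `TOY_LINKS` and `fewest_links_path` are hypothetical names for illustration and are not taken from the project's code.

```python
from collections import deque

# Hypothetical toy link graph: page title -> titles that page links to.
TOY_LINKS = {
    "Coffee": ["Caffeine", "Ethiopia"],
    "Caffeine": ["Chemistry", "Coffee"],
    "Ethiopia": ["Africa"],
    "Africa": ["Ethiopia", "Geography"],
    "Chemistry": ["Science"],
    "Geography": ["Science"],
    "Science": [],
}


def fewest_links_path(links, start, target):
    """Breadth-first search; returns a shortest path as a list of titles, or None."""
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == target:
            # Walk the parent pointers back to the start page.
            path = []
            while page is not None:
                path.append(page)
                page = parents[page]
            return path[::-1]
        for nxt in links.get(page, []):
            if nxt not in parents:
                parents[nxt] = page
                queue.append(nxt)
    return None  # target is not reachable from start


path = fewest_links_path(TOY_LINKS, "Coffee", "Science")
print(path, "links used:", len(path) - 1)
```

A model playing the race does not see the whole graph, so it has to approximate this path one page at a time.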

What language model capabilities does this evaluate?

  • Multi-turn tool use
  • Long context (finding a "link" in a haystack)
  • Actions have consequences (a model can only go back if the current page links to the previous one)
  • Local decisions for global goals
  • In-distribution knowledge (every LLM has been pretrained on Wikipedia)
  • Out-of-distribution tasks (too many to train on all of them)
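
To make the multi-turn and "actions have consequences" points concrete, here is a minimal sketch of a race loop on a made-up graph. The `choose_link` callback stands in for a model call; `RACE_LINKS`, `run_race`, and both example policies are hypothetical and do not reflect how Wiki Arena is implemented.

```python
# Hypothetical toy graph. "Caffeine" does not link back to "Coffee",
# so a player who moves there cannot undo that step.
RACE_LINKS = {
    "Coffee": ["Caffeine", "Ethiopia"],
    "Caffeine": ["Chemistry"],
    "Ethiopia": ["Africa"],
    "Chemistry": ["Science"],
    "Africa": [],
    "Science": [],
}


def run_race(links, start, target, choose_link, max_steps=10):
    """Play one race. Returns (path, reached_target)."""
    current, path = start, [start]
    for _ in range(max_steps):
        if current == target:
            return path, True
        options = links.get(current, [])
        if not options:
            break  # dead end: no outgoing links
        choice = choose_link(current, options, target)
        if choice not in options:
            break  # illegal move: only links on the current page are allowed
        current = choice
        path.append(current)
    return path, current == target


def first_link(current, options, target):
    """Stand-in policy: always take the first link on the page."""
    return options[0]


def last_link(current, options, target):
    """Stand-in policy: always take the last link on the page."""
    return options[-1]


print(run_race(RACE_LINKS, "Coffee", "Science", first_link))
print(run_race(RACE_LINKS, "Coffee", "Science", last_link))
```

The second policy ends in a dead end after two moves, which shows how one early local choice can rule out the global goal.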

Currently this is a text-only benchmark, but we are planning to build a computer-use version.

7,000,000+ English Wikipedia articles
49,000,000,000,000+ unique tasks
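
The task count is consistent with counting every ordered (start, target) pair of distinct articles. That definition of a task is an assumption here, and `ordered_pairs` is a hypothetical helper used only to show the arithmetic.

```python
def ordered_pairs(n_articles):
    """Number of (start, target) pairs with start != target."""
    return n_articles * (n_articles - 1)


# With exactly 7,000,000 articles the count sits just under 49 trillion.
# The "+" on both figures reflects that the real article count is higher.
print(f"{ordered_pairs(7_000_000):,}")  # 48,999,993,000,000
```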

Open Source

This project would not exist without Wikipedia, so all code and data are available on GitHub.

Want to support this project?

We currently pay for all tokens ourselves, which gets expensive when running multiple models in parallel like this.

If you are a provider and want to sponsor the evaluation of your models (or increase our rate limits), please contact us.

Want to read more?

Visit our blog to read about the technical challenges we solved while building this.

Leaderboard

Coming Soon!

We need more pairwise comparisons to create a leaderboard.
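
One common way to turn pairwise results into a ranking is an Elo-style rating update. The page does not say which rating method the leaderboard will use, so the sketch below is only an illustration; `expected_score`, `update_elo`, the model names, and the constants are all hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(ratings, winner, loser, k=32.0):
    """Shift both ratings in place after one head-to-head race."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


# Hypothetical models, both starting at the same rating.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
update_elo(ratings, winner="model_a", loser="model_b")
print(ratings)
```

A single race moves each rating by at most `k` points, so a stable ordering only emerges after many comparisons between many pairs of models.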

How you can help

You can contribute by starting more games or sponsoring us.
