Race language models from one Wikipedia page to another
About Wiki Arena
An interactive evaluation for language models.
In a Wikipedia race, players start from the same page and compete on one of two objectives:
1. Get to the target page using the fewest links
2. Get to the target page as fast as possible
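To make the first objective concrete, here is a minimal sketch of how a "fewest links" baseline could be computed with breadth-first search. The `LINKS` graph and the `fewest_links` helper are hypothetical illustrations, not part of the Wiki Arena codebase; a real race would read links from live Wikipedia pages.

```python
from collections import deque

# Hypothetical toy link graph: page title -> titles it links to.
LINKS = {
    "Coffee": ["Caffeine", "Brazil"],
    "Caffeine": ["Chemistry"],
    "Brazil": ["Football", "Coffee"],
    "Chemistry": ["Nobel Prize"],
    "Football": ["Nobel Prize"],
    "Nobel Prize": [],
}

def fewest_links(links, start, target):
    """Return a shortest path of page titles from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        page = path[-1]
        if page == target:
            return path
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

path = fewest_links(LINKS, "Coffee", "Nobel Prize")
clicks = len(path) - 1  # number of links followed along the path
```

A player's path can then be compared against this baseline: the closer its click count is to the shortest path, the better it did on objective 1.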
What language model capabilities does this evaluate?
- Multi-turn tool use
- Long context (finding a "link" in a haystack)
- Actions have consequences (a model can only go back if the current page links to the previous one)
- Local decisions for global goals
- In-distribution knowledge (every LLM has been pretrained on Wikipedia)
- Out-of-distribution tasks (there are too many start and target pairs to train on all of them)
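The capabilities above can be pictured with a minimal sketch of one game loop. Everything here is an assumption made for illustration: `play_race`, the `choose_link` callback (standing in for a model's tool call), and the toy graph are hypothetical and do not describe how Wiki Arena is actually implemented.

```python
# Hypothetical toy link graph: "Stub" is a dead end with no way back.
LINKS = {
    "Coffee": ["Stub", "Caffeine"],
    "Stub": [],
    "Caffeine": ["Chemistry"],
    "Chemistry": ["Nobel Prize"],
    "Nobel Prize": [],
}

def play_race(links, start, target, choose_link, max_steps=10):
    """Run one game; return (path, reached_target)."""
    page = start
    path = [page]
    for _ in range(max_steps):
        if page == target:
            return path, True
        options = links.get(page, [])
        if not options:
            break  # dead end: no link leads anywhere, not even back
        choice = choose_link(page, options, target)
        if choice not in options:
            break  # in this sketch, an invalid move ends the game
        page = choice
        path.append(page)
    return path, page == target

# Two stand-in "players": one always takes the first link, one the last.
first_link = lambda page, options, target: options[0]
last_link = lambda page, options, target: options[-1]

stuck_path, stuck_ok = play_race(LINKS, "Coffee", "Nobel Prize", first_link)
good_path, good_ok = play_race(LINKS, "Coffee", "Nobel Prize", last_link)
```

The first player walks into "Stub" and cannot recover, which is the "actions have consequences" case; the second reaches the target in three clicks.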
Currently this is a text-only benchmark, but we plan to build a computer-use version.
Open Source
This project would not exist without Wikipedia, so all code and data are available on GitHub.
Want to support this project?
We currently pay for all tokens ourselves, which gets expensive when running multiple models in parallel like this.
If you are a provider and want to sponsor the evaluation of your models (or increase our rate limits), please contact us.
Want to read more?
Visit our blog to read about the technical challenges we solved while building this.
Coming Soon!
We need more pairwise comparisons to create a leaderboard.
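As a rough illustration of why more games matter, here is a sketch of how pairwise results could be turned into ratings with an Elo-style update. This is an assumption: the page does not say which rating method the leaderboard will use, and the model names, `k` factor, and starting rating below are hypothetical.

```python
def expected_score(r_a, r_b):
    """Probability that a player rated r_a beats one rated r_b under Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(ratings, winner, loser, k=32.0, initial=1000.0):
    """Update ratings in place after one pairwise comparison."""
    r_w = ratings.get(winner, initial)
    r_l = ratings.get(loser, initial)
    gain = k * (1.0 - expected_score(r_w, r_l))
    ratings[winner] = r_w + gain
    ratings[loser] = r_l - gain
    return ratings

# Hypothetical race results as (winner, loser) pairs.
games = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]

ratings = {}
for winner, loser in games:
    update_elo(ratings, winner, loser)

# Sort models from highest to lowest rating.
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
```

With only a handful of games, a single result moves each rating a lot, so the ordering stays noisy until many comparisons have accumulated.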
How you can help
You can contribute by starting more games or sponsoring us.