The 2018 Roller Derby World Cup’s main tournament was an exceptionally large event for a track sport, with 38 teams spanning every experience level, from world-leading to entirely new to the world stage. A number of constraints were placed on the design of the tournament: the minimum number of games played by any team should be as high as possible; teams should not be segregated into “tiers”; accurate ratings should be produced; and an elimination tournament should be seeded from the competition, with accurately selected opponents. With just 4 days and 4 tracks to fit all of this into, the approach used by the tournament was necessarily radical.

For a good introduction to the theory behind the tournament, read Frogmouth’s article here: https://medium.com/@frogmouth_inc/roller-derby-world-cup-math-explained-d971689d9bde

In this article, we’re going to look at how the tournament did in practice, and discuss both some improvements and some extensions to the results of the system, on the basis of what we learned.

## Overview

The aim of the 2018 RDWC tournament rating and scheduling system was to provide fair ratings for all 38 teams competing, across an expected strength difference on the order of 1:1000 between the top and bottom ends. We needed to provide fair ratings when teams were closely matched – but also when teams differed widely in strength.

- On balance, we believe that the rating and ranking system achieved this aim at least as well as any of the potential competitors would have. The final day bouts were all exceptionally close, and generally the top 16 were selected fairly, based on those performances. This is a significant technical improvement over RDWC2014.
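
To give a feel for why a 1:1000 strength range is manageable at all, here is a minimal sketch – a hypothetical illustration only, not the actual RDWC formula – of the standard trick of working on a logarithmic scale, where multiplicative strength ratios become additive rating differences:

```python
import math

# Hypothetical illustration (NOT the actual RDWC rating formula):
# if team "strength" is multiplicative -- a 1:1000 ratio between the
# strongest and weakest teams -- then a logarithmic rating scale makes
# rating differences additive and keeps every gap on a comparable scale.

def rating_from_strength(strength):
    """Map a multiplicative strength to an additive (log10-scale) rating."""
    return math.log10(strength)

def expected_score_ratio(rating_a, rating_b):
    """Score ratio A:B implied by a rating gap, under this assumed model."""
    return 10 ** (rating_a - rating_b)

top = rating_from_strength(1000.0)    # strongest team
bottom = rating_from_strength(1.0)    # weakest team

print(top - bottom)                        # a gap of 3.0 "orders of magnitude"
print(expected_score_ratio(top, bottom))  # implied 1000:1 score ratio
```

On a scale like this, a close bout and a blowout both move ratings by sensible amounts, which is one reason log-style models are a natural fit when strengths span several orders of magnitude.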

Additional aims of the scheduling system were to provide better matchings than a group scheme would, especially for teams newer to the tournament, with no “initial seeding” expectation.

- The system did fairly well at this, although the scheduler was hampered by additional restrictions (it also needed to set up an elimination tournament afterwards). Around 10 “poor matchings” were scheduled across Days 2 and 3; an 8-group scheme for 38 teams would have produced at least 16, so even with its constraints, the matcher provided better pairings than a group approach would have.
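
The intuition for why rating-based pairing beats fixed groups can be sketched with a simple Swiss-style pairer – again a toy illustration under assumed ratings, not the actual RDWC scheduler: sort teams by current rating and greedily pair neighbours, skipping rematches, so every bout is between teams of similar strength.

```python
# Toy Swiss-style pairer (NOT the actual RDWC scheduler): pair teams
# with their nearest neighbour by current rating, avoiding rematches.
# Fixed groups cannot do this, because group membership never adapts
# to what the results reveal about team strength.

def swiss_pair(ratings, played):
    """ratings: {team: rating}; played: set of frozenset({a, b}) past games."""
    order = sorted(ratings, key=ratings.get, reverse=True)
    pairs, used = [], set()
    for i, a in enumerate(order):
        if a in used:
            continue
        for b in order[i + 1:]:
            if b not in used and frozenset((a, b)) not in played:
                pairs.append((a, b))
                used.update((a, b))
                break
    return pairs

# Hypothetical ratings; A and B have already met, so A takes the
# next-closest available opponent instead of a rematch.
ratings = {"A": 1200, "B": 1150, "C": 900, "D": 880, "E": 400, "F": 390}
print(swiss_pair(ratings, played={frozenset(("A", "B"))}))
# -> [('A', 'C'), ('B', 'D'), ('E', 'F')]
```

A real scheduler has to juggle track assignments, rest times, and (as above) seeding an elimination bracket, but the core idea – pairings driven by up-to-date ratings rather than fixed groups – is what keeps the mismatch count down.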

Finally, the scheduler and rating system needed to provide as many games (as much track time) as possible for as many of the competitors present as possible, within a 4 day, 4 track schedule.

- From observation, it’s clear that this was achieved, *within* the restriction of needing a single-elimination-style tournament to occupy the entire final day. A version of the schedule without a single-elimination component could have given all teams (not just some) one additional “full length” game, shortened Days 2 and 3 by 2 hours each, and provided an opportunity for a “final” playoff between the top two teams at the end of Day 4.

The restrictions on the scheduler avoided all but one repeated game in the elimination phase (a pairing that was not close in the Day 1 ranking, and so could not have been predicted before Day 2 from the tournament data alone).

In the rest of this feature, we’ll work through the two components of the system, discuss how each performed, and look at some additional interesting results we can extract from the bouts played.

Page 2: Results and Ratings
Page 3: Pairings and Scheduling
Page 4: Additional Results
Page 5: Future Work and Improvements