Roller Derby and Simulation… Pt1

All of the roller derby rating systems I am aware of assign a single value to the “strength” of each team, whether it’s the official MRDA and WFTDA rankings and strength factors, the FlatTrackStats Elo rankings, or my own derived strength ratings.

This is partly because of the paucity of detailed data available for roller derby bouts. While all WFTDA-rules bouts tend to collect a wealth of data (scores per pass per jam, penalties, etc.), which can be used to derive useful stats, that data is almost never made public unless it’s a WFTDA-sanctioned bout (in which case it’s in the WFTDA Stats Repository, although that has also been missing some data since the 2016 changes to the StatsBook) or it’s one of the rare bouts to get a full statsbook uploaded to FlatTrackStats. As there are a lot more unsanctioned bouts than sanctioned ones, this means that for most bouts all we know is the final score.

However, scores give more information than most single-value ranking systems use. Most of them are concerned with either the score difference (how much team 1 beat team 2 by) or the score ratio (how many points team 1 scored for every point team 2 scored), or a derived metric related to them. Both of these schemes throw away additional “scale” information: a game where both teams produce high scores should be treated differently from one where both teams are held to low scores, for example.
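To make the lost “scale” information concrete, here is a quick sketch in Python. The scores are invented purely for illustration (they are not real bouts), and `summarise` is just a helper name made up for this example.

```python
def summarise(score_1, score_2):
    """Return (difference, ratio, total) for a final score."""
    return score_1 - score_2, score_1 / score_2, score_1 + score_2


# Invented example scores, not real bouts.
low_scoring = summarise(60, 40)
same_difference = summarise(300, 280)
same_ratio = summarise(300, 200)

for label, (diff, ratio, total) in [
    ("60:40", low_scoring),
    ("300:280", same_difference),
    ("300:200", same_ratio),
]:
    print(f"{label}: difference={diff}, ratio={ratio:.2f}, total={total}")
```

A difference-based system sees 60:40 and 300:280 as the same result, and a ratio-based one sees 60:40 and 300:200 as the same result, even though the total points on the board are very different in each case.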

There are score-based generic ranking systems for sports – the “Offense/Defence Method” attempts to assign “Offense” and “Defense” scores to teams, based on how highly they tend to score (and how low they keep their opponents’ scores) – but tests of this against Roller Derby ratings produce odd results.
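For reference, the Offense/Defence idea can be sketched in a few lines. This is a minimal version of the iteration as I understand it from the published Offense-Defense rating approach (a team’s offense is its points scored, weighted by how good each opponent’s defence is, and vice versa), and not necessarily the exact variant that was tested against derby data. The function name, the table layout and the three-team score table are all assumptions invented for illustration.

```python
def offense_defense(points, iterations=200):
    """Iterate offense/defence ratings from a table of points.

    points[i][j] = total points team j scored against team i
    (this layout is an assumption of the sketch).
    Higher offense is better; lower defence is better.
    Assumes every team has both scored and conceded at least once.
    """
    n = len(points)
    defense = [1.0] * n
    offense = [1.0] * n
    for _ in range(iterations):
        # Offense: points scored, weighted up when the opponent defends well.
        offense = [
            sum(points[i][j] / defense[i] for i in range(n)) for j in range(n)
        ]
        # Defence: points conceded, weighted down when the opponent attacks well.
        defense = [
            sum(points[i][j] / offense[j] for j in range(n)) for i in range(n)
        ]
    return offense, defense


# Invented three-team table: row = defending team, column = scoring team.
points = [
    [0, 100, 50],
    [200, 0, 80],
    [250, 150, 0],
]
offense, defense = offense_defense(points)
for team in range(3):
    print(f"team {team}: offense={offense[team]:.1f}, defence={defense[team]:.3f}")
```

In this made-up table, team 0 scores the most against everyone and concedes the least, so it should come out with the highest offense and the lowest (best) defence value.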

We suspect that this is because Roller Derby is actually quite an odd sport – unlike many sports, both teams can be scoring points at the same time; and (except in power jams) both teams definitely need to play both offense and defence simultaneously.

So, while it definitely makes sense to consider a dual-ranking for teams on Offense and Defence, it seems that our approach will need to be more Roller Derby specific.

Making a model.

When we consider making a model of a process, we usually start with the simplest, most abstract representation we can, and only add bits if the bits we removed seem to be significant.

Our plan with this model is to then run a huge number of simulations, for different combinations of offense/defense strengths for each team, to produce a “fingerprint” score for any given combination. By statistically analysing the bout records for the roller derby community, we should be able to produce an inferred offense and defence for every team that’s played enough bouts in a given period. (This is similar in approach to how the observation of gravitational waves by LIGO, the Laser Interferometer Gravitational-Wave Observatory – just, essentially, a “chirp of gravity” – was mapped to a specific black hole merger event. Simulating huge numbers of combinations of different astrophysical events gives the LIGO collaboration a library of “fingerprints” for those events, which they can then match to a given observed signal.)

Reducing Roller Derby to the most basic level, then: Roller Derby is all about the time it takes for a jammer to pass through barriers (the pack). The jammer who passes through the barrier fastest then has at least as much time as it takes the other jammer to follow to score points. (And they score points by… passing through barriers again.)

By thinking about Roller Derby this way, we can derive some basic results without even using statistics:

Firstly, we need to establish the limits on the number of jams (and the total track time) available to score in.

Under WFTDA/MRDA rules [MADE, RDCL, USARS are different], a bout consists of 2 periods of 30 minutes each. Jams are a maximum of 2 minutes long, and there’s 30 seconds between them.

Immediately, a few obvious things jump out. As that 30 seconds of non-skating time is constant for each jam, to maximise your track time [and thus scoring potential] you need fewer, longer jams rather than more, shorter ones. With 2 minute jams, 4/5 of the bout (2 minutes out of every 2 min 30) is spent on track; with, say, 30 second jams, only 50% of the bout actually has any skating in it.

So, the maximum amount of track time, if all jams went to 2 minutes, would be: 4/5 of 60 minutes, or 48 minutes. This naturally corresponds to the smallest number of possible jams, which is 60/(2.5) or 24 jams.

The minimum amount of track time is arbitrarily close to 0, in principle (as a jammer can get lead very quickly and call it). The hypothetical maximum number of “infinitesimal jams” in that case would be 60/0.5, or 120 jams. Of course, this realistically never happens – a more reasonable measure for a minimum jam length is probably 20 seconds, resulting in a track time of 2/5 of 60, or 24 minutes, and a max jams of 60/(5/6) or 72 jams.

(The average roller derby bout, from experience, seems to have a number of jams somewhere in the 40s, corresponding to an average jam length of around 1 minute or so*.)
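The jam-count arithmetic above is easy to check with a short Python sketch. It makes the same idealisation as the text: every jam is the same length, and timeouts or any other clock stoppages are ignored. The function and constant names are just ones I’ve made up here.

```python
BOUT_SECONDS = 2 * 30 * 60  # two 30-minute periods
LINEUP_SECONDS = 30         # gap between jams


def jam_limits(jam_seconds):
    """Return (number of jams, minutes on track) if every jam lasts jam_seconds."""
    jams = BOUT_SECONDS / (jam_seconds + LINEUP_SECONDS)
    return jams, jams * jam_seconds / 60


def average_jam_seconds(jams):
    """Invert the same relation: the average jam length implied by a jam count."""
    return BOUT_SECONDS / jams - LINEUP_SECONDS


for seconds in (120, 30, 20):
    jams, track_minutes = jam_limits(seconds)
    print(f"{seconds}s jams: {jams:.0f} jams, {track_minutes:.0f} minutes on track")

print(f"44 jams implies about {average_jam_seconds(44):.0f}s per jam")
```

With 120, 30 and 20 second jams this gives 24, 60 and 72 jams (48, 30 and 24 minutes on track), and a bout of 44 jams implies an average jam of a little under a minute.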


Expected scoring is harder to estimate. In one limit, assume one team is totally dominant over the other, so their jammers always take lead near-instantly and have the full 2 minutes to perform scoring passes at, say, a 27/5 rate (27 laps in 5 minutes, giving 10 passes and a partial one per jam, or, say, 54 points per jam). The total score would then be 54*24, or 1296:0.

The (famously) highest scoring bout in history to date was Victorian Roller Derby League/Chikos [FTS79263] with a score of 921:2, really not far off this maximum rate.
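As a rough check on that ceiling, here is the same arithmetic in Python. The 5 points per pass is my assumption, chosen because it is what turns 10.8 passes into the 54 points per jam quoted above, and the partial pass is counted pro rata; the constant names are made up for the sketch.

```python
from fractions import Fraction

LAPS_PER_MINUTE = Fraction(27, 5)  # the "27/5" rate
JAM_MINUTES = 2                    # every jam runs the full 2 minutes
POINTS_PER_PASS = 5                # assumption: implied by 54 points from 10.8 passes
JAMS = 24                          # fewest possible jams in a bout

passes_per_jam = LAPS_PER_MINUTE * JAM_MINUTES      # 54/5 = 10.8 passes
points_per_jam = passes_per_jam * POINTS_PER_PASS   # 54 points
ceiling = int(points_per_jam * JAMS)                # 1296 points

record = 921  # winning score of the highest scoring bout quoted above

print(f"{float(passes_per_jam)} passes per jam, {int(points_per_jam)} points per jam")
print(f"ceiling {ceiling}, record is {record / ceiling:.0%} of it")
```

On these assumptions the 921-point record comes to roughly 71% of the idealised maximum.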

Bouts this unbalanced rarely happen, though, so let’s also take the example of two very closely matched teams. We might expect these teams to trade lead jammer evenly between jams, but with the opposing jammer close behind, so each jam would result in a single scoring pass for one team. (Remember, we’re not doing statistics here, so this is an unrealistic average.)

In the case that both teams are very strong defensively, the limit would be 2 minute jams for 12 single pass scores each – and a total score of about 12*4 = 48 : 48.

In the case that both teams are very weak defensively, we’ll assume our 20 second jam minimum, for 36 single pass scores each – and a total score of around 36*4 = 144 : 144.
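Both of these limits come from the same “trade single passes” rule, which can be written as a one-line function. This is only a sketch of the idealised model above, with 4 points per pass as in the 12*4 and 36*4 sums; `trading_score` is a name made up for the example.

```python
POINTS_PER_PASS = 4  # one scoring pass, as in the 12*4 and 36*4 sums above


def trading_score(total_jams, points_per_pass=POINTS_PER_PASS):
    """Per-team score if the teams alternate single scoring passes, one per jam."""
    return (total_jams // 2) * points_per_pass


strong_defence = trading_score(24)  # every jam runs the full 2 minutes
weak_defence = trading_score(72)    # every jam is called after 20 seconds

print(f"strong defences: {strong_defence} : {strong_defence}")
print(f"weak defences:   {weak_defence} : {weak_defence}")
```

Any evenly matched bout in this model lands between these two results, depending only on how many jams are played.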

Thanks to Flat Track Stats, we can look at the recorded history of all roller derby bouts ever, and extract the closest bouts (which I’m defining as being within about 10 pts of each other) to compare. Looking back within the history of the current WFTDA ruleset, the highest scoring close bouts are actually pretty high!

The highest scoring close bout in the last year was a 273 : 260 game between “Wilkes-Barre/Scranton Roller Radicals” (300th WFTDA) v “Harrisburg Area Roller Derby” (297th WFTDA) [FTS67692]! Unfortunately, no bout stats were uploaded for this bout (a huge problem for Roller Derby statistics in general), so we can’t see how this happened in any detail.

The highest scoring close bout with actual detailed stats is the Ohio : Houston 1pt game from the Skate To Thrill 2016 tournament [FTS76742]. That was a 222:221 bout, running to a total of 44 jams (an average of 51 seconds per jam, and 37.4 minutes on track).

Within our simple “trading single passes” model, we’d have expected 44 jams to result in a score of 22*4 = 88 points per team – clearly the model underestimates the real scores by more than 50%. Part of this is because of power jams; there were 11 of them in this game, and most resulted in multiple scoring passes. However, the highest scoring jams weren’t power jams at all: they were instances where a large advantage in blocker numbers (due to blocker penalties) gave a huge offensive advantage to that team.
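To put a number on that gap, here is the comparison for this one bout in Python. It uses only the figures quoted above (44 jams, 222:221) plus the 4-points-per-pass trading assumption; the variable names are made up for the sketch.

```python
jams = 44
actual_scores = (222, 221)

# "Trading single passes": each team scores one 4-point pass in half the jams.
predicted = (jams // 2) * 4

# Fraction of each real score that the simple model fails to account for.
shortfalls = [1 - predicted / score for score in actual_scores]

print(f"model: {predicted} points per team ({predicted / jams:.1f} per jam)")
for score, shortfall in zip(actual_scores, shortfalls):
    print(f"actual: {score} ({score / jams:.1f} per jam), model is {shortfall:.0%} short")
```

The simple model accounts for only about 40% of each team’s real score here, so roughly 60% of the points came from something other than evenly traded single passes.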

Even compensating for swings due to penalties, though, we see that multiple passes are a bit more common than you might expect, and the variation in scoring is pretty wide per jam. (In fact, as the footnote* notes, points-per-jam in WFTDA Division 1 is around 8 to 9, or 2 passes, with close games closer to 6pts per jam!)

In order to model this more effectively, we’ll have to build a proper statistical model, which we’ll begin in the next post in this series….

* In fact, rollerderbynotes already did this for us, for many years running. In WFTDA high level play, at least, there’s been solidly 43 to 44 jams per bout for the last several years, on average. (To look ahead to the rest of our discussion, Windyman also determines the Points-per-Jam metric for each year’s playoffs, which usually rests around the 8 to 9 mark – or about 2 passes per jam, on average.)


About aoanla

Aoanla is a physicist/systems support guy for the UK bit of the LHC experiment at CERN in real life, and therefore already had some experience in looking at high-speed collisions before getting into roller derby. He writes bout reports for the bouts he turns up to on his own blog, but is now planning on writing articles and bugging people for interviews here, too.