Ranking mechanisms from sport and Roller Derby

Rankings

With so many rankings to compare, it’s difficult to display all of them in their entirety. The full output is available at [link], for those who are interested.

In the interests of space, we’ll look primarily at the top 10 places predicted by each method. Firstly, working with the full set of all teams who are connected by at least one game:

| Rank | Offense/Defense | Massey(diff) | Massey(ratio) |
|---|---|---|---|
| 1 | London Rollergirls | Gotham Girls Roller Derby | Gotham Girls Roller Derby |
| 2 | Victorian Roller Derby League | Victorian Roller Derby League | Victorian Roller Derby League |
| 3 | Gotham Girls Roller Derby | Rose City Rollers | Angel City Derby Girls |
| 4 | Angel City Derby Girls | Angel City Derby Girls | London Rollergirls |
| 5 | Rose City Rollers | London Rollergirls | Rose City Rollers |
| 6 | Bruise Crew | Texas Rollergirls | Texas Rollergirls |
| 7 | Crime City Rollers | Jacksonville Roller Girls | Jacksonville Roller Girls |
| 8 | Atlanta Rollergirls | Arch Rival Roller Derby | Arch Rival Roller Derby |
| 9 | Jacksonville Roller Girls | Atlanta Rollergirls | Atlanta Rollergirls |
| 10 | Arch Rival Roller Derby | Rat City Roller Girls | Rat City Roller Girls |

| Rank | Massey(Keener) | PVD(diff) | PVD(ratio) |
|---|---|---|---|
| 1 | Gotham Girls Roller Derby | Gotham Girls Roller Derby | Victorian Roller Derby League |
| 2 | Victorian Roller Derby League | Victorian Roller Derby League | Angel City Derby Girls |
| 3 | Angel City Derby Girls | Rose City Rollers | London Rollergirls |
| 4 | London Rollergirls | Angel City Derby Girls | Gotham Girls Roller Derby |
| 5 | Rose City Rollers | London Rollergirls | Rose City Rollers |
| 6 | Texas Rollergirls | Rat City Roller Girls | Rat City Roller Girls |
| 7 | Jacksonville Roller Girls | Arch Rival Roller Derby | Minnesota RollerGirls |
| 8 | Arch Rival Roller Derby | Crime City Rollers | Texas Rollergirls |
| 9 | Atlanta Rollergirls | Dallas Derby Devils | Denver Roller Derby |
| 10 | Minnesota RollerGirls | Montreal Roller Derby | Arch Rival Roller Derby |

Keener Spring(diff) Spring(ratio)
Victorian Roller Derby League
Gotham Girls Roller Derby
Rose City Rollers
London Rollergirls
Angel City Derby Girls
Arch Rival Roller Derby
Rat City Roller Girls
Texas Rollergirls
Minnesota RollerGirls
Jacksonville Roller Girls
Victorian Roller Derby League
Gotham Girls Roller Derby
Angel City Derby Girls
Rose City Rollers
London Rollergirls
Rat City Roller Girls
Minnesota RollerGirls
Arch Rival Roller Derby
Texas Rollergirls
Jacksonville Roller Girls
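
The restriction to “teams who are connected by at least one game” used for the rankings above can be sketched as a graph search. This is only an illustrative sketch, not the code from our repo: the function name `largest_connected_group` and the `(team, team)` tuple format for games are assumptions made for the example.

```python
from collections import defaultdict, deque


def largest_connected_group(games):
    """Return the largest set of teams linked, directly or indirectly, by games.

    `games` is an iterable of (team_a, team_b) pairs (a hypothetical format).
    """
    # Build an undirected graph: an edge joins any two teams that have played.
    neighbours = defaultdict(set)
    for team_a, team_b in games:
        neighbours[team_a].add(team_b)
        neighbours[team_b].add(team_a)

    seen = set()
    best = set()
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search collects every team reachable from `start`.
        group = {start}
        queue = deque([start])
        while queue:
            team = queue.popleft()
            for other in neighbours[team]:
                if other not in group:
                    group.add(other)
                    queue.append(other)
        seen |= group
        if len(group) > len(best):
            best = group
    return best


# Made-up fixtures: A-D form one connected group, E and F another.
games = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
print(sorted(largest_connected_group(games)))
```

Only teams inside one connected group can be meaningfully ranked against each other, since there is no chain of results linking them to anyone outside it.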

As can be seen, some of the ranking mechanisms produce more “reasonable” results than others. Allowing such a huge variation in ability into the ranking calculations causes instabilities that the models are simply not designed to handle.

Restricting our algorithms to consider only the Top 40 teams in the first place, we get the following Top 10s:

| Rank | Offense/Defense | Massey(diff) | Massey(ratio) |
|---|---|---|---|
| 1 | Gotham Girls Roller Derby | Gotham Girls Roller Derby | Gotham Girls Roller Derby |
| 2 | Victorian Roller Derby League | Victorian Roller Derby League | Victorian Roller Derby League |
| 3 | London Rollergirls | London Rollergirls | London Rollergirls |
| 4 | Angel City Derby Girls | Rose City Rollers | Angel City Derby Girls |
| 5 | Rose City Rollers | Angel City Derby Girls | Rose City Rollers |
| 6 | Arch Rival Roller Derby | Texas Rollergirls | Jacksonville Roller Girls |
| 7 | Crime City Rollers | Arch Rival Roller Derby | Texas Rollergirls |
| 8 | Texas Rollergirls | Jacksonville Roller Girls | Arch Rival Roller Derby |
| 9 | Jacksonville Roller Girls | Minnesota RollerGirls | Minnesota RollerGirls |
| 10 | Denver Roller Derby | Rat City Roller Girls | Rat City Roller Girls |

| Rank | Massey(Keener) | PVD(diff) | PVD(ratio) |
|---|---|---|---|
| 1 | Gotham Girls Roller Derby | Angel City Derby Girls | Victorian Roller Derby League |
| 2 | Victorian Roller Derby League | Victorian Roller Derby League | Angel City Derby Girls |
| 3 | London Rollergirls | Minnesota RollerGirls | Minnesota RollerGirls |
| 4 | Angel City Derby Girls | Montreal Roller Derby | Montreal Roller Derby |
| 5 | Rose City Rollers | Arch Rival Roller Derby | Rat City Roller Girls |
| 6 | Texas Rollergirls | Rat City Roller Girls | Crime City Rollers |
| 7 | Jacksonville Roller Girls | Rose City Rollers | Texas Rollergirls |
| 8 | Arch Rival Roller Derby | Gotham Girls Roller Derby | Arch Rival Roller Derby |
| 9 | Minnesota RollerGirls | London Rollergirls | Rose City Rollers |
| 10 | Rat City Roller Girls | Texas Rollergirls | Gotham Girls Roller Derby |

Keener Spring(diff) Spring(ratio)
Victorian Roller Derby League
Gotham Girls Roller Derby
Rose City Rollers
London Rollergirls
Angel City Derby Girls
Arch Rival Roller Derby
Rat City Roller Girls
Texas Rollergirls
Minnesota RollerGirls
Jacksonville Roller Girls
Victorian Roller Derby League
Gotham Girls Roller Derby
London Rollergirls
Rose City Rollers
Angel City Derby Girls
Minnesota RollerGirls
Rat City Roller Girls
Arch Rival Roller Derby
Texas Rollergirls
Jacksonville Roller Girls

Tests

It’s also important to validate the results of your ranking (regardless of how good they look to you). We validated our rankings by comparing their predicted scores for bouts in the 2 months after the ranking period against the actual results in the FTS dataset. It’s worth noting that most reviews of prediction systems for other sports report “best prediction” rates of less than 80% (and that purely on getting wins v losses right)*.

We present two metrics: the absolute win prediction rate (“how many wins were correctly predicted”) and the root-mean-squared (RMS) score prediction error (“how far off the predicted score we were, on average”). These are quite different, especially for closely ranked teams: you might get a win wrong, but only because the game was close and your predicted scores were slightly off; equally, you might get a win right but get the predicted scores totally wrong (just in the “safe” direction). We also note that the PVD results don’t seem to match the metric they were optimised for, so their RMS errors are particularly large given their overall win prediction rates.
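
To make the difference between the two metrics concrete, here is a minimal sketch of how they could be computed for a difference-based metric. The function names and the sample margins are invented for illustration and are not taken from our repo; counting a predicted or actual tie as a miss is also an assumption.

```python
import math


def win_prediction_rate(predicted, actual):
    """Fraction of bouts where the predicted winner matches the actual winner.

    Both lists hold score margins (team A's score minus team B's score).
    Assumption: a predicted or actual margin of exactly 0 counts as a miss.
    """
    correct = sum(1 for p, a in zip(predicted, actual) if p * a > 0)
    return correct / len(actual)


def rms_error(predicted, actual):
    """Root-mean-squared difference between predicted and actual margins."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up margins for four bouts.
predicted = [30, -10, 5, 80]
actual = [45, 12, 2, 20]

print(f"win prediction rate: {win_prediction_rate(predicted, actual):.0%}")
print(f"RMS error: {rms_error(predicted, actual):.1f} points")
```

In this made-up sample the second bout is a wrong win call from a fairly close game, while the fourth is a correct call whose margin is 60 points out, so it dominates the RMS error without hurting the win rate at all.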

Again, starting with the prediction success for the entire largest connected field (several hundred teams over the sampled period), we see that the Massey methods do the best of the “traditional” methods, achieving a comfortable 75% accuracy, which is considered pretty good for such a large field.
The pure Keener method is terrible, managing worse than chance at predicting winners, although the PVD methods (which are, in some senses, improvements on Keener) do fairly well by extending the metric used.
Our novel Spring methods do the best of the field, with success rates above 81%, almost unaffected by the core metric (score difference or score ratio).

*RMS Error is in different units between diff and ratio based metrics, and is only comparable within a type

| | Offense/Defense | Massey(diff) | Massey(ratio) | Massey(Keener) |
|---|---|---|---|---|
| Correct % | 66.7% | 75.3% | 74.7% | 75.7% |
| RMS Error* | 0.93 | 108 | 0.991 | 0.931 |

| | Keener | PVD(diff) | PVD(ratio) | Spring(diff) | Spring(ratio) |
|---|---|---|---|---|---|
| Correct % | 43.8% | 70.0% | 71.6% | 82.2% | 81.3% |
| RMS Error* | 1.00 | 1400 | 9.66 | 84.5 | 0.611 |

For just the WFTDA Top 40 teams for the time period, our metrics perform, on the whole, a little better (as expected). The PVD methods, surprisingly, seem to suffer slightly, although the random components of their mechanism (the choice of the teams to use as “pivots” for their expansion of the metric) may explain this variation.
Notably, the Spring methods are tied at an extremely high win prediction rate of 89.6%, which is exceptional for purely score-based prediction.

*RMS Error is in different units between diff and ratio based metrics, and is only comparable within a type

| | Offense/Defense | Massey(diff) | Massey(ratio) | Massey(Keener) |
|---|---|---|---|---|
| Correct % | 73.5% | 87.7% | 78.9% | 78.9% |
| RMS Error* | 0.60 | 67.8 | 0.742 | 0.708 |

| | Keener | PVD(diff) | PVD(ratio) | Spring(diff) | Spring(ratio) |
|---|---|---|---|---|---|
| Correct % | 44.6% | 68.4% | 68.4% | 89.6% | 89.6% |
| RMS Error* | 1.16 | 1490 | 9.40 | 44.3 | 0.313 |

Conclusions

As can be seen, many of the ranking schemes tested from other sports achieve some success at prediction when applied to Roller Derby. As we’ve held for some time, the Massey methods are the best of these, attaining as high as 88% accuracy for the Top 40 circle of WFTDA leagues.
Our Spring-based method improves on this, introducing a natural physical representation for the reliability of results, and a well-supported way of accounting for the effect of time on how well a sample reflects the current strength of a team. The result is a noticeable improvement even on the Massey methods, which struggle to cope with rapidly changing team performance over time.


*Some of this is due to inherent randomness in the sport, of course – it’s not always the case that the “best” team, statistically, wins, especially if the two teams are closely matched.
Generally, however, absolute prediction rates of 73% to the low 80%s are considered gold-standard in the typical Sports Prediction literature, regardless of the sport.

[1] WFTDA Rankings
[2] DerbyMath
[3] FTS ranking scheme
[4] Toposort SRD [Also available in code in ranking git repo]
[5] Massey/LSQ SRD [Also available in code in ranking git repo]
[6] Bayesian SRD [Also available in code in ranking git repo]
[7] See, e.g., http://netprophetblog.blogspot.co.uk/ which has explored many more ranking schemes in this context.
[8] Govan, Anjela Y. et al.: “Offense-Defense Approach to Ranking Team Sports”, Journal of Quantitative Analysis in Sports, Vol. 5, Issue 1, Article 4, 2009
[9] Massey: http://www.masseyratings.com/theory/massey97.pdf
[10] Keener: DOI:10.1137/1035004
[11] PV ranking: https://codeandfootball.wordpress.com/2011/04/10/the-simple-ranking-system-a-perl-implementation/

As always, code for performing most of the work in this article is available in our GitHub repo, at: https://github.com/aoanla/ranking-chain-inference
The various experimental rankings are in an Experimental subdirectory, with subdirectories per system.

In accordance with our previous policy, this article is licensed CC BY-NC-SA 4.0 (Creative Commons 4.0 – all contents of article may be freely reused and redistributed, and used in new works, as long as attribution is given, the new work is also licensed the same way, and the new work is not for Commercial use.)

About aoanla

Aoanla is a physicist/systems support guy for the UK bit of the LHC experiment at CERN in real life, and therefore already had some experience in looking at high-speed collisions before getting into roller derby. He writes bout reports for the bouts he turns up to on his own blog, but is now planning on writing articles and bugging people for interviews here, too.