## A guide to reading those Bayesian ranking charts.

They were intended to provide a useful visual reference for skill distribution (something I’ve personally wanted a source of for several years), but it seems that my Bayesian probability distributions for team skill are more a source of confusion.

So, here’s a quick guide on how to read them.

We’ll work with this chart:

Representative figure (European Smackdown relative skills wrt London Rollergirls).

As you can see, there are multiple coloured shapes on the graph. Each shape represents a given team – the key lets you see that purple/lilac is Malmö, for example, and green is Glasgow (the others are all overlapping, so we’ll work with the two clearer ones).

On the x (horizontal) axis of the graph we’re measuring “points that London would score relative to this team” – so the position marked 4 represents a strength such that if that team scored 50 points, London would be expected to score 200.
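To make the x axis concrete, here is a minimal sketch of the arithmetic in Python. The function name `expected_london_score` is made up for this illustration, and it assumes the x value is a simple score ratio, as in the 50-to-200 example above.

```python
def expected_london_score(relative_strength: float, opponent_points: float) -> float:
    """Points London would be expected to score, given an x-axis value.

    Assumes the x value is a plain multiplier on the other team's score
    (illustrative helper, not part of any ranking code).
    """
    return relative_strength * opponent_points


# The example from the text: position 4 on the x axis, opponent scores 50.
print(expected_london_score(4, 50))  # 200
```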

On the vertical (y) axis, we’re measuring “probability”. So, the taller a rectangle is, the more probable it is that the strength it sits on is the true strength of the team.

If we had perfect knowledge, then each team’s shape would be a single thin spike located directly on the x axis at their actual strength relative to London. However, we don’t have perfect knowledge – hence the need for Bayesian inference in the first place – we just have the results of games played between the teams.  [The thin spikes we show on this graph are the centres of FlatTrackStats’ predictions for each team.]

So, instead of a single thin spike, each team is represented by a roughly-bell-shaped curve (they’re only roughly bell shaped for several reasons, which we won’t go into here). The highest point on the shape for each team represents the centre of the “most likely” true strength of the team – for Malmö this is somewhere between 2.5 and 3, complicated by the double-peak and flat top; for Glasgow it’s around 40 to 50. But we can’t rule out their strength being different from that, so the curve extends out, dropping in height in proportion to the probability of the true strength being in each range – there’s a much lower probability of Malmö having a strength of 2 relative to London, and the existence of colour up to the 1 mark suggests a tiny possibility that they might have been as strong as London (and having a really bad day).

Strictly, we can only really talk about the probability of a range of values of strength for any team, which is proportional to the area under that team’s shape within that range. For example, roughly half of the area for Malmö’s curve lies between 1 and 2.75, so there’s a 50% chance their strength is in that range. Because of the way the curve peaks, there’s also a 50% chance that their strength is in the narrower range 2.25 to 3.25 or so, as the curve is much higher there than it is at the edges.
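If you like to see that as code, here is a rough sketch of the “area in a range” idea. The bin edges and heights are invented numbers, not read off the real chart, and `probability_in_range` is a hypothetical helper written just for this post.

```python
def probability_in_range(bin_edges, heights, low, high):
    """Fraction of the total area under a histogram lying between low and high."""
    total = 0.0
    inside = 0.0
    for left, right, height in zip(bin_edges[:-1], bin_edges[1:], heights):
        total += (right - left) * height
        # Width of the part of this bin that falls inside [low, high].
        overlap = max(0.0, min(right, high) - max(left, low))
        inside += overlap * height
    return inside / total


# Made-up curve: low at the edges, tall in the middle.
edges = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
heights = [1, 2, 6, 6, 2, 1]

wide = probability_in_range(edges, heights, 1.0, 2.5)        # low edge up to the peak
narrow = probability_in_range(edges, heights, 2.125, 2.875)  # tight band around the peak
print(wide, narrow)  # both come out at 0.5
```

With these invented numbers, a band 1.5 wide starting at the low edge and a band only 0.75 wide around the peak hold the same half of the area, which is the same effect as the two Malmö ranges above.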

(For the same reason, a very strongly peaked curve indicates a team where there’s a lot more confidence in their strength being near the “top of the peak”, while a more shallowly sloping curve suggests a lack of certainty about the true position of their strength.)
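One way to put a number on “strongly peaked” versus “shallowly sloping” is the standard deviation of the curve. This is only a sketch with invented heights, and the spread measure is my choice for illustration here rather than anything the charts themselves report.

```python
import math


def histogram_spread(bin_centres, heights):
    """Standard deviation of a histogram, treating heights as weights."""
    total = sum(heights)
    mean = sum(c * h for c, h in zip(bin_centres, heights)) / total
    variance = sum(h * (c - mean) ** 2 for c, h in zip(bin_centres, heights)) / total
    return math.sqrt(variance)


centres = [2.0, 2.5, 3.0, 3.5, 4.0]
peaked = [0, 1, 8, 1, 0]   # most of the area piled up at 3.0
shallow = [2, 2, 2, 2, 2]  # area spread evenly across the range

# The peaked curve gives the smaller spread, i.e. more confidence.
print(histogram_spread(centres, peaked) < histogram_spread(centres, shallow))  # True
```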

This also lets us judge visually whether a team is “significantly” better or worse than another team. If their curves overlap significantly, especially near the peaks, then this is an indication that there’s not enough data to prove that they have different strengths. This is what we mean by “statistically inseparable” – it’s certainly possible that the two teams have different strengths (they can have any strength within their curve, with a given likelihood), but the available data does not allow us to demonstrate this.
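For a rough numerical version of “how much do two curves overlap”, here is a small sketch. It assumes both teams’ curves are stored as heights over the same equal-width bins, and it uses the shared area after normalising each curve to a total of 1. That is just one possible overlap measure, picked for illustration. It is not a formal significance test, and the heights are invented.

```python
def overlap_fraction(heights_a, heights_b):
    """Shared area of two histograms on identical bins, each normalised to total 1.

    Returns 1.0 for identical curves and 0.0 for curves that never overlap.
    """
    total_a = sum(heights_a)
    total_b = sum(heights_b)
    return sum(min(a / total_a, b / total_b) for a, b in zip(heights_a, heights_b))


team_a = [1, 4, 4, 1, 0, 0]  # invented curve, peaked to the left
team_b = [0, 0, 1, 4, 4, 1]  # invented curve, peaked to the right

# Only the tails meet, so about a fifth of each curve is shared.
print(overlap_fraction(team_a, team_b))
```

A value near 1 corresponds to the “statistically inseparable” case; a value near 0 means the curves barely touch.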