The 2018 Roller Derby World Cup: Rating and Pairing, a postmortem

The 2018 Roller Derby World Cup’s main tournament was an exceptionally large event for a track sport, with 38 teams across all experience levels, from world-leading to entirely new to the world stage. With a number of constraints placed on the design of the tournament (the minimum number of games played by a team should be as high as possible, teams should not be segregated into “tiers”, accurate ratings should be produced, an elimination tournament should be seeded from the competitors, with accurately selected opponents…), and just 4 days and 4 tracks to fit it into, the tournament necessarily took a radical approach.

Those who want a good introduction to the theory of the tournament should go read Frogmouth’s article here: https://medium.com/@frogmouth_inc/roller-derby-world-cup-math-explained-d971689d9bde

In this article, we’re going to look at how the tournament did in practice, and discuss both some improvements, and some extensions to the results of the system on the basis of what we learned.


Overview

The aim of the 2018 RDWC tournament rating and scheduling system was to provide fair ratings for all 38 teams competing, across an expected strength difference on the order of 1:1000 between the top and bottom ends. We needed to provide fair ratings when teams were closely matched – but also when teams were widely differing in strength.

  • On balance, we believe that the rating and ranking system achieved this aim at least as well as any of the potential competitors would have. The final day bouts were all exceptionally close, and generally the top 16 were selected fairly, based on those performances. This is a significant technical improvement over RDWC2014.

Additional aims of the scheduling system were to provide better matchings than a group scheme would, especially for newer teams to the tournament, with no “initial seeding” expectation.

  • The system did fairly well at this, although the scheduler was hampered by additional restrictions (it also needed to seed the elimination tournament that followed). Around 10 “poor matchings” were scheduled on Days 2 and 3; compared to an 8-group selection over 38 teams, you would have expected at least 16, so even with constraints the matcher provided better pairings than a group approach would have.

Finally, the scheduler and rating system needed to provide as many games (as much track time) as possible for as many of the competitors present as possible, within a 4 day, 4 track schedule.

  • From observation, it’s clear that this was achieved, within the restriction of needing a single-elimination style tournament to occupy the entire final day. A version of the schedule without a single-elimination component could have given all teams (not just some) one additional “full length” game, shortened Days 2 and 3 by 2 hours each, and provided an opportunity for a “final” playoff between the top two teams at the end of Day 4.
    The restriction on the scheduler avoided all but one repeated game in the elimination phase (a pairing which was not close in the Day 1 ranking, and so could not have been predicted before Day 2 with just the tournament data).

In the rest of this feature, we’ll work through the two components of the system, and discuss how they did, and some additional interesting results we can extract from the bouts played.

Page 2: Results and Ratings ; Page 3: Pairings and Scheduling ; Page 4: Additional Results ; Page 5: Future Work and Improvements


Ranking mechanisms from sport and Roller Derby

While Roller Derby, relative to most other sports, still has a huge problem with the availability of public statistics, there are quite a few ranking systems around that rank teams on score.

Most of them, however, draw from WFTDA’s own points-based ranking scheme[1][2], with the sole exception being Flat Track Stats’ Elo-based ranking[3]. (The Scottish Roller Derby ranking mechanisms[4][5][6] are also totally different to WFTDA’s, but we don’t believe that we have much mindshare.)

Meanwhile, there’s a whole field of research in statistical ranking approaches for sports[7]; partly because of the financial importance of betting in sports, and partly because of the widespread availability of public datasets for the popular American college Football, Basketball and Baseball leagues, which makes testing models easy.

Of course, as these models are trained against sports like American Football, it’s not obvious that they will work as well against Roller Derby – both because the level of “randomness” differs between sports, and because Roller Derby’s mechanics are fairly unusual*.

As always, we’ll be attempting to rank as large a number of teams as possible, rather than limiting ourselves to WFTDA members, or members of higher or lower brackets. This makes it particularly hard for any ranking system, as the skill disparity between the lowest and highest ranks is particularly large (and those teams have no competitors in common). We’ll also, for comparison, run the same algorithms against the WFTDA Top 40 teams [calculated as of 30 April 2016], to give them an easier run.

The ranking models we will be covering are the “Offense/Defense Model” of Govan[8], extended Massey rankings [9] and Keener rankings[10], Principal value decomposition[11], and a novel physically inspired spring model which we introduce in this article.

As a brief overview:
Govan’s Offense/Defense model assumes that teams have two ratings – “offense” (ability to score) and “defense” (ability to reduce opponent score) – which it attempts to calculate on the basis of an iterative series of estimations. This turns out to be mathematically equivalent to performing a matrix balancing procedure via the Sinkhorn-Knopp theorem. Uniquely amongst the rating schemes here, the ODM model is concerned with the actual scores produced by both teams in a game, rather than score differences or ratios.
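To make the iteration concrete, here is a minimal Python sketch in the spirit of the ODM (the matrix orientation, fixed iteration count and starting values are our own assumptions for illustration, not the exact implementation used):

```python
import numpy as np

def offense_defense(A, iters=100):
    """Iterative Offense/Defense ratings in the spirit of Govan's ODM (sketch).

    A[i, j] = total points team j scored against team i, so row i holds the
    points team i conceded.  Assumes every team has scored and conceded at
    least once.  An overall rating is commonly taken as offense / defense.
    """
    n = A.shape[0]
    o, d = np.ones(n), np.ones(n)
    for _ in range(iters):
        o = A.T @ (1.0 / d)   # points scored, weighted by how leaky each opponent is
        d = A @ (1.0 / o)     # points conceded, weighted by opponents' scoring power
    return o, d
```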
Massey’s Least Squares Ranking approach has been covered in the blog before, as it is the basis of the simpler of the two current SRD rankings.
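For completeness, a minimal least-squares sketch (the zero-sum constraint used to pin down the otherwise singular system is the standard trick; the details here are illustrative rather than the exact SRD code):

```python
import numpy as np

def massey_ratings(bouts, n_teams):
    """Massey (least squares) ratings sketch.

    bouts: list of (i, j, margin) tuples, where margin is the chosen metric
    for team i over team j (score difference, or log score ratio).
    """
    M = np.zeros((n_teams, n_teams))
    p = np.zeros(n_teams)
    for i, j, margin in bouts:
        M[i, i] += 1
        M[j, j] += 1
        M[i, j] -= 1
        M[j, i] -= 1
        p[i] += margin
        p[j] -= margin
    # Massey's system is singular; replace one equation with "ratings sum to zero".
    M[-1, :] = 1.0
    p[-1] = 0.0
    return np.linalg.solve(M, p)
```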
Keener’s paper actually covers multiple ranking methods which he had experimented with. However, the most commonly attributed ranking mechanism uses a modified score ratio [( score + n) / (total score + 2n)  where n can be chosen] to form a matrix of values. The Perron vector (eigenvector with the largest eigenvalue) for this matrix is then taken to be the ranking of the teams. This is therefore a special case of the PVD case we discuss next.
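In code, the Perron vector can be found with simple power iteration; this sketch assumes a score matrix S with S[i, j] = points team i scored against team j, and leaves entries for unplayed pairings at zero:

```python
import numpy as np

def keener_ratings(S, n=4, iters=200):
    """Keener-style ratings via the Perron vector of the smoothed ratio matrix (sketch)."""
    total = S + S.T
    played = total > 0
    K = np.zeros(S.shape, dtype=float)
    K[played] = (S[played] + n) / (total[played] + 2 * n)
    r = np.full(S.shape[0], 1.0 / S.shape[0])
    for _ in range(iters):
        r = K @ r
        r /= r.sum()          # power iteration converges to the Perron vector
    return r
```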
Principal Value Decomposition methods are also matrix mathematics approaches, based on the concept that the largest eigenvalue of a matrix of observations corresponds to the most significant signal in those observations – so its associated eigenvector must give the most significant ranking values for the underlying variables (the teams). In our implementation, we borrow from a multidimensional graph layout algorithm to fill out our matrix of results more completely first.
The novel Spring ranking mechanism uses a physically inspired approach to ranking teams. Model each team as a “puck” on a frictionless rod, connected to other pucks by springs which represent bouts played. Each spring has a relaxed length corresponding to the score ratio or score difference (or other metric) for the bout it represents, and a stiffness (resistance to compression and stretching) proportional to the recency of the bout (older bouts provide less resistance). We perform an optimisation to minimise the total energy of the spring system, and return the resulting positions of the pucks on the rod as their ranking (relative to the topmost puck). Unlike the other ranking mechanisms, this approach has a natural way to represent the different confidence/significance of a bout, due to age or other factors.
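A minimal sketch of the spring idea, using a generic optimiser (the tuple layout and the choice of optimiser are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import minimize

def spring_rankings(bouts, n_teams):
    """Spring-model positions (sketch).

    bouts: list of (i, j, rest, stiffness) tuples, where rest is the bout
    metric (e.g. log score ratio of team i over team j) and stiffness
    shrinks with the bout's age.
    """
    def energy(x):
        # total elastic energy of the spring system
        return sum(k * (x[i] - x[j] - rest) ** 2 for i, j, rest, k in bouts)

    result = minimize(energy, np.zeros(n_teams))
    x = result.x
    return x - x.max()   # positions relative to the topmost "puck"
```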

Massey, PVD and Spring ranking schemes all require a choice of metric to use for a bout. Conventionally, you could use Score Difference, or the logarithm of Score Ratio [not pure Score Ratio, as we need additive quantities]. We test both, as well as a modified “Keener” Ratio derived from his approach, with n chosen as 4‡.
As we’re sampling over a long period of time, Offense/Defence, Massey, Keener and PVD rankings can also have “aging” parameters added to reduce the significance of older bouts to their calculation. This is somewhat ad hoc, but we perform an optimisation process¶ in order to get the best value for this in terms of prediction.
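For concreteness, the per-bout metrics (and one plausible, assumed form for an aging weight – not necessarily the exact scheme we optimised) might look like:

```python
import math

def bout_metrics(score_a, score_b, n=4):
    """The per-bout metrics discussed above (scores must be positive for the ratio)."""
    return {
        "difference": score_a - score_b,
        "log_ratio": math.log(score_a / score_b),
        "keener": (score_a + n) / (score_a + score_b + 2 * n),
    }

def age_weight(days_old, half_life=180.0):
    """One possible aging scheme (assumed): halve a bout's weight every half_life days."""
    return 0.5 ** (days_old / half_life)
```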

We ran the rankings on data derived from the Flat Track Stats bout database, sampling all bouts from the largest connected group of teams playing in the period [06 December 2015] to [06 May 2016]. (And also on the WFTDA Top 40, which is a subset of this group, for the same period – we have to extend our limits back to November to get a fully connected group here, due to the particular scheduling arrangements for T1 level teams!)
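Finding that largest connected group is a straightforward graph operation; a sketch (using networkx purely for convenience – any union-find would do):

```python
import networkx as nx

def largest_connected_group(bouts):
    """Return the set of teams in the largest connected component of the bout graph.

    bouts: iterable of (team_a, team_b) pairs from the sampling window.
    """
    g = nx.Graph()
    g.add_edges_from(bouts)
    return max(nx.connected_components(g), key=len)
```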

(The rankings and test results are on Page 2)

 


*Differences between roller derby and football include the fact that Roller Derby allows both teams to score simultaneously, includes simultaneous offense and defence, and allows rounds of scoring to be called off asymmetrically by the “dominant” scorer.

In more technical terms, all ranking systems do better the more fully-connected the graph of team games is. (That is: the more games each team has played with other teams.) The silo-ized nature of roller derby communities means that the widest possible graph for any sampling period also has those silos connected by only one or two bouts, meaning that the stability of ranking between those silos is very low – if that game was anomalous, then it affects the relative rankings of many other teams who never played each other directly.

‡The value of n here was optimised, approximately, from tests against the Spring model, but handily also matches up with the value of a single pass, which is nice for theoretical reasons. As n increases, approximately, the relative value of a massive blowout is reduced (and the difference between scoring 0 points and 1 point is also reduced). Approximately, n interpolates between “score difference” and “score ratio” measures of importance.

¶Essentially, this is an iterative optimisation on the aging parameter based on the accuracy of predictions on the test dataset mentioned later. In other words, we optimise for forward predictive accuracy, specifically. A similar method was used to derive our optimal “home advantage” factor of 0.96.


Roller Derby and Simulation… Pt1

All of the roller derby rating systems I am aware of assign a single value to the “strength” of each team, whether it’s the official MRDA and WFTDA rankings and strength factors, the FlatTrackStats Elo rankings, or my own derived strength ratings.

This is partly because of the paucity of detailed data available for roller derby bouts – while all WFTDA-rules bouts tend to collect a wealth of data (scores per pass per jam, penalties, etc), which can be used to derive useful stats, that data is almost never made public, unless it’s a WFTDA-sanctioned bout (in which case it’s in the WFTDA Stats Repository, except that that’s also missing stuff since the 2016 changes to the StatsBook) or it’s one of the rare bouts to get full statsbooks uploaded to FlatTrackStats. As there are a lot more unsanctioned bouts than sanctioned ones, this means that for most bouts all we know is the scores.

However, scores give more information than most single-value ranking systems use – most of them are concerned with either the score difference (how much team 1 beat team 2 by) or the score ratio (how many points team 1 scored for every point team 2 scored), or a derived metric related to them. Both of these schemes throw away additional “scale” information – a game where both teams produce high scores should be different to one where both teams are held to a low score, for example.

There are score-based generic ranking systems for sports – the “Offense/Defence Method” attempts to assign “Offense” and “Defense” scores to teams, based on how highly they tend to score (and how low they keep their opponents’ scores) – but tests of this against Roller Derby ratings produce odd results.

We suspect that this is because Roller Derby is actually quite an odd sport – unlike many sports, both teams can be scoring points at the same time; and (except in power jams) both teams definitely need to play both offense and defence simultaneously.

So, while it definitely makes sense to consider a dual-ranking for teams on Offense and Defence, it seems that our approach will need to be more Roller Derby specific.


Making a model.

When we consider making a model of a process, we usually start with the simplest, most abstract representation we can, and only add bits if the bits we removed seem to be significant.

Our plan with this model is to then run a huge number of simulations, for different combinations of offense/defense strengths for each team, to produce a “fingerprint” score for any given combination. By statistically analysing the bout records for the roller derby community, we should be able to produce an inferred offense and defence for every team that’s played enough bouts in a given period. (This is similar in approach to how LIGO, the Laser Interferometer Gravitational-wave Observatory, mapped its observation of gravitational waves – just, essentially, a “chirp of gravity” – to a specific black hole merger event. Simulating huge numbers of combinations of different astrophysical events gives LIGO a library of “fingerprints” for those events, which it can then match to a given observed signal.)

Reducing Roller Derby to the most basic level, then: Roller Derby is all about the time it takes for a jammer to pass through barriers (the pack). The jammer who passes through the barrier fastest then has at least as much time as it takes the other jammer to follow to score points. (And they score points by… passing through barriers again.)

By thinking about Roller Derby this way, we can derive some basic results without even using statistics:

Firstly, we need to establish the limits on the number of jams (and the total track time) available to score in.

Under WFTDA/MRDA rules [MADE, RDCL, USARS are different], a bout consists of 2 periods of 30 minutes each. Jams are a maximum of 2 minutes long, and there’s 30 seconds between them.

Immediately, there are a few obvious things that jump out. Firstly, as that 30 seconds of non-skating time is constant for each jam, to maximise your track time [and thus scoring potential], you want fewer, longer jams rather than more, shorter ones. With 2 minute jams, 4/5 of the time in the bout (2 minutes out of every 2 min 30) is spent on track; with, say, 30 second jams, only 50% of the bout actually has any skating in it.

So, the maximum amount of track time, if all jams went to 2 minutes, would be: 4/5 of 60 minutes, or 48 minutes. This naturally corresponds to the smallest number of possible jams, which is 60/(2.5) or 24 jams.

The minimum amount of track time is arbitrarily close to 0, in principle (as a jammer can get lead very quickly and call it). The hypothetical maximum number of “infinitesimal jams” in that case would be 60/0.5, or 120 jams. Of course, this realistically never happens – a more reasonable measure for a minimum jam length is probably 20 seconds, resulting in a track time of 2/5 of 60, or 24 minutes, and a max jams of 60/(5/6) or 72 jams.

(The average roller derby bout, from experience, seems to have a number of jams somewhere in the 40s, corresponding to an average jam length of around 1 minute or so*.)
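The arithmetic above is simple enough to capture in a few lines; a sketch, assuming every jam in the bout runs to the same length:

```python
def jam_budget(jam_minutes, bout_minutes=60, lineup_minutes=0.5):
    """Number of jams and total track time for a fixed jam length (WFTDA/MRDA timing)."""
    jams = bout_minutes / (jam_minutes + lineup_minutes)
    return jams, jams * jam_minutes

print(jam_budget(2.0))       # (24.0, 48.0): every jam run to the 2 minute maximum
print(jam_budget(1.0 / 3))   # (72.0, 24.0): every jam called off at 20 seconds
print(jam_budget(1.0))       # (40.0, 40.0): roughly the observed average jam length
```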


Expected scoring is harder to estimate. In one limit, assuming one team is totally dominant over the other, so their jammers always near-instantly take lead and have the full 2 minutes to rack up scoring passes – say 10 full passes and a partial one per jam, or around 54 points per jam – the total score would be 54*24, or 1296:0.

The (famously) highest scoring bout in history to date was Victorian Roller Derby League/Chikos [FTS79263] with a score of 921:2, really not far off this maximum rate.

Bouts this unbalanced rarely happen, though, so let’s also take the example of two very closely matched teams. We might expect these teams to trade lead jammer evenly between jams, but with the opposing jammer close behind – so each jam would result in a single scoring pass for one team. (Remember, we’re not doing statistics here, so this is an unrealistic average).

In the case that both teams are very strong defensively, the limit would be 2 minute jams for 12 single pass scores each – and a total score of about 12*4 = 48 : 48.

In the case that both teams are very weak defensively, we’ll assume our 20 second jam minimum, for 36 single pass scores each – and a total score of around 36*4 = 144 : 144.
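Those two “trading single passes” limits can be checked the same way (4 points per full pass, as in the text; the function and parameter names are just for illustration):

```python
def trading_passes_score(jam_minutes, points_per_pass=4,
                         bout_minutes=60, lineup_minutes=0.5):
    """Per-team score if lead alternates each jam and each jam yields one scoring pass."""
    jams = bout_minutes / (jam_minutes + lineup_minutes)
    return (jams / 2) * points_per_pass

print(trading_passes_score(2.0))      # 48.0 each: strong defence, 2 minute jams
print(trading_passes_score(1.0 / 3))  # 144.0 each: weak defence, 20 second jams
```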

Thanks to Flat Track Stats, we can look at the recorded history of all roller derby bouts ever, and extract the closest bouts (which I’m defining as being within about 10 pts of each other) to compare. Looking back within the history of the current WFTDA ruleset, the highest scoring close bouts are actually pretty high!

The highest scoring close bout in the last year was a 273 : 260 game between “Wilkes-Barre/Scranton Roller Radicals” (300th WFTDA) v “Harrisburg Area Roller Derby” (297th WFTDA) [FTS67692]!
Unfortunately, no bout stats were uploaded for this bout (a huge problem for Roller Derby statistics in general), so we can’t see how this happened in any detail.

The highest scoring close bout with actual detailed stats is the Ohio : Houston 1pt game from the Skate To Thrill 2016 tournament [FTS76742]. That was a 222:221 bout, running to a total of 44 jams (an average of 51 seconds per jam, and 37.4 minutes on track).

Within our simple “trading single passes” model, we’d have expected 44 jams to result in a score of 22*4 = 88 points per team – clearly our toy model undershoots the real scores by more than 50%. Part of this is because of power jams; there were 11 of them in this game, and most resulted in multiple scoring passes. However, the highest scoring jams weren’t power jams at all, they were instances where a large blocker numbers advantage (due to blocker penalties) gave a huge offensive advantage to that team.

Even compensating for swings due to penalties, though, we see that multiple passes are a bit more common than you might expect, and the variation in scoring is pretty wide per jam. (In fact, as the footnote* notes, points-per-jam in WFTDA Division 1 is around 8 to 9, or 2 passes, with close games closer to 6pts per jam!)

In order to model this more effectively, we’ll have to build a proper statistical model, which we’ll begin in the next post in this series….


* In fact, rollerderbynotes already did this for us, for many years running. In WFTDA high level play, at least, there’s been solidly 43 to 44 jams per bout for the last several years, on average. (To look ahead to the rest of our discussion, Windyman also determines the Points-per-Jam metric for each year’s playoffs, which usually rests around the 8 to 9 mark – or about 2 passes per jam, on average.)


A guide to reading those Bayesian ranking charts.

While they were intended to provide a useful visual reference for skill distribution (and something I’ve personally wanted to have a source of for several years), it seems that my Bayesian probability distributions for team skill are more a source of confusion than of clarity.

So, here’s a quick guide on how to read them.

We’ll work with this chart:


Representative figure (European Smackdown relative skills wrt London Rollergirls).

As you can see, there’s multiple coloured shapes on the graph. Each shape represents a given team – the key lets you see that purple/lilac is Malmö for example, and green is Glasgow (the others are all overlapping, so we’ll work with the two clearer ones).

On the x (horizontal) axis of the graph we’re measuring “points that London would score relative to this team” – so the position marked 4 represents a strength such that if that team scored 50 points, London would be expected to score 200.

On the vertical (y) axis, we’re measuring “probability”. So, the taller a rectangle is, the more probable it is that the strength it sits over is the true strength of the team.

If we had perfect knowledge, then each team’s shape would be a single thin spike located directly on the x axis at their actual strength relative to London. However, we don’t have perfect knowledge – hence the need for Bayesian inference in the first place – we just have the results of games played between the teams.  [The thin spikes we show on this graph are the centres of FlatTrackStats’ predictions for each team.]

So, instead of a single thin spike, each team is represented by a roughly-bell-shaped curve (they’re only roughly bell shaped for several reasons, which we won’t go into here). The highest point on the shape for each team represents the centre of the “most likely” true strength of the team – for Malmö this is somewhere between 2.5 and 3, complicated by the double-peak and flat top; for Glasgow it’s around 40 to 50. But we can’t rule out their strength being different from that, so the curve extends out, dropping in height in proportion to the probability of the true strength being in each range – there’s a much lower probability of Malmö having a strength of 2 relative to London, and the existence of colour up to the 1 mark suggests a tiny possibility that they might have been as strong as London (and having a really bad day).

Strictly, we can only really talk about the probability of a range of values of strength for any team, which is proportional to the total area under the shape for that team. For example, roughly half of the area for Malmö’s curve lies between 1 and 2.75, so there’s a 50% chance their strength is in that range. Because of the way the curve peaks, there’s also a 50% chance that their strength is in the range 2.25 to 3.25 or so, as the curve is much higher there than it is at the edges.

(For this reason, also, a very strongly peaked curve indicates a team where there’s a lot more confidence in their strength being near the “top of the peak”, while a more shallowly sloping curve suggests a lack of certainty about the true position of their strength.)

Because of this, also, we can visually infer if a team is “significantly” better or worse than another team. If their curves overlap significantly, especially near the peaks, then this is an indication that there’s not enough data to prove that they have different strengths. This is what we mean by “statistically inseparable” – it’s certainly possible that the two teams have different strengths (they can have any strength within their curve, with a given likelihood), but the available data does not allow us to demonstrate this.
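Both of these visual reads boil down to counting, if the posterior is available as samples (one common way such curves are produced – the sampling representation here is an assumption for illustration, not a description of our actual code):

```python
import numpy as np

def prob_in_range(samples, lo, hi):
    """P(a team's value lies in [lo, hi]): the area under its curve over that range."""
    s = np.asarray(samples)
    return np.mean((s >= lo) & (s <= hi))

def prob_stronger(samples_a, samples_b):
    """P(team A is stronger than team B), given paired draws from the joint posterior.

    On these charts a *lower* value means stronger (closer to London), hence the '<'.
    """
    return np.mean(np.asarray(samples_a) < np.asarray(samples_b))
```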


Saudade and Video Games: Part 3

In parts 1 and 2 of this series, we talked about the two custom chips in the Amiga which contributed to the graphical excellence of the platform compared to its competitors. In this part, we will talk a bit about how the third and final chip, Paula, led to the invention of an entirely new audio representation format – the tracker module – and what this meant for audio in games. Tracker software still exists today, and is heavily used in the techno music scene. While it isn’t used so much in video games now, outside of handhelds, the early Unreal engine supported a kind of tracker format for in-game music until about 2000 or so (the games Unreal, Unreal Tournament and Deus Ex all had their memorable sound tracks composed in this format, and all hold up today).

At the time the Amiga 1000 launched, almost all home computers generated audio via internal synthesizers – essentially, simple chips capable of generating simple sound types like square, sine and triangle waves, and combining multiple signals into a more complex sound. The modern music form called “chiptune” is the survival of the kind of musical techniques that this technology made possible.

With the Paula custom chip, however, the Amiga supported sample based audio. Instead of generating sound by adding together primitive signals, sample based audio uses recordings of real sounds as a basis for audio, playing them back either at their original pitch, or shifted to different notes. Paula was capable of playing up to 4 samples at once, two in the left channel and two in the right for stereo sound, but could change which samples it used during playback, freely.

In order to manage this newly available audio type, Amiga programmers developed a new type of audio programming tool, a “tracker”, which could be used to orchestrate the samples used and the sequence of notes to play with them. The resulting collection of samples and orchestration was saved as a “module” file, taking up a bit more space than synthesizer-based formats (because of the included samples), but still significantly less space than a full recording of an audio performance.

Screen shot of the OctaMED tracker, from xmp.sourceforge.net


Tracker-based music allowed surprisingly high quality music to be provided with almost all Amiga games (and also filtered rapidly into other avenues, like the demoscene). Almost all of my abiding memories of Amiga games are related to their music and soundscapes, from the achingly sad piano in the Agony soundtrack, through Sensible Software’s ska-based track to Cannon Fodder, Zool’s rock’n’roll intro, and Gods’ intro, “Into The Wonderful”.

But more than just video games, the “community” nature of home computers in the 80s and early 90s meant that tracker music was often distributed on coverdisks (of which more later) as artistic pieces. Because tracker software was often public domain or shareware, you could play these musical pieces (for example, these tracks compiled here) in the tracker itself – I have strong memories of watching the sample tracks sweep by, and constructing my own (terrible, of course) tracks from samples ganked from these coverdisk compositions. The experience of being able to watch the actual composition being interpreted as it played was something that deeply resonated with me, and I think contributed to a deeper sense that anything a computer did wasn’t magic, but the result of understandable, if complex, processes.

Paula was the only one of the three custom chips in the Amiga never to be replaced with an upgraded version in later models. Even by 1990, the newly released Super Nintendo still couldn’t reliably match the old Amiga sound chip (thanks mainly to memory limitations), as this side by side comparison demonstrates: https://www.youtube.com/watch?v=s2kAPcMDExE


Saudade and Video Games: Part 2

The first part of this series was concerned with how one part of the Amiga’s chipset, Agnus, helped to make it attractive to the artistic hacking movement known as the demoscene. This part concerns another part of the chipset, Denise, and how it made the Amiga an essential part of video effects and graphics work in the late 80s to early 90s.

While Agnus, via its ability to modify memory in rapid and complex ways, was essential to produce powerful visual effects by modifying the representation of the display in memory, Denise was the actual graphics display chip, responsible for translating that representation into an actual signal for a television or monitor.

This was a period long before LCD or LED displays were commonplace, so all display technologies essentially worked via timing on an analogue signal. You knew that the TV, for example, would sweep an electron beam over the display, covering each line in so many microseconds, and completing a sweep of the whole display from top to bottom usually 30 or 25 (or twice that, for 60 or 50Hz) times a second. You were responsible for sending a signal to the TV to turn the electron beam on or off at the right times, such that only the parts of the screen you wanted to be hit (and thus bright) were. Obviously, the faster you could send the signal, the smaller a section of display you could light up, and thus the higher the resolution (larger number of pixels) you could attain.

Denise was capable of sending timing signals as short as 70ns, corresponding to a horizontal resolution of 640 pixels in the timing of a TV of the period. The vertical resolution was more constrained by the limitations of TV raster technology, but could be pushed up to 512 pixels via a technique called “interlacing” (essentially, writing all the odd lines in one frame, then all the even lines in the next, halving frame rate in favour of more pixels) at 25Hz. (By comparison, conventional “EGA” graphics adapters built into IBM PCs of the period could achieve around 640×350 resolution.)

More impressive, however, was the range of colours that Denise was designed to represent. All colour schemes for computer displays work by encoding the amount of Red, Green and Blue to mix to produce the final colour (on a loose model of human vision). The EGA standard was a 6-bit colour system – you could have 2 bits (encoding a value from 0 (off) to 3 (maximum) ) of control over each of the Red, Green and Blue channels, for a total combination of 64 colours. Meanwhile, Denise supported twice the bits – 4 bits per channel (for 16 different intensities of the primaries), for 12 bits in total – a whopping 4096 different colours!

As memory capacity was much smaller then than it is now, it wasn’t feasible to store a whole 12 bits of data for each individual pixel in memory. Instead, you would pick a palette of up to 32 colours that you wanted to use for the whole image. The image would then be represented by 5 bits per pixel, corresponding to a palette colour (indexed 0 to 31). Only 32 colours for an entire image is pretty limiting, however, so Denise provided some special features to allow for more variety.

Firstly, you could choose to specify an additional (sixth) bit of colour in “Extra Half-Brite” mode – if turned on for a given pixel, Denise would halve (darken) the palette colour referenced by the other 5 bits, providing for cheap “shadow” effects. Secondly, and more technically distinctive, an alternate video mode called HAM (for Hold-And-Modify) used 6 bits per pixel to encode colour changes into the display image itself. Essentially, in HAM mode, only 4 bits would be used to reference the palette (meaning you had a choice of only 16 colours from the 4096 at any point in time). The other two bits could store values from 0 to 3. If set to 0, Denise would just look up the palette colour as normal. If set to another value, however, Denise would hold the colour of the previous pixel, but replace either its blue (1), red (2) or green (3) component with the 4-bit value provided. (The modified colour then carried forward as the “held” colour for the next pixel; the palette itself was never changed.)
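To make the scheme concrete, here is a minimal sketch of decoding a row of HAM pixels into 12-bit colour values (the starting “held” colour and the packing of each pixel into a plain integer are illustrative assumptions – the real hardware works on bitplanes – rather than Denise’s actual interface):

```python
def decode_ham6(pixels, palette):
    """Decode a row of HAM6 pixel values into (r, g, b) triples, each component 0-15.

    pixels:  6-bit values; bits 5-4 are the control bits, bits 3-0 the
             palette index or the replacement component value.
    palette: 16 (r, g, b) tuples.
    """
    out = []
    r, g, b = palette[0]              # colour "held" at the start of the row (assumed)
    for p in pixels:
        ctrl, val = p >> 4, p & 0x0F
        if ctrl == 0:
            r, g, b = palette[val]    # take a palette colour directly
        elif ctrl == 1:
            b = val                   # hold previous colour, replace blue
        elif ctrl == 2:
            r = val                   # hold previous colour, replace red
        else:
            g = val                   # hold previous colour, replace green
        out.append((r, g, b))
    return out
```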

By suitable encoding, then, HAM mode was capable of using all of the 4096 colours that the early Amiga could display in a single image. [The limitation that you could only change the red, green or blue component of the held colour one pixel at a time meant that you couldn’t necessarily display all possible images, of course – going from a white pixel to a black one would require three pixels in a row, leaving a coloured “halo” at the boundary. Demoscene coders, of course, got around this by using Agnus’ copper processor to additionally change the palette directly behind Denise’s back, resulting in a composite, more flexible, mode called “Sliced HAM”.]

(The replacement for Denise in later Amigas, Lisa, could work with a whole 8 bits per channel, rather than 4, and provided an enhanced HAM mode, HAM8, which allowed colours from a whopping 16 million colour (24-bit) space to be displayed at once. This is still the maximum number of colours that most displays can represent today, although modern video technology doesn’t need to use HAM-style tricks to do so.)

This extra graphical functionality meant that paint packages were a big thing on the Amiga, with Deluxe Paint being the dominant series. (Perhaps surprisingly, given their exclusive focus on games in the modern era, Deluxe Paint was an early Electronic Arts product.)

The iconic Tutankhamun image used to advertise Deluxe Paint (created by Avril Harrison)


While I was terrible at computer art, I did spend many hours playing about in Deluxe Paint III, either trying to make sprites for computer games or trying to reproduce art from coverdisks. When it was ported to the IBM PC, Deluxe Paint became the dominant artistic tool used for game art production for much of the 1990s.

The other video functionality that the Amiga provided, and the thing that made it so attractive for video post-production work, was “genlocking”. As mentioned above, video work against an analogue display is all about timing. Normally, most computers would output their own timing signals, and thus could only work as a source for video images to be displayed. The Amiga, however, was also capable of locking its timings to an externally generated signal to synchronise its video output. This meant that the output from an Amiga could be combined with a video signal from, say, a camera, providing an overlay of computer generated graphics alongside live recordings. (Without genlocking, the Amiga and the camera signals would never be in sync, and thus could not be cleanly combined.)

Many of the computer generated effects of the late 80s and early 90s were produced on Amigas, especially for television. (“The Chart Show”, an attempt at a UK reproduction of the success of MTV, had the bridge sections between music videos entirely rendered on Amigas, for example. More notably, the first season of Babylon 5 had all the CGI rendered on Amigas equipped with Video Toaster hardware.) In this sense, the Amiga is a covert part of every young person’s memories of the late 80s, through its influence on the visual appearance of television.


Saudade and Video Games: Part 1

The timing of the Amiga’s 30th Anniversary yesterday was fortuitous, as I’ve been planning a series of somewhat reflective essays relating to computer culture as it affected me in the late 80s through 90s. You can consider these a kind of expansion around some of the thoughts touched on in the earlier essay.

This first essay is going to talk about another aspect of computer culture which was drawn to the Amiga as a result of its powerful additional chips for display, memory and sound manipulation: the demoscene.

To understand the demoscene, you first have to understand the artistic and competitive aspects of the hacking subculture, which date back to the 1950s (in computer form). Hacking culture is about demonstrating your deep understanding of a complex system – a piece of computer hardware, a mechanical device, or something similar – by making it do something which people less skilled in the art might think impossible, or which it was never designed to do. In the context of computers, it really emerged from talented students trying to get the limited computers of the 1950s (and later) to perform useful or interesting tasks – hacking is nothing without working against strict limitations. (The modern media concept of “hacking” meaning “to compromise the security of a system” is a derivative of one kind of hacking here, applying your knowledge of computer systems to penetrate supposedly impenetrable security, but the actual term is not limited to such purely destructive uses.) Hacking, by its very nature, combines both aesthetic and technical aspects – even the earliest hacks included making a mainframe computer play musical pieces (which it was never designed for at all).*

By the late 1970s, some hackers had been incensed by attempts by early games companies to limit copying of their software via various techniques. (A common belief of hackers is that attempts to restrict information, including software, are an offence to society.) They started working to “crack” the software protection, by analogy with cracking a safe, and to release “free”, “cracked” copies of the software by other routes. Of course, being hackers, there wasn’t any point to doing this if no-one could tell who had been smart enough to crack the security. The cracked software was therefore usually modified to display a small tag at the start, letting you know who was responsible.

Displaying a small tag isn’t very fun, though, so slowly, different hackers started working to produce more impressive pieces of artwork or animation as their added intros.

By the mid 80s, these intros had become a separate thing to the cracked software itself. The demoscene had emerged, devoted purely to the creation of more impressive artistic displays, constrained by the limited computing and storage available at the time. (Some parts of the demoscene place additional limits on themselves – the 64k intro scene, for example, consciously limits itself to only 64 kilobytes of space.)


Raster Bars (horizontal and vertical), example from Wikipedia, a still from Angel’s “Coppermaster” demo.

Given the opportunity to use the Amiga’s additional resources, the demoscene flourished on the platform, helped mostly by the memory management chip called Agnus. As well as controlling access to memory itself, Agnus had two features that were critical to impressive video effects – a blitter and a copper. The blitter was a specialised unit designed to perform very rapid operations on rectangular sections of memory – including the display, as that must be represented in memory before being sent to a screen. Most trivially, it could copy one rectangular section to another rectangular section (duplicating an area of a display, for example), but it could also be made to combine multiple areas into a single result (for example, using one section as a “mask” to display only certain parts of a second rectangle). The copper, or co-processor, was a simple, but fast, system which could change various settings of the display in synchrony with the video display. The classic use of the copper was to change colours in the display as it was being drawn, creating a “horizontal striping” effect known as “raster bars”. However, it could change practically any aspect of the display configuration during the display process, including the actual resolution – you could have two parts of the screen using entirely different resolutions, at the same time, with the copper switching between them appropriately.

By creative use of the blitter and copper, combined with the CPU itself, and tracker music for high quality audio in a small space, demoscene could accomplish impressive things on the Amiga. Some examples of the height of the art are: Incision (a 64k intro) and TBL’s TINT (note that both of these were generated, in real time, on hardware a thousand times slower than a modern computer).

Many demoscene artists crossed over with the computer games industry as well, and many computer games showed the influence of their more flashy visual effects. [More recently, the programmer of Angry Birds is an old demoscener, and noted that his experience in doing impressive things with limited hardware was pivotal in making the game work on the small capabilities of a mobile phone.] .kkrieger, a modern FPS written entirely in 96kb, is a stunning example of what demoscene techniques can bring to games development.

Demoscene is still going strong, although the limitations are now mostly self-imposed, as modern computers are far too powerful to provide an interesting restriction by themselves. An example of the current cream of the crop is in this YouTube playlist, but the scene itself is mostly accessible via http://www.demoscene.info/the-demoscene/

* For more information on early hacker culture, the most accessible reference is still Steven Levy’s “Hackers: Heroes of the Computer Revolution”, ISBN 0-385-19195-2. A copyright-free copy of the first two chapters is available via Project Gutenberg.
