Don't Blame the Data

You can disagree with me, but don't blame the data

Analysing AFL Team Age... Properly!

Why the median is a more appropriate measure for identifying the oldest (or youngest) AFL playing lists

3 minute read

When my beloved Hawthorn Hawks tweeted out an article on their website about player ages for each team, it got me riled up that the media love to cite Champion Data’s “average age” as their measure.

As the age distribution of players as at the start of the 2020 AFL Premiership Season shows, the average is going to be skewed upwards by the older players on lists, especially guys like Shaun Burgoyne, Kade Simpson, Gary Ablett Junior…
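Ages “as at” a reference date come out as fractional years, like the 24.17 quoted below. One way to get there is sketched here. It assumes you have each player’s date of birth, and it approximates a year as 365.25 days. The dates of birth and the reference date are illustrative assumptions, not the data behind this post. Bucketing the ages into whole years gives a rough text version of the distribution:

```python
from collections import Counter
from datetime import date


def age_in_years(dob: date, as_at: date) -> float:
    """Decimal age on `as_at`, approximating a year as 365.25 days."""
    return (as_at - dob).days / 365.25


def age_histogram(dobs: list[date], as_at: date) -> Counter:
    """Count players by completed whole years of age (a text stand-in for a histogram)."""
    return Counter(int(age_in_years(d, as_at)) for d in dobs)


# Assumed reference date (Round 1 of the 2020 season); change to whatever "as at" date you need.
SEASON_START = date(2020, 3, 19)

# Hypothetical dates of birth, not real players.
dobs = [
    date(2001, 5, 1),
    date(2000, 6, 10),
    date(2000, 9, 1),
    date(1999, 8, 15),
    date(1998, 11, 30),
    date(1986, 10, 22),
]

# Print one row per whole-year age, with a '#' per player.
for age, count in sorted(age_histogram(dobs, SEASON_START).items()):
    print(f"{age:>2} {'#' * count}")
```

The lone 33-year-old sitting well to the right of the pack is the kind of long right tail that drags the average up.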



Mean, Median…what of it?

The mean, from Wikipedia:

For a data set, the arithmetic mean, also called the mathematical expectation or average, is the central value of a discrete set of numbers: specifically, the sum of the values divided by the number of values.

The median, also from Wikipedia:

The median is the value separating the higher half from the lower half of a data sample (a population or a probability distribution). For a data set, it may be thought of as the “middle” value. For example, in the data set {1, 3, 3, 6, 7, 8, 9}, the median is 6, the fourth largest, and also the fourth smallest, number in the sample. For a continuous probability distribution, the median is the value such that a number is equally likely to fall above or below it.
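Both definitions are short enough to write out directly. The following is a minimal Python sketch, and the function names are just illustrative. It is checked against the example set from the median definition:

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean: the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)


def median(values: Sequence[float]) -> float:
    """Middle value of the sorted data; the average of the two middle values when n is even."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [1, 3, 3, 6, 7, 8, 9]  # the example set from the Wikipedia definition
print(round(mean(data), 2))  # 5.29
print(median(data))          # 6
```

For this particular set, the mean (about 5.29) sits below the median because the low value of 1 drags it down. That is the mirror image of what a handful of veterans does to a list’s average age.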

When data is skewed rather than symmetrically distributed around the mean (as is the case here), using the mean to describe the centre of the data can be misleading. In cases like this, the median provides a more representative statistic. With a positively skewed distribution like the one above, statistics 101 tells us that the median will typically be less than the mean, while the opposite typically holds when the distribution is negatively skewed.
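To see why a long right tail moves the mean far more than the median, take a small, evenly spread list of ages. The list is hypothetical, not a real AFL list. Add two veterans to it, then compare the mean, the median and the moment-based skewness before and after. The skewness formula used here is the standard population coefficient, chosen for illustration:

```python
import statistics


def moment_skewness(values: list[float]) -> float:
    """Population (moment-based) skewness g1 = m3 / m2**1.5; positive means a long right tail."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


core = [19, 20, 21, 22, 23, 24, 25]  # hypothetical, symmetric spread of ages
with_veterans = core + [34, 36]      # add two hypothetical veterans

for label, ages in [("core", core), ("with veterans", with_veterans)]:
    print(
        label,
        round(statistics.mean(ages), 2),
        statistics.median(ages),
        round(moment_skewness(ages), 2),
    )
```

Adding the two veterans lifts the mean from 22 to about 24.89, while the median only moves from 22 to 23. The skewness also goes from zero to positive.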

The average age of all players on team lists for the 2020 AFL premiership season is 24.17 years, while the median age is 23.6 years.
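In concrete terms, the league-wide gap between the two figures is a little over half a year:

```latex
\bar{x} - \tilde{x} = 24.17 - 23.6 = 0.57 \text{ years} \approx 0.57 \times 12 \approx 6.8 \text{ months}
```

Here \bar{x} is the mean age and \tilde{x} the median age. The mean sitting above the median is what the positive skew predicts.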



So why do they still report averages?

I suspect media outlets report the average age for two reasons:

  1. Who can be bothered explaining the difference between the mean and the median; or, more likely,
  2. The average is more easily pulled higher by the older players on some teams’ lists, fuelling the narrative they want to run with.



Implications of the different measures

When we plot both the average and median age of each team, we get some very different outcomes.

We are led to believe that the Geelong Cats have the fourth oldest list, but when the median is used as the statistic, they have only the 11th oldest playing list, with the Kangaroos, GWS, Saints, Demons, Bombers, Bulldogs and Tigers all having older lists than them.

Some of the other implications:

  • The Brisbane Lions have the youngest list for the 2020 season, not the Gold Coast
  • The Hawks actually have the fourth oldest list, with North Melbourne third
  • The Demons have the 7th oldest list, not the 12th oldest
  • The Power have the equal third youngest list (with the Swans), not the 10th oldest
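Reshuffles like these come from ranking the same teams twice, once on each measure. The sketch below assumes the scraped data can be reduced to (team, age) pairs. That structure, the function names and the toy teams “A”, “B” and “C” are all hypothetical. Ties share the best rank, which is how you end up with an “equal third”:

```python
from collections import defaultdict
from statistics import mean, median


def competition_rank(scores: dict[str, float]) -> dict[str, int]:
    """Rank keys from highest score (1) downwards; exactly tied scores share the best rank."""
    ordered = sorted(scores.values(), reverse=True)
    # index() finds the first occurrence, i.e. one less than the rank shared by any tie.
    return {team: ordered.index(score) + 1 for team, score in scores.items()}


def rank_lists_by_age(players: list[tuple[str, float]]) -> dict[str, tuple[int, int]]:
    """Return {team: (rank by mean age, rank by median age)}, where 1 = oldest list."""
    ages_by_team: dict[str, list[float]] = defaultdict(list)
    for team, age in players:
        ages_by_team[team].append(age)
    mean_rank = competition_rank({t: mean(a) for t, a in ages_by_team.items()})
    median_rank = competition_rank({t: median(a) for t, a in ages_by_team.items()})
    return {t: (mean_rank[t], median_rank[t]) for t in ages_by_team}


# Hypothetical toy lists: team A has one veteran, team B is uniformly mid-20s.
toy = {"A": [20, 21, 22, 35], "B": [23, 24, 24, 25], "C": [20, 21, 22, 23]}
players = [(team, age) for team, ages in toy.items() for age in ages]
print(rank_lists_by_age(players))  # {'A': (1, 2), 'B': (2, 1), 'C': (3, 2)}
```

Team A’s single veteran makes it the “oldest” list on the mean. On the median it drops to equal second with team C, and team B takes top spot. Ties are matched exactly, so with real decimal ages you may want to round before ranking. The `index()` lookup is quadratic, but that is irrelevant for 18 teams.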



Why does any of this matter?

In the end, how this figure is reported probably doesn’t matter much; the age of a playing list likely isn’t all that important when it comes to the ultimate glory of premiership success, as the different age distributions of recent premiership teams in this great little analysis at TheArc show.

That aside, it would be nice to see this figure reported a little more representatively.

Rant. Over.

The data for this post and the code used to scrape it can be found here.


About

A data visualisation blog. Sports. Analytics. Sports analytics