2014/2015 League Commentary

I've done a data dump that automatically graphs a bunch of statistics from the 2014/15 season for five leagues: the four English leagues (Premiership, Championship, League One and League Two) and the Scottish Premiership.

You can peruse all of the graphs here. But you'll likely get a bit tired of seeing scatter plots. Hence here, I explain a few of the more interesting graphs.

In [1]:
%matplotlib inline
In [2]:
import league_analysis
epl_league = league_analysis.year_201415.epl_league
ech_league = league_analysis.year_201415.ech_league
elo_league = league_analysis.year_201415.elo_league
elt_league = league_analysis.year_201415.elt_league
spl_league = league_analysis.year_201415.spl_league
all_leagues = [epl_league, ech_league, elo_league, elt_league, spl_league]

Shots For

One of the simplest metrics we measure is the number of shots a team takes. Generally the more shots a team takes the better they do. Of course some teams are quite frugal in that they only take shots when there is a high chance of scoring whilst others shoot on sight. However, here are a couple of graphs from the English and Scottish Premierships, showing a slight difference in the way that the two champions have gone about their successes.

In [3]:
league_analysis.graph_leagues('Shots For', 'Points', leagues=[epl_league, spl_league],
              annotate_teams=['Chelsea', 'Liverpool', 'Man City', 'QPR', 'Swansea', 'Arsenal',
                              'Celtic', 'St Mirren', 'Motherwell', 'St Johnstone', 'Aberdeen'])
line of best fit: 0.1662 x - 29.48
line of best fit: 0.1744 x - 13.14

Chelsea managed to win the league whilst ranking only 4th in the league for shots taken. This is likely due to their defensive approach in the second half of the season. One could conclude that Chelsea were simply winning a lot of matches from early on and hence spent a lot of time in the lead; teams tend to take fewer shots when defending a lead. However, look at Celtic. A theme which occurred in many of the Scottish Premiership graphs is that Celtic are a huge outlier in terms of their output, but they are close to being exactly on the line of best fit. In other words, Celtic are scoring more points than their rivals, but not more than you might expect from their underlying statistics.

QPR are another huge outlier, having finished bottom of the league despite taking a number of shots more in line with the top half of the league. Of course, they may be taking a bunch of dreadful shots, but still they had some adventure about them.

A couple of other outliers, Aberdeen and St Johnstone, appear to have gotten quite a bit of mileage (that is, points) out of their shots. Aberdeen still ranked second in the league for shots taken (and finished second in the league on points), but they are above the trend line. St Johnstone, on the other hand, are well above the trend line and well down on the number of shots they took.
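
Being "above" or "below" the trend line can be made concrete as a residual: actual points minus the points the fitted line predicts. The following is only a rough sketch. The slope and intercept are the first of the two fits printed above, while the helper names and the team figures are invented for illustration and are not part of `league_analysis`.

```python
# Sketch: how far above or below a trend line a team sits.
# expected_points and residual are hypothetical helpers, not library functions.

def expected_points(slope, intercept, x):
    """Points the trend line predicts for a stat value x."""
    return slope * x + intercept


def residual(slope, intercept, x, actual_points):
    """Positive means above the trend line, negative means below it."""
    return actual_points - expected_points(slope, intercept, x)


# Made-up team: 500 shots taken, 60 points won.
print(round(residual(0.1662, -29.48, 500, 60), 2))
```

A team with a large positive residual got more points than its shot count alone would suggest, which is the sense in which St Johnstone are described as well above the line.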

Shots Against

Similar to the above, except that the fewer shots a team allows to be taken against them, the more points they get. So the relationship is inverse.

In [4]:
league_analysis.graph_leagues('Shots Against', 'Points', leagues=[epl_league, spl_league],
              annotate_teams=['Chelsea', 'Liverpool', 'Man City', 'QPR', 'Swansea', 'Arsenal',
                              'Celtic', 'St Mirren', 'Motherwell', 'St Johnstone', 'Aberdeen'])
line of best fit: -0.1519 x + 127.1
line of best fit: -0.198 x + 128.8

In particular here though note that QPR are pretty much bang on the trend line. This means that they were not simply involved in games with a lot of shots. Chelsea are again quite far above the trend line and again not the best performers in the league despite having the most points. Again Celtic represent an outlier, though this time they are above the trend line. St Mirren can feel a touch aggrieved being below the trend line for both 'Shots For' and 'Shots Against'.

Total Shots Ratio

Or TSR. This is the proportion of all the shots in a game that were taken by the given team. So if Chelsea play Arsenal and Chelsea take 9 shots whilst Arsenal take 3, then Chelsea have a TSR for the game of 9/12 = 0.75 and Arsenal have a TSR of 3/12 = 0.25. James Grayson has written about this as being one of the more repeatable statistics and hence a good measure of underlying ability. We do note, though, that TSR is likely to be affected by the state of the game, that is, whether a team is winning, drawing or losing.
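
As a minimal sketch of the calculation just described, here is the single-game TSR for the Chelsea and Arsenal example. The function name is made up for illustration, and returning 0.5 for a game with no shots is an assumption rather than anything `league_analysis` is known to do.

```python
# Sketch of Total Shots Ratio for one team in one game.
# tsr is a hypothetical helper, not part of league_analysis.

def tsr(shots_for, shots_against):
    """Share of all shots in the game that were taken by this team."""
    total = shots_for + shots_against
    if total == 0:
        # Assumption: a game with no shots at all counts as an even split.
        return 0.5
    return shots_for / total


print(tsr(9, 3))  # Chelsea in the example: 0.75
print(tsr(3, 9))  # Arsenal in the example: 0.25
```

The two teams' values for a game always sum to one, so a TSR above 0.5 means a team out-shot its opponent.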

In [5]:
league_analysis.graph_leagues('TSR', 'Points', leagues=[epl_league, spl_league],
              annotate_teams=['Chelsea', 'Liverpool', 'Man City', 'QPR', 'Swansea', 'Arsenal',
                              'Celtic', 'St Mirren', 'Motherwell', 'St Johnstone', 'Aberdeen'])
line of best fit: 195.7 x - 45.61
line of best fit: 169.5 x - 31.23

So here again, Chelsea seem to have done really well from their underlying shots ratios. They are still 4th, but if I were a Chelsea fan I'd worry that some of those winning points were unmerited. Then again, Jose Mourinho does seem like the kind of manager who can have a successful strategy that outwits the statistics. Again, Celtic are an outlier but it does not seem hugely, if at all, unmerited.

Again, Swansea and St Johnstone appear to be over-performing and again St Mirren and QPR appear a touch unfortunate to have only garnered the points totals they did.

TSR is so important I've included the other three leagues here as well:

In [6]:
league_analysis.graph_leagues('TSR', 'Points', leagues=[ech_league, elo_league, elt_league],
              annotate_teams=['Blackpool', 'Watford', 'Norwich', 'Bournemouth', 'Wigan',
                              'Millwall', 'Charlton',
                              'Bristol City', 'Yeovil', 'Milton Keynes Dons',
                              'Burton', 'Tranmere', 'Cheltenham', 'Shrewsbury', 'Dag and Red'])
line of best fit: 184.4 x - 29.86
line of best fit: 133.1 x - 3.673
line of best fit: 220.7 x - 47.45

So both Bournemouth and, more so, Watford appear to have overperformed a bit. In fact, in all three leagues the champions seem to have overperformed, at least according to the TSR metric. And in all three cases the second placed team had a points total much more in line with their shots ratios. The fact that all three of these leagues (and the English Premiership) have this property is almost certainly a statistical fluke. However, if TSR is any measure of ability at all, then it appears that a team can win the league with a dose of good fortune.

At the other end of the leagues this seems even more pronounced, as if relegation really is mostly bad luck. It's clear that Blackpool were having off-the-field problems and were probably the worst team in the league, but they likely deserved some more points than they got. Their fellow relegatees appear, by TSR standards, to be decidedly mid-table. Charlton fans beware.

Similarly Yeovil and Tranmere seem a touch unlucky. Cheltenham likely deserved to be relegated but Dagenham should be a touch worried.

You can check the main data dump for some other similar graphs of goals for/against which, as you might expect, line up pretty well with points scored. In most of these graphs you can see that Celtic are a huge outlier in terms of their numbers, but they are more or less bang on the trend line. This is in contrast to the other leagues where the winners tend to out-perform what their statistics suggest (other than for goals of course). In particular the shots on target graphs are interesting. Huddersfield in the English Championship and York in the English League Two are incredible outliers here. They take a lot of shots on target but don't seem to have been rewarded with as many points.

Shots on Target Ratio

There are other similar ratios; some look at the proportion of on-target shots which are goals. Again you can have a look at the full data dump for some of these ratios against points. Here is a kind of combined ratio, called TSOTR. It starts from the ratio of shots on target to shots taken. The rough idea is that if a high proportion of the shots a team takes are on target then they are creating good chances. TSOTR combines this with the equivalent defensive stat: the ratio of shots on target to shots that a team allows. So we get TSOTR by computing:

(shots on target for / shots for) - (shots on target against / shots against)

This gives us a mean of zero and a range from minus one to one. The main idea is to combine this with TSR, so that it offsets an artificially good or bad TSR. For example you might have taken 10 shots compared to your opponent's 20 shots, but 8 of your 10 were on target and only 3 of theirs were. Of course, in such a strange case it's not really clear which side were the better team, or deserved more goals. But just by using such stats we can build up more knowledge about each team's underlying strength.
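
The example above can be worked through in a few lines. This is only a sketch of the formula as written in this post: the function names are invented here, and it assumes both sides took at least one shot so that neither division is by zero.

```python
# Sketch of TSOTR and TSR for the 10-shots-versus-20-shots example.
# tsotr and tsr are hypothetical helpers, not part of league_analysis.

def tsotr(sot_for, shots_for, sot_against, shots_against):
    """On-target rate of shots taken minus on-target rate of shots allowed."""
    return sot_for / shots_for - sot_against / shots_against


def tsr(shots_for, shots_against):
    """Share of all shots in the game taken by this team."""
    return shots_for / (shots_for + shots_against)


# You: 10 shots, 8 on target. Opponent: 20 shots, 3 on target.
print(round(tsr(10, 20), 3))           # well below 0.5, so out-shot
print(round(tsotr(8, 10, 3, 20), 2))   # 0.8 - 0.15, strongly positive
```

Here a poor TSR sits alongside a strongly positive TSOTR, which is the kind of offsetting the combination is meant to capture.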

In [7]:
league_analysis.graph_leagues('TSOTR', 'Points', leagues=[epl_league, spl_league],
              annotate_teams=['Chelsea', 'Man City', 'Everton', 
                              'Celtic', 'St Mirren', 'Motherwell'])
line of best fit: 203.2 x + 52.81
line of best fit: 192.1 x + 52.71

First of all, note that the relationship between TSOTR and points is not especially strong. Even so, here we see Chelsea and Everton with incredible TSOTR values. Notice that another low performer, Motherwell in the Scottish Premiership, also had a quite incredible TSOTR. In both cases it appears that it is caused by opposition teams taking a reasonable number of shots but not getting a lot of those on target. I'm not sure what to make of this.

PDO

PDO was first developed in ice hockey and is intended to be a measure of luck. The basic premise is that teams are essentially trying to take shots on target (and avoid their opponents doing the same), and whether those shots on target are converted into goals is to at least some degree outwith a team's control. You can see how, with the smaller goals, this might be a more effective measure of luck in ice hockey than football. Still there is some evidence that the measure regresses to the mean, meaning that the PDO of a team is not a measure of skill, or underlying talent/tactics within a team. Hence it may well be quite a decent measure of luck.
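
This post does not spell out the exact formula behind the PDO numbers it plots. As a rough sketch, the conventional definition is shooting percentage plus save percentage, both measured on shots on target. Because PDO values below zero are discussed later on, the sketch also shows a zero-centred variant; that variant, the function names and the figures are all assumptions for illustration, not the confirmed method used by `league_analysis`.

```python
# Sketch of PDO from goals and shots on target (SoT).
# Both helpers are hypothetical; the zero-centred form is an assumption.

def pdo(goals_for, sot_for, goals_against, sot_against):
    """Conventional PDO: shooting percentage plus save percentage."""
    shooting_pct = goals_for / sot_for
    save_pct = 1 - goals_against / sot_against
    return shooting_pct + save_pct


def centred_pdo(goals_for, sot_for, goals_against, sot_against):
    """Assumed variant shifted so that an average team scores zero."""
    return pdo(goals_for, sot_for, goals_against, sot_against) - 1


# Made-up season: 60 goals from 200 SoT, 40 conceded from 160 SoT faced.
print(round(pdo(60, 200, 40, 160), 2))
print(round(centred_pdo(60, 200, 40, 160), 2))
```

Under this reading, a positive centred value means a team converted its shots on target at a higher rate than its opponents converted theirs.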

So let's first plot PDO against points:

In [8]:
league_analysis.graph_leagues('PDO', 'Points', leagues=[epl_league],
              annotate_teams=['Chelsea', 'Man City', 'Everton', 'Liverpool',
                              'West Brom', 'Leicester'])
line of best fit: 188.3 x + 53.25

So here we see that Chelsea have been the recipient of a lot of luck. Now it may be that Chelsea spent a lot of time in the lead of games which is thought to increase PDO (for various reasons), but still, if I were a Chelsea fan I'd be worried that that PDO value is not sustainable. However, also note that Chelsea have out-performed their luck. They got a lot more points than can be attributed to luck alone.

Liverpool also look like they have gotten more points than their luck would dictate and having a PDO of less than 0 may mean they are 'due' for a bit of extra luck next season.

The teams to worry about would be those that didn't achieve a great points haul but cannot blame that on luck as calculated by PDO. In this case West Brom and Leicester. Though you may have been a touch concerned about them in any case.

In [9]:
league_analysis.graph_leagues('PDO', 'Points', leagues=[ech_league],
              annotate_teams=['Bournemouth', 'Wigan', 'Charlton', 'Bolton', 'Huddersfield'])
line of best fit: 206.2 x + 63.02

Not that much to say here, it seems that points have followed quite closely from PDO. One conclusion might be that the league is very competitive. Of course if you're too good for the league you get promoted and if you're not good enough you get relegated so this may well be the case. On this basis, Wigan were pretty unfortunate to be relegated. I'd also say Huddersfield might be in for a better season.

In [10]:
league_analysis.graph_leagues('PDO', 'Points', leagues=[elo_league],
              annotate_teams=['Crawley Town', 'Bristol City',
                              'Yeovil', 'Leyton Orient', 'Notts County'])
line of best fit: 115.2 x + 63.26

This is interesting mostly because of the relegation battle. But first note that Bristol City, the league winners, were assisted by a very high PDO. The trend line shows you can expect 115.2 points for every full unit of PDO. If we bring Bristol City back to even the second highest PDO in the league, about 0.12 rather than 0.2, then we would expect Bristol City to lose 115.2 * 0.08 or about 9 points. Bristol City won the league from MK Dons by 8 points.
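
The back-of-the-envelope calculation above can be written out as follows. This is just a sketch: the slope comes from the fit printed above, the two PDO values are the approximate figures quoted in the text, and the function name is invented for illustration.

```python
# Sketch: points a trend line attributes to a gap in PDO.
# points_from_pdo_gap is a hypothetical helper, not part of league_analysis.

def points_from_pdo_gap(slope, pdo_actual, pdo_reference):
    """Points predicted by the slope for the difference between two PDO values."""
    return slope * (pdo_actual - pdo_reference)


# Bristol City at roughly 0.20 versus the second highest PDO of roughly 0.12.
print(round(points_from_pdo_gap(115.2, 0.20, 0.12), 1))
```

Since both PDO values are read off the graph, the result is only as precise as those estimates.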

Of the relegated teams, Yeovil and Leyton Orient seem a touch unfortunate. Notts County a little so as well, though there are many teams with a lower PDO than Notts County. But Crawley Town are the team that were given every PDO-of-a-chance to stay in the division and still failed.

I would have to observe that PDO is not affecting points as much here as in the Championship. That does not square well with the idea that leagues with both promotion and relegation are highly competitive, with PDO separating out the teams.

In [11]:
league_analysis.graph_leagues('PDO', 'Points', leagues=[elt_league],
              annotate_teams=['Dag and Red', 'Cheltenham', 'Tranmere',
                              'Mansfield', 'Oxford'])
line of best fit: 146 x + 62.86

The most obvious thing to point out here is that the relegated teams, Cheltenham and Tranmere, appear to have been a little unfortunate. They do rank low in PDO, with only Mansfield and Oxford below them (both of whom, incidentally, battled well against very low PDO numbers), but not that low. With the exception of Dagenham, the PDO values in the league are not spread out much, meaning that most teams had quite close to average luck. Dagenham are a massive outlier. With the caveat that PDO is not an exact measure of luck, it would appear that Dagenham were hugely fortunate, and they cannot really expect that much of a helping hand next season.

In [12]:
league_analysis.graph_leagues('PDO', 'Points', leagues=[spl_league],
              annotate_teams=['Celtic', 'Motherwell', 'St Mirren', 'Aberdeen'])
line of best fit: 193.4 x + 53.06

Celtic are again a huge outlier in the Scottish Premiership. This graph, if anything, calls into question the idea that PDO is a measure of luck. No one seriously thinks Celtic were particularly lucky to win the league. They are still out-performing their PDO, but not by much. Again this may be an indicator of spending a lot of time in the lead of matches.

Ignoring Celtic, it does seem that the relegation and play-off teams, St Mirren and Motherwell, were on the end of some not altogether fantastic luck.

Home and Away PDO

To give an idea of how unrepeatable PDO is, here are the leagues with the PDO at home plotted against the PDO away. There is a bit of correlation, suggesting that playing style and general ability have something to do with PDO. Additionally it could be that some teams play differently at home than they do away.

In [13]:
league_analysis.graph_leagues('Home PDO', 'Away PDO', leagues=all_leagues,
                              annotate_teams=[],
                              get_x_stat=lambda l,t: l.home_team_stats[t].pdo,
                              get_y_stat=lambda l,t: l.away_team_stats[t].pdo,
                             )
line of best fit: 0.3657 x - 0.02863
line of best fit: 0.241 x - 0.007052
line of best fit: 0.4445 x + 0.01878
line of best fit: 0.0428 x - 0.02513
line of best fit: 0.3421 x - 0.01338
In [14]:
league_analysis.graph_leagues('Home Points', 'Away Points', leagues=all_leagues,
                              annotate_teams=[],
                              get_x_stat=lambda l,t: l.home_team_stats[t].points,
                              get_y_stat=lambda l,t: l.away_team_stats[t].points
                              )
line of best fit: 0.7017 x + 0.5345
line of best fit: 0.7083 x + 2.485
line of best fit: 0.5182 x + 11.46
line of best fit: 0.4229 x + 11.57
line of best fit: 0.6463 x + 5.316

Here is just the English Premier League; compare the graph of home and away PDO to the graph of home and away points:

In [15]:
annotate_teams = ['Crystal Palace']
league_analysis.graph_leagues('Home PDO', 'Away PDO', annotate_teams=annotate_teams,
              leagues=[epl_league],
              get_x_stat=lambda l,t: l.home_team_stats[t].pdo,
              get_y_stat=lambda l,t: l.away_team_stats[t].pdo
              )
league_analysis.graph_leagues('Home Points', 'Away Points', annotate_teams=annotate_teams,
              leagues=[epl_league],
              get_x_stat=lambda l,t: l.home_team_stats[t].points,
              get_y_stat=lambda l,t: l.away_team_stats[t].points
              )
line of best fit: 0.3657 x - 0.02863
line of best fit: 0.7017 x + 0.5345
In [16]:
import itertools
In [17]:
def get_league_differences(league):
    def get_difference(team):
        home_points = league.home_team_stats[team].points
        away_points = league.away_team_stats[team].points
        return away_points - home_points
    differences = [(t, get_difference(t)) for t in league.teams]
    return differences
all_leagues = league_analysis.year_201415.all_leagues
differences = [diff for league in all_leagues for diff in get_league_differences(league)]
differences = sorted(differences, key=lambda pair: pair[1], reverse=True)
league_analysis.display_pairs(differences[0:17])
Wycombe 10
Sheffield Weds 8
Notts County 8
St Mirren 8
Leyton Orient 7
Doncaster 7
Crystal Palace 6
Wigan 5
Bradford 5
Morecambe 3
Norwich 2
Colchester 2
York 2
Ross County 2
Coventry 1
Oxford 1
Exeter 0

So note that Crystal Palace were the only team in the Premier League to score more points away from home than they did at home (and had the 7th highest away-minus-home difference of the five leagues we have been looking at).

Home TSR / Away TSR

But of course your home/away points are, to some extent, based on your home/away PDO. For comparison, here is the home TSR of a team plotted against its away TSR. Of course we expect each team to have a better TSR at home than away. However, because we think TSR is a measure of at least some skill, we expect the two to be highly correlated. That's what we find.

In [18]:
league_analysis.graph_leagues('Home TSR', 'Away TSR',  leagues=all_leagues,
                              annotate_teams=[],
                              get_x_stat=lambda l,t: l.home_team_stats[t].tsr,
                              get_y_stat=lambda l,t: l.away_team_stats[t].tsr
                              )
line of best fit: 0.9435 x - 0.09411
line of best fit: 0.8237 x - 0.006376
line of best fit: 0.8542 x - 0.0138
line of best fit: 0.8708 x - 0.01265
line of best fit: 0.8071 x + 0.008291
