EPL 31-02 October/November 2015

EPL 31 October - 02 November

As usual we have a quick shot-statistical look at the weekend matches in the English Premier League.

In [1]:
%matplotlib inline
import league_analysis
from IPython.display import display, HTML
epl = league_analysis.epl
weekend_matches = league_analysis.get_matches(epl, '31/10/2015', '03/11/2015')
league_analysis.display_given_matches(weekend_matches)
       Home            Away
Team   Chelsea         Liverpool
Goals  1               3
Shots  8               16
SOT    2               7

       Home            Away
Team   Crystal Palace  Man United
Goals  0               0
Shots  10              5
SOT    5               1

       Home            Away
Team   Man City        Norwich
Goals  2               1
Shots  21              5
SOT    5               3

       Home            Away
Team   Newcastle       Stoke
Goals  0               0
Shots  15              9
SOT    6               2

       Home            Away
Team   Swansea         Arsenal
Goals  0               3
Shots  8               15
SOT    3               5

       Home            Away
Team   Watford         West Ham
Goals  2               0
Shots  20              8
SOT    5               3

       Home            Away
Team   West Brom       Leicester
Goals  2               3
Shots  14              13
SOT    6               5

       Home            Away
Team   Everton         Sunderland
Goals  6               2
Shots  15              17
SOT    8               9

       Home            Away
Team   Southampton     Bournemouth
Goals  2               0
Shots  11              16
SOT    2               2

       Home            Away
Team   Tottenham       Aston Villa
Goals  3               1
Shots  17              13
SOT    6               2

Chelsea 1-3 Liverpool

Probably the headline of the week as Chelsea's woes continue. However, those looking at the underlying numbers were likely betting on this. Chelsea have been genuinely bad, and here they were out-shot at home. Liverpool are a good side, but not so good that the champions should expect to be deservedly beaten by them at home.

Crystal Palace 0-0 Manchester United

Manchester United continue to be pretty boring. It's a strange tactic, though. Against Manchester City, who are likely the better team, constricting the game feels like a good approach. But Manchester United are title contenders and are likely the better team against most opponents in the league. I suspect that the better team is well advised to open the game up more, and I'm surprised it does not happen more often. It may not feel like much of a prediction, but I don't think Manchester United will win the league by constricting opponents.

Manchester City 2-1 Norwich

Pretty strange game. I think everyone expected City to win and they did, but they appeared to make it a bit hard for themselves. Conceding an equaliser in the 83rd minute gave them a bit of a fright. When a team out-shoots its opponents 21-5 it is going to win most of the time, but it feels like City made it quite hard for themselves because only 5 of their 21 shots were on target.

However, it's always a good idea to back up your instinct with some data. So, looking at all matches for which we have data, we first keep only those in which one team's share of the shots was at least as lopsided as in this match. We then look at the proportion of those matches that the dominant team won. I had expected a total shots ratio (tsr) as or more dominant than 21-5 to be substantially more correlated with victories than a shots on target ratio (sotr) as or more dominant than 5-3, but here are the numbers:

In [2]:
# Thresholds: City's share of total shots (21 of 26) and of shots on target (5 of 8).
tsr_threshold = 21.0 / (21.0 + 5.0)
sotr_threshold = 5.0 / (5.0 + 3.0)

def match_has_tsr_threshold(match):
    # Either side had at least as large a share of the total shots as City did.
    return match.home_tsr >= tsr_threshold or match.away_tsr >= tsr_threshold

def tsr_winners_won(match):
    # The side with the majority of the total shots also won the match.
    return ((match.FTR == 'H' and match.home_tsr > 0.5) or
            (match.FTR == 'A' and match.away_tsr > 0.5))

def match_has_sotr_threshold(match):
    # Either side had at least as large a share of the shots on target as City did.
    return match.home_sotr >= sotr_threshold or match.away_sotr >= sotr_threshold

def sotr_winners_won(match):
    # The side with the majority of the shots on target also won the match.
    return ((match.FTR == 'H' and match.home_sotr > 0.5) or
            (match.FTR == 'A' and match.away_sotr > 0.5))

# Of the matches meeting the tsr threshold, count those won by the dominant side.
tsr_threshold_matches = league_analysis.get_all_matches(filter_fun=match_has_tsr_threshold)
tsr_wins, tsr_thresholds = league_analysis.get_fraction_of_matches(filter_fun=tsr_winners_won,
                                                                   matches=tsr_threshold_matches)
tsr_percent = 100.0 * (float(tsr_wins) / float(tsr_thresholds))

# The same again for the shots on target threshold.
sotr_threshold_matches = league_analysis.get_all_matches(filter_fun=match_has_sotr_threshold)
sotr_wins, sotr_thresholds = league_analysis.get_fraction_of_matches(filter_fun=sotr_winners_won,
                                                                     matches=sotr_threshold_matches)
sotr_percent = 100.0 * (float(sotr_wins) / float(sotr_thresholds))

print('{0} of {1} tsr matches or {2} percent'.format(tsr_wins, tsr_thresholds, tsr_percent))
print('{0} of {1} sotr matches or {2} percent'.format(sotr_wins, sotr_thresholds, sotr_percent))
352 of 530 tsr matches or 66.41509433962264 percent
4171 of 6886 sotr matches or 60.57217542840546 percent

So there are far fewer matches with as dominant a tsr as City enjoyed over Norwich than there are matches in which a team enjoys as dominant a shots on target ratio. However, the dominant team won only 66.4 percent of those matches. The figure for the shots on target ratio is 60.6 percent, so the difference is a bit less than 6 percentage points. Somehow I had expected the difference to be larger. Okay, one last bit of analysis then:

In [3]:
def sotr_tsr_same_direction(match):
    return ((match.home_sotr > sotr_threshold and match.home_tsr > tsr_threshold) or
            (match.away_sotr > sotr_threshold and match.away_tsr > tsr_threshold))
tsr_sotr_doms = league_analysis.get_all_matches(filter_fun=sotr_tsr_same_direction)
tsr_sotr_wins, tsr_sotr_doms = league_analysis.get_fraction_of_matches(filter_fun=sotr_winners_won,
                                                                       matches=tsr_sotr_doms)

sotr_tsr_ratio = float(tsr_sotr_doms) / float(tsr_thresholds)
tsr_sotr_ratio = float(tsr_sotr_doms) / float(sotr_thresholds)

tsr_sotr_dom_win_ratio = float(tsr_sotr_wins) / float(tsr_sotr_doms)
print('{0} wins from both tsr/sotr doms {1} ratio: {2} wins'.format(tsr_sotr_wins, tsr_sotr_doms,
                                                                    tsr_sotr_dom_win_ratio))
print('{0} both from {1} ratio: {2} tsr threshold matches'.format(tsr_sotr_doms, tsr_thresholds,
                                                                  sotr_tsr_ratio))
print('{0} both from {1} ratio: {2} sotr threshold matches'.format(tsr_sotr_doms, sotr_thresholds,
                                                                   tsr_sotr_ratio))
335 wins from both tsr/sotr doms 496 ratio: 0.6754032258064516 wins
496 both from 530 ratio: 0.9358490566037736 tsr threshold matches
496 both from 6886 ratio: 0.07203020621550973 sotr threshold matches

To take you through the mess of my variable names here: there have been 496 games in which a team has dominated both tsr and sotr by as much as, or more than, City dominated Norwich. In 335 of those games, or a surprisingly low 67.5 percent, the team that dominated shots won.

Because there were only 530 games in which a team dominated the total shots by as much as City did, that means that 93.6 percent of the time a team has that much total shots domination, that team also dominates the shots on target by 5-3 or more.

But because there were 6886 matches in which a team had a shots on target ratio of 5-3 or better, only 7.2 percent of the time did that team also have a total shots advantage as large as 21-5.

Clear? Okay, in other words, a team that dominates total shots by as much as City did would normally expect to dominate shots on target by at least as much as City did. A team that dominates shots on target by as much as City did, on the other hand, can't really expect to dominate total shots by that much. Either way, both are indicative of a similar 60 to 66 percent chance of winning the match.
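As a quick check on the arithmetic, here is a minimal sketch that recomputes the two thresholds from the City v Norwich shot counts, and the percentages from the counts in the notebook output. The helper name `share` is my own invention for this illustration and is not part of `league_analysis`.

```python
def share(count_for, count_against):
    """Fraction of the combined total that belongs to the 'for' side."""
    return count_for / float(count_for + count_against)

# Thresholds set by Man City 2-1 Norwich: 21-5 total shots, 5-3 shots on target.
tsr_threshold = share(21, 5)
sotr_threshold = share(5, 3)

# Counts copied from the notebook output for the tsr and sotr filters.
tsr_win_pct = 100.0 * 352 / 530            # dominant-tsr side won
sotr_win_pct = 100.0 * 4171 / 6886         # dominant-sotr side won
both_win_pct = 100.0 * 335 / 496           # dominant on both measures and won
both_given_tsr_pct = 100.0 * 496 / 530     # tsr-dominant sides also sotr-dominant
both_given_sotr_pct = 100.0 * 496 / 6886   # sotr-dominant sides also tsr-dominant

for label, value in [('tsr threshold', tsr_threshold),
                     ('sotr threshold', sotr_threshold),
                     ('tsr win %', tsr_win_pct),
                     ('sotr win %', sotr_win_pct),
                     ('both win %', both_win_pct),
                     ('both given tsr %', both_given_tsr_pct),
                     ('both given sotr %', both_given_sotr_pct)]:
    print('{0}: {1:.3f}'.format(label, value))
```

The gap between the last two figures is just the difference in denominators: the same 496 matches are being divided by 530 in one case and by 6886 in the other.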

Newcastle 0-0 Stoke

Newcastle continue to not really be rewarded for their shots, although they have generally been pretty poor. Stoke continue to be pretty horrible.

Swansea 0-3 Arsenal

Arsenal continue to look like title contenders. Swansea are probably otherwise fine, and they shouldn't read too much into losing to Arsenal at home.

Watford 2-0 West Ham

West Ham continue to look pretty poor, but for one of the first times this season their low underlying performance has actually resulted in a deserved loss. Look at it this way: Watford dominated West Ham shots-wise by roughly the same amount as Manchester City dominated Norwich. The red card may have had something to do with this, though.

West Ham won entry into the Europa league qualification rounds via their fair-play. They then promptly managed to get themselves dumped out of the qualification rounds whilst incurring 3 red cards. So how are the fair play champions doing this time around:

In [4]:
pairs = [(stats.teamname, stats.booking_points_for)
         for stats in epl.team_stats.values()]
league_analysis.display_ranked_table(['Team', 'Booking Points'], pairs)
Position Team Booking Points
1 Sunderland 300
2 Chelsea 285
3 Newcastle 275
- Liverpool 275
5 Southampton 270
6 Tottenham 260
7 West Ham 250
8 Everton 240
9 Crystal Palace 235
10 Leicester 230
- Aston Villa 230
- West Brom 230
- Man City 230
14 Stoke 220
15 Watford 215
16 Swansea 210
17 Norwich 200
18 Man United 190
19 Bournemouth 180
20 Arsenal 160

I'm not quite sure how fair play is calculated. For this I've used the notion of booking points because that was in the data set I've used. Basically a team is given 10 points for each yellow card they receive and 25 points for each red card. I'm not sure if the data is updated to reflect rescinded cards. I'm not quite sure if two yellows are counted as 20, 25 or possibly 45 points. But this rough measure has West Ham in 7th place, hence in the "bottom" half for fair play (the greater the number of booking points the worse the team is). This also has the worst culprits as Sunderland, and behind them Chelsea.
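To make the booking points rule concrete, here is a minimal sketch of the calculation as described: 10 points per yellow card and 25 per red. The function name and the card counts in the example are hypothetical, and the sketch simply counts every card independently, which is only one of the possible treatments of a second yellow mentioned above.

```python
YELLOW_POINTS = 10  # points per yellow card, as described in the text
RED_POINTS = 25     # points per red card, as described in the text

def booking_points(yellows, reds):
    """Booking points for a team, counting every card independently.

    Assumption: a red card shown for two yellows is not treated specially
    here, because the data set's handling of that case is unknown.
    """
    return YELLOW_POINTS * yellows + RED_POINTS * reds

# Hypothetical card counts, purely for illustration.
example_total = booking_points(yellows=4, reds=1)
print(example_total)
```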

However, it could be that West Ham, or Sunderland or Chelsea, have had the bad fortune to play in games with particularly tough referees. Or perhaps they have been unfortunate to play in games that have, for some reason, been particularly belligerent. So perhaps a fairer measure is to ask what share of the booking points in each team's games has been awarded to that team. This is basically the same as taking a shot ratio, but for booking points:

In [5]:
pairs = [(stats.teamname, float(stats.booking_points_for) / float(stats.booking_points_for +
                                                                  stats.booking_points_against))
         for stats in epl.team_stats.values()]
league_analysis.display_ranked_table(['Team', 'Booking Points Ratio'], pairs)
Position Team Booking Points Ratio
1 Southampton 0.6666666666666666
2 Norwich 0.625
3 Tottenham 0.6046511627906976
4 Newcastle 0.5789473684210527
5 Liverpool 0.5670103092783505
6 Aston Villa 0.5609756097560976
7 Sunderland 0.5504587155963303
8 Leicester 0.5411764705882353
9 Watford 0.524390243902439
10 West Brom 0.4842105263157895
11 Everton 0.48
12 Stoke 0.4782608695652174
13 Crystal Palace 0.47474747474747475
14 Man City 0.4742268041237113
15 Chelsea 0.47107438016528924
16 West Ham 0.44642857142857145
17 Man United 0.4418604651162791
18 Swansea 0.39622641509433965
19 Bournemouth 0.3956043956043956
20 Arsenal 0.3404255319148936

That's interesting: it has changed things quite significantly, at least for West Ham and, even more so, Chelsea. Arsenal, Bournemouth and Manchester United seem pretty fair according to either measure. The other oddity is Norwich, who go from 17th worst to 2nd worst, in other words from looking quite fair to being the second most naughty team, behind Southampton. That's a bit of a surprise. It's not as though Southampton were looking great by the first measure either.
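To see how a team can move so far between the two tables, here is a small sketch of the ratio. The "against" totals are not printed anywhere in this post; I have back-calculated them from the two tables, so treat them as implied values rather than figures from the data set. The function name is hypothetical.

```python
def booking_points_share(points_for, points_against):
    """Share of all booking points in a team's games given to that team."""
    total = points_for + points_against
    if total == 0:
        return 0.5  # assumption: treat a card-free record as an even split
    return points_for / float(total)

# (booking points for, implied booking points against)
# The 'against' values are back-calculated from the two tables, not quoted.
examples = [('Southampton', 270, 135),
            ('Norwich', 200, 120),
            ('West Ham', 250, 310),
            ('Arsenal', 160, 310)]

for team, points_for, points_against in examples:
    ratio = booking_points_share(points_for, points_against)
    print('{0}: {1:.4f}'.format(team, ratio))
```

Norwich have few booking points of their own, but their opponents have even fewer, which is why they look so much worse on the ratio than on the raw totals.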

Wildly speculative narrative but indulging a little; Southampton are managing to restrict the number of shots teams have, perhaps aggression is one factor in that.

I suspect two things:

  1. fair play, however it is measured, has a pretty large luck component, meaning that it is not very repeatable
  2. it is however probably correlated with possession, since it's easier to foul seriously enough to incur a card when attempting to obtain possession than when attempting to retain possession.

But I don't have any data/analysis to back up these claims.

Just for fun, here is how the teams have changed in rank for the number of yellow and red cards over the course of the season so far:

In [6]:
after_game_no_dicts = league_analysis.collect_after_game_dicts(epl, '01/08/2015', '03/11/2015')
first_teams = epl.teams[:10]
second_teams = epl.teams[10:]
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'yellows',
                                    rankings=True, teams=first_teams)
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'yellows',
                                    rankings=True, teams=second_teams)
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'reds',
                                    rankings=True, teams=first_teams)
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'reds',
                                    rankings=True, teams=second_teams)

West Brom 2-3 Leicester

Leicester continue to pick up points as Vardy continues to score.

Everton 6-2 Sunderland

Interesting game with 8 goals from 17 shots on target. Both teams hit the target with more than half of their shots. It is not terribly surprising that Everton won this. It is a bit surprising that Sunderland actually out-shot their hosts. Still, Sunderland are not great.

Southampton 2-0 Bournemouth

Southampton are a genuinely good team, but here they were out-shot by an increasingly unfortunate Bournemouth team. Here are the PDO stats:

In [7]:
teams = ['Tottenham', 'Arsenal', 'Watford', 'Liverpool', 'West Ham', 'Everton',
         'Southampton', 'Sunderland', 'Chelsea', 'Bournemouth', 'Man United']
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'pdo',
                                    y_axis_lims=(-0.3, 0.3), teams=teams)
pairs = [(stats.teamname, stats.pdo)
         for stats in epl.team_stats.values()]
league_analysis.display_ranked_table(['Team', 'PDO'], pairs)
Position Team PDO
1 West Ham 0.1796235679214403
2 Man United 0.15201465201465203
3 Everton 0.13721264367816088
4 Arsenal 0.11711711711711711
5 Watford 0.06493506493506496
6 Leicester 0.05886243386243384
7 Stoke 0.03634669151910533
8 Tottenham 0.02436239055957362
9 Crystal Palace 0.015510204081632645
10 Chelsea 0.011805555555555569
11 Man City 0.0035714285714285587
12 West Brom -0.004239533651298366
13 Sunderland -0.016295707472178067
14 Aston Villa -0.04105571847507333
15 Swansea -0.07914893617021279
16 Newcastle -0.08792846497764534
17 Southampton -0.10356536502546693
18 Norwich -0.10677698975571315
19 Liverpool -0.11644204851752021
20 Bournemouth -0.20594965675057209

Liverpool and Southampton (who were unlucky enough to be off the chart until game number 7) have picked up but are still well below average. West Ham continue to receive a fair amount of luck, and Everton aren't doing too badly for luck either. Manchester United also seem to be receiving a fair amount of luck, but they are restricting teams such that there are few shots in their games, so the magnitude of their PDO may be inflated by the smaller sample size. Arsenal's luck has been improving, which may be a result of having spent more time in the lead after they began scoring.
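For readers unfamiliar with PDO, here is a rough sketch of one common way to compute it. PDO is conventionally a team's scoring rate plus its save rate; because the values in the table are centred on zero, I am assuming here that 1 has been subtracted, which makes it the scoring rate minus the conceding rate on shots on target. That is a guess at the definition, not something confirmed by `league_analysis`, and the function names are hypothetical.

```python
def rate(goals, shots_on_target):
    """Goals per shot on target, or 0.0 when there were no shots on target."""
    if shots_on_target == 0:
        return 0.0
    return goals / float(shots_on_target)

def pdo(goals_for, sot_for, goals_against, sot_against):
    """Zero-centred PDO: scoring rate plus save rate, minus one.

    Assumption: this is one conventional definition, shifted so that an
    average team scores 0. It may not match the data set's own formula.
    """
    scoring_rate = rate(goals_for, sot_for)
    save_rate = 1.0 - rate(goals_against, sot_against)
    return scoring_rate + save_rate - 1.0

# Single-match illustration using Everton 6-2 Sunderland (8 and 9 shots on target).
everton_pdo = pdo(goals_for=6, sot_for=8, goals_against=2, sot_against=9)
sunderland_pdo = pdo(goals_for=2, sot_for=9, goals_against=6, sot_against=8)
print('Everton: {0:.3f}, Sunderland: {1:.3f}'.format(everton_pdo, sunderland_pdo))
```

Under this definition the two sides of a single match always sum to zero, and a team involved in few shots has fewer events in each denominator, so its PDO can swing more.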

Tottenham 3-1 Aston Villa

A well deserved and pretty predictable home victory for Tottenham against currently managerless Villa, who are definitely in a spot of bother. Tottenham, on the other hand, are on the longest current unbeaten run with 10 matches, the next highest being Liverpool with 6. The next longest unbeaten run at any point this season is West Ham's 7, which was ended this weekend by Watford.

In [8]:
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'current_unbeaten_run',
                                    rankings=False, teams=first_teams)
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'current_unbeaten_run',
                                    rankings=False, teams=second_teams)

Tottenham's unbeaten run is only matched by Villa's winless run. Newcastle and Sunderland have been "doing well" on this measure too, but recent wins have reset them.
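For anyone wanting to reproduce these run statistics, here is a minimal sketch of how a current unbeaten or winless run can be counted from a list of results. I have not checked how `league_analysis` computes `current_unbeaten_run` and `current_winless_run`, so the function names and the result sequences below are hypothetical.

```python
def current_run(results, run_enders):
    """Count results from the most recent backwards until a run-ending result.

    `results` is ordered oldest to newest and uses 'W', 'D' and 'L'.
    """
    run = 0
    for result in reversed(results):
        if result in run_enders:
            break
        run += 1
    return run

def current_unbeaten_run(results):
    """Matches since the last defeat."""
    return current_run(results, run_enders={'L'})

def current_winless_run(results):
    """Matches since the last win."""
    return current_run(results, run_enders={'W'})

# Hypothetical sequences, oldest result first.
improving = ['L', 'W', 'D', 'D', 'W']
struggling = ['W', 'D', 'L', 'L']

print(current_unbeaten_run(improving), current_winless_run(improving))
print(current_unbeaten_run(struggling), current_winless_run(struggling))
```

A single win resets the winless run to zero, which is why one recent victory is enough to drop a team out of the winless chart.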

In [9]:
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'current_winless_run',
                                    rankings=False, teams=first_teams)
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'current_winless_run',
                                    rankings=False, teams=second_teams)

As always, thanks again for reading.
