EPL 31-02 October/November 2015
EPL 31 October - 02 November
As usual we have a quick shot-statistical look at the weekend matches in the English Premier League.
%matplotlib inline
import league_analysis
from IPython.display import display, HTML
epl = league_analysis.epl
weekend_matches = league_analysis.get_matches(epl, '31/10/2015', '03/11/2015')
league_analysis.display_given_matches(weekend_matches)
Chelsea 1-3 Liverpool
Probably the headline of the week as Chelsea's woes continue. However, those looking at the underlying numbers were likely betting on this. Chelsea have been genuinely bad, and here they were out-shot at home. Liverpool are a good side, but not so good that the champions should expect to be deservedly beaten by them at home.
Crystal Palace 0-0 Manchester United
Manchester United continue to be pretty boring. It's a strange tactic though. Against Manchester City, who are likely a better team, constricting the game feels like a good approach. But Manchester United are title contenders and are likely the better team against most opponents in the league. I suspect that the better team is well advised to open the game up more, and I'm surprised it does not happen more often. It may not feel like much of a prediction, but I don't think Manchester United will win the league by constricting opponents.
Manchester City 2-1 Norwich
Pretty strange game. I think everyone expected City to win and they did, but they appeared to make it a bit hard for themselves. Conceding an equaliser in the 83rd minute gave them a bit of a fright. When a team out-shoots its opponents 21-5 it is going to win most of the time, but City made life difficult because only 5 of their 21 shots were on target.
However, it's always a good idea to back up your instinct with some data. So looking at all matches for which we have data, we first filter them down to matches which had the same or a more uneven shot ratio. We then look at the proportion of those matches that the more dominant team won. I had expected a total shots ratio (tsr) as or more dominant than 21-5 to be substantially more correlated with victories than a shots on target ratio (sotr) as or more dominant than 5-3, but here are the numbers:
tsr_threshold = 21.0 / (21.0 + 5.0)
sotr_threshold = 5.0 / (5.0 + 3.0)
def match_has_tsr_threshold(match):
    return match.home_tsr >= tsr_threshold or match.away_tsr >= tsr_threshold

def tsr_winners_won(match):
    return ((match.FTR == 'H' and match.home_tsr > 0.5) or
            (match.FTR == 'A' and match.away_tsr > 0.5))

def match_has_sotr_threshold(match):
    return match.home_sotr >= sotr_threshold or match.away_sotr >= sotr_threshold

def sotr_winners_won(match):
    return ((match.FTR == 'H' and match.home_sotr > 0.5) or
            (match.FTR == 'A' and match.away_sotr > 0.5))
tsr_threshold_matches = league_analysis.get_all_matches(filter_fun=match_has_tsr_threshold)
tsr_wins, tsr_thresholds = league_analysis.get_fraction_of_matches(
    filter_fun=tsr_winners_won, matches=tsr_threshold_matches)
tsr_percent = 100.0 * (float(tsr_wins) / float(tsr_thresholds))
sotr_threshold_matches = league_analysis.get_all_matches(filter_fun=match_has_sotr_threshold)
sotr_wins, sotr_thresholds = league_analysis.get_fraction_of_matches(
    filter_fun=sotr_winners_won, matches=sotr_threshold_matches)
sotr_percent = 100.0 * (float(sotr_wins) / float(sotr_thresholds))
print('{0} of {1} tsr matches or {2} percent'.format(tsr_wins, tsr_thresholds, tsr_percent))
print('{0} of {1} sotr matches or {2} percent'.format(sotr_wins, sotr_thresholds, sotr_percent))
So there are far fewer matches with as dominant a tsr as City enjoyed over Norwich than there are matches in which a team enjoys as dominant a shots on target ratio. However, the percentage of those matches in which the more dominant team won is only 66.4 percent. The percentage for the shots on target ratio is 60.5 percent, so the difference is a bit less than 6 percentage points. Somehow I had expected the difference to be larger. Okay, one last bit of analysis then:
def sotr_tsr_same_direction(match):
    # Note: these comparisons are strict ('>'), whereas the single-ratio
    # filters above use '>='.
    return ((match.home_sotr > sotr_threshold and match.home_tsr > tsr_threshold) or
            (match.away_sotr > sotr_threshold and match.away_tsr > tsr_threshold))
tsr_sotr_doms = league_analysis.get_all_matches(filter_fun=sotr_tsr_same_direction)
tsr_sotr_wins, tsr_sotr_doms = league_analysis.get_fraction_of_matches(
    filter_fun=sotr_winners_won, matches=tsr_sotr_doms)
sotr_tsr_ratio = float(tsr_sotr_doms) / float(tsr_thresholds)
tsr_sotr_ratio = float(tsr_sotr_doms) / float(sotr_thresholds)
tsr_sotr_dom_win_ratio = float(tsr_sotr_wins) / float(tsr_sotr_doms)
print('{0} wins from {1} matches dominated on both tsr and sotr, ratio: {2}'.format(
    tsr_sotr_wins, tsr_sotr_doms, tsr_sotr_dom_win_ratio))
print('{0} both from {1} tsr threshold matches, ratio: {2}'.format(
    tsr_sotr_doms, tsr_thresholds, sotr_tsr_ratio))
print('{0} both from {1} sotr threshold matches, ratio: {2}'.format(
    tsr_sotr_doms, sotr_thresholds, tsr_sotr_ratio))
To take you through the mess of my variable names here: this means that there have been 496 games in which a team has dominated both tsr and sotr by as much as, or more than, City dominated Norwich. In 335 of those games, or a surprisingly low 67.5 percent, the team that dominated shots won.
Because there were only 530 games in which a team dominated the total shots by as much as City did, that means that 93.6 percent of the time that a team has that much total shots domination, it also dominates the shots on target by 5-3 or more.
But because there were 6876 matches in which a team had a shots on target ratio of 5-3 or better, only 7.2 percent of the time did that team also have a total shots advantage of 21-5 or more.
Clear? Okay, in other words, a team that dominates total shots by as much as City did would normally expect to dominate shots on target by at least as much as City did. However, a team that dominates shots on target by as much as City did can't necessarily expect to dominate total shots by that much. Either way, both are indicative of a similar 60 to 66 percent chance of winning the match.
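For anyone who wants to check the arithmetic, here is a minimal sketch that does not depend on `league_analysis`. The helper names `shot_ratio` and `percent` are made up for this illustration; the thresholds come from the City v Norwich shot counts and the match counts are the ones quoted above.

```python
def shot_ratio(shots_for, shots_against):
    """Share of the shots in a match taken by one team (0.0 to 1.0)."""
    total = shots_for + shots_against
    if total == 0:
        return 0.5  # assumption: treat a match with no shots as even
    return shots_for / float(total)


def percent(part, whole):
    """Express part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / float(whole), 1)


# Thresholds from Manchester City 2-1 Norwich: 21-5 shots, 5-3 on target.
tsr_threshold = shot_ratio(21, 5)
sotr_threshold = shot_ratio(5, 3)

# Match counts quoted in the text.
both_dominant = 496       # dominant on both tsr and sotr
both_dominant_wins = 335  # ... and the dominant team won
tsr_dominant = 530        # dominant on tsr
sotr_dominant = 6876      # dominant on sotr

print('tsr threshold {0:.3f}, sotr threshold {1:.3f}'.format(
    tsr_threshold, sotr_threshold))
print('win percentage when dominant on both: {0}'.format(
    percent(both_dominant_wins, both_dominant)))
print('tsr-dominant matches also sotr-dominant: {0}'.format(
    percent(both_dominant, tsr_dominant)))
print('sotr-dominant matches also tsr-dominant: {0}'.format(
    percent(both_dominant, sotr_dominant)))
```

The asymmetry between the last two figures is just the same 496 matches divided by two very different totals.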
Newcastle 0-0 Stoke
Newcastle continue to not really be rewarded for their shots, although they have generally been pretty poor. Stoke continue to be pretty horrible.
Swansea 0-3 Arsenal
Arsenal continue to look like title contenders. Swansea are probably otherwise fine; they shouldn't read too much into losing to Arsenal at home.
Watford 2-0 West Ham
West Ham continue to look pretty poor, but for one of the first times this season their low underlying performance has actually resulted in a deserved loss. Look at it this way: Watford dominated West Ham shots-wise by roughly the same amount as Manchester City dominated Norwich. Though the red card may have had something to do with this.
West Ham won entry into the Europa League qualification rounds via fair play. They then promptly managed to get themselves dumped out of the qualification rounds whilst incurring 3 red cards. So how are the fair play champions doing this time around?
pairs = [(stats.teamname, stats.booking_points_for)
         for stats in epl.team_stats.values()]
league_analysis.display_ranked_table(['Team', 'Booking Points'], pairs)
I'm not quite sure how fair play is calculated. For this I've used the notion of booking points because that was in the data set I've used. Basically a team is given 10 points for each yellow card they receive and 25 points for each red card. I'm not sure if the data is updated to reflect rescinded cards. I'm not quite sure if two yellows are counted as 20, 25 or possibly 45 points. But this rough measure has West Ham in 7th place, hence in the "bottom" half for fair play (the greater the number of booking points the worse the team is). This also has the worst culprits as Sunderland, and behind them Chelsea.
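As a minimal sketch of that scoring rule, assuming only the 10 and 25 point values described above, the three candidate totals for a second-yellow dismissal fall out of which cards the data set happens to record. The function and constant names here are made up for illustration and are not taken from the data set or from `league_analysis`.

```python
YELLOW_POINTS = 10
RED_POINTS = 25


def booking_points(yellows, reds):
    """Booking points for a given number of recorded yellow and red cards."""
    return YELLOW_POINTS * yellows + RED_POINTS * reds


# A second-yellow dismissal scores differently depending on what is recorded.
second_yellow_options = {
    'two yellows only': booking_points(2, 0),
    'red only': booking_points(0, 1),
    'two yellows and a red': booking_points(2, 1),
}

for recorded, points in sorted(second_yellow_options.items()):
    print('{0}: {1} points'.format(recorded, points))
```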
However, it could be that West Ham, or Sunderland or Chelsea, have had the bad fortune to play in games with particularly tough referees. Or perhaps they have been unfortunate to play in games that have, for some reason, been particularly belligerent. So perhaps a fairer measure is to ask what share of the booking points in each team's games has been awarded to that team. This is basically the same as taking a shot ratio or a goal difference, but for booking points:
pairs = [(stats.teamname,
          float(stats.booking_points_for) /
          float(stats.booking_points_for + stats.booking_points_against))
         for stats in epl.team_stats.values()]
league_analysis.display_ranked_table(['Team', 'Booking Points Ratio'], pairs)
That's interesting. That has changed things quite significantly, at least for West Ham and, even more so, Chelsea. Arsenal, Bournemouth and Manchester United seem pretty fair according to either measure. The other oddity is Norwich, who go from 17th worst to 2nd worst, in other words from looking quite fair to being the second naughtiest team, behind Southampton. That's a bit of a surprise. It's not as though Southampton were looking great by the first measure either.
This is a wildly speculative narrative, but indulging a little: Southampton are managing to restrict the number of shots teams have, and perhaps aggression is one factor in that.
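To see why the ranking can shift so much between the two measures, here is a small sketch using made-up figures for three hypothetical teams (not real data). `booking_points_share` is an illustrative name for the share calculation described above.

```python
def booking_points_share(points_for, points_against):
    """Share of the booking points in a team's games awarded to that team."""
    total = points_for + points_against
    if total == 0:
        return 0.5  # assumption: no cards at all counts as an even split
    return points_for / float(total)


# Made-up figures: team -> (booking points for, booking points against).
toy = {
    'Team A': (200, 250),  # many cards, but in very bad-tempered games
    'Team B': (150, 90),   # fewer cards, but far more than their opponents
    'Team C': (180, 180),
}

# Rank worst-first by raw points, then worst-first by share.
by_raw = sorted(toy, key=lambda team: toy[team][0], reverse=True)
by_share = sorted(toy, key=lambda team: booking_points_share(*toy[team]),
                  reverse=True)

print('worst first by raw points: {0}'.format(by_raw))
print('worst first by share:      {0}'.format(by_share))
```

In this toy example the team with the most raw booking points comes out as the fairest by share, because its opponents were carded even more.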
I suspect two things:
- fair play, however it is measured, has a pretty large luck component, meaning that it is not very repeatable
- it is however probably correlated with possession, since it's easier to foul seriously enough to incur a card when attempting to obtain possession than when attempting to retain possession.
But I don't have any data/analysis to back up these claims.
Just for fun, here is how the teams have changed in rank for the number of yellow and red cards over the course of the season so far:
after_game_no_dicts = league_analysis.collect_after_game_dicts(epl, '01/08/2015', '03/11/2015')
first_teams = epl.teams[:10]
second_teams = epl.teams[10:]
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'yellows',
                                    rankings=True, teams=first_teams)
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'yellows',
                                    rankings=True, teams=second_teams)
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'reds',
                                    rankings=True, teams=first_teams)
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'reds',
                                    rankings=True, teams=second_teams)
West Brom 2-3 Leicester
Leicester continue to pick up points as Vardy continues to score.
Everton 6-2 Sunderland
Interesting game with 8 goals from 17 shots on target. Both teams hit the target with more than half of their shots. It is not terribly surprising that Everton won this. It is a bit surprising that Sunderland actually out-shot their hosts. Still, Sunderland are not great.
Southampton 2-0 Bournemouth
Southampton are a genuinely good team, but here they were out-shot by what is becoming an increasingly unfortunate Bournemouth team. Here are the PDO stats:
teams = ['Tottenham', 'Arsenal', 'Watford', 'Liverpool', 'West Ham', 'Everton',
         'Southampton', 'Sunderland', 'Chelsea', 'Bournemouth', 'Man United']
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'pdo',
                                    y_axis_lims=(-0.3, 0.3), teams=teams)
pairs = [(stats.teamname, stats.pdo)
         for stats in epl.team_stats.values()]
league_analysis.display_ranked_table(['Team', 'PDO'], pairs)
Liverpool and Southampton (who were unlucky enough to be off the chart until game number 7) have picked up but are still well below average. West Ham continue to receive a fair amount of luck, and Everton aren't doing too badly for luck either. Manchester United also seem to be receiving a fair amount of luck, but they are restricting teams such that there are few shots in their games, so the magnitude of their PDO may be influenced by the smaller sample size. Arsenal's luck has been improving, which may be a result of having spent more time in the lead after they began scoring.
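For readers unfamiliar with PDO, here is a minimal sketch under its usual football definition: scoring rate plus save rate, both measured against shots on target, so that a value of 1.0 is "average luck". Whether `league_analysis` computes it exactly this way is an assumption, as is the centring on zero suggested by the plot limits of -0.3 to 0.3. The function names and the example figures are made up for illustration.

```python
def pdo(goals_for, sot_for, goals_against, sot_against):
    """Scoring rate plus save rate, both relative to shots on target."""
    if sot_for == 0:
        scoring_rate = 0.0
    else:
        scoring_rate = goals_for / float(sot_for)
    if sot_against == 0:
        save_rate = 1.0
    else:
        save_rate = 1.0 - goals_against / float(sot_against)
    return scoring_rate + save_rate


def centred_pdo(goals_for, sot_for, goals_against, sot_against):
    """PDO shifted so that 0.0 is average (assumed plotting convention)."""
    return pdo(goals_for, sot_for, goals_against, sot_against) - 1.0


# Made-up example: 10 goals from 40 shots on target, 8 conceded from 30.
print('pdo: {0:.3f}'.format(pdo(10, 40, 8, 30)))
print('centred pdo: {0:.3f}'.format(centred_pdo(10, 40, 8, 30)))
```

Because both rates are fractions of a shot count, a team with few shots in its games can swing a long way on a handful of goals, which is the sample size caveat mentioned above.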
Tottenham 3-1 Aston Villa
A well deserved and pretty predictable home victory for Tottenham against currently managerless Villa, who are definitely in a spot of bother. Tottenham on the other hand are on the longest current unbeaten run with 10 matches, with the next highest being Liverpool with 6. Even the next longest at any time this season is West Ham with 7, which was ended this weekend by Watford.
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'current_unbeaten_run',
                                    rankings=False, teams=first_teams)
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'current_unbeaten_run',
                                    rankings=False, teams=second_teams)
Tottenham's unbeaten run is only matched by Villa's winless run. Newcastle and Sunderland have been "doing well" on this measure too, but recent wins have reset them.
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'current_winless_run',
                                    rankings=False, teams=first_teams)
league_analysis.plot_changing_stats(epl, after_game_no_dicts, 'current_winless_run',
                                    rankings=False, teams=second_teams)
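The run lengths plotted above can be worked out from a team's results in date order. This is a minimal sketch, assuming results are encoded as 'W', 'D' and 'L' from the team's own point of view; the function names are illustrative and not necessarily how `league_analysis` does it.

```python
def current_run(results, breaks_run):
    """Length of the run at the end of results that nothing has interrupted."""
    run = 0
    for result in reversed(results):
        if breaks_run(result):
            break
        run += 1
    return run


def current_unbeaten_run(results):
    """Matches since the last defeat."""
    return current_run(results, lambda result: result == 'L')


def current_winless_run(results):
    """Matches since the last win."""
    return current_run(results, lambda result: result == 'W')


# Made-up sequence, oldest first: the last defeat was five matches ago.
example = ['L', 'D', 'W', 'W', 'D', 'W']
print('unbeaten run: {0}'.format(current_unbeaten_run(example)))
print('winless run: {0}'.format(current_winless_run(example)))
```

A single win resets the winless run to zero, which is why Newcastle and Sunderland drop off that chart so abruptly.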
As always thanks again for reading.