Result Histograms

Suppose you wish to bet on the correct score for games in a particular league. Of course, the odds for each result depend heavily on the two teams involved, but it would at least help to know the base rates: how often each kind of result occurs in that league. There is no point in betting on a four-nil away victory if that result never happens. Here, we give a few histograms of the results that occur in our standard five leagues.
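If a base rate is treated as a rough probability, it gives a simple floor for betting decisions: an outcome with probability p only breaks even at decimal odds of 1/p. The sketch below is an illustration rather than part of the original analysis. The helper names are hypothetical, and the sketch deliberately ignores team strength and the bookmaker's margin, both of which matter in practice.

```python
def break_even_decimal_odds(base_rate):
    """Decimal odds at which a bet on an outcome with this probability breaks even."""
    if not 0 < base_rate <= 1:
        raise ValueError("base_rate must be in (0, 1]")
    return 1.0 / base_rate


def is_value_bet(offered_decimal_odds, base_rate):
    """True if the offered decimal odds exceed the break-even odds for base_rate.

    Hypothetical helper: it treats the league-wide base rate as the true
    probability, which ignores the two teams actually playing.
    """
    return offered_decimal_odds > break_even_decimal_odds(base_rate)


# Hypothetical example: a scoreline seen in one game in eight.
print(break_even_decimal_odds(0.125))  # 8.0
print(is_value_bet(9.0, 0.125))        # True
```

A result that never occurs has a base rate of zero, so no finite odds make it worth backing on base rates alone; the helper rejects that case rather than dividing by zero.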

We start with the imports we need.

%matplotlib inline
from IPython.display import display, HTML
import league_analysis
import collections
import matplotlib.pyplot as plot
import numpy
league_names = ['epl', 'ech', 'elo', 'elt', 'spl']

Next we need some code to draw a histogram of the results from a set of matches. It is generic enough to draw histograms of both full-time results (home win, away win, draw) and correct scores (0-0, 0-1, etc.).

def format_result(p):
    return "{0}-{1}".format(p[0], p[1])

def get_scoreline(match):
    result = (match.FTHG, match.FTAG)
    return format_result(result)

def get_result(match):
    return match.FTR

def histogram_match_results(matches, title='Match histogram', get_result=get_scoreline,
                            most_common_plotted=15):
    plot.title(title)
    result_counts = collections.Counter([get_result(m) for m in matches])
    total_games = sum(result_counts.values())
    # Keep only the most frequent results; the rest are lumped into 'Others' below.
    most_common = result_counts.most_common(most_common_plotted)
    names = [n for n, _ in most_common]
    values = [result_counts[r] / float(total_games) for r in names]
    
    others = total_games - sum([c for _, c in most_common])
    if others > 0:
        names = names + ['Others']
        values = values + [others / float(total_games)]

    x_positions = numpy.arange(len(names))
    plot.yticks(x_positions, names)
    plot.barh(x_positions, values, align='center', alpha=0.4)
    plot.show()
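The counting step inside histogram_match_results can be separated from the plotting so that it is easy to check without drawing anything. This is a minimal sketch, assuming match records expose the FTHG, FTAG and FTR attributes used above; the Match namedtuple and the result_proportions helper are hypothetical stand-ins, not part of league_analysis.

```python
import collections
from collections import namedtuple

# Hypothetical stand-in for the match records returned by league_analysis;
# only the attributes used in this post are modelled.
Match = namedtuple('Match', ['FTHG', 'FTAG', 'FTR'])


def result_proportions(labels, most_common_plotted=15):
    """Return (name, proportion) pairs for the most common labels,
    followed by an 'Others' bucket if any labels were left out."""
    counts = collections.Counter(labels)
    total = sum(counts.values())
    top = counts.most_common(most_common_plotted)
    rows = [(name, count / float(total)) for name, count in top]
    others = total - sum(count for _, count in top)
    if others > 0:
        rows.append(('Others', others / float(total)))
    return rows


# Made-up matches: three 1-1 draws, two 1-0 home wins and one 0-2 away win.
matches = [Match(1, 1, 'D')] * 3 + [Match(1, 0, 'H')] * 2 + [Match(0, 2, 'A')]
scorelines = ["{0}-{1}".format(m.FTHG, m.FTAG) for m in matches]
print(result_proportions(scorelines, most_common_plotted=2))
# [('1-1', 0.5), ('1-0', 0.333...), ('Others', 0.166...)]
```

Because the proportions always include the 'Others' bucket, they sum to one regardless of how many results are plotted.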

Now we can plot some histograms. First, two histograms covering every match in every season we have data for: one of scorelines and one of full-time results.

histogram_match_results(league_analysis.get_all_matches(),
                        title='All Leagues - All years',
                        get_result=get_scoreline)
histogram_match_results(league_analysis.get_all_matches(),
                        title='All Leagues - All years',
                        get_result=get_result)

We see that 1-1 is the most frequent scoreline and that a home win occurs in about 43 percent of games. Now we do the same for each league individually.
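get_result relies on the FTR column, while get_scoreline uses the goal columns. Since the full-time result is determined by the score, the two can be cross-checked, and the home-win rate can be recomputed from goals alone. This is a sketch, assuming FTR uses the 'H', 'D' and 'A' codes plotted later in this post; the helper names and the Match namedtuple are hypothetical.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a match record with the columns used in this post.
Match = namedtuple('Match', ['FTHG', 'FTAG', 'FTR'])


def result_from_goals(home_goals, away_goals):
    """Map a full-time score to 'H', 'D' or 'A', following the FTR convention."""
    if home_goals > away_goals:
        return 'H'
    if home_goals < away_goals:
        return 'A'
    return 'D'


def home_win_rate(matches):
    """Fraction of matches won by the home side, computed from the goals."""
    counts = Counter(result_from_goals(m.FTHG, m.FTAG) for m in matches)
    return counts['H'] / float(sum(counts.values()))


def inconsistent_results(matches):
    """Matches whose recorded FTR disagrees with their scoreline."""
    return [m for m in matches if result_from_goals(m.FTHG, m.FTAG) != m.FTR]


# Made-up sample; the last record is deliberately inconsistent.
sample = [Match(2, 1, 'H'), Match(0, 0, 'D'), Match(1, 3, 'A'),
          Match(1, 0, 'H'), Match(1, 1, 'H')]
print(home_win_rate(sample))         # 0.4
print(inconsistent_results(sample))  # [Match(FTHG=1, FTAG=1, FTR='H')]
```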

for league_name in league_names:
    leagues = [league_name + '_league']
    matches = list(league_analysis.get_all_matches(leagues=leagues))
    title = "{0} League - All years".format(league_name.upper())
    histogram_match_results(matches, title=title, get_result=get_scoreline)
    histogram_match_results(matches, title=title, get_result=get_result)
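The loop above asks league_analysis for each league's matches separately. If the data could instead be iterated once as (league name, result) pairs (an assumption, since the league_analysis API is not shown here), a defaultdict of Counter objects would build every league's counts in a single pass. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def counts_by_league(labelled_results):
    """Build one Counter per league from (league_name, result) pairs in one pass."""
    by_league = defaultdict(Counter)
    for league_name, result in labelled_results:
        by_league[league_name][result] += 1
    return dict(by_league)


# Made-up pairs for illustration.
pairs = [('epl', '1-1'), ('epl', '1-0'), ('spl', '1-1'), ('epl', '1-1')]
counts = counts_by_league(pairs)
print(counts['epl'].most_common(1))  # [('1-1', 2)]
```

Each value is an ordinary Counter, so it can be used wherever the per-league counts from get_result_counts are used below.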

These graphs are interesting, but they make it difficult to compare one league against another. The following code takes a little more effort, but it produces a table of the most common scorelines in each league.

def get_result_counts(league_name, get_result=get_result):
    leagues = [league_name + '_league']
    matches = list(league_analysis.get_all_matches(leagues=leagues))
    result_counts = collections.Counter([get_result(m) for m in matches])
    return result_counts

league_result_counts = {league_name: get_result_counts(league_name, get_result=get_result)
                        for league_name in league_names}
league_scoreline_counts = {league_name: get_result_counts(league_name, get_result=get_scoreline)
                           for league_name in league_names}

def most_common_for_league(league_name, counts_dictionary=league_result_counts, n=15):
    # Reuse the counts computed above instead of re-reading every match.
    return [name for name, _ in counts_dictionary[league_name].most_common(n)]

most_common_result_lists = {league_name: most_common_for_league(league_name)
                            for league_name in league_names}
most_common_scoreline_lists = {league_name: most_common_for_league(league_name,
                                                                   league_scoreline_counts)
                               for league_name in league_names}
def make_row(s):
    return "<tr>" + s + "</tr>"

def make_cell(league_name, i):
    # Leagues with fewer than 15 distinct scorelines get an "N/A" cell.
    try:
        scoreline = most_common_scoreline_lists[league_name][i]
    except IndexError:
        scoreline = "N/A"
    return "<td>" + scoreline + "</td>"

headers = make_row(" ".join(["<th>{0}</th>".format(name) for name in league_names]))
rows = [make_row(" ".join([make_cell(league_name, i) for league_name in league_names]))
        for i in range(15)]
rows = "\n".join(rows)
html = "\n".join(["<table>", headers, rows, "</table>"])
display(HTML(html))
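The table code pads short columns by catching IndexError. An alternative sketch uses itertools.zip_longest, which pads the shorter lists with a fill value automatically and returns plain rows that can be rendered as HTML or as tab-separated text. The function names are hypothetical.

```python
from itertools import zip_longest


def scoreline_table(most_common_lists, league_names, fill='N/A'):
    """Return rows (header first) with one column per league, padding short columns."""
    columns = [most_common_lists[name] for name in league_names]
    rows = [list(league_names)]
    rows.extend(list(row) for row in zip_longest(*columns, fillvalue=fill))
    return rows


def to_tsv(rows):
    """Render rows as tab-separated text, one line per row."""
    return "\n".join("\t".join(row) for row in rows)


# Made-up, shortened lists for illustration.
example = {'epl': ['1-1', '1-0', '2-1'], 'spl': ['1-1', '1-0']}
print(to_tsv(scoreline_table(example, ['epl', 'spl'])))
```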

All five leagues share the same two most common scorelines: 1-1, followed by 1-0. After that they diverge significantly: epl and ech have a 2-1 home victory as the third most common result, whereas the other three leagues (elo, elt and spl) have a 0-1 away victory.
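This comparison can be made mechanical: find the ranks that every league shares, then group the leagues by the scoreline at the first rank where they differ. The sketch below uses the top three scorelines from the scoreline table in this post; the helper names are hypothetical.

```python
from collections import defaultdict

# Top three scorelines per league, copied from the table in this post.
top_three = {
    'epl': ['1-1', '1-0', '2-1'],
    'ech': ['1-1', '1-0', '2-1'],
    'elo': ['1-1', '1-0', '0-1'],
    'elt': ['1-1', '1-0', '0-1'],
    'spl': ['1-1', '1-0', '0-1'],
}


def shared_prefix(ranked_lists):
    """Leading scorelines that appear at the same rank in every list."""
    prefix = []
    for at_rank in zip(*ranked_lists):
        if len(set(at_rank)) != 1:
            break
        prefix.append(at_rank[0])
    return prefix


def group_by_rank(most_common_lists, rank):
    """Map each scoreline at the given 0-based rank to the leagues where it sits there."""
    groups = defaultdict(list)
    for league, ranked in sorted(most_common_lists.items()):
        groups[ranked[rank]].append(league)
    return dict(groups)


print(shared_prefix(top_three.values()))  # ['1-1', '1-0']
print(group_by_rank(top_three, 2))        # {'2-1': ['ech', 'epl'], '0-1': ['elo', 'elt', 'spl']}
```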

This is helpful, but it would be more informative to know how much more likely each result is in one league than in another. The following code plots the proportion of each result across the five leagues.

def get_result_proportion(league_name, counts_dictionary, result):
    result_counts = counts_dictionary[league_name]
    total_games = sum(result_counts.values())
    proportion = result_counts[result] / float(total_games)
    return proportion

def plot_result_proportion(result, counts_dictionary):
    plot.title('Proportion of {0} results'.format(result))
    names = league_names
    values = [get_result_proportion(league_name, counts_dictionary, result)
              for league_name in league_names]
    x_positions = numpy.arange(len(names))
    plot.yticks(x_positions, names)
    plot.barh(x_positions, values, align='center', alpha=0.4)
    plot.show()
for result in ['H', 'A', 'D']:
    plot_result_proportion(result, league_result_counts)

for scoreline in most_common_scoreline_lists['epl']:
    plot_result_proportion(scoreline, league_scoreline_counts)
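One direct answer to "how much more likely is a result in one league than another" is the ratio of the two proportions. This sketch assumes the counts are Counter objects like those in league_result_counts and league_scoreline_counts; the helper names and the example counts are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math
from collections import Counter


def proportion(counts, result):
    """Share of all counted matches that ended with `result`."""
    total = sum(counts.values())
    return counts[result] / float(total) if total else 0.0


def relative_likelihood(counts_a, counts_b, result):
    """How many times more often `result` occurs in league A than in league B."""
    p_a = proportion(counts_a, result)
    p_b = proportion(counts_b, result)
    if p_b == 0:
        # Undefined ratio: infinite if A has the result, not-a-number if neither does.
        return math.inf if p_a > 0 else math.nan
    return p_a / p_b


# Made-up counts for two leagues.
league_a = Counter({'H': 4, 'D': 2, 'A': 2})
league_b = Counter({'H': 2, 'D': 3, 'A': 3})
print(relative_likelihood(league_a, league_b, 'H'))  # 2.0
```

With the real data, a call such as relative_likelihood(league_scoreline_counts['epl'], league_scoreline_counts['elo'], '2-1') would express the difference between two of the bars above as a single number.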

Conclusions

Conclusions are thin on the ground here. There are some differences between the leagues, but they do not seem that significant. The idea that a 2-0 home victory is a good default bet is hard to justify when it is only the 6th most common result overall, although it fares a little better in the EPL, where it is the 4th most common result. If nothing else, I hope this data is at least interesting.
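Whether the differences between leagues are statistically significant could be checked with a chi-squared test of homogeneity on a table of counts, with one row per league and one column per result. This was not done in the original analysis; the sketch below is a pure-Python illustration with hypothetical helper names. For five leagues and three results (H, D, A) there are (5 - 1) * (3 - 1) = 8 degrees of freedom, and for an even number of degrees of freedom the p-value has a closed form, so no statistics library is needed. scipy.stats.chi2_contingency does the same job more generally.

```python
import math


def chi_squared_homogeneity(table):
    """Pearson chi-squared statistic and degrees of freedom for a rows x columns
    table of counts. Assumes every row and column total is positive."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = float(sum(row_totals))
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


def chi_squared_survival_even_dof(x, dof):
    """P(X > x) for a chi-squared variable with an even number of degrees of freedom:
    exp(-x/2) * sum over i < dof/2 of (x/2)**i / i!."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form only holds for positive even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i  # term is now half**i / i!
        total += term
    return math.exp(-half) * total


# Made-up table: two leagues with identical H/D/A counts give no evidence of a difference.
statistic, dof = chi_squared_homogeneity([[10, 5, 5], [10, 5, 5]])
print(statistic, dof, chi_squared_survival_even_dof(statistic, dof))  # 0.0 2 1.0
```

With the notebook's data, the table would be [[league_result_counts[name][r] for r in ['H', 'D', 'A']] for name in league_names]. The test assumes matches are independent, and with tens of thousands of games even small differences can come out as statistically significant, so a small p-value does not by itself make a difference large enough to matter for betting.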

Most common scorelines in each league, most frequent first:

Rank  epl  ech  elo  elt  spl
1     1-1  1-1  1-1  1-1  1-1
2     1-0  1-0  1-0  1-0  1-0
3     2-1  2-1  0-1  0-1  0-1
4     2-0  0-1  2-1  0-0  1-2
5     0-0  0-0  2-0  2-1  2-0
6     1-2  2-0  1-2  1-2  2-1
7     0-1  1-2  0-0  2-0  0-0
8     2-2  2-2  2-2  2-2  0-2
9     3-1  0-2  0-2  0-2  3-0
10    3-0  3-0  3-0  3-0  2-2
11    0-2  3-1  3-1  3-1  3-1
12    1-3  1-3  1-3  1-3  4-0
13    0-3  3-2  3-2  3-2  0-3
14    2-3  0-3  0-3  2-3  1-3
15    3-2  2-3  4-1  0-3  2-3
