Monday, 9 May 2016

How to be Bayesian and spare yourself a dreadful afternoon with your stupid football team losing the derby

Yesterday was the second-last game of the Italian Serie A season; I've been a Sampdoria supporter since I was 12 $-$ at that time, they were starting to become one of the best clubs in Serie A (and that was back in the 80s, when Serie A was arguably the best league in the world), although they hadn't won anything and didn't have prospects for that season either. But they were a young, good side, playing nicely, and so I kind of fell in love with them (and their shirt). Then they did become a very good side, winning the league and a few more trophies $-$ so good timing on my part! But then they reverted to some relative mediocrity $-$ of course, once you've decided you support a team, you're stuck with them no matter what.

Anyway, this season has been rather crappy and yesterday's was a crucial game: we were playing the derby against local rivals Genoa, entering the game with 40 points and two games left in the campaign. Two teams couldn't reach us any more (as they were trailing by more than 6 points). But at least one of Carpi and Palermo could still overtake us if we lost our two remaining games and they won all of theirs. Also, Udinese were just one point behind us, so they too could overtake us, technically. With three teams being relegated, we weren't mathematically safe yet.

So, that was kind of nerve-racking and earlier last week I thought about this a bit. I had a bad feeling about our game, because we've not been great lately (in the previous game we were beaten by Palermo) and, clearly, Genoa would try really hard to mess it up for us... But, irrespective of the outcome of the derby, if at least one of Carpi, Palermo and Udinese failed to win their match we would be safe (as there wouldn't be enough points left for all of them to catch us). Carpi played at home against Lazio, whose season hasn't been great either, but who were already safe and with not much else to fight for, except a strong finish; Palermo were away at Fiorentina, who theoretically were still fighting for a Europa League qualification and so should have something to play for; Udinese were away at Atalanta, who, much like Lazio, were mathematically safe and with not much to play for.

Although one can make a much more complex model, I reasoned that, rather than the actual results, the only thing that mattered was the chance that each of the three teams behind us would win, and so I set up a model with $y_{\rm{Car}} \sim \mbox{Bernoulli}(\theta_{\rm{Car}})$, $y_{\rm{Pal}} \sim \mbox{Bernoulli}(\theta_{\rm{Pal}})$ and $y_{\rm{Udi}} \sim \mbox{Bernoulli}(\theta_{\rm{Udi}})$, where the "success" would in fact be the worst possible outcome, i.e. a win for them.

Then I set up some priors: I reasoned that because they were playing at home, Carpi may have a slightly higher chance of winning the game $-$ I figured something about 35%. Also, I thought (hoped) that Lazio wouldn't be a walkover and so I assumed that 90% of the mass for the chance of Carpi winning their game was below 45%. These can be turned into an informative Beta(15.80107,28.4877) prior $-$ it's fairly easy to work out the parameters of a Beta distribution given the mode (0.35, in this case) and some percentile (0.45 as the 90th percentile, in this case); Christensen et al (page 100) show some theory, while this is some relevant R code.
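The following is not the linked code, just a minimal sketch of how such a calculation could go. It assumes the prior is pinned down by exactly two conditions, a mode and one percentile, and that the shape parameter is larger than 1 so that the mode is $(a-1)/(a+b-2)$. The helper name `beta_from_mode` and its arguments are made up for illustration.

```r
# Find Beta(a, b) with a given mode and a given percentile.
# beta_from_mode is a hypothetical helper, not the code linked in the post.
# It assumes 0 < mode < q < 1 and prob > q, so that a root is bracketed.
beta_from_mode <- function(mode, q, prob) {
  # For a > 1, mode = (a - 1) / (a + b - 2) fixes b as a function of a
  b_of_a <- function(a) (a - 1) / mode - a + 2
  # Gap between the achieved and the requested cumulative probability at q
  gap <- function(a) pbeta(q, a, b_of_a(a)) - prob
  a <- uniroot(gap, interval = c(1 + 1e-6, 1000), tol = 1e-10)$root
  c(a = a, b = b_of_a(a))
}

# Home prior: mode 0.35, 90th percentile at 0.45
home <- beta_from_mode(mode = 0.35, q = 0.45, prob = 0.90)
# Away prior: mode 0.20, 90th percentile at 0.40
away <- beta_from_mode(mode = 0.20, q = 0.40, prob = 0.90)
print(home)
print(away)
```

The printed values can be compared with the Beta parameters quoted in the text; any small differences would come from the numerical tolerance or from a different way of encoding the two conditions.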

This is effectively the prior I was assuming:

and I thought it was just about reasonable (the dotted vertical lines indicate a rough estimate of the 95% prior credible interval). Then I did something similar to derive the priors for a Palermo and a Udinese win $-$ because they were playing away, I figured their most likely chance of winning would be around 20%, with 90% of the mass below 40%, which can be turned into a Beta(3.279775,10.1191) prior, looking like this:
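For readers who want to redraw the two priors themselves, here is a rough sketch in base R. It takes the Beta parameters quoted in the text as given and uses equal-tailed 95% intervals from `qbeta` for the dotted lines, which is an assumption about how the "rough estimate" was obtained; the labels `home` and `away` are just illustrative names.

```r
# Prior parameters quoted in the text
priors <- list(
  home = c(a = 15.80107, b = 28.4877),  # Carpi (playing at home)
  away = c(a = 3.279775, b = 10.1191)   # Palermo and Udinese (playing away)
)

# Equal-tailed 95% prior interval for each Beta prior
intervals <- sapply(priors, function(p) {
  qbeta(c(0.025, 0.975), p[["a"]], p[["b"]])
})
rownames(intervals) <- c("lower", "upper")
print(round(intervals, 3))

# Draw each density, with dotted vertical lines at the interval limits
for (nm in names(priors)) {
  a <- priors[[nm]][["a"]]
  b <- priors[[nm]][["b"]]
  curve(dbeta(x, a, b), from = 0, to = 1,
        xlab = "Probability of winning", ylab = "Prior density", main = nm)
  abline(v = intervals[, nm], lty = 3)
}
```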

Again, I was relatively happy with this and so used these priors in my model, which one could code in R as something like

p.car <- rbeta(10000,15.80107,28.4877)  # P(win) most likely around .35 and with 90% mass <= .45
p.pal <- rbeta(10000,3.279775,10.1191)  # P(win) most likely around .2 and with 90% mass <= .4
p.udi <- rbeta(10000,3.279775,10.1191)  # P(win) most likely around .2 and with 90% mass <= .4
p.safe <- 1-(p.car*p.pal*p.udi)         # P(at least one of the three fails to win); p.safe is a placeholder name

The most important variable in the model is the probability of Sampdoria being mathematically certain of avoiding relegation, which is 1 minus the probability of the worst happening, i.e. all three of Carpi, Palermo and Udinese winning $-$ this assumes independence in the three games; in general that's probably not the best assumption, but in this case they kind of had to win to have a good shot at safety themselves and so I think it's OK to assume independence. The results were kind of reassuring: I got an estimated posterior average of 97.8% with a 95% credible interval of 93.8% to 99.7%.
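As a rough check on those numbers, the simulation can be summarised as in the sketch below. The seed and the variable names (`p.safe`, `sim.mean`, `exact.mean` and so on) are arbitrary choices made here, so the simulated figures will differ slightly from run to run. Under the independence assumption the average can also be worked out exactly, because the expectation of a product of independent quantities is the product of their expectations, and a Beta$(a,b)$ has mean $a/(a+b)$.

```r
set.seed(2016)  # arbitrary seed, only for reproducibility
n.sims <- 10000

# Simulate the three win probabilities from the priors quoted in the text
p.car <- rbeta(n.sims, 15.80107, 28.4877)
p.pal <- rbeta(n.sims, 3.279775, 10.1191)
p.udi <- rbeta(n.sims, 3.279775, 10.1191)

# Probability that at least one of the three teams fails to win
p.safe <- 1 - p.car * p.pal * p.udi

# Monte Carlo summaries: average and equal-tailed 95% interval
sim.mean <- mean(p.safe)
sim.ci <- quantile(p.safe, c(0.025, 0.975))

# Exact average under independence: 1 minus the product of the Beta means
beta.mean <- function(a, b) a / (a + b)
exact.mean <- 1 - beta.mean(15.80107, 28.4877) * beta.mean(3.279775, 10.1191)^2

print(round(c(simulated = sim.mean, exact = exact.mean), 4))
print(round(sim.ci, 3))
```

The exact average should agree with the simulated one up to Monte Carlo error; the interval limits have no simple closed form, which is one reason to simulate in the first place.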

I am not really one to stay at home on a Sunday just to watch the football game (so perhaps I'm not really a football fan?) and we'd planned to see some friends, but this reassured me that we shouldn't be in too much trouble, even if we lost the derby. In the event, Kobi wasn't great (possibly as a result of venturing an outing at the seaside on Saturday) and so we stayed at home $-$ but I decided not to bother with watching the game (again: a) a bold move for a real football fan, confident about his team; b) a cowardly move from a real football fan scared of what the outcome may be; c) not a real football fan).

We did lose the derby very badly, but Carpi, Palermo and Udinese all failed to win their games, which means we are safe. I'm glad I didn't watch the game...


  1. You should take a look at -- since it is based on ELO, it really is a Bayesian approach to understanding match odds. If you read German, you may find my blog on the Bundesliga interesting ( which is making probabilistic prognoses for season outcomes for the three top German leagues ...

  2. Hi Christoph, alas Ich kann nicht Deutsch sprechen...