Gianluca Baio's blog: Ordinal football

Monday, 1 October 2012

Ordinal football

I've had a quick look at this article on R-bloggers $-$ I don't think I've followed the whole exchange, but I believe they have discussed what models should/could be applied to estimate football scores (specifically, in this case they are using the Dutch league).

The main point of the post is that using ordinal regression models can improve the performance (I suppose in terms of prediction or validation of the probability associated with the observed frequency of the results).

At a very superficial level (since I've just read the article and have not thought about this a great deal), I think that assuming that the observed number of goals can be considered as an ordinal variable, much as you would do for a Likert scale, is not quite the best option.

This assumption might not have a huge impact on the actual results of this model: just as for an ordinal variable, the distance between the modalities is not constant (moving from scoring 0 to scoring 1 goal does not necessarily take the same effort as moving from scoring 3 to scoring 4 goals), and ordinal regression can accommodate this situation. But I think this formulation is unnecessarily complicated and a bit confusing.
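To see how an ordinal model accommodates unequal distances between goal counts, here is a minimal sketch of a cumulative-logit (proportional-odds) model. This is a generic textbook form, not necessarily the exact model used in the R-bloggers post; the function names, the cutpoints and the values of `eta` are all made up for illustration.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def ordinal_probs(eta: float, cutpoints: list[float]) -> list[float]:
    """P(Y = k), k = 0..K, under a cumulative-logit model.

    P(Y <= k) = logistic(cutpoints[k] - eta), with increasing cutpoints.
    The last category collects everything above the final cutpoint.
    """
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]


# Hypothetical, unevenly spaced cutpoints for 0, 1, 2, 3 and 4+ goals:
# the gaps between them play the role of the "distance" between categories.
cutpoints = [-1.0, 0.5, 1.5, 3.0]

weak = ordinal_probs(eta=0.0, cutpoints=cutpoints)    # weaker attack
strong = ordinal_probs(eta=1.0, cutpoints=cutpoints)  # stronger attack

for goals, (pw, ps) in enumerate(zip(weak, strong)):
    print(f"goals={goals}: weak={pw:.3f} strong={ps:.3f}")
```

The cutpoints are free parameters, so the step from 0 to 1 goal and the step from 3 to 4 goals need not be equally "hard"; a single linear predictor `eta` then shifts the whole distribution up or down.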

Moreover (and far more importantly, I think), if I understand it correctly, both the original models and those discussed in the post I'm considering seem to assume independence between the goals scored by the two teams competing in a single game. This is not realistic, I think, as we proved in our paper (of course drawing on other good examples in the literature).

In particular, we were considering a hierarchical structure in which the goals scored by the two competing teams are conditionally independent given a set of parameters (accounting for defence and attack, and home advantage); but because these were given exchangeable priors, correlation would be implied in the responses $-$ something like this:
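In symbols, one way to write such a structure is sketched below. The Poisson likelihood, the log link and the notation ($h(g)$ and $a(g)$ for the home and away team in game $g$, $\tau$ for the scale of the priors) are assumptions made here for illustration; the description above only fixes the attack, defence and home-advantage effects and the exchangeable priors.

```latex
\begin{align*}
y_{g1} \mid \theta_{g1} &\sim \mathrm{Poisson}(\theta_{g1}), &
y_{g2} \mid \theta_{g2} &\sim \mathrm{Poisson}(\theta_{g2}), \\
\log \theta_{g1} &= \mathrm{home} + \mathrm{att}_{h(g)} + \mathrm{def}_{a(g)}, &
\log \theta_{g2} &= \mathrm{att}_{a(g)} + \mathrm{def}_{h(g)}, \\
\mathrm{att}_t &\sim \mathrm{Normal}(\mu_{att}, \tau_{att}), &
\mathrm{def}_t &\sim \mathrm{Normal}(\mu_{def}, \tau_{def}).
\end{align*}
```

Given $\theta_{g1}$ and $\theta_{g2}$ the two scores are independent, but the team effects share the hyperparameters $(\mu_{att}, \tau_{att}, \mu_{def}, \tau_{def})$, so once these are integrated out the two scores in a game are no longer independent.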

The Bayesian machinery was very good at prediction, especially after we considered a slightly more complex structure in which we included information on each team's propensity to be "good", "average", or "poor". This helped avoid overshrinkage in the estimations and we did quite well.

An interesting point of the models discussed in the posts at R-bloggers is the introduction of a time effect (in this particular case to account for winter breaks in the Dutch league). In our work, we have only considered the Italian, Spanish and English leagues, which (as far as I am aware) do not have breaks.

But including external information is always good: for example, teams involved in European football (eg Champions League or Europa League) may do worse in the league games immediately before (and/or immediately after) their European fixture. This would be easy enough to include and could perhaps increase the precision of the estimates.
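As a rough illustration of how such a covariate could enter, the sketch below adds an indicator for a nearby European fixture to a log-linear scoring rate. The log-linear form, the function name `scoring_rate`, the coefficient `beta_europe` and every numerical value are assumptions chosen for the example, not estimates from any fitted model.

```python
import math


def scoring_rate(
    home: float,
    attack: float,
    defence: float,
    beta_europe: float = 0.0,
    plays_in_europe: bool = False,
) -> float:
    """Expected goals for a team under a log-linear model.

    log(rate) = home + attack + defence + beta_europe * indicator,
    where the indicator is 1 if the team has a European fixture
    immediately before or after this league game.
    """
    indicator = 1.0 if plays_in_europe else 0.0
    return math.exp(home + attack + defence + beta_europe * indicator)


# Made-up effects: home advantage, the team's attack, the opponent's defence.
base_rate = scoring_rate(home=0.3, attack=0.2, defence=-0.1)

# Same game, but with a (hypothetical) negative effect of European football.
europe_rate = scoring_rate(
    home=0.3, attack=0.2, defence=-0.1, beta_europe=-0.15, plays_in_europe=True
)

print(f"expected goals without European fixture: {base_rate:.3f}")
print(f"expected goals with European fixture:    {europe_rate:.3f}")
```

On the log scale the covariate is additive, so on the goal scale it multiplies the scoring rate by `exp(beta_europe)`; a negative coefficient would correspond to the "doing worse" scenario described above.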

8 comments:

flo2speak, 4 October 2012 at 08:51
I'm not sure, but it's still in-sample prediction, right? How's the prediction when you predict future games and not hypothetical ones with your model?
Unknown, 4 October 2012 at 10:47
I suppose what is "tricky" in this case is to define the "out-of-sample" predictions. What we did was to replay the whole season at once, which I know may not be the objective, eg if you're a bookie.

In this sense, the games are hypothetical, I think. What we could have done (and didn't in the end, although we played with the idea of actually doing it) was to take for example the first two/three weeks of observed data and based on those predict the next round of games. Would that count as "future games"?

Also, the model was relatively simple and crucially didn't include any observed covariate; I think this would be fundamental to do real prediction (as opposed to showing that your team were good and deserved better in that particular season, which was my main goal $-$ Marta didn't really care about this one, though).

As I was mentioning in my comment to Kees's post, information on current form, eg having just played (or being about to play) a European fixture, or injuries/suspensions etc., would make the model much more robust and better at prediction, especially for "future" (vs "hypothetical", in the sense I was hinting at above) games.
Unknown, 5 October 2012 at 09:32
Well, if you only consider a limited number of games, you are probably going to have very large uncertainty associated with your predictions.

Especially in this case, the Bayesian approach is helpful in including extra information (eg team form, etc).

In fact that's exactly what the bookies do...