Gianluca Baio's blog: December 2017

Friday, 22 December 2017

A Bayesian analysis of polls in the Catalan elections

(Invited post by Virgilio Gómez-Rubio, UCLM, Albacete, Spain. Thanks Gianluca for the invitation!!)

I have been involved in the planning and analysis of survey polls almost since I came back to Albacete 9 years ago. The last few months in Spanish politics have been dominated by the 'Catalan referendum' and the call for new elections from the national government via article 155 of the Spanish Constitution (which had never been enforced before). These elections have been different for many reasons, so I decided to do a (last minute) analysis of the available polls to try to predict the allocation of seats in the elections.

The Catalan parliament has 135 seats, split across four electoral districts which correspond to the four provinces in the region, each with a different number of seats depending on its population: Barcelona (85 seats), Gerona (17 seats), Lérida (15 seats) and Tarragona (18 seats). Seats are allocated according to the D'Hondt method.
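For readers unfamiliar with the D'Hondt method, here is a minimal Python sketch of the usual highest-averages procedure: each seat in turn goes to the party with the largest quotient votes / (seats already won + 1). The function name `dhondt` and the vote counts are made up for illustration and are not taken from any poll. Any legal vote threshold is ignored, and ties are broken by party order, which is a simplification.

```python
def dhondt(votes, n_seats):
    """Allocate n_seats among parties with the D'Hondt highest-averages rule.

    votes: dict mapping party name -> number (or share) of votes.
    Returns a dict mapping party name -> seats won.
    Simplifications (assumptions of this sketch): no vote threshold,
    ties go to the party listed first.
    """
    seats = {party: 0 for party in votes}
    for _ in range(n_seats):
        # The next seat goes to the party with the highest current quotient.
        winner = max(votes, key=lambda p: votes[p] / (seats[p] + 1))
        seats[winner] += 1
    return seats


# Illustrative numbers only (a standard textbook-style example).
example = dhondt({"A": 100000, "B": 80000, "C": 30000, "D": 20000}, 8)
print(example)  # {'A': 4, 'B': 3, 'C': 1, 'D': 0}
```

The same function works with vote proportions instead of counts, since only the ratios between parties matter.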

Several polls have been published in the mass media, and the proportions of votes for each party (as well as sample size, etc.) are reported either at the regional level (which is useless for allocating seats by province) or at the province level. Given that most polls are aggregated at the regional level, it makes sense to combine both types of polls into a single model to provide some insight into voters' preferences at the province level and so allocate the seats.

Bayesian hierarchical models are great at combining information from different sources. The model that I have considered now is very simple. The number of votes (reported in the poll) for each party at the regional level is assumed to follow a multinomial distribution with probabilities $P_i,\ i=1,\ldots,p$, where $p$ is the number of political parties. In this case, we have 7 main parties plus another group for 'other parties'. Probabilities $P_i$ are assigned a vague Dirichlet prior. The number of votes at the province level is assumed to follow a multinomial distribution as well, with probabilities $p_{i,j},\ i=1,\ldots,p,\ j=1,\ldots,4$. Both sets of probabilities are linked by assuming that $\log(p_{i,j})$ is proportional to $\log(P_i)$ plus a province-party specific random effect $u_{i,j}$. I have used this model before with good results.
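Written out, one possible reading of this model is the following. The symbols $Y_i$ and $N$ (regional poll counts and sample size), $y_{i,j}$ and $n_j$ (province poll counts and sample size) and the Dirichlet parameter $\alpha$ are notation introduced here for illustration. The normalisation in the last line is an assumption about how the link is turned into proper probabilities, and the prior on $u_{i,j}$ is not stated in the post, so it is left unspecified.

$$
\begin{aligned}
(Y_1,\ldots,Y_p) &\sim \text{Multinomial}(N;\ P_1,\ldots,P_p),\\
(P_1,\ldots,P_p) &\sim \text{Dirichlet}(\alpha,\ldots,\alpha),\\
(y_{1,j},\ldots,y_{p,j}) &\sim \text{Multinomial}(n_j;\ p_{1,j},\ldots,p_{p,j}),\quad j=1,\ldots,4,\\
p_{i,j} &= \frac{P_i \exp(u_{i,j})}{\sum_{k=1}^{p} P_k \exp(u_{k,j})}.
\end{aligned}
$$

Under this reading, $\log(p_{i,j})$ equals $\log(P_i) + u_{i,j}$ up to a province-specific constant, so with all $u_{i,j}=0$ every province simply inherits the regional probabilities.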

As simple as it is, this model allows the combination of polls at different aggregation levels. I have used JAGS to fit the model and then used the probabilities in the MCMC output to obtain 10000 draws of the allocation of seats, by applying the D'Hondt rule to the proportion of votes for each party at the province level.
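The post-processing step described above can be sketched in Python as follows. This is not the code used for the analysis: the array `fake_draws` is random placeholder data standing in for the MCMC draws of the province-level probabilities, and the names `SEATS`, `dhondt` and `seat_draws` are hypothetical. For simplicity the sketch treats all 8 groups (including 'other parties') as if they were single parties and ignores any vote threshold.

```python
import numpy as np

# Seats per province, as given in the post.
SEATS = {"Barcelona": 85, "Gerona": 17, "Lérida": 15, "Tarragona": 18}


def dhondt(shares, n_seats):
    """D'Hondt allocation for a 1-D array of vote shares (no threshold)."""
    shares = np.asarray(shares, dtype=float)
    seats = np.zeros(shares.shape[0], dtype=int)
    for _ in range(n_seats):
        # Next seat goes to the party with the highest quotient.
        winner = np.argmax(shares / (seats + 1))
        seats[winner] += 1
    return seats


def seat_draws(prob_draws):
    """Turn probability draws into draws of total seats per party.

    prob_draws: array of shape (n_draws, n_provinces, n_parties), with
    provinces in the same order as SEATS.
    Returns an integer array of shape (n_draws, n_parties).
    """
    n_draws, _, n_parties = prob_draws.shape
    totals = np.zeros((n_draws, n_parties), dtype=int)
    for d in range(n_draws):
        for j, n_seats in enumerate(SEATS.values()):
            totals[d] += dhondt(prob_draws[d, j], n_seats)
    return totals


# Placeholder draws (NOT real posterior output): 200 draws, 4 provinces, 8 groups.
rng = np.random.default_rng(0)
fake_draws = rng.dirichlet(np.ones(8) * 20, size=(200, 4))
totals = seat_draws(fake_draws)
print(totals.mean(axis=0))  # average seats per party over the placeholder draws
```

Each row of `totals` is one simulated parliament, so histograms of its columns give the kind of seat distributions discussed in this post.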

The next plot shows the predicted distribution of seats against the actual distribution of seats:

I'd say the coverage is good for most parties. Polls did not show the loss of voters for CUP and Partido Popular (PP).

Another nice thing about being Bayesian (and using MCMC) is that other probabilities can be computed. For example, the next plot shows the posterior distribution of the number of seats allocated to pro-independence parties, so that the probability of them having a majority can be computed (59.86%).
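A probability like this is just the fraction of MCMC draws in which the bloc reaches a majority, i.e. at least 68 of the 135 seats. Here is a minimal Python sketch, assuming the seat draws are stored as an array with one row per draw and one column per party. The function name `majority_probability` and the toy numbers are illustrative only and are unrelated to the actual results.

```python
import numpy as np


def majority_probability(seat_draws, bloc_columns, total_seats=135):
    """Share of draws in which the chosen parties jointly hold a majority.

    seat_draws: array of shape (n_draws, n_parties) with seats per party.
    bloc_columns: list of column indices forming the bloc of interest.
    """
    seat_draws = np.asarray(seat_draws)
    bloc_seats = seat_draws[:, bloc_columns].sum(axis=1)
    threshold = total_seats // 2 + 1  # 68 seats for a 135-seat parliament
    return float(np.mean(bloc_seats >= threshold))


# Toy draws (made up): column 0 is the bloc, column 1 is everyone else.
toy = np.array([[70, 65], [60, 75], [68, 67], [67, 68]])
prob = majority_probability(toy, [0])
print(prob)  # 0.5
```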

As I promised to have a shot for each seat allocated correctly, I've got some work left to do until the end of the Christmas break... Merry Christmas and Happy New Year!!!

Tuesday, 19 December 2017

Does Peppa Pig encourage inappropriate use of primary care resources?

This is a very important contribution to the medical literature, recently published in the BMJ.

I think the sample size is probably not large enough to warrant robust inference. And perhaps it would have been helpful to consider alternative settings, say the wide diversity in the target population of Ben and Holly's Little Kingdom, just to give an example.

But I do applaud the effort of the author!

Monday, 18 December 2017

Unpleasantville

Last week, Kristian Lum wrote a blog post to report her experience of inappropriate behaviour by some senior male colleagues at statistical conferences (ISBA and JSM, in particular).

I don't personally know Kristian, although I think I did have lunch with her, a mutual friend and a bunch of other people, at JSM in Montreal in 2013. Anyway, even if I were completely agnostic about the whole thing (and I don't think I am...), it seems to me that her account has been corroborated by some hard facts as well as by discussions with other friends/colleagues who actually know her rather well. So while it's important to avoid "courts martial", I think the discussion here isn't really about whether these things happened or not (which at this point I'm pretty sure they did $-$ just to clarify).

I've been left with mixed feelings and a sense of kind-of-having lost my bearings, since I found this out last week. Firstly, I am not surprised to hear that such things can happen at a conference or in academia, in general. What has kind of surprised me is the fact that while I do move more or less in those circles, I wasn't aware of the reputation of the two people who have been named. Some people (for example here) have made the point that these stories were well known, and Kristian said so herself in her blog post. As somebody who's involved in ISBA, I find this troubling, and I kind of feel like we've buried our collective head in the sand, possibly for a very long time. To be fair, ISBA is now setting up a task-force to create protocols and prevent issues such as these arising again in the future. Still, it doesn't feel particularly good...

Secondly, this may be some sort of self-preservation (or maybe denial?) instinct, and maybe there is indeed a much more deeply rooted problem in statistics, and in fact in Bayesian statistics, which I struggle to see because it hurts to think that the environment in which I work is actually flawed in bad ways. But what I mean is that perhaps it's not like there's a couple of areas in which bad guys operate and if only we could get rid of those bad guys in those areas, then society would be idyllic. I think that, unfortunately, there are plenty of examples where people with/in power (statistically more likely to be white men) do behave badly and abuse their power in many ways, including sexually. Maybe our field does represent men disproportionately $-$ and it may well be that this is even truer for Bayesian statistics than for other branches of statistical science. And so, as painful as it is to realise quite clearly that the grass ain't so green after all, it is what it is. But the problem is (much) bigger than that...

Finally, I've particularly liked my friend Julien's Facebook post (I actually see now that he was in fact linking to somebody else's tweet):

Retweeted Carlos Scheidegger (@scheidegger):
We should all read and acknowledge @KLdivergence's and other women's harrowing stories. But I want to try something different here. Do you all know of her amazing work at @hrdag? This, on predictive policing, is so good https://t.co/YDsijFsiT2 https://t.co/GbwgKzSgMb

Dan's post has some lengthy discussion about the use of the term "mediocre" to characterise the two offenders. I think that neither mediocrity (= how poor one is at their work) nor excellence (= how good one is at their work) should be an excuse $-$ but I see how this may matter because, arguably, the better and more respected you are in your field, the more power you wield over junior colleagues... But I think it feels right to point out the quality of Kristian's work. Somehow, it seems to put things in a better perspective.
