Gianluca Baio's blog: March 2015

Tuesday, 31 March 2015

House of stats

[This is a rather long joint post with Roberto Cerina and complements our paper in the April 2015 issue of Significance]

1. Prelude (kind-of unrelated to what follows).
Last week, Marta and I finished watching the last series of House of Cards, the Netflix adaptation of the original BBC series (which I may have liked even more... I've not decided yet). The show is based around US politics and the fictional President Underwood.

Although political science is of course not my main research area (or even one of my research areas at all, to be more precise!), I'm in general interested in politics and I am interested in (some of) the application of statistical methods to this area. Thus, last year I decided to offer an undergraduate project which would be vaguely related to statistical modelling of political data (eg on polls, elections, etc).

The project caught Roberto's attention and he decided to take it on $-$ in fact, he even started asking for things to read before he was supposed to begin the project! Originally, we intended to work on Italian data (which would have been nice, given that we're both Italian and that the proliferation of parties would have made for more complex modelling); but it was much easier to find data about last year's US Senate elections and so we decided to use those.

2. Background
Modelling US Senate elections is more difficult than the presidential elections for two main reasons: firstly, there are far fewer polls per state than there are at the national level for the main "event". This can also mean a high impact for the “house effect” $-$ where polls favour one party or the other, depending on the polling house conducting the study.

The second problem has to do with correctly accounting for the effect of the economy on voting behaviour. In Senate elections, this is thought to be weaker due to the lack of precise “blame” to be directed at the incumbent. Presidents are seen to be responsible for the economy, and are heavily rewarded or penalised for it. However, the same cannot be said (at least to the same extent) for a senator. Nevertheless, we want to include some long-term factors in our model, to complement the short-term shocks produced by the ongoing polls; to this aim, we collected state-specific data on macroeconomic variables and included them in the model.

3. The model
The basic idea is to extend an interesting model by Drew Linzer, which was developed for the 2012 US presidential elections. In a nutshell, the model aims at combining data on some historical trends with data from the polls, which start being conducted weeks before the elections and with increasing intensity. In particular, the objective is to estimate the results of the elections in a dynamic way, so that the most recent polls tend to weigh more than older ones.

Our main data are the polls; in particular the number $y_{itk}$ of respondents who declared they would vote for the Republican candidate in the $k$-th poll in the $i$-th state (we consider the 33 in which elections were taking place) at the $t$-th week of the election campaign (we consider a total of $T=22$ weeks) out of the sample size $n_{itk}$.

Then we aggregate the data over weeks as $N_{it}=\sum_{k} n_{itk} $ and $Y_{it} = \sum_k y_{itk}$ and model $Y_{it} \sim\mbox{Binomial}(p_{it},N_{it})$. The parameter $p_{it}$, which is the object of our inference, represents the probability that a random elector votes for the Republican candidate in state $i$ at week $t$, and we model it as
$$ \mbox{logit}(p_{it}) = \alpha_{it} + \beta_t $$
where $\alpha_{it}$ is a state-specific effect on a given week of the campaign and $\beta_t$ is the common trend amongst Republican candidates at the national level.
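To make the data step concrete, here is a minimal Python sketch (not the code we actually used) of the weekly aggregation and of the Binomial log-likelihood under the logit link. The function names and the poll numbers are purely illustrative assumptions.

```python
import math
from collections import defaultdict


def aggregate_polls(polls):
    """Sum polls within each (state, week) cell.

    `polls` is an iterable of (state, week, y, n) tuples, where y is the
    number of respondents backing the Republican candidate out of n.
    Returns {(state, week): (Y, N)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for state, week, y, n in polls:
        totals[(state, week)][0] += y
        totals[(state, week)][1] += n
    return {key: (Y, N) for key, (Y, N) in totals.items()}


def inv_logit(x):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def binomial_loglik(Y, N, alpha, beta):
    """Log-likelihood of Y ~ Binomial(p, N) with logit(p) = alpha + beta."""
    p = inv_logit(alpha + beta)
    log_choose = math.lgamma(N + 1) - math.lgamma(Y + 1) - math.lgamma(N - Y + 1)
    return log_choose + Y * math.log(p) + (N - Y) * math.log(1.0 - p)


# Made-up polls: two in Kansas and one in North Carolina, all in week 1
example = [("KS", 1, 480, 1000), ("KS", 1, 260, 500), ("NC", 1, 450, 1000)]
cells = aggregate_polls(example)
```

With these made-up numbers, the two Kansas polls are pooled into a single week-1 observation before the likelihood is evaluated.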

Following the original model of Linzer, we then assume a reverse random walk structure on the $\beta$-s:
$$\beta_{t}| \beta_{t+1} \sim \mbox{Normal}(\beta_{t+1}, \sigma^2_\beta ),$$
with $\beta_T := 0$. This encodes the assumption that individual preferences at week $t+1$ will be affected by the preferences at week $t$. In addition, because of the anchoring at 0 for election week, we imply that the Republican vote share will not be affected by national campaign effects on election week.
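The reverse random walk is perhaps easier to grasp by simulating from it. The following Python sketch draws one path backwards from the anchor at week $T$; the function name and the value used for $\sigma_\beta$ are assumptions made only for illustration.

```python
import random


def reverse_random_walk(T, sigma, anchor=0.0, rng=None):
    """Draw one path with path[T-1] = anchor and
    path[t] ~ Normal(path[t+1], sigma^2) for t = T-2, ..., 0."""
    rng = rng or random.Random()
    path = [0.0] * T
    path[T - 1] = anchor
    for t in range(T - 2, -1, -1):
        path[t] = rng.gauss(path[t + 1], sigma)
    return path


# One simulated national trend over 22 weeks (sigma = 0.1 is an assumed value)
beta = reverse_random_walk(T=22, sigma=0.1, rng=random.Random(2015))
```

Because each step adds independent noise, the prior variance of $\beta_t$ is $(T-t)\sigma^2_\beta$ when weeks are indexed $1,\ldots,T$: the further we are from election week, the more uncertain the national effect.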

As for the parameter $\alpha_{it}$ we assume a prior specification $\alpha_{it}\mid \alpha_{it+1}\sim\mbox{Normal}(\alpha_{it+1},\sigma^2_\alpha)$. In this case, the anchoring at time $T$ is in terms of $\alpha_{iT} \sim \mbox{Normal}\left(\mbox{logit}(h_i),s^2_h\right)$, where $h_i$ is the historical forecast of the incumbent party vote share and is modelled using a full Bayesian specification as a regression with state-specific macroeconomic factors as well as nationwide structural indicators. Specifically, we regress the incumbent candidate’s vote share on factors such as a state level dummy variable representing the incumbency of a candidate, the incumbent president's approval rating and a dummy variable representing affinity of the incumbent party, and then convert these long run predictions for the incumbent candidates to Republican party predictions. The full Bayesian specification as well as the inclusion of state-specific variables is what differentiates our model from Linzer's.
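Collecting the pieces defined so far (and leaving aside the regression for $h_i$ and the priors on the variance parameters, which are not spelled out here), the model reads
$$\begin{aligned}
Y_{it} &\sim \mbox{Binomial}(p_{it}, N_{it}), \\
\mbox{logit}(p_{it}) &= \alpha_{it} + \beta_t, \\
\beta_{t}\mid \beta_{t+1} &\sim \mbox{Normal}(\beta_{t+1}, \sigma^2_\beta), \qquad \beta_T := 0, \\
\alpha_{it}\mid \alpha_{it+1} &\sim \mbox{Normal}(\alpha_{it+1}, \sigma^2_\alpha), \\
\alpha_{iT} &\sim \mbox{Normal}\left(\mbox{logit}(h_i), s^2_h\right).
\end{aligned}$$
In particular, since $\beta_T = 0$, on election week $\mbox{logit}(p_{iT}) = \alpha_{iT}$, so that a priori the election-week forecast for state $i$ is centred on $\mbox{logit}(h_i)$, with $s^2_h$ governing how strongly the polls can pull it away from the historical forecast.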

In a nutshell, the model can be represented graphically as in the following graph.

4. Results
The first output of our model is the estimation, at election week, of the outcome of each election.

We compared the predicted two-party vote share distribution with the actual results. The prediction interval is reported as 2 standard deviations (sd) around the predicted mean Republican vote share. A safe Republican seat is coloured red and is defined as the mean being at least 2 sd greater than the 0.5 cut-off; a likely Republican seat is coloured light red and is defined when the mean is larger than the 0.5 cut-off, but the lower tail touches the line. Democratic seats are defined in the same way, with a blue colour scale. The left axis contains the State names, in the Incumbent party’s colours. The green squares belong to the predictions of our Bayesian adaptation of the model for the historical trends. The predicted probability of a Republican Senate takeover, according to the model, was 94% by the end of election week. The most probable outcome under our model is predicted to be a Republican net gain of 7 seats, 1 more than they needed to take over the Senate. Republicans had an overwhelming advantage on election day. The model assigns Harry Reid, the Democratic Senate majority leader, only a 6% chance of keeping his job.

We can also look at the dynamic forecasts for specific states, which show how a stakeholder in a specific race (e.g. the Democratic National Committee) updates its predictions as the weeks go by, and can use this model to allocate resources amongst the races. These are also useful for a “post-mortem” analysis of the vote, and we can see how actual campaign events match up to changes in the weekly prediction intervals.

For example, the following graph shows the situation for Kansas. This is a good example of what happens when the structural forecast does not give us much information on the re-election chances of the incumbent, and the polls end up being skewed. Here the race was extremely uncertain at the beginning of the monitoring, as one can see from the width of the prediction interval at week 1 (21 weeks to go).

Incumbent Republican Pat Roberts is shown to be quite unpopular, and is consistently low in the polls up to week 11 (11 weeks to go). This is consistent with the political news at the time, with Roberts barely making it out alive from his Tea Party primary challenge, with less than 50% of the vote. After a marginal gain in the following couple of weeks, due to the indecision in the Democratic party as to whether it was worth competing or endorsing the Independent challenger, Roberts drops due to the Republican campaign effect, coinciding with the low poll numbers for Congress.

Greg Orman, the Independent opponent (here modelled as a Democrat for computational purposes), had to make up for the lack of party structure behind his campaign with personal finances, giving over $1 million to his own campaign. To help Orman get a chance at defeating Roberts, the official Democratic candidate Chad Taylor quit the race at the beginning of September. This seems to have contributed to stalling Roberts' rise for a couple of weeks (16 and 17), but doesn't seem to have had the overall desired effect of tipping the balance in favour of the Independent. The steady rise of Roberts from then on suggests that this was not a race decided by particular events; rather, it was a case of a consistently better campaign on the part of the incumbent. Orman was also outspent by Super PACs, with outside groups supporting Roberts outspending those supporting the Independent by a 2:1 margin. Our model judges the race a toss-up, giving a tiny advantage to the Republican candidate. However, Roberts won by over 5 points! This suggests that the pollsters misjudged the race.

At the other end of the spectrum is the situation for North Carolina $-$ kind of our Achilles' heel. Analysing what happened here, one can immediately see that the structural model for North Carolina is solid: it reduces our uncertainty of the entire race to less than 10 percentage points, with the Republican challenger Thom Tillis at a slight disadvantage, all the way up to week 7 (15 weeks to go).

An oddball in this race was represented by Libertarian candidate Sean Haugh, who polled vertiginously high for a third party candidate all the way up to election day. His presence breaks the assumption that allowed us to model the 2-party vote share, which is that it is ok to only consider Democrats and Republicans, as long as independents, third party candidates and non-respondents in polls break evenly for both parties.
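A small numerical illustration may help to see why this assumption matters. The Python sketch below uses made-up percentages (not the North Carolina polls) and hypothetical function names: it compares the two-party share with what happens when the remaining voters are eventually reallocated between the two main candidates.

```python
def two_party_share(rep, dem):
    """Republican share of the two-party vote."""
    return rep / (rep + dem)


def reallocated_share(rep, dem, other, frac_to_rep):
    """Republican share once a fraction `frac_to_rep` of the 'other'
    voters (third party, independents, undecided) moves to the
    Republican and the rest moves to the Democrat."""
    return (rep + frac_to_rep * other) / (rep + dem + other)


rep, dem, other = 40.0, 44.0, 16.0   # made-up poll percentages

baseline = two_party_share(rep, dem)                  # Democrat ahead
even = reallocated_share(rep, dem, other, 0.50)       # Democrat still ahead
uneven = reallocated_share(rep, dem, other, 0.75)     # lead flips
```

If the other voters split evenly, whoever leads the two-party share still leads after the reallocation; if three quarters of them are really disgruntled Republicans who eventually "come home", the same poll hides a Republican lead.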

This wasn't the case in this race, and especially in the months of June and July, Haugh polled with heights of 11% and a mid-June to mid-August average of around 8%. However, in mid-August, his vote share suddenly drops to about 5%, coinciding with Tillis gaining momentum and bringing the race to a toss-up. The reasons for such a drop are not certain; however, it is not far-fetched to think that disgruntled Republican voters wanted to send a message to the Republican establishment, represented by Tillis, and considered voting for the Libertarian candidate as a protest. In accordance with the enlightened preferences theory, as the Republican voters learnt that Tillis was their only chance at not getting the incumbent Democrat Kay Hagan re-elected, the protest voters gradually came back to the Republican base.

It is worth saying that they were heavily pushed by the two campaigns and by outside groups, who ended up pouring close to $90 million into the campaign, leading it to be dubbed "the most expensive Senate election in history". This process kept going all the way up to election day, when the Libertarian candidate ended up leaking all but 3.7% of the vote.

Support for Kay Hagan seems to be consistent throughout, and polls don't seem to incorporate much variability, giving the incumbent a small lead. Tillis, the Republican opponent, suffered from the usual Republican drop in week 15, but managed to put on a good show in the last TV debates. The last debate in particular (around 2 weeks to election day) was a big hit for Tillis, as Hagan was a no-show in the televised debate amid criticism over her husband reaping personal benefits from the Obama stimulus package. This propelled him closer to Hagan, but he never quite caught up in the polls. On election day, he won with a 1.5% lead.

Thursday, 19 March 2015

Utility bills

Because I'm involved in many collaborative projects, some of which luckily involve LaTeX, and because I'm trying (sort-of succeeding) to spend as much time as possible outside the office to work (mostly failing) on the books, in the past few weeks I've found myself wanting some track-changes utility for the work I was sharing with my LaTeX-savvy colleagues. [Could this be a candidate for the longest opening sentence of a post, ever?]

I had a quick look online and found this very nice package $-$ it's probably well established, but I'd not encountered it before, so I was very pleased to discover it.

It works quite smoothly and lets you annotate the original .tex file with changes, additions and notes. And what's even nicer is that the compiled document has some mark-up (eg different colour for new text), but it's not very cluttered, so that you can read fairly easily the current version with notes.

Speaking of LaTeX, I also found this other couple of useful programmes: the first one is a perl script that creates the bibtex code of a given reference $-$ basically you can copy and paste the full reference of a text of interest and the script will return the LaTeX code to paste into a .bib file. The second one searches PubMed and retrieves the LaTeX code for the hits that match the search string.

Again, both are probably quite old and well established. But it was quite serendipitous.

Friday, 6 March 2015

Banned!

This is not really news any more, but I still think it's an interesting story.

Last week the journal Basic and Applied Social Psychology published an editorial setting out its views (or rather prescriptions) for how statistical analyses should be conducted in papers that seek publication with it.

The editorial starts by effectively banning the use of p-values and null hypothesis significance testing, which "is invalid, and thus authors would be not required to perform it". Then it goes on to say that "Bayesian procedures are more interesting", but also suffer from issues with "Laplacian assumption" (non-informative priors) and therefore they "reserve the right to make case-by-case judgments, and thus Bayesian procedures are neither required nor banned from BASP".

The conclusion of the editorial is then that basically psychologists do not need to bother with any inferential procedure, "because the state of the art remains uncertain. However, BASP will require strong descriptive statistics, including effect sizes. We also encourage the presentation of frequency or distributional data when this is feasible".

This has caused quite a stir among many statisticians (and I think psychologists should join the protest!). Here's a series of responses by important statisticians. I personally think that some of the problem at least is the view that statistics is some sort of recipe-book: if you have such and such data collection, then do a t-test; if you have such and such a design, then do an ANOVA; or perhaps if you have this other data, then use meta-analysis and throw in some priors-kind of thing $-$ I'm no real expert here, but I think that psychology as a field suffers particularly from this problem (perhaps for historical reasons?).

Most importantly, this reminds me of my first ISBA conference, back in 2006 (I think that's the last time it was held in the Valencia area). The final night of the conference, some attendees prepare some entertainment and that year, together with a few (back then) young friends, we prepared a news broadcast $-$ we spent most of the last day of the conference doing this, rather than attending the talks, I'm half proud, half ashamed to confess.

Anyway, among the "serious" news we were reporting was a riot that had happened outside the conference hotel where frequentists had come en masse to protest, waving placards reading "We value p-value!" (worryingly, we also reported that Alan Gelfand, then-President of ISBA, had to be transferred to a secure location).

Thursday, 5 March 2015

Cannabis on trial

The other night, Channel 4 broadcast this programme. That's some sort of spin-off from the trial we're working on at UCL (Valerie Curran is the principal investigator $-$ the whole group is really good and all nice people to work with!). The idea of the TV programme was to have a bunch of celebrities try different forms of cannabis, to explore the hypothesis that it is the actual composition of cannabis that can make it harmful.

The point is that skunk is the younger and stronger version, which contains higher proportions of the "bad component" (THC), while hash is mainly made of a milder component (CBD), which seems to have far less damaging effects $-$ in fact it can prove beneficial in some cases. A live blog detailing the programme is here.

I've mentioned this already (here, for example) and it's interesting from the stats point of view as we're implementing an adaptive design for this trial. We're collecting information on a set of volunteers and we'll continuously monitor the results, updating the uncertainty on which of several doses of the compound we're testing (based on CBD) is the most effective and can then proceed to a head-to-head trial phase against placebo.

Sunday, 1 March 2015

Non-trivial wedges

During February, I've been really bad at blogging $-$ I've only posted one entry advertising our workshop at the RSS, later this month. I have spent a lot of time working in collaboration with colleagues at UCL and the London School of Hygiene and Tropical Medicine to prepare a special issue of the journal Trials.

We've prepared 6 articles on the Stepped Wedge (SW) design. This is a relatively new design for clinical trials $-$ it's basically a variant of cluster RCTs, in which all clusters start the study in the control arm and then sequentially switch to the intervention arm, in a random order, until all the clusters are given the intervention.
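For readers who have never seen one, the resulting treatment schedule can be sketched with a few lines of Python. This is only an illustration under simplifying assumptions (one baseline period and exactly one cluster switching at each subsequent period), and the function name is made up.

```python
import random


def stepped_wedge_schedule(n_clusters, rng=None):
    """Return schedule[cluster][period]: 0 = control, 1 = intervention.

    Assumes a baseline period in which every cluster is in control,
    followed by one cluster (in random order) switching at each period.
    """
    rng = rng or random.Random()
    order = list(range(n_clusters))
    rng.shuffle(order)                      # random order of switching
    n_periods = n_clusters + 1
    schedule = [[0] * n_periods for _ in range(n_clusters)]
    for step, cluster in enumerate(order, start=1):
        for period in range(step, n_periods):
            schedule[cluster][period] = 1   # once switched, stays on intervention
    return schedule


schedule = stepped_wedge_schedule(4, rng=random.Random(1))
for row in schedule:
    print(" ".join(str(x) for x in row))
```

Sorting the rows by switching time gives the staircase pattern from which the design takes its name.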

There are some obvious limitations to this design (first and foremost the fact that there may be a time effect over and above the intervention effect, which means that time needs to be controlled for, to avoid bias). But, as we show in our several articles, there may be some benefits in applying it $-$ I think we've been very careful in detailing them, as practitioners need to be fully aware of the drawbacks.

The paper I've been working on mostly is about sample size calculations for an SW trial. Some authors have presented analytical formulae to do these, but while they work in specific circumstances, there are several instances in which the features of the SW formulation (time effect, repeated measurements on the same individuals in the clusters, etc) are better handled through a simulation-based approach, which is what we describe in detail in our paper.
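To give a flavour of what "simulation-based" means here, the following Python sketch estimates power for a toy SW trial. It is deliberately much simpler than (and not the same as) the approach in the paper: it simulates cluster-period means with a cluster random effect and a linear time trend, uses a crude within-period comparison instead of a mixed model, and gets the critical value by simulating under the null. All names and parameter values are assumptions.

```python
import random
import statistics


def sw_design(n_clusters, rng):
    """X[i][j] = 1 if cluster i is on intervention in period j."""
    order = list(range(n_clusters))
    rng.shuffle(order)
    X = [[0] * (n_clusters + 1) for _ in range(n_clusters)]
    for step, cluster in enumerate(order, start=1):
        for j in range(step, n_clusters + 1):
            X[cluster][j] = 1
    return X


def simulate_estimate(theta, rng, n_clusters, n_per_cell,
                      sd_cluster, sd_resid, time_slope):
    """Simulate one trial and return a within-period estimate of theta."""
    X = sw_design(n_clusters, rng)
    u = [rng.gauss(0.0, sd_cluster) for _ in range(n_clusters)]
    diffs = []
    for j in range(n_clusters + 1):
        treated, control = [], []
        for i in range(n_clusters):
            noise = rng.gauss(0.0, sd_resid / n_per_cell ** 0.5)
            cell_mean = time_slope * j + u[i] + theta * X[i][j] + noise
            (treated if X[i][j] else control).append(cell_mean)
        if treated and control:   # skip periods with only one arm
            diffs.append(statistics.fmean(treated) - statistics.fmean(control))
    return statistics.fmean(diffs)


def power(theta, n_sims=1000, alpha=0.05, seed=1, **design):
    """Monte Carlo power of a two-sided test with simulated critical value."""
    rng = random.Random(seed)
    null = sorted(abs(simulate_estimate(0.0, rng, **design)) for _ in range(n_sims))
    crit = null[min(n_sims - 1, round((1 - alpha) * n_sims))]
    hits = sum(abs(simulate_estimate(theta, rng, **design)) > crit
               for _ in range(n_sims))
    return hits / n_sims


design = dict(n_clusters=6, n_per_cell=20, sd_cluster=0.3,
              sd_resid=1.0, time_slope=0.1)
```

Calling `power(theta, **design)` for a grid of values of `n_clusters` or `n_per_cell` then gives a power curve, from which one can read off the smallest design reaching, say, 80% power. The time trend cancels out because treated and control clusters are only ever compared within the same period.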

I'm also finalising an R package in which I'll collect the functions I've prepared to sort-of-automate the calculations, for a set of relatively general situations. I'm planning on naming the package SWSamp (Samp have won today, so I'm all up for it, right now $-$ we'll see how they do when I get closer to finishing it, though...).
