Tuesday, 30 April 2013

What the BBC isn't telling you

Yesterday Gareth pointed me to this article on the BBC website. The underlying story has to do with Meredith Kercher's murder and the subsequent trial involving mainly her flat-mate Amanda Knox, in Perugia (Italy). 

As often happens in these gruesome cases, the evidence is pretty difficult to handle (eg because of contamination). Thus, even in the era of CSI, it is extremely difficult to prove guilt beyond reasonable doubt based on DNA evidence.

In Amanda Knox's trial, the appeal court decided to dismiss the DNA evidence altogether, on the grounds that the available sample (which had been used to convict Knox and her alleged accomplice Sollecito at the original trial) was unreliable. The appeal judge then ruled that it was pointless to produce further samples.

But the BBC article reports a contribution by Coralie Colmez, who points out some inconsistencies in the use of the available evidence. Her main point is that "the mathematics tells you that if you have an unreliable test, you do it again and you can be a lot more certain of that answer. And it's not intuitive but a simple calculation will tell you that it's true".

I suspect that her contribution may have been cut (too deeply) to fit the BBC article; but it seems to me that, in doing so, the reporter has not done a good job. The article says: "You do a first test and obtain nine heads and one tail... The probability that the coin is fair given this outcome is about 8%, [and the probability] that it is biased, about 92%. Pretty convincing, but not enough to convict your coin of being biased beyond a reasonable doubt, Colmez says".

Now, what the BBC is not telling you is that there is a statistical model (and in fact a relatively complex one) behind this conclusion. I think this is what is going on in the background. Assume that $y=9$ heads are observed out of $n=10$ trials. We can reasonably model this as $y \mid \pi,n \sim \mbox{Binomial}(\pi,n)$, where $\pi$ is the probability that the coin gives a head. Also, assume that there are two possible data generating processes at play. The first one ($D=1$) is such that $\pi=0.5$ with probability $1$, which implies that the coin is unbiased. In the second one ($D=2$) we assume no particular knowledge about the true chance of observing a head, and thus model $\pi \sim \mbox{Uniform}(0,1)$.

Effectively, we are implying a mixture prior

$$ p(\pi) = w_1\, p_1(\pi) + w_2\, p_2(\pi) $$
where $w_1=\Pr(D=1)$, $w_2=(1-w_1)=\Pr(D=2)$, $p_1$ is a point mass at $\pi=0.5$ and $p_2$ is the $\mbox{Uniform}(0,1)$ density. The objective is to estimate the probability that the coin is biased, given the observed evidence: $\Pr(\pi\neq 0.5 \mid y,n)$ or, to be more precise, $\Pr(D=2 \mid y,n)$.
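In fact, for this simple model the calculation can also be done in closed form; spelling out the step that (I assume) lies behind the article's figures, Bayes' theorem gives
$$ \Pr(D=2 \mid y,n) = \frac{w_2\, p(y \mid n, D=2)}{w_1\, p(y \mid n, D=1) + w_2\, p(y \mid n, D=2)} $$
where the two marginal likelihoods are
$$ p(y \mid n, D=1) = \binom{n}{y}\, 0.5^n \qquad \mbox{and} \qquad p(y \mid n, D=2) = \int_0^1 \binom{n}{y}\, \pi^y (1-\pi)^{n-y}\, d\pi = \frac{1}{n+1}. $$
With $y=9$, $n=10$ and $w_1=w_2=0.5$, this gives $\Pr(D=2 \mid y,n) = \frac{1/11}{10/1024 + 1/11} \approx 0.90$, in the same ballpark as the figures quoted in the article.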

Of course, there is nothing special about these two "hypotheses" and one can even argue that they are not necessarily the best possibilities! But let's assume that these are OK. You can use simple MCMC to obtain this (I think there is very similar code in The BUGS Book, dealing with pretty much the same example). Here's how I've coded it:

model {
    y ~ dbin(pi[D],n)
    D ~ dcat(w[])
    pi[1] <- 0.5
    pi[2] ~ dunif(0,1)
    p.biased <- D - 1
}

Assuming you save this to a file BBCcoin.txt, and that you have R, JAGS and R2jags installed on your computer, you can run this model directly from R, using the following code.

working.dir <- paste(getwd(),"/",sep="")
# Observed data
y <- 9 # number of heads

n <- 10 # number of trials
# Prior for the data generating process
w <- numeric()
w[1] <- .5 # probability that coin is unbiased
w[2] <- 1-w[1] # probability that coin is biased
filein <- "BBCcoin.txt"
dataJags <- list(y=y,n=n,w=w)
params <- c("p.biased","pi")
inits <- function(){
    list(D=sample(1:2,1)) # initial value for the discrete node; JAGS initialises pi[2]
}
n.iter <- 100000
n.burnin <- 9500
n.thin <- floor((n.iter-n.burnin)/500)
tic <- proc.time()
coin <- jags(dataJags, inits, params, model.file=filein, n.chains=2, n.iter=n.iter,
    n.burnin=n.burnin, n.thin=n.thin, DIC=TRUE, working.directory=working.dir,
    progress.bar="text")
print(coin,digits=3,intervals=c(0.025, 0.975))

This gives the following results

Inference for Bugs model at "BBCcoin.txt", fit using jags,
2 chains, each with 1e+05 iterations (first 9500 discarded),
n.thin = 181 n.sims = 1000 iterations saved
         mu.vect sd.vect  2.5% 97.5%  Rhat n.eff
p.biased   0.916   0.278 0.000 1.000 1.002  1000
pi[1]      0.500   0.000 0.500 0.500 1.000     1
pi[2]      0.802   0.160 0.308 0.974 1.002  1000
deviance   3.386   2.148 1.898 9.258 1.000  1000

For each parameter, n.eff is a crude measure of 
effective sample size, and Rhat is the 
potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 2.3 and DIC = 5.7
DIC is an estimate of expected predictive error 
(lower deviance is better).

which is consistent with the article's claim that the probability of bias is about 92% (or, equivalently, that the posterior probability that $D=1$, the unbiased process, is the "true" data generating process is about 8%). 
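As a sanity check, the same number can be computed exactly, without MCMC, since both marginal likelihoods have closed forms (the Binomial likelihood integrates against the flat prior to $1/(n+1)$). Here's a quick sketch $-$ in Python rather than R, purely for illustration:

```python
from math import comb

y, n = 9, 10   # observed heads and number of trials
w1 = w2 = 0.5  # prior weights on the two data generating processes

# Marginal likelihood under D=1 (fair coin: pi = 0.5 exactly)
m1 = comb(n, y) * 0.5**n
# Marginal likelihood under D=2 (pi ~ Uniform(0,1)):
# the Binomial integrates against the flat prior to 1/(n+1)
m2 = 1 / (n + 1)

# Posterior probability that the coin is biased
p_biased = w2 * m2 / (w1 * m1 + w2 * m2)
print(round(p_biased, 3))  # -> 0.903
```

The exact answer, about 0.903, is within Monte Carlo error of the 0.916 estimated above (only 1,000 samples are saved in the run).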

So, quite a few model details are omitted there, and they are not so obvious, I think! 

Friday, 26 April 2013

Posteriors vs predictives

Karl Saliba writes with some queries about our football paper, which I have already discussed on the blog here. He says:

Thanks to the code in the appendix I could easily replicate (by using WinBUGS) a similar analysis on any football league of my choice. I am mostly concerned with the output of Table 2 in the Results section. I understand that each team is being assigned an attack and a defence parameter based on MCMC. I also understand that the higher the attack parameter, the higher the propensity to score, whereas the lower the defending parameter, the less likely it is that this team concedes goals.

As shown in Table 1, the last game of the 1991-1992 season [my note: in the Italian Serie A] was between home team Torino and away team Ascoli which ended up in a 5-2 win for Torino. The posterior parameters for the respective teams are:

Home effect: 0.2124
Torino attack effect: 0.0824
Torino defence effect: -0.4141
Ascoli attack effect: -0.2238
Ascoli defence effect: 0.4776

Based on these parameters, Torino are more likely to score than Ascoli. Secondly, Ascoli are more likely to concede than Torino. Furthermore, Torino are playing at home and so they have that extra boost called the 'home effect'. Hence, according to my intuition, it would make sense that Torino win this game.

I would like to ask the following 2 questions:

(1) Given any two competing teams as well as all five parameter estimates obtained from MCMC, how is it possible to calculate the probability that the home team wins, the probability that the away team wins, or perhaps the probability that the game ends in a draw?

(2) Based on these parameter estimates, is the following a plausible way of estimating the most likely number of goals that each team will score in an encounter?

Expected Torino goals = exp(0.2124+0.0824+0.4776) = 2.16
Expected Ascoli goals = exp(-0.4141-0.2238) = 0.52

Seems slightly way off. Is there a better way as to evaluate how many goals each team is likely to score?

I think that the main confusion here is in using the posterior distributions of the parameters instead of the relevant predictive distributions of the observable variables. Our model is directly concerned with the estimation of the predictive number of goals scored by each team in each game $g$:
$$ p(y^*_{gj}\mid \mathbf{y}) = \int p(y^*_{gj} \mid \theta) p(\theta \mid \mathbf{y}) d\theta.$$
So, for each game $g$, the main output is a vector from the posterior predictive distribution of the number of goals scored by the team playing at home ($j=1$) or away ($j=2$). These can be directly used to estimate the probabilities requested by Karl. 

For example, in the BUGS model one could define additional nodes, eg
p.home[g] <- step(ynew[g,1]-ynew[g,2]-1) # -1 because step(x)=1 for x>=0, which would otherwise count draws
p.draw[g] <- equals(ynew[g,1],ynew[g,2])
(in line with the code provided in the appendix of the paper, the node ynew indicates the predictive distribution).

The first one is just an indicator function which takes value 1 if the predicted number of goals for the home team is higher than that of the away team and 0 otherwise. Thus, it quantifies the chance that the home team wins. Similarly, the second is an indicator taking value 1 if the predicted number of goals scored are the same for the two teams and 0 otherwise (and represents the chance of a draw).
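To make this concrete, the same quantities can be approximated by simulation outside BUGS too. The sketch below is in Python, purely for illustration, and simply plugs in the point estimates quoted above $-$ so it ignores the parameter uncertainty that the full posterior predictive would propagate $-$ with Poisson rates following the model's log-linear predictor:

```python
import numpy as np

rng = np.random.default_rng(20130426)

# Posterior point estimates quoted in Karl's email
home = 0.2124                       # home effect
att_tor, def_tor = 0.0824, -0.4141  # Torino attack and defence
att_asc, def_asc = -0.2238, 0.4776  # Ascoli attack and defence

# Scoring rates, following the model's log-linear structure
lam_torino = np.exp(home + att_tor + def_asc)  # about 2.16, as in Karl's calculation
lam_ascoli = np.exp(att_asc + def_tor)         # about 0.53

# Simulate the (plug-in) predictive distribution of the score
n_sims = 200_000
y_home = rng.poisson(lam_torino, n_sims)
y_away = rng.poisson(lam_ascoli, n_sims)

p_home = np.mean(y_home > y_away)   # chance that Torino win
p_draw = np.mean(y_home == y_away)  # chance of a draw
p_away = np.mean(y_home < y_away)   # chance that Ascoli win
print(p_home, p_draw, p_away)
```

Note that lam_torino and lam_ascoli reproduce Karl's 2.16 and 0.52: his calculation gives the means of the two predictive distributions, ie the expected (not the most likely) number of goals.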

The second question is directly related to this point; in this case, the objective is to estimate the predictive distribution. As for the specific example, the observed result is quite extreme (7 goals scored overall). So, given the fact that the hierarchical model produces some shrinkage in the estimates, it is unlikely that this extreme result is picked up "exactly" $-$ that's one of the points of the adjustment we propose in section 4 of the paper.

But the general principle stands: the posterior distribution of the model parameters is often the objective of the inference, but it doesn't have to be.

Tuesday, 23 April 2013

Russian dolls

This is a bit of a Russian doll situation, because I'm effectively pointing to another blog, which (kind of) points to this blog (and, who knows, maybe this blog is somehow pointing to that other blog too... [To enjoy this fully, you should read it out loud and then follow up with an evil laugh, something like this]).

Anyway, what I mean is the nice mention of BCEA on the Health Economics and Decision Science blog. By the way, I'm (more or less) working on an update to the package $-$ in particular, I'm toying with the idea of using ggplot2 to make the graphs (in fact, I should say that I'm getting some help with this). It seems cool and I have also flipped through this book (although I've never had the time to actually learn it properly). I'll post about this when I have more details.

Friday, 19 April 2013

Most squares

Sometimes you try so hard to ask interesting questions, so that people think that you're clever and interesting too, that you don't actually realise how dumb you're being just by asking those very same questions.

For example, a few years ago, Marta's then boss invited us to see her daughter performing with her string quartet. As we sat down, waiting for the concert to start, only half thinking (well, maybe I should say only $\frac{1}{n}$-th thinking), I asked how many people were playing in the quartet $-$ of course, just as I was speaking, I immediately realised how stupid the question was, but sadly it was too late to take it back...

I thought of this glorious moment again the other day, during one of the PSMR course lunches. I was having a chat with the other lecturers and we were commenting that this year's audience were quite good (according to the feedback forms, the feeling was mutual). 

Of course this is not always the case: Gareth, for example, remembered that a few years back he was teaching linear regression, explaining that the least square line minimises the overall difference between an observed value and the fitted value provided by a model. At that point, one person in the audience raised their hand and with a straight face asked: "why not taking the maximum?"

Friday, 12 April 2013

Health economics and sex workers

Today, the British media (eg here) have given some attention to a report commissioned by Westminster council into the condition of sex workers, with particular reference to the current economic recession.

One of the main conclusions is that, because of the recession, sex workers are forced to bring their prices down and accept more clients to earn more money, thus exposing themselves to a higher risk of violent behaviour, rape, robbery and murder.

The report (which I have flipped through) also includes some rudimentary economic evaluation which aims at highlighting what the cost-saving could be if Westminster council were to implement some measures to increase the number of crimes reported by sex workers. 

Using data from various sources (including a questionnaire that is part of the research, as well as official data from the UK Office for National Statistics), the report tries to assess the cost of implementing the intervention against the potential costs of avoiding negative events (eg a sex worker being raped). 

The authors of the report are pretty upfront in stating that their analysis is partial, only considers "first-order effects" and that the figures for the return on investment for the council are "not exact". Also, I suppose that their main objective was to raise the issue, so in a sense that's fine.

But I can't help feeling that with a little extra effort, a proper economic model could have been set up and run to give even more robust findings. For example, the report suggests that:
If one rape offender was convicted (due to a sex worker reporting the incident) and the perpetrator's potential to re-offend was prevented, an average of £73,564 would be saved in a 12 month period (this could be greater if the offender re-offended in subsequent years). Calculation: If 10 people committed rapes, this would cost £960,000 [my comment: this is taken from an ONS report]. 2.67 of these people are likely to commit 2.87 more rapes in a 12 month period [my comment: this is based on data from the UK Home Office], so (2.67 × 2.87 × £96,000 = £735,638.40 / 10 people) = average of £73,564 per offender.
Surely all these numbers are subject to huge variability, which is not accounted for here. Of course, data are difficult to obtain on such a sensitive topic, but this seems to be one of those cases where the inclusion of informative priors to describe the uncertainty around point estimates (which might be genuinely given with no measure of uncertainty in the available data) is extremely valuable.
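To illustrate the point, here's a toy version of what such a model might look like $-$ a sketch in Python, where the Gamma distributions (and their roughly 20% coefficient of variation) are entirely made up for illustration; with real data one would elicit these priors properly:

```python
import numpy as np

rng = np.random.default_rng(2013)

# Point estimates quoted in the report
n_reoffenders = 2.67    # out of 10 convicted, likely to re-offend
n_reoffences = 2.87     # further rapes each, in a 12 month period
cost_per_rape = 96_000  # cost of one rape, in GBP (ONS figure)

# Replace the two Home Office point estimates with (made up!) Gamma
# distributions centred on them, with ~20% coefficient of variation,
# to propagate the uncertainty through the report's calculation
m = 100_000
reoffenders = rng.gamma(shape=25, scale=n_reoffenders / 25, size=m)
reoffences = rng.gamma(shape=25, scale=n_reoffences / 25, size=m)

# Average saving per convicted offender, as in the report
saving = reoffenders * reoffences * cost_per_rape / 10
print(f"point estimate: GBP {n_reoffenders * n_reoffences * cost_per_rape / 10:,.0f}")
print(f"mean:           GBP {saving.mean():,.0f}")
print(f"95% interval:   GBP ({np.quantile(saving, 0.025):,.0f}, "
      f"{np.quantile(saving, 0.975):,.0f})")
```

Even under these mild (and invented) assumptions, the 95% interval around the £73,564 figure turns out to be very wide $-$ which is really the point: the headline number hides a lot of uncertainty.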

Monday, 8 April 2013


Not sure if I'm giving away some good trick here, but I thought I'd share my experience with this add-on for Gmail. Once installed, Boomerang allows you to schedule emails to be sent at a given date and time from Gmail. You compose your email normally, but there's an extra button, "Send later", through which you can control the timing.

I think it's quite cool $-$ you can actually write emails at 11am and then send them out very late at night or really early in the morning, so that the recipients will go: "wow! This guy is really working very hard... Look at what time he's replied to my email!"

Then again, since XY has arrived, it is very likely that emails sent out from my account at 6am or late at night have actually just been written...

Tuesday, 2 April 2013

Petty in pink

I have to admit that in some sense I am being petty here. But I also think that this story is actually quite interesting, so I'm posting about it.

As I've mentioned elsewhere in the blog (here and here), together with some colleagues I'm working on an extensive research project on the health economics of human papilloma virus vaccination.

We already have a few papers out on different aspects of the problem (there's one which describes the statistical issues in more detail and another one which is more about the economic implications) and a few more are under way.

It seems that a group of (mainly Italian) researchers have taken a close interest in our papers and have started to send letters to the editors or posts on the journal blogs, to which we have duly replied (here and here). 

Nothing wrong with criticising and commenting on our papers, of course. In fact, thanks to their comments we spotted some mistakes in the reference list of one of the papers (the references got messed up when the journal prepared the proofs, which we failed to notice $-$ so totally our fault there!). But most of the other comments were, in my view, a bit petty too, basically as if they were trying really hard to make a point which was never there.

In the end, the editor of the journal sent us this message:
We sought advice from our editorial board in relation to the readers’ concerns and the response you provided. We have received comments from two Academic Editors who feel that your responses to the concerns are adequate. In light of the advice we have received, I am writing to let you know that we consider this matter settled by the public response you posted on the article.
Many thanks for your cooperation on this matter.
which was nice to see.