As often in these gruesome cases, the evidence is pretty difficult to handle (eg because of contamination). Thus, even in the era of CSI, it is extremely difficult to prove guilt beyond reasonable doubt based on DNA evidence.
In Amanda Knox's trial, the appeal court decided to dismiss DNA evidence altogether, on the grounds that the available sample (which was used to convict Knox and her alleged accomplice Sollecito, in the first appeal) was unreliable. The appeal judge then ruled that it was pointless to produce further samples.
But the BBC article reports a contribution by Coralie Colmez, who points out some inconsistencies in the use of the available evidence. Her main point is that "the mathematics tells you that if you have an unreliable test, you do it again and you can be a lot more certain of that answer. And it's not intuitive but a simple calculation will tell you that it's true".
I suspect that her contribution may have been cut (too deeply) to fit the BBC article; but, it seems to me that in doing so the reported has not done a good job. The article says: "You do a first test and obtain nine heads and one tail... The probability that the coin is fair given this outcome is about 8%, [and the probability] that it is biased, about 92%. Pretty convincing, but not enough to convict your coin of being biased beyond a reasonable doubt, Colmez says".
Now, what the BBC is not telling you is that there is a statistical model (and in fact a relatively complex one) behind this conclusion. I think this is what is going on in the background. Assume that $y=9$ heads are observed out of $n=10$ trials. We can reasonably model this as $y \mid \pi,n \sim \mbox{Binomial}(\pi,n)$, where $\pi$ is the probability that the coin gives a head. Also, assume that there are two possible data generating processes at play. The first one ($D=1$) is such that $\pi=0.5$ with probability $1$, which implies that the coin is unbiased. In the second one ($D=2$) we assume no particular knowledge about the true chance of observing a head, and thus model $\pi \sim \mbox{Uniform}(0,1)$.
Effectively, we are implying a mixture prior
$$ p(\pi) = w_1\times \pi_1 + w_2\times\pi_2 $$
where $w_1=\Pr(D=1)$, $w_2=(1-w_1)=\Pr(D=2)$, $\pi_1=0.5$, $\pi_2 \sim \mbox{Uniform}(0,1)$
and the objective is to produce an estimation of the probability that the coin is biased, given the observed evidence: $\mbox{E}(\pi\neq 0.5 \mid y,n)$, or to be more precise $\Pr(D=2 \mid y,n)$.
model {
y ~ dbin(pi[D],n)
D ~ dcat(w[])
pi[1] <- 0.5
pi[2] ~ dunif(0,1)
p.biased <- D - 1
}
Assuming you save this to a file BBCcoin.txt, and that you have R, JAGS and R2jags installed in your computer, you can run this model directly from R, using the following code.
library(R2jags)
working.dir <- paste(getwd(),"/",sep="")
# Observed data
y <- 9 # number of heads
n <- 10 # number of trials
# Prior for the data generating process
w <- numeric()
w[1] <- .5 # probability that coin is unbiased
w[2] <- 1-w[1] # probability that coin is biased
filein <- "BBCcoin.txt"
dataJags <- list(y=y,n=n,w=w)
params <- c("p.biased","pi")
inits <- function(){
list(pi=c(NA,runif(1)),D=1+rbinom(1,1,.5))
}
n.iter <- 100000
n.burnin <- 9500
n.thin <- floor((n.iter-n.burnin)/500)
tic <- proc.time()
coin <- jags(dataJags, inits, params, model.file=filein,n.chains=2, n.iter, n.burnin, n.thin,
DIC=TRUE, working.directory=working.dir, progress.bar="text")
print(coin,digits=3,intervals=c(0.025, 0.975))
This gives the following results
Inference for Bugs model at "BBCcoin.txt", fit using jags,
2 chains, each with 1e+05 iterations (first 9500 discarded),
n.thin = 181 n.sims = 1000 iterations saved
mu.vect sd.vect 2.5% 97.5% Rhat n.eff
p.biased 0.916 0.278 0.000 1.000 1.002 1000
pi[1] 0.500 0.000 0.500 0.500 1.000 1
pi[2] 0.802 0.160 0.308 0.974 1.002 1000
deviance 3.386 2.148 1.898 9.258 1.000 1000
For each parameter, n.eff is a crude measure of
effective sample size, and Rhat is the
potential scale reduction factor (at convergence, Rhat=1).
DIC info (using the rule, pD = var(deviance)/2)
pD = 2.3 and DIC = 5.7
DIC is an estimate of expected predictive error
(lower deviance is better).
which is consistent with the article claim that the probability of bias is about 92% (or equivalently that the posterior probability that $D=2$ is the "true" data generating process is about 8%).
So quite some model details omitted, that are not not so obvious, I think!