Gianluca Baio's blog: June 2013

Friday, 21 June 2013

The pretender

Tonight I'll pretend to still be very, very young and go catch a Ryanair flight from Stansted, so that I can be all revenge-y at my brother's stag do, tomorrow morning.

I have pointedly avoided Ryanair for quite a while now, on the grounds that as you grow "less younger", you deserve a bit better quality (although most of the times I end up flying Easyjet, which are hardly the most comfy airlines around... But at least they fly from much closer to my home).

But, ubi maior... I will leave tonight at about 3am so that I can be in the Cambridgeshire countryside with hopefully some time to spare before they board my flight. Then, I'll be finally reunited with the travelling part of my family (I can't be sure at this point, but I seem to remember having wife & kid, although, they have decided a few weeks ago to abandon me to my own devices and flee the English winter in search for better climates).

Bayes 250 - as it happened

I was going to post some impressions on the two-day Bayes 250 conference, but Christian beat me to it with his perfect recollection of the event.
I thought it was a good workshop with interesting talks and an occasion for camaraderie. I enjoyed being there and I agree with Christian that Dennis Lindley's interview is something we Bayesian should use as promotional material!

Tuesday, 18 June 2013

BCEA 1.3.0

After months of work (although to be fair, we haven't worked 100% full time on this), Andrea and I are nearly ready to publish the next release of BCEA.

Andrea has done a brilliant job and is responsible for most of the good new features (NB: see what I'm doing here? Subtly putting the blame on him if things go tits up, but also appearing like a magnanimous supervisor who's only pretending not to deserve full credit, if they don't...).

But, seriously, I really mean that he's been brilliant, especially since he had to put up with my being extremely picky on details like font size and similar! Anyway, we're really excited (well, at least as excited as you can be about a computer package) about the new features, which basically are of three types.

The first one is in terms of the graphical capabilities of the package. We have implemented all the graphical functions in ggplot2, which now complements the base graphical engine.
The second one is that we have included the possibility of running multiple health economic comparisons in a single evaluation. In the current version of BCEA, you are allowed to have many interventions, but the comparisons are performed pairwise against one of them, which the user defines as the "reference" intervention. Now, it will be possible to produce an analysis of all the interventions jointly. This has clear links with multiple treatment comparisons (as pointed out in chapter 9 here).
The third new feature allows the user to compute the expected value of partial information (EVPPI), with respect to one of the parameters included in the model. This is a very important aspect of the process of probabilistic sensitivity analysis and normally is performed using a two-stage MCMC process (which is explained in chapter 3 and 4 of BMHE). But this can be (and nearly always is) a very computationally intensive process. Also, you can't use too few iterations in either of the two MCMC stages, because that has a crucial impact on the precision of the results. Also, it is difficult to standardise the analysis using the two-stage MCMC approach, because it depends very much on the model being fitted. However, Mohsen Sadatsafavi and colleagues have recently published a paper in which they found a clever way of approximating the EVPPI, once the original model has been run (ie with a single MCMC step, which you would do anyway). I wasn't aware of the paper, but after Mohsen contacted me and pointed it out, I decided we should implement it in BCEA.

I'm not completely sold on the ggplot2 thing. I think it can be very good and gives you a lot of freedom and flexibility. But sometimes it feels like overkilling it, really. But, for example, it will be helpful in problems with multiple interventions, where it is more important that the user can customise the resulting graphs, given that they can be very cluttered, if there are many interventions being compared at the same time (at the moment we allow a maximum of 6).

In the next couple of days we'll release the new version as some sort of beta test. We have done some tests ourselves, of course, and everything seems to work OK. But of course it would be good if we could get more feedbacks on different problems.

Thursday, 13 June 2013

Big in Japan

Inspired by this post on R-bloggers, I decided to check how BCEA was doing. Unfortunately, it does not feature in the top 100 most downloaded R packages. However, I think it's doing well $-$ considering the book (which is the main medium of advertising of the package) has been out for only a few months (since October last year) and it's kind of a specialised software, which basically you only need if you do health economic evaluations...

I've used some simple R code to download the log files containing all hits to http://cran.rstudio.com/ since October of 2012. Once the files (in .csv format, compressed in .gz files, one per each day) are downloaded, I have R extract the original file and then create a table, only selecting the records for BCEA.

The resulting dataset contains the date(s) and time(s) in which the library has been downloaded from CRAN, some information about the R version and architecture of the person who has downloaded the package, as well as their country.

Overall, BCEA has been officially downloaded 862 times (I suppose I should have a big celebration as soon as I hit 1000); most of the times, the download was from a user in the US (185). Surprisingly, BCEA is big in Japan (135 downloads). I did not see this coming, I have to say, but 日本ありがとう$-$ that's "thank you Japan", for those of you who can't speak Japanese (or can't use Google Translate).

Here's the (quickly prepared and hence not particularly elegant, nor necessarily super-efficient) code to download and format the data:
start <- as.Date('2012-10-01')
today <- as.Date('2013-06-12')
all_days <- seq(start, today, by = 'day')
year <- as.POSIXlt(all_days)$year + 1900
urls <- paste0('http://cran-logs.rstudio.com/', year, '/', all_days, '.csv.gz')
file <- basename(urls)
download.file(urls[1], file[1])
data <- read.table(gzfile(file[1]),sep=",",header=TRUE)
data <- data[data$package=="BCEA",]
for (i in 2:length(urls)) {
download.file(urls[i], file[i])
tmp <- read.table(gzfile(file[i]),sep=",",header=TRUE)
tmp <- tmp[tmp$package=="BCEA",]
data <- rbind(data,tmp)
}
data <- na.omit(data)

Monday, 10 June 2013

Running time

Marta and I are doing some re-analysis of our Eurovision contest (some context here and here). We have slightly modified our original model (mostly, I have navigated the mess in Marta's notation $-$ it's OK: I'm not at risk of her mighty wrath, as I've already joked about this with her!). Also, we've included the latest data (in fact, we're doing a little prediction as well, pretending that all the covariates are known, but the actual results aren't).

Last time around, Marta ran the model using parallelisation (through the R package snowfall) in WinBUGS on her Windows desktop, at work. The model is relatively complex (see here for a working description), but thanks to parallelisation the running time was not too bad (as I recall, about 3 hours to run 11000 iterations with a burn-in of 1000 and thinning of 20).

This time, I decided I would get my hands dirty on my Linux machine(s). To start with, I thought I'd give it a go using R2jags and its new(-ish) function jags.parallel, which is supposed to run multiple instances of the MCMC process at once. Unfortunately, that didn't work. I searched on line for some clue and found people sharing their common experiences. I don't really know what's failing, but JAGS complains that a node (which I'm actually passing as data) is not found.

So, thinking that running software natively rather than via emulation would be most efficient, I decided to install OpenBUGS and run the model using snowfall and R2OpenBUGS. This was better: I was able to check the model and load the data. But when it came to compiling the model, OpenBUGS took for ever $-$ I think for some reasons it was getting stuck with something. It was a bit surprising, as I thought that this would be a very robust and reliable alternative.

So, I decided to try and see how WinBUGS would do. I installed WINE and ran the model using R2WinBUGS and again through snowfall $-$ effectively replicating the exact procedure that Marta used last year. To my surprise, things have worked out quite OK and I shaved about an hour from Marta's previous attempt (mainly through blocking of some of the parameters).

Wednesday, 5 June 2013

You sure, Mike?

Holly Baxter writes an interesting article on the Michael Douglas/HPV story (widely covered in the past few days: eg here, here or here) in today's Guardian, in which she basically tries to steal our thunder... ;-)

Holly argues that Douglas' conclusion that his recent throat cancer was due to his continued practice of cunnilingus (and subsequent HPV infection) is in fact not quite granted. She rightly points out to other risk factors, which seem to have an even bigger impact on the occurrence of HPV. Also, she rightly mentions that while cervical cancer is almost completely caused by HPV, other tumours (such as throat cancer) are affected by it, but not to the same degree.

I think her comments are pretty much in line with the approach we have taken for our study on HPV vaccination (which I have mentioned for example here and here). As part of the research, we have conducted a sample survey on about 600 individuals and have tried to identify potential risk factors (we're writing up results and hopefully will have something soon!) and, as it turns out, the main issues are:

sexual promiscuity;
smoking and early sexual debut;
education

all of which increase (to various degrees) the risk of HPV acquisition.

In addition, Holly argues that male vaccination is an important issue $-$ of course we agree and in fact, the (Bayesian Markov) model we're currently finalising tries to assess whether offering universal vaccination (in our case to boys and girls aged 12) turns out to be cost-effective.

On that, however, I think the jury is still out. From our (semi-finalised) results, there may be a potential for cost-effectiveness in universal vaccination (which would mean reducing the overall number of subjects potentially at risk, because of the effects of herd immunity, which we are accounting for in our model). However, the results seem to be sensitive to some of the parameters for which the evidence available is very limited.

I'll post more once we've finished our analyses.

Tuesday, 4 June 2013

Daily bias in the mail

David Spiegelhalter writes in his blog about this news headline on the Daily Mail (DM)'s website. According to the article, statins can weaken muscle and joints, raising the problem by up to 20%. As David points out, this claim is not justified by the evidence in the original article, since the DM is actually mixing up the meaning and interpretation of odds ratio and relative risk.

So, in truth, the actual absolute difference in the risk of having muscle and joints problems is merely 2% (extra risk for those using statins). And even then, all sorts of additional problems arise (eg, again as pointed out by David, the fact that individuals taking statins are more likely to be under constant observation and information about their overall health status more likely to be recorded).

But what's also interesting, I think, is the confirmation bias in most of the comments to the article on the DM's website. Many of those who bothered: a) reading the article; and b) commenting (which of course in itself does not make for a very representative sample!), confirmed that they too had problems which by now they can assert with absolute confidence are due to consumption of statins!

UseR 2013

Although the programme is quite interesting, I am not really involved in this conference (in fact I'm not even going $-$ even though, sometimes, I think it would be nice to spend the whole summer away at conferences!).

But Vir has just put this picture on his facebook profile, which I think is just awesome! (The conference will be in Castilla-La Mancha, the land of Don Quixote, hence the brilliant logo).

So, nerd as I am, I thought I wrote something to show I really liked it! I wonder if I could pull something like this off for next year's Bayes Pharma?...

Gianluca Baio's blog