Monday, 6 July 2015

Stress testing

Lately, we've been spending a lot of time "stress-testing" our method for the computation of the Expected Value of Partial Perfect Information (EVPPI $-$ I know: the terminology is a bit strange and possibly not very helpful, as "perfect" information doesn't really exist, in statistical terms...).

I have already mentioned this here, here and here; the idea is to bring results from spatial statistics (and INLA) and Gaussian Process regression into the health economic problem.

In a nutshell (I'll avoid all the technical details here), the "data" for our model consist of a vector of values $\mbox{NB}(\theta)$ (the "monetary net benefit", which determines the utility of a given health intervention) and a matrix of simulations from the (joint posterior) distributions of a set of relevant model parameters. The idea is that the multivariate parameter set can be split into a subset of "parameters of interest" ($\phi$), while the rest ($\psi$) are sort of "nuisance" or "unimportant" parameters.

Following up on the work by Mark Strong et al, the relevant model can be written as
$$ \mbox{NB}(\theta) = g(\phi) + \varepsilon $$ and the objective is to estimate the function $g(\phi)$, which is then used to compute the EVPPI. This saves a huge amount of computation time, in comparison to other methods.
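The regression idea can be sketched with a toy example (a Python sketch with entirely made-up numbers, not the R/BCEA code we actually use): fit $\hat{g}(\phi)$ for each treatment option, then compute the EVPPI as the mean of the per-simulation maxima of the fitted values, minus the maximum of their means.

```python
import numpy as np

rng = np.random.default_rng(1)
S = 5000
phi = rng.normal(0, 1, S)                       # parameter of interest
psi = rng.normal(0, 1, S)                       # "nuisance" parameter
# simulated net benefits for two options; only option 1 depends on phi
nb = np.column_stack([np.zeros(S),
                      2 * phi + 0.5 * phi ** 2 + psi])

# a simple quadratic basis stands in for the flexible g(phi);
# the real method uses a much richer (non-parametric) regression
X = np.column_stack([np.ones(S), phi, phi ** 2])
g_hat = X @ np.linalg.lstsq(X, nb, rcond=None)[0]   # fitted E[NB | phi]

# EVPPI: value of learning phi before choosing the best option
evppi = np.mean(np.max(g_hat, axis=1)) - np.max(np.mean(g_hat, axis=0))
```

Because the regression averages out $\psi$, the maximisation inside the mean only "sees" the value of resolving uncertainty on $\phi$, which is exactly what the EVPPI measures.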

In our model, we extended this framework and modelled
$$ \mbox{NB}(\theta) = H\beta + \omega + \varepsilon $$ where $H\beta$ is a linear component depending on the simulations for all the "important" parameters, while $\omega$ is a spatially structured component, which accounts for the correlation among the important parameters. The big advantage is that this formulation lets us make inference based on INLA/SPDE, which is super-fast and can save a lot of time even in comparison with the already fast "standard" Gaussian Process regression model.
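To give a rough impression of the decomposition, here is a Python sketch in which kernel ridge regression with fixed hyperparameters stands in for the SPDE/INLA machinery (so all numbers are purely illustrative): the linear part $H\beta$ picks up the main effects, and the "spatial" term $\omega$ mops up the smooth structure left in the residuals.

```python
import numpy as np

rng = np.random.default_rng(2)
S = 400
phi = rng.uniform(-2, 2, (S, 2))          # two "important" parameters
nb = (phi @ np.array([1.0, -0.5])         # linear signal
      + np.sin(phi[:, 0] * phi[:, 1])     # smooth non-linear structure
      + rng.normal(0, 0.1, S))            # noise (epsilon)

H = np.column_stack([np.ones(S), phi])    # design matrix for H*beta
beta = np.linalg.lstsq(H, nb, rcond=None)[0]
resid = nb - H @ beta

# omega mimicked by kernel ridge on the residuals: a GP-style smoother
# over the 2-d "parameter space" (the real method builds an SPDE mesh here)
d2 = ((phi[:, None, :] - phi[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / (2 * 0.5 ** 2))
alpha = np.linalg.solve(K + 0.1 * np.eye(S), resid)
omega_hat = K @ alpha

fitted = H @ beta + omega_hat
corr_val = float(np.corrcoef(fitted, nb)[0, 1])
```

The point is structural, not numerical: the linear term alone cannot track the $\sin$ component, but the structured residual term recovers it.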

Our first tests gave very good results (as reported here). Then we used a more complex model and found that, while still faster, our method was losing accuracy for some specific parameters. This was a bummer, but it also meant that we had to go back and try to understand a bit better what was going on.

I'll make this sound very easy (when in fact Anna has spent a lot of time on this $-$ and wishing she never met me and started her PhD on this, I'm sure!), but eventually we figured out what the problem was. Firstly, in very complex situations (which are not that uncommon in real health economic evaluation problems), there may be quite a large correlation and non-linearity in the relationships among the parameters in $\phi$. This means that the combination of the standard linear predictor and the spatially structured effect cannot properly model the observed data, resulting in lower accuracy in the estimates. But, interestingly, extending $H\beta$ to include interactions among the relevant parameters can fix this problem, with only a small increase in computational cost.
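To see why the interactions matter, here is a minimal, entirely hypothetical illustration: when the true net benefit contains a product term, a purely additive linear predictor leaves most of the structure in the residuals, while adding the interaction column to $H$ recovers it almost for free.

```python
import numpy as np

rng = np.random.default_rng(3)
S = 2000
p1, p2 = rng.normal(size=(2, S))
# truth contains an interaction between the two parameters of interest
nb = p1 + p2 + 1.5 * p1 * p2 + rng.normal(0, 0.2, S)

def rss(H):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta = np.linalg.lstsq(H, nb, rcond=None)[0]
    return float(((nb - H @ beta) ** 2).sum())

H_lin = np.column_stack([np.ones(S), p1, p2])     # additive predictor only
H_int = np.column_stack([H_lin, p1 * p2])         # plus the interaction column

rss_lin = rss(H_lin)
rss_int = rss(H_int)
```

Adding one column to the design matrix is cheap, which is why the computational cost barely moves.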

Secondly (this is a bit more technical, but also quite interesting), the spatially structured component is based on constructing a mesh which describes the relationship between the parameters in a Euclidean space. If the "boundaries" of the resulting mesh are too close to the range of the observed points, then the estimation procedure will return several predictions at 0 $-$ loosely speaking, the boundaries are treated as areas where the estimated smooth curve shrinks to 0. But this means that in the computation of the EVPPI there is an artificially large number of 0 values, which produces an under-estimation of the "true" value.
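This boundary effect is the same one you see in any zero-mean smoother: predictions far outside the data shrink towards 0, whatever the level of the data. A tiny sketch (kernel ridge again standing in for the SPDE field, with invented numbers):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 200)              # observed parameter values
y = 3 + x + rng.normal(0, 0.1, 200)      # net benefit level well away from 0

def k(a, b):
    """Squared-exponential kernel (short lengthscale)."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / 0.02)

alpha = np.linalg.solve(k(x, x) + 0.1 * np.eye(200), y)
pred_in = float((k(np.array([0.0]), x) @ alpha)[0])    # inside the data range
pred_out = float((k(np.array([3.0]), x) @ alpha)[0])   # beyond the "boundary"
```

Inside the range the prediction tracks the data (around 3 here); beyond it, the prediction collapses to 0 $-$ and a pile of spurious zeros is exactly what biases the EVPPI downwards if the mesh boundary sits too close to the points.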

We have fixed this by modifying the evppi function in the development version of BCEA: the user can now specify a non-linear part, as well as fiddle with the INLA-specific parameters defining the mesh for the spatially structured component. The results seem to be much more accurate, still with some substantial computational savings. I'll try to post an R script to help people test the function (I've also updated the help for evppi) to guide users through the example involving the (simpler) Vaccine model.

Monday, 22 June 2015

Job advert

This is an interesting post just advertised at Imperial College London by Marta.

Department of Epidemiology and Biostatistics
School of Public Health
Research Associate in Biostatistics
Salary: £33,410 to £42,380 per annum

Duration: 3 years fixed term

This is an exciting opportunity for a researcher with a PhD in statistics, biostatistics or a related quantitative subject to join the research team at the national MRC-PHE Centre for Environment and Health. The post is based within the Department of Epidemiology and Biostatistics and will be line managed by Dr Marta Blangiardo.

The post holder will work on the MRC methodology funded grant: “A general framework to adjust for missing confounders in observational studies” to develop a Bayesian statistical approach to integrate different sources of data and to use the propensity score for missing data imputation in the context of epidemiological observational studies. This is a collaborative project between Imperial College, LSHTM and Cambridge MRC-BSU.
You should have a thorough working knowledge of Bayesian modelling and some experience of working with spatial statistics and of analysing epidemiological/biomedical data. Familiarity with R/BUGS is essential. You should be motivated and extremely organised with experience of working in multi-disciplinary teams.

This full-time post is based at the St Mary’s Campus, Paddington and will be fixed term for three years. For informal enquiries please contact Dr Marta Blangiardo. The preferred method of application is online via our website (please select “Job Search” then enter the job title or vacancy reference number into “Keywords”). Please complete and upload an application form as directed, quoting reference number SM137-15AL.

Alternatively, if you are unable to apply online, please email to request an application form. Closing date: 22nd July 2015 (midnight BST)

Back log

Last week I went to Madrid to examine a PhD (I've mentioned this in another post). The thesis focussed on a mixture of computer science and health economics $-$ in particular, much of the work was about developing suitable algorithms for efficiently running Markov models using extensions of tools such as Influence Diagrams.

I did some work on related issues during my own PhD, back in the 1900s, so I was interested in this work and it was good to be "forced" to read about it now.

The main innovation is the development of algorithms and a dedicated piece of software, called OpenMarkov. From my perspective, this can be a very good tool, especially when compared with Excel, which is still used by many practitioners to develop their Markov models for economic evaluation.

The main advantage over Excel is that OpenMarkov allows the user to specify a graphical structure for the model and then define a set of conditional probability tables to determine the transition probabilities from one state to another. Interestingly, it is also possible to use (some) probability distributions to represent these, which is good to then perform probabilistic sensitivity analysis (PSA).
What is currently missing is the possibility of propagating evidence to estimate the values of the main parameters (ie the transition probabilities, or functions thereof). So, you have to "know" what the values or distributions are for the transition probabilities when running the model. In this sense, I see OpenMarkov (at least in its current version!) as an "advanced version" of Excel-based models.
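For readers less familiar with these models, a single PSA iteration of a toy cohort model looks something like this (a Python sketch with invented states and numbers; in OpenMarkov the transition rows would be encoded as conditional probability tables attached to the graphical structure):

```python
import numpy as np

rng = np.random.default_rng(5)

# One PSA draw of a toy 3-state Markov cohort model (Healthy, Sick, Dead).
# Dirichlet draws keep each transition row a valid probability distribution.
P = np.array([
    rng.dirichlet([90, 8, 2]),                           # from Healthy
    np.concatenate([[0.0], rng.dirichlet([85, 15])]),    # from Sick (no recovery)
    [0.0, 0.0, 1.0],                                     # Dead is absorbing
])

cohort = np.array([1.0, 0.0, 0.0])      # everyone starts Healthy
qaly_weights = np.array([1.0, 0.6, 0.0])
total_qalys = 0.0
for cycle in range(40):                 # 40 annual cycles
    total_qalys += cohort @ qaly_weights
    cohort = cohort @ P                 # one Markov transition
```

Repeating this loop for each simulated set of transition probabilities, and doing the same for a comparator intervention, is all a basic PSA amounts to.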

This of course has implications in terms of PSA, since while you can allow for multiple parameters to be modelled using a probability distribution, you're likely to miss out on the potential underlying correlation. A full Bayesian model (for example like those described in BMHE) would overcome this problem $-$ perhaps at the expense of increasing the model complexity. But, as I said in my talk, if you have complex problems and you want to model them efficiently, then you probably shouldn't be too surprised that the models are complex and suitable tools are needed...
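The effect of ignoring correlation is easy to demonstrate with a hypothetical example: if two transition probabilities are strongly positively correlated, the uncertainty on their difference (which is what drives incremental results) is much smaller than independent sampling would suggest.

```python
import numpy as np

rng = np.random.default_rng(6)
S = 20000
sd = 0.02
# two transition probabilities with correlation 0.9
cov = np.array([[sd**2, 0.9 * sd * sd],
                [0.9 * sd * sd, sd**2]])
joint = rng.multivariate_normal([0.10, 0.12], cov, S)    # correlated draws
indep = np.column_stack([rng.normal(0.10, sd, S),        # correlation ignored
                         rng.normal(0.12, sd, S)])

# incremental results depend on the *difference* between the two parameters
diff_sd_corr = float(np.std(joint[:, 1] - joint[:, 0]))
diff_sd_indep = float(np.std(indep[:, 1] - indep[:, 0]))
```

With correlation 0.9, the standard deviation of the difference is roughly a third of the independent case, so a PSA built on independent distributions would substantially overstate decision uncertainty here.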

All in all, I did like the idea of OpenMarkov quite a lot, though $-$ particularly, I thought that there may be quite some scope for integrating it with R/BCEA (or similar tools), which would be very useful in many applied cases (as Markov models are extremely popular in health economics!). I may even try and play around with it, if I find a good student willing to do so...

Monday, 8 June 2015

Survival of the fittest (health economic model)

To make up for the fact that we've missed a couple of slots over the past months in our seminar series, we thought we'd organise a more structured event.

So, our next seminar will in fact be a workshop and will be held at UCL on 7 July from 1.30pm to 4.30pm, room 102 on the first floor of 1-19 Torrington Place.

The title is "Statistical issues in modelling survival data for health economic evaluation" and we have an exciting line-up of speakers. 

Nick Latimer will first introduce the general issue of statistical modelling of survival data and extrapolation in health economics, and then discuss the interesting issue of treatment switching. Chris Jackson will present and discuss flexible methods to perform extrapolation of survival data and the combination of evidence. Finally, Patricia Guyot will discuss methods to reconstruct individual survival data from published survival curves.

The tentative timetable of the event is the following:
  • 1.30-1.50: Nick Latimer (HEDS, ScHARR, University of Sheffield): Standard survival analysis techniques - methods typically used in HTA
  • 1.50-2.35: Chris Jackson (MRC Biostatistics Unit, Cambridge): Improving long-term survival estimation through flexible models, combining evidence and accessible software
  • 2.35-2.50: Coffee break
  • 2.50-3.15: Nick Latimer (HEDS, ScHARR, University of Sheffield): Methods for adjusting survival estimates in the presence of treatment switching
  • 3.15-4.00: Patricia Guyot (Mapi, Utrecht, Netherlands): Reconstructing Kaplan-Meier data from published survival curves
  • 4.00-4.30: Discussion
If you didn't know about this from our mailing list and still would like to attend, please drop me an email, so we can arrange and avoid space issues!

Video seminar

Later this week, I'm off to Madrid to examine a PhD candidate at UNED (that's the Spanish Open University). This is interesting work on probabilistic networks for health economic evaluation, so it should be good.

As part of my trip, I'll also give a seminar in the Master in Advanced Artificial Intelligence programme $-$ I'll sort of reuse the talk I gave at the RSS workshop on Bayesian methods in health economics, although I have modified it and included some thoughts from my talk at the webinar.

The coolest thing is that they will broadcast the talk on Thursday at 5pm (Spain time): here's the link. I'll then link to the file from our research group webpage $-$ I think this kind of thing can be quite useful, although it does mean I'll have to be on my best behaviour while giving the talk...

Monday, 1 June 2015

My talk @ the London Machine Learning Meetup

This Wednesday I've been invited to give a talk at the London Machine Learning Meetup $-$ I don't have a lot of experience of these meetings but I'm told that the audience is typically industry practitioners and some academics, ranging from novices to experienced Machine Learning experts. 

I will give my introduction to INLA (although I've made a few changes to the slides I presented in Rotterdam and then in Girona a while back). Apparently, 150 people have said they are going, with 90 more on a waiting list of some sort, so I'd better rehearse the talk!

Thursday, 28 May 2015

Beta unblockers

A couple of weeks ago, we uploaded the new version of BCEA to CRAN, including the function implementing our method for the computation of the EVPPI based on INLA-SPDE $-$ I've already mentioned this here.

While this is a stable version, we are still continuously testing many of the functions, so I thought I'd keep a beta version on the BCEA website (I've uploaded both a .tar.gz and a .zip file). I will continue to modify this beta version over the next few months, including minor changes/improvements $-$ I don't think these will really affect the basic use of BCEA, but they may be relevant for advanced users, for example for customisation of the graphs.

So, BCEA users out there: do tell us what you really, really want $-$ we may even try and implement it for you...

Thursday, 21 May 2015

Bayes 2015

This week I'm in Basel for Bayes 2015. As usual, there are lots of interesting talks and a very healthy mix of perspectives $-$ if perhaps a bit less so than usual in terms of topics. I like this conference as it's always very helpful for getting interesting ideas $-$ often people work in areas that are mostly unrelated to what I am working on, but there are commonalities and ideas for potential collaborations.

In keeping with last year's theme (when we tried hard to find Bayes' tomb in Bunhill Fields), I stumbled almost by chance on Jacob Bernoulli's tomb yesterday during the city tour.

We had already decided that next year's conference will be in Leuven (but the news is that the social event has now also been decided: a beer sampling tour) and we've also "volunteered" Virgilio to host the 2017 edition in Castilla-La Mancha $-$ I guess we'll need to find an angle to prove that Don Quixote was a Bayesian...