Friday, 30 October 2015

Bayes 2016

We're finalising the details for our next Bayes Pharma conference $-$ this time we're going to Leuven, in Belgium.

We've now opened the call for abstracts, which can be sent by email (to this address), providing Title + Authors + Max 300 words, before March 1st 2016. We'll then work quickly to notify acceptance by around March 14th 2016.

The programme sounds interesting (I want to say "as usual" $-$ I know there's a bit of a conflict there, given I'm on the Scientific Committee, but I do think so!), with the following invited speakers:

  • Greg Campbell, FDA
  • Mike Daniels, U Texas
  • Kyle Wathen, Johnson and Johnson
  • Tarek Haddad, Medtronic
  • Martin Posch, U Vienna
  • Alberto Sorrentino, Genova
  • Robert Noble, GSK

Looking forward to this already!

Thursday, 29 October 2015

Our new R package

As part of the work she's doing for her PhD, Christina has done a (fairly major, I'd say!) review of the literature on prevalence studies in PCOS $-$ a rather serious and, it's probably fair to say, quite under-researched area.

When it came to analysing the data she had collected, naturally I directed her towards doing some Bayesian modelling. In many cases, these models are not too complicated $-$ often the outcome is binary, and so "fixed" or "random" effect models are fairly simple to structure and run. One interesting point was that, because the evidence was often not very good or comprehensive, setting up the model using some reasonable (and, crucially, fairly easy to elicit from clinicians) prior information did help in obtaining more stable estimates.
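
Just to give a flavour of what "fairly easy to elicit" can mean in practice, here's a minimal sketch (not code from our package; the interval and the 95% coverage below are purely illustrative assumptions) of how a clinician's statement like "the prevalence is very likely between 5% and 20%" could be translated into a Beta prior:

  # Find a Beta(a, b) prior whose central 95% interval roughly matches
  # an elicited range for the prevalence (here 5%-20%, purely illustrative)
  elicit_beta <- function(lower, upper, coverage = 0.95) {
    tail <- (1 - coverage) / 2
    loss <- function(log_par) {
      a <- exp(log_par[1]); b <- exp(log_par[2])
      (pbeta(lower, a, b) - tail)^2 + (pbeta(upper, a, b) - (1 - tail))^2
    }
    par <- exp(optim(c(0, 0), loss)$par)
    list(a = par[1], b = par[2])
  }

  prior <- elicit_beta(0.05, 0.20)
  qbeta(c(0.025, 0.5, 0.975), prior$a, prior$b)  # check the implied prior interval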

So, because we (well, she) have spent quite a lot of time working on this, I thought it would be good to structure it all into an R package. All of our models are actually run using JAGS, interfaced through the package R2jags, and, I think, the nice idea is that the user can specify in R the kind of model they want to use. Our package, which incidentally is called bmeta, then builds a suitable model file for the assumptions selected in terms of outcome data and priors, and runs it via R2jags. The model file that is generated is automatically saved on the user's computer and can then be re-used as a template or modified as necessary (eg to include different priors or more complex structures).
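
As a rough idea of the kind of workflow this describes $-$ just a hand-made sketch, not the actual code that bmeta generates $-$ this is what a simple fixed-effect model for binary outcome data might look like once written to a model file and run through R2jags (the data and the prior below are made up for illustration):

  library(R2jags)

  # Write a simple JAGS model file: a common prevalence p across S studies,
  # with a mildly informative prior on the logit scale
  model_string <- "
  model {
    for (s in 1:S) {
      r[s] ~ dbin(p, n[s])
    }
    logit(p) <- mu
    mu ~ dnorm(-2, 0.5)
  }"
  writeLines(model_string, "fixed_effect_model.txt")  # saved, so it can be re-used or edited

  dat <- list(r = c(12, 30, 7), n = c(150, 420, 80), S = 3)  # events & sample sizes (made up)
  fit <- jags(data = dat, parameters.to.save = "p",
              model.file = "fixed_effect_model.txt",
              n.chains = 2, n.iter = 5000)
  print(fit)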

Currently, Christina has implemented 22 models (ie combinations of data model and prior, including variations of fixed vs random effects), and the package also provides several graphical outputs and diagnostics, including:

  • forest plots to visualise the level of pooling of the data
  • funnel plots to examine publication bias
  • diagnostic plots to examine convergence of the underlying MCMC algorithm (a generic sketch of this kind of check is given below)

The package will be on CRAN in the next couple of days, but it's already downloadable from this webpage. We'll also put up a more structured manual/guide shortly.
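
As for the convergence checks mentioned above, a generic sketch of what these amount to (using the hypothetical fit object from the sketch earlier, via standard R2jags tools rather than bmeta's own plotting functions) is something like:

  traceplot(fit)                               # visual check that the chains mix well
  fit$BUGSoutput$summary[, c("Rhat", "n.eff")] # Gelman-Rubin statistic (~1) and effective sample size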

Tuesday, 20 October 2015

Solicited

Quite a while ago, I received an email from Samantha R. of Udemy pointing me towards this article, discussing the "difference between data science and statistics" (I have to confess that I don't really know Udemy, apart from having looked at that article and, despite a quick search, I wasn't able to find any link or additional information about her). She asked me to comment on the article, which I do now with over a month's delay $-$ apologies Samantha, if you're reading this!

So: I have to say that, while I don't think it's fair or wise to just discard the whole of "data science" as a re-branding of statistics, I don't agree 100% with some of the points raised in the article. For example, I am not sure I buy the distinction between statistics as a discipline of the old world and data science (DS) as one for the modern world. Certainly, if a fundamental connotation of DS is computing, then obviously it will be relevant to the modern world, where cheap(er) and (very) powerful computers are available. But I certainly don't see why the same shouldn't apply to statistics too.

I am not sure about the distinction between "dealing with" and "analysing" data, either. In my day-to-day job as a (proud!) statistician, I do have to do lots of dealing with data $-$ one of the most obvious examples is our work on administrative databases (eg THIN for our work on the Regression Discontinuity Design in epidemiology); eventually, the dataset becomes very rich, with lots of potential for interesting analysis. But how we get there is an equally long (if not longer!) process in which we do lots of dealing with the "dirt".

The third point I'm really not convinced by is when Samantha says that "Statistics, on the other hand, has not changed significantly in response to new technology. The field continues to emphasize theory, and introductory statistics courses focus more on hypothesis testing than statistical computing." It seems to me that this is far from true: we actually place a lot more emphasis on computing in our introductory courses than we did 10-15 years ago. And computation is playing a more and more central role in the development of statistics itself $-$ with amazing results; a couple I'm more familiar with are Stan and INLA. I would definitely see these developments as statistics $-$ definitely not DS.

In general, I think that the main premise of DS (as I understand it) $-$ that data should simply be queried to tell you things $-$ takes away basically all the fun of my job, which is about modelling: making assumptions which you then need to justify carefully, so that other people are persuaded that they are reasonable.

Still, I think there's plenty of data for statisticians and data scientists to co-inhabit this world, and I most certainly don't take a "Daily Mail-esque" view that data scientists are coming to take our jobs and steal our women. I think I am allowed to say this, as somebody who has actually come to a different country to steal their statistical jobs $-$ at least I had the decency to bring my own woman with me (well, actually Marta did bring her man with her, as when we moved to London she was the one with a job. But that's another story...).

Friday, 16 October 2015

Not so NICE...

Earlier today I caught this bit on the news $-$ the story of the latest NICE deliberation on ataluren, a treatment for Duchenne muscular dystrophy. That's a rare and horrible condition $-$ no doubt about it. The story is mainly that NICE has provisionally decided not to recommend ataluren for reimbursement (the full Evaluation consultation document is here).

I thought the report in the news was not great and, crucially (it seems to me), it missed the point. The single case of any individual affected is heart-breaking, but the media are not doing the public a great service by (more or less) picturing NICE, or "the drug watchdog" as they call it, as having rejected the new drug simply because it's too expensive.

The ITV report quotes the father of one of the children affected as saying:

How do we tell Archie he is not allowed a drug that will keep him walking and living for longer because NHS England and drug companies cannot agree on a price?

That's the real problem, I think $-$ the presumption (which most of the media do nothing to explore, or even state explicitly) is that the drug will keep the affected children walking and living longer. Trouble is, on the basis of the evidence currently available, this is not a certainty at all. The Evaluation consultation document says (page 34):

There were no statistically significant differences in quality of life between the ataluren and placebo groups. The company stated there was a positive trend towards improved quality of life with ataluren 40 mg/kg daily in the physical functioning subscale. The company submission also described a positive effect on school functioning and a negative trend in emotional and social subscales.

So the point, I think, is that if the treatment were associated with much more definitive evidence, then the discussion would be totally different. What has not been mentioned, at least not that I have seen, is that the estimated total cost per person per year of treatment with ataluren of £220,256 is affected by the uncertainty in the evidence and assumptions encoded in the model presented for assessment. And it is this uncertainty that needs to be assessed and carefully considered...

Tuesday, 6 October 2015

PhDeidippides

Anthony (who's doing good work in his PhD project) also doubles as a runner and has written a nice post for the Significance website.

Clearly (just look at the numbers!), for many people this is a serious issue $-$ the fact that you can't officially run a big marathon such as London's unless you've been lucky enough to win your place through a ballot.

I have to say my only experience with long-distance running was a few years back at Florence's marathon, for which I was not officially registered $-$ for that matter, I wasn't even planning on finishing it, just doing a bit of the whole thing and then going back home, so I guess it didn't really matter that I didn't get a medal or anything...

I'm not sure that guaranteeing a place to somebody who's been turned down a sufficient number of times in a row would solve the problem, though $-$ people must get fed up with the wait?