Monday, 11 November 2013


Despite the map here, I'm not going to talk about yet another fraction of the former Soviet Empire which has taken the form of a people's republic, possibly with witty British Ambassadors.

In fact, I'm going to talk about the Stan workshop that I went to earlier today, which was held at Imperial College. My friend Lea organised it and Mike Betancourt (who's actually in my department at UCL) ran the show (brilliantly, it has to be said).

In the morning, Mike gave a brief overview of MCMC and introduced the basics of Hamiltonian Monte Carlo (I think this by Radford Neal is just a great introduction to the topic). Then in the afternoon he concentrated on Stan and rstan in particular (which, unsurprisingly, is the R interface to the actual HMC engine).
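
To give a flavour of what "the basics of Hamiltonian Monte Carlo" means in practice, here is a minimal sketch of plain HMC (not NUTS, and not how Stan implements it). Everything in it is an assumption made for illustration: the target is a one-dimensional standard normal, the mass is fixed to 1, and the function names (`leapfrog`, `hmc_sample`), the step size and the number of leapfrog steps are all made up for the example.

```python
import math
import random


def neg_log_density(q):
    # Illustrative target: standard normal, so -log p(q) = q^2 / 2 up to a constant.
    return 0.5 * q * q


def grad_neg_log_density(q):
    # Gradient of the negative log density above.
    return q


def leapfrog(q, p, step_size, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator (unit mass)."""
    p = p - 0.5 * step_size * grad_neg_log_density(q)  # initial half step for momentum
    for i in range(n_steps):
        q = q + step_size * p  # full step for position
        if i < n_steps - 1:
            p = p - step_size * grad_neg_log_density(q)  # full step for momentum
    p = p - 0.5 * step_size * grad_neg_log_density(q)  # final half step for momentum
    return q, p


def hmc_sample(n_samples, step_size=0.15, n_steps=10, seed=0):
    """Draw samples with basic HMC; returns (samples, acceptance rate)."""
    rng = random.Random(seed)
    q = 0.0
    samples = []
    n_accepted = 0
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)  # resample the momentum at every iteration
        h_old = neg_log_density(q) + 0.5 * p * p
        q_new, p_new = leapfrog(q, p, step_size, n_steps)
        h_new = neg_log_density(q_new) + 0.5 * p_new * p_new
        # Metropolis step: corrects for the numerical error of the integrator.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
            n_accepted += 1
        samples.append(q)
    return samples, n_accepted / n_samples
```

Calling `hmc_sample(2000)` returns the draws together with the fraction of accepted proposals. The step size and the number of leapfrog steps are exactly the quantities that have to be tuned by hand here; as I understand it, automating that choice is what NUTS and Stan's warmup are for.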

I think this was the first of a potential series of similar talks/workshops and I found it very useful. Of course, it's always difficult to strike a balance between how in depth you want to go with the theory and with the examples. For instance, I think a little more on the actual NUTS algorithm would have been helpful $-$ but, as I said, I know full well how hard it is to do this, so well done, Mike!


  1. The link to the introduction (by Neil Radford) doesn't work...

  2. Thanks --- the correct link is this:

  3. That's Radford NEAL, not Neil Radford :-) A better link is this one:

    It's the final version of the chapter as it appeared in the Brooks et al. MCMC Handbook. There's also a great intro to MCMC methods by Charles Geyer that discusses issues like effective sample size. (We estimate effective sample size a little differently in Stan; the basic idea is that we use the same cross-chain plus within-chain variance estimate as used for R-hat --- there are details in the Stan manual.)

    All of the details of NUTS are in Hoffman and Gelman's arXiv paper; the final version will be out in JMLR soon:

    Michael generalized it to the geometric case for RHMC (coming to Stan reasonably soon):

    The other tricky business is adaptation of the step size and estimation of the mass matrix, both of which happen during warmup iterations in Stan. Step size is relatively easy to tune via a target acceptance rate. Michael and Mark Girolami have a paper coming out on hierarchical modeling with HMC and NUTS that provides a better target for the acceptance rate than the one that's standard for multivariate normals. So the default is going to be adjusted upward (I think to 0.8) in the next release of Stan.

    Mass matrix estimation is trickier because you don't want to overfit to early iterations before you hit the high-mass volume of the posterior. We allow either a fixed unit mass matrix, a diagonal mass matrix, or a full dense mass matrix. Radford discusses the effects of mass matrix on HMC. The dense and diagonal estimates are regularized point estimates from windows of samples. Michael's been tweaking this, too, and it's likely to be a bit different in Stan 2.1.

  4. That's a) kind of embarrassing; b) hilarious! Yes: I meant Radford Neal $-$ I don't know how it came out this way!
    In any case, thanks for the extensive comment!
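
For readers wondering what the "cross-chain plus within-chain variance estimate" used for R-hat in the comments above looks like, here is a rough sketch of the classic Gelman-Rubin formula for a single scalar parameter. This is the textbook version, assuming at least two chains of equal length; it is not necessarily what Stan computes (the details are in the Stan manual), and the function name `potential_scale_reduction` is made up for the example.

```python
import math


def potential_scale_reduction(chains):
    """Classic Gelman-Rubin R-hat for equal-length chains of one scalar parameter."""
    m = len(chains)      # number of chains (assumed >= 2)
    n = len(chains[0])   # draws per chain (assumed equal across chains)
    chain_means = [sum(c) / n for c in chains]
    grand_mean = sum(chain_means) / m
    # Between-chain variance B: how far apart the chain means are.
    b = n * sum((cm - grand_mean) ** 2 for cm in chain_means) / (m - 1)
    # Within-chain variance W: average of the per-chain sample variances.
    w = sum(
        sum((x - cm) ** 2 for x in c) / (n - 1)
        for c, cm in zip(chains, chain_means)
    ) / m
    # Pooled estimate of the posterior variance, mixing W and B.
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)
```

When the chains agree with each other, B is small and the value stays close to 1; chains stuck in different regions inflate B and push it well above 1.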