An Ensemble MCMC Sampler for Black-Box Distributions

Gregor Boehl
Uni Bonn

Sampling?

Sampling.

  • drawing samples from probability distribution $\pi(x)$
  • important when interested in $$ E_\pi \left[ h(x) \right] = \int h(x) \pi(x) dx $$
  • approximate $E_\pi[h(x)]$ via $N$ samples $x_1, \dots, x_N$ from $\pi$: $$ E_\pi \left[ h(x) \right] \approx \frac{1}{N} \sum_{i=1}^N h(x_i) $$
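The approximation above can be sketched in a few lines. This is only an illustration: the helper name `mc_expectation`, the choice $\pi = \mathcal{N}(0,1)$ and $h(x) = x^2$ are assumptions made for the example, not part of the talk.

```python
import numpy as np

def mc_expectation(h, samples):
    """Approximate E_pi[h(x)] by averaging h over draws from pi."""
    return np.mean([h(x) for x in samples], axis=0)

rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)  # draws from pi = N(0, 1) (assumed example)
estimate = mc_expectation(lambda x: x**2, samples)  # E[x^2] = 1 under N(0, 1)
```

The hard part in practice is obtaining `samples` when $\pi$ can only be evaluated pointwise, which is what the rest of the talk is about.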

Example:
Estimation of (DSGE) models

  • posterior of (DSGE) model (parameters $x$, data $Y$): $$p(x|Y) = \frac{p(Y|x) p(x)}{p(Y)}$$
  • interested in (measure of) posterior
  • can evaluate density $\pi(x) \propto p(x|Y)$, rest unknown


Problem: posterior sample required

Example: Posterior of DSGE model

Example: Effects of QE


simulations drawn from the distribution

This Paper: DIME MCMC

    Gradient-free global multi-start MCMC sampler with:

  • endogenous & adaptive proposal distribution
  • embarrassingly parallel
  • robust for multimodal distributions
  • good/fast convergence

  • "Swiss Army knife of MCMC sampling"

Literature

  • Disclaimer.
  • General: Metropolis et al. (1953), Geman & Geman (1984), Chib & Ramamurthy (2010), Herbst & Schorfheide (2015)
  • (Astro-)Physics: Ter Braak (2006), Vrugt et al. (2009), Goodman & Weare (2010), Foreman-Mackey et al. (2013)
  • Theory: Roberts et al. (1997), Haario et al. (2001), Roberts & Rosenthal (2001), Robert & Casella (2004), Roberts & Rosenthal (2007)

Outline

  1. Two benchmarking distributions
  2. Random Walk Metropolis Hastings: Problems
  3. Astrophysics: Differential Evolution MCMC
  4. The Solution: DIME MCMC

A toy problem

$$ \pi_M(x) = \lambda P(X=x) + (1-\lambda) P(Y=x) $$
  • $\lambda \in (0,1)$
  • $X \sim \mathcal{N}_n(+\mu,\sigma I_n)$
  • $Y \sim \mathcal{N}_n(-\mu,\sigma I_n)$
  • $n=35$ dimensions
  • $\mu=(m/2,0,\cdots,0)'$
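A minimal sketch of the log density of this mixture, for experimenting with the samplers discussed later. Assumptions: $\sigma I_n$ is read as the covariance matrix (so `sigma` is a variance), the defaults `m=3.0`, `sigma=1.0`, `lam=0.5` are placeholders rather than the values used in the talk, and the function name is hypothetical.

```python
import numpy as np

def log_mixture_density(x, m=3.0, sigma=1.0, lam=0.5):
    """Log density of lam * N(+mu, sigma*I) + (1 - lam) * N(-mu, sigma*I)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]          # dimension, e.g. n = 35
    mu = np.zeros(n)
    mu[0] = m / 2            # mu = (m/2, 0, ..., 0)'

    def log_normal(v, mean):
        # log density of N_n(mean, sigma * I_n), with sigma read as a variance
        sq_dist = np.sum((v - mean) ** 2, axis=-1)
        return -0.5 * (n * np.log(2 * np.pi * sigma) + sq_dist / sigma)

    # log-sum-exp of the two weighted components, for numerical stability
    return np.logaddexp(np.log(lam) + log_normal(x, mu),
                        np.log(1 - lam) + log_normal(x, -mu))
```

Working with the log density avoids underflow, which matters in $n=35$ dimensions where the density values themselves are tiny.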

A benchmark model

  • model of Smets & Wouters (2007)
  • medium scale DSGE
  • Features:
    capital, capital adjustment, capital utilization, habit formation, sticky prices, ...
  • Observables:
    Output, inflation, investment, consumption, wages, labor, FFR

A benchmark model

[Figures: rare, very rare, and extremely rare footage of the S&W posterior]

Random Walk MH

  • most fundamental MCMC method
  • generate proposal $Y_i$: $$ {Y_i} = X_i + \varepsilon_i $$
  • proposal distribution $\varepsilon_i \sim \mathcal{N}(0, \Sigma)$
  • accept proposal with $$ P (X_{i+1} = {Y}_{i}) = \min \left\{1, \frac{\pi({Y}_{i})}{\pi(X_{i})}\right\} $$
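The steps above can be sketched as follows. This is a minimal illustration working with $\log \pi$ rather than $\pi$; the function name `rwmh`, its arguments, and the standard-normal target in the usage lines are assumptions made for the example.

```python
import numpy as np

def rwmh(log_pi, x0, cov, n_steps, rng):
    """Random walk Metropolis-Hastings with N(0, cov) increments."""
    x = np.asarray(x0, dtype=float)
    lp = log_pi(x)
    chol = np.linalg.cholesky(cov)
    chain = np.empty((n_steps, x.size))
    n_accept = 0
    for i in range(n_steps):
        # proposal Y_i = X_i + eps_i with eps_i ~ N(0, cov)
        y = x + chol @ rng.standard_normal(x.size)
        lp_y = log_pi(y)
        # accept with probability min{1, pi(Y_i) / pi(X_i)}
        if np.log(rng.uniform()) < lp_y - lp:
            x, lp = y, lp_y
            n_accept += 1
        chain[i] = x
    return chain, n_accept / n_steps

# usage on an assumed 2-d standard normal target
rng = np.random.default_rng(0)
chain, acc_rate = rwmh(lambda x: -0.5 * np.sum(x**2), np.zeros(2), np.eye(2), 20_000, rng)
```

Note that each iteration depends on the previous state, and that `cov` has to be supplied by the user; both points come up again in the problems listed next.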

RWMH

RWMH: Problem No 1

inefficient proposal distribution

  • posterior is large
  • posterior may be discontinuous
  • posterior is a black box
  • posterior may be asymmetric/stretched/...

▶ choice of proposal distribution difficult

RWMH: Problem No 2

Parallelization

  • posterior may be expensive to evaluate
  • RWMH chain is state-dependent/recursive


▶ No chance to run chains in parallel

RWMH: Problem No 3

multimodal distributions

RWMH: Problem No 4

convergence

  • chain may take a long time to "find" typical set
  • related to (lack of) good proposal distribution
  • related to (lack of) parallelization

Differential-Evolution MCMC

  • consider Ensemble of $n_c$ chains
  • for each chain $j$ draw two reference chains $\{k,l\}$
  • $$ {Y}_{i,j} = X_{i,j} + \gamma (X_{i,k} - X_{i,l}) + \epsilon_{i,j} $$
  • with $k \neq j$ and $l \neq k \land l \neq j$
  • accept/reject (as before) $$ P (X_{i+1,j} = {Y}_{i,j}) = \min \left\{1, \frac{\pi({Y}_{i,j})}{\pi(X_{i,j})}\right\} $$
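One sweep of the update above might look like the sketch below. Assumptions: chains are updated one at a time against the current ensemble (the sequential variant, not the parallelized modification mentioned later), $\epsilon_{i,j}$ is isotropic Gaussian noise with scale `eps_scale`, and the function and argument names are hypothetical.

```python
import numpy as np

def de_mcmc_step(log_pi, X, lp, gamma, eps_scale, rng):
    """One sequential sweep of differential-evolution MCMC over all chains.

    X  : (n_chains, n_dim) array of current states
    lp : (n_chains,) array of log pi at those states
    """
    X, lp = X.copy(), lp.copy()
    n_chains, n_dim = X.shape
    for j in range(n_chains):
        # draw two distinct reference chains k, l, both different from j
        others = [c for c in range(n_chains) if c != j]
        k, l = rng.choice(others, size=2, replace=False)
        # proposal Y_j = X_j + gamma * (X_k - X_l) + eps_j
        y = X[j] + gamma * (X[k] - X[l]) + eps_scale * rng.standard_normal(n_dim)
        lp_y = log_pi(y)
        # accept with probability min{1, pi(Y_j) / pi(X_j)}
        if np.log(rng.uniform()) < lp_y - lp[j]:
            X[j], lp[j] = y, lp_y
    return X, lp
```

A commonly used default for the step size is `gamma = 2.38 / np.sqrt(2 * n_dim)`; this value is not taken from the slides. The proposal scale comes from the spread of the ensemble itself, so no covariance matrix has to be tuned by hand.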

DE-MCMC

DE-MCMC: (no) Problem No 1

inefficient proposal distribution

  • proposal distribution now endogenous
  • proposal distribution adapts to shape of posterior
  • proposal distribution now affine-invariant

DE-MCMC: (no) Problem No 2

parallelization

  • trivial to parallelize (after modification)
  • chains now "exchange" information

DE-MCMC: (still) Problem 3

multimodal distributions

DE-MCMC: (still) Problem 4

convergence

DIME MCMC

  • for each chain, draw local/global kernel with probability $\chi$
  • local kernel: DE MCMC (in essence)
  • global kernel: Independence sampling (sort of)
  • use chosen kernel to generate ${Y}_{i,j}$
  • accept/reject with $$ P (X_{i+1,j} = {Y}_{i,j}) = \min \left\{1, \frac{\pi({Y}_{i,j})}{\pi(X_{i,j})}\omega_{i,j}\right\} $$

DIME: global kernel

  • generate draws: $$ \begin{align} {Y}_{i,j} &\sim t_\nu \left(\mu_i, \Sigma_i\right)\\ \omega_{i,j} &= \frac{f^t(X_{i,j})}{f^t({Y}_{i,j})} \end{align} $$
  • update proposal distribution:
    $$ \scriptstyle \begin{align} \mu_i &= \left(\frac{W_{i-1}}{W_i}\right) \mu_{i-1} + \left(\frac{w_i}{W_i}\right) \mu_i^\mathbf{X}, \\ \Sigma_i &= \left(\frac{W_{i-1}}{W_i}\right) \Sigma_{i-1} + \left(\frac{w_i}{W_i}\right) \Sigma_i^\mathbf{X},\\ W_{i} &= W_{i-1} + w_i. \end{align} $$
    $$ \scriptstyle \begin{align} w_i &= a_i \sum_{j=1}^{n_c} \pi(X_{i,j}) \\ a_i &= \frac{1}{n_c}\sum_{j=1}^{n_c} \mathbf{1}_{\left\{X_{i,j} \neq {Y}_{i-1,j}\right\}} \left(X_{i,j}\right) \end{align} $$
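A rough sketch of the two pieces of the global kernel, under several assumptions that are not stated on the slide: $\mu_i^\mathbf{X}$ and $\Sigma_i^\mathbf{X}$ are taken to be the sample mean and sample covariance of the current ensemble, `pi_X` holds (unnormalized) density values $\pi(X_{i,j})$, and `a` is supplied by the caller. The function names are hypothetical. The normalizing constant of $f^t$ is dropped because it cancels in the ratio $\omega_{i,j}$.

```python
import numpy as np

def global_proposal(x, mu, Sigma, nu, rng):
    """Draw Y ~ t_nu(mu, Sigma) and return (Y, log omega), omega = f_t(x) / f_t(Y)."""
    d = mu.size
    chol = np.linalg.cholesky(Sigma)
    # multivariate t draw: Gaussian draw scaled by sqrt(chi2_nu / nu)
    z = chol @ rng.standard_normal(d)
    y = mu + z / np.sqrt(rng.chisquare(nu) / nu)

    def log_ft(v):
        # log t density up to an additive constant
        r = np.linalg.solve(chol, v - mu)
        return -0.5 * (nu + d) * np.log1p(r @ r / nu)

    return y, log_ft(x) - log_ft(y)

def update_proposal(mu_prev, Sigma_prev, W_prev, X, pi_X, a):
    """Weighted running update of the proposal mean and covariance."""
    w = a * np.sum(pi_X)                 # w_i = a_i * sum_j pi(X_ij)
    W = W_prev + w                       # W_i = W_{i-1} + w_i
    mu_X = X.mean(axis=0)                # ensemble mean (assumed meaning of mu_i^X)
    Sigma_X = np.cov(X, rowvar=False)    # ensemble covariance (assumed meaning of Sigma_i^X)
    mu = (W_prev / W) * mu_prev + (w / W) * mu_X
    Sigma = (W_prev / W) * Sigma_prev + (w / W) * Sigma_X
    return mu, Sigma, W
```

The proposal would then be accepted when `np.log(rng.uniform()) < log_pi(y) - log_pi(x) + log_omega`, which is the log form of the acceptance rule with $\omega_{i,j}$.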

DIME MCMC

DIME MCMC: (no) Problem No 1

inefficient proposal distribution

  • proposal distribution now endogenous
  • proposal distribution adapts to shape of posterior
  • proposal distribution now affine-invariant


Inherits from DE-MCMC

DIME MCMC: (no) Problem No 2

parallelization

  • trivial to parallelize (after modification)
  • chains now "exchange" information


Inherits from DE-MCMC

DIME MCMC: (no) Problem 3

multimodal distributions

DIME MCMC: (no) Problem 4

convergence

Implementations

Application: HANK

  • "HANK" (Heterogeneous Agent New Keynesian model):
    • cross sectional distribution of households
    • precautionary savings
    • expensive to simulate/evaluate likelihood
  • estimation takes about 7 days on 48 cores
  • Result: Precautionary motive plays minor role for propagation of business cycle shocks

Thank you!

  • Title?
  • "Swiss Army knife of MCMC sampling":
    • endogenous & adaptive proposal distribution
    • embarrassingly parallelizable
    • robust for multimodal distributions
    • fast convergence