
STA 360/602L: Module 3.8

MCMC and Gibbs sampling II

Dr. Olanrewaju Michael Akande

1 / 15

Example: bivariate normal

  • Consider

    $$\begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} \sim \mathcal{N}\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right],$$

    where $\rho$ is known (and is the correlation between $\theta_1$ and $\theta_2$).

  • We will review details of the multivariate normal distribution very soon, but for now, let's use this example to explore Gibbs sampling.

  • For this density, it turns out that

    $$\theta_1 \mid \theta_2 \sim \mathcal{N}(\rho\theta_2, \; 1-\rho^2)$$

    and

    $$\theta_2 \mid \theta_1 \sim \mathcal{N}(\rho\theta_1, \; 1-\rho^2).$$

  • While we can easily sample directly from this distribution (using the mvtnorm or MASS packages in R), let's instead use the Gibbs sampler to draw samples from it.
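
  • Aside: these conditionals follow from the general conditional-normal formula. If $(X_1, X_2)$ is bivariate normal with means $\mu_1, \mu_2$, standard deviations $\sigma_1, \sigma_2$, and correlation $\rho$, then

    $$X_1 \mid X_2 = x_2 \sim \mathcal{N}\left(\mu_1 + \rho \frac{\sigma_1}{\sigma_2}(x_2 - \mu_2), \; (1-\rho^2)\sigma_1^2\right).$$

    Plugging in $\mu_1 = \mu_2 = 0$ and $\sigma_1 = \sigma_2 = 1$ gives the full conditionals above.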

2 / 15

Bivariate normal

First, a few examples of the bivariate normal distribution.

$$\begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} \sim \mathcal{N}\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right]$$

3 / 15

Bivariate normal

$$\begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} \sim \mathcal{N}\left[ \begin{pmatrix} 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix} \right]$$

5 / 15

Bivariate normal

$$\begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} \sim \mathcal{N}\left[ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 & 0.9 \\ 0.9 & 1.5 \end{pmatrix} \right]$$
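
One way to reproduce plots like the ones on these slides (a minimal sketch; the sample size and plotting choices below are illustrative, not from the slides):

    library(mvtnorm) # for rmvnorm

    # Draw and scatter-plot samples from the distribution above; swap in the
    # earlier means/covariances to visualize the other examples.
    Mu <- c(1, 1)
    Sigma <- matrix(c(1, 0.9, 0.9, 1.5), ncol = 2)
    draws <- rmvnorm(2000, mean = Mu, sigma = Sigma) # 2000 draws, an arbitrary choice
    plot(draws, pch = 16, col = rgb(0, 0, 0, 0.2),
         xlab = expression(theta[1]), ylab = expression(theta[2]))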

7 / 15

Back to the example

  • Again, we have

    $$\theta_1 \mid \theta_2 \sim \mathcal{N}(\rho\theta_2, \; 1-\rho^2); \qquad \theta_2 \mid \theta_1 \sim \mathcal{N}(\rho\theta_1, \; 1-\rho^2)$$

  • Here's code to run the Gibbs sampler using those full conditionals (the values assigned to rho and S below are illustrative placeholders; set your own):

    rho <- 0.9 # set correlation; 0.9 is an illustrative placeholder
    S <- 10000 # set number of MCMC samples; 10000 is an illustrative placeholder
    thetamat <- matrix(0, nrow = S, ncol = 2)
    theta <- c(10, 10) # initialize values of theta
    for (s in 1:S) {
      theta[1] <- rnorm(1, rho * theta[2], sqrt(1 - rho^2)) # sample theta1 from its full conditional
      theta[2] <- rnorm(1, rho * theta[1], sqrt(1 - rho^2)) # sample theta2 from its full conditional
      thetamat[s, ] <- theta
    }
  • Here's code to sample directly instead:

    library(mvtnorm)
    rho <- 0.9 # set correlation; no need to set again if you ran the previous code
    S <- 10000 # set number of samples; no need to set again if you ran the previous code
    Mu <- c(0, 0)
    Sigma <- matrix(c(1, rho, rho, 1), ncol = 2)
    thetamat_direct <- rmvnorm(S, mean = Mu, sigma = Sigma)
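
  • A quick sanity check that the two approaches agree (a sketch; with large S, these summaries should be close):

    # Compare summaries of the Gibbs draws and the direct draws;
    # both target the same N(0, Sigma) distribution.
    colMeans(thetamat); colMeans(thetamat_direct)           # both near (0, 0)
    apply(thetamat, 2, var); apply(thetamat_direct, 2, var) # both near 1
    cor(thetamat)[1, 2]; cor(thetamat_direct)[1, 2]         # both near rho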
9 / 15

More code

See how the chain actually evolves with an overlay on the true density:

library(mvtnorm) # for dmvnorm
rho <- 0.9 # set correlation; 0.9 is an illustrative placeholder
Sigma <- matrix(c(1, rho, rho, 1), ncol = 2); Mu <- c(0, 0)
# evaluate the true density on a grid for the contour plot
x.points <- seq(-3, 3, length.out = 100)
y.points <- x.points
z <- matrix(0, nrow = 100, ncol = 100)
for (i in 1:100) {
  for (j in 1:100) {
    z[i, j] <- dmvnorm(c(x.points[i], y.points[j]), mean = Mu, sigma = Sigma)
  }
}
contour(x.points, y.points, z, xlim = c(-3, 10), ylim = c(-3, 10), col = "orange2",
        xlab = expression(theta[1]), ylab = expression(theta[2]))
# run the Gibbs sampler, plotting each draw as it is made
S <- 1000 # set number of MCMC samples; an illustrative placeholder
thetamat <- matrix(0, nrow = S, ncol = 2)
theta <- c(10, 10) # deliberately poor starting point, far from the mode
points(x = theta[1], y = theta[2], col = "black", pch = 2)
for (s in 1:S) {
  theta[1] <- rnorm(1, rho * theta[2], sqrt(1 - rho^2))
  theta[2] <- rnorm(1, rho * theta[1], sqrt(1 - rho^2))
  thetamat[s, ] <- theta
  if (s < 20) {
    points(x = theta[1], y = theta[2], col = "red4", pch = 16); Sys.sleep(1) # pause to watch the first moves
  } else {
    points(x = theta[1], y = theta[2], col = "green4", pch = 16); Sys.sleep(0.1)
  }
}
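
Trace plots give another view of the same run (a sketch; the layout choices here are illustrative):

# Trace plots: the chain should move quickly from the far-away start (10, 10)
# into the high-probability region around (0, 0) and then wander there.
par(mfrow = c(2, 1))
plot(thetamat[, 1], type = "l", xlab = "iteration", ylab = expression(theta[1]))
plot(thetamat[, 2], type = "l", xlab = "iteration", ylab = expression(theta[2]))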
10 / 15

MCMC

  • Gibbs sampling is one of several flavors of Markov chain Monte Carlo (MCMC).

    • Markov chain: a stochastic process in which future states are independent of past states conditional on the present state.

    • Monte Carlo: simulation.

  • MCMC provides an approach for generating samples from posterior distributions.

  • From these samples, we can obtain summaries (including summaries of functions) of the posterior distribution for θ, our parameter of interest.
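
  • To make these two ingredients concrete, here is a toy simulation of a two-state Markov chain (the transition matrix is arbitrary, chosen only for illustration):

    # Next state depends only on the current state, via transition matrix P.
    P <- matrix(c(0.9, 0.1,
                  0.3, 0.7), nrow = 2, byrow = TRUE)
    S <- 10000
    state <- numeric(S); state[1] <- 1
    for (s in 2:S) {
      state[s] <- sample(1:2, 1, prob = P[state[s - 1], ]) # Markov property
    }
    table(state) / S # long-run frequencies approximate the stationary distribution (0.75, 0.25)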

11 / 15

How does MCMC work?

  • Let $\theta^{(s)} = (\theta_1^{(s)}, \ldots, \theta_p^{(s)})$ denote the value of the $p \times 1$ vector of parameters at iteration $s$.

  • Let $\theta^{(0)}$ be an initial value used to start the chain (the results should not be sensitive to this choice).

  • MCMC generates $\theta^{(s)}$ from a distribution that depends on the data and potentially on $\theta^{(s-1)}$, but not on $\theta^{(1)}, \ldots, \theta^{(s-2)}$.

  • This results in a Markov chain with stationary distribution $\pi(\theta \mid Y)$, under some conditions on the sampling distribution.

  • The theory of Markov chains (structure, convergence, reversibility, detailed balance, stationarity, etc.) is well beyond the scope of this course, so we will not dive into it.

  • If you are interested, consider taking courses on stochastic processes.
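
  • One practical check of sensitivity to $\theta^{(0)}$: run the same sampler from two very different starting values and compare the results (a sketch reusing the bivariate-normal Gibbs sampler; the helper run_gibbs and the starting points are illustrative, not from the slides):

    run_gibbs <- function(start, S, rho) { # hypothetical wrapper around the earlier Gibbs loop
      thetamat <- matrix(0, nrow = S, ncol = 2)
      theta <- start
      for (s in 1:S) {
        theta[1] <- rnorm(1, rho * theta[2], sqrt(1 - rho^2))
        theta[2] <- rnorm(1, rho * theta[1], sqrt(1 - rho^2))
        thetamat[s, ] <- theta
      }
      thetamat
    }
    chain1 <- run_gibbs(c(10, 10), S = 10000, rho = 0.9)
    chain2 <- run_gibbs(c(-10, 5), S = 10000, rho = 0.9)
    colMeans(chain1); colMeans(chain2) # both should be near (0, 0)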

12 / 15

Properties

  • Note: our Markov chain is a collection of draws of $\theta$, each of which is (only slightly, we hope!) dependent on the previous draw.

  • The chain will wander around our parameter space, remembering only where it was at the last draw.

  • We want to have our MCMC sample size, S, big enough so that we can

    • Move out of areas of low probability into regions of high probability (convergence)

    • Move between high probability regions (good mixing)

    • Know our Markov chain is stationary in time (the distribution of samples is the same for all samples, regardless of location in the chain)

  • At the start of the sampling, the samples are not from the posterior distribution. It is necessary to discard the initial samples as a burn-in to allow convergence. We'll talk more about that in the next class.
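
  • A sketch of how these checks look in code for the bivariate-normal chain (the burn-in length and the use of the coda package are illustrative choices):

    burnin <- 1000                  # illustrative; choose based on trace plots
    kept <- thetamat[-(1:burnin), ] # discard the burn-in draws
    acf(kept[, 1])                  # autocorrelation of the theta1 draws; low = good mixing
    library(coda)                   # one of several R packages for MCMC diagnostics
    effectiveSize(mcmc(kept))       # roughly, how many "independent" draws the chain is worth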

13 / 15

Different flavors of MCMC

  • The most commonly used MCMC algorithms are:

    • Metropolis sampling (Metropolis et al., 1953).
    • Metropolis-Hastings (MH) (Hastings, 1970).
    • Gibbs sampling (Geman & Geman, 1984; Gelfand & Smith, 1990).
  • Overview of Gibbs - Casella & George (1992, The American Statistician, 46, 167-174).

  • Overview of MH - Chib & Greenberg (1995, The American Statistician).

  • We will get to Metropolis and Metropolis-Hastings later in the course.

14 / 15

What's next?

Move on to the readings for the next module!

15 / 15
