Consider
$$\begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} \sim \mathcal{N}\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right]$$
where ρ is known (and is the correlation between θ1 and θ2).
We will review details of the multivariate normal distribution very soon, but for now, let's use this example to explore Gibbs sampling.
For this density, it turns out that we have
$$\theta_1 \mid \theta_2 \sim \mathcal{N}\left(\rho\theta_2,\ 1-\rho^2\right)$$
and
$$\theta_2 \mid \theta_1 \sim \mathcal{N}\left(\rho\theta_1,\ 1-\rho^2\right)$$
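These are just the usual bivariate normal conditionals; as a quick sketch, with both means 0, both variances 1, and covariance ρ, the standard conditional mean and variance formulas give
$$E[\theta_1 \mid \theta_2] = \mu_1 + \frac{\sigma_{12}}{\sigma_{22}}(\theta_2 - \mu_2) = \rho\,\theta_2, \qquad \mathrm{Var}[\theta_1 \mid \theta_2] = \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}} = 1 - \rho^2,$$
and symmetrically for $\theta_2 \mid \theta_1$.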
While we can easily sample directly from this distribution (using the mvtnorm or MASS packages in R), let's instead use the Gibbs sampler to draw samples from it.
First, a few examples of the bivariate normal distribution.
$$\begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} \sim \mathcal{N}\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right]$$
$$\begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} \sim \mathcal{N}\left[ \begin{pmatrix} 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix} \right]$$
$$\begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} \sim \mathcal{N}\left[ \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 & 0.9 \\ 0.9 & 1.5 \end{pmatrix} \right]$$
Again, we have
$$\theta_1 \mid \theta_2 \sim \mathcal{N}\left(\rho\theta_2,\ 1-\rho^2\right); \qquad \theta_2 \mid \theta_1 \sim \mathcal{N}\left(\rho\theta_1,\ 1-\rho^2\right)$$
Here's code to do Gibbs sampling using those full conditionals:
rho <- # set correlation
S <- # set number of MCMC samples
thetamat <- matrix(0,nrow=S,ncol=2)
theta <- c(10,10) # initialize values of theta
for (s in 1:S) {
  theta[1] <- rnorm(1,rho*theta[2],sqrt(1-rho^2)) # sample theta1
  theta[2] <- rnorm(1,rho*theta[1],sqrt(1-rho^2)) # sample theta2
  thetamat[s,] <- theta
}
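As a quick sanity check (a sketch; the values of rho and S below are illustrative choices, not from the slides), the Gibbs draws should roughly recover the moments of the target bivariate normal:
# Illustrative check of the Gibbs output, after running the sampler above
# with, say, rho <- 0.8 and S <- 10000 (assumed values):
colMeans(thetamat)       # both close to 0
apply(thetamat, 2, var)  # both close to 1
cor(thetamat)[1, 2]      # close to rho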
Here's code to sample directly instead:
library(mvtnorm)
rho <- # set correlation; no need to set again once you've used previous code
S <- # set number of MCMC samples; no need to set again once you've used previous code
Mu <- c(0,0)
Sigma <- matrix(c(1,rho,rho,1),ncol=2)
thetamat_direct <- rmvnorm(S, mean = Mu, sigma = Sigma)
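The earlier slide also mentions MASS; a minimal sketch of the same direct draw using MASS::mvrnorm (same Mu and Sigma as above):
library(MASS)
thetamat_direct_mass <- mvrnorm(S, mu = Mu, Sigma = Sigma) # S independent draws from N(Mu, Sigma)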
See how the chain actually evolves with an overlay on the true density:
rho <- # set correlation
Sigma <- matrix(c(1,rho,rho,1),ncol=2); Mu <- c(0,0)
x.points <- seq(-3,3,length.out=100)
y.points <- x.points
z <- matrix(0,nrow=100,ncol=100)
for (i in 1:100) {
  for (j in 1:100) {
    z[i,j] <- dmvnorm(c(x.points[i],y.points[j]),mean=Mu,sigma=Sigma)
  }
}
contour(x.points,y.points,z,xlim=c(-3,10),ylim=c(-3,10),col="orange2",
        xlab=expression(theta[1]),ylab=expression(theta[2]))
S <- # set number of MCMC samples
thetamat <- matrix(0,nrow=S,ncol=2)
theta <- c(10,10)
points(x=theta[1],y=theta[2],col="black",pch=2)
for (s in 1:S) {
  theta[1] <- rnorm(1,rho*theta[2],sqrt(1-rho^2))
  theta[2] <- rnorm(1,rho*theta[1],sqrt(1-rho^2))
  thetamat[s,] <- theta
  if (s < 20) {
    points(x=theta[1],y=theta[2],col="red4",pch=16); Sys.sleep(1)
  } else {
    points(x=theta[1],y=theta[2],col="green4",pch=16); Sys.sleep(0.1)
  }
}
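One way to contrast the Gibbs draws with the direct draws is their autocorrelation; a minimal sketch, assuming thetamat and thetamat_direct from the code above are in the workspace:
# The Gibbs draws are (mildly) autocorrelated; the direct draws are independent
acf(thetamat[, 1], main = "Gibbs draws of theta1")
acf(thetamat_direct[, 1], main = "Direct draws of theta1")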
Gibbs sampling is one of several flavors of Markov chain Monte Carlo (MCMC).
Markov chain: a stochastic process in which future states are independent of past states conditional on the present state.
Monte Carlo: simulation.
MCMC provides an approach for generating samples from posterior distributions.
From these samples, we can obtain summaries (including summaries of functions) of the posterior distribution for θ, our parameter of interest.
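For example (a sketch, reusing the thetamat draws from the Gibbs code above; the particular summaries are generic illustrations):
# Monte Carlo summaries are just sample statistics of the draws
colMeans(thetamat)                                    # posterior-style means
apply(thetamat, 2, quantile, probs = c(0.025, 0.975)) # 95% intervals
mean(thetamat[, 1] > thetamat[, 2])                   # a summary of a function of theta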
Let $\theta^{(s)} = (\theta^{(s)}_1, \ldots, \theta^{(s)}_p)$ denote the value of the $p \times 1$ vector of parameters at iteration $s$.
Let $\theta^{(0)}$ be an initial value used to start the chain (results should not be sensitive to this choice).
MCMC generates $\theta^{(s)}$ from a distribution that depends on the data and potentially on $\theta^{(s-1)}$, but not on $\theta^{(1)}, \ldots, \theta^{(s-2)}$.
This results in a Markov chain with stationary distribution $\pi(\theta \mid Y)$ under some conditions on the sampling distribution.
The theory of Markov chains (structure, convergence, reversibility, detailed balance, stationarity, etc.) is well beyond the scope of this course, so we will not dive into it.
If you are interested, consider taking courses on stochastic processes.
Note: Our Markov chain is a collection of draws of θ that are (slightly, we hope!) dependent on the previous draw.
The chain will wander around our parameter space, only remembering where it had been in the last draw.
We want our MCMC sample size, S, to be big enough so that we can:
Move out of areas of low probability into regions of high probability (convergence)
Move between high probability regions (good mixing)
Know our Markov chain is stationary in time (the distribution of samples is the same for all samples, regardless of location in the chain)
At the start of the sampling, the samples are not from the posterior distribution. It is necessary to discard the initial samples as a burn-in to allow convergence. We'll talk more about that in the next class.
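As a preview, a minimal sketch of discarding a burn-in from the Gibbs draws (the burn-in length here is an arbitrary illustration and assumes S is comfortably larger than it):
burn <- 1000                             # illustrative burn-in length (assumes S > burn)
thetamat_keep <- thetamat[-(1:burn), ]   # keep only post-burn-in draws
plot(thetamat[, 1], type = "l",
     xlab = "iteration", ylab = expression(theta[1]))  # traceplot of theta1
abline(v = burn, lty = 2)                # draws before this line are discarded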
The most commonly used MCMC algorithms are the Gibbs sampler, the Metropolis algorithm, and the Metropolis-Hastings algorithm.
Overview of Gibbs: Casella & George (1992, The American Statistician, 46, 167-174).
Overview of MH: Chib & Greenberg (1995, The American Statistician).
We will get to Metropolis and Metropolis-Hastings later in the course.