Monte Carlo integration is central to Bayesian computation and to simulation-based methods more generally.
While we will focus on using Monte Carlo integration for Bayesian inference, the development is general and applies to any pdf/pmf p(\theta).
For our purposes, we will want to evaluate expectations of the form
H = \int h(\theta) \cdot p(\theta) \, d\theta,
for many different functions h(\cdot) (usually scalar for us).
Procedure: sample \theta_1, \ldots, \theta_m iid from p(\theta), then approximate H by the sample average
\bar{h} = \dfrac{1}{m} \sum\limits_{i=1}^m h(\theta_i).
In this course, p(\theta) will often be the posterior distribution \pi(\theta|y).
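For instance, here is a minimal sketch (not from the slides) in which p(\theta) is the Gamma(2, 1) density used as a prior later in these notes and h(\theta) = \theta^2, so that H = \mathbb{E}[\theta^2] = 2 + 2^2 = 6 can be checked exactly:
# Minimal sketch: approximate H = E[h(theta)] by Monte Carlo,
# with h(theta) = theta^2 and p(theta) a Gamma(2, 1) density (so H = 6 exactly).
set.seed(1)
m <- 100000
theta <- rgamma(m, shape = 2, rate = 1)   # theta_1, ..., theta_m iid from p(theta)
h_bar <- mean(theta^2)                    # Monte Carlo estimate of H
h_bar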
We have \mathbb{E}[h(\theta_i)] = H.
Assuming \mathbb{E}[h^2(\theta_i)] < \infty, so that the variance of each h(\theta_i) is finite, we have
CLT: \bar{h} - H is asymptotically normal, with asymptotic variance
\dfrac{1}{m} \int (h(\theta) - H)^2 p(\theta) \, d\theta,
which can be approximated by
v_m = \dfrac{1}{m^2} \sum\limits_{i=1}^m (h(\theta_i) - \bar{h})^2.
\sqrt{v_m} is often called the Monte Carlo standard error.
That is, taking a large Monte Carlo sample size m (in the thousands or tens of thousands) generally yields very precise, and cheaply computed, numerical approximations to mathematically difficult integrals.
What this means for us: we can approximate just about any aspect of the posterior distribution with a large enough Monte Carlo sample.
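As a quick self-contained sketch (again using the hypothetical Gamma(2, 1) example above rather than a computation from the slides), the estimate and its Monte Carlo standard error come directly from the same draws:
# Sketch: Monte Carlo estimate and its standard error for H = E[theta^2]
# under p(theta) = Gamma(2, 1), for which H = 6 exactly.
set.seed(1)
m <- 100000
theta <- rgamma(m, shape = 2, rate = 1)
h <- theta^2
h_bar <- mean(h)
v_m <- sum((h - h_bar)^2) / m^2           # v_m = (1/m^2) * sum_i (h(theta_i) - h_bar)^2
c(estimate = h_bar, mc_se = sqrt(v_m))    # sqrt(v_m) is the Monte Carlo standard error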
For samples \theta_1, \ldots, \theta_m drawn iid from \pi(\theta|y), as m \rightarrow \infty, we have
\bar{\theta} = \dfrac{1}{m} \sum\limits_{i=1}^m \theta_i \rightarrow \mathbb{E}[\theta | y]
\hat{\sigma}^2_{\theta} = \dfrac{1}{m-1} \sum\limits_{i=1}^m (\theta_i - \bar{\theta})^2 \rightarrow \mathbb{V}[\theta | y]
\dfrac{1}{m} \sum\limits_{i=1}^m \mathbb{1}[\theta_i \leq c] = \dfrac{\# \theta_i \leq c}{m} \rightarrow \Pr[\theta \leq c| y]
\left[\dfrac{\alpha}{2}\textrm{th} \ \textrm{percentile of } (\theta_1, \ldots, \theta_m), \ \left(1-\dfrac{\alpha}{2}\right)\textrm{th} \ \textrm{percentile of } (\theta_1, \ldots, \theta_m)\right] \rightarrow 100 \times (1-\alpha)\% quantile-based credible interval.
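As a sketch of how each of these approximations looks in R, using the Gamma(219, 112) posterior that appears in the example below and a hypothetical cutoff c = 2:
# Sketch: posterior summaries from Monte Carlo draws of theta | y ~ Gamma(219, 112)
set.seed(1)
m <- 10000
theta <- rgamma(m, 219, 112)              # theta_1, ..., theta_m iid from pi(theta | y)
mean(theta)                               # approximates E[theta | y]
var(theta)                                # approximates V[theta | y] (uses the 1/(m-1) divisor)
mean(theta <= 2)                          # approximates Pr(theta <= 2 | y), with hypothetical c = 2
quantile(theta, c(0.025, 0.975))          # approximate 95% quantile-based credible interval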
Suppose we randomly sample two "new" women, one with a degree and one without.
To what extent do we expect the one without the degree to have more kids than the other, i.e., \tilde{y}_1 > \tilde{y}_2 | y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}?
Using R,
set.seed(01222020)
a <- 2; b <- 1 # prior
n1 <- 111; sumy1 <- 217; n2 <- 44; sumy2 <- 66 # data
# posterior predictive for each group is negative binomial:
# size = a + sum(y), mu = (a + sum(y)) / (b + n)
y1_pred <- rnbinom(100000, size = (a + sumy1), mu = (a + sumy1)/(b + n1))
y2_pred <- rnbinom(10000, size = (a + sumy2), mu = (a + sumy2)/(b + n2))
mean(y1_pred > y2_pred)
## [1] 0.48218
mean(y1_pred == y2_pred)
## [1] 0.21842
That is, \Pr(\tilde{y}_1 > \tilde{y}_2 | y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) \approx 0.48 and \Pr(\tilde{y}_1 = \tilde{y}_2| y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) \approx 0.22.
Notice that strong evidence of a difference between two populations does not necessarily imply that the difference in predictions is large.
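To see this concretely, compare the posterior probability \Pr(\theta_1 > \theta_2 | y) with the predictive probability above; here is a sketch using the Gamma(219, 112) and Gamma(68, 45) posteriors from the example below (the exact value depends on the Monte Carlo draws):
# Sketch: posterior probability that the group-1 rate exceeds the group-2 rate,
# using the gamma posteriors from the example below.
set.seed(1)
m <- 100000
theta1 <- rgamma(m, 219, 112)
theta2 <- rgamma(m, 68, 45)
mean(theta1 > theta2)   # Pr(theta1 > theta2 | y): much closer to 1 than
                        # Pr(y1_tilde > y2_tilde | y), which is only about 0.48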
This general idea of using samples to "approximate" averages (expectations) is also useful when trying to approximate posterior predictive distributions.
Quite often, we are able to sample from p(y_i| \theta) and \pi(\theta | \{y_i\}) but not from p(y_{n+1}|y_{1:n}) directly.
We can do so indirectly using the following Monte Carlo procedure:
\begin{split} \textrm{sample } \theta^{(1)} & \sim \pi(\theta | \{y_i\}), \ \textrm{ then sample } y_{n+1}^{(1)} \sim f(y_{n+1}| \theta^{(1)})\\ \textrm{sample } \theta^{(2)} & \sim \pi(\theta | \{y_i\}), \ \textrm{ then sample } y_{n+1}^{(2)} \sim f(y_{n+1}| \theta^{(2)})\\ & \ \ \vdots \hspace{133pt} \vdots \\ \textrm{sample } \theta^{(m)} & \sim \pi(\theta | \{y_i\}), \ \textrm{ then sample } y_{n+1}^{(m)} \sim f(y_{n+1}| \theta^{(m)}).\\ \end{split}
The sequence \{(\theta, y_{n+1})^{(1)}, \ldots, (\theta, y_{n+1})^{(m)}\} constitutes m independent samples from the joint posterior of (\theta, Y_{n+1}).
In fact, \{ y_{n+1}^{(1)}, \ldots, y_{n+1}^{(m)}\} are independent draws from the posterior predictive distribution we care about.
Let's re-compute \Pr(\tilde{y}_1 > \tilde{y}_2 | y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) and \Pr(\tilde{y}_1 = \tilde{y}_2| y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) using this method.
Using R,
set.seed(01222020)
a <- 2; b <- 1 # prior
n1 <- 111; sumy1 <- 217; n2 <- 44; sumy2 <- 66 # data
# first sample theta from its gamma posterior, Gamma(a + sum(y), b + n),
# then sample y_pred from the Poisson model given theta
theta1_pred <- rgamma(10000, 219, 112); theta2_pred <- rgamma(10000, 68, 45)
y1_pred <- rpois(10000, theta1_pred); y2_pred <- rpois(10000, theta2_pred)
mean(y1_pred > y2_pred)
## [1] 0.4765
mean(y1_pred == y2_pred)
## [1] 0.2167
Again, \Pr(\tilde{y}_1 > \tilde{y}_2 | y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) \approx 0.48 and \Pr(\tilde{y}_1 = \tilde{y}_2| y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) \approx 0.22.