STA 360/602L: Module 3.1

Monte Carlo approximation and sampling

Dr. Olanrewaju Michael Akande

1 / 11

Monte Carlo approximation

  • Monte Carlo integration is central to Bayesian computation and to simulation-based methods more generally.

  • While we will focus on using Monte Carlo integration for Bayesian inference, the development is general and applies to any pdf/pmf p(\theta).

  • For our purposes, we will want to evaluate expectations of the form

    H = \int h(\theta) \, p(\theta) \, d\theta,

    for many different functions h(\cdot) (usually scalar for us).
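
    For example, taking h(\theta) = \theta gives H = \mathbb{E}[\theta], while h(\theta) = \mathbb{1}[\theta \leq c] gives H = \Pr(\theta \leq c).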

2 / 11

Monte Carlo approximation

  • Procedure:

    1. Generate a random sample \theta_1, \ldots, \theta_m \overset{iid}{\sim} p(\theta).
    2. Estimate H using

      \bar{h} = \frac{1}{m} \sum_{i=1}^m h(\theta_i).

  • In this course, p(\theta) will often be the posterior distribution \pi(\theta|y).
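
    For illustration (a minimal sketch with made-up posterior parameters, not from the original example), the two steps in R:

    set.seed(360)
    m <- 100000                  #Monte Carlo sample size
    theta <- rbeta(m, 5, 15)     #step 1: theta_1,...,theta_m iid from an assumed Beta(5,15) posterior
    h_bar <- mean(theta)         #step 2: h(theta) = theta, so h_bar approximates E[theta|y]
    h_bar                        #close to the exact mean 5/(5+15) = 0.25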

3 / 11

Monte Carlo approximation

  • We have \mathbb{E}[h(\theta_i)] = H.

  • Assuming \mathbb{E}[h^2(\theta_i)] < \infty, so that the variance of each h(\theta_i) is finite, we have

    1. LLN: \bar{h} \overset{a.s.}{\rightarrow} H.
    2. CLT: \bar{h} - H is asymptotically normal, with asymptotic variance

      \frac{1}{m} \int \left(h(\theta) - H\right)^2 p(\theta) \, d\theta,

      which can be approximated by

      v_m = \frac{1}{m^2} \sum_{i=1}^m \left(h(\theta_i) - \bar{h}\right)^2.

  • \sqrt{v_m} is often called the Monte Carlo standard error.
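
    A quick R sketch (continuing the illustrative Beta example from earlier, not from the slides) of computing \sqrt{v_m}:

    theta <- rbeta(100000, 5, 15)            #iid draws from an assumed posterior
    h <- theta                               #h(theta) = theta
    h_bar <- mean(h)                         #Monte Carlo estimate of H
    v_m <- sum((h - h_bar)^2)/length(h)^2    #approximates the variance of h_bar
    sqrt(v_m)                                #Monte Carlo standard error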

4 / 11

Monte Carlo approximation

  • That is, taking large Monte Carlo sample sizes m (in the thousands or tens of thousands) generally yields very precise, cheaply computed numerical approximations to mathematically difficult integrals.

  • What this means for us: we can approximate just about any aspect of the posterior distribution with a large enough Monte Carlo sample.

5 / 11

Monte Carlo approximation

  • For samples \theta_1, \ldots, \theta_m drawn iid from \pi(\theta|y), as m \rightarrow \infty, we have

    • \bar{\theta} = \dfrac{1}{m} \sum\limits_{i=1}^m \theta_i \rightarrow \mathbb{E}[\theta | y]

    • \hat{\sigma}^2_{\theta} = \dfrac{1}{m-1} \sum\limits_{i=1}^m (\theta_i - \bar{\theta})^2 \rightarrow \mathbb{V}[\theta | y]

    • \dfrac{1}{m} \sum\limits_{i=1}^m \mathbb{1}[\theta_i \leq c] = \dfrac{\# \theta_i \leq c}{m} \rightarrow \Pr[\theta \leq c| y]

    • \left[\dfrac{\alpha}{2}\textrm{th percentile of } (\theta_1, \ldots, \theta_m), \ \left(1-\dfrac{\alpha}{2}\right)\textrm{th percentile of } (\theta_1, \ldots, \theta_m)\right] \rightarrow 100 \times (1-\alpha)\% quantile-based credible interval.
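
    In R, for example (using posterior draws from the Gamma(219, 112) posterior that appears in the birth-rate example later; the threshold c = 2 and level \alpha = 0.05 are illustrative):

    theta <- rgamma(10000, 219, 112)     #iid draws from the posterior
    mean(theta)                          #approximates E[theta|y]
    var(theta)                           #approximates V[theta|y]
    mean(theta <= 2)                     #approximates Pr(theta <= 2 | y)
    quantile(theta, c(0.025, 0.975))     #approximate 95% quantile-based credible interval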

6 / 11

Back to birth rates

  • Suppose we randomly sample two "new" women, one with a degree and one without.

  • To what extent do we expect the one without the degree to have more kids than the other, i.e., \tilde{y}_1 > \tilde{y}_2 | y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}?

  • Using R,

    set.seed(01222020)
    a <- 2; b <- 1 #prior
    n1 <- 111; sumy1 <- 217; n2 <- 44; sumy2 <- 66 #data
    #posterior predictive for each group is negative binomial:
    #size = a + sum(y), mu = (a + sum(y))/(b + n)
    y1_pred <- rnbinom(100000,size=(a+sumy1),mu=(a+sumy1)/(b+n1))
    y2_pred <- rnbinom(100000,size=(a+sumy2),mu=(a+sumy2)/(b+n2))
    mean(y1_pred > y2_pred)
    ## [1] 0.48218
    mean(y1_pred == y2_pred)
    ## [1] 0.21842
7 / 11

Back to birth rates

  • That is, \Pr(\tilde{y}_1 > \tilde{y}_2 | y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) \approx 0.48 and \Pr(\tilde{y}_1 = \tilde{y}_2| y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) \approx 0.22.

  • Notice that strong evidence of a difference between two populations does not necessarily imply that the difference in predictions is large.

8 / 11

Monte Carlo approximation

  • This general idea of using samples to "approximate" averages (expectations) is also useful when trying to approximate posterior predictive distributions.

  • Quite often, we are able to sample from p(y_i| \theta) and \pi(\theta | \{y_i\}) but not from p(y_{n+1}|y_{1:n}) directly.

  • We can do so indirectly using the following Monte Carlo procedure:

    \begin{split}
    \textrm{sample } \theta^{(1)} &\sim \pi(\theta | \{y_i\}), \ \textrm{ then sample } y_{n+1}^{(1)} \sim f(y_{n+1}| \theta^{(1)})\\
    \textrm{sample } \theta^{(2)} &\sim \pi(\theta | \{y_i\}), \ \textrm{ then sample } y_{n+1}^{(2)} \sim f(y_{n+1}| \theta^{(2)})\\
    &\ \ \vdots \\
    \textrm{sample } \theta^{(m)} &\sim \pi(\theta | \{y_i\}), \ \textrm{ then sample } y_{n+1}^{(m)} \sim f(y_{n+1}| \theta^{(m)}).
    \end{split}

  • The sequence \{(\theta, y_{n+1})^{(1)}, \ldots, (\theta, y_{n+1})^{(m)}\} constitutes m independent samples from the joint posterior of (\theta, Y_{n+1}).

  • In fact, \{ y_{n+1}^{(1)}, \ldots, y_{n+1}^{(m)}\} are independent draws from the posterior predictive distribution we care about.
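
    A minimal generic sketch of this two-step procedure in R (assuming, purely for illustration, a normal posterior for \theta and a normal likelihood with hypothetical parameters; the next slide applies the same idea to the gamma-Poisson model):

    mu_n <- 1.5; tau_n <- 0.2; sigma <- 1     #hypothetical posterior and likelihood parameters
    m <- 10000
    theta_samp <- rnorm(m, mu_n, tau_n)       #step 1: theta^(i) ~ pi(theta | {y_i})
    y_pred <- rnorm(m, theta_samp, sigma)     #step 2: y_{n+1}^(i) ~ f(y_{n+1} | theta^(i))
    #y_pred now holds m independent draws from the posterior predictive distribution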

9 / 11

Back to birth rates

  • Let's re-compute \Pr(\tilde{y}_1 > \tilde{y}_2 | y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) and \Pr(\tilde{y}_1 = \tilde{y}_2| y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) using this method.

  • Using R,

    set.seed(01222020)
    a <- 2; b <- 1 #prior
    n1 <- 111; sumy1 <- 217; n2 <- 44; sumy2 <- 66 #data
    #step 1: sample theta from the gamma posteriors, Gamma(a+sumy, b+n)
    theta1_pred <- rgamma(10000,219,112); theta2_pred <- rgamma(10000,68,45)
    #step 2: sample new observations from the Poisson likelihood, given each theta
    y1_pred <- rpois(10000,theta1_pred); y2_pred <- rpois(10000,theta2_pred)
    mean(y1_pred > y2_pred)
    ## [1] 0.4765
    mean(y1_pred == y2_pred)
    ## [1] 0.2167
  • Again, \Pr(\tilde{y}_1 > \tilde{y}_2 | y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) \approx 0.48 and \Pr(\tilde{y}_1 = \tilde{y}_2| y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) \approx 0.22.

10 / 11

What's next?

Move on to the readings for the next module!

11 / 11
