Monte Carlo integration is central to Bayesian computation and to simulation-based methods more generally.
While we will focus on using Monte Carlo integration for Bayesian inference, the development is general and applies to any pdf/pmf p(\theta).
For our purposes, we will want to evaluate expectations of the form
H = \int h(\theta) \cdot p(\theta) \, d\theta,
for many different functions h(\cdot) (usually scalar for us).
Procedure: sample \theta_1, \ldots, \theta_m iid from p(\theta), then approximate H by the sample average
\bar{h} = \dfrac{1}{m} \sum\limits_{i=1}^m h(\theta_i).
In this course, p(\theta) will often be the posterior distribution \pi(\theta|y).
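For instance, here is a minimal sketch (not from the slides) in which p(\theta) is the Gamma(2, 1) density used as a prior later in these notes and h(\theta) = \theta^2, so that H = \mathbb{E}[\theta^2] = 2 + 2^2 = 6 can be checked exactly:
# Minimal sketch: approximate H = E[h(theta)] by Monte Carlo,
# with h(theta) = theta^2 and p(theta) a Gamma(2, 1) density (so H = 6 exactly).
set.seed(1)
m <- 100000
theta <- rgamma(m, shape = 2, rate = 1)   # theta_1, ..., theta_m iid from p(theta)
h_bar <- mean(theta^2)                    # Monte Carlo estimate of H
h_bar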
We have \mathbb{E}[h(\theta_i)] = H.
Assuming \mathbb{E}[h^2(\theta_i)] < \infty, so that the variance of each h(\theta_i) is finite, we have
CLT: \bar{h} - H is asymptotically normal, with asymptotic variance
\dfrac{1}{m} \int (h(\theta) - H)^2 p(\theta) \, d\theta,
which can be approximated by
v_m = \dfrac{1}{m^2} \sum\limits_{i=1}^m (h(\theta_i) - \bar{h})^2.
\sqrt{v_m} is often called the Monte Carlo standard error.
That is, taking a large Monte Carlo sample size m (in the thousands or tens of thousands) generally yields very precise, and cheaply computed, numerical approximations to mathematically difficult integrals.
What this means for us: we can approximate just about any aspect of the posterior distribution with a large enough Monte Carlo sample.
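As a quick self-contained sketch (again using the hypothetical Gamma(2, 1) example above rather than a computation from the slides), the estimate and its Monte Carlo standard error come directly from the same draws:
# Sketch: Monte Carlo estimate and its standard error for H = E[theta^2]
# under p(theta) = Gamma(2, 1), for which H = 6 exactly.
set.seed(1)
m <- 100000
theta <- rgamma(m, shape = 2, rate = 1)
h <- theta^2
h_bar <- mean(h)
v_m <- sum((h - h_bar)^2) / m^2           # v_m = (1/m^2) * sum_i (h(theta_i) - h_bar)^2
c(estimate = h_bar, mc_se = sqrt(v_m))    # sqrt(v_m) is the Monte Carlo standard error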
For samples \theta_1, \ldots, \theta_m drawn iid from \pi(\theta|y), as m \rightarrow \infty, we have
\bar{\theta} = \dfrac{1}{m} \sum\limits_{i=1}^m \theta_i \rightarrow \mathbb{E}[\theta | y]
\hat{\sigma}^2_{\theta} = \dfrac{1}{m-1} \sum\limits_{i=1}^m (\theta_i - \bar{\theta})^2 \rightarrow \mathbb{V}[\theta | y]
\dfrac{1}{m} \sum\limits_{i=1}^m \mathbb{1}[\theta_i \leq c] = \dfrac{\# \theta_i \leq c}{m} \rightarrow \Pr[\theta \leq c| y]
\left[\dfrac{\alpha}{2}\textrm{th} \ \textrm{percentile of } (\theta_1, \ldots, \theta_m), \ \left(1-\dfrac{\alpha}{2}\right)\textrm{th} \ \textrm{percentile of } (\theta_1, \ldots, \theta_m)\right] \rightarrow 100 \times (1-\alpha)\% quantile-based credible interval.
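As a sketch of how each of these approximations looks in R, using the Gamma(219, 112) posterior that appears in the example below and a hypothetical cutoff c = 2:
# Sketch: posterior summaries from Monte Carlo draws of theta | y ~ Gamma(219, 112)
set.seed(1)
m <- 10000
theta <- rgamma(m, 219, 112)              # theta_1, ..., theta_m iid from pi(theta | y)
mean(theta)                               # approximates E[theta | y]
var(theta)                                # approximates V[theta | y] (uses the 1/(m-1) divisor)
mean(theta <= 2)                          # approximates Pr(theta <= 2 | y), with hypothetical c = 2
quantile(theta, c(0.025, 0.975))          # approximate 95% quantile-based credible interval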
Suppose we randomly sample two "new" women, one with a degree and one without.
To what extent do we expect the one without the degree to have more kids than the other, i.e., \tilde{y}_1 > \tilde{y}_2 | y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}?
Using R,
set.seed(01222020)
a <- 2; b <- 1 # prior
n1 <- 111; sumy1 <- 217; n2 <- 44; sumy2 <- 66 # data
# posterior predictive for each group is negative binomial:
# size = a + sum(y), mu = (a + sum(y)) / (b + n)
y1_pred <- rnbinom(100000, size = (a + sumy1), mu = (a + sumy1)/(b + n1))
y2_pred <- rnbinom(10000, size = (a + sumy2), mu = (a + sumy2)/(b + n2))
mean(y1_pred > y2_pred)
## [1] 0.48218
mean(y1_pred == y2_pred)
## [1] 0.21842
That is, \Pr(\tilde{y}_1 > \tilde{y}_2 | y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) \approx 0.48 and \Pr(\tilde{y}_1 = \tilde{y}_2| y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) \approx 0.22.
Notice that strong evidence of a difference between two populations does not necessarily imply that the difference in predictions is large.
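To see this concretely, compare the posterior probability \Pr(\theta_1 > \theta_2 | y) with the predictive probability above; here is a sketch using the Gamma(219, 112) and Gamma(68, 45) posteriors from the example below (the exact value depends on the Monte Carlo draws):
# Sketch: posterior probability that the group-1 rate exceeds the group-2 rate,
# using the gamma posteriors from the example below.
set.seed(1)
m <- 100000
theta1 <- rgamma(m, 219, 112)
theta2 <- rgamma(m, 68, 45)
mean(theta1 > theta2)   # Pr(theta1 > theta2 | y): much closer to 1 than
                        # Pr(y1_tilde > y2_tilde | y), which is only about 0.48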
This general idea of using samples to "approximate" averages (expectations) is also useful when trying to approximate posterior predictive distributions.
Quite often, we are able to sample from p(y_i| \theta) and \pi(\theta | \{y_i\}) but not from p(y_{n+1}|y_{1:n}) directly.
We can do so indirectly using the following Monte Carlo procedure:
\begin{split} \textrm{sample } \theta^{(1)} & \sim \pi(\theta | \{y_i\}), \ \textrm{ then sample } y_{n+1}^{(1)} \sim f(y_{n+1}| \theta^{(1)})\\ \textrm{sample } \theta^{(2)} & \sim \pi(\theta | \{y_i\}), \ \textrm{ then sample } y_{n+1}^{(2)} \sim f(y_{n+1}| \theta^{(2)})\\ & \ \ \vdots \hspace{133pt} \vdots \\ \textrm{sample } \theta^{(m)} & \sim \pi(\theta | \{y_i\}), \ \textrm{ then sample } y_{n+1}^{(m)} \sim f(y_{n+1}| \theta^{(m)}).\\ \end{split}
The sequence \{(\theta, y_{n+1})^{(1)}, \ldots, (\theta, y_{n+1})^{(m)}\} constitutes m independent samples from the joint posterior of (\theta, Y_{n+1}).
In fact, \{ y_{n+1}^{(1)}, \ldots, y_{n+1}^{(m)}\} are independent draws from the posterior predictive distribution we care about.
Let's re-compute \Pr(\tilde{y}_1 > \tilde{y}_2 | y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) and \Pr(\tilde{y}_1 = \tilde{y}_2| y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) using this method.
Using R,
set.seed(01222020)
a <- 2; b <- 1 # prior
n1 <- 111; sumy1 <- 217; n2 <- 44; sumy2 <- 66 # data
# first sample theta from its gamma posterior, Gamma(a + sum(y), b + n),
# then sample y_pred from the Poisson model given theta
theta1_pred <- rgamma(10000, 219, 112); theta2_pred <- rgamma(10000, 68, 45)
y1_pred <- rpois(10000, theta1_pred); y2_pred <- rpois(10000, theta2_pred)
mean(y1_pred > y2_pred)
## [1] 0.4765
mean(y1_pred == y2_pred)
## [1] 0.2167
Again, \Pr(\tilde{y}_1 > \tilde{y}_2 | y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) \approx 0.48 and \Pr(\tilde{y}_1 = \tilde{y}_2| y_{11},\ldots,y_{1n_1},y_{21},\ldots,y_{2n_2}) \approx 0.22.