Due: 11:59pm, July 14, 2020
You all should have R and RStudio installed on your computers by now. If you do not, first install the latest version of R here: https://cran.rstudio.com (remember to select the right installer for your operating system). Next, install the latest version of RStudio here: https://www.rstudio.com/products/rstudio/download/. Scroll down to the “Installers for Supported Platforms” section and find the right installer for your operating system.
You are required to use R Markdown to type up this lab report. If you do not already know how to use R Markdown, here is a very basic R Markdown template: https://sta-360-602l-su20.github.io/Course-Website/labs/resources/LabReport.Rmd. Refer to the resources tab of the course website (here: https://sta-360-602l-su20.github.io/Course-Website/resources/) for links to help you learn how to use R Markdown.
You MUST submit both your .Rmd and .pdf files to the course site on Gradescope here: https://www.gradescope.com/courses/141314/assignments. Make sure to knit to pdf and not html; ask the TA about knitting to pdf if you cannot figure it out. Be sure to submit under the right assignment entry.
You will need the following R packages. If you do not already have them installed, please do so first using the install.packages function.
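Based on the code used later in this lab, a minimal set is sketched below; magrittr supplies the pipe operator %>% (dplyr would also work), and cowplot is only needed if you use it for the final exercise:
# packages used in this lab (a sketch; install any you are missing)
install.packages(c("ggplot2", "plyr", "magrittr", "truncnorm", "cowplot"))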
So far we have only worked with the uniform density where the boundaries of the support are known. That is, so far, if \(X \sim \textrm{Unif}(a,b)\), then \(a\) and \(b\) are known. This lab will take you through the process of building a generative model using a conjugate prior for the uniform density on an interval \([0, \theta]\), with some parameter \(\theta\). The model we will encounter restricts, or truncates, the support of some of the parameters. Using these truncated distributions, we will write and implement a Gibbs sampling algorithm to infer the value of the boundary parameter \(\theta\) given noisy measurements from a uniform interval.
Truncated distributions are usually described by a density function accompanied by an indicator function, which modifies the support of the random variable. For instance, a random variable \(X\) with a uniform density function on an interval \([0, \theta]\) will have density function:
\[ p(x | \theta) = \frac{1}{\theta} \cdot \mathbb{I}[0 < x \leq \theta] \]
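As a quick illustration, here is a hypothetical helper (not part of the lab code) that evaluates this density, writing the indicator as a logical condition:
# Unif(0, theta) density with an explicit indicator (hypothetical helper)
dunif_trunc <- function(x, theta) (1/theta) * as.numeric(x > 0 & x <= theta)
dunif_trunc(c(0.5, 2), theta = 1)  # returns 1 0: the second point falls outside the support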
Let’s work through the conjugate update for the uniform likelihood to practice working with indicator functions.
As a function of \(\theta\) alone, the uniform likelihood is proportional to a Pareto density function. If \(\theta \sim \textrm{Pareto}(m,k)\), then \[ \pi(\theta | m,k) = \frac{k m^k}{\theta^{k + 1}} \cdot \mathbb{I}[\theta \geq m] \] for hyperparameters \(m\) and \(k\).
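For concreteness, this density can be written as a small R function mirroring the formula (dpareto here is a hypothetical helper, not a base R function):
# Pareto(m, k) density from the formula above (hypothetical helper)
dpareto <- function(theta, m, k) ifelse(theta >= m, k * m^k / theta^(k + 1), 0)
dpareto(c(0.5, 2), m = 1, k = 2)  # 0 outside the support, 0.25 at theta = 2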
The Pareto prior is conjugate for the uniform likelihood, so that the posterior distribution of \(\theta\) will also be Pareto. What is the form of this posterior distribution?
Since we assume an independent sampling mechanism, we can combine the indicator functions from the likelihood densities of the individual \(X_i\). The product of indicator functions then becomes a single indicator function operating on a combination of the \(X_i\). What does this indicator function look like? What is the posterior support of \(\theta\)?
\[ \begin{aligned} \pi(\theta | X_1, \dots, X_n) &\propto p(x_1, \dots, x_n | \theta) \cdot \pi(\theta) \\ & \propto \left(\prod^n_{i=1}\frac{1}{\theta} \mathbb{I}[0 < x_i \leq \theta] \right) \cdot \frac{1}{\theta^{k + 1}} \mathbb{I}[\theta \geq m] \\ & \propto \frac{1}{\theta^n} \mathbb{I}[0 < \textrm{all elements of } \{x_1,\ldots,x_n\} \leq \theta] \cdot \frac{1}{\theta^{k + 1}} \mathbb{I}[\theta \geq m] \\ & \propto \frac{1}{\theta^n} \mathbb{I}[0 < \max(x_1, \dots, x_n) \leq \theta] \cdot \frac{1}{\theta^{k + 1}} \mathbb{I}[m \leq \theta] \\ &= \frac{1}{\theta^{k + n + 1}} \cdot \mathbb{I}[0 < \max(x_1, \dots, x_n, m) \leq \theta] \\ &\propto \text{Pareto}\bigg(\max(x_1, \dots, x_n, m), k + n \bigg) \end{aligned} \]
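A quick numeric check of this update, with assumed values for the data and hyperparameters:
# sanity check of the conjugate update: posterior is Pareto(max(x, m), k + n)
set.seed(360)                    # assumed seed, for reproducibility
x <- runif(20, 0, 3)             # simulated data from Unif(0, theta) with theta = 3
m <- 1; k <- 1                   # assumed prior hyperparameters
post_m <- max(x, m)              # posterior scale parameter
post_k <- k + length(x)          # posterior shape parameter
c(post_m, post_k)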
Suppose that we obtain “noisy” measurements of points sampled uniformly from a circular area of unknown radius. Our goal will be to infer the radius of the circular area, \(R\).
Suppose further that we know in advance:
1. Our measurements of each point’s squared distance from the center are contaminated with additive Gaussian noise of unknown variance \(\sigma^2\).
2. The points are sampled uniformly from the circular area.
3. If points are sampled uniformly from a circular area with radius \(R\), then each point’s squared distance from the center is uniformly distributed on \((0, R^2)\).
The second and third items on the list above tell us that if we observe data drawn uniformly from the surface of a circle with radius \(R\), each point’s squared distance from the center of the circle will be uniformly distributed on \((0, R^2)\). Given these facts, let’s think about how we might do inference on \(R\).
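If the third fact is unfamiliar, a short simulation (with assumed values) can confirm it:
# points uniform on a disk: squared distance from the center is Unif(0, R^2)
R_disk <- 2
pts <- matrix(runif(2e5, -R_disk, R_disk), ncol = 2)  # uniform points on the bounding square
pts <- pts[rowSums(pts^2) <= R_disk^2, ]              # rejection step: keep points inside the disk
hist(rowSums(pts^2), breaks = 50)                     # roughly flat on (0, R_disk^2)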
It is often helpful to have a picture of what’s going on when we describe a generative model. Here we can simulate and plot our data, overlaying the true radius \(R\) on the sampled points \(r_1, \dots, r_n\):
# load the packages used below
library(ggplot2)
library(plyr)
library(magrittr)  # provides the pipe %>%
# simulation settings
n <- 1000
true_Rsquared <- 5
true_sigma <- 1.25
# noiseless squared radii u, noisy squared radii r, and angles theta
u <- runif(n, 0, true_Rsquared)
r <- u + rnorm(n, sd = true_sigma)
theta <- runif(n, 0, 2*pi)
# plot the noisy points with the true radius overlaid in red
ggplot() +
  geom_point(data = data.frame(x = sign(r)*sqrt(abs(r))*cos(theta),
                               y = sign(r)*sqrt(abs(r))*sin(theta)),
             aes(x = x, y = y), shape = 1) +
  geom_path(data = data.frame(R = true_Rsquared) %>%
              ddply(.(R), function(d){
                data.frame(x = sqrt(d$R)*cos(seq(0, 2*pi, length.out = 100)),
                           y = sqrt(d$R)*sin(seq(0, 2*pi, length.out = 100)))
              }),
            aes(x = x, y = y), alpha = 1, colour = "red") +
  coord_fixed()
Let \(r_i\) denote the noisy squared radius of observed data point \(i\). Given the noiseless squared radial positions of the points, we assume that the observations \(r_i\) are normally distributed, each with mean \(u_i\) and variance \(\sigma^2\). A model for the data generating process might look something like this: \[ \begin{aligned} r_i | u_i, \sigma^2 &\sim N(u_i, \sigma^2) \\ u_i | R^2 &\overset{iid}\sim U(0, R^2), \end{aligned} \] with priors: \[ \begin{aligned} R^2 | m,k &\sim Pa(m, k) \\ 1 / \sigma^2 | \alpha, \beta &\sim Ga(\alpha, \beta) \end{aligned} \]
Here we have used the conjugate prior, the Pareto distribution, for the uniform likelihood. We have also used the conjugate Gamma prior for the precision parameter.
Write \(r = (r_1, \ldots, r_n)\). The generative model implies that the likelihood takes the form
\[ \begin{aligned} L(r ; u, \sigma^2) &= \left(\frac{1}{2 \pi \sigma^2}\right)^{n/2} \textrm{exp} \left\{-\frac{1}{2 \sigma^2}\sum_{i=1}^n (r_i - u_i)^2\right\} \end{aligned} \]
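As an illustration, this likelihood can be evaluated on the log scale with a short helper (loglik is a hypothetical name, not part of the lab code):
# log-likelihood of r given u and sigma^2, matching the formula above
loglik <- function(r, u, sigma2) {
  sum(dnorm(r, mean = u, sd = sqrt(sigma2), log = TRUE))
}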
Which parameters are independent of one another? How does this inform the way we can decompose the joint distribution of the parameters and the data? What does this decomposition imply for a Gibbs sampling algorithm that produces draws from the posterior distribution of \(R^2\)?
\[ \begin{aligned} p(r, u, 1/\sigma^2, R^2 | m, k, \alpha, \beta) &= p(r | u, 1/\sigma^2) \cdot p(u | R^2) \cdot p(R^2 | m, k) \cdot p(1/\sigma^2 | \alpha, \beta) \end{aligned} \] Note that, given \(u\) and \(\sigma^2\), the data \(r\) are conditionally independent of \(R^2\) and the hyperparameters; this conditional independence is what allows the joint distribution to factor in this way.
In this case, we can explicitly write out the full conditional distributions for the model parameters \(u\), \(R^2\) and \(1/\sigma^2\). What are they? Make sure to keep track of indicator functions that may truncate the support of a density function.
\[ u_i | r, \sigma^2, R^2 \sim \text{Truncated Normal}\left(r_i, \sigma^2, 0, R^2 \right), \] that is, a normal distribution with mean \(r_i\) and variance \(\sigma^2\), truncated to the interval \([0, R^2]\).
\[ 1 / \sigma^2 | r, u, R^2 \sim Ga\left(n/2 + \alpha, \frac{1}{2}\sum_{i=1}^n (r_i - u_i)^2 + \beta \right) \]
\[ R^2 | u, r, \sigma^2 \sim Pa\left(\max(u_1, \dots, u_n, m), k + n \right) \]
If you are seeing the truncated normal distribution for the first time, take a few minutes to read https://en.wikipedia.org/wiki/Truncated_normal_distribution.
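To get a quick feel for it, here is a small example using the truncnorm package (the same package the sampler below relies on):
# draws from N(0, 1) truncated to [0, 2]
library(truncnorm)
z <- rtruncnorm(10000, a = 0, b = 2, mean = 0, sd = 1)
range(z)  # every draw lies within [0, 2]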
Now we are ready to write a Gibbs sampler using truncated full conditional distributions to make inferences about the radius parameter \(R\).
# hyper-parameters
m <- 3
k <- 1
alpha <- 5/2
beta <- 5/2
# draw one sample from Pareto(m, k) by inverse-CDF sampling;
# if trunc is supplied, reject draws above trunc
rpareto <- function(m, k, trunc = NULL){
  p <- m*(1 - runif(1))^(-1/k)
  if(!is.null(trunc)){
    while(p > trunc){
      p <- m*(1 - runif(1))^(-1/k)
    }
  }
  return(p)
}
# Gibbs sampler for the uniform-Pareto model described above
uni_pareto_gibbs <- function(S, r, m, k, alpha, beta, burn_in = min(1000, S / 2), thin = 5){
  # observed noisy squared radii
  Rsq <- r
  n <- length(Rsq)
  # storage for draws of R^2, u, and sigma
  R <- rep(1, S)
  U <- matrix(0, nrow = S, ncol = n)
  U[1, ] <- runif(n, 0, R[1])
  sigma <- rep(1, S)
  # current state of the chain
  U_curr <- U[1, ]
  R_curr <- R[1]
  sigma_curr <- sigma[1]
  for(s in 1:S){
    # Sample from full conditional of the squared radius R^2
    R_curr <- rpareto(max(c(U_curr, m)), k + n)
    R[s] <- R_curr
    # Sample from full conditional of U values (normal truncated to [0, R^2])
    U_curr <- truncnorm::rtruncnorm(n, a = 0, b = R_curr, mean = Rsq, sd = sigma_curr)
    U[s, ] <- U_curr
    # Sample the precision 1/sigma^2 from its Gamma full conditional (given above),
    # then convert to a standard deviation
    sigma_curr <- 1/sqrt(rgamma(1, shape = n/2 + alpha, rate = sum((Rsq - U_curr)^2)/2 + beta))
    sigma[s] <- sigma_curr
  }
  return(list(R = R[seq(burn_in, S, by = thin)],
              U = U[seq(burn_in, S, by = thin), ],
              sigma = sigma[seq(burn_in, S, by = thin)]))
}
# run the sampler on the simulated noisy squared radii
gibbs_samps <- uni_pareto_gibbs(S = 100000, r, m, k, alpha, beta, burn_in = 2000)
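Before plotting, it can be helpful to summarize the retained draws; for example, a posterior mean and 95% credible interval for the radius \(R = \sqrt{R^2}\):
# illustrative posterior summaries for the radius R
mean(sqrt(gibbs_samps$R))
quantile(sqrt(gibbs_samps$R), c(0.025, 0.975))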
Let’s visualize our samples along with the data and the true radius \(R\):
# posterior draws of the radius (blue), data (points), and the truth (red)
ggplot() +
  geom_point(data = data.frame(x = sign(r)*sqrt(abs(r))*cos(theta),
                               y = sign(r)*sqrt(abs(r))*sin(theta)),
             aes(x = x, y = y), shape = 1) +
  geom_path(data = data.frame(R = gibbs_samps$R) %>%
              ddply(.(R), function(d){
                data.frame(x = sqrt(d$R)*cos(seq(0, 2*pi, length.out = 100)),
                           y = sqrt(d$R)*sin(seq(0, 2*pi, length.out = 100)))
              }),
            aes(x = x, y = y), alpha = 0.005, colour = "blue") +
  geom_path(data = data.frame(R = true_Rsquared) %>%
              ddply(.(R), function(d){
                data.frame(x = sqrt(d$R)*cos(seq(0, 2*pi, length.out = 100)),
                           y = sqrt(d$R)*sin(seq(0, 2*pi, length.out = 100)))
              }),
            aes(x = x, y = y), alpha = 1, colour = "red") +
  coord_fixed()
Use your posterior samples (and the cowplot package if you want) to visualize the bivariate posterior density of \(R^2\) and \(\sigma^2\). Indicate the truth on the plot. Comment on this bivariate plot.
10 points: 4 points for question 1; 2 points each for questions 2, 3 and 4.
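If you are unsure where to begin, here is a minimal sketch using ggplot2, a starting point rather than the required solution (the column names Rsq and sigsq are assumptions for illustration):
# 2D density of posterior draws of R^2 and sigma^2, with the truth marked in red
ggplot(data.frame(Rsq = gibbs_samps$R, sigsq = gibbs_samps$sigma^2),
       aes(x = Rsq, y = sigsq)) +
  geom_density_2d() +
  annotate("point", x = true_Rsquared, y = true_sigma^2, colour = "red", size = 2) +
  labs(x = expression(R^2), y = expression(sigma^2))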
This lab was created by Jordan Bryan and Becky Tang.