As we've seen so far, Bayesian inference is based on posterior distributions, that is,
\begin{split} \pi(\theta | y) & = \frac{\pi(\theta) \cdot p(y|\theta)}{\int_{\Theta}\pi(\tilde{\theta}) \cdot p(y| \tilde{\theta}) \textrm{d}\tilde{\theta}} = \frac{\pi(\theta) \cdot L(\theta | y)}{L(y)}, \end{split}
where y = (y_1, \ldots, y_n).
Good news: we have the numerator in this expression.
Bad news: the denominator is typically not available in closed form (it may involve a high-dimensional integral)!
How have we been getting by? Conjugacy! For conjugate priors, the posterior distribution of θ is available analytically.
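To make this concrete, here is a minimal sketch of a conjugate update for the beta-binomial pair from the table below, written in Python with made-up data and hyperparameters: the posterior parameters are obtained by simple arithmetic, with no integration required.

```python
from scipy import stats

# Made-up data and hyperparameters, purely for illustration.
a0, b0 = 1, 1        # Beta(a0, b0) prior on the success probability theta
y, n = 7, 20         # y successes observed out of n Bernoulli trials

# Conjugacy: the posterior is again a beta distribution, in closed form.
a_n, b_n = a0 + y, b0 + n - y
posterior = stats.beta(a_n, b_n)

print(posterior.mean())          # posterior mean of theta
print(posterior.interval(0.95))  # equal-tailed 95% credible interval
```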
What if a conjugate prior does not represent our prior information well, or we have a more complex model, and our posterior is no longer in a convenient distributional form?
Prior | Likelihood | Posterior |
---|---|---|
beta | binomial | beta |
gamma | Poisson | gamma |
gamma | exponential | gamma |
normal-gamma | normal | normal-gamma |
beta | negative-binomial | beta |
beta | geometric | beta |
There are a few more we have not covered yet, for example, the Dirichlet-multinomial model.
However, clearly, we cannot restrict ourselves to conjugate models only.
For example, for conjugacy in the normal model, we had
\begin{split} \pi(\mu | \tau) & = \mathcal{N}\left(\mu_0, \dfrac{1}{\kappa_0 \tau}\right)\\ \pi(\tau) & = \textrm{Gamma}\left(\dfrac{\nu_0}{2}, \dfrac{\nu_0}{2\tau_0}\right). \end{split}
Suppose we instead wish to specify our uncertainty about μ as independent of τ, that is, we want π(μ,τ)=π(μ)π(τ). For example,
\begin{split} \pi(\mu) & = \mathcal{N}(\mu_0, \sigma_0^2)\\ \pi(\tau) & = \textrm{Gamma}\left(\dfrac{\nu_0}{2}, \dfrac{\nu_0}{2\tau_0}\right). \end{split}
When \sigma_0^2 is not proportional to 1/\tau, the marginal posterior density of \tau is not a gamma density (or any density we can easily sample from).
Side note: for conjugacy, the joint posterior would also have to factor as a product of independent normal and gamma densities in \mu and \tau, respectively.
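To see the issue concretely, the numerator \pi(\mu)\pi(\tau)p(y | \mu, \tau) can still be evaluated pointwise; what we lose is a recognizable distributional form to sample from. A minimal sketch in Python, with hypothetical data and hyperparameter values:

```python
import numpy as np
from scipy import stats

# Hypothetical data and hyperparameter values, for illustration only.
y = np.array([1.64, 2.23, 1.81, 2.55, 2.04])
mu0, sigma0_sq = 0.0, 10.0   # independent N(mu0, sigma0_sq) prior on mu
nu0, tau0 = 2.0, 1.0         # Gamma(nu0/2, nu0/(2*tau0)) prior on tau

def log_unnormalized_posterior(mu, tau):
    """log pi(mu) + log pi(tau) + log p(y | mu, tau); the normalizing constant is unknown."""
    log_prior_mu = stats.norm.logpdf(mu, loc=mu0, scale=np.sqrt(sigma0_sq))
    log_prior_tau = stats.gamma.logpdf(tau, a=nu0 / 2, scale=2 * tau0 / nu0)  # rate nu0/(2*tau0)
    log_lik = stats.norm.logpdf(y, loc=mu, scale=1 / np.sqrt(tau)).sum()
    return log_prior_mu + log_prior_tau + log_lik

# We can tabulate the numerator on a grid, but it is not a named density in tau.
mus = np.linspace(0.0, 4.0, 100)
taus = np.linspace(0.05, 5.0, 100)
grid = np.array([[log_unnormalized_posterior(m, t) for t in taus] for m in mus])
```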
In general, conjugate priors are not available for generalized linear models (GLMs) other than the normal linear model.
One can potentially rely on an asymptotic normal approximation.
As n → ∞, the posterior distribution is approximately normal, centered at the MLE.
However, even for moderate sample sizes, asymptotic approximations may be inaccurate.
In logistic regression, for example, with rare outcomes or rare binary exposures, the posterior can be highly skewed.
It is appealing to avoid any reliance on large-sample assumptions and to base inferences on the exact posterior.
Even though we may not be able to sample from the marginal posterior of a particular parameter when using a non-conjugate prior, sometimes, we may still be able to sample from conditional distributions of those parameters given all other parameters and the data.
These conditional distributions, known as full conditionals, will be very important for us.
In our normal example with
\begin{split} \mu & \sim \mathcal{N}(\mu_0, \sigma_0^2)\\ \tau & \sim \textrm{Gamma}\left(\dfrac{\nu_0}{2}, \dfrac{\nu_0}{2\tau_0}\right), \end{split}
it turns out we will not be able to sample easily from τ|Y.
However, as you will see, we will be able to sample from τ|μ,Y. That is the full conditional for τ.
By the way, note that we already know the full conditional for μ, i.e., μ|τ,Y from previous modules.
Goal: take advantage of the full conditional distributions (without sampling directly from the marginal posteriors) to obtain samples from those marginal posteriors.
In our example, with \pi(\mu) = \mathcal{N}(\mu_0, \sigma_0^2), we have
\mu | Y, \tau \sim \mathcal{N}\left(\mu_n, \tau_n^{-1}\right),
Review results from previous modules on the normal model if you are not sure why this holds.
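In code, a draw from this full conditional is a single normal variate. Here is a minimal sketch in Python, assuming the standard precision-weighted expressions \tau_n = \dfrac{1}{\sigma_0^2} + n\tau and \mu_n = \left(\dfrac{\mu_0}{\sigma_0^2} + \tau \sum_i y_i\right) / \tau_n from the earlier normal-model results:

```python
import numpy as np

def sample_mu(y, tau, mu0, sigma0_sq, rng):
    """One draw from mu | tau, Y ~ N(mu_n, 1/tau_n), using the standard
    precision-weighted updates:
      tau_n = 1/sigma0_sq + n*tau
      mu_n  = (mu0/sigma0_sq + tau*sum(y)) / tau_n
    """
    n = len(y)
    tau_n = 1 / sigma0_sq + n * tau
    mu_n = (mu0 / sigma0_sq + tau * y.sum()) / tau_n
    return rng.normal(loc=mu_n, scale=1 / np.sqrt(tau_n))

# Hypothetical usage with made-up data.
rng = np.random.default_rng(0)
y = np.array([1.64, 2.23, 1.81, 2.55, 2.04])
print(sample_mu(y, tau=1.0, mu0=0.0, sigma0_sq=10.0, rng=rng))
```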
Let's see if we can figure out the other full conditional τ|μ,Y.
\begin{split} p(\tau| \mu,Y) & \boldsymbol{=} \dfrac{\Pr[\tau,\mu,Y]}{\Pr[\mu,Y]} = \dfrac{p(y|\mu,\tau)\pi(\mu,\tau)}{p[\mu,y]}\\ & \ \ \ \ \ \ \ \ \ \ \ \ \\ & \boldsymbol{=} \dfrac{p(y|\mu,\tau)\pi(\mu)\pi(\tau)}{p[\mu,y]}\\ & \ \ \ \ \ \ \ \ \ \ \ \ \\ & \boldsymbol{\propto} p(y|\mu,\tau)\pi(\tau)\\ & \ \ \ \ \ \ \ \ \ \ \ \ \\ & \boldsymbol{\propto} \underbrace{\tau^{\dfrac{n}{2}} \ \textrm{exp}\left\{-\dfrac{1}{2} \tau \sum^n_{i=1} (y_i - \mu)^2 \right\}}_{\propto \ p(y|\mu,\tau)} \times \underbrace{\tau^{\dfrac{\nu_0}{2}-1} \textrm{exp}\left\{-\dfrac{\tau\nu_0}{2\tau_0}\right\}}_{\propto \ \pi(\tau)}\\ & \ \ \ \ \ \ \ \ \ \ \ \ \\ & \boldsymbol{=} \underbrace{\tau^{\dfrac{\nu_0 + n}{2} - 1} \ \textrm{exp}\left\{-\dfrac{1}{2} \tau \left[ \dfrac{\nu_0}{\tau_0} + \sum^n_{i=1} (y_i - \mu)^2 \right] \right\}}_{\textrm{Gamma Kernel}}. \end{split}
\begin{split} p(\tau| \mu,Y) & \boldsymbol{\propto} \underbrace{\tau^{\dfrac{\nu_0 + n}{2} - 1} \ \textrm{exp}\left\{-\dfrac{1}{2} \tau \left[ \dfrac{\nu_0}{\tau_0} + \sum^n_{i=1} (y_i - \mu)^2 \right] \right\}}_{\textrm{Gamma Kernel}}\\ & \boldsymbol{=} \textrm{Gamma}\left(\dfrac{\nu_n}{2}, \dfrac{\nu_n}{2 \tau_n(\mu)}\right) \ \ \ \textrm{OR} \ \ \ \textrm{Gamma}\left(\dfrac{\nu_n}{2}, \dfrac{\nu_n\sigma_n^2(\mu)}{2}\right), \end{split}
where
\begin{split} \nu_n & = \nu_0 + n\\ & \ \ \ \ \ \ \ \ \ \ \ \ \\ \sigma_n^2(\mu) & = \dfrac{1}{\nu_n} \left[ \dfrac{\nu_0}{\tau_0} + \sum^n_{i=1} (y_i - \mu)^2 \right] = \dfrac{1}{\nu_n}\left[ \dfrac{\nu_0}{\tau_0} + ns^2_n(\mu) \right]\\ & \ \ \ \ \ \ \ \ \ \ \ \ \\ \ \ \textrm{OR} \ \ \tau_n(\mu) & = \dfrac{\nu_n}{\left[ \dfrac{\nu_0}{\tau_0} + \sum^n_{i=1} (y_i - \mu)^2 \right]} = \dfrac{\nu_n}{\left[ \dfrac{\nu_0}{\tau_0} + ns^2_n(\mu) \right]};\\ & \ \ \ \ \ \ \ \ \ \ \ \ \\ \textrm{with} \ \ \ s^2_n(\mu) & = \dfrac{1}{n} \sum^n_{i=1} (y_i - \mu)^2.\\ \end{split}
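In code, a draw from this full conditional is a single gamma variate. A minimal sketch in Python, parameterizing the gamma by shape and rate (NumPy expects a scale, i.e., 1/rate):

```python
import numpy as np

def sample_tau(y, mu, nu0, tau0, rng):
    """One draw from tau | mu, Y ~ Gamma(nu_n/2, nu_n*sigma_n_sq(mu)/2) (shape, rate)."""
    n = len(y)
    nu_n = nu0 + n
    sigma_n_sq = (nu0 / tau0 + ((y - mu) ** 2).sum()) / nu_n
    rate = nu_n * sigma_n_sq / 2
    return rng.gamma(nu_n / 2, scale=1 / rate)  # NumPy's scale = 1/rate

# Hypothetical usage with made-up data.
rng = np.random.default_rng(0)
y = np.array([1.64, 2.23, 1.81, 2.55, 2.04])
print(sample_tau(y, mu=2.0, nu0=2.0, tau0=1.0, rng=rng))
```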
Now we have two full conditional distributions, but what we really need is to sample from \pi(\tau | Y).
Actually, if we could sample from \pi(\mu, \tau| Y), we already know that the draws for \mu and \tau will be from the two marginal posterior distributions. So, we just need a scheme to sample from \pi(\mu, \tau| Y).
Suppose we had a single sample, say \tau^{(1)}, from the marginal posterior distribution \pi(\tau | Y). Then we could sample
\mu^{(1)} \sim p(\mu | \tau^{(1)}, Y).
This is what we did in the last class, so that the pair \{\mu^{(1)},\tau^{(1)}\} is a sample from the joint posterior \pi(\mu, \tau| Y).
\Rightarrow \ \mu^{(1)} can be considered a sample from the marginal posterior distribution of \mu, which again means we can use it to sample
\tau^{(2)} \sim p(\tau | \mu^{(1)}, Y),
and so forth.
So, we can use two full conditional distributions to generate samples from the joint distribution, once we have a starting value \tau^{(1)}.
Formally, this sampling scheme is known as Gibbs sampling.
Method: Iterative conditional sampling
Purpose: sample from a joint posterior whose full conditional distributions have known forms, so that sampling from each full conditional is fairly easy.
More generally, we can use this method to generate samples of \theta = (\theta_1, \ldots, \theta_p), the vector of p parameters of interest, from the joint density.
Procedure:
For iterations s = 1, \ldots, S,
Sample \theta_1^{(s)} from the conditional posterior distribution
\pi(\theta_1 | \theta_2 = \theta_2^{(s-1)}, \ldots, \theta_p = \theta_p^{(s-1)}, Y)
Sample \theta_2^{(s)} from the conditional posterior distribution
\pi(\theta_2 | \theta_1 = \theta_1^{(s)}, \theta_3 = \theta_3^{(s-1)}, \ldots, \theta_p = \theta_p^{(s-1)}, Y)
Similarly, sample \theta_3^{(s)}, \ldots, \theta_p^{(s)} from the conditional posterior distributions given current values of other parameters.
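As a preview, here is a minimal sketch of this procedure for the two-parameter normal model above, alternating between the full conditionals \mu | \tau, Y and \tau | \mu, Y derived earlier. It is written in Python with made-up data and hyperparameter values, and inlines the two conditional draws sketched previously.

```python
import numpy as np

def gibbs_normal(y, mu0, sigma0_sq, nu0, tau0, S=5000, tau_init=1.0, seed=42):
    """Gibbs sampler for (mu, tau) under independent N(mu0, sigma0_sq) and
    Gamma(nu0/2, nu0/(2*tau0)) priors, alternating between the two full conditionals."""
    rng = np.random.default_rng(seed)
    n = len(y)
    mu_draws, tau_draws = np.empty(S), np.empty(S)
    tau = tau_init                                   # starting value tau^(1)
    for s in range(S):
        # mu | tau, Y ~ N(mu_n, 1/tau_n): precision-weighted update
        tau_n = 1 / sigma0_sq + n * tau
        mu_n = (mu0 / sigma0_sq + tau * y.sum()) / tau_n
        mu = rng.normal(loc=mu_n, scale=1 / np.sqrt(tau_n))
        # tau | mu, Y ~ Gamma(nu_n/2, rate = [nu0/tau0 + sum (y_i - mu)^2] / 2)
        nu_n = nu0 + n
        rate = (nu0 / tau0 + ((y - mu) ** 2).sum()) / 2
        tau = rng.gamma(nu_n / 2, scale=1 / rate)
        mu_draws[s], tau_draws[s] = mu, tau
    return mu_draws, tau_draws

# Hypothetical usage with made-up data; after a short burn-in, the draws
# approximate the marginal posteriors of mu and tau.
y = np.array([1.64, 2.23, 1.81, 2.55, 2.04])
mu_draws, tau_draws = gibbs_normal(y, mu0=0.0, sigma0_sq=10.0, nu0=2.0, tau0=1.0)
print(mu_draws[500:].mean(), tau_draws[500:].mean())
```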
In the next module, we will look into a simple example of how this works, before going back to the normal model.