---
title: "Lecture 05"
subtitle: "Bayesian Model fit and comparison"
author: "Jihong Zhang"
institute: "Educational Statistics and Research Methods"
title-slide-attributes:
data-background-image: ../Images/title_image.png
data-background-size: contain
data-background-opacity: "0.9"
execute:
echo: true
eval: true
format:
html:
code-tools: true
code-line-numbers: false
code-fold: false
code-summary: 'Click here to see R code'
number-offset: 1
fig.width: 10
fig-align: center
message: false
revealjs:
output-file: slides-index.html
logo: ../Images/UA_Logo_Horizontal.png
incremental: true # choose "false "if want to show all together
theme: [serif, ../pp.scss]
footer: <https://jihongzhang.org/posts/2024-01-12-syllabus-adv-multivariate-esrm-6553>
transition: slide
background-transition: fade
slide-number: true
chalkboard: true
number-sections: false
code-line-numbers: true
code-link: true
code-annotations: hover
code-copy: true
highlight-style: arrow
code-block-border-left: true
code-block-background: "#b22222"
bibliography: references.bib
---
## Today's Lecture Objectives
1. Bayesian methods for determining how well a model fits the data (absolute fit)
2. Bayesian methods for determining which model fits better (relative model fit)
3. In previous class...
1. We estimated the empty model and the full model (you can try other constrained model between the empty and full model)
- Empty model: a model without any covariate (predictors)
- Full model: a model with all possible covariates and up-to highest interaction effects
2. We also make the Stan code more efficient by vectorizing the parameters and the data
3. The next question is **how to determine the model is "good" enough**
- "good" has multiple meanings
- a typical criteria is to what degree the model can generate a "simulated" data that is similar to the original data
- or, the likelihood of original data given the parameters of model
## Types of Model fit
- Absolute model fit
- PPMC
- In SEM, RMSEA, chi-square, SRMR, and GFI
- Relative model fit
- information criterion
- Incremental model fit (not frequently used other than SEM)
- A special type of absolute model fit - how a model fits to a saturated model
- In SEM, comparative fit index (CFI), Tucker-Lewis index (TLI)
## Absolute Model Fit: PPMC
**Posterior predictive model checking** [@gelman1996] is a Bayesian method evaluation technique for determining if a model fits the data.
- Absolute model fit: "Does my model fit my data well?"
- Recall that "model is a simplified version of the true data-generation process"
- Thus, the model should be able to reproduce the "data" that is similar to observed data
- In machine learning, this is also called "validation", typically used as a separate "validation data sets"
- Overall idea: if a model fits the data well, then simulated data based on the model will resemble the observed data
------------------------------------------------------------------------
### Ingredients in PPMC:
::: nonincremental
- Original data
- Typically, characterized by some set of statistics (i.e., sample mean, standard deviation, covariance) applied to data
- Simulated data
::: nonincremental
- Generated from partial/all posterior draws in the Markov Chain
- Summarized by the same set of statistics
:::
:::
{fig-align="center" width="600"}
------------------------------------------------------------------------
## PPMC Example: Linear Models
The linear model from our example was:
$$
\text{WeightLB}_p = \beta_0 + \beta_1 \text{HeightIN}_p + \beta_2 \text{Group2}_p + \beta_3\text{Group3}_p \\
+\beta_4 \text{HeightIN}_p\text{Group2}_p \\
+\beta_5 \text{HeightIN}_p\text{Group3}_p \\
+ e_p
$$
with:
::: nonincremental
- $\text{Group2}_p$ the binary indicator of person $p$ being in group 2
- $\text{Group}3_p$ the binary indicator of person $p$ being in group 3
- $e_p \sim N(0, \sigma_e)$
:::
```{r}
#| echo: false
load(here::here("teaching/2024-01-12-syllabus-adv-multivariate-esrm-6553/Lecture05/Code", "Lecture05.RData"))
```
```{stan filename = "FullModel_New.stan", output.var="display", eval=FALSE, tidy=FALSE}
data{
int<lower=0> N; // number of observations
int<lower=0> P; // number of predictors (plus column for intercept)
matrix[N, P] X; // model.matrix() from R
vector[N] weightLB; // outcome
real sigmaRate; // hyperparameter: prior rate parameter for residual standard deviation
}
parameters {
vector[P] beta; // vector of coefficients for Beta
real<lower=0> sigma; // residual standard deviation
}
model {
sigma ~ exponential(sigmaRate); // prior for sigma
weightLB ~ normal(X*beta, sigma); // linear model
}
```
```{r}
#| eval: true
fit_full_new$summary(variables = c('beta', 'sigma'))
```
__MLE Method:__
```{r}
dat <- read.csv(here::here("teaching", "2024-01-12-syllabus-adv-multivariate-esrm-6553", "Data", "DietData.csv"))
dat$DietGroup <- factor(dat$DietGroup, levels = 1:3)
dat$HeightIN60 <- dat$HeightIN - 60
res <- summary(lm(WeightLB ~ DietGroup*HeightIN60, data=dat))
round(res$coefficients, 3)
sd(resid(res)) # residual error
```
------------------------------------------------------------------------
### PPMC Process
The PPMC Process can be summarized as follows:
1. Select parameters from a single (sampling) iteration of the Markov Chain
2. Using the selected parameters and the model to simulate a data set with the sample size (with same number of observations/variables)
3. From the simulated data set, calculate selected summary statistics (e.g., mean, sd)
4. Repeat steps 1-3 for a fixed number of iterations (perhaps across the whole chain)
5. When done, compare position of observed summary statistics to that of the distribution of summary statistics from simulated data sets (predictive distribution)
------------------------------------------------------------------------
### Step 1: Assemble the posterior draws
For our linear model, let's denote our observed dependent variable as $Y$
- Note that independent variables are not modeled (not explained by statistical formula), so we cannot examine them.
First, let's assemble the posterior draws ($1000 \times 4 \times 7$):
```{r}
posteriorSample = fit_full_new$draws(variables = c('beta','sigma'), format = 'draws_matrix')
posteriorSample
```
------------------------------------------------------------------------
### Step 2: One draw from posterior
Next, let's draw one set of posterior values of parameters at random with replacement:
```{r}
set.seed(1234)
sampleIteration = sample(x = 1:nrow(posteriorSample), size = 1, replace = TRUE)
sampleIteration
posteriorSample[sampleIteration,]
```
Those draws of $\boldsymbol{\beta}$ and $\sigma$ can be used to generate a predicted $\hat{y}$:
$$
\hat{y}_i \sim N(\mathbf{X}\boldsymbol{\beta}_i, \sigma_i)
$$
- $i$ represents the $i$ th draw
------------------------------------------------------------------------
### Step 3: One simulated data
We then generate data based on this sampled iteration and our model distributional assumption:
```{r}
# Sample Size
N = nrow(dat)
# beta
betaVector = matrix(posteriorSample[sampleIteration, 1:6], ncol = 1)
betaVector
# sigma
sigma = posteriorSample[sampleIteration, 7]
# X
FullModelFormula = as.formula("WeightLB ~ HeightIN60 + DietGroup + HeightIN60*DietGroup")
X = model.matrix(FullModelFormula, data = dat)
simY = rnorm(n = N, mean = X %*% betaVector, sd = sigma)
head(simY)
```
------------------------------------------------------------------------
### Step 4: Summary Statistics of Simulated Data
Note that we do not want to directly compare simulated data to the observed data.
Instead, we extract some characteristics of simulated/observed data for comparison using summary statistics.
There are some advantages:
1. We may have research interests in only some characteristics of data (whether our model predict the mean of dependent variables)
2. We can QC detailed aspects of the fitting process of model
In this case, for example, we may be interested in whether the model captures the "location" or "scale" of WeightLB
```{r}
mean(dat$WeightLB)
(simMean = mean(simY))
sd(dat$WeightLB)
(simSD = sd(simY))
```
------------------------------------------------------------------------
### Step 5: Looping across all posterior samples
We can repeat step 2-4 for a set number of samples
Optionally, we can choose to use up all iterations we have for Markov Chains ($I = 4000$) in practice
```{r}
#| output-location: slide
I = nrow(posteriorSample)
## create empty simSD and simMean as placeholders
simSD = simMean = rep(NA, I)
for (i in 1:I) {
# beta
betaVector = matrix(posteriorSample[i, 1:6], ncol = 1)
# sigma
sigma = posteriorSample[i, 7]
# X
simY = rnorm(n = N, mean = X %*% betaVector, sd = sigma)
simMean[i] = mean(simY)
simSD[i] = sd(simY)
}
par(mfrow = 1:2)
hist(simMean)
hist(simSD)
```
------------------------------------------------------------------------
### Compare to the observed mean
::: columns
::: {.column width="50%"}
We can now compare our observed mean and standard deviation with that of the sample values.
- Blue line: the average value of predicted WeightLB
- Red line: the observed mean value of WeightLB
- The PDF of predictive values of summary statistics of WeightLB is called `posterior predictive distribution`
:::
::: {.column width="50%"}
PPMC can be checked using visual inspection:
```{r}
library(ggplot2)
ppp <- data.frame(
simMean = simMean,
simSD = simSD
)
ggplot(ppp) +
geom_histogram(aes(x = simMean), fill = "grey", col = "black") +
geom_vline(xintercept = mean(dat$WeightLB), col = "red", size = 1.4) + # red line: location of mean of predicted WeightLB by model
geom_vline(xintercept = mean(simMean), col = "blue", size = 1.4, alpha = 0.5) # blue line: location of mean of WeightLB
```
:::
:::
------------------------------------------------------------------------
### Compare to the observed SD
::: columns
::: {.column width="50%"}
Similarly, let's compare SD to the posterior predictive distribution of SD of WeightLB
- the observed SD is located as the center of posterior predictive distribution (PPD)
- the average mean of PPD is slightly higher than the observed SD
:::
::: {.column width="50%"}
```{r}
ggplot(ppp) +
geom_histogram(aes(x = simSD), fill = "grey", col = "black") +
geom_vline(xintercept = sd(dat$WeightLB), col = "red", size = 1.4) + # red line: location of mean of predicted WeightLB by model
geom_vline(xintercept = mean(simSD), col = "blue", size = 1.4, alpha = 0.5) # blue line: location of mean of WeightLB
```
:::
:::
------------------------------------------------------------------------
## PPMC Characteristics
PPMC methods are very useful
::: nonincremental
- They provide a visual way to determine if the model fits the observed data
- They are the main method of assessing absolute fit in Bayesian models
- Absolute fit assesses if a model fits the data instead of comparing to another model
:::
But, there are some drawbacks to PPMC methods
::: nonincremental
- Almost any statistic can be used
- Some are better than others (mean and SD of outcomes are nice choices for linear regression)
- No standard determining how much misfit is too much
- May be overwhelming to compute depending on your model
:::
**Question**: Can PPMC be used for models with maximum likelihood estimation or ordinary least squares?
------------------------------------------------------------------------
## Posterior Predictive P-values
::: columns
::: {.column width="50%"}
We can summarize the PPMC using a type of "p-value"
> Personally, I don't like the name "p-value", sounds like we are trying to justify our results using significance testing
Different from the frequentist "p-value" (if the null hypothesis is true, the probability of the observed data existing)
- The PPP-value: the proportion of times the statistic from the simulated data exceeds that of the observed data
- Useful to determine how far off a statistic is from its posterior predictive distribution
:::
::: {.column width="50%"}
If these p-values were:
1. near 0 or 1, indicating your model is far off your data
2. near .5, indicating your model fits your data in terms of the statistics you examined
The PPP-value for mean:
```{r}
mean(simMean > mean(dat$WeightLB))
```
The PPP-value for SD:
```{r}
mean(simSD > sd(dat$WeightLB))
```
:::
:::
------------------------------------------------------------------------
## Compute PPP-values within Stan
We can use the `generated quantities` block of Stan to compute PPP-values for us:
```{stan output.var='display', eval = FALSE, tidy = FALSE}
generated quantities{
// simulated data
array[N] real weightLB_rep = normal_rng(X*beta, sigma);
// posterior predictive distribution for mean and SD
real mean_weightLB = mean(weightLB);
real sd_weightLB = sd(weightLB);
real mean_weightLB_rep = mean(to_vector(weightLB_rep));
real<lower=0> sd_weightLB_rep = sd(to_vector(weightLB_rep));
// ppp-values for mean and sd
int<lower=0, upper=1> ppp_mean = (mean_weightLB_rep > mean_weightLB);
int<lower=0, upper=1> ppp_sd = (sd_weightLB_rep > sd_weightLB);
}
```
It will give us:
```{r}
fit_full_ppp$summary(variables = c('mean_weightLB_rep', 'sd_weightLB_rep', 'ppp_mean', 'ppp_sd'))
```
------------------------------------------------------------------------
## Advantages or disadvantages of Computing PPP within Stan
::: columns
::: {.column width="60%"}
Pros:
1. Built-in functions of Stan to generate simulated data for example `normal_rng()`, making PPP-values estimated much faster
2. Nice visual inspection tools existed - `bayesplot`
Cons:
1. Not allowed to debug each step in PPMC if something wrong
2. Cannot adjust the statistics and need to re-run the whole MCMC sampling, which is time-consuming
:::
::: {.column width="40%"}
```{r}
#| fig-cap: 'Posterior predictive distribution for mean of weight by chains'
bayesplot::mcmc_dens_chains(fit_full_ppp$draws('mean_weightLB_rep'))
```
:::
:::
------------------------------------------------------------------------
## Relative Model Fit
Relative model fit: used to compare 2 or more competing models in terms of their mode fit. Sometime, it is also called model selection.
- In non-Bayesian models, Information Criteria are often used to make comparisons
- AIC, BIC, DIC etc.
- Typically IC is a function of log-likelihood and penalty
- The model with the lowest IC is the model that fits best
- Bayesian model fit is similar
- Uses an index value
- The model with the lowest index is the model that fits best
- Recent advances in Bayesian model fit use indices that are tied to make cross-validation predictions (inspired by machine learning):
- Fit model leaving one observation out (LOO)
- Calculate statistics related to prediction (for instance, log-likelihood of that observation conditional on model parameters)
- Do for all observations
- New Bayesian indices try to mirror these leave-one-out predictions (but approximate these due to time constraints)
------------------------------------------------------------------------
## Deviance Information Indices
When late 1990s and early 2000s, the **Deviance Information Criterion** was popular for relative Bayesian model fit comparisons. It is proved not as good as LOO or WAIC. But let's have a look at:
$$
\text{DIC} = p_D + \overline{D(\theta)}
$$
where $p_D$ is the estimated number of parameters as follows:
$$p_D = \overline{D(\theta)} - D(\bar\theta)$$and where
$$
D(\theta) = -2 \log(p(y \mid \theta)) + C
$$
C is a constant that cancels out when model comparisons are made
Here.
- $\overline{D(\theta)}$ is the average log likelihood of the data (y) given the parameters ($\theta$) computed across all samples
- $D(\bar\theta)$ is the log likelihood of the data (y) computed at the average of the parameters ($\theta$) computed across all samples
------------------------------------------------------------------------
### DIC in Stan
Some program like JAGS will have provided DIC directly, but Stan does not provides direct ways of calculating DIC:
We can manually calculate approximated DIC by replacing $\bar\theta$ as median of posteriors $\dot\theta$ if the posterior distributions are almost normal.
::: columns
::: {.column width="50%"}
**DIC for Full Model**
```{r}
# extract log-likelihood and parameters
posterior_draws <- fit_full_ppp$draws(c('log_lik', 'beta'), format = 'draws_matrix')
log_like_draws <- rowSums(posterior_draws[,colnames(posterior_draws) %in% paste0("log_lik[", 1:30, "]")])
# find draws of loglikelihood that have parameters equaling to median values
theta_draws <- posterior_draws[, colnames(posterior_draws) %in% paste0("beta[", 1:6, "]")]
theta_ave_iteration = apply(round(theta_draws, 3), 2, \(x) which(x == median(x))) |> unlist() |> as.numeric()
log_like_draws_meantheta = (-2)*mean(log_like_draws[theta_ave_iteration])
mean_log_like_draws = (-2)*mean(log_like_draws)
# compute DIC
DIC_full = mean_log_like_draws - log_like_draws_meantheta + mean_log_like_draws
DIC_full
```
:::
::: {.column width="50%"}
**DIC for Empty Model**
```{r}
# extract log-likelihood and parameters
posterior_draws <- fit_empty_ppp$draws(c('log_lik', 'beta0'), format = 'draws_matrix')
log_like_draws <- rowSums(posterior_draws[,colnames(posterior_draws) %in% paste0("log_lik[", 1:30, "]")])
# find draws of loglikelihood that have parameters equaling to median values
theta_draws <- posterior_draws[, colnames(posterior_draws) == "beta0"]
theta_ave_iteration = which(round(theta_draws,2) == median(round(theta_draws,2)))
log_like_draws_meantheta = (-2)*mean(log_like_draws[theta_ave_iteration])
mean_log_like_draws = (-2)*mean(log_like_draws)
# compute DIC
DIC_empty = mean_log_like_draws - log_like_draws_meantheta + mean_log_like_draws
DIC_empty
```
:::
:::
------------------------------------------------------------------------
### Issues to DIC
The DIC has fallen out of favor recently (not used in Stan)
::: nonincremental
- Has issues when parameters are discrete
- Not fully Bayesian (point estimate of average of parameter values)
- Can give negative values for estimated numbers of parameters in a model
:::
**WAIC** (Widely applicable or Watanabe-Akaike information criterion, Watanabe, 2010) corrects some of the problems with DIC:
::: nonincremental
- Fully Bayesian (uses entire posterior distribution)
- Asymptotically equal to Bayesian cross-validation
- Invariant to parameterization
:::
------------------------------------------------------------------------
## Watanabe-Akaike information criterion (WAIC)
::: nonincremental
- A more frequently used model comparison indices for Bayesian analysis
- Using the `loo` R package, we can calculate WAIC with the `waic()` function:
:::
::: columns
::: {.column width="50%"}
**WAIC for the full model**
```{r}
loo::waic(fit_full_ppp$draws('log_lik'))
```
:::
::: {.column width="50%"}
**WAIC for the empty model**
```{r}
loo::waic(fit_empty_ppp$draws('log_lik'))
```
:::
:::
::: nonincremental
- Here:
- `elpd_waic` is the expected log pointwise predictive density for WAIC
- `p_waic` is the WAIC-version estimated number of parameter, similar to $p(D)$ in DIC, which is a penalty to the likelihood for more parameters
- `waic` is the WAIC index used for model comparisons (lowest value is best fitting; -2\*`elpd_waic`)
- Note that WAIC needs a `log_lik` variable in the model analysis to be calculated correctly. `cmdstan` will automate calculate this variable.
:::
------------------------------------------------------------------------
## LOO: Approximation to Leave-one-out
Big picture:
::: nonincremental
1. Besides WAIC, other comparative fit indices include LOO via Pareto Smoothed Important Sampling (PSIS) via Stan's `LOO` package
2. WAIC/LOO can be used for model comparison with lowest value suggesting better model fit
3. Different from DIC, LOO via PSIS attempts to "approximate" the process of leave-one-out cross-validation (LOO-CV) using a sampling based-approach
::: nonincremental
- Gives a finite-sample approximation
- Implemented in Stan
- Can quickly compare models
- Gives warnings when it may be less reliable to use
:::
4. The details of computation of LOO are very technical, but are nicely compiled in Vehtari, Gelman, and Gabry (2017).
:::
------------------------------------------------------------------------
### LOO: Leave-one-out in Stan
Using `loo` package, we can calculate efficient approximate leave-one-out cross-validation (LOO-CV)
::: columns
::: {.column width="50%"}
Full model's LOO:
```{r}
#| echo: false
library(loo)
```
```{r}
#| warning: false
full_loo_res <- fit_full_ppp$loo('log_lik', save_psis = TRUE)
full_loo_res$estimates
```
:::
::: {.column width="50%"}
Empty model's LOO:
```{r}
empty_loo_res <- fit_empty_ppp$loo('log_lik', save_psis = TRUE)
empty_loo_res$estimates
```
:::
:::
Here:
- `elpd_loo` is the expected log pointwise predictive density for LOO (recall that posterior predictive distribution has some uncertainty around the mean value...)
- `p_loo` is the LOO calculation of number of model parameters (a penalty to the likelihood for more parameters)
- `looic` is the LOO index used for model comparisons — lowest value suggests best fitting -2`elpd_loo`
------------------------------------------------------------------------
### LOO: Comparing Model with LOO
Alternative, you can use the built-in function `loo_compare` in `loo` package to compare alternative models:
```{r}
loo::loo_compare(list(empty = fit_empty_ppp$loo(), full = fit_full_ppp$loo()))
```
This function calculats the standard error of the difference in `elpd` (expected log pointwise predictive density) between models
::: nonincremental
- The SE gives an indication of the standard error in the estimate (relative to the size)
- Can use this to downweight the choice of models when the standard error is high (Open Question: how high is high?)
- $$
0 \in \text{elpd_diff}\ \pm\ 1.96*\text{se_diff}
$$
- Note: `elpd_diff` is `looic` divided by -2 (on the log-likelihood scale, not the deviance scale)
- Here, we interpret the results as the model with full predictors is preferred to the model with empty predictors.
- The "confidence interval" of the `elpd_diff` does not include 0, indicating we can be fairly certain of this result
:::
------------------------------------------------------------------------
### LOO: Pareto smoothed importance sampling (PSIS)
::: nonincremental
- Estimated a vector of Pareto shape parameters $k$ for individuals which represents the reliability of sampling:
- $k < .5$ (good) suggests the estimate converges quickly
- $.5 < k < .7$ (ok) suggests the estimate converges slowly
- $.7 < k < 1$ (bad) suggests bad performance
- $k > 1$ (very bad)
- PSIS screens all cases that have bad diagnostic values. The percentage may tells some information regarding reliability of Bayesian estimation.
- Criteria: Estimated tail shape parameter
- This is new area, please see more references [@vehtari2016]
:::
```{r}
pareto_k_table(full_loo_res)
```
------------------------------------------------------------------------
## Compile the relative model fit indices
Below shows the all three model comparison fit indices for the empty model and the full model.
```{r}
data.frame(
Model = c("Full Model", "Empty Model"),
DIC = c(DIC_full, DIC_empty),
WAIC = c(WAIC_full, WAIC_empty),
LOOIC = c(LOO_full, LOO_empty)
)
```
::: nonincremental
- All fit indices suggest the full model is better than the empty model
- For simple models, the results of DIC/WAIC/LOOIC should be consistent
- In some situations that they are inconsistent, please use WAIC/LOO as standards.
:::
------------------------------------------------------------------------
## General points about Bayesian Model Comparison
::: nonincremental
- Note, WAIC and LOO will converge as sample size increases (WAIC is asymptotic value of LOO)
- Latent variable models present challenges (could be your dissertation project)
- Need log likelihood with latent variable being integrated out
- Missing data present challenges
- Need log likelihood with missing data integrated out
- Generally, using LOO is recommended (but providing both is appropriate)
:::
------------------------------------------------------------------------
## Cutting-Edge Research Field: Approximation Algorithm
::: nonincremental
- MCMC sampling could be very computational time-consuming.
- For number of parameters up to thousands or even millions, it could be a challenge to get the estimation.
- Thus, approximation algorithms has been developed, such as Laplace Approximation, Variational Inference [@dhaka2020; @yao2018], [Pathfinder methods](https://statmodeling.stat.columbia.edu/2021/08/10/pathfinder-a-parallel-quasi-newton-algorithm-for-reaching-regions-of-high-probability-mass/) [@zhang2021].
:::
Quoted from [Gelman's blog](https://statmodeling.stat.columbia.edu/2023/02/08/implementing-laplace-approximation-in-stan-whats-happening-under-the-hood/):
> ::: columns
> ::: {.column width="50%"}
> **Advantages of Laplace:**
>
> ::: nonincremental
> - Relatively cheap to compute, given that we already have a mode-finder in Stan.
> - Easy to understand, use, and communicate.
> - Works reasonably well in many examples (wherever the posterior can be well approximated by a normal distribution).
> - Easy to take draws from the normal approximation, also easy to compute importance ratios and use [Pareto-smoothed importance sampling](http://www.stat.columbia.edu/~gelman/research/unpublished/psis4.pdf).
> :::
> :::
>
> ::: {.column width="50%"}
> **Limitations of Laplace:**
>
> ::: nonincremental
> - Sometimes the normal approx is pretty bad (funnels, multimodal distributions, long tails).
>
> - Sometimes the joint mode is useless or does not even exist (funnels, etc.), in which case the model itself would need to be altered in some way to get a stable mode.
> :::
> :::
> :::
```{r}
#| message: false
#| warning: false
#| code-fold: true
#| error: false
#| eval: false
# Stan's LBFGS algorithm
fit_full_optim <- mod_full_ppp$optimize(data = data_full_new, seed = 1234, jacobian = TRUE)
fit_full_laplace <- mod_full_ppp$laplace(data = data_full_new, mode = fit_full_optim, draws = 4000)
# Run 'variational' method to use ADVI to approximate posterior
fit_full_vb <- mod_full_ppp$variational(data = data_full_new, seed = 1234, draws = 4000)
# Run 'pathfinder' method, a new alternative to the variational method
fit_pf <- mod_full_ppp$pathfinder(data = data_full_new, seed = 1234, draws = 4000)
```
------------------------------------------------------------------------
### Approximation Algorithm vs. Full MCMC
Though they are very quicker to converge, the accuracy of these approximation algorithms largely depend on the complexity of modeling (number of parameters, prior distributions, latent variables etc.)
```{r}
#| code-fold: true
#| fig-cap: "Mean and 95% Credit Interval of aproximation algorithms and MCMC"
summ_list <- lapply(c(fit_full_ppp, fit_full_laplace, fit_full_vb, fit_pf), \(x)
x$summary(c("beta", "sigma"))[c('variable', 'mean',"q5", "q95")])
summ_list[[1]]$Algorithm = "MCMC"
summ_list[[2]]$Algorithm = "Laplace Approx."
summ_list[[3]]$Algorithm = "Variational Inference"
summ_list[[4]]$Algorithm = "Pathfinder"
summ_forplot <- Reduce(rbind, summ_list)
summ_forplot$Algorithm = factor(summ_forplot$Algorithm, levels = c("MCMC", "Laplace Approx.", "Pathfinder", "Variational Inference"))
ggplot(summ_forplot) +
geom_col(aes(x = variable, y = mean, fill = Algorithm), position = position_dodge()) +
geom_errorbar(aes(x = variable, ymax = q95, ymin = q5, fill = Algorithm),position = position_dodge2(padding = 0.6)) +
labs(x = '', y = '') +
theme_classic()
```
------------------------------------------------------------------------
```{r}
library(tidyverse)
rbind(
cbind(data.frame(Algorithm = "Laplace Approx."), fit_full_laplace$draws(paste0("beta[", 1:6, "]"), format = "draws_df")[, 1:6]),
cbind(data.frame(Algorithm = "Variational Inference"), fit_full_vb$draws(paste0("beta[", 1:6, "]"), format = "draws_df")[, 1:6]),
cbind(data.frame(Algorithm = "Pathfinder"), fit_pf$draws(paste0("beta[", 1:6, "]"))[, 1:6]),
cbind(data.frame(Algorithm = "MCMC"), fit_full_ppp$draws(paste0("beta[", 1:6, "]"), format = "draws_df")[, 1:6])
) |>
mutate(Algorithm = factor(Algorithm, levels = c("MCMC", "Laplace Approx.", "Pathfinder", "Variational Inference"))) |>
pivot_longer(starts_with("beta"), names_to = "Parameter", values_to = "Draws") |>
ggplot() +
geom_density(aes(x = Draws, fill = Algorithm), alpha = 0.4) +
facet_wrap(~Parameter, scales = "free")
```
------------------------------------------------------------------------
## Wrapping up
The three lectures using linear models was built to show nearly all parts needed in a Bayesian analysis
- MCMC specifications
- Prior specifications
- Assessing MCMC convergence
- Reporting MCMC results
- Determining if a model fits the data (absolute fit)
- Determining which model fits the data better (relative fit)
All of these topics will be with us when we start model complicated models in our future lecture.
------------------------------------------------------------------------
## Next Class
1. Generalized measurement models
## Reference