Lecture 04

Linear Regression Model with Stan II

Jihong Zhang

Educational Statistics and Research Methods

Today’s Lecture Objectives

  1. Explain Stan syntax for the linear regression model

  2. Computing Functions of Model Parameters

    Download R file DietDataExample2.R

In the previous class…

R code for data read in
library(cmdstanr)
library(bayesplot)
library(tidyr)
library(dplyr)
library(kableExtra)
color_scheme_set('brightblue')
dat <- read.csv(here::here("teaching", "2024-01-12-syllabus-adv-multivariate-esrm-6553", "Lecture03", "Code", "DietData.csv"))
dat$DietGroup <- factor(dat$DietGroup, levels = 1:3)
dat$HeightIN60 <- dat$HeightIN - 60
kable( rbind(head(dat), tail(dat)) ) |> 
  kable_classic_2() |> 
  kable_styling(full_width = F, font_size = 15)
Respondent DietGroup HeightIN WeightLB HeightIN60
1 1 1 56 140 -4
2 2 1 60 155 0
3 3 1 64 143 4
4 4 1 68 161 8
5 5 1 72 139 12
6 6 1 54 159 -6
25 25 3 70 259 10
26 26 3 52 201 -8
27 27 3 59 228 -1
28 28 3 64 245 4
29 29 3 65 241 5
30 30 3 72 269 12
  1. Introduce the empty model
  2. Example: Post-Diet Weights
    • WeightLB (Dependent Variable): The respondents’ weight in pounds
    • HeightIN: The respondents’ height in inches
    • DietGroup: 1, 2, 3 representing the group to which a respondent was assigned
  3. The empty model has two parameters to be estimated: (1) \(\beta_0\), (2) \(\sigma_e\)
  4. The posterior mean/median of \(\beta_0\) should be close to the sample mean of WeightLB
  5. The posterior mean/median of \(\sigma_e\) should be close to the sample standard deviation of WeightLB

Making Stan Code Short and Efficient

The Stan syntax from our previous model was lengthy:

  • A declared variable for each parameter
  • The linear combination of coefficients by multiplying predictors

Stan has built-in features to shorten syntax:

  • Matrices/Vectors
  • Matrix products
  • Multivariate distribution (initially for prior distributions)
  • Built-in functions (e.g., a single vectorized sum() is more efficient than accumulating terms with +=)

Note: if you are interested in efficiency tuning in Stan, see this chapter for more details.

Linear Models without Matrices

The linear model from our example was:

\[ \text{WeightLB}_p = \beta_0 + \beta_1 \text{HeightIN}_p + \beta_2 \text{Group2}_p + \beta_3\text{Group3}_p \\ +\beta_4 \text{HeightIN}_p\text{Group2}_p \\ +\beta_5 \text{HeightIN}_p\text{Group3}_p \\ + e_p \]

with:

  • \(\text{Group2}_p\) the binary indicator of person \(p\) being in group 2

  • \(\text{Group3}_p\) the binary indicator of person \(p\) being in group 3

  • \(e_p \sim N(0, \sigma_e)\)

Path Diagram of the Full Model

graph LR;
  HeightIN60 --> WeightLB;
  DietGroup2 --> WeightLB;
  DietGroup3 --> WeightLB;
  HeightIN60xDietGroup2 --> WeightLB;
  HeightIN60xDietGroup3 --> WeightLB;
Figure 1

Linear Models with Matrices

Model (predictor) matrix of size 30 (rows) \(\times\) 6 (columns):

\[ \mathbf{X} = \begin{bmatrix}1 & -4 & 0 & 0 & 0 & 0\\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & 12 & 0 & 1 & 0 & 12 \end{bmatrix} \]

Coefficients vectors with the size 6 (rows) \(\times\) 1 (column):

\[ \mathbf{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5 \\ \end{bmatrix} \]

model.matrix creates a design (or model) matrix (\(X\)), e.g., by expanding factors to a set of dummy variables (depending on the contrasts) and expanding interactions similarly.

FullModelFormula = as.formula("WeightLB ~ HeightIN60 + DietGroup + HeightIN60*DietGroup")
model.matrix(FullModelFormula, data = dat) |> unique()
   (Intercept) HeightIN60 DietGroup2 DietGroup3 HeightIN60:DietGroup2
1            1         -4          0          0                     0
2            1          0          0          0                     0
3            1          4          0          0                     0
4            1          8          0          0                     0
5            1         12          0          0                     0
6            1         -6          0          0                     0
7            1          2          0          0                     0
8            1          5          0          0                     0
10           1         10          0          0                     0
11           1         -4          1          0                    -4
12           1          0          1          0                     0
13           1          4          1          0                     4
14           1          8          1          0                     8
15           1         12          1          0                    12
16           1         -6          1          0                    -6
17           1          2          1          0                     2
18           1          5          1          0                     5
20           1         10          1          0                    10
21           1         -6          0          1                     0
22           1         -2          0          1                     0
23           1          2          0          1                     0
24           1          6          0          1                     0
25           1         10          0          1                     0
26           1         -8          0          1                     0
27           1         -1          0          1                     0
28           1          4          0          1                     0
29           1          5          0          1                     0
30           1         12          0          1                     0
   HeightIN60:DietGroup3
1                      0
2                      0
3                      0
4                      0
5                      0
6                      0
7                      0
8                      0
10                     0
11                     0
12                     0
13                     0
14                     0
15                     0
16                     0
17                     0
18                     0
20                     0
21                    -6
22                    -2
23                     2
24                     6
25                    10
26                    -8
27                    -1
28                     4
29                     5
30                    12

Linear Models with Matrices (Cont.)

We then rewrite the equation from

\[ \text{WeightLB}_p = \beta_0 + \beta_1 \text{HeightIN}_p + \beta_2 \text{Group2}_p + \beta_3\text{Group3}_p \\ +\beta_4 \text{HeightIN}_p\text{Group2}_p \\ +\beta_5 \text{HeightIN}_p\text{Group3}_p \\ + e_p \]

to:

\[ \mathbf{WeightLB} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e} \]

Where:

  • \(\mathbf{WeightLB}\) is the vector of outcome (N \(\times\) 1)

  • \(\mathbf{X}\) is the model (predictor) matrix (N \(\times\) P for P - 1 predictors)

  • \(\boldsymbol{\beta}\) is the coefficients vector (P \(\times\) 1)

  • \(\mathbf{e}\) is the vector for residuals (N \(\times\) 1)

Example: Predicted Values and \(\text{R}^2\)

As in a Monte Carlo simulation, given the model matrix \(\mathbf{X}\) and the coefficient vector \(\boldsymbol{\beta}\), we can compute the predicted values \(\mathbf{X}\boldsymbol{\beta}\):

Code
set.seed(1234)
fit_lm <- lm(formula = FullModelFormula, data = dat)

beta = coefficients(fit_lm)
P = length(beta)
X = model.matrix(FullModelFormula, data = dat)
head(X%*%beta)
      [,1]
1 149.0639
2 147.5566
3 146.0493
4 144.5419
5 143.0346
6 149.8176

Calculating \(R^2\) and adjusted \(R^2\):

rss = crossprod(dat$WeightLB - X%*%beta) # residual sum of squares
tss = crossprod(dat$WeightLB - mean(dat$WeightLB)) # total sum of squares
R2 = 1 - rss / tss
R2.adjust = 1 - (rss/(nrow(dat)-P)) / (tss/((nrow(dat)-1)))
data.frame(
  R2, # r-square
  R2.adjust # adjusted. r-square
)
         R2 R2.adjust
1 0.9786724 0.9742292

lm function: \(R^2\) and adjusted \(R^2\):

summary_lm <- summary(fit_lm)
summary_lm$r.squared
[1] 0.9786724
summary_lm$adj.r.squared
[1] 0.9742292

Vectorize prior distributions

Previously, we defined a normal distribution for each regression coefficient \[ \beta_0 \sim normal(0, 1) \\ \vdots \\ \beta_p \sim normal(0, 1) \]

  • They are all univariate normal distributions
  • Issue: each parameter’s prior was independent of the others, so the prior correlation between the betas is zero and cannot be changed.

For example, the code below shows that two betas drawn from independent univariate normal distributions have a near-zero sample correlation (r = -0.025):

set.seed(1234)
beta0 = rnorm(100, 0, 1)
beta1 = rnorm(100, 0, 1)
cor(beta0, beta1)
[1] -0.02538285

Vectorize prior distributions (Cont.)

When all parameters are combined into a vector, a natural extension is a multivariate normal prior, which allows the betas to have a pre-specified prior correlation.

  • The syntax below generates two betas from a multivariate normal distribution with a correlation of .5:
set.seed(1234)
sigma_of_betas = matrix(c(1, 0.5, 0.5, 1), ncol = 2)
betas = mvtnorm::rmvnorm(100, mean = c(0, 0), sigma = sigma_of_betas)
beta0 = betas[,1]
beta1 = betas[,2]
cor(beta0, beta1)
[1] 0.5453899

Back to the Stan code, we need to specify:

  • Mean vector of betas (meanBeta; size P \(\times\) 1)
    • Put all prior means for those coefficients into a vector
  • Covariance matrix for betas (covBeta; size P \(\times\) P)
    • Put all prior variances on the diagonal and zeros off the diagonal, because we do not presume any prior correlation between the betas
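These two pieces can be assembled in R before being passed to Stan as data. A minimal sketch, assuming we want the multivariate prior to match the earlier independent univariate priors \(\beta_j \sim N(0, 100)\) (so the prior variance \(100^2\) goes on the diagonal):

```r
# Sketch: hyperparameters for a multivariate normal prior on beta,
# matching independent univariate priors beta_j ~ normal(0, 100)
P <- 6                               # number of columns in X
meanBeta <- rep(0, P)                # prior mean vector (P x 1)
covBeta  <- diag(100^2, nrow = P)    # prior variances on the diagonal,
                                     # zeros off-diagonal (no prior correlation)
# Passed to Stan as data, these would be used as:
#   beta ~ multi_normal(meanBeta, covBeta);
```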

Syntax Changes: Data Section

  • Old syntax without matrix:
data{
    int<lower=0> N;
    vector[N] weightLB;
    vector[N] height60IN;
    vector[N] group2;
    vector[N] group3;
    vector[N] heightXgroup2;
    vector[N] heightXgroup3;
}
  • New syntax with matrix:
data{
  int<lower=0> N;         // number of observations
  int<lower=0> P;         // number of predictors (plus column for intercept)
  matrix[N, P] X;         // model.matrix() from R 
  vector[N] weightLB;     // outcome
  real sigmaRate;         // hyperparameter: rate parameter for residual standard deviation
}

Syntax Changes: Parameters Section

  • Old syntax without matrix:
parameters {
  real beta0;
  real betaHeight;
  real betaGroup2;
  real betaGroup3;
  real betaHxG2;
  real betaHxG3;
  real<lower=0> sigma;
}
  • New syntax with matrix:
parameters {
  vector[P] beta;         // vector of coefficients for Beta
  real<lower=0> sigma;    // residual standard deviation
}

Syntax Changes: Prior Distributions Definition

  • Old syntax without matrix:
model {
  beta0 ~ normal(0,100);
  betaHeight ~ normal(0,100);
  betaGroup2 ~ normal(0,100);
  betaGroup3 ~ normal(0,100);
  betaHxG2 ~ normal(0,100);
  betaHxG3 ~ normal(0,100);
  sigma ~ exponential(.1); // prior for sigma
  weightLB ~ normal(
    beta0 + betaHeight * height60IN + betaGroup2 * group2 + 
    betaGroup3*group3 + betaHxG2*heightXgroup2 +
    betaHxG3*heightXgroup3, sigma);
}

New syntax with matrix:

  • multi_normal() is the multivariate normal distribution in Stan, similar to rmvnorm() in R; with uninformative (flat) priors on beta, we can simply omit the prior statement
  • exponential() is the exponential distribution sampling in Stan, similar to rexp() in R
model {
  sigma ~ exponential(sigmaRate);         // prior for sigma
  weightLB ~ normal(X*beta, sigma);       // linear model
}

A little more about the exponential distribution

  • The mean of the exp. distribution is \(\frac{1}{\lambda}\), where \(\lambda\) is called the rate parameter
  • The variance of the exp. distribution is \(\frac{1}{\lambda^2}\)
  • It is positively skewed (skewness = 2)
  • Question: which rate hyperparameter \(\lambda\) is most informative/uninformative?
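A quick simulation check of these two moments (a sketch; the rate 0.1 matches the sigmaRate value used later in this lecture):

```r
# Simulate from an exponential distribution and compare the sample
# mean and variance against the theoretical 1/lambda and 1/lambda^2
set.seed(1234)
lambda <- 0.1
x <- rexp(1e6, rate = lambda)
c(mean = mean(x), var = var(x))   # approximately 10 and 100
```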
library(tidyr)
library(dplyr)
library(ggplot2)
rate_list = seq(0.1, 1, 0.2)
pdf_points = sapply(rate_list, \(x) dexp(seq(0, 20, 0.01), x)) |> as.data.frame()
colnames(pdf_points) <- rate_list
pdf_points$x = seq(0, 20, 0.01)
pdf_points %>% 
  pivot_longer(-x, values_to = 'y') %>% 
  mutate(
    sigmaRate = factor(name, levels = rate_list)
    ) %>% 
  ggplot() +
  geom_path(aes(x = x, y = y, color = sigmaRate, group = sigmaRate), size = 1.2) +
  scale_x_continuous(limits = c(0, 20)) +
  labs(x = "Sigma")
Figure 2: PDF for the exponential distribution by varied rate parameters

Since we talked about Exponential distribution…

Let’s dive deeper into the Laplace distribution, sometimes called the double-exponential distribution; the exponential distribution corresponds to its positive half.

\[ \text{PDF}_{exp.} = \lambda e^{-\lambda x} \]

\[ \text{PDF}_{laplace} = \frac{1}{2b} e^{-\frac{|x - \mu|}{b}} \]

  • Thus, for \(x > 0\), the Laplace distribution with location 0 and scale parameter \(b = \frac{1}{\lambda}\) matches the exponential distribution up to a factor of 2 (the other half of its mass lies on the negative axis).

  • Laplace-based distributions, the Cauchy, and the horseshoe all belong to the family of so-called “shrinkage” priors.

  • Shrinkage priors are very useful for high-dimensional data (say, P = 1000) and for variable selection
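The exponential–Laplace relation above can be checked numerically. A minimal sketch with a hand-rolled Laplace density (so it does not depend on the LaplacesDemon package): for \(x > 0\) and \(b = 1/\lambda\), the Laplace density is exactly half the exponential density.

```r
# Hand-rolled Laplace(location = 0, scale = b) density
dlap <- function(x, b) exp(-abs(x) / b) / (2 * b)

lambda <- 0.5
x <- seq(0.1, 10, by = 0.1)
# For x > 0: dexp(x, lambda) equals 2 * dlap(x, b = 1/lambda)
max(abs(dexp(x, rate = lambda) - 2 * dlap(x, b = 1 / lambda)))  # ~ 0
```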
library(LaplacesDemon)
b_list = 1 / rate_list * 2
pdf_points = sapply(b_list, \(x) dlaplace(seq(-20, 20, 0.01), scale = x, location = 0)) |> as.data.frame()
colnames(pdf_points) <- round(b_list, 2)
pdf_points$x = seq(-20, 20, 0.01)
pdf_points %>% 
  pivot_longer(-x, values_to = 'y') %>% 
  mutate(
    scale = factor(name, levels = round(b_list, 2))
    ) %>% 
  ggplot() +
  geom_path(aes(x = x, y = y, color = scale, group = scale), size = 1.2) +
  scale_x_continuous(limits = c(-20, 20)) +
  labs(x = "")

Compare results and computational time

Code
mod_full_old <- cmdstan_model("Code/FullModel_Old.stan")
data_full_old <- list(
  N = nrow(dat),
  weightLB = dat$WeightLB,
  height60IN = dat$HeightIN60,
  group2 = as.numeric(dat$DietGroup == 2),
  group3 = as.numeric(dat$DietGroup == 3),
  heightXgroup2 = as.numeric(dat$DietGroup == 2) * dat$HeightIN60,
  heightXgroup3 = as.numeric(dat$DietGroup == 3) * dat$HeightIN60
)
fit_full_old <- mod_full_old$sample(
  data = data_full_old,
  seed = 1234,
  chains = 4,
  parallel_chains = 4,
  refresh = 0
)
fit_full_old$summary()[, -c(9, 10)]
# A tibble: 8 × 8
  variable      mean  median    sd   mad     q5     q95  rhat
  <chr>        <dbl>   <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>
1 lp__       -76.6   -76.2   2.13  1.95  -80.6  -73.8    1.01
2 beta0      148.    148.    3.20  3.13  142.   153.     1.00
3 betaHeight  -0.364  -0.363 0.489 0.470  -1.16   0.446  1.00
4 betaGroup2 -24.2   -24.2   4.54  4.42  -31.6  -16.8    1.00
5 betaGroup3  81.4    81.4   4.24  4.16   74.5   88.5    1.00
6 betaHxG2     2.47    2.46  0.701 0.670   1.34   3.61   1.00
7 betaHxG3     3.53    3.54  0.643 0.624   2.45   4.57   1.00
8 sigma        8.25    8.12  1.22  1.17    6.48  10.5    1.00
Code
mod_full_new <- cmdstan_model("Code/FullModel_New.stan")
FullModelFormula = as.formula("WeightLB ~ HeightIN60 + DietGroup + HeightIN60*DietGroup")
X = model.matrix(FullModelFormula, data = dat)
data_full_new <- list(
  N = nrow(dat),
  P = ncol(X),
  X = X, 
  weightLB = dat$WeightLB,
  sigmaRate = 0.1
)
fit_full_new <- mod_full_new$sample(
  data = data_full_new,
  seed = 1234,
  chains = 4,
  parallel_chains = 4
)
fit_full_new$summary()[, -c(9, 10)]
# A tibble: 8 × 8
  variable    mean  median    sd   mad     q5     q95  rhat
  <chr>      <dbl>   <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>
1 lp__     -76.6   -76.2   2.16  1.95  -80.8  -73.8    1.00
2 beta[1]  148.    148.    3.19  3.08  142.   153.     1.00
3 beta[2]   -0.379  -0.372 0.495 0.471  -1.20   0.405  1.00
4 beta[3]  -24.3   -24.3   4.51  4.37  -31.8  -16.8    1.00
5 beta[4]   81.2    81.2   4.27  4.18   74.4   88.1    1.00
6 beta[5]    2.48    2.49  0.707 0.687   1.36   3.62   1.00
7 beta[6]    3.57    3.56  0.660 0.633   2.49   4.64   1.00
8 sigma      8.24    8.09  1.25  1.18    6.50  10.5    1.00

The differences between the two methods:

  • betaGroup3 shows the largest difference between the two methods
cbind(fit_full_old$summary()[,1], fit_full_old$summary()[, -c(1, 9, 10)] - fit_full_new$summary()[, -c(1, 9, 10)])
    variable         mean     median           sd          mad         q5
1       lp__  0.015618675 -0.0550000 -0.023326021 -0.004892580  0.2645100
2      beta0 -0.112772250 -0.0700000  0.001166413  0.045960600 -0.1153500
3 betaHeight  0.015533644  0.0089945 -0.005855640 -0.001316475  0.0423085
4 betaGroup2  0.136471188  0.0847000  0.029303048  0.048406890  0.2121050
5 betaGroup3  0.176476525  0.1883500 -0.028798191 -0.024462900  0.1808050
6   betaHxG2 -0.017625369 -0.0331000 -0.006077692 -0.016871988 -0.0219895
7   betaHxG3 -0.042820269 -0.0212450 -0.016735330 -0.009325554 -0.0386880
8      sigma  0.004710577  0.0330500 -0.029157760 -0.011460498 -0.0182685
         q95          rhat
1 -0.0272600  0.0042598650
2 -0.3451500 -0.0007086869
3  0.0408371 -0.0006499651
4  0.0097750 -0.0009373594
5  0.3091200  0.0002512441
6 -0.0121660  0.0001402769
7 -0.0733165  0.0001885706
8 -0.0385700  0.0017376955

Compare computational time

  • The Stan code with matrix has faster computation:
fit_full_old$time()
$total
[1] 0.2092421

$chains
  chain_id warmup sampling total
1        1  0.046    0.043 0.089
2        2  0.049    0.039 0.088
3        3  0.051    0.042 0.093
4        4  0.053    0.041 0.094
fit_full_new$time()
$total
[1] 0.1789031

$chains
  chain_id warmup sampling total
1        1  0.030    0.028 0.058
2        2  0.033    0.031 0.064
3        3  0.032    0.031 0.063
4        4  0.034    0.028 0.062
  • Pros: With matrices, there is less syntax to write

    • Model is equivalent
    • More efficient sampling (vectorized matrix operations)
    • More flexible: modify matrix elements in R instead of individual parameters in Stan
  • Cons: Output, however, is not labeled with respect to parameters

    • May have to label output
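One way to address the labeling issue is to map beta[1], …, beta[P] back to the column names of the model matrix. A small sketch with a toy design (the variable names here are hypothetical, not from the diet data):

```r
# The column order of model.matrix() is the order of beta[1]...beta[P],
# so colnames(X) provides labels for the Stan output
toy <- data.frame(x1 = rnorm(6), g = factor(rep(1:2, 3)))
X <- model.matrix(~ x1 * g, data = toy)
paste0("beta[", seq_len(ncol(X)), "] = ", colnames(X))
```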

Computing Functions of Parameters

  • Often, we need to compute some linear or non-linear function of parameters in a linear model

    • Missing effects - beta for diet group 2 and 3

    • Model fit indices: \(R^2\)

    • Transformed effects - residual variance \(\sigma^2\)

  • In non-Bayesian (frequentist) analyses, these are often formed from the point estimates of the parameters (with standard errors derived from the second derivative of the log-likelihood function)

  • For Bayesian analyses, however, we seek to build the posterior distribution for any function of parameters

    • This means applying the function to all posterior samples
    • It is especially useful when you want to propose your new statistic

Example: Need Slope for Diet Group 2

Recall our model:

\[ \text{WeightLB}_p = \beta_0 + \beta_1 \text{HeightIN}_p + \beta_2 \text{Group2}_p + \beta_3\text{Group3}_p \\ +\beta_4 \text{HeightIN}_p\text{Group2}_p \\ +\beta_5 \text{HeightIN}_p\text{Group3}_p \\ + e_p \]

Here, \(\beta_1\) denotes the average change in \(\text{WeightLB}\) per one-unit increase in \(\text{HeightIN}\) for members of the reference group (Diet Group 1).

Question: What about the slope for members in Diet Group 2?

  • Typically, we can calculate it by hand by setting \(\text{Group2}\) to 1 and collecting all effects involving \(\text{HeightIN}\):

    \[ \beta_{\text{group2}}*\text{HeightIN} = (\beta_1 + \beta_4*1 + \beta_5*0)*\text{HeightIN} \]

    \[ \beta_{\text{group2}}= \beta_1 +\beta_4 \]

  • Similarly, the intercept for Group 2 (the expected \(\text{WeightLB}\) when \(\text{HeightIN60} = 0\)) is \(\beta_0 + \beta_2\).

Computing slope for Diet Group 2

Our task: create the posterior distribution of the slope for Diet Group 2

  • We must do so for each iteration we’ve kept from our MCMC chain

  • A somewhat tedious way to do this is to compute it in R after sampling with Stan:

fit_full_new$summary()
# A tibble: 8 × 10
  variable    mean  median    sd   mad     q5     q95  rhat ess_bulk ess_tail
  <chr>      <dbl>   <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>    <dbl>    <dbl>
1 lp__     -76.6   -76.2   2.16  1.95  -80.8  -73.8    1.00    1368.    1812.
2 beta[1]  148.    148.    3.19  3.08  142.   153.     1.00    1346.    1830.
3 beta[2]   -0.379  -0.372 0.495 0.471  -1.20   0.405  1.00    1506.    1877.
4 beta[3]  -24.3   -24.3   4.51  4.37  -31.8  -16.8    1.00    1600.    2171.
5 beta[4]   81.2    81.2   4.27  4.18   74.4   88.1    1.00    1669.    1945.
6 beta[5]    2.48    2.49  0.707 0.687   1.36   3.62   1.00    1803.    2270.
7 beta[6]    3.57    3.56  0.660 0.633   2.49   4.64   1.00    1770.    1698.
8 sigma      8.24    8.09  1.25  1.18    6.50  10.5    1.00    2206.    2616.
beta_group2 <- fit_full_new$draws("beta[2]")  + fit_full_new$draws("beta[5]")
summary(beta_group2)
# A tibble: 1 × 10
  variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
  <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 beta[2]   2.10   2.11 0.492 0.473  1.31  2.92  1.00    4049.    3289.

Computing slope within Stan

Stan can compute these values for us with the “generated quantities” block of the syntax:

Stan code
data{
  int<lower=0> N;         // number of observations
  int<lower=0> P;         // number of predictors (plus column for intercept)
  matrix[N, P] X;         // model.matrix() from R 
  vector[N] weightLB;     // outcome
  real sigmaRate;         // hyperparameter: prior rate parameter for residual standard deviation
}
parameters {
  vector[P] beta;         // vector of coefficients for Beta
  real<lower=0> sigma;    // residual standard deviation
}
model {
  sigma ~ exponential(sigmaRate);         // prior for sigma
  weightLB ~ normal(X*beta, sigma);       // linear model
}
generated quantities{
  real slopeG2;
  slopeG2 = beta[2] + beta[5];
}

The generated quantities block computes values that do not affect the posterior distributions of the parameters: they are computed after the sampling at each iteration.

  • The values are then added to the Stan object and can be seen in the summary

    • They can also be plotted using bayesplot package
mod_full_compute <- cmdstan_model("Code/FullModel_compute.stan")
fit_full_compute <- mod_full_compute$sample(
  data = data_full_new,
  seed = 1234,
  chains = 4,
  parallel_chains = 4,
  refresh = 0
)
fit_full_compute$summary('slopeG2')
# A tibble: 1 × 10
  variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
  <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 slopeG2   2.10   2.11 0.492 0.473  1.31  2.92  1.00    4049.    3289.
bayesplot::mcmc_dens_chains(fit_full_compute$draws('slopeG2'))

Alternative way of computing the slope with matrices

This is a slightly more complicated but more flexible method.

That is, we can make use of matrix operations and form a contrast matrix:

  • Contrasts are linear combinations of parameters

    • You may have used these in R via the glht() function in the multcomp package

To use this, we form a contrast matrix of size \(C \times P\), where C is the number of contrasts:

  • The entries of this matrix are the values that multiply the coefficients

    • For \((\beta_1 + \beta_4)\) this would be:

      • a “1” in the corresponding entry for \(\beta_1\)

      • a “1” in the corresponding entry for \(\beta_4\)

      • “0”s elsewhere

    • \[ \mathbf{C} = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 & 0 \end{bmatrix} \]

  • Then, the contrast matrix is multiplied by the coefficients vector to form the values:

    • \(\mathbf{C}\boldsymbol{\beta}\)

Contrasts in Stan

Stan code
data{
  int<lower=0> N;         // number of observations
  int<lower=0> P;         // number of predictors (plus column for intercept)
  matrix[N, P] X;         // model.matrix() from R 
  vector[N] weightLB;     // outcome
  real sigmaRate;         // hyperparameter: prior rate parameter for residual standard deviation
  int<lower=0> nContrasts;
  matrix[nContrasts, P] contrast; // C matrix 
}
parameters {
  vector[P] beta;         // vector of coefficients for Beta
  real<lower=0> sigma;    // residual standard deviation
}
model {
  sigma ~ exponential(sigmaRate);         // prior for sigma
  weightLB ~ normal(X*beta, sigma);       // linear model
}
generated quantities{
  vector[nContrasts] computedEffects;
  computedEffects = contrast*beta;
}
R code
mod_full_contrast <- cmdstan_model("Code/FullModel_contrast.stan")
contrast_dat <- list(
  nContrasts = 2,
  contrast = matrix(
    c(0,1,0,0,1,0, # slope for diet group2
      1,0,1,0,0,0),# intercept for diet group 2
    nrow = 2, byrow = TRUE
  )
)
fit_full_contrast <- mod_full_contrast$sample(
  data = c(data_full_new, contrast_dat),
  seed = 1234,
  chains = 4,
  parallel_chains = 4,
  refresh = 0
)
fit_full_contrast$summary('computedEffects')[, -c(9, 10)]
# A tibble: 2 × 8
  variable             mean median    sd   mad     q5    q95  rhat
  <chr>               <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>
1 computedEffects[1]   2.10   2.11 0.492 0.473   1.31   2.92  1.00
2 computedEffects[2] 123.   123.   3.19  3.05  118.   129.    1.00
bayesplot::mcmc_hist(fit_full_contrast$draws('computedEffects'))

Computing \(\text{R}^2\)

We can use the generated quantities section to build a posterior distribution for \(\text{R}^2\)

There are several formulas for \(\text{R}^2\); we will use the following:

\[ \text{R}^2 = 1-\frac{RSS}{TSS} = 1- \frac{\Sigma_{p=1}^{N}(y_p -\hat{y}_p)^2}{\Sigma_{p=1}^{N}(y_p -\bar{y})^2} \] Where:

  1. RSS is the residual sum of squares
  2. TSS is the total sum of squares of dependent variable
  3. \(\hat{y}_p\) is the predicted value for person \(p\): \(\hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\beta}\)
  4. \(\bar{y}\) is the mean of the dependent variable: \(\bar{y} = \frac{\Sigma_{p=1}^{N}y_p}{N}\)

Notice: RSS depends on sampled parameters, so we will use this to build our posterior distribution for \(\text{R}^2\)

For adjusted \(\text{R}^2\), we use the following:

\[ \text{adj.R}^2 = 1-\frac{RSS/(N-P)}{TSS/(N-1)} = 1- \frac{\Sigma_{p=1}^{N}(y_p -\hat{y}_p)^2}{\Sigma_{p=1}^{N}(y_p -\bar{y})^2}\times\frac{N-1}{N-P} \]

Then, \(\text{adj.R}^2\) can be computed directly from \(\text{R}^2\):

\[ \text{adj.R}^2 = 1-(1-\text{R}^2)\times\frac{N-1}{N-P} = \frac{(N-1)\text{R}^2-(P-1)}{N-P} \]
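The \((N-1)/(N-P)\) factor can be verified numerically against the lm results reported earlier (\(\text{R}^2 = 0.9786724\), \(N = 30\), \(P = 6\)):

```r
# Numeric check: adjusted R^2 from R^2 via the (N-1)/(N-P) factor,
# using the values reported by lm() earlier in this lecture
R2 <- 0.9786724
N  <- 30
P  <- 6
R2adj <- 1 - (1 - R2) * (N - 1) / (N - P)
R2adj   # approximately 0.9742292, matching summary(fit_lm)$adj.r.squared
```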

Stan code for Computing \(\text{R}^2\)

Stan code
generated quantities{
  vector[nContrasts] computedEffects;
  computedEffects = contrast*beta;
  // compute R2
  real rss;
  real tss;
  real R2;
  real R2adj;
  {// anything in these brackets will not appear in summary table
    vector[N] pred = X*beta;
    rss = dot_self(weightLB-pred); // dot_self(v) returns the dot product of v with itself (sum of squares)
    tss = dot_self(weightLB-mean(weightLB));
  }
  R2 = 1-rss/tss;
  R2adj = 1-(rss/(N-P))/(tss/(N-1));
}

Recall that our lm function provides \(\text{R}^2\) as 0.9787 and adjusted \(\text{R}^2\) as 0.9742

fit_full_contrast$summary(c('rss', 'tss', 'R2','R2adj'))[, -c(9, 10)]
# A tibble: 4 × 8
  variable      mean    median        sd       mad        q5       q95  rhat
  <chr>        <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl> <dbl>
1 rss       1938.     1867.    287.      223.       1614.     2501.     1.00
2 tss      71112     71112       0         0       71112     71112     NA   
3 R2           0.973     0.974   0.00404   0.00313     0.965     0.977  1.00
4 R2adj        0.967     0.968   0.00488   0.00378     0.957     0.973  1.00
bayesplot::mcmc_hist(fit_full_contrast$draws(c('R2', 'R2adj')))

Get posterior mode

# Create the function.
getmode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Calculate the mode using the user function.
getmode(fit_full_contrast$draws('R2'))
[1] 0.974221
getmode(fit_full_contrast$draws('R2adj'))
[1] 0.96885
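A caveat about getmode() above: with continuous MCMC draws, nearly every sampled value is unique, so the tabulate-based mode effectively returns an arbitrary draw. A kernel-density-based estimate is more stable; here is a minimal sketch, tested on simulated draws rather than the Stan output:

```r
# Density-based posterior mode: take the x-location of the peak of a
# kernel density estimate of the draws
getmode_dens <- function(v) {
  d <- density(as.numeric(v))
  d$x[which.max(d$y)]
}

# Example with simulated "draws" from Beta(20, 2), whose true mode is
# (20 - 1) / (20 + 2 - 2) = 0.95:
set.seed(1234)
draws <- rbeta(4000, 20, 2)
getmode_dens(draws)   # close to 0.95
```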

Wrapping up

Today we further added generated quantities into our Bayesian toolset:

  • How to make Stan use less syntax using matrices
  • How to form posterior distributions for functions of parameters

We will use both of these features in psychometric models.

Next Class

  1. Bayesian Model fit
  2. Bayesian Model Comparison