Lecture 04

Linear Regression Model with Stan II

Jihong Zhang

Educational Statistics and Research Methods

Today’s Lecture Objectives

  1. Explain Stan syntax for the linear regression model

  2. Computing Functions of Model Parameters

    Download R file DietDataExample2.R

In the previous class…

R code for data read in
library(cmdstanr)
library(bayesplot)
library(tidyr)
library(dplyr)
library(kableExtra)
color_scheme_set('brightblue')
dat <- read.csv(here::here("teaching", "2024-01-12-syllabus-adv-multivariate-esrm-6553", "Lecture03", "Code", "DietData.csv"))
dat$DietGroup <- factor(dat$DietGroup, levels = 1:3)
dat$HeightIN60 <- dat$HeightIN - 60
kable( rbind(head(dat), tail(dat)) ) |> 
  kable_classic_2() |> 
  kable_styling(full_width = F, font_size = 15)
Respondent DietGroup HeightIN WeightLB HeightIN60
1 1 1 56 140 -4
2 2 1 60 155 0
3 3 1 64 143 4
4 4 1 68 161 8
5 5 1 72 139 12
6 6 1 54 159 -6
25 25 3 70 259 10
26 26 3 52 201 -8
27 27 3 59 228 -1
28 28 3 64 245 4
29 29 3 65 241 5
30 30 3 72 269 12
  1. Introduce the empty model
  2. Example: Post-Diet Weights
    • WeightLB (Dependent Variable): The respondents’ weight in pounds
    • HeightIN: The respondents’ height in inches
    • DietGroup: 1, 2, 3 representing the group to which a respondent was assigned
  3. The empty model has two parameters to be estimated: (1) \(\beta_0\), (2) \(\sigma_e\)
  4. The posterior mean/median of \(\beta_0\) should be close to the sample mean of WeightLB
  5. The posterior mean/median of \(\sigma_e\) should be close to the sample standard deviation of WeightLB

Making Stan Code Short and Efficient

The Stan syntax from our previous model was lengthy:

  • A declared variable for each parameter
  • The linear combination of coefficients by multiplying predictors

Stan has built-in features to shorten syntax:

  • Matrices/Vectors
  • Matrix products
  • Multivariate distribution (initially for prior distributions)
  • Built-in functions (e.g., a single vectorized sum() is more efficient than accumulating terms with +=)

Note: if you are interested in efficiency tuning in Stan, see this chapter for more details.

Linear Models without Matrices

The linear model from our example was:

\[ \text{WeightLB}_p = \beta_0 + \beta_1 \text{HeightIN}_p + \beta_2 \text{Group2}_p + \beta_3\text{Group3}_p \\ +\beta_4 \text{HeightIN}_p\text{Group2}_p \\ +\beta_5 \text{HeightIN}_p\text{Group3}_p \\ + e_p \]

with:

  • \(\text{Group2}_p\) the binary indicator of person \(p\) being in group 2

  • \(\text{Group3}_p\) the binary indicator of person \(p\) being in group 3

  • \(e_p \sim N(0, \sigma_e)\)

Path Diagram of the Full Model

graph LR;
  HeightIN60 --> WeightLB;
  DietGroup2 --> WeightLB;
  DietGroup3 --> WeightLB;
  HeightIN60xDietGroup2 --> WeightLB;
  HeightIN60xDietGroup3 --> WeightLB;
Figure 1

Linear Models with Matrices

Model (predictor) matrix of size 30 (rows) \(\times\) 6 (columns):

\[ \mathbf{X} = \begin{bmatrix}1 & -4 & 0 & 0 & 0 & 0\\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & 12 & 0 & 1 & 0 & 12 \end{bmatrix} \]

Coefficients vectors with the size 6 (rows) \(\times\) 1 (column):

\[ \mathbf{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5 \\ \end{bmatrix} \]

model.matrix creates a design (or model) matrix (\(X\)), e.g., by expanding factors to a set of dummy variables (depending on the contrasts) and expanding interactions similarly.

FullModelFormula = as.formula("WeightLB ~ HeightIN60 + DietGroup + HeightIN60*DietGroup")
model.matrix(FullModelFormula, data = dat) |> unique()
   (Intercept) HeightIN60 DietGroup2 DietGroup3 HeightIN60:DietGroup2
1            1         -4          0          0                     0
2            1          0          0          0                     0
3            1          4          0          0                     0
4            1          8          0          0                     0
5            1         12          0          0                     0
6            1         -6          0          0                     0
7            1          2          0          0                     0
8            1          5          0          0                     0
10           1         10          0          0                     0
11           1         -4          1          0                    -4
12           1          0          1          0                     0
13           1          4          1          0                     4
14           1          8          1          0                     8
15           1         12          1          0                    12
16           1         -6          1          0                    -6
17           1          2          1          0                     2
18           1          5          1          0                     5
20           1         10          1          0                    10
21           1         -6          0          1                     0
22           1         -2          0          1                     0
23           1          2          0          1                     0
24           1          6          0          1                     0
25           1         10          0          1                     0
26           1         -8          0          1                     0
27           1         -1          0          1                     0
28           1          4          0          1                     0
29           1          5          0          1                     0
30           1         12          0          1                     0
   HeightIN60:DietGroup3
1                      0
2                      0
3                      0
4                      0
5                      0
6                      0
7                      0
8                      0
10                     0
11                     0
12                     0
13                     0
14                     0
15                     0
16                     0
17                     0
18                     0
20                     0
21                    -6
22                    -2
23                     2
24                     6
25                    10
26                    -8
27                    -1
28                     4
29                     5
30                    12

Linear Models with Matrices (Cont.)

We then rewrite the equation from

\[ \text{WeightLB}_p = \beta_0 + \beta_1 \text{HeightIN}_p + \beta_2 \text{Group2}_p + \beta_3\text{Group3}_p \\ +\beta_4 \text{HeightIN}_p\text{Group2}_p \\ +\beta_5 \text{HeightIN}_p\text{Group3}_p \\ + e_p \]

to:

\[ \mathbf{WeightLB} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e} \]

Where:

  • \(\mathbf{WeightLB}\) is the vector of outcome (N \(\times\) 1)

  • \(\mathbf{X}\) is the model (predictor) matrix (N \(\times\) P for P - 1 predictors)

  • \(\boldsymbol{\beta}\) is the coefficients vector (P \(\times\) 1)

  • \(\mathbf{e}\) is the vector for residuals (N \(\times\) 1)

Example: Predicted Values and \(\text{R}^2\)

As in a Monte Carlo simulation, given the model matrix \(\mathbf{X}\) and the coefficient vector \(\boldsymbol{\beta}\), we can compute the predicted values \(\mathbf{X}\boldsymbol{\beta}\):

Code
set.seed(1234)
fit_lm <- lm(formula = FullModelFormula, data = dat)

beta = coefficients(fit_lm)
P = length(beta)
X = model.matrix(FullModelFormula, data = dat)
head(X%*%beta)
      [,1]
1 149.0639
2 147.5566
3 146.0493
4 144.5419
5 143.0346
6 149.8176

Calculating \(R^2\) and adjusted \(R^2\):

rss = crossprod(dat$WeightLB - X%*%beta) # residual sum of squares
tss = crossprod(dat$WeightLB - mean(dat$WeightLB)) # total sum of squares
R2 = 1 - rss / tss
R2.adjust = 1 - (rss/(nrow(dat)-P)) / (tss/((nrow(dat)-1)))
data.frame(
  R2, # r-square
  R2.adjust # adjusted. r-square
)
         R2 R2.adjust
1 0.9786724 0.9742292

lm function: \(R^2\) and adjusted \(R^2\):

summary_lm <- summary(fit_lm)
summary_lm$r.squared
[1] 0.9786724
summary_lm$adj.r.squared
[1] 0.9742292

Vectorize prior distributions

Previously, we defined a normal distribution for each regression coefficient \[ \beta_0 \sim normal(0, 1) \\ \vdots \\ \beta_p \sim normal(0, 1) \]

  • They are all univariate normal distributions
  • Issue: each parameter’s prior was independent of the others, so the prior correlation between the betas is zero and cannot be changed.

For example, the code below shows that two betas drawn from independent univariate normal distributions have a near-zero sample correlation (r = -0.025):

set.seed(1234)
beta0 = rnorm(100, 0, 1)
beta1 = rnorm(100, 0, 1)
cor(beta0, beta1)
[1] -0.02538285

Vectorize prior distributions (Cont.)

When all parameters are combined into a vector, a natural extension is a multivariate normal prior, which allows the betas to have a pre-specified prior correlation.

  • The syntax below generates two betas from a multivariate normal distribution with a correlation of .5:
set.seed(1234)
sigma_of_betas = matrix(c(1, 0.5, 0.5, 1), ncol = 2)
betas = mvtnorm::rmvnorm(100, mean = c(0, 0), sigma = sigma_of_betas)
beta0 = betas[,1]
beta1 = betas[,2]
cor(beta0, beta1)
[1] 0.5453899

Back to the Stan code, we need to specify:

  • Mean vector of betas (meanBeta; size P \(\times\) 1)
    • Put all prior means for those coefficients into a vector
  • Covariance matrix for betas (covBeta; size P \(\times\) P)
    • Put all prior variances on the diagonal and zeros off the diagonal, because we do not presume any prior correlation between the betas
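These two pieces can be assembled in R before being passed to Stan as data. A minimal sketch, assuming we want the multivariate prior to match the earlier independent univariate priors \(\beta_j \sim N(0, 100)\) (so the prior variance \(100^2\) goes on the diagonal):

```r
# Sketch: hyperparameters for a multivariate normal prior on beta,
# matching independent univariate priors beta_j ~ normal(0, 100)
P <- 6                               # number of columns in X
meanBeta <- rep(0, P)                # prior mean vector (P x 1)
covBeta  <- diag(100^2, nrow = P)    # prior variances on the diagonal,
                                     # zeros off-diagonal (no prior correlation)
# Passed to Stan as data, these would be used as:
#   beta ~ multi_normal(meanBeta, covBeta);
```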

Syntax Changes: Data Section

  • Old syntax without matrix:
data{
    int<lower=0> N;
    vector[N] weightLB;
    vector[N] height60IN;
    vector[N] group2;
    vector[N] group3;
    vector[N] heightXgroup2;
    vector[N] heightXgroup3;
}
  • New syntax with matrix:
data{
  int<lower=0> N;         // number of observations
  int<lower=0> P;         // number of predictors (plus column for intercept)
  matrix[N, P] X;         // model.matrix() from R 
  vector[N] weightLB;     // outcome
  real sigmaRate;         // hyperparameter: rate parameter for residual standard deviation
}

Syntax Changes: Parameters Section

  • Old syntax without matrix:
parameters {
  real beta0;
  real betaHeight;
  real betaGroup2;
  real betaGroup3;
  real betaHxG2;
  real betaHxG3;
  real<lower=0> sigma;
}
  • New syntax with matrix:
parameters {
  vector[P] beta;         // vector of coefficients for Beta
  real<lower=0> sigma;    // residual standard deviation
}

Syntax Changes: Prior Distributions Definition

  • Old syntax without matrix:
model {
  beta0 ~ normal(0,100);
  betaHeight ~ normal(0,100);
  betaGroup2 ~ normal(0,100);
  betaGroup3 ~ normal(0,100);
  betaHxG2 ~ normal(0,100);
  betaHxG3 ~ normal(0,100);
  sigma ~ exponential(.1); // prior for sigma
  weightLB ~ normal(
    beta0 + betaHeight * height60IN + betaGroup2 * group2 + 
    betaGroup3*group3 + betaHxG2*heightXgroup2 +
    betaHxG3*heightXgroup3, sigma);
}

New syntax with matrix:

  • multi_normal() is the multivariate normal distribution in Stan, similar to rmvnorm() in R; with uninformative (flat) priors on beta, we can simply omit the prior statement
  • exponential() is the exponential distribution sampling in Stan, similar to rexp() in R
model {
  sigma ~ exponential(sigmaRate);         // prior for sigma
  weightLB ~ normal(X*beta, sigma);       // linear model
}

A little more about the exponential distribution

  • The mean of the exp. distribution is \(\frac{1}{\lambda}\), where \(\lambda\) is called the rate parameter
  • The variance of the exp. distribution is \(\frac{1}{\lambda^2}\)
  • It is positively skewed (skewness = 2)
  • Question: which rate hyperparameter \(\lambda\) is most informative/uninformative?
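A quick simulation check of these two moments (a sketch; the rate 0.1 matches the sigmaRate value used later in this lecture):

```r
# Simulate from an exponential distribution and compare the sample
# mean and variance against the theoretical 1/lambda and 1/lambda^2
set.seed(1234)
lambda <- 0.1
x <- rexp(1e6, rate = lambda)
c(mean = mean(x), var = var(x))   # approximately 10 and 100
```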
library(tidyr)
library(dplyr)
library(ggplot2)
rate_list = seq(0.1, 1, 0.2)
pdf_points = sapply(rate_list, \(x) dexp(seq(0, 20, 0.01), x)) |> as.data.frame()
colnames(pdf_points) <- rate_list
pdf_points$x = seq(0, 20, 0.01)
pdf_points %>% 
  pivot_longer(-x, values_to = 'y') %>% 
  mutate(
    sigmaRate = factor(name, levels = rate_list)
    ) %>% 
  ggplot() +
  geom_path(aes(x = x, y = y, color = sigmaRate, group = sigmaRate), size = 1.2) +
  scale_x_continuous(limits = c(0, 20)) +
  labs(x = "Sigma")
Figure 2: PDF for the exponential distribution by varied rate parameters

Since we talked about Exponential distribution…

Let’s dive deeper into the Laplace distribution, sometimes called the double-exponential distribution; the exponential distribution corresponds to its positive half.

\[ \text{PDF}_{exp.} = \lambda e^{-\lambda x} \]

\[ \text{PDF}_{laplace} = \frac{1}{2b} e^{-\frac{|x - \mu|}{b}} \]

  • Thus, for \(x > 0\), the Laplace distribution with location 0 and scale parameter \(b = \frac{1}{\lambda}\) matches the exponential distribution up to a factor of 2 (the other half of its mass lies on the negative axis).

  • Laplace-based distributions, the Cauchy, and the horseshoe all belong to the family of so-called “shrinkage” priors.

  • Shrinkage priors are very useful for high-dimensional data (say, P = 1000) and for variable selection
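The exponential–Laplace relation above can be checked numerically. A minimal sketch with a hand-rolled Laplace density (so it does not depend on the LaplacesDemon package): for \(x > 0\) and \(b = 1/\lambda\), the Laplace density is exactly half the exponential density.

```r
# Hand-rolled Laplace(location = 0, scale = b) density
dlap <- function(x, b) exp(-abs(x) / b) / (2 * b)

lambda <- 0.5
x <- seq(0.1, 10, by = 0.1)
# For x > 0: dexp(x, lambda) equals 2 * dlap(x, b = 1/lambda)
max(abs(dexp(x, rate = lambda) - 2 * dlap(x, b = 1 / lambda)))  # ~ 0
```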
library(LaplacesDemon)
b_list = 1 / rate_list * 2
pdf_points = sapply(b_list, \(x) dlaplace(seq(-20, 20, 0.01), scale = x, location = 0)) |> as.data.frame()
colnames(pdf_points) <- round(b_list, 2)
pdf_points$x = seq(-20, 20, 0.01)
pdf_points %>% 
  pivot_longer(-x, values_to = 'y') %>% 
  mutate(
    scale = factor(name, levels = round(b_list, 2))
    ) %>% 
  ggplot() +
  geom_path(aes(x = x, y = y, color = scale, group = scale), size = 1.2) +
  scale_x_continuous(limits = c(-20, 20)) +
  labs(x = "")

Compare results and computational time

Code
mod_full_old <- cmdstan_model("Code/FullModel_Old.stan")
data_full_old <- list(
  N = nrow(dat),
  weightLB = dat$WeightLB,
  height60IN = dat$HeightIN60,
  group2 = as.numeric(dat$DietGroup == 2),
  group3 = as.numeric(dat$DietGroup == 3),
  heightXgroup2 = as.numeric(dat$DietGroup == 2) * dat$HeightIN60,
  heightXgroup3 = as.numeric(dat$DietGroup == 3) * dat$HeightIN60
)
fit_full_old <- mod_full_old$sample(
  data = data_full_old,
  seed = 1234,
  chains = 4,
  parallel_chains = 4,
  refresh = 0
)
fit_full_old$summary()[, -c(9, 10)]
# A tibble: 8 × 8
  variable      mean  median    sd   mad     q5     q95  rhat
  <chr>        <dbl>   <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>
1 lp__       -76.6   -76.2   2.13  1.95  -80.6  -73.8    1.01
2 beta0      148.    148.    3.20  3.13  142.   153.     1.00
3 betaHeight  -0.364  -0.363 0.489 0.470  -1.16   0.446  1.00
4 betaGroup2 -24.2   -24.2   4.54  4.42  -31.6  -16.8    1.00
5 betaGroup3  81.4    81.4   4.24  4.16   74.5   88.5    1.00
6 betaHxG2     2.47    2.46  0.701 0.670   1.34   3.61   1.00
7 betaHxG3     3.53    3.54  0.643 0.624   2.45   4.57   1.00
8 sigma        8.25    8.12  1.22  1.17    6.48  10.5    1.00
Code
mod_full_new <- cmdstan_model("Code/FullModel_New.stan")
FullModelFormula = as.formula("WeightLB ~ HeightIN60 + DietGroup + HeightIN60*DietGroup")
X = model.matrix(FullModelFormula, data = dat)
data_full_new <- list(
  N = nrow(dat),
  P = ncol(X),
  X = X, 
  weightLB = dat$WeightLB,
  sigmaRate = 0.1
)
fit_full_new <- mod_full_new$sample(
  data = data_full_new,
  seed = 1234,
  chains = 4,
  parallel_chains = 4
)
fit_full_new$summary()[, -c(9, 10)]
# A tibble: 8 × 8
  variable    mean  median    sd   mad     q5     q95  rhat
  <chr>      <dbl>   <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>
1 lp__     -76.6   -76.2   2.16  1.95  -80.8  -73.8    1.00
2 beta[1]  148.    148.    3.19  3.08  142.   153.     1.00
3 beta[2]   -0.379  -0.372 0.495 0.471  -1.20   0.405  1.00
4 beta[3]  -24.3   -24.3   4.51  4.37  -31.8  -16.8    1.00
5 beta[4]   81.2    81.2   4.27  4.18   74.4   88.1    1.00
6 beta[5]    2.48    2.49  0.707 0.687   1.36   3.62   1.00
7 beta[6]    3.57    3.56  0.660 0.633   2.49   4.64   1.00
8 sigma      8.24    8.09  1.25  1.18    6.50  10.5    1.00

The differences between the two methods:

  • betaGroup3 shows the largest difference between the two methods
cbind(fit_full_old$summary()[,1], fit_full_old$summary()[, -c(1, 9, 10)] - fit_full_new$summary()[, -c(1, 9, 10)])
    variable         mean     median           sd          mad         q5
1       lp__  0.015618675 -0.0550000 -0.023326021 -0.004892580  0.2645100
2      beta0 -0.112772250 -0.0700000  0.001166413  0.045960600 -0.1153500
3 betaHeight  0.015533644  0.0089945 -0.005855640 -0.001316475  0.0423085
4 betaGroup2  0.136471188  0.0847000  0.029303048  0.048406890  0.2121050
5 betaGroup3  0.176476525  0.1883500 -0.028798191 -0.024462900  0.1808050
6   betaHxG2 -0.017625369 -0.0331000 -0.006077692 -0.016871988 -0.0219895
7   betaHxG3 -0.042820269 -0.0212450 -0.016735330 -0.009325554 -0.0386880
8      sigma  0.004710577  0.0330500 -0.029157760 -0.011460498 -0.0182685
         q95          rhat
1 -0.0272600  0.0042598650
2 -0.3451500 -0.0007086869
3  0.0408371 -0.0006499651
4  0.0097750 -0.0009373594
5  0.3091200  0.0002512441
6 -0.0121660  0.0001402769
7 -0.0733165  0.0001885706
8 -0.0385700  0.0017376955

Compare computational time

  • The Stan code with matrix has faster computation:
fit_full_old$time()
$total
[1] 0.2092421

$chains
  chain_id warmup sampling total
1        1  0.046    0.043 0.089
2        2  0.049    0.039 0.088
3        3  0.051    0.042 0.093
4        4  0.053    0.041 0.094
fit_full_new$time()
$total
[1] 0.1789031

$chains
  chain_id warmup sampling total
1        1  0.030    0.028 0.058
2        2  0.033    0.031 0.064
3        3  0.032    0.031 0.063
4        4  0.034    0.028 0.062
  • Pros: With matrices, there is less syntax to write

    • Model is equivalent
    • More efficient sampling (vectorized matrix operations)
    • More flexible: modify matrix elements in R instead of individual parameters in Stan
  • Cons: Output, however, is not labeled with respect to parameters

    • May have to label output
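One way to address the labeling issue is to map beta[1], …, beta[P] back to the column names of the model matrix. A small sketch with a toy design (the variable names here are hypothetical, not from the diet data):

```r
# The column order of model.matrix() is the order of beta[1]...beta[P],
# so colnames(X) provides labels for the Stan output
toy <- data.frame(x1 = rnorm(6), g = factor(rep(1:2, 3)))
X <- model.matrix(~ x1 * g, data = toy)
paste0("beta[", seq_len(ncol(X)), "] = ", colnames(X))
```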

Computing Functions of Parameters

  • Often, we need to compute some linear or non-linear function of parameters in a linear model

    • Missing effects - beta for diet group 2 and 3

    • Model fit indices: \(R^2\)

    • Transformed effects - residual variance \(\sigma^2\)

  • In non-Bayesian (frequentist) analyses, these are often formed from the point estimates of the parameters (with standard errors derived from the second derivative of the log-likelihood function)

  • For Bayesian analyses, however, we seek to build the posterior distribution for any function of parameters

    • This means applying the function to all posterior samples
    • It is especially useful when you want to propose your new statistic

Example: Need Slope for Diet Group 2

Recall our model:

\[ \text{WeightLB}_p = \beta_0 + \beta_1 \text{HeightIN}_p + \beta_2 \text{Group2}_p + \beta_3\text{Group3}_p \\ +\beta_4 \text{HeightIN}_p\text{Group2}_p \\ +\beta_5 \text{HeightIN}_p\text{Group3}_p \\ + e_p \]

Here, \(\beta_1\) denotes the average change in \(\text{WeightLB}\) per one-unit increase in \(\text{HeightIN}\) for members of the reference group (Diet Group 1).

Question: What about the slope for members in Diet Group 2?

  • Typically, we can calculate it by hand by setting \(\text{Group2}\) to 1 and collecting all effects involving \(\text{HeightIN}\):

    \[ \beta_{\text{group2}}*\text{HeightIN} = (\beta_1 + \beta_4*1 + \beta_5*0)*\text{HeightIN} \]

    \[ \beta_{\text{group2}}= \beta_1 +\beta_4 \]

  • Similarly, the intercept for Group 2 (the expected \(\text{WeightLB}\) when \(\text{HeightIN60} = 0\)) is \(\beta_0 + \beta_2\).

Computing slope for Diet Group 2

Our task: create the posterior distribution of the slope for Diet Group 2

  • We must do so for each iteration we’ve kept from our MCMC chain

  • A somewhat tedious way to do this is to compute it in R after sampling with Stan:

fit_full_new$summary()
# A tibble: 8 × 10
  variable    mean  median    sd   mad     q5     q95  rhat ess_bulk ess_tail
  <chr>      <dbl>   <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>    <dbl>    <dbl>
1 lp__     -76.6   -76.2   2.16  1.95  -80.8  -73.8    1.00    1368.    1812.
2 beta[1]  148.    148.    3.19  3.08  142.   153.     1.00    1346.    1830.
3 beta[2]   -0.379  -0.372 0.495 0.471  -1.20   0.405  1.00    1506.    1877.
4 beta[3]  -24.3   -24.3   4.51  4.37  -31.8  -16.8    1.00    1600.    2171.
5 beta[4]   81.2    81.2   4.27  4.18   74.4   88.1    1.00    1669.    1945.
6 beta[5]    2.48    2.49  0.707 0.687   1.36   3.62   1.00    1803.    2270.
7 beta[6]    3.57    3.56  0.660 0.633   2.49   4.64   1.00    1770.    1698.
8 sigma      8.24    8.09  1.25  1.18    6.50  10.5    1.00    2206.    2616.
beta_group2 <- fit_full_new$draws("beta[2]")  + fit_full_new$draws("beta[5]")
summary(beta_group2)
# A tibble: 1 × 10
  variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
  <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 beta[2]   2.10   2.11 0.492 0.473  1.31  2.92  1.00    4049.    3289.

Computing slope within Stan

Stan can compute these values for us with the “generated quantities” block of the syntax:

Stan code
data{
  int<lower=0> N;         // number of observations
  int<lower=0> P;         // number of predictors (plus column for intercept)
  matrix[N, P] X;         // model.matrix() from R 
  vector[N] weightLB;     // outcome
  real sigmaRate;         // hyperparameter: prior rate parameter for residual standard deviation
}
parameters {
  vector[P] beta;         // vector of coefficients for Beta
  real<lower=0> sigma;    // residual standard deviation
}
model {
  sigma ~ exponential(sigmaRate);         // prior for sigma
  weightLB ~ normal(X*beta, sigma);       // linear model
}
generated quantities{
  real slopeG2;
  slopeG2 = beta[2] + beta[5];
}

The generated quantities block computes values that do not affect the posterior distributions of the parameters: they are computed after the sampling at each iteration.

  • The values are then added to the Stan object and can be seen in the summary

    • They can also be plotted using bayesplot package
mod_full_compute <- cmdstan_model("Code/FullModel_compute.stan")
fit_full_compute <- mod_full_compute$sample(
  data = data_full_new,
  seed = 1234,
  chains = 4,
  parallel_chains = 4,
  refresh = 0
)
fit_full_compute$summary('slopeG2')
# A tibble: 1 × 10
  variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
  <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 slopeG2   2.10   2.11 0.492 0.473  1.31  2.92  1.00    4049.    3289.
bayesplot::mcmc_dens_chains(fit_full_compute$draws('slopeG2'))

Alternative way of computing the slope with matrices

This is a slightly more complicated but more flexible method.

That is, we can make use of matrix operations and form a contrast matrix:

  • Contrasts are linear combinations of parameters

    • You may have used these in R via the glht() function in the multcomp package

To use this, we form a contrast matrix of size \(C \times P\), where C is the number of contrasts:

  • The entries of this matrix are the values that multiply the coefficients

    • For \((\beta_1 + \beta_4)\) this would be:

      • a “1” in the corresponding entry for \(\beta_1\)

      • a “1” in the corresponding entry for \(\beta_4\)

      • “0”s elsewhere

    • \[ \mathbf{C} = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 & 0 \end{bmatrix} \]

  • Then, the contrast matrix is multiplied by the coefficients vector to form the values:

    • \(\mathbf{C}\boldsymbol{\beta}\)

Contrasts in Stan

Stan code
data{
  int<lower=0> N;         // number of observations
  int<lower=0> P;         // number of predictors (plus column for intercept)
  matrix[N, P] X;         // model.matrix() from R 
  vector[N] weightLB;     // outcome
  real sigmaRate;         // hyperparameter: prior rate parameter for residual standard deviation
  int<lower=0> nContrasts;
  matrix[nContrasts, P] contrast; // C matrix 
}
parameters {
  vector[P] beta;         // vector of coefficients for Beta
  real<lower=0> sigma;    // residual standard deviation
}
model {
  sigma ~ exponential(sigmaRate);         // prior for sigma
  weightLB ~ normal(X*beta, sigma);       // linear model
}
generated quantities{
  vector[nContrasts] computedEffects;
  computedEffects = contrast*beta;
}
R code
mod_full_contrast <- cmdstan_model("Code/FullModel_contrast.stan")
contrast_dat <- list(
  nContrasts = 2,
  contrast = matrix(
    c(0,1,0,0,1,0, # slope for diet group2
      1,0,1,0,0,0),# intercept for diet group 2
    nrow = 2, byrow = TRUE
  )
)
fit_full_contrast <- mod_full_contrast$sample(
  data = c(data_full_new, contrast_dat),
  seed = 1234,
  chains = 4,
  parallel_chains = 4,
  refresh = 0
)
fit_full_contrast$summary('computedEffects')[, -c(9, 10)]
# A tibble: 2 × 8
  variable             mean median    sd   mad     q5    q95  rhat
  <chr>               <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>
1 computedEffects[1]   2.10   2.11 0.492 0.473   1.31   2.92  1.00
2 computedEffects[2] 123.   123.   3.19  3.05  118.   129.    1.00
bayesplot::mcmc_hist(fit_full_contrast$draws('computedEffects'))

Computing \(\text{R}^2\)

We can use the generated quantities section to build a posterior distribution for \(\text{R}^2\)

There are several formulas for \(\text{R}^2\); we will use the following:

\[ \text{R}^2 = 1-\frac{RSS}{TSS} = 1- \frac{\Sigma_{p=1}^{N}(y_p -\hat{y}_p)^2}{\Sigma_{p=1}^{N}(y_p -\bar{y})^2} \] Where:

  1. RSS is the residual sum of squares
  2. TSS is the total sum of squares of dependent variable
  3. \(\hat{y}_p\) is the predicted value for person \(p\): \(\hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\beta}\)
  4. \(\bar{y}\) is the mean of the dependent variable: \(\bar{y} = \frac{\Sigma_{p=1}^{N}y_p}{N}\)

Notice: RSS depends on sampled parameters, so we will use this to build our posterior distribution for \(\text{R}^2\)

For adjusted \(\text{R}^2\), we use the following:

\[ \text{adj.R}^2 = 1-\frac{RSS/(N-P)}{TSS/(N-1)} = 1- \frac{\Sigma_{p=1}^{N}(y_p -\hat{y}_p)^2}{\Sigma_{p=1}^{N}(y_p -\bar{y})^2}\times\frac{N-1}{N-P} \]

Then, \(\text{adj.R}^2\) can be computed directly from \(\text{R}^2\):

\[ \text{adj.R}^2 = 1-(1-\text{R}^2)\times\frac{N-1}{N-P} = \frac{(N-1)\text{R}^2-(P-1)}{N-P} \]
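The \((N-1)/(N-P)\) factor can be verified numerically against the lm results reported earlier (\(\text{R}^2 = 0.9786724\), \(N = 30\), \(P = 6\)):

```r
# Numeric check: adjusted R^2 from R^2 via the (N-1)/(N-P) factor,
# using the values reported by lm() earlier in this lecture
R2 <- 0.9786724
N  <- 30
P  <- 6
R2adj <- 1 - (1 - R2) * (N - 1) / (N - P)
R2adj   # approximately 0.9742292, matching summary(fit_lm)$adj.r.squared
```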

Stan code for Computing \(\text{R}^2\)

Stan code
generated quantities{
  vector[nContrasts] computedEffects;
  computedEffects = contrast*beta;
  // compute R2
  real rss;
  real tss;
  real R2;
  real R2adj;
  {// anything in these brackets will not appear in summary table
    vector[N] pred = X*beta;
    rss = dot_self(weightLB-pred); // dot_self(v) returns the dot product of v with itself (sum of squares)
    tss = dot_self(weightLB-mean(weightLB));
  }
  R2 = 1-rss/tss;
  R2adj = 1-(rss/(N-P))/(tss/(N-1));
}

Recall that our lm function provides \(\text{R}^2\) as 0.9787 and adjusted \(\text{R}^2\) as 0.9742

fit_full_contrast$summary(c('rss', 'tss', 'R2','R2adj'))[, -c(9, 10)]
# A tibble: 4 × 8
  variable      mean    median        sd       mad        q5       q95  rhat
  <chr>        <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl> <dbl>
1 rss       1938.     1867.    287.      223.       1614.     2501.     1.00
2 tss      71112     71112       0         0       71112     71112     NA   
3 R2           0.973     0.974   0.00404   0.00313     0.965     0.977  1.00
4 R2adj        0.967     0.968   0.00488   0.00378     0.957     0.973  1.00
bayesplot::mcmc_hist(fit_full_contrast$draws(c('R2', 'R2adj')))

Get posterior mode

# Create the function.
getmode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Calculate the mode using the user function.
getmode(fit_full_contrast$draws('R2'))
[1] 0.974221
getmode(fit_full_contrast$draws('R2adj'))
[1] 0.96885
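A caveat about getmode() above: with continuous MCMC draws, nearly every sampled value is unique, so the tabulate-based mode effectively returns an arbitrary draw. A kernel-density-based estimate is more stable; here is a minimal sketch, tested on simulated draws rather than the Stan output:

```r
# Density-based posterior mode: take the x-location of the peak of a
# kernel density estimate of the draws
getmode_dens <- function(v) {
  d <- density(as.numeric(v))
  d$x[which.max(d$y)]
}

# Example with simulated "draws" from Beta(20, 2), whose true mode is
# (20 - 1) / (20 + 2 - 2) = 0.95:
set.seed(1234)
draws <- rbeta(4000, 20, 2)
getmode_dens(draws)   # close to 0.95
```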

Wrapping up

Today we further added generated quantities into our Bayesian toolset:

  • How to make Stan use less syntax using matrices
  • How to form posterior distributions for functions of parameters

We will use both of these features in psychometric models.

Next Class

  1. Bayesian Model fit
  2. Bayesian Model Comparison