Experimental Design in Education
Educational Statistics and Research Methods (ESRM) Program
University of Arkansas
2025-08-18
Analysis of Variance (ANOVA) is a statistical method for comparing means across three or more groups.
When comparing only 2 groups, a t-test is typically used (though mathematically equivalent to ANOVA with 2 groups).
When comparing 3 or more groups, ANOVA with the F-test is the appropriate method: running many pairwise t-tests instead would inflate the overall Type I error rate.
ANOVA compares group means by analyzing variance components.
ANOVA uses two independent variance estimates: the between-group variance and the within-group variance.
If the between-group variability is substantially larger than the within-group variability, we conclude that groups differ significantly.
Total Sum of Squares (\(SS_{total}\)): Total variability in the outcome variable across all observations.
Sum of Squares Between (\(SS_{between}\)): Variability attributable to differences between group means (systematic variation).
Sum of Squares Within (\(SS_{within}\)): Variability within groups, representing random error (unsystematic variation).
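Written out as formulas, the three quantities look as follows. The notation is an assumption introduced here for illustration: \(Y_{ij}\) is the score of observation \(i\) in group \(j\), \(\bar{Y}_j\) is the mean of group \(j\), \(\bar{Y}\) is the grand mean, \(n_j\) is the size of group \(j\), and \(k\) is the number of groups.

\[SS_{total} = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y})^2\]
\[SS_{between} = \sum_{j=1}^{k} n_j (\bar{Y}_j - \bar{Y})^2\]
\[SS_{within} = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2\]
\[SS_{total} = SS_{between} + SS_{within}\]

The last line is the partition that ANOVA relies on: every unit of total variability is assigned either to differences between group means or to scatter around those means.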
Assume that we want to examine the effect of a drug on a disease, where a higher score indicates more severe symptoms. There are two groups: one control group (placebo) and one treatment group.
The score distributions of the two groups look like this:
Df Sum Sq Mean Sq F value Pr(>F)
group 1 218.42 218.42 264.1 <2e-16 ***
Residuals 38 31.43 0.83
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation: Residuals represents the within-group variability. group represents the between-group variability. With small within-group variability (SS = 31.43) and clear separation between groups, the F-statistic is large and highly significant. The between-group variance substantially exceeds the within-group variance, providing strong evidence that the groups differ.
Df Sum Sq Mean Sq F value Pr(>F)
group 1 113.4 113.43 5.485 0.0245 *
Residuals 38 785.8 20.68
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation: With larger within-group variability (SS = 785.8), the overlap between groups increases. The between-group mean square (113.43) is now only about 5.5 times the within-group mean square (20.68), whereas in the previous table the ratio was about 264. This results in a smaller F-statistic and reduced statistical power to detect group differences.
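The contrast between the two tables can be reproduced in a small simulation. This is a minimal sketch, not the code behind the tables above: the group means (10 and 5), the two standard deviations (1 and 5), the seed, and the helper `f_value()` are all assumptions chosen for illustration.

```r
# Hypothetical two-group illustration; means, SDs, and seed are assumptions,
# not the values that produced the ANOVA tables in the text.
set.seed(2025)

n     <- 20                                          # observations per group
group <- factor(rep(c("control", "treatment"), each = n))
mu    <- rep(c(10, 5), each = n)                     # assumed true group means
z     <- rnorm(2 * n)                                # shared standard-normal noise

# Same mean difference, different within-group spread
tight <- data.frame(group = group, score = mu + 1 * z)   # within-group SD = 1
noisy <- data.frame(group = group, score = mu + 5 * z)   # within-group SD = 5

# Helper: extract the F statistic from a one-way ANOVA table
f_value <- function(d) {
  summary(aov(score ~ group, data = d))[[1]][["F value"]][1]
}

f_tight <- f_value(tight)
f_noisy <- f_value(noisy)

cat("F with small within-group spread:", round(f_tight, 2), "\n")
cat("F with large within-group spread:", round(f_noisy, 2), "\n")
```

Both datasets reuse the same noise vector `z`, so the only thing that changes is how widely observations scatter around their group means. The F-statistic shrinks as that scatter grows.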
The F-statistic checks whether the group means are well separated relative to how tightly the observations cluster within each group.
We will discuss this in more detail in the next lecture.
[1] 7.929343 22.774292 30.844412 -3.456977 24.291247 25.060559 14.252600
[8] 14.533681 14.355480 11.099622
Explanation: The rnorm() function generates random values from a normal distribution. The set.seed() function ensures reproducibility by fixing the random number generation sequence.
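The role of `set.seed()` can be checked directly. The call that produced the values above is not shown, so the arguments used here (seed 1234, borrowed from the dataset code in this section, with mean 20 and SD 10) are assumptions for the sketch.

```r
# Fix the seed so the "random" draws can be reproduced exactly.
set.seed(1234)
x1 <- rnorm(10, mean = 20, sd = 10)   # 10 draws from a normal with mean 20, SD 10

# Resetting the same seed restarts the sequence, so the draws repeat.
set.seed(1234)
x2 <- rnorm(10, mean = 20, sd = 10)

print(identical(x1, x2))   # same seed and same call give the same values

# Without resetting the seed, the generator continues its sequence,
# so the next batch of draws is different.
x3 <- rnorm(10, mean = 20, sd = 10)
print(identical(x1, x3))
```

Setting the seed once at the top of a script is usually enough to make the whole analysis reproducible from run to run.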
Context: This example focuses on whether three different teaching methods (labeled as G1, G2, G3) affect students’ test scores.
In total, 40 students are assigned to three teaching groups and one control group. Each group has 10 students.
set.seed(1234)
# Create dataset with EQUAL variances (SD = 5 for all groups)
data <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 5), rnorm(10, 25, 5), rnorm(10, 30, 5), rnorm(10, 22, 5))
)
# Create dataset with UNEQUAL variances (SD ranges from 0.1 to 10 across groups)
data_unequal <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 10), rnorm(10, 25, 5), rnorm(10, 30, 1), rnorm(10, 22, 0.1))
)
Note: We create two datasets to demonstrate the importance of homogeneity of variance:
data: Equal variances (homoscedastic) - meets ANOVA assumptions
data_unequal: Unequal variances (heteroscedastic) - violates ANOVA assumptions
Df Sum Sq Mean Sq F value Pr(>F)
group 3 724.1 241.37 11.45 2.04e-05 ***
Residuals 36 759.2 21.09
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Df Sum Sq Mean Sq F value Pr(>F)
group 3 1413 470.9 19.14 1.37e-07 ***
Residuals 36 886 24.6
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Comparison:
- Both ANOVAs show significant F-statistics (p < 0.05), indicating group differences
- However, the unequal variance dataset violates ANOVA assumptions, making these results potentially unreliable
- For data_unequal, we should use Welch’s ANOVA instead: oneway.test(score ~ group, data = data_unequal)
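The comparison can be run end to end as sketched below. The dataset code is repeated so the block is self-contained. Whether the printed tables match the ones above depends on the state of the random number generator when the original was rendered, so no specific output is claimed here.

```r
set.seed(1234)

# Equal variances (SD = 5 in every group)
data <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 5), rnorm(10, 25, 5), rnorm(10, 30, 5), rnorm(10, 22, 5))
)

# Unequal variances (SD ranges from 0.1 to 10)
data_unequal <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 10), rnorm(10, 25, 5), rnorm(10, 30, 1), rnorm(10, 22, 0.1))
)

# Classical one-way ANOVA, which assumes equal variances across groups
fit_equal   <- aov(score ~ group, data = data)
fit_unequal <- aov(score ~ group, data = data_unequal)
print(summary(fit_equal))
print(summary(fit_unequal))

# Welch's ANOVA: oneway.test() uses var.equal = FALSE by default,
# so it does not assume homogeneity of variance
welch <- oneway.test(score ~ group, data = data_unequal)
print(welch)
```

Welch's test typically reports a non-integer denominator degrees of freedom, because it adjusts for the unequal group variances.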
Interactive Exercise
This exercise uses WebR to run R code directly in your browser. Click “Run Code” to execute the code and see the results. You can also modify the code and re-run it to explore different scenarios!
A psychology researcher is investigating the effectiveness of different study techniques on exam performance. The researcher randomly assigns 60 college students to three different study groups:
Each group contains 20 students. The researcher measures final exam scores (0-100 scale).
Your Task: Complete the code below by filling in the missing sections to conduct ANOVA and extract variance components.
Hints for completing the exercise
Use aov([Outcome_Name] ~ [Group_Name], data = data_strong) to run the analysis
Call summary() on the ANOVA result
Complete Solution with Full Code:
Interpretation
Variance Components:
When you run the code above, you’ll observe:
\(SS_{between}\) and \(MS_{between}\) are relatively large (variability due to different study techniques)
\(SS_{within}\) and \(MS_{within}\) are substantially smaller (variability within each group)
Statistical Results:
The F-statistic will be large and highly significant (p < .001).
Conclusion:
The between-group variability is much larger than the within-group variability. This indicates that study technique has a strong and significant effect on exam performance. The F-statistic is very large and highly significant, providing strong evidence that at least two study techniques produce different mean exam scores.
The groups are well-separated, with minimal overlap. Students within each group perform similarly to each other (small \(SS_{within}\)), but there are substantial differences between the average performance of different study technique groups (large \(SS_{between}\)).
Your Task: Complete the code below by filling in the missing sections to conduct ANOVA and extract variance components.
Hints for completing the exercise
Use aov(score ~ group, data = data_weak) to run the analysis
Call summary() on the ANOVA result
Question to consider: How do the variance components in this scenario compare to Scenario A? What does this tell you about the treatment effect?
Complete Solution with Full Code:
Interpretation
Variance Components:
When you run the code above, you’ll observe:
\(SS_{between}\) and \(MS_{between}\) are relatively small (variability due to different study techniques)
\(SS_{within}\) and \(MS_{within}\) are substantially larger (variability within each group)
Statistical Results:
The F-statistic will be small and not statistically significant (p > .05).
Conclusion:
The between-group variability is much smaller than the within-group variability. This indicates that study technique has little to no effect on exam performance. The F-statistic is small and not statistically significant, providing insufficient evidence to conclude that study techniques differ in their effectiveness.
There is substantial overlap between groups. The variability in exam scores within each study technique group is large (\(SS_{within}\)), overwhelming any small differences that might exist between the average performance of different groups (\(SS_{between}\)). Individual differences among students (captured by \(SS_{within}\)) account for far more of the variability in exam scores than the study technique they used.
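The two scenarios can be compared side by side by pulling the variance components out of the ANOVA table. The exercise data are not shown here, so the `data_strong` and `data_weak` built below are hypothetical stand-ins: the group labels, means, standard deviations, seed, and the helper `variance_components()` are all assumptions.

```r
# Hypothetical stand-ins for the exercise datasets (3 groups of 20 students).
set.seed(123)
techniques <- factor(rep(c("T1", "T2", "T3"), each = 20))
z <- rnorm(60)   # shared noise, so only the design differs between scenarios

# Strong effect: well-separated means, small within-group spread
data_strong <- data.frame(group = techniques,
                          score = rep(c(65, 75, 85), each = 20) + 3 * z)

# Weak effect: nearly identical means, large within-group spread
data_weak <- data.frame(group = techniques,
                        score = rep(c(74, 75, 76), each = 20) + 8 * z)

# Extract SS, MS, and F from the one-way ANOVA table
variance_components <- function(d) {
  tab <- summary(aov(score ~ group, data = d))[[1]]
  c(SS_between = tab[["Sum Sq"]][1],
    SS_within  = tab[["Sum Sq"]][2],
    MS_between = tab[["Mean Sq"]][1],
    MS_within  = tab[["Mean Sq"]][2],
    F          = tab[["F value"]][1])
}

vc_strong <- variance_components(data_strong)
vc_weak   <- variance_components(data_weak)
print(round(vc_strong, 2))
print(round(vc_weak, 2))
```

In the first row of the ANOVA table, `group` carries the between-group components. The second row, `Residuals`, carries the within-group components.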
Understanding the Ratio \(MS_{between} / MS_{within}\)
Large ratio (Scenario A): When \(MS_{between} \gg MS_{within}\), the treatment effect is strong and detectable. Groups are well-separated with minimal within-group variability.
Small ratio (Scenario B): When \(MS_{between} \ll MS_{within}\), the treatment effect is weak and difficult to detect. Groups overlap substantially due to large within-group variability.
The F-statistic is exactly this ratio, where each mean square is a sum of squares divided by its degrees of freedom: \[F = \frac{MS_{between}}{MS_{within}} = \frac{SS_{between}/df_{between}}{SS_{within}/df_{within}}\]
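To see that nothing more is going on inside `aov()`, the F-statistic can be computed by hand from the sums of squares and checked against the built-in result. This is a minimal sketch on hypothetical data; the group labels, means, SD, and seed are assumptions.

```r
# Hypothetical data: three groups of 20 with assumed means and SD
set.seed(42)
d <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  score = c(rnorm(20, 70, 8), rnorm(20, 75, 8), rnorm(20, 80, 8))
)

grand_mean <- mean(d$score)
group_mean <- ave(d$score, d$group)   # each observation's own group mean

# Sums of squares
ss_between <- sum((group_mean - grand_mean)^2)   # summing over observations weights each group by n_j
ss_within  <- sum((d$score - group_mean)^2)
ss_total   <- sum((d$score - grand_mean)^2)

# Degrees of freedom
df_between <- nlevels(d$group) - 1          # k - 1
df_within  <- nrow(d) - nlevels(d$group)    # N - k

# F as the ratio of mean squares
f_manual <- (ss_between / df_between) / (ss_within / df_within)
f_aov    <- summary(aov(score ~ group, data = d))[[1]][["F value"]][1]

print(all.equal(ss_total, ss_between + ss_within))   # the SS partition holds
print(all.equal(f_manual, f_aov))                    # manual F matches aov()
```

The same arithmetic applies to the first two-group table shown earlier: \(218.42 / (31.43 / 38) \approx 264.1\), which is the reported F value.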
Practical implication: To detect treatment effects, researchers should:
Null Hypothesis (\(H_0\)): All groups have the same within-group variance.
Equal-variance data:
Bartlett test of homogeneity of variances
data: score by group
Bartlett's K-squared = 2.0115, df = 3, p-value = 0.57
Unequal-variance data:
Bartlett test of homogeneity of variances
data: score by group
Bartlett's K-squared = 82.755, df = 3, p-value < 2.2e-16
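Interpretation: For the equal-variance data, p = 0.57 > .05, so we fail to reject \(H_0\) and the group variances can be treated as equal. For the unequal-variance data, p < 2.2e-16, so we reject \(H_0\) and conclude that the group variances differ.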
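Bartlett's test is available in base R as `bartlett.test()`. The sketch below applies it to the two datasets defined earlier, repeating their construction so the block runs on its own. The exact statistics depend on the random number state, so the printed values may differ from those shown above.

```r
set.seed(1234)

data <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 5), rnorm(10, 25, 5), rnorm(10, 30, 5), rnorm(10, 22, 5))
)
data_unequal <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 10), rnorm(10, 25, 5), rnorm(10, 30, 1), rnorm(10, 22, 0.1))
)

# H0: all group variances are equal
bt_equal   <- bartlett.test(score ~ group, data = data)
bt_unequal <- bartlett.test(score ~ group, data = data_unequal)

print(bt_equal)
print(bt_unequal)

# The p-values can also be pulled out directly for reporting
cat("p (equal-variance data):  ", format.pval(bt_equal$p.value), "\n")
cat("p (unequal-variance data):", format.pval(bt_unequal$p.value), "\n")
```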
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 3 0.1779 0.9107
36
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 3 3.4749 0.02581 *
36
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
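Interpretation: The first test (F = 0.1779, p = 0.9107) gives no evidence against equal variances. The second test (F = 3.4749, p = 0.02581) rejects equal variances at the .05 level.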
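The output format above matches what `car::leveneTest()` prints. To avoid depending on that package, the sketch below implements the median-centered version in base R: a one-way ANOVA on the absolute deviations of each score from its group median. The helper name `levene_median()` is hypothetical, and the printed values may differ from those above depending on the random number state.

```r
set.seed(1234)

data <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 5), rnorm(10, 25, 5), rnorm(10, 30, 5), rnorm(10, 22, 5))
)
data_unequal <- data.frame(
  group = rep(c("G1", "G2", "G3", "Control"), each = 10),
  score = c(rnorm(10, 20, 10), rnorm(10, 25, 5), rnorm(10, 30, 1), rnorm(10, 22, 0.1))
)

# Hypothetical helper: Levene's test with center = median
levene_median <- function(formula, data) {
  mf <- model.frame(formula, data)
  y  <- mf[[1]]                                   # outcome
  g  <- factor(mf[[2]])                           # grouping variable
  abs_dev <- abs(y - ave(y, g, FUN = median))     # |score - group median|
  summary(aov(abs_dev ~ g))[[1]]                  # ANOVA table on the deviations
}

lev_equal   <- levene_median(score ~ group, data)
lev_unequal <- levene_median(score ~ group, data_unequal)

print(lev_equal)
print(lev_unequal)
```

Because the test works on deviations from the median rather than on the raw variances, it is less sensitive to outliers and non-normality than Bartlett's test.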
Use Levene’s Test (more frequently used) when:
Use Bartlett’s Test when:
aov()
Df Sum Sq Mean Sq F value Pr(>F)
group 3 666.6 222.20 7.578 0.000471 ***
Residuals 36 1055.5 29.32
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
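Interpretation: The F-test is significant, F(3, 36) = 7.578, p = 0.000471, so we reject the null hypothesis that all group means are equal. At least one group mean differs from the others.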
Df Sum Sq Mean Sq F value Pr(>F)
group 3 724.1 241.37 11.45 2.04e-05 ***
Residuals 36 759.2 21.09
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
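Interpretation: The F-test is significant, F(3, 36) = 11.45, p = 2.04e-05, so we reject the null hypothesis that all group means are equal. At least one group mean differs from the others.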

ESRM 64503