Experimental Design in Education
Educational Statistics and Research Methods (ESRM) Program
University of Arkansas
2025-02-24
Class Outline
There are three basic types of experimental research designs:
- Pre-experimental designs: no control group
- True experimental designs: control group and random assignment
- Quasi-experimental designs: control group but no random assignment; assignment is often based on pre-existing criteria.
A true experimental design also has different subtypes.
True experimental designs are characterized by random assignment of participants to groups (and, ideally, random selection of participants from the population).
These designs help control for extraneous variables.
| Group | Treatment | Post-test |
|---|---|---|
| 1 | X | O |
| 2 |  | O |
Example
A researcher wants to determine if a new reading intervention program improves reading comprehension in second-grade students.
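Students are randomly assigned either to the new program or to the standard curriculum, and both groups take a reading comprehension test before and after the intervention. By comparing the change from pre-test to post-test between the two groups, the researcher can determine whether the intervention program caused a significant improvement in reading comprehension compared to the standard curriculum.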
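One simple way to make this comparison concrete is to compute each group's average gain (post-test minus pre-test) and take the difference between the two gains. The sketch below uses made-up scores purely for illustration, and the helper name `mean_gain` is hypothetical; in practice a significance test or an ANCOVA with the pre-test as a covariate would usually accompany this point estimate.

```python
from statistics import mean

def mean_gain(pre, post):
    """Average post-test minus pre-test change for one group (hypothetical helper)."""
    return mean(b - a for a, b in zip(pre, post))

# Made-up comprehension scores for five students per group (illustration only)
intervention_pre = [52, 48, 60, 55, 50]
intervention_post = [68, 61, 72, 70, 63]
control_pre = [51, 49, 58, 56, 52]
control_post = [57, 53, 64, 60, 58]

gain_treat = mean_gain(intervention_pre, intervention_post)
gain_ctrl = mean_gain(control_pre, control_post)
effect = gain_treat - gain_ctrl  # extra gain attributable to the program

print(f"Intervention gain: {gain_treat:.1f}, control gain: {gain_ctrl:.1f}, "
      f"difference: {effect:.1f}")
```

Comparing gains rather than raw post-test scores adjusts for any small pre-test differences that remain after random assignment.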
Example
Imagine a study evaluating the effectiveness of a new diversity and inclusion training program in a company. Researchers are concerned that a pre-test measuring employees’ attitudes might make them more aware of the issues and thus influence their responses on the post-test, regardless of the training’s quality.
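To address this, the researchers use a Solomon four-group design: employees are randomly assigned to four groups that differ in whether they take the pre-test and whether they receive the training. By comparing the post-test results across these four groups, the researchers can separate the true effect of the training program from the effect of being pre-tested, and determine whether the pre-test made employees more or less receptive to the training.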
| Group | Treatment | Pre-test | Post-test |
|---|---|---|---|
| 1 | X | O | O |
| 2 | X |  | O |
| 3 |  | O | O |
| 4 |  |  | O |
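The three questions in the diversity-training example can be read off the four post-test means as simple contrasts. This is a minimal sketch, assuming groups 1 and 2 receive the training and groups 1 and 3 take the pre-test; the scores and the function name `solomon_effects` are hypothetical. A formal analysis would typically run a 2 × 2 ANOVA (training × pre-testing) on the post-test scores.

```python
from statistics import mean

def solomon_effects(post):
    """Point estimates from the four post-test group means (hypothetical helper).

    Assumes groups 1 and 2 received the training, and groups 1 and 3 were pre-tested.
    """
    g1, g2, g3, g4 = (mean(post[k]) for k in (1, 2, 3, 4))
    treatment = (g1 + g2) / 2 - (g3 + g4) / 2  # trained vs. untrained groups
    pretest = (g1 + g3) / 2 - (g2 + g4) / 2    # pre-tested vs. not pre-tested groups
    # Training effect among pre-tested minus training effect among non-pre-tested
    interaction = (g1 - g3) - (g2 - g4)
    return treatment, pretest, interaction

# Made-up post-test attitude scores (illustration only)
post_scores = {
    1: [78, 82, 80],
    2: [74, 76, 75],
    3: [66, 64, 65],
    4: [60, 62, 61],
}
treatment, pretest, interaction = solomon_effects(post_scores)
print(treatment, pretest, interaction)
```

A nonzero interaction would suggest that taking the pre-test changed how employees responded to the training.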
In a factorial design, the researcher manipulates two or more independent variables (factors) simultaneously to observe their effects on the dependent variable.
Features:
Example
An investigation into the factors that cause stress in the workplace seeks to discover the effect of various combinations of background noise and interruptions on employee stress levels.
This is a 3 × 2 factorial design, which creates 3 × 2 = 6 different experimental conditions (groups).
Participants would be randomly assigned to one of these six conditions. This design allows researchers to answer three key questions:
- What is the main effect of background noise on stress? (i.e., does noise level, in general, affect stress?)
- What is the main effect of interruptions on stress? (i.e., does the interruption rate, in general, affect stress?)
- What is the interaction effect between noise and interruptions? (i.e., does the effect of interruptions on stress depend on the level of background noise? For example, perhaps high interruptions are only stressful when combined with high noise.)
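The six conditions are simply every combination of the two factors' levels, and participants can then be spread across them at random. The sketch below assumes background noise is the three-level factor; the level names ("low", "medium", "high") and the helper `assign` are hypothetical labels for illustration, not taken from the original study.

```python
import itertools
import random

# Hypothetical level names; only the 3 x 2 structure comes from the example.
noise_levels = ["low", "medium", "high"]
interruption_levels = ["low", "high"]

# Each (noise, interruptions) pair is one experimental condition: 3 * 2 = 6.
conditions = list(itertools.product(noise_levels, interruption_levels))

def assign(participants, conditions, seed=None):
    """Randomly assign participants to conditions in (near-)equal numbers."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal the shuffled participants round-robin across the conditions.
    return {p: conditions[i % len(conditions)] for i, p in enumerate(shuffled)}

assignment = assign(range(24), conditions, seed=1)
for cond in conditions:
    n = sum(1 for c in assignment.values() if c == cond)
    print(cond, n)
```

Shuffling first and then dealing round-robin keeps the cell sizes balanced while leaving which participant lands in which cell entirely to chance.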
The randomized block design is a technique for dealing with nuisance factors (variables that are not of primary interest but may influence the outcome).
Features:
- Purpose: To minimize the effect of a single, known nuisance variable on the outcome.
- Blocking: Participants are first divided into homogeneous groups or “blocks” based on the nuisance variable (e.g., age, gender, IQ).
- Randomization: Within each block, participants are randomly assigned to the treatment or control conditions.
- Benefit: This design reduces the variability of the data within each block, making it easier to detect the true effect of the treatment.
Example
In a study of college students, we might expect students within the same class year to be relatively homogeneous, so class year (freshman, sophomore, junior, senior) can serve as the blocking variable.
| Block | Group | Treatment | Post-test |
|---|---|---|---|
| Freshman | 1 | X | O |
| Freshman | 2 |  | O |
| Sophomore | 1 | X | O |
| Sophomore | 2 |  | O |
| Junior | 1 | X | O |
| Junior | 2 |  | O |
| Senior | 1 | X | O |
| Senior | 2 |  | O |
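The key step in a randomized block design is that randomization happens separately inside each block. The sketch below shows that step for the class-year example, assuming a hypothetical roster of 8 students per class year; the function name `block_randomize` is illustrative. When a block has an odd number of members, this version places the extra student in the control group.

```python
import random
from collections import defaultdict

def block_randomize(participants, block_of, seed=None):
    """Randomly assign treatment/control separately within each block.

    participants: iterable of IDs; block_of: function mapping an ID to its block.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for p in participants:
        blocks[block_of(p)].append(p)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        for i, p in enumerate(members):
            # First half of each shuffled block -> treatment, the rest -> control.
            assignment[p] = "treatment" if i < half else "control"
    return assignment

# Hypothetical roster: 32 students, 8 in each class year
years = ["Freshman", "Sophomore", "Junior", "Senior"]
roster = {f"S{i:02d}": years[i % 4] for i in range(32)}
assignment = block_randomize(roster, roster.get, seed=42)
```

Because every class year contributes equally to both conditions, differences between class years cannot masquerade as a treatment effect.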
Example
A study aims to test the effectiveness of a new medication for lowering blood pressure. Instead of using different groups for treatment and control, researchers recruit one group of patients.
Because the same participants are measured at multiple points in time, this is a repeated measures design. The key advantage is that it controls for individual differences between participants, making it a very powerful way to detect the effect of the treatment.
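Because each patient serves as their own control, the analysis works on within-person differences. A minimal sketch for the simplest case of two time points (baseline vs. on medication) is a paired t statistic; the blood pressure values and the helper `paired_t` are hypothetical. With more than two time points, a repeated-measures ANOVA or a mixed model would be the usual choice.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic for within-subject change (after - before)."""
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return d_bar, d_bar / se, n - 1   # mean change, t statistic, degrees of freedom

# Made-up systolic blood pressure (mmHg) for the same six patients
baseline = [150, 142, 160, 155, 148, 158]
on_medication = [140, 135, 149, 146, 141, 150]
d_bar, t, df = paired_t(baseline, on_medication)
print(f"mean change = {d_bar:.2f} mmHg, t({df}) = {t:.2f}")
```

Stable differences between patients (for example, one patient's baseline being 18 mmHg higher than another's) cancel out in the differences, which is why this design can detect a treatment effect with relatively few participants.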
When we review experiments critically, one question to ask is: “Is this study valid?”
Validity is the foundation of trustworthy research. It ensures that the conclusions we draw are accurate and meaningful. Without it, we might be measuring the wrong thing, mistaking correlation for causation, or finding results that don’t apply to the real world.
Why It Matters: Mini Examples
Randomized experiments are often called the “gold standard” of research design, particularly for establishing internal validity. The primary reason is that they are the most effective way to establish a cause-and-effect relationship between a treatment (cause) and an outcome (effect).
By randomly assigning participants to groups, the experimenter creates two or more groups that are statistically equivalent, on average, before the treatment is applied. This process minimizes selection bias and tends to balance other potential causes (e.g., age, motivation, prior knowledge) across the groups.
Therefore, if a difference is observed between the groups after the treatment, the researcher can be much more confident that the difference was caused by the treatment and not by some other pre-existing factor.
Randomized experiments allow researchers to scientifically measure the impact of an intervention on a particular outcome of interest (e.g., the effect of intervention methods on performance).
The key to a randomized experimental research design is the random assignment of study subjects:
Randomization has a very specific meaning:
Randomization in this context means that care is taken to ensure that no pattern exists between the assignment of subjects into groups and any characteristics of those subjects.
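In code, random assignment means the split is driven only by a random shuffle, never by any subject characteristic. The sketch below, using hypothetical participants with one background variable (age), also repeats the assignment many times to show that the age gap between groups averages out near zero even though any single split can differ by chance. The function name `random_assign` and the data are illustrative assumptions.

```python
import random
from statistics import mean

def random_assign(subjects, seed=None):
    """Split subjects into two equal-size groups by shuffling.

    The split uses only a random shuffle, never any subject characteristic,
    so no pattern can link group membership to who the subjects are.
    """
    rng = random.Random(seed)
    order = list(subjects)
    rng.shuffle(order)
    half = len(order) // 2
    return order[:half], order[half:]

# Hypothetical participants with one background characteristic (age)
data_rng = random.Random(0)
ages = {f"P{i:03d}": data_rng.randint(18, 30) for i in range(100)}

# Repeat the assignment many times; the treatment-minus-control age gap
# varies from split to split but averages out close to zero.
gaps = []
for rep in range(1000):
    treatment, control = random_assign(ages, seed=rep)
    gaps.append(mean(ages[p] for p in treatment) - mean(ages[p] for p in control))
average_gap = mean(gaps)
print(f"average age gap over 1000 random splits: {average_gap:.3f} years")
```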

Concern:
Threats:
Causes of Threats:
Example of a Threat to Internal Validity
Imagine a study measures the effectiveness of a 3-month public health campaign designed to increase recycling.
The conclusion seems to be that the campaign worked. However, during that same 3-month period, a very popular celebrity independently launched their own high-profile “Go Green” initiative. Now, it’s impossible to know if the increase in recycling was due to the health campaign or the celebrity’s influence. This external event is a history threat that compromises the study’s internal validity.
Complete math test in swimsuits
Consider, for example, an experiment in which researcher Barbara Fredrickson and her colleagues had undergraduate students come to a laboratory on campus and complete a math test while wearing a swimsuit (Fredrickson et al. 1998). At first, this manipulation might seem silly. When will undergraduate students ever have to complete math tests in their swimsuits outside of this experiment?
Hypothesis: “This self-objectification is hypothesized to (a) produce body shame, which in turn leads to restrained eating, and (b) consume attention resources, which is manifested in diminished mental performance.”
Finding: “Self-objectification increased body shame, which in turn predicted restrained eating.”
Example of Cialdini et al. (2005)
In one such experiment, Robert Cialdini and his colleagues studied whether hotel guests chose to reuse their towels for a second day as opposed to having them washed as a way of conserving water and energy (Cialdini 2005).
Threats to External Validity:
As a general rule, studies are higher in external validity when the participants and the situation studied are similar to those the researchers want to generalize to, and to the situations participants encounter every day, a property often described as mundane realism.
One effective way to minimize threats to external validity is to study a heterogeneous range of settings, people, and times.


ESRM 64503