Author:

Jihong Zhang*, Ph.D.

Affiliation:

Educational Statistics and Research Methods (ESRM) Program*

University of Arkansas

Student Name:

Shawn Wierick

Student Email:

scwieri@uark.edu

Student Evaluation:

  • Q1_SCORE: 2

  • Q1_STUDENT_ANSWER: Setting up the alpha level depends on the decision-making because we need to determine the critical value. To illustrate, when we increase the bar (decrease alpha from 0.05 to 0.01), we control the risk of making a set of mistakes that reject the null hypothesis in conclusion. This situation reduces the chance of rejecting the null hypothesis. At the same time, it is hard to detect the tiny differences. With larger sample sizes, we need a lower alpha because the data provides more power to detect small differences. If the p-value is less than or equal to alpha, we are statistically significant and we reject the null hypothesis. If the p-value is larger than alpha, we fail to reject the null hypothesis.

  • Q1_FEEDBACK: You correctly explain how adjusting alpha affects the likelihood of rejecting the null hypothesis, and you note that a stricter alpha makes small differences harder to detect. However, you did not explicitly mention Type I error or define it. Defining Type I error, and stating clearly that raising alpha increases the Type I error rate, would have made your answer complete.

  • Q2_SCORE: 3

  • Q2_STUDENT_ANSWER: In a scenario of a p-value less than 0.05 but greater than 0.01, such as 0.03. The results with these alpha levels will be significantly different, which will influence the conclusion. For example, a researcher wants to examine the impact of physical activity duration on the academic performance of university students. In the first scenario, the p-value (0.03) is less than the alpha level (0.05), indicating a statistically significant difference in exam scores among the different physical activity groups. The conclusion is that the period of physical activity significantly affects academic performance, with students who engage in 1 hour or more of physical exercise performing better on average than those with less physical activity. The findings highlight the importance of daily physical activity among students for achieving optimal academic outcomes. In the second scenario, the p-value (0.03) is larger than the alpha level (0.01), indicating no statistically significant difference in exam scores among the different physical activity groups. If we are sure about our data, that is enough to decrease the alpha (0.01); we can then say that there is no relationship between physical activity and the academic performance of university students. The conclusion is that the period of physical activity doesn’t significantly affect academic performance at alpha (0.01). The findings fail to reject the null hypothesis. Thus, we cannot conclude that the amount of physical activity affects academic performance at the university level. Finally, the decision of choosing an alpha level of 0.05 is more appropriate. It has been difficult to determine the effectiveness of physical activity on academic performance at an alpha level of 0.01, which makes it hard to reject the null hypothesis.

  • Q2_FEEDBACK: You explain well how the interpretation of p = 0.03 differs under the two alpha levels, and you include context. You note the arbitrariness of the alpha choice and advise care in interpretation. For full marks, be sure to mention that reporting the exact p-value (e.g., p = 0.03) is standard when p > .001, and discuss the need for practical as well as statistical significance. Even so, your response demonstrates clear understanding.

  • Q3_SCORE: 4

  • Q3_STUDENT_ANSWER: There are many limitations of the p-value, and it can be problematic, specifically when researchers rely solely on it to reject the null hypothesis. The first reason will lead to binary decision-making, categorizing results as significant or non-significant, which can oversimplify the interpretation and overlook the nuances of the data. Therefore, researchers can report confidence intervals or Bayesian statistics to report the posterior distribution. The second reason is neglecting the effect size because a small effect can produce a small p-value if the sample size is large enough. It is essential to consider and calculate the effect size and the power analysis that enables us to determine the sample size required to detect a significant effect. Another reason is the probability of extremes under the null, since the p-value due to the null hypothesis does not address whether similar extreme data could also occur under the Alternative hypothesis. It can lead to an overemphasis on the null hypothesis and ignore the reasonable explanations for the data. In this case, exploring the theory, finding other explanations, and trying various models is critical.

  • Q3_FEEDBACK: You identified several problems with p-values (binary decision, neglecting effect size, limitations under the null) and mentioned confidence intervals and Bayesian methods as alternatives. You also discuss the importance of effect size and power. For even fuller marks, calling out descriptive statistics/visualizations and specific effect size metrics (e.g., Cohen’s d) would help, but your response demonstrates broad understanding.

  • TOTAL_SCORE: 10

  • OVERALL_COMMENTS: Great job overall! You show strong understanding of hypothesis testing, p-value interpretation, and the need for alternative approaches. For Question 1, aim to explicitly define Type I error. Otherwise, your answers are clear, thoughtful, and well-supported.
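
To make two points from the feedback concrete, here is a minimal Python sketch. It applies the decision rule from Question 1 to the p = 0.03 case from Question 2 under both alpha levels, and computes Cohen's d, the effect size metric mentioned in the Question 3 feedback. The function names `decide` and `cohens_d` and the exam scores are hypothetical, made up purely for illustration; they are not course data or part of the assignment.

```python
from math import sqrt
from statistics import mean, stdev


def decide(p_value, alpha):
    """Apply the decision rule: reject H0 when p <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# The same p-value leads to different decisions under the two alpha levels.
p = 0.03
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {decide(p, alpha)}")

# Hypothetical exam scores (illustration only, not real data).
active = [78, 85, 82, 90, 88]
less_active = [75, 80, 79, 84, 77]
print(f"Cohen's d = {cohens_d(active, less_active):.2f}")
```

Reporting the effect size next to the exact p-value keeps practical significance in view, whichever alpha level is chosen.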
