Important definitions

Null hypothesis: the statistical hypothesis that there is no (important) difference between experimental treatment and control in relation to the primary endpoint
Alternative hypothesis: the statistical hypothesis that there is a difference between experimental treatment and control in relation to the primary endpoint (the alternative hypothesis may be directional and hypothesise that the difference is positive or negative, or it may be non-directional and hypothesise merely that there is a difference)
Sampling distribution: is the probability distribution of the results expected assuming a particular hypothesis about the effect size is true (e.g. the null hypothesis), all the assumptions associated with the statistical model are true, and the trial is conducted as planned.
Type I error (\(\alpha\)): is the pre-test probability of rejecting the null hypothesis when the null hypothesis is true. It is usually set at 0.05.
Type II error (\(\beta\)): is the pre-test probability of failing to reject the null hypothesis when the alternative hypothesis is true.
Power \((1-\beta)\): is the pre-study probability that the study will produce a statistically significant result for a given sample size and postulated effect size
\(p\) value: a measure of the compatibility of the observed data with the data that would be expected if the null hypothesis was true when all other statistical and methodological assumptions are met
Confidence interval: is the range of values that is considered more compatible with the observed data assuming the statistical and methodological assumptions of the study are met. A 95% confidence interval provides the range of values for which a test of an effect size within the range against the observed data would provide a \(p\) value \(> 0.05\).
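
The link between the \(p\) value and the confidence interval can be checked numerically. The sketch below assumes a normally distributed estimate with a known standard error (a simple z-test); the estimate of 2.0 and standard error of 1.0 are made-up numbers, and the function names are illustrative only.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_sided_p(estimate, hypothesised_value, se):
    """Two-sided p value for testing `hypothesised_value` against the estimate."""
    z = (estimate - hypothesised_value) / se
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def confidence_interval(estimate, se, level=0.95):
    """Normal-approximation confidence interval around the estimate."""
    z_crit = STANDARD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    return estimate - z_crit * se, estimate + z_crit * se


# Hypothetical observed effect and standard error (assumptions for illustration).
estimate, se = 2.0, 1.0
lower, upper = confidence_interval(estimate, se)

p_at_lower_limit = two_sided_p(estimate, lower, se)  # sits at the 0.05 boundary
p_inside = two_sided_p(estimate, 1.0, se)            # 1.0 lies inside the interval
p_outside = two_sided_p(estimate, -0.5, se)          # -0.5 lies outside the interval

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
print(f"p at lower limit: {p_at_lower_limit:.3f}")
print(f"p for a value inside the CI: {p_inside:.3f}")
print(f"p for a value outside the CI: {p_outside:.3f}")
```

Under these assumptions, effect sizes inside the interval give \(p > 0.05\), effect sizes outside it give \(p < 0.05\), and the interval limits sit at the boundary.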

Interpretation

Standard hypothesis tests are set up in such a way that if the observed data falls into the \(\alpha\) region, then observing such a result is more likely if the alternative hypothesis is true compared to if the null hypothesis is true. This is an important aspect of the justification for rejecting the null hypothesis and accepting the alternative hypothesis.

Assuming the outcome is a continuous variable that is normally distributed, the following figure illustrates the set-up of a hypothesis test.

A hypothesis test is set up in such a way as to limit the risk of Type I and Type II errors. Type I errors occur when you reject the null hypothesis when the null hypothesis is true. Type II errors occur when you fail to reject the null hypothesis despite the null hypothesis being false.

One way to limit Type I and Type II errors is to fix \(\alpha\) in advance and ensure the statistical test is adequately powered. A suitable rule of thumb is that \(\beta \approx 0.2\) (which is equivalent to a power of 80%).
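
As a rough numerical illustration of this set-up, the sketch below works out the two error rates for a one-sided test of a normally distributed test statistic. The standard error of 1.0 and the postulated effect of 2.5 are assumptions chosen so that the power comes out close to 80%; they are not taken from any particular study.

```python
from statistics import NormalDist

alpha = 0.05            # pre-specified Type I error rate
se = 1.0                # assumed standard error of the test statistic
postulated_effect = 2.5  # assumed effect size under the alternative hypothesis

null_dist = NormalDist(mu=0.0, sigma=se)
alt_dist = NormalDist(mu=postulated_effect, sigma=se)

# Results above this value fall in the alpha (rejection) region.
critical_value = null_dist.inv_cdf(1 - alpha)

# Type I error: probability of landing in the alpha region when the null is true.
type_i = 1 - null_dist.cdf(critical_value)

# Type II error: probability of missing the alpha region when the alternative is true.
type_ii = alt_dist.cdf(critical_value)
power = 1 - type_ii

print(f"critical value: {critical_value:.3f}")
print(f"Type I error rate: {type_i:.3f}")
print(f"Type II error rate: {type_ii:.3f}")
print(f"power: {power:.3f}")
```

With these assumed values the Type II error rate is close to the 0.2 rule of thumb.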

In an overpowered test, a result that falls only just inside the \(\alpha\) region may be more likely if the null hypothesis is true than if the alternative hypothesis is true.

In an underpowered test, it may be the case that a result in the \(\alpha\) region is not much more likely if the alternative hypothesis is true compared to if the null hypothesis is true. In this situation, the result falling in the \(\alpha\) region may not be a reliable indicator that the null is false.
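
One way to see the effect of power on interpretation is to compare how likely a single borderline result is under each hypothesis. The sketch below assumes a one-sided test with a standard error of 1.0 and an observed result of 1.7, just past the critical value of about 1.645. The three postulated effect sizes (0.3, 2.5 and 6.0) are made-up stand-ins for an underpowered, an adequately powered and an overpowered design.

```python
from statistics import NormalDist


def likelihood_ratio(observed, postulated_effect, se=1.0):
    """Density of the observed result under the alternative divided by
    its density under the null (values above 1 favour the alternative)."""
    null_dist = NormalDist(mu=0.0, sigma=se)
    alt_dist = NormalDist(mu=postulated_effect, sigma=se)
    return alt_dist.pdf(observed) / null_dist.pdf(observed)


# Hypothetical result that falls only just inside the alpha region.
observed = 1.7

underpowered = likelihood_ratio(observed, postulated_effect=0.3)
adequate = likelihood_ratio(observed, postulated_effect=2.5)
overpowered = likelihood_ratio(observed, postulated_effect=6.0)

print(f"underpowered design: ratio = {underpowered:.2f}")
print(f"adequately powered design: ratio = {adequate:.2f}")
print(f"overpowered design: ratio = {overpowered:.4f}")
```

Under these assumptions the ratio is only a little above 1 for the underpowered design, clearly above 1 for the adequately powered design, and far below 1 for the overpowered design, where the same borderline result is better explained by the null hypothesis.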

Setting up a hypothesis test

If you are doing a hypothesis test, you need to think about power.¹ The power of the test is considered before collecting any data, but after you have specified your research question and identified an appropriate statistical test. At this point I am going to assume that you have already selected an appropriate statistical test. You now need to determine how big your study needs to be to be able to reliably answer your research question.

The information you need will depend on the specific statistical test, but here are some general principles. The statistical power of the test depends on:

Sample size: the larger the sample, the higher the power. You also need to think about loss to follow-up, drop-outs and non-adherence to the study.
Estimated effect size: this is the effect size you are trying to detect (perhaps the difference between two means). The smaller the estimated effect size, the lower the power. This might be determined by existing evidence, or it might be the “minimally clinically important difference”.
Variability in the sample: the more variability in the sample, the lower the power of the test.
Predetermined \(\alpha\), \(\beta\): standard values are \(\alpha = 0.05\), \(\beta = 0.2\), but if you change these, you will change the power of the test (obviously, I hope)
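
To make these determinants concrete, here is a minimal sketch for one common case: a two-sided comparison of two means with equal group sizes, using the normal approximation. The function names, the effect size of 5, the standard deviation of 10 and the 10% drop-out allowance are all assumptions for illustration; a real calculation depends on the specific statistical test you have chosen.

```python
from math import ceil, sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def power_two_means(n_per_group, effect, sd, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means.

    Ignores the very small chance of rejecting in the wrong direction.
    """
    se = sd * sqrt(2 / n_per_group)
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    return 1 - STANDARD_NORMAL.cdf(z_crit - effect / se)


def sample_size_per_group(effect, sd, alpha=0.05, beta=0.2, dropout=0.0):
    """Approximate participants per group, inflated for expected drop-out."""
    z_alpha = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STANDARD_NORMAL.inv_cdf(1 - beta)
    n = 2 * ((z_alpha + z_beta) * sd / effect) ** 2
    return ceil(n / (1 - dropout))


# Hypothetical planning values (assumptions for illustration).
effect, sd = 5.0, 10.0

n_complete = sample_size_per_group(effect, sd)
n_recruit = sample_size_per_group(effect, sd, dropout=0.10)

print(f"per group, no drop-out: {n_complete}")
print(f"per group, 10% drop-out: {n_recruit}")
print(f"power at n={n_complete}: {power_two_means(n_complete, effect, sd):.3f}")
print(f"power with half the effect: {power_two_means(n_complete, effect / 2, sd):.3f}")
print(f"power with double the SD: {power_two_means(n_complete, effect, 2 * sd):.3f}")
```

Changing one input at a time shows the pattern described above: power rises with sample size and falls as the effect size shrinks or the variability grows.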

The best way to get across this is with examples, which we do later in the module. For now it is enough to appreciate the determinants of statistical power.

¹ Keep in mind that not all statistical analyses are hypothesis tests.
