Session 1
Introduction
These pages include slides and code to support the first day of our course in experimental design and analysis.
A revision of the slides will be available soon.
The code in these windows can be run directly from your browser, or copied into your own R session.
If you want to use your own R session, you’ll need to have the following packages installed:
ggplot2, see, ggpubr, ggbeeswarm, patchwork and easystats
Worked example
Here we simulate a simple experiment with two groups of mice and a single outcome measure.
We set the ‘true’ average values of our outcome measure, simulate an experiment with our chosen design, and then attempt to estimate the difference between groups using our simulated data.
Using simulations like this can help us to think through experiments, plan analyses and ultimately run sample size calculations.
Simulate the data
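A minimal sketch of how this simulation might look is below. The group means, standard deviation, sample size and the data frame name mice are illustrative assumptions, not the course’s exact values.

# Illustrative simulation: two groups of mice, one continuous outcome (all values assumed)
set.seed(42)                                   # for reproducibility
n_per_group <- 10                              # assumed sample size per group
true_means  <- c(control = 10, treated = 12)   # assumed 'true' group means
sd_within   <- 2                               # assumed within-group standard deviation

mice <- data.frame(
  group   = rep(names(true_means), each = n_per_group),
  outcome = rnorm(2 * n_per_group,
                  mean = rep(true_means, each = n_per_group),
                  sd   = sd_within)
)
head(mice)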
Initial analysis
Now we have simulated the data, we can run our analysis: in this case a bar chart with error bars, combined with a p-value to compare the groups.
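One way to build that kind of figure with ggplot2 and ggpubr, using the assumed mice data from the sketch above:

library(ggplot2)
library(ggpubr)

# 'Dynamite plot': group means as bars, standard-error bars, plus a t-test p-value
ggplot(mice, aes(x = group, y = outcome)) +
  stat_summary(fun = mean, geom = "bar", fill = "grey80", colour = "black") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  stat_compare_means(method = "t.test")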
What would you conclude?
A better graph and statistics
While this presentation of results (p-value and dynamite plot) is often used, it is extremely limiting. It is more useful to present the data points individually (perhaps with summary statistics) and then to make inferences about the estimated mean difference between groups.
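One possible version, again using the assumed mice data: individual points via ggbeeswarm, with the group mean and standard error overlaid.

library(ggplot2)
library(ggbeeswarm)

# Show every observation, with the group mean and its standard error overlaid
ggplot(mice, aes(x = group, y = outcome)) +
  geom_quasirandom(width = 0.15, alpha = 0.7) +
  stat_summary(fun.data = mean_se, geom = "pointrange", colour = "red")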
It’s interesting to compare the linear model result with the t-test result.
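For example, with the assumed mice data the two can be run side by side; with equal variances assumed, the t statistic and p-value from the t-test match those from the linear model.

# Estimate the mean difference between groups with a linear model...
fit <- lm(outcome ~ group, data = mice)
summary(fit)
confint(fit)   # 95% confidence interval for the difference in group means

# ...and with a t-test; with equal variances assumed, it gives the same t statistic and p-value
t.test(outcome ~ group, data = mice, var.equal = TRUE)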
Accuracy and precision
Suppose we conduct a research study to estimate the effect of our supplement on alpha diversity.
Each instance of our study will produce an estimate for the effect, which is (hopefully!) close to the truth.
Suppose it were possible to repeat the study many times: each repeat would produce a different effect estimate.
The study is unbiased if the average of those repeated estimates is equal to the true value. That is, while there will be some chance variation, the estimates will not be systematically too high or too low.
Any bias in the study would cause the study estimate to be systematically wrong in one direction or the other. There are many sources of bias. For example, we know that healthier people tend to be more willing to participate in research than the general population (selection bias), and people who are treated with placebos might report better symptoms than those who are not (the placebo effect). These can lead to systematically wrong conclusions about populations or the efficacy of treatments.
The precision of the study refers to how similar the estimates from repeated runs would be. Precise estimates have small standard errors and narrow confidence intervals. But it is possible to be very precisely wrong (if there is bias), or to miscalculate the precision of your study.
Precision is largely determined by the level of variation between your experimental units, by the variation in your measurements, and by the sample size. Increasing the sample size and reducing sources of variation will increase precision.
Our goal in experimental design is to answer our research questions with low bias and high precision, so that our estimates are consistently close to the truth! On the whole, bias is more problematic than imprecision, because imprecision can be calculated, reported, and overcome with larger studies. Bias is much more difficult to detect, and can be compounded by larger samples or repeated studies if the same biases are present each time.
To deal with sources of bias and variance, we have to understand how they arise, and how to reduce or remove their effects on our estimates.
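A small simulation makes these ideas concrete. Here we ‘repeat’ an unbiased two-group study many times (the effect size, sample size and standard deviation are assumed values) and look at the average and spread of the effect estimates.

set.seed(1)
true_effect <- 2     # assumed true difference between groups
n_per_group <- 10    # assumed sample size per group

one_study <- function() {
  control <- rnorm(n_per_group, mean = 10, sd = 2)
  treated <- rnorm(n_per_group, mean = 10 + true_effect, sd = 2)
  mean(treated) - mean(control)   # this study's effect estimate
}

estimates <- replicate(5000, one_study())
mean(estimates)   # close to the true effect of 2: the design is unbiased
sd(estimates)     # the spread of the estimates reflects the (im)precision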
Analysis of a blocked randomised study
Introduction
The code below simulates a parallel group randomised controlled trial conducted in men and women.
There were 40 participants in total, 20 men and 20 women.
There was a blocked randomisation, so that men and women were balanced across groups (10 of each sex per treatment group).
The blocking was performed because we know that men typically have higher responses than women.
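One way such a dataset might be simulated is sketched below. The baseline mean, sex effect, treatment effect and residual standard deviation are illustrative assumptions, but the data frame dat and the columns y, treatment and sex match the models used later in this section.

set.seed(2024)   # assumed seed, for reproducibility

dat <- data.frame(
  sex       = rep(c("female", "male"), each = 20),
  treatment = rep(rep(c("control", "active"), each = 10), times = 2)
)

# Assumed effects: men respond ~3 units higher than women; treatment adds ~1.5 units
dat$y <- 10 +
  3   * (dat$sex == "male") +
  1.5 * (dat$treatment == "active") +
  rnorm(40, sd = 2)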
Remember the experimental design equation for this study is:
Outcome = Treatment + Sex + Error
Below we show why it is important to respect this equation in our analysis phase as well as our design phase.
Since we have a two-group study with a continuous outcome, we might be tempted to apply a two-group t-test to compare the responses.
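With the simulated dat, that analysis would look something like this:

# Naive analysis: ignore sex and compare the two treatment groups directly
t.test(y ~ treatment, data = dat)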
While there is some evidence of a difference between groups (a difference we know is there!), it is not statistically significant. This is a type II error (a false negative), caused by the high variation within each treatment group overwhelming the signal from the treatment.
But if we stratify the data by sex, we explain much of that variation. The signal is much more obvious, and is statistically significant in both groups despite there being half the participants in each!
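A sketch of that stratified analysis, splitting the simulated dat by sex:

# Stratified analysis: compare the treatment groups within each sex separately
t.test(y ~ treatment, data = subset(dat, sex == "female"))
t.test(y ~ treatment, data = subset(dat, sex == "male"))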
The linear model corresponding to the unpaired t-test is given below. Note the residual error in the output and the standard error for the effect of treatment.
lm(y ~ treatment, data = dat) |> summary()
Next we’ll try the model that corresponds to the design equation. We have explained far more of the variance (smaller residual error), and so have a much smaller standard error for the treatment effect.
lm(y ~ treatment + sex, data = dat) |> summary()
Although this dataset was picked to illustrate the point, we can find the power of the study under each analytical approach by replicating the study many times and finding what proportion of the p-values falls below 0.05.
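A sketch of such a power simulation is below; the simulate_trial() helper, the effect sizes, the residual standard deviation and the number of replicates are all illustrative assumptions.

# Illustrative power simulation: regenerate the trial many times and count significant results
simulate_trial <- function(n_per_cell = 10, trt_effect = 1.5, sex_effect = 3, sd = 2) {
  d <- data.frame(
    sex       = rep(c("female", "male"), each = 2 * n_per_cell),
    treatment = rep(rep(c("control", "active"), each = n_per_cell), times = 2)
  )
  d$y <- 10 + sex_effect * (d$sex == "male") +
    trt_effect * (d$treatment == "active") + rnorm(nrow(d), sd = sd)
  c(
    t_test      = t.test(y ~ treatment, data = d)$p.value,
    adjusted_lm = summary(lm(y ~ treatment + sex, data = d))$coefficients[2, 4]  # treatment row
  )
}

pvals <- replicate(2000, simulate_trial())
rowMeans(pvals < 0.05)   # estimated power of each analysis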
Here, using the design equation to set up the analysis model leads to a much more powerful design, by explaining the variance associated with the identified factors. It also means that the assumptions underlying the regression model, in particular normally distributed random errors, are more likely to be met.
lm(y ~ treatment + sex, data = dat) |> report::report_model()