In September 2023 I taught a week-long course on statistical workflow at the Nelson Mandela African Institution of Science and Technology (NM-AIST), a public postgraduate research university in Arusha, Tanzania established in 2009.
The course was hosted by Dean Professor Ernest Rashid Mbega and the Africa Centre for Research, Agricultural Advancement, Teaching Excellence and Sustainability (CREATES) through the Leader Professor Hulda Swai and Manager Rose Mosha.
Our case study was an experiment on the NM-AIST campus designed and implemented by Dr Arjun Potter and Charles Luchagula to study the effects of drought, fire, and herbivory on growth of various acacia tree species. The focus was pre-data workflow steps, i.e. experimental design. The goal for the week was to learn some shared statistical language so that scientists can work with statisticians on their research.
Together with Arjun and Charles, with input from Drs Emmanuel Mpolya, Anna Treydte, Andrew Gelman, Michael Betancourt, Avi Feller, Daphna Harel, and Joe Blitzstein, I created course materials full of activities. We asked participants to hand-draw the experimental design and their priors, working together with their teammates. We also did some pencil-and-paper math and some coding in R.
Course participants were students and staff from across NM-AIST. Over the five days, between 15 and 25 participants attended on a given day.
Using the participants’ ecological expertise, we built a model to tell a mathematical story of how acacia tree height could vary by drought, fire, herbivory, species, and plot location. We simulated parameters and data from this model, e.g. beta_fire = rnorm(n = 1, mean = -2, sd = 1) then simulated_data …= rnorm(n, beta_0 + beta_fire*Fire +… beta_block[Block], sd_tree). We then fit the model to the simulated data.
Due to difficulty in manipulating fire, fire was assigned at the block-level, whereas drought and herbivory were assigned at the sub-block level. We saw how this reduced precision in estimating the effect of fire:
We redid the simulation assuming a smaller block effect and saw improved precision. This confirmed the researcher’s intuitions that they need to work hard to reduce the block-to-block differences.
To keep the focus on concepts not code, we only simulated once from the model. A full design analysis would include many simulations from the model. In Section 16.6 of ROS they fix one value for the parameters and simulate multiple datasets. In Gelman and Carlin (2014) they consider a range of plausible parameters using prior information. Betancourt’s workflow simulates parameters from the prior.
Our course evaluation survey was completed by 14 participants. When asked “which parts of the class were most helpful to you to understand the concepts?”, respondents chose instructor explanations, drawings, and activities as more helpful than the R code. However, participants also expressed eagerness to learn R and to analyze the real data in our next course.
The hand-drawn course materials and activities were inspired by Brendan Leonard’s illustrations in Bears Don’t Care About Your Problems and I Hate Running and You Can Too. Brendan wrote me,
I kind of think hand-drawing stuff makes it more fun and also maybe less intimidating?
I agree.
More recently, I have been reading Introduction to Modern Causal Inference by Alejandro Schuler and Mark van der Laan, who say
It’s easy to feel like you don’t belong or aren’t good enough to participate…
yup.
To deal with that problem, the voice we use throughout this book is informal and decidedly nonacademic…Figures are hand-drawn and cartoonish.
I’m excited to return to NM-AIST to continue the workflow steps with the data that Dr Arjun Potter and Charles Luchagula have been collecting. With the real data, we can ask: is our model realistic enough to achieve our scientific goals ?