My 2nd year of grad school I took a fantastic course called Computational Methods in Population Biology (ECL 233), taught by one of my co-advisors, Sebastian Schreiber, and one of my QE committee members, Marissa Baskett. The course was split roughly evenly between ecologists and applied math students, and focused on applied mathematical modeling concepts. As a quantitative ecologist with relatively sparse formal mathematical training but pretty solid computational/R skills, I found this course incredible. Being able to implement models in R meant I could poke and prod them, changing parameters or investigating intermediate values. This helped me translate the more formal mathematical models into data I could work with like any other. Richard McElreath notes this same pedagogical benefit of sampling from posterior distributions in Bayesian statistics in his fantastic book, Statistical Rethinking. (I'm not usually one to fawn over academic stuff, but Statistical Rethinking absolutely changed the way I think. Not only did it give me a deep appreciation and intuition for Bayesian data analysis, it is an absolute pedagogical marvel. If you haven't read it yet, run, don't walk, to grab a copy.)
Back when I took ECL 233, I was a relatively confident R user, but nowhere near as competent as I am now. In particular, I've come to embrace a tidier approach to working with data, trying to keep things in dataframes/tibbles as much as possible. This has been remarkably slick when simulating data to test out statistical models or for more formal simulation-based calibration (perhaps the topic of a later blog post). You set up parameters in their own columns, so each row has a full set of parameters necessary to generate data, then you store the resulting data in a list-column.
In discrete-time population models, you can't vectorize everything, since each time step's population size is calculated from the previous one's, so you've gotta use a for loop at some point. What this often means is that population simulations are stored in matrices or arrays; for example, you might have a matrix where each column corresponds to an r value for your model, and each row corresponds to a time step. You then use a for loop to generate the time series for each r value's column. R is pretty nice for working with matrices and arrays, and they'll often be the fastest/most efficient way to implement big ole models. But in some cases, it would be really nice to use a more tidy approach, for the sake of plotting and organization.
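Here's a minimal sketch of that matrix-based pattern, using the Ricker map (the model the rest of this post turns to) as the update rule, with purely illustrative parameter values:

```r
# Rows are time steps, columns are r values.
# Update rule: N[t+1] = N[t] * exp(r * (1 - N[t] / K))
r_vals  <- c(1.5, 2.3, 3.0)   # one column per r value (illustrative choices)
n_steps <- 50
K       <- 100                # carrying capacity

N <- matrix(NA_real_, nrow = n_steps, ncol = length(r_vals),
            dimnames = list(NULL, paste0("r_", r_vals)))
N[1, ] <- 10                  # initial population size

# Each step depends on the last, so we loop over time
# (but stay vectorized across the r values within each step)
for (t in seq_len(n_steps - 1)) {
  N[t + 1, ] <- N[t, ] * exp(r_vals * (1 - N[t, ] / K))
}

head(N)
```

It works, but the parameters live in one object and the output in another, which is exactly the disconnect the tidy approach avoids.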
It can be really nice to have all your parameters neatly associated with
all your simulated outcomes in a single tibble. I hate having a bunch of
disparate vectors and matrices floating around for a single model, so
this approach really appeals to me.
To demonstrate this approach, I reformatted one of my ECL 233 homework assignments looking at the Ricker model (see Ricker's original paper) and its ability to generate C H A O S. I'll show how a tidy approach makes it easy to explore different parameter values, plot our results, and calculate various things about our model.
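For reference, a common parameterization of the Ricker model maps this generation's population size to the next as N[t+1] = N[t] * exp(r * (1 - N[t] / K)), which is a one-liner in R:

```r
# The Ricker map: next population size as a function of the current one
# (one common parameterization; r is the growth rate, K the carrying capacity)
ricker <- function(N, r, K = 100) {
  N * exp(r * (1 - N / K))
}
```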