It’s all about the salamanders

Jacob Felson writes:

I have a statistics question that may lend itself to multilevel modeling and incomplete data.

A friend of mine did an ecological experiment. He created 12 enclosures in each of 8 different ponds. Four species of salamanders were placed in each enclosure: 24 salamanders of three species, plus a fourth species (A. opacum) whose number was the manipulation and so was varied: 0, 8, 16, or 24.

Here’s the key: data was collected on the weights and sizes of the A. opacum salamanders at the beginning of the experiment, and data was collected on the weights, sizes (and of course, number of survivors) of all salamanders at the end of the experiment. However, individual salamanders were not tagged, so one cannot trace the growth, decline or death of individual salamanders. So one can only “match” the input and output data at the level of the enclosure.

My question is: must the analysis proceed at the level of the enclosure? Or might there be a modeling framework that would allow one to use all of the raw data collected at the level of the salamanders themselves, even though they cannot be linked between t1 and t2? One can of course simply take the average weight of salamanders within each enclosure at t1 and t2, and run an ANOVA on that. But might it be profitable to have a model that incorporated all of the data at the salamander level, despite the inability to examine growth of individual salamanders?

My reply: In theory you could fit a latent parameter model in which the identification of which salamander goes with which is considered as missing data. I don’t know if it’s worth the effort, though. Another way to go would be to stick with the enclosure-level analysis but try summaries other than the average weight, if that might be of interest.
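A minimal sketch of the enclosure-level route with summaries beyond the average, assuming the raw data can be laid out as (enclosure, weight) pairs at each time point. The enclosure labels, the weights, and the `summarize` helper are all made up for illustration. Since only A. opacum was weighed at the start, the before-and-after comparison applies to that species.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def summarize(records):
    """Group (enclosure_id, weight) pairs and summarize each enclosure."""
    by_enclosure = defaultdict(list)
    for enclosure_id, weight in records:
        by_enclosure[enclosure_id].append(weight)
    return {
        enc: {
            "n": len(w),
            "mean": mean(w),
            "median": median(w),
            # stdev needs at least two animals; report 0.0 otherwise
            "sd": stdev(w) if len(w) > 1 else 0.0,
        }
        for enc, w in by_enclosure.items()
    }


# Hypothetical weights (grams); animals are NOT linked between t1 and t2.
t1 = [("pond1_enc1", 1.0), ("pond1_enc1", 1.2), ("pond1_enc1", 0.8),
      ("pond1_enc2", 1.1), ("pond1_enc2", 0.9)]
t2 = [("pond1_enc1", 2.0), ("pond1_enc1", 2.6),
      ("pond1_enc2", 1.5), ("pond1_enc2", 2.5)]

start, end = summarize(t1), summarize(t2)

# One row of outcomes per enclosure, matched only at the enclosure level.
changes = {}
for enc, before in start.items():
    after = end.get(enc)
    if after is None:  # no survivors were recorded in this enclosure
        changes[enc] = {"survival": 0.0}
        continue
    changes[enc] = {
        "survival": after["n"] / before["n"],
        "mean_change": after["mean"] - before["mean"],
        "median_change": after["median"] - before["median"],
        "sd_change": after["sd"] - before["sd"],
    }
```

Each entry of `changes` could then serve as an outcome in an enclosure-level model, with pond as a grouping factor and the A. opacum count as the treatment.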

1 thought on “It’s all about the salamanders”

  1. I've got a similar question, and I wonder if your advice (try a simpler approach?) would apply.

    Take your radon study as an example. Suppose you had a follow-on study to investigate the effectiveness of various treatments A, B, and C for excessive radon in a home. You have a pre-treatment value plus a post-treatment value.

    In the interim, the homeowner applied one or more treatments. In most cases, only one was applied, but, for various reasons, some applied two or all three. The choice to apply additional treatments was not driven by the results of the earlier ones, because no radon measurements were made except for the pre- and post-treatment measurements (2 in total on each house).

    Now you want to estimate the efficacy of treatments A, B, and C. It would be relatively easy if each house had only one treatment applied. But given how the study actually ran, some measurements document the result of one treatment, while others document the combined effect of 2 or 3 treatments.

    y(obs)[i], the observed change for house i, is thus either the effect y[j] of a single treatment j or the sum of y[j] over all treatments j applied to that house.

    Thoughts?
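One way to read the additive formula in the comment above is as a linear regression of each house's observed change on indicators for which treatments were applied. The sketch below fits that by ordinary least squares. It assumes the effects add with no interactions, and that a house with no treatment would show no change (so there is no intercept). The data and the helper names `solve` and `fit_additive` are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_additive(applied, reduction):
    """Least-squares treatment effects from the normal equations X'X b = X'y."""
    k = len(applied[0])
    xtx = [[sum(row[i] * row[j] for row in applied) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(applied, reduction))
           for i in range(k)]
    return solve(xtx, xty)


# Hypothetical houses: indicators for treatments (A, B, C) and the observed
# pre-minus-post reduction. Values are made up and noise-free for clarity.
applied = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (1, 1, 1)]
reduction = [2.0, 1.0, 0.5, 3.0, 2.5, 3.5]

effects = fit_additive(applied, reduction)  # estimated effects of A, B, C
```

With real data the houses that received two or three treatments still contribute information, as long as the combinations that occur are varied enough for the three effects to be separated. Whether additivity is plausible, and on what scale, is a separate question.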
