
Going beyond confidence intervals

Anders Lamberg writes:

In an article by Tom Siegfried, Science News, July 3, 2014, “Scientists’ grasp of confidence intervals doesn’t inspire confidence,” you are cited: “Gelman himself makes the point most clearly, though, that a 95 percent probability that a confidence interval contains the mean refers to repeated sampling, not any one individual interval.”

I have some simple questions that I hope you can answer. I am not a statistician but a biologist with only a basic education in statistics. My company works on surveillance of salmon populations in Norwegian rivers, and we have developed methods for counting every individual in a population. We have moved from using estimates based on samples to actually counting all individuals in the populations. This is possible because the salmon migrate between the ocean and the rivers and often have to pass narrow parts of the rivers, where we use underwater video cameras to cover the whole cross-section. In this way we “see” every individual and can categorize size, sex, etc. Another argument for counting all individuals is that our Atlantic salmon populations rarely exceed 3,000 individuals (average of approx. 500), in contrast to Pacific salmon populations, where numbers are more in the range of 100,000 to more than a million.

In Norway we also have a large salmon farming industry where salmon are held in net pens in the sea. The problem is that these fish, which have been artificially selected for over 10 generations, are a threat to the natural populations if they escape and breed with the wild salmon. There is a concern that the “natural gene pool” will be diluted. That is only background for my questions, though; the statistical problem is general to all sampling.

Here is the statistical problem: In a breeding population of salmon in a river there may be escapees from the fish farms. It is important to know the proportion of farmed escapees. If it exceeds 5% in a given population, measures should be taken to reduce the number of farmed salmon in that river. But how can we find the real proportion of farmed salmon in a river? The method used for over 30 years now is to sample approximately 60 salmon from each river, counting how many wild and how many farmed salmon you got in that sample. The total population may be 3000 individuals.

There is only taken one sample. A point estimate is calculated and a confidence interval for that estimate. In one realistic example we may sample 60 salmon and find that 6 of them are farmed fish. That gives a point estimate of 10 % farmed fish in the population of 3000 in that specific river. The 95% confidence interval will be from approximately 2% to 18%. Most commonly it is only the point estimate that is reported.
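For reference, the 2% to 18% interval quoted above is what the usual normal-approximation (Wald) formula gives: p̂ ± 1.96·sqrt(p̂(1 − p̂)/n). Here is a minimal Python sketch. It assumes a simple random sample and ignores the finite population correction.

```python
import math

def wald_interval(x, n, z=1.959964):
    """Normal-approximation (Wald) 95% interval for a binomial proportion x/n."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
    return p_hat - z * se, p_hat + z * se

# 6 farmed fish in a sample of 60
lo, hi = wald_interval(6, 60)
print(f"point estimate 0.100, 95% interval {lo:.3f} to {hi:.3f}")
# prints: point estimate 0.100, 95% interval 0.024 to 0.176
```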

When I read your comment in the article cited at the start of this email, I see that something must be wrong with this sampling procedure. Our confidence interval is linked to the sample and does not necessarily reflect the “real value” that we are interested in. As I see it now, our point estimate acquired from only one sample does not give us much at all. We should have repeated the sampling procedure many times to get an estimate precise enough to say whether we have passed the limit of 5% farmed fish in that population.

Can we use the one sample of 60 salmon in the example to say anything at all about the proportion of farmed salmon in that river? Can we use the point estimate 10%?

We have asked this question to the government, but they reply that it is more likely the real value lies near the 10% point estimate since the confidence has the shape of a normal distribution.

Is this correct?

As I see it, the real value does not have to lie within the 95% confidence interval at all. However, if we increase the sample size close to the population size, we will get a precise estimate. But what happens when we use small samples and do not repeat?
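A small simulation makes the repeated-sampling interpretation concrete. The sketch below uses a hypothetical river with exactly 3000 fish, 300 of them farmed, so the true proportion is 10%. It draws many independent samples of 60 without replacement and records how often the Wald interval contains the true value. These numbers are illustrative assumptions, not survey data. Any single interval either contains 10% or it does not. The "95%" describes the long-run share of intervals that do.

```python
import math
import random

def wald_interval(x, n, z=1.959964):
    """Normal-approximation (Wald) 95% interval for a binomial proportion x/n."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def simulate_coverage(pop_size=3000, n_farmed=300, sample_size=60, reps=10_000, seed=1):
    """Share of repeated-sample Wald intervals that contain the true proportion."""
    rng = random.Random(seed)
    population = [1] * n_farmed + [0] * (pop_size - n_farmed)  # 1 = farmed fish
    true_p = n_farmed / pop_size
    hits = 0
    for _ in range(reps):
        sample = rng.sample(population, sample_size)  # sampling without replacement
        lo, hi = wald_interval(sum(sample), sample_size)
        hits += lo <= true_p <= hi
    return hits / reps

coverage = simulate_coverage()
print(f"share of intervals containing the true 10%: {coverage:.3f}")
```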

My reply:

In this case, the confidence intervals seem reasonable enough (under the usual assumption that you are measuring a simple random sample). I suspect the real gains will come from combining estimates from different places and different times. A hierarchical model will allow you to do some smoothing.

Here’s an example. Suppose you sample 60 salmon in the same place each year and the numbers of farmed fish you see are 7, 9, 7, 6, 5, 8, 7, 2, 8, 7, … These data are consistent with there being a constant proportion of 10% farmed fish (indeed, I created these particular numbers using rbinom(10,60,.1) in R). On the other hand, if the numbers you see are 8, 12, 9, 5, 3, 11, 8, 0, 11, 9, … then this is evidence for real fluctuations. And of course if you see a series such as 5, 0, 3, 8, 9, 11, 9, 12, …, this is evidence for a trend. So you’d want to go beyond confidence intervals to make use of all that information. There’s actually a lot of work done using Bayesian methods in fisheries which might be helpful here.
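As a rough sketch only, here is one classical way to screen series like these. A binomial dispersion test compares the year-to-year spread of the counts with the spread a constant proportion would produce, and a least-squares slope flags a trend. This is a simple screening check, not the hierarchical or Bayesian modeling suggested above. The chi-square cutoff is hard-coded for 10 years of data.

```python
def binomial_dispersion(counts, n):
    """Dispersion statistic for yearly counts out of n fish.

    Under a constant proportion it is approximately chi-square with
    len(counts) - 1 degrees of freedom.
    """
    k = len(counts)
    mean = sum(counts) / k
    p_bar = mean / n
    expected_var = n * p_bar * (1 - p_bar)  # binomial variance at the pooled proportion
    return sum((c - mean) ** 2 for c in counts) / expected_var

def ols_slope(ys):
    """Least-squares slope of the counts against year index 0, 1, 2, ..."""
    k = len(ys)
    xbar = (k - 1) / 2
    ybar = sum(ys) / k
    sxy = sum((x - xbar) * (y - ybar) for x, y in enumerate(ys))
    sxx = sum((x - xbar) ** 2 for x in range(k))
    return sxy / sxx

CHI2_95_DF9 = 16.919  # 95th percentile of chi-square with 9 degrees of freedom

constant = [7, 9, 7, 6, 5, 8, 7, 2, 8, 7]
fluctuating = [8, 12, 9, 5, 3, 11, 8, 0, 11, 9]
trending = [5, 0, 3, 8, 9, 11, 9, 12]

for name, series in [("constant", constant), ("fluctuating", fluctuating)]:
    stat = binomial_dispersion(series, 60)
    print(f"{name}: dispersion = {stat:.1f} (95% cutoff {CHI2_95_DF9})")
print(f"trending: slope = {ols_slope(trending):.2f} farmed fish per year")
```

With these numbers, the first series gives a statistic of about 5.9 and the second about 19.9, against a cutoff of 16.9. The third series has a slope of about 1.42 extra farmed fish per year.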

36 Comments

  1. L says:

    Talk about data fishing! :)

  2. One question I have is how the 5% threshold was defined. Was there an analysis that took the uncertainty into account and then said, in essence, “if more than 5% of a sample of 60 fish are farmed, it seems prudent to take steps to reduce farmed fish in the river” (sample variation already taken into account, 5% in the sample taken to be enough evidence for possibly higher levels)?

    or, was there some genetic population dynamics model that said “when we put 5% into our model then we have an unacceptable rate of genetic mixing so we should try to keep the population in the river below 5%” (5% is a population level parameter based on theory with no sampling error included, now a decision analysis is needed to balance risk given limited information from a sample)

    Because in the first case the 5% is chosen as a sample proportion and could already account for the uncertainty, whereas in the second case it’s the population parameter that matters, so the uncertainty is not built into the decision.

    Though, knowing how this stuff is usually done, my guess is that someone simply set 5% in some rule-book somewhere based on no analysis whatsoever just that it “seemed like a low enough value”.

    • Rahul says:

      You are too harsh. :) I’d rephrase: “Someone, somewhere had to set a threshold & lacking the resources (time, money) to create a model or a detailed analysis used his prior knowledge & intuition to select a number for the threshold”

      • That’s fine, and expert guesses can be helpful, but the point is that the existing threshold may not be based on much information compared to the information currently being collected. If you’re going to the trouble of collecting huge amounts of information with cameras at narrow points in the river, you might as well do some analysis and come up with some new science-based decision making.

        Normally I don’t, but I’ll plug my little consulting company here http://www.lakelandappliedsciences.com/ because this sounds like something I could really help Anders with, and I have various relevant background.

    • Dalton says:

      Regarding how the threshold was defined: I can’t speak to Norway, but I can speak to Pacific salmon in the Columbia basin. One paper I see often cited is this one by Mike Ford: https://swfsc.noaa.gov/uploadedFiles/Events/Meetings/Fish_2015/Document/8.2_Ford_2002_Cons_Biol.pdf

      The idea is that fitness in the captive populations is selected differently than fitness in wild populations. Specifically, hatchery fish don’t have to compete for food in a natural river during the juvenile life stage. All these fish have to do is eat their food pellets, grow nice and fat, and resist the diseases that inevitably occur when you crowd a bunch of fish in pens. As a consequence, these fish may be more fit as adults than wild fish because they had the opportunity to grow large as juveniles. So the idea is that they may out-compete wild fish on the spawning ground, but the resulting offspring will be less fit as juveniles because they lack the wild genes that help optimize survival as a wild fish.

      In practice, though, these thresholds are set somewhat arbitrarily as a result of negotiation in stakeholder meetings (often including federal and state regulators, tribal representatives, and, on a river with a hydroelectric project, the dam owner or other landowners).

      • Selective pressures being different is the main thing I’d think of as well. The information they’re collecting might actually be useful to evaluate whether the captive-bred salmon really are less successful as juveniles, and how much that impacts things in general. Perhaps they are less effective, but then, because of that, they tend to select out of the population anyway, so you can tolerate them. Or perhaps they are too effective as adults and compete in the oceans for the food, wiping out the natives, and then when they come back and lay eggs their offspring die off… so that they tend to push the population towards extinction. Perhaps their mixing in of genes selected for disease resistance actually helps bolster the wild salmon over the medium to long term.

        There’s lots of potentially interesting modeling to do here in terms of iterated seasons of competition and cross-breeding, and with a detailed dataset of fish counts you could do a lot more effective model fitting than just with a few small sampling surveys.

        • Dalton says:

          This is a multi-million dollar business in the Pacific Northwest. Lots of people are looking into just these types of questions and there is already a rich literature. Not to mention the ongoing experiment in using hatcheries to recover natural populations.

          • That doesn’t surprise me at all considering how important both salmon fishing and the hydroelectric power business are in the Northwest, but the Pacific situation does seem quite different from Norway, where only 500 to 3000 wild salmon return to a given river. I went to a salmon hatchery and took a half-day tour outside Eugene, Oregon about a decade ago. There they were doing things like intentionally dropping the eggs down a rough sloped bed to try to kill off the weaker embryos and select for stronger juveniles… But there they were actually trying to bolster the natural populations, as you say, not produce human food for market.

            We’ve had several fisheries questions on this blog over the last 4 or 5 years, and my impression is that the use of Bayesian fitting of mechanistic models is becoming “a thing” at least among a certain population of researchers, so that’s encouraging.

  3. Anoneuoid says:

    >”There is only taken one sample. A point estimate is calculated and a confidence interval for that estimate. In one realistic example we may sample 60 salmon and find that 6 of them are farmed fish. That gives a point estimate of 10 % farmed fish in the population of 3000 in that specific river. The 95% confidence interval will be from approximately 2% to 18%.”

    There are many approaches to calculating such intervals:
    > require(binom)
    Loading required package: binom
    Warning message:
    package ‘binom’ was built under R version 3.2.5
    > binom.confint(6,60)
              method x  n      mean      lower     upper
    1  agresti-coull 6 60 0.1000000 0.04320331 0.2049342
    2     asymptotic 6 60 0.1000000 0.02409092 0.1759091
    3          bayes 6 60 0.1065574 0.03645465 0.1843301
    4        cloglog 6 60 0.1000000 0.04069043 0.1909142
    5          exact 6 60 0.1000000 0.03759127 0.2050577
    6          logit 6 60 0.1000000 0.04562248 0.2052514
    7         probit 6 60 0.1000000 0.04325646 0.1979359
    8        profile 6 60 0.1000000 0.04102707 0.1923304
    9            lrt 6 60 0.1000000 0.04100430 0.1923297
    10     prop.test 6 60 0.1000000 0.04131130 0.2116995
    11        wilson 6 60 0.1000000 0.04664283 0.2014946

    I haven’t investigated the properties of these methods at all, but searching around turns up this paper, in which the author says the method you used (apparently called “asymptotic” in the R function above) does not achieve the nominal coverage:

    “Method 1, the simplest and most widely used, is very anti-conservative on average, with arbitrarily low CP for low θ. Indeed, the maximum coverage probability is only 0.959; min DNCP is 0 and min MNCP is 0.0205. In this evaluation with θ ≤ 0.5, the deficient coverage probability stems from right non-coverage; the interval does not extend sufficiently far to the right, as evidenced by the high frequency of ZWIs and the fact that a large part of the calculated interval may lie beyond the nearer boundary, 0.”
    http://www.ncbi.nlm.nih.gov/pubmed/9595616 (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.408.7107&rep=rep1&type=pdf)
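    The coverage claim can be checked exactly for n = 60. For each possible count x, compute the interval, then add up the binomial probabilities of the counts whose interval contains the true proportion. The sketch below does this in Python, with the Wald ("asymptotic") and Wilson formulas written out by hand. The two true proportions, 0.05 and 0.10, are chosen only for illustration.

```python
import math

Z = 1.959964  # 97.5th percentile of the standard normal

def wald(x, n):
    """Wald ("asymptotic") interval for x successes out of n."""
    p = x / n
    half = Z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson(x, n):
    """Wilson score interval for x successes out of n."""
    p = x / n
    denom = 1 + Z**2 / n
    center = (p + Z**2 / (2 * n)) / denom
    half = Z * math.sqrt(p * (1 - p) / n + Z**2 / (4 * n**2)) / denom
    return center - half, center + half

def coverage(method, n, p):
    """Exact coverage: total binomial probability of the x whose interval contains p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = method(x, n)
        if lo <= p <= hi:
            total += math.comb(n, x) * p**x * (1 - p) ** (n - x)
    return total

for p in (0.05, 0.10):
    print(f"p = {p:.2f}: Wald {coverage(wald, 60, p):.3f}, Wilson {coverage(wilson, 60, p):.3f}")
```

    At p = 0.05 the Wald coverage comes out well below the nominal 95%. The main reason is that a sample with 0 or 1 farmed fish gives a Wald interval that stops short of 0.05. The Wilson interval stays near or above the nominal level at that proportion.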

    >”We have asked this question to the government, but they reply that it is more likely the real value lies near the 10% point estimate since the confidence has the shape of a normal distribution.”

    The way confidence intervals are calculated means that the coverage (e.g., 95%) refers to the percentage of intervals that would include the “real” value if you repeatedly sampled and calculated the CIs. They seem to be thinking of Bayesian credible intervals. For simple estimates like this, the confidence interval often approximates the credible interval under a uniform prior, so the confusion does not cause many practical problems. I am not sure about the case of proportions, though.

    • To get a Bayesian probability interval for this problem I’d probably use a beta-binomial model with a fairly strong prior. I mean, you’re pretty sure that there are some escapees, but nowhere near 50% of the fish are farm escapees, right? So you could start with something like a beta(1,7) prior (which has 99.2% probability that the frequency is less than 0.5), and then if you see, say, 6 out of 60 in your sample, your new probability over the frequency is given by the curve dbeta(x,7,67), and using qbeta(.95,7,67) we’d get that there’s a 95% Bayesian probability that there is less than 15.6% farm escapees.

      The prior information is very real here, and most likely you have even more prior information than beta(1,7) expresses. Most likely MUCH more. beta(3,30) sounds like it’d be a reasonable choice based on this kind of background info and the idea that those numbers are about what has been seen in the past.
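      The conjugate update described above can be reproduced without special software. Below is a minimal Python sketch standing in for R's pbeta/qbeta. It relies on the identity I_x(a, b) = P(Binomial(a + b − 1, x) ≥ a), which holds only for whole-number parameters, so it covers priors like beta(1,7) and beta(3,30) but not arbitrary ones.

```python
import math

def beta_cdf(x, a, b):
    """Regularized incomplete beta I_x(a, b) for positive integers a, b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    m = a + b - 1
    return sum(math.comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))

def beta_quantile(q, a, b, tol=1e-10):
    """Invert beta_cdf by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

farmed, sampled = 6, 60
for prior_a, prior_b in [(1, 7), (3, 30)]:
    # Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + farmed, b + sampled - farmed)
    a, b = prior_a + farmed, prior_b + sampled - farmed
    print(f"prior Beta({prior_a},{prior_b}) -> posterior Beta({a},{b}): "
          f"95% upper bound {beta_quantile(0.95, a, b):.3f}, "
          f"P(proportion < 5%) = {beta_cdf(0.05, a, b):.3f}")
```

      For the beta(1,7) prior this reproduces the figures quoted in the thread: a 95% upper bound of about 0.156 and P(proportion < 5%) of about 0.072. For non-integer priors you would need a general incomplete-beta routine, such as R's own pbeta/qbeta or SciPy's scipy.stats.beta.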

      • Shravan says:

        Did I miss something here? Why don’t they just count the farm fish to find out how many escaped? Isn’t that under the farmer’s control? It seems crazy to estimate the fish in the wild when you have them in an enclosed space (which is what I assume a farm means).

        • Before moving to Oregon I had never been to a fish hatchery, and I would have thought your question was a very reasonable one. Now, however, I’ve seen vast pools densely filled with jumping, darting fish, and I would shudder to try to count them. But, according to a comment above, the farmed fish may all be tagged, which is amazing, and so a count could in fact be easy to do. However: losing a fish from the farm certainly doesn’t mean it’s in the water in the wild. At least here, I’d guess that the majority of lost fish are eaten by raptors, and so are likely to be found in trees! (The big mystery to me is why the hatcheries aren’t constantly being invaded by swarms of eagles and ospreys…) Fish hatcheries are fascinating, by the way.

      • Anoneuoid says:

        >”using qbeta(.95,7,67) we’d get that there’s a 95% Bayesian probability that there is less than 15.6% farm escapees”

        To compare to the CIs shouldn’t you instead calculate this interval?:
        > qbeta(.025,7,67)
        [1] 0.03942901
        > qbeta(.975,7,67)
        [1] 0.1703611

        Also, shouldn’t we do:
        > pbeta(0.05,7,67)
        [1] 0.07231941

        It looks like, given this model/data, the probability that the proportion of farmed fish is less than the 5% threshold is 0.07. This tells quite a different story than the binary “is the cutoff inside the interval or not” method.

        • Once you have the posterior distribution you can answer lots of questions such as those you mention. The point is that the Bayesian model, which combines prior data with a probability model that assigns a plausibility to the very specific result we are interested in, answers the real-world research questions that scientists actually have, such as “how much credence should I give to the idea that there really is less than 5% of escaped fish?” Answer: a 7% chance that there is less than 5% escaped fish in the river.

    • elin says:

      You could also bootstrap the CI.
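      A sketch of a percentile bootstrap for the 6-of-60 sample, using only Python's standard library. Resample the 60 fish with replacement, recompute the proportion each time, and take the 2.5% and 97.5% points. The number of resamples and the seed are arbitrary choices. With 0/1 data each resampled count is effectively a Binomial(60, 0.1) draw, so this interval tends to behave much like the Wald interval.

```python
import random

def bootstrap_percentile_ci(data, reps=10_000, level=0.95, seed=1):
    """Percentile bootstrap interval for the mean of 0/1 data."""
    rng = random.Random(seed)
    n = len(data)
    # Proportion farmed in each resample (drawn with replacement), sorted.
    means = sorted(sum(rng.choices(data, k=n)) / n for _ in range(reps))
    lo_idx = int((1 - level) / 2 * reps)
    hi_idx = int((1 + level) / 2 * reps) - 1
    return means[lo_idx], means[hi_idx]

sample = [1] * 6 + [0] * 54  # 6 farmed fish among 60 sampled
print(bootstrap_percentile_ci(sample))
```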

  4. Rahul says:

    Dumb question: Regarding >>>”counting how many wild and how many farmed salmon you got in that sample”<<<

    How does one tell them apart? Phenotype? Or genetic testing?

    Also: Do those 60 end up as dinner or is it catch-and-release?

    • Dalton says:

      Again, I can’t speak to Norway, but for Pacific salmon hatchery fish are typically identified with either an external mark (clipped adipose fin) or an internal mark (coded wire tag or PIT tag).

      Given the imperiled status of wild stocks of Atlantic salmon, I imagine the wild fish are gently released back to the river. Farmed fish are likely euthanized.

      • Rahul says:

        If it is an external mark are the camera-acquired images not good enough to distinguish it without a catch?

        Alternatively are these tags not readable from a distance, at least in those narrow river portions?

        • Dalton says:

          That’s a good point. And if it’s an internal mark, like a cheap PIT tag, you could get an electronic signal of each farmed fish passing upstream. It seems like one solution to the problem would simply be mandating marking of all farmed fish in such a way that they can be enumerated moving upstream either by a PIT detector or by camera identification at the existing enumeration point. Then it’s a question of a census and not a sampling problem. Less invasive than netting for the fish as well.

  5. Dalton says:

    There’s a real argument for not sampling more fish at one time or for not taking more samples. Sampling these fish is likely to be somewhat invasive, using either a tangle net or electrofishing. There is bound to be some mortality associated with that.

    Additionally, shouldn’t there be a finite population correction in here?

    • If you’re sampling 60 and see 6, and there really are only around P = 500 fish in the total population in a given year, then a small finite-population correction would be in order. Basically, you know there are 6 + f*(P-60) escapees, where f is your uncertain fraction and P is your uncertain population. But with f on the order of 10% and 60 on the order of 10% of P, your finite-sample error is on the order of 1%. That is a lot less than the sampling error in f, or the estimation error in P (even with a camera system you could easily imagine missing maybe 10% of the fish), so it’s marginal whether it matters that much. On the other hand, in a year where you only get P = 150 or 200 fish, 60 starts to be a big deal!
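      For a sense of scale, the textbook finite population correction for sampling without replacement multiplies the standard error by sqrt((N − n)/(N − 1)). The sketch below applies that factor to the Wald interval for 6 of 60. The population sizes 3000, 500 and 150 are illustrative values taken from this thread. This is the simple design-based correction, not the fuller model with an uncertain P described above.

```python
import math

def wald_with_fpc(x, n, pop_size, z=1.959964):
    """Wald interval with the finite population correction sqrt((N - n) / (N - 1))."""
    p = x / n
    fpc = math.sqrt((pop_size - n) / (pop_size - 1))
    half = z * fpc * math.sqrt(p * (1 - p) / n)
    return p - half, p + half, fpc

for pop_size in (3000, 500, 150):
    lo, hi, fpc = wald_with_fpc(6, 60, pop_size)
    print(f"N = {pop_size:4d}: FPC {fpc:.3f}, interval {lo:.3f} to {hi:.3f}")
```

      The correction factor is about 0.990 for N = 3000, 0.939 for N = 500 and 0.777 for N = 150. So it only starts to matter much when the sample is a large share of the run.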

      As to the catching and associated mortality, I think that’s a really good reason to be smart about the data analysis and try to use your information as efficiently as you can. As Rahul says, that could mean looking for clipped fins in the camera data, or including time-series information on fish counts. There is probably a “wave” of fish; they don’t all arrive on the same day, and perhaps wild and escaped salmon tend to arrive at different times within the season. So instead of capturing 60 fish at once, you might be better off capturing 20 fish at 3 different times and using your background information about the time series to get better estimates with the same sample size.

  6. Dalton says:

    One more comment, and then I’ll shut up. But speaking of fishy confidence intervals: the phrase “confidence interval” appears no fewer than 24 times in the recent ruling by the 9th Circuit Court of Appeals throwing out NOAA’s Biological Opinion on the recovery of endangered and threatened Pacific salmon stocks in the Columbia River. There appears to be a bit of a Bayesian interpretation of these frequentist intervals as well, when the judge states that:

    “In the 2014 BiOp, NOAA Fisheries assumes very specific numerical benefits from habitat improvement. These benefits, however, are too uncertain and do not allow any margin of error. Further, a key measure of survival and recovery employed in the 2014 BiOp already shows a decline, but NOAA Fisheries has discounted this measurement, concluding that it falls within the 2008 BiOp’s “confidence intervals.” Those confidence intervals, however, were so broad, that falling within them is essentially meaningless.”

    http://earthjustice.org/sites/default/files/files/1404%202065%20Opinion%20and%20Order.pdf

    • Simon Gates says:

      That’s the usual way confidence intervals are interpreted, i.e., as if they were credible intervals from a posterior probability distribution. I see this all the time in medical research. It’s in the original post too: “…they reply that it is more likely the real value lies near the 10% point estimate since the confidence has the shape of a normal distribution.”
      Why? Because that makes sense to people, and it’s really hard for people to get their heads around the idea that a confidence interval is just an interval and doesn’t represent a probability distribution.

  7. Paul says:

    Is there a way, or are there approaches, to account for the gap between sample size and population size in the calculation of a confidence/credible interval? I mean, if you have a population of 100 individuals and you sample 99 individuals at random, your interval will have a certain width, although your point estimate will be very close to the true value. Or consider this case: someone gives you a sample of size 100 and you compute an interval for some parameter of interest. Afterwards you are told that the sample actually was the whole population of interest, so your interval is useless; the point estimate is the true value.

    • jrc says:

      A few no-names (Alberto Abadie, Susan Athey, Guido W. Imbens, Jeffrey Wooldridge) have some recent work on that:

      http://www.cemmap.ac.uk/uploads/220114_Imbens.pdf

      “In this paper we investigate the justification for positive standard errors in cases where the researcher estimates regression functions with data from the entire population. We take the perspective that the regression function is intended to capture causal effects, and that standard errors can be justified using a generalization of randomization inference. We show that these randomization-based standard errors in some cases agree with the conventional robust standard errors, and in other cases are smaller than the conventional ones.”

      Also, in your toy example of sampling 99 out of 100, you can just use arithmetic. There is a 100% chance that the true population percentage is within 1 percentage point of your estimate! But that isn’t about “causal effects”; it is just about estimating proportions in a population (which is closer to the current fish example than the framework above, but I think that paper is an interesting enough thought experiment that it was worth passing along).
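      The arithmetic can be checked by brute force with exact fractions. For every possible sample result, and every possible value of the single unobserved individual, the sketch finds the largest gap between the sample proportion and the population proportion. The function name and defaults are illustrative.

```python
from fractions import Fraction

def max_gap_after_sampling(pop_size=100, sample_size=99):
    """Largest possible |sample proportion - population proportion|
    when all but pop_size - sample_size individuals are observed."""
    unseen = pop_size - sample_size
    worst = Fraction(0)
    for k in range(sample_size + 1):           # positives among the sampled
        estimate = Fraction(k, sample_size)
        for extra in range(unseen + 1):        # positives among the unsampled
            truth = Fraction(k + extra, pop_size)
            worst = max(worst, abs(estimate - truth))
    return worst

print(max_gap_after_sampling())  # prints 1/100
```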

    • Paul, suppose you’re sampling from a finite population and trying to determine the average of some measurement in the population, for example the age of the population.

      So, you have N people and have sampled n of them. The average age of the n in your sample is x which is observed, and the average age in the rest of the population is X which is not observed.

      Then the average age of the full population is XX = (nx + (N-n)X)/N

      You can use the information in your sample of n people with average age x to get a Bayesian uncertainty interval over X. You may also have uncertainty about the exact size N of the population. You may have information about the approximate size of some biasing factors in your sampling procedure, you may have information about roundoff errors in the calculation of the sample average x, etc. Each of these factors can be quantified, at least approximately, using Bayesian probability intervals. Using software like Stan, you can easily calculate a posterior over the overall average XX as defined above, either using X and N as parameters, or treating them as generated quantities based on some other model where there are parameters, for example fitting a distributional form to the data and then generating a plausible sample for the unknown values from the fitted distribution.

      When you’re dealing with explicit sampling from a finite population, resampling/bootstrap procedures can make good sense. There’s a sense in which a bootstrap resample is like a sample from one of the high-probability models for the distribution of the values in the whole population. I think this can be made formal with the appropriate assumptions.

  8. Mark says:

    “We have asked this question to the government, but they reply that it is more likely the real value lies near the 10% point estimate since the confidence has the shape of a normal distribution.

    Is this correct?”

    No, not in the least. In fact, for any single observed confidence interval, there is almost a 1/3 probability that one of the endpoints is closer to the “true” value than is the point estimate. There is no normal distribution within the confidence interval.
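    Mark's figure follows from a short calculation for a normal-theory interval, estimate ± 1.96 SE, so it applies to the salmon interval only approximately. Measure errors in standard-error units. The lower endpoint is closer to the truth than the estimate whenever the estimate overshoots by more than 0.98 SE (half of 1.96), and the upper endpoint is closer whenever it undershoots by the same amount. The sketch below computes the analytic value and checks it by simulation.

```python
import math
import random

z = 1.959964
# Closer endpoint exactly when |estimate - truth| > z/2 SE, i.e. P(|Z| > z/2).
analytic = math.erfc((z / 2) / math.sqrt(2))

rng = random.Random(1)
reps = 100_000
closer = 0
for _ in range(reps):
    err = rng.gauss(0.0, 1.0)  # estimate minus truth, in SE units
    lo, hi = err - z, err + z  # interval endpoints minus truth
    if min(abs(lo), abs(hi)) < abs(err):
        closer += 1

print(f"analytic {analytic:.3f}, simulated {closer / reps:.3f}")
```

    The analytic value is about 0.327, just under 1/3.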
