## Fish cannot carry p-values

Following up on our discussion from last week on inference for fisheries, Anders Lamberg writes:

Since I first sent you the question, there has been a debate here too.

In the discussion you sent, there is a debate both about the more practical/biological issues (how accurately farmed fish can be separated from wild fish, whether the 5 % farmed fish limit is correct, etc.) and about the actual sampling (the mathematics). New data on the first type of question are constantly being acquired. I am not worried about that, because there is an actual process going on that makes the methods better.

However, the second question, the use of statistics and models, has until recently not been discussed properly. Here a lot of biologists have used the concept “confidence interval” without really understanding what it means. I gave you an example of sampling 60 salmon in a population of 3000. There are many examples where the sample size has been as low as 10 individuals. The problem has been how to interpret the uncertainty. Here is a constructed (but not far from realistic) example:

Population size is 3000. From different samples you could hypothetically get these three results:

1) You sample 10, get 1 farmed fish. This gives 10 % farmed fish
2) You sample 30, get 3 farmed fish. This gives 10 % farmed fish
3) You sample 60, get 6 farmed fish. This gives 10 % farmed fish

All surveys show the same result, but they are dramatically different when you have to draw a conclusion.
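To see how differently the three samples constrain the farmed fraction, here is a minimal Python sketch that computes a 95 % interval for each. It uses the Wilson score interval, which is an assumed choice (the letter does not say which interval the surveys use), and it ignores the finite population correction, which barely matters when at most 60 of 3000 fish are sampled.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(k, n, level=0.95):
    """Wilson score interval for a binomial proportion k/n.

    Assumes independent draws; the finite population correction is ignored.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# The three hypothetical surveys from the letter, each with 10 % farmed fish.
for k, n in [(1, 10), (3, 30), (6, 60)]:
    lo, hi = wilson_interval(k, n)
    print(f"{k}/{n}: point estimate {k / n:.0%}, 95% interval {lo:.1%} to {hi:.1%}")
```

All three intervals still contain 5 %, even for the sample of 60; what changes with sample size is only how wide they are.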

When the sampling is reported (current practice), the point estimate of 10 % is the main result. Sometimes the confidence interval with upper and lower limits is also reported, but not discussed. Since only one sample is drawn from each population, not discussing the uncertainty with such small samples can lead to wrong conclusions. In most projects that a typical biologist reports on, the results are part of a hypothetico-deductive research process. The new thing with the farmed salmon surveys is that the results are measured against a defined limit: 5 %. If the point estimate is above 5 %, it means millions (actually billions) in costs for the industry. On the other hand, if the observed point estimate is below 5 %, the uncertainty could affect the wild salmon populations. This could result in a long-term disaster for the wild salmon.

At the risk of being viciously tabloid: the biologists (and I am one of them) have suddenly come into a situation where their reports have direct consequences. The question of the farmed salmon frequencies in the wild populations has become a political question in Norway, at the highest level. Suddenly we have to really discuss uncertainty in our data. I do not say that all biologists have been ignorant, but I suspect that many publications have not treated, and do not treat, uncertainty with due respect.

Over the last months, more mathematical expertise here in Norway has been brought into the “farmed salmon question”. The conclusion so far is that you cannot use the point estimate. You have to view the confidence interval as a test of a hypothesis:

H: The level of farmed salmon is over 5 %

If the upper limit of the 95 % confidence interval is 5 % or higher, you have to start taking measures. If, for example, the point estimate is 1 % but the upper limit of the 95 % confidence interval is 6 %, we must start the job of removing farmed salmon from that population. The problem is that the confidence intervals from almost all the surveys will contain the critical value of 5 % (even when the point estimate is much lower), so in most populations you cannot reject the hypothesis. The reason all the intervals contain the critical value is the small sample sizes.

To use this kind of sampling procedure, your sample size should exceed about 200 salmon to give a result that treats the fish farming industry fairly. On the other hand, small sample sizes and wide confidence intervals will always benefit the wild salmon. I would like that on behalf of nature, but we biologists will then not be as relevant as experts giving advice to society as a whole.
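The figure of about 200 fish can be explored with a similar sketch: for an assumed observed proportion, find the smallest sample whose Wilson upper limit falls below 5 %. Using the Wilson interval, and treating the observed proportion as fixed while n changes, are simplifying assumptions; the surveys may use another interval, and real counts are whole numbers.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)

def wilson_upper(p_hat, n, z=Z95):
    """Upper end of the Wilson score interval for an observed proportion p_hat in n fish."""
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre + half

def min_sample_size(p_hat, limit=0.05, n_max=10_000):
    """Smallest n whose upper limit is below `limit`, treating p_hat as a fixed proportion."""
    for n in range(1, n_max + 1):
        if wilson_upper(p_hat, n) < limit:
            return n
    return None  # not reachable within n_max

for p_hat in (0.0, 0.01, 0.02, 0.03):
    print(f"observed {p_hat:.0%} farmed: need n >= {min_sample_size(p_hat)}")
```

Under these assumptions, even a sample with no farmed fish at all needs 73 fish before the upper limit drops below 5 %, and an observed 2 % needs a sample in the low hundreds, which is consistent with the letter's figure.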

Then there are a lot of practical implications linked to the minimum sample size of 200. Since the sampling is done by rod catch, some salmon will die due to the sampling procedure. But the most serious problem with the sampling is that several new reports now show that farmed fish take the bait more frequently. The catchability of farmed salmon has been shown to be from 3 to 10 times higher than that of wild salmon. This ratio varies, so you cannot put a constant correction factor into the calculations.
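The catchability problem can be made concrete with a simple model, assumed here for illustration: every farmed fish is c times as likely as a wild fish to be caught by rod. If the true farmed fraction is p, the fraction in the catch is q = cp / (cp + 1 - p), which can be inverted to recover p.

```python
def observed_fraction(p, c):
    """Farmed fraction in a rod catch if farmed fish are c times as catchable as wild fish."""
    return c * p / (c * p + 1 - p)

def true_fraction(q, c):
    """Invert q = c*p / (c*p + 1 - p) to recover the true farmed fraction p."""
    return q / (q + c * (1 - q))

# An observed 10 % farmed fish under the range of catchability ratios in the letter.
for c in (1, 3, 10):
    print(f"c = {c:>2}: true farmed fraction {true_fraction(0.10, c):.1%}")
```

With c anywhere from 3 to 10, an observed 10 % in the catch is consistent with a true fraction of only about 1.1 % to 3.6 %, both below the 5 % limit; since c varies between rivers, no single correction factor can be applied.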

The solution so far seems to be to use other methods to acquire the samples. Snorkeling surveys in the rivers performed by trained persons show that over 85 % of the farmed fish are correctly classified. Since a snorkeling survey covers from 80 to 100 % of the population, the only significant error is misclassification, which is small compared to the uncertainty of small-sample procedures.
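Misclassification in a snorkeling count can be corrected in the same spirit. The sketch below applies the Rogan-Gladen correction, a standard adjustment for imperfect classification. The letter only gives the share of farmed fish that are correctly classified (sensitivity, over 85 %), so the share of wild fish correctly classified as wild (specificity) is a hypothetical input here, as is the observed 4 %.

```python
def corrected_fraction(observed, sensitivity, specificity):
    """Rogan-Gladen estimate of the true fraction from an imperfectly classified count.

    sensitivity: share of farmed fish classified as farmed.
    specificity: share of wild fish classified as wild (hypothetical here).
    """
    p = (observed + specificity - 1) / (sensitivity + specificity - 1)
    return min(max(p, 0.0), 1.0)  # clip to the valid range [0, 1]

# Hypothetical snorkeling count: 4 % of fish seen are classified as farmed.
for spec in (1.0, 0.99):
    print(f"specificity {spec}: corrected {corrected_fraction(0.04, 0.85, spec):.1%}")
```

With perfect specificity the hypothetical 4 % becomes about 4.7 %; if 1 % of wild fish were mistaken for farmed ones, the same count would imply about 3.6 %. Because wild fish are the large majority, small specificity errors matter as much as the sensitivity reported in the letter.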

Thanks again for showing interest in this question. The research institutions in Norway have not been very willing even to discuss the topic. I suspect that has to do with money. Fish farmers focus on growth and money, but sadly, so far I guess the researchers involved in monitoring environmental impacts see that a crisis brings more money for research. Therefore it is important to have the discussion free of all questions about money. Here in Norway I miss the kind of approach you have to the topic. Isn’t the discussion, development and testing of new hypotheses the reason why we became biologists? It is the closest you come to being a criminal investigator. We did not want to become politicians.

My general comment is to remove the whole “hypothesis” thing. It’s an estimation problem. You’re trying to estimate the level of farmed fish, which varies over time and across locations. And you have some decisions to make. I see zero benefit, and much harm, to framing this as a hypothesis testing problem.

Wald and those other guys from the 1940s were brilliant, doing statistics and operations research in real time during a real-life war. But the framework they were using was improvised, it was rickety, and in the many decades since, people keep trying to adapt it in inappropriate settings. Time to attack inference and decision problems directly, instead of tying yourself into knots with hypotheses and confidence intervals and upper limits and all the rest.
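Taking the estimation view, each survey yields a posterior distribution for the farmed fraction rather than a reject/accept verdict. A minimal sketch, assuming a flat Beta(1, 1) prior and a plain binomial likelihood (no finite-population, catchability, or misclassification adjustments), computes the posterior mean and the posterior probability that the fraction exceeds 5 % for the three hypothetical surveys in the letter.

```python
from math import comb

def prob_above(limit, k, n, prior_a=1, prior_b=1):
    """Posterior P(fraction > limit) after k farmed fish in n, with a Beta(prior_a, prior_b) prior.

    For X ~ Beta(a, b) with integer a and b, P(X > x) = P(Y <= a - 1)
    where Y ~ Binomial(a + b - 1, x), so no special functions are needed.
    """
    a, b = prior_a + k, prior_b + n - k
    m = a + b - 1
    return sum(comb(m, j) * limit**j * (1 - limit) ** (m - j) for j in range(a))

for k, n in [(1, 10), (3, 30), (6, 60)]:
    mean = (1 + k) / (2 + n)  # posterior mean under the flat prior
    print(f"{k}/{n}: posterior mean {mean:.1%}, P(fraction > 5%) = {prob_above(0.05, k, n):.2f}")
```

Under these assumptions the probability of exceeding 5 % rises from about 0.90 for 1 of 10 to about 0.97 for 6 of 60. That quantity, rather than whether an interval touches 5 %, is what would feed into a decision.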

1. BenK says:

Can they remove other problems as well, by including more in the model? They are, after all, not trying to estimate the level of farmed fish, but rather the population-genetic impact on the wild fish. Unless they model it from start to finish, piecewise optimization and thresholding will end up causing trouble.

The model they construct could directly target the goals of the program: perhaps that key alleles in the farmed population cannot reach fixation in the wild population on an indefinite time horizon, or something like that, whatever it may be.

2. Jack PQ says:

Your last comment on Wald sounds like it could be almost directly applied to JM Keynes in economics. The difference is that Keynes’s ideas are not used today for economic research, but they are, strangely, for economic *policy*. This disconnect is frustrating.

“Keynes and those other guys from the 1940s were brilliant, doing economics in real time during a real-life recession, post-recession turmoil, and war. But the framework they were using was improvised, it was rickety, and in the many decades since, people keep trying to adapt it in inappropriate policy settings. Time to attack macroeconomic problems directly…”

• Andrew says:

Jack:

Could be, I have no idea. Let’s see if any macroeconomists are reading deeply enough into the comment thread to express their opinions here.

To get back to Wald, etc.: I respect the depth of the insights of these brilliant researchers of previous eras, and I think their insights and some of their methods can remain valuable. The problem comes when modern-day researchers get stuck in their frameworks.

3. Nick Menzies says:

“My general comment is to remove the whole ‘hypothesis’ thing. It’s an estimation problem”

I would frame this as a decision problem. Without considering the consequences of the decision, isn’t any decision rule developed based only on the qualities of the estimator arbitrary? In science we generally separate estimation from decision analysis, and that seems reasonable when the people doing the estimation are not aware of the decisions that their estimation will affect. But here it seems that the relevant decision is fairly clear, and the only reason to collect this information is to inform this decision. Clearly, valuing the consequences of different choices is hard (i.e., it requires knowing how the threat to the natural populations changes as a function of the value being estimated (the fraction non-native), the cost and impact of any control measures you would impose, and the cost you would place on losing that healthy population). But it seems that this is a situation where it could be done, even if the outcome is to develop a simple decision rule that could be applied routinely (i.e., don’t attempt the full decision analysis each time).

A benefit of this approach is that you can also ask value-of-information questions: what is the benefit of larger sample sizes, and how does this vary by x, y, z?
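The value-of-information question can be sketched with a conjugate model. The costs are hypothetical units and the loss is deliberately crude: acting (removing farmed fish) costs cost_act whatever the truth, while not acting costs cost_harm if the true fraction exceeds 5 %. The expected value of sample information (EVSI) for a survey of n fish is the prior expected loss minus the expected loss after seeing the survey, averaged over the beta-binomial predictive distribution of its outcome; the flat prior is also an assumption.

```python
from math import comb, exp, lgamma

def beta_sf(x, a, b):
    """P(X > x) for X ~ Beta(a, b), integer a, b >= 1, via the binomial identity."""
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a))

def log_beta(p, q):
    """Log of the beta function B(p, q)."""
    return lgamma(p) + lgamma(q) - lgamma(p + q)

def beta_binomial_pmf(k, n, a, b):
    """Predictive probability of k farmed fish in a sample of n under a Beta(a, b) prior."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))

def bayes_loss(a, b, cost_act, cost_harm, limit=0.05):
    """Expected loss of the better action (act vs. do nothing) under Beta(a, b) beliefs."""
    return min(cost_act, cost_harm * beta_sf(limit, a, b))

def evsi(n, a=1, b=1, cost_act=1.0, cost_harm=10.0, limit=0.05):
    """Expected reduction in loss from surveying n fish before deciding."""
    prior_loss = bayes_loss(a, b, cost_act, cost_harm, limit)
    expected_posterior_loss = sum(
        beta_binomial_pmf(k, n, a, b) * bayes_loss(a + k, b + n - k, cost_act, cost_harm, limit)
        for k in range(n + 1)
    )
    return prior_loss - expected_posterior_loss

for n in (10, 60, 200):
    print(f"n = {n:>3}: EVSI = {evsi(n):.4f}")
```

With these inputs a survey of 10 fish is worth nothing for this decision, because no outcome of it would change the choice to act, while a survey of 60 fish has positive value because a catch with no farmed fish would tip the decision.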

• Andrew says:

Nick:

From a Bayesian perspective, I consider inference to be an intermediate step in the decision problem.

• Nick Menzies says:

I agree with that (indeed, I find applied decision analysis inherently Bayesian, at least in the contexts I am familiar with). My point is that our opinions about the best way to do estimation should derive from how these ways improve decision-making. Separating out and considering this intermediate step on its own seems a 2nd-best to considering the estimation approach as just one of the variables that can be chosen as part of the decision analysis (where this includes sample size, timing etc as well as how data are analyzed and interpreted).

For practical reasons it is infeasible to attempt a full decision analysis every time we do research to estimate a quantity (what is the value of better estimates of the population growth rate? Let me count the ways…), but here the decision problem seems fairly well specified.

• Andrew says:

Nick:

Yes. And, beyond that, thinking about the relevant decision problems can be helpful in the inference stage, to give a sense of what features in the model are important. For example, if you know a variable is going to be used in a later decision, you’ll want to include its coefficient in the model, even if it’s not statistically significant. Conversely, a variable that is not relevant to decisions could be excluded from the analysis (possibly for reasons of cost, convenience, or stability), in which case you’d interpret inferences as implicitly averaging over some distribution of that variable.

Seems like this estimation and comparison against a 5% standard for compliance can be handled by using confidence intervals (e.g., 95%) and thinking about where that interval lies with respect to 5% in an equivalence/inequivalence type of analysis, where both a liberal, benefit-of-doubt approach (minimizing fish-farm producer risk) and a conservative, fail-safe approach (minimizing wild salmon population risk) are addressed. If the latter is deemed more important, then the estimated confidence intervals should lie in the region ≤ 5%, which may be very hard to achieve unless sample sizes are large. The benefit of the equivalence/inequivalence approach is that it forces people to recognize that it is difficult to specify policy criteria for compliance that minimize risk at both ends simultaneously. So someone will have to decide which risk is more important at this time for this policy issue. It also forces people to reevaluate where that compliance criterion (5% in this case) came from and whether anyone was really thinking about variation (both in terms of impacts to salmon populations and in terms of sampling for monitoring compliance) when it was specified. And yes, Bayesian credible intervals could be used for the equivalence/inequivalence approach if so desired.
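The two readings of an interval described in this comment can be written as a small classification rule. The interval endpoints are inputs (a confidence or a credible interval would do), and the example intervals are hypothetical.

```python
def compliance_reading(lower, upper, limit=0.05):
    """Classify an interval for the farmed fraction against a compliance limit.

    Benefit-of-doubt (protects the producer): act only if the whole interval is above the limit.
    Fail-safe (protects the wild population): act unless the whole interval is at or below it.
    """
    benefit_of_doubt_act = lower > limit
    fail_safe_act = upper > limit
    if benefit_of_doubt_act:
        verdict = "non-compliant under both rules"
    elif not fail_safe_act:
        verdict = "compliant under both rules"
    else:
        verdict = "rules disagree: someone must decide which risk matters more"
    return benefit_of_doubt_act, fail_safe_act, verdict

for interval in [(0.01, 0.04), (0.02, 0.20), (0.06, 0.15)]:
    print(interval, compliance_reading(*interval)[2])
```

The middle case, an interval straddling the limit, is where the policy choice the comment describes has to be made explicitly.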

5. Rahul says:

>>>Therefore it is important to have the discussion free of all questions about money.<<<

Is that realistic? Consider fishing or fish farms or river dams. All of those involve decisions about money.

• More to the point, in actual fact, the right way to do this is to make it ALL about money. It’s shocking how bad regulators are at coming up with this stuff, but the *right* way to do it is VERY straightforward.

First, we define a sampling scheme and a model for the sampling, as well as parameters that describe the population, say the total size of the population and the fraction that are escaped farm-raised. At this stage you can think about different methods of sampling, such as snorkel, pole catch, multi-location vs single location, use of prior year data, time series, etc and come up with a reasonable Bayesian model for a posterior simultaneously over the total population and the escaped fraction, and any other relevant decision variables not under your control. Note that some of the variables under your control may be ones that control the sampling/model choice.

Now, you define a COST function that maps the decision variables, both those not under your control (such as total population and fraction of escaped fish) and those under your control (such as amounts of money spent on different programs to change the status quo), to societal dollar loss. This is not easy, but it involves considering things like the cost to society of the loss of natural resources, the cost to farmers of putting in extra safeguards, the cost to society of fishing for and killing escaped fish in the rivers, the value generated for society by farming, blablabla. You will ultimately have to think carefully here, and get input from all the stakeholders.

Next, you take the posterior distribution over the decision parameters after doing your survey, and you consider different options for what to do in terms of decisions you can make to change the variables under your control (i.e., amount of money you will spend to kill escaped fish, amount of money you will require farmers to spend to improve their pens, amount of money you will spend to improve habitat for the wild fish, etc.).

Now, you seek out a decision, defined as a set-point of all the variables under your control, that minimizes the expected cost, averaging over the uncertain variables from your survey: the expectation is *taken with respect to the posterior probability distribution of the uncertain decision variables you estimated from data*. Typically you spend some computer time to seek out a decent solution even if it’s not guaranteed to be a global minimum cost solution.
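The procedure in this comment can be sketched end to end in a few lines. Everything numeric here is hypothetical: a river of 3000 fish, a survey finding 6 escapees in 60, a flat prior, a single decision variable (dollars spent on removing escapees), and an invented cost function in which removal effort reduces the remaining escapees exponentially. The decision is the spending level on a grid that minimizes cost averaged over posterior draws of the escaped fraction.

```python
import math
import random

# Hypothetical inputs for illustration only.
POPULATION = 3000          # fish in the river (the letter's example)
HARM_PER_ESCAPEE = 2_000   # assumed societal loss, dollars per escapee left in the river
REMOVAL_SCALE = 50_000     # assumed dollars of effort that cut escapees by a factor of e

def posterior_draws(k, n, draws=5_000, seed=1):
    """Monte Carlo draws of the escaped fraction from a Beta(1 + k, 1 + n - k) posterior."""
    rng = random.Random(seed)
    return [rng.betavariate(1 + k, 1 + n - k) for _ in range(draws)]

def cost(spend, fraction):
    """Assumed cost: removal spending plus the harm from escapees that remain."""
    remaining = POPULATION * fraction * math.exp(-spend / REMOVAL_SCALE)
    return spend + HARM_PER_ESCAPEE * remaining

def expected_cost(spend, draws):
    """Cost averaged over the posterior draws, i.e. the posterior expected cost."""
    return sum(cost(spend, f) for f in draws) / len(draws)

def best_spend(draws, grid):
    """Grid search for the spending level with the lowest posterior expected cost."""
    return min(grid, key=lambda s: expected_cost(s, draws))

draws = posterior_draws(k=6, n=60)
grid = range(0, 300_001, 10_000)
choice = best_spend(draws, grid)
print(f"spend ${choice:,} (expected cost ${expected_cost(choice, draws):,.0f})")
```

Because this toy cost is linear in the escaped fraction, only the posterior mean actually matters here; a cost with thresholds or nonlinear harm to the wild population would make the whole posterior count, and the same averaging code would handle it unchanged.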

You can fight politically all you want about what the factors should be that go into the cost function, and to a lesser extent, how we should do the sampling and modeling, but we really should be able to agree that THIS IS THE RIGHT WAY TO DECIDE WHAT TO DO.

I’m not sure why Andrew’s so down on Wald, because Wald’s “essentially complete class theorem” says, roughly, that if you seek out a solution in the set of Bayesian decision problems of the type described above, you will have a solution that is not dominated (that is, there is no other method that is ALWAYS better). So basically he proves a theorem that tells you “don’t bother looking elsewhere, work on doing a good job of your Bayesian decision as described above”.

I think where Andrew gets his animosity towards that old-school 1950s stuff is where they took this as an excuse to then define point estimators using things that have no real-world cost functions associated with them (like quadratic loss or 0/1 loss or whatever). I take those as just textbook examples of how to carry out a Bayesian decision, like those textbook examples of how to calculate the trajectory of balls dropped off ladders… it’s just an excuse to get you used to the math. The problem of getting “stuck in the framework” is really the problem of getting “stuck” in the re-use of these non-real-world textbook proxies for real cost functions.

• Dale Lehman says:

If only it were so simple… No, I don’t think we can all agree that “THIS IS THE RIGHT WAY TO DECIDE WHAT TO DO.”

• Well, we do have a mathematical theorem that says that every other way is dominated or no better (ie. there will be a Bayesian method that gives at least as good or better decisions every time)

• Rahul says:

The fact that an existence theorem says that the optimal strategy belongs to Class-foo shouldn’t, in general, be a reason to reject particular strategies of Class-not-foo, right?

• I think it does. I mean, decisions regarding things like fisheries or locations for building nuclear power plants or the choice of assumed loads to design bridges to carry or who to give mammograms to … are just FAR TOO IMPORTANT to leave to ad-hockery, especially when the history of ad-hockery shows that it’s not just “not optimal” but pretty often actively bad.

What it means for a decision rule to be dominated is that there are rules that work better *in every instance* of whatever actually turns out to be the case. As Corey pointed out the Bayesian rules are an “essentially complete class” because there are some non-bayesian rules that might do as well as a bayesian rule… but my guess is this is like saying “there are some planets in the universe that are just as good as earth for sustaining human life” technically perhaps true, but practically totally useless because none of them are reachable ever.

• Rahul says:

I meant it in this narrow sense: I can have two classes of strategies Class-A & Class-B & an existence theorem that says Class-A dominates Class-B but yet we keep using Class-B strategies in practice because we do not have a way to *know* what that particular dominant Class-A strategy is for a specific problem although we can be sure of the *existence* of such a dominant strategy.

i.e. An existence theorem doesn’t have to outline the specific example.

i.e. “there will be a Bayesian method that gives at least as good or better decisions every time” but do we always know what that particular method is?

• Well, we also have that the higher the posterior probability of a parameter value, the more weight the decision gives to that value; so when the inference is good (that is, it puts the real parameter values high in the posterior distribution), the decision will be good (it will primarily take into account values of the parameters near the true value).

So, it’s not just an existence theorem, but also, it naturally fits with the logic of Bayesian inference. It’s not like “gee if you use this whacked out prior that puts weight on exactly 3 units away from the true value on either side it turns out strangely to be a really good rule, better than the one that has a spike right on top of the real value!” or something like that.

• Olav says:

Well, the fact that a decision procedure dominates another one does not mean it’s “better” in any absolute sense; it just means it has lower frequentist risk across all possible parameter values.

• https://en.wikipedia.org/wiki/Dominating_decision_rule

The outcome of a decision rule that dominates another decision rule will *always* be better or equal no matter what the values of the unknown quantities turn out to be. Strict dominance means it will always be strictly better.

No frequentist anything is required here. We could have a one-time decision to make (which technological solution should we use to deflect an asteroid? Which of 100 different investments should we put ALL of our money in… etc).

If we use a Bayesian rule, we are guaranteed that there is no other rule which will dominate it. That is, there is no other rule which would have been guaranteed to make things better (or at least as good) for us no matter what happened.

In practice, in most decision cases I’ve seen, NHST based threshold rules repeatedly produce terrible decisions and it is easy to make things better by using a very moderately informed bayesian decision rule (that is, you could assign simple cost functions that are really only very approximate, and very mildly informative priors, and then still do way better). When your posterior distribution over parameters is mildly peaked, errors in your cost function at asymptotic values etc just don’t matter because they’re clamped to zero by the tails of the posterior distribution.

• Olav says:

From the Wiki page you linked: “In decision theory, a decision rule is said to dominate another if the performance of the former is sometimes better, and never worse, than that of the latter.”

The performance criterion here is frequentist risk, i.e. expected loss when averaged over all possible data sets. But this is just one performance criterion you may use. For example, under squared loss, the sample mean dominates the sample median, and so the sample mean is a “better” estimator than the median in that sense. However, the sample median is less sensitive to outliers and is therefore better in the sense of being more robust to sample variability.

• No, I don’t think there’s anything Frequentist involved here. Suppose there is actually just ONE decision to be made ever.

Suppose that there is an unknown quantity in the world x, an outcome y which depends on the true but unknown value of x as well as the decision we make, some decision variables d (such as budgets to be allocated), and a cost function c(y,d). Now we have two decision rules D1, D2 that use x to help us decide what to do with d.

Now, suppose D1 is dominated by D2. This means, regardless of what x turns out to be, c(y,d=D1(x)) ≥ c(y,d=D2(x))

That is, it doesn’t matter what the real value of x is in the world, the decisions we make will always be better or equal (lower or equal cost) under decision rule D2.

• Note, I fully acknowledge that just because a decision rule DB is a Bayesian decision rule doesn’t mean that it DOES dominate some other, say NHST-based rule DF. However, if DF is an NHST-based frequentist threshold type rule, then there DOES EXIST a Bayesian rule that dominates or is at least no worse than it, and if we use our best available information to build our cost functions and priors and models and so forth, we are searching within the right class of decision rules for one that uses *the best of our knowledge*.

It’s trivial to see that many Frequentist rules are utter nonsense. For example, suppose your rule is “choose a sample large enough to reduce the uncertainty of the fraction of escaped fish down to ±2% and then if the 95% confidence interval contains values over 5% charge D dollars to the farming industry to take action Foo”

This kind of rule could easily wind up requiring you to take a sample of pole-caught fish that is say 700 fish, with 20% of them dying after being caught, so that you wind up killing 140 wild fish out of a total population of say 1000, and after 10 years of doing what you’re required by law to do, you’ve extinctified the population just from the ridiculous requirements of your survey…
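The figure of roughly 700 fish can be reproduced with the textbook normal-approximation sample size for a proportion, assuming a ±2 % margin at 95 %, the worst case p = 0.5, and a finite population correction for a population of 1000; these inputs are assumptions, since the comment only says “say 700”.

```python
import math
from statistics import NormalDist

def sample_size(margin, population, p=0.5, level=0.95):
    """Fish needed so a normal-approximation interval has half-width `margin`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n0 = z**2 * p * (1 - p) / margin**2   # sample size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)  # finite population correction
    return math.ceil(n)

n = sample_size(margin=0.02, population=1000)
deaths = 0.20 * n                         # assumed 20 % rod-catch mortality
print(f"sample {n} fish, expected deaths {deaths:.0f}")
```

This gives about 700 fish; at 20 % handling mortality that is roughly 140 deaths per survey, matching the numbers in the comment.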

• Corey says:

Is there anything frequentist here? Well, Olav is right to point out that risk functions are expectations over all possible data sets given a specific parameter value; I personally consider such averages frequentist. The thing that makes the Complete Class Theorem neat is that it shows that under this frequentist criterion the Pareto-optimal frontier of decision rules is (essentially) the set of Bayes decision rules.

In particular, when you write:

[S]uppose D1 is dominated by D2. This means, regardless of what x turns out to be, c(y,d=D1(x)) ≥ c(y,d=D2(x))

you’ve misstated what it means for one decision rule to dominate another. The correct statement is:

Suppose D1 is dominated by D2. This means, regardless of what x turns out to be, E(c(y,d=D1(x))) is greater than or equal to E(c(y,d=D2(x))) with the inequality strict for at least one x and the expectation taken w.r.t the sampling distribution of y given x.
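This definition can be checked numerically on a toy problem. In the sketch below (an illustration, not from the thread) the unknown is a fraction p, the data are a binomial count k out of n = 20, and the loss is squared error. A deliberately poor rule, k/n - 0.1, is compared with the same rule clipped at zero. Clipping never increases the error when the truth lies in [0, 1], so the clipped rule's risk (expected loss over the sampling distribution of k, for each fixed p) is never higher, and it is strictly lower for p < 1.

```python
from math import comb

def risk(rule, p, n):
    """Exact risk of rule(k, n): expected squared error when k ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * (rule(k, n) - p) ** 2
        for k in range(n + 1)
    )

def shifted(k, n):
    """A deliberately poor rule: k/n - 0.1, which can fall below zero."""
    return k / n - 0.1

def shifted_clipped(k, n):
    """The same rule clipped at zero, the lowest value the true fraction can take."""
    return max(shifted(k, n), 0.0)

n = 20
grid = [i / 100 for i in range(101)]
risks = [(risk(shifted, p, n), risk(shifted_clipped, p, n)) for p in grid]
# Dominance: never worse for any p, strictly better for at least one p.
dominates = all(r2 <= r1 for r1, r2 in risks) and any(r2 < r1 for r1, r2 in risks)
print("clipped rule dominates:", dominates)
```

The comparison is made value by value of the unknown, which is the “regardless of what x turns out to be” quantifier, with the expectation over the data inside it.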

• Thanks Corey, I agree, I had taken the expectation to be implicitly built in to the c (I was being sloppy) and I misunderstood what measure was being used. But I have a question, and since you’ve probably remembered this better than I have (I took a bayesian decision theory course maybe 2011?) you may be able to enlighten us. Let me try to be extremely explicit here:

From a Bayesian standpoint, there are two possible ways to think about things. Suppose that there’s some data x’ which informs us to some extent about the true x. Then we have two possibilities:

D1 dominates D2 when:

E(c(y,D1(x’)), p(y|x)) ≤ E(c(y,D2(x’)),p(y|x))

and for the frequentist p(y|x) is taken to be a fact about the world, whereas for a Bayesian it’s taken to be a fact about our knowledge about y if we were told x

vs

D1 dominates D2 when:

E(c(y,D1(x’)), p(y|x’)) ≤ E(c(y,D2(x’)),p(y|x’))

a Bayesian will form p(y|x’) as p(y|x)p(x|x’) whereas a frequentist will deny the existence of p(x|x’) holding that x is a fact about the world and so has no sampling distribution.

doesn’t Wald’s theorem say basically that in order to find a D1 in the dominant class the Frequentist “essentially” needs to invent a function q(x|x’) and pretend that it isn’t a probability distribution, and then try to minimize the expected cost

in order for D1 to be non-dominated, it “essentially” has to be D1 == minimize expected cost E(c(y,x),p(y|x)q(x|x’)) and that it will turn out that q has all the properties of a bayesian p(x|x’)

That’s my recollection of the main point of Wald’s theorem.

• Also, it’s a bit confusing in what I’m trying to google up about how we’re taking the expectation, but let’s assume it’s the first case as you say,

E(c(y,x),p(y|x))

this need not have any frequentist content, that is, there can still be exactly one outcome y that will ever be observed. (Let’s say “y = hurricane next Tuesday at location L sufficient to damage a boat currently harbored at location L by y dollars”) whereas say “x” is “a vector of values that describes the current state of hurricane foo”

and p(y|x) is just our Bayesian knowledge about the time-evolution of the hurricane between now and next Tuesday if we were given the current hurricane state vector exactly.

For someone like me, who takes a strong physicist’s view that the state of the world is essentially some ginormous vector S and it evolves in time according essentially to a totally unknown but in principle logically knowable Lagrangian L(S), there simply does not exist a p(y|x) in terms of “the true frequency under random sampling” but there sure does exist a Bayesian p(y|x) in terms of “for all I know what my very limited view of S and L(S) tells me”

• Corey says:

doesn’t Wald’s theorem say basically that in order to find a D1 in the dominant class the Frequentist “essentially” needs to invent a function q(x|x’) and pretend that it isn’t a probability distribution, and then try to minimize the expected cost

Yup. Here’s Jaynes:

If a sampling theorist will think his estimation problems through to the end, he will find himself obliged to use the Bayesian mathematical algorithm, even if his ideology still leads him to reject the Bayesian rationale for it. (PTLOS, pg 415)

This is overstating the case a bit; Aris Spanos for one has decided that loss functions (and hence risk functions) have nothing to do with inference — a theoretically Pareto-optimal way of making decisions apparently has no philosophical implications at all for the warrant data give to various statistical hypotheses. And a sampling theorist can always refuse to think his estimation problems through to the end, like Larry Wasserman, a former Bayesian(!) who went full behaviorist (although he hates that label) and loves him some risk functions.

• Nick Menzies says:

I am very sympathetic to this view as a pragmatic approach to policy evaluation, though to conclude that this is the only right way to proceed presumes a social welfare function defined over final outcomes. A reasonable person might place value on concepts of fairness etc. defined in terms of the process by which outcomes are achieved, rather than just the final outcomes of the process. Of course, you could then say that we should place a monetary value on the process features that we value, but then the whole process starts to explode.

• I think the only problem really with “process” is the extent to which cost functions accurately approximate some kind of fair average tradeoff in values across the population. If you and I agree that C(x) is a good societal cost function for x being the population of wild fish, and we then impose that onto regulation, as long as most people in the population kind of agree with it in broad strokes, no one is going to really have a problem with the fact that it was arrived at by a process that ignored input from everyone else in the world… it’s only when a bunch of people are ignored or marginalized AND they have a different opinion on what things ought to look like, that we run into a problem.

so, I don’t think it makes sense to put monetary values on “the process” of deriving cost functions to implement in policy, but I do think these kinds of tradeoffs need transparent negotiations etc. I very much dislike things that go along the lines of “all us powerful politicians all got together behind closed doors and decided that our new “trans-pacific-fishing-policy” will be to decimate the population of grey whales as fast as possible” or whatever. But this kind of politics is already being done, and it’s being done in the context of highly sub-optimal classes of decision making. Like “if there is more than 5% of escaped fish then do X” or like here in CA “no-one shall ever hunt a mountain lion again”.

Well that made sense for like the first 10 years or so, and then populations came back, and now a couple of miles away in a dense suburb, there are three mountain lions that come to someone’s front lawn and lie down there to hang out each night in full view of their web-cam… um, maybe that’s really not an optimal outcome after hundreds of pet dogs are eaten and several children, hikers, joggers, etc get mauled. I mean wouldn’t it make more sense to say sell a mountain lion hunt tag for $50,000 and then use the money to say buy 10 acres of wetlands in a critical area?

So, yeah, I agree with you that we want good processes, but I think that’s true whether we’re using the process to define a cost function, or using the process to define a bunch of if-then-else rules that will be obsolete and highly sub-optimal in 5 to 10 years anyway.

• Nick Menzies says:

It seems like this is a discussion of how to operationalize values. From what you write, it would seem hypothetically possible to go through some process (transparent negotiation, maybe voting, etc.) to derive a social welfare function which operationalizes our shared values. Arrow’s impossibility theorem shows that a function with desirable features is not possible at a technical level (unless we can make interpersonal utility comparisons), as well as being about the best PhD dissertation ever. However, I would agree that one could possibly work out a welfare function that works pretty well, to the extent that it would be a useful input into policy making. A good example is in health in countries like the UK: health interventions which demonstrate a ratio of health benefits to costs that falls above a particular value are much more likely to receive public funding.

But my point is not about how the social welfare function is constructed, but more that people care about more than just the final distribution of outcomes; they care about the process by which those outcomes were achieved. As a crude example, if we both have a fatal disease and there is only one tablet available to cure it, I would likely feel very different about the fairness of things if you quickly grab the pill and swallow it, as opposed to us both agreeing to flip a coin for it, then you winning and swallowing the pill. In both scenarios you end up with the pill and I end up expecting to die, so the final outcomes are the same, yet the processes through which these outcomes were realized are very different, and it matters. This may seem like a trivial example, but in the fishing example the application of any decision rule will create winners and losers, so I would not be surprised if concerns for procedural justice play a role in how that rule is created.

• I’ll buy pretty much all of that. You don’t need to work hard to convince me that the methods by which we come up with a reasonable social welfare function for the decision make a difference as to how well the whole thing will work (i.e., preventing violent conflict, etc.).

• Actually, the one thing I don’t buy is that Arrow’s impossibility theorem applies. Score or Range voting is a deterministic voting system with non-imposition, non-dictatorship, monotonicity, and independence of irrelevant alternatives. The reason it doesn’t violate Arrow’s theorem is that Arrow’s theorem only applies to ordinal voting systems (ones where you express purely preferences of one thing over another). The cardinal scores in Score/Range voting express strength of preference.

I think building social utility functions using score voting should work pretty well.
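To make the Score/Range voting point concrete, here is a minimal Python sketch with made-up ballots. The options "A", "B", "C" and all the scores are hypothetical. Each option's total is just the sum of the cardinal scores it receives, so adding a new option leaves the totals of the existing options, and therefore their relative order, unchanged. This assumes voters do not rescale their old scores when the new option appears, which real voters may well do.

```python
def score_totals(ballots):
    """Sum the cardinal scores each option receives across all ballots."""
    totals = {}
    for ballot in ballots:
        for option, score in ballot.items():
            totals[option] = totals.get(option, 0) + score
    return totals


def winner(ballots):
    """Option with the highest total score (ties go to the first one seen)."""
    totals = score_totals(ballots)
    return max(totals, key=totals.get)


# Hypothetical ballots on a 0-10 scale.
ballots = [
    {"A": 10, "B": 6},
    {"A": 0, "B": 7},
    {"A": 9, "B": 5},
]

# Add a third option C; each voter's scores for A and B are left unchanged.
ballots_with_c = [dict(b, C=c) for b, c in zip(ballots, [3, 10, 4])]

print(score_totals(ballots), winner(ballots))
print(score_totals(ballots_with_c), winner(ballots_with_c))
```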

6. Dale Lehman says:

Sorry, for some reason I got cut off just when starting to explain. I have no quarrels with your proposed statistical methodology, but it is the economic methodology that is not so innocent. To economists, costs can only be measured in terms of maximum willingness to pay or minimum willingness to accept (depending on the assignment of property rights). This is a very limited view of “value” when speaking of effects on wild species or native people’s livelihoods (or any similar “intangible”). Yes, you can invent a measure of cost that applies to these, but the rules of economics limit what can be included in such an estimation. For example, there is no such thing as “citizenship” value (unless measured by willingness to pay). Similarly, if you believe that wild species have some innate right to exist and not be threatened, then there is no singular way to value threats to this. Of course you can measure this cost in a number of ways, but again, economics only permits some types of value to be counted.

So, I would not agree that this is the right way to decide what to do. (for further reference, there is an excellent – controversial to economists of course – book by Mark Sagoff, The Economy of the Earth, that delves extensively into the philosophical underpinnings of the economics approach).

• Note, the Bayesian decision rule doesn’t say anything about HOW to create the cost function. And while economists might have opinions about what is considered a “valid” cost function, as far as I know there is no mathematical theorem that says that economists are always right ;-)

The nice thing about requiring people to come up with a cost function to plug into is that we can separate *how to make a decision* from *what the values are that different people have*, and it then makes the value tradeoffs very plain.

This is also why it almost never happens that we make decisions this way. People like to have ulterior motives that they don’t disclose.

• Dale Lehman says:

I agree with this but I think you need to be very careful to point out that the cost function may not be a valid economic cost function. Too many economists (I am one and I stopped doing environmental economics long ago due to these limitations) simply assert that dollars are only a metric to measure values and that any values can be measured that way. But while dollars are a metric, economics does not permit values to be measured in any ways someone pleases. So, I think you are using “cost function” in a mathematical sense while I see “cost function” in an economic sense. I just want the difference to be very clear.

• I think you may be saying that there is *no such thing* as a “collective” cost function, or that different people have different ideas about what is “admissible” in a collective cost function… As far as I see, this is really what politics is all about: negotiating between parties about what are acceptable tradeoffs at a collective level, so it’s not a pure economic question. There really is no right answer as to what every component of the cost function should be, any more than there is a right answer as to what the price of a Picasso painting “should be”. There are, however, quite a few components of a cost function that are plain market prices, so they at least are fairly uncontroversial (for example, the cost per square foot of a certain kind of net, or the market wage for fishery biologists to do surveys, etc.).

• Dale Lehman says:

You have put this very well. In fact, if you read Sagoff, that is precisely his point – these decisions are political and politics (not the kind we see in the news) involves articulating differing sets of values and making collective decisions in the face of these. Economists bypass this completely by reducing the cost function to a mechanical exercise of estimating what a market price would look like for things that are not traded in markets (they have gotten quite creative in doing so, but this does not mean their method is correct – Sagoff makes the point that economists’ belief in economic efficiency is merely another set of values and muses about how much economists would be willing to pay to have economic efficiency).

I will point out that Sagoff’s arguments were roundly rejected by economists. Their rejections, in my opinion, were superficial and/or incorrect.

• Thanks Dale. Not being an economist, I don’t really have an opinion on the state of what economists think about the things that fall under their technical jargon “cost function”, but I don’t think we need to be beholden to their jargon.

It’s like “significance” in statistics, a poor choice of words.

The cost function I’m referring to consists of sums of terms that represent actual market prices, estimated market prices for non-traded goods (what economists love apparently), and politically negotiated effective prices for non-traded goods (the political component that apparently economists hate).

The fact that there is no actual living person who would be willing to pay \$1 Trillion 2016 USD to own and breed a healthy breeding pair of Dodo birds doesn’t logically restrict us from making decisions as if there were.

• Also, I do think that with many environmental issues there is a legitimate concern with the asymptotic behavior of these functions. What is the “loss” from having Dodo bird populations drop to zero? It’s an irreversible thing (Jurassic Park notwithstanding), and so there is definitely a temptation to make these asymptotes very steep, but we can’t make them infinite: if there’s a nonzero probability of the population of Icelandic Eyjafjallajokull beetles going to zero, every decision that leaves that outcome possible has infinite expected cost, and the decision problem is undefined. Even the “cost” of total extinction of humans can’t technically go to infinity, because typical models aren’t going to assign zero probability to those scenarios, so we wind up having to assign some very large number, such as 1 Trillion dollars per capita times the population of the earth. Now, I don’t know about you, but I might be “willing to pay” \$1 Trillion to prevent total extinction (that is, really I’m willing to accept decisions based on the idea that total extinction costs \$1 Trillion per person, given that it probably is multiplied by a very small probability), but I wouldn’t be “able to actually pay”. So if total extinction has more than a trivially small probability, then we’ll have to negotiate harder to set that dollar amount politically to a more reasonable value.

The nice thing is that, with good decision making, we should stay well away from the scenarios where “very bad” things happen, so that they have very small probabilities associated with them, and therefore errors in our estimates of the function value at these critical asymptotic points really turn out not to matter anyway. Maybe we’re all paying \$4 a year more to keep white rhinos from going extinct than we would if we spent a lot more time negotiating politically and taking polls and figuring out exactly what people really cared about… but ultimately, so what? All that negotiating and poll-taking costs more than \$4/yr/person anyway.

On the other hand, if we choose NHST-based “did we exceed a threshold of 5% or not? If so, bring on the bill for \$100 Billion Dollars” (say that last bit in the accent of Mike Myers as Dr. Evil) type decision making, we can easily wind up extinctifying large swaths of the oceans or whatever, because the decisions are so terrible and fail to take into account what actually matters to people.

7. GabbyD says:

“If the point estimate for example is 1 % but the upper limit in the 95 % confidence interval is 6 %, we must start the job to remove farmed salmon from that population.”

I’m not sure what the problem is: if the estimate is 1 and the upper limit is 6, then the lower limit is -4, which is clearly impossible; therefore, the conclusion is simply that you don’t have enough information. You need to sample more.

What might be a tolerable range of uncertainty for a problem like this?

• Suppose the effect of sampling “enough” is that you in the process kill all (or a large number) of the fish you’re trying to protect…
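The -4 only follows if the interval is symmetric around the point estimate, as the normal-approximation (Wald) interval is. For small binomial samples, other intervals are asymmetric and stay inside [0, 1]. The Wilson score interval is one common example, chosen here for illustration rather than taken from the thread. Below is a minimal Python sketch comparing the two on the three scenarios from the original post (1 of 10, 3 of 30, 6 of 60).

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wald_interval(k, n, z=Z95):
    """Normal-approximation (Wald) interval: always symmetric around k/n."""
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(k, n, z=Z95):
    """Wilson score interval: stays within [0, 1] and is asymmetric for small p."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


for k, n in [(1, 10), (3, 30), (6, 60)]:
    wl, wh = wald_interval(k, n)
    sl, sh = wilson_interval(k, n)
    print(f"{k}/{n}: Wald ({wl:.3f}, {wh:.3f})   Wilson ({sl:.3f}, {sh:.3f})")
```

With 1 of 10, the Wald lower limit is negative, while the Wilson upper limit sits much farther above the point estimate than its lower limit sits below it. An upper limit well above the estimate therefore does not by itself imply a negative lower limit.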

8. Just to give an example. Suppose we somehow decide that the important parameters for decision making are N the total river population of fish, and F the fraction of the total population that are hatchery escaped.

Furthermore, suppose we ignore stuff about how easy it is to catch hatchery fish and so forth (obviously, we’ll do better if we take this stuff into account). We assume that if k of n caught fish are hatchery fish, then the proportion of hatchery fish in the river has posterior dbeta(k+3,(n-k)+97), using an informed prior of dbeta(3,97) because we know that there is currently not an enormous fraction of hatchery fish in the river.
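Here is a minimal Python sketch of that posterior for the three scenarios in the original post (1 of 10, 3 of 30, 6 of 60). It assumes the Beta(3, 97) prior above and simple binomial sampling. For each scenario it reports the posterior mean and the posterior probability that the farmed fraction exceeds the 5% limit. The calculation uses the identity that, for integer a and b, P(F ≤ x) under Beta(a, b) equals P(Binomial(a + b - 1, x) ≥ a). The function names are illustrative.

```python
from math import comb

PRIOR_A, PRIOR_B = 3, 97  # Beta(3, 97) prior: roughly 3% escapees expected


def posterior_params(k, n):
    """Beta(k + 3, (n - k) + 97) posterior after k hatchery fish in a sample of n."""
    return k + PRIOR_A, (n - k) + PRIOR_B


def prob_above(limit, a, b):
    """P(F > limit) for F ~ Beta(a, b) with integer a, b >= 1.

    Uses P(F <= x) = P(Binomial(a + b - 1, x) >= a), so
    P(F > x) = P(Binomial(a + b - 1, x) <= a - 1).
    """
    m = a + b - 1
    return sum(comb(m, j) * limit**j * (1 - limit) ** (m - j) for j in range(a))


for k, n in [(1, 10), (3, 30), (6, 60)]:
    a, b = posterior_params(k, n)
    mean = a / (a + b)
    print(f"{k}/{n}: posterior mean {mean:.3f}, P(F > 5%) = {prob_above(0.05, a, b):.2f}")
```

All three samples give the same 10% point estimate. The posterior probability of exceeding 5% still rises with sample size, because the data increasingly outweigh the prior's pull toward 3%.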

Now, how bad is it to have N total fish, NF of which are hatchery escapees? Let’s say that having 3000 fish with 3% hatchery escapees is 1 badness unit (this is really a rescaled, dimensionless ratio of dollars, for example). What we’ll do is spend C(N,F) * CurrentBudget on efforts to reduce hatchery escapees and increase wild populations, with the money going to some regulatory agency to be spent according to recommendations of fisheries biologists, etc.

Let’s just build a really simplistic cost function to see how it’d do:

C(N,F) = (5/(4*N/3000+1))^3 * exp((F-.03)/.06);

Some example values:

C(3000,.03) = 1: an assumed current value
C(1000,.03) = 9.8: get serious when things hit 1/3 of the total current population
C(100,.03) = 86: spend a *lot* when there are only 50 breeding pairs in the wild; hopefully we never get here, because we’re making good decisions

Now, how about varying the fraction escaped with the total population held constant:

C(3000,.05) = 1.4
C(3000,.10) = 3.2
C(3000,.15) = 7.4

Obviously, we’re just making up these numbers, but they at least intuitively go in the right directions for this problem, and they are potentially the correct order of magnitude.
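A quick Python check of the example values above. It assumes the same C(N, F) formula, written here as `cost`:

```python
import math


def cost(N, F):
    """Toy badness C(N, F): 1 unit at N = 3000 fish with F = 3% escapees."""
    return (5 / (4 * N / 3000 + 1)) ** 3 * math.exp((F - 0.03) / 0.06)


for N, F in [(3000, 0.03), (1000, 0.03), (100, 0.03),
             (3000, 0.05), (3000, 0.10), (3000, 0.15)]:
    print(f"C({N}, {F:.2f}) = {cost(N, F):.1f}")
```

The first factor grows as the wild population shrinks. The exponential factor grows by a factor of e for every 6 percentage points of escapees above 3%.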

Now, suppose we have good information about the total population of 3000, but we take a sample of 60 fish and get 4 escaped (6.7% observed). In R we calculate the expected loss like this (note I’ve called the C function “cost” because “c” is already the “concatenate” function in R):

`mean(replicate(1000, cost(3000, rbeta(1, 4 + 3, 97 + (60 - 4)))))` ≈ 1.30

Suppose instead it were 6 escaped fish (10% of the sample):

`mean(replicate(1000, cost(3000, rbeta(1, 6 + 3, 97 + (60 - 6)))))` ≈ 1.62

Now suppose it is 3 escaped fish (5% of the sample):

`mean(replicate(1000, cost(3000, rbeta(1, 3 + 3, 97 + (60 - 3)))))` ≈ 1.18

Now suppose we take a sample of 100 fish and get 5 (5%, but a tighter answer):

`mean(replicate(1000, cost(3000, rbeta(1, 5 + 3, 97 + (100 - 5)))))` ≈ 1.23

Now suppose we sample 100 fish and get 3:

`mean(replicate(1000, cost(3000, rbeta(1, 3 + 3, 97 + (100 - 3)))))` ≈ 1.03

versus 60 fish and we get 2:

`mean(replicate(1000, cost(3000, rbeta(1, 2 + 3, 97 + (60 - 2)))))` ≈ 1.05

So a bigger sample reduces the uncertainty and hence changes the amount to pay, depending on whether it concentrates the posterior near a “good” point or near a “bad” point.
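For readers without R, here is a rough Python port of the one-liners above. It is an illustrative translation, not the original code. The R `cost` function was never shown, so this assumes it is the C(N, F) formula given earlier. The sketch draws F from the Beta(k + 3, (n - k) + 97) posterior with `random.betavariate` and averages the cost. It uses 50,000 draws instead of 1,000, so the estimates should match the quoted values only up to Monte Carlo noise.

```python
import math
import random


def cost(N, F):
    """Toy badness C(N, F) from this example (1 unit at N = 3000, F = 0.03)."""
    return (5 / (4 * N / 3000 + 1)) ** 3 * math.exp((F - 0.03) / 0.06)


def expected_cost(k, n, N=3000, draws=50_000, seed=1):
    """Monte Carlo E[C(N, F)] with F drawn from the Beta(k + 3, (n - k) + 97) posterior."""
    rng = random.Random(seed)
    a, b = k + 3, (n - k) + 97
    return sum(cost(N, rng.betavariate(a, b)) for _ in range(draws)) / draws


for n, k in [(60, 4), (60, 6), (60, 3), (100, 5), (100, 3), (60, 2)]:
    print(f"{k} of {n}: expected cost ~ {expected_cost(k, n):.2f}")
```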

Note, I’m not advocating the use of these numbers in *ANY WAY*; I’m just trying to give the textbook version of this problem so that fisheries biologists could potentially see how it might work. They can program this into R, play with the functions, and see how the numbers depend on things like how big the population is and how many escapees are found in a sample of size n, etc.

You’ll also notice that there are no big “cliffs” to jump off of. You don’t say “gee, we caught 4 out of 100 fish that were escapees, no one has to pay anything” and then maybe next year “gee, we caught 5 out of 100 fish that were escapees; farmers, please pay \$100 Billion Dollars to remedy this…”
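The absence of cliffs can also be checked without simulation. Under a Beta(a, b) posterior, E[exp(tF)] is the confluent hypergeometric function 1F1(a; a + b; t), so at N = 3000 the expected cost has a closed form. Below is a minimal Python sketch that sums that series for samples of 100 fish. It sets the result beside a hypothetical step rule that bills in full once the point estimate reaches 5%. The rule is a stand-in for the 4-versus-5 "cliff" described above, not a real regulation.

```python
import math

RATE = 1 / 0.06  # t in exp(t * F), from exp((F - 0.03) / 0.06)


def beta_mgf(t, a, b, terms=300):
    """E[exp(t * F)] for F ~ Beta(a, b), via the series for 1F1(a; a + b; t)."""
    total, term = 1.0, 1.0
    for j in range(terms):
        term *= (a + j) / (a + b + j) * t / (j + 1)
        total += term
    return total


def expected_cost(k, n):
    """Exact E[C(3000, F)] under the Beta(k + 3, (n - k) + 97) posterior.

    At N = 3000 the population factor of C is 1, so
    E[C] = exp(-0.03 / 0.06) * E[exp(F / 0.06)].
    """
    return math.exp(-0.03 / 0.06) * beta_mgf(RATE, k + 3, (n - k) + 97)


def cliff_rule_bills(k, n, limit_pct=5):
    """Hypothetical step rule: full bill once the point estimate k/n reaches the limit."""
    return 100 * k >= limit_pct * n


n = 100
for k in range(11):
    print(f"{k:2d}/{n}: expected cost {expected_cost(k, n):.3f}, "
          f"step rule bills: {cliff_rule_bills(k, n)}")
```

Each extra escapee in the sample raises the expected cost by roughly 9%. The step rule jumps from nothing to the full bill between 4 and 5.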

• Note some issues with this: there is potentially an incentive for the regulators to reduce the number of fish in the river and increase their budget… Spend it poorly on wine, women, and song, and increase their budget even more!!! Soon, they’ll have 50 breeding pairs, and 70x the budget for wine women and song! woohoo

So, real-world decisions should really be made not in terms of “how much money to give to the regulators” but maybe “how much money to spend on each *specific* program recommended by biologists, given that each one has an uncertain estimated outcome” and then… track through time the actual effectiveness of each program, and improve your estimates of how good each program is so that you funnel the money into policies that actually really do improve things.

So real-world decision problems should include uncertainty in what the state of the world currently is (which you address with cameras in the river, small catching surveys, etc.), uncertainty in the outcome of each recommended program, and some way to evaluate each program’s effectiveness and feed it back into the process to reduce the uncertainty about how good each program is… through time.

Given that we’re talking about Billions of dollars in Norway, I think it makes really good sense to spend a couple hundred thousand dollars designing an actually effective decision scheme.

• Ok, clarifying further how the methodology goes in the more general case.

Suppose you have surveys that tell you a posterior distribution of N and F as of today, call those N_0 and F_0. Then, you have various different programs designed to either reduce escapees, or increase the wild population or both. Let’s say there are k of them. Suppose you have a notional dollar amount d_i where i goes 1 to k, to spend on each of the k programs, and you have a model for what the effect of doing those programs will be on N_1 and F_1 next year based on predictions with uncertainty.

So, your fancy model gives you p(N_1,F_1 | N_0, F_0, d_i), and you seek out the d_i such that

C(N_1,F_1) + sum(d_i)/ScaleFactor

is minimized in expectation over the predicted values of N_1, F_1 (that is, you plug d_i values into some kind of optimization program until you find the set that predicts the smallest expected badness for next year).

You also collect information on the effectiveness of the different programs and improve your model, so that next year p(N_2,F_2 | … d_i) is hopefully more concentrated on the true outcomes and you’re doing as well as you possibly can. You can also include new programs, or eliminate ones that seem pointless or actually do harm, updating your model as you go, ideally with a transparent auditing process by an independent group.

This is, essentially, a Bayesian optimal control problem. Note that doing this correctly prevents the bad incentives, since the prediction for “throw money at wine, women, and song” would presumably increase sum(d_i) and produce higher C(N_1,F_1) costs than the policy of “buy stronger nets and hire people to monitor the water outflow near fish pens, and buy up key breeding habitats for wildlife preserve areas, and collect a moderate amount of wild fish eggs and hatch these fish back into the river” or whatever.
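Here is a heavily simplified, one-step Python sketch of this optimization. Everything about the program model is hypothetical:

- two made-up programs: "nets", which shrink the escapee fraction, and "habitat", which grows the wild population;
- uncertain per-unit effects drawn from arbitrary uniform distributions;
- a small grid of spending levels;
- a ScaleFactor of 2.

The same random draws are reused for every candidate allocation (common random numbers), so differences between allocations are not swamped by simulation noise.

```python
import math
import random

SCALE_FACTOR = 2.0        # hypothetical: converts spend units into badness units
SPEND_LEVELS = range(6)   # hypothetical spend levels 0..5 for each program


def cost(N, F):
    """Toy badness C(N, F) from the fishing example in this thread."""
    return (5 / (4 * N / 3000 + 1)) ** 3 * math.exp((F - 0.03) / 0.06)


def draw_scenarios(m=4000, seed=7):
    """Joint draws of today's state and of each program's uncertain effect.

    Every distribution here is a made-up placeholder, not a fisheries estimate.
    """
    rng = random.Random(seed)
    scenarios = []
    for _ in range(m):
        N0 = rng.gauss(3000, 100)                # population size, fairly well known
        F0 = rng.betavariate(7, 153)             # posterior after 4 escapees in 60
        nets_effect = rng.uniform(0.05, 0.25)    # decay rate of F per unit spent on nets
        habitat_effect = rng.uniform(0.0, 0.04)  # growth of N per unit spent on habitat
        scenarios.append((N0, F0, nets_effect, habitat_effect))
    return scenarios


def expected_objective(d_nets, d_habitat, scenarios):
    """Monte Carlo E[C(N1, F1)] + (d_nets + d_habitat) / SCALE_FACTOR."""
    total = 0.0
    for N0, F0, nets_effect, habitat_effect in scenarios:
        F1 = F0 * math.exp(-nets_effect * d_nets)
        N1 = N0 * (1 + habitat_effect * d_habitat)
        total += cost(N1, F1)
    return total / len(scenarios) + (d_nets + d_habitat) / SCALE_FACTOR


# Common random numbers: every allocation is scored on the same draws.
scenarios = draw_scenarios()
allocations = [(d1, d2) for d1 in SPEND_LEVELS for d2 in SPEND_LEVELS]
best = min(allocations, key=lambda d: expected_objective(d[0], d[1], scenarios))
print("best (nets, habitat) spend:", best,
      "expected objective:", round(expected_objective(*best, scenarios), 3))
```

A real version would replace the placeholder distributions with fitted program-effect models. It would also rerun the optimization each year as new survey and effectiveness data arrive.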

9. Olav brings up a good point: http://statmodeling.stat.columbia.edu/2016/07/28/30762/#comment-289403

which is that expected value over all outcomes isn’t necessarily *the one true* criterion to use for decision making, so if you disagree with that criterion, then the complete class theorem etc. just doesn’t matter to you. But why would we want to use the Bayesian rule of choosing the D that minimizes the expected cost E[cost(y, D(x’))], with the expectation taken over p(y|x)p(x|x’), as our criterion for choosing a decision rule?

First off, the expectation is a function of the probability of every single possible outcome. Therefore it doesn’t “throw away” information. You could create a decision rule something like

D(x’) == set your decision variables based entirely on some Function(median(x’))

and in some sense, this throws away everything you know about the posterior distribution of x except the location of one point on the CDF. This is really a terrible idea, because if you have an extremely wide and/or skewed distribution for the posterior over x then you may well have a wide variety of outcomes y which have a wide variety of costs associated to them, and yet, you’re ignoring those facts about the other possible outcomes.

So, if you want to take into account every possible outcome, you’ll need to include every possible outcome within the formula for your decision, and the only way to do that is via some kind of integral over the possible outcomes. Since different outcomes have different plausibilities associated with them, you’ll want to incorporate the plausibilities in your formula, not just the x values. These two things strongly suggest we need to make decisions based on integrate(foo(x)p(x),dx) for some foo(x), and since foo(x) is something we’re free to choose, we might as well call it “cost” and give it an interpretation.
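Here is a tiny Python illustration of why a rule based only on the posterior median can mislead. The two equally weighted sets of draws for F are invented. They share a median but differ in the right tail. Under the toy C(3000, F) from the fishing example, their expected costs differ by more than an order of magnitude, while any Function(median(x’)) rule would treat them identically.

```python
import math
from statistics import median


def cost(F):
    """Toy badness C(3000, F) = exp((F - 0.03) / 0.06) from the fishing example."""
    return math.exp((F - 0.03) / 0.06)


def expected_cost(draws):
    """Average cost over equally weighted posterior draws of F."""
    return sum(cost(f) for f in draws) / len(draws)


# Hypothetical, equally weighted posterior draws for the escapee fraction F.
narrow = [0.04, 0.05, 0.06]   # tight around 5%
skewed = [0.04, 0.05, 0.30]   # same median, long right tail

for name, draws in [("narrow", narrow), ("skewed", skewed)]:
    print(f"{name}: median F = {median(draws):.2f}, "
          f"expected cost = {expected_cost(draws):.2f}")
```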

Whether you symbolically use p(y|x) or p(y|x) p(x|x’) in your expectation, when it comes to minimizing expected cost, since you don’t KNOW x, you will not be able to calculate the actual decision if you insist on minimizing with respect to p(y|x). Fortunately, Wald’s complete class theorem shows us that the non-dominated rules are the ones where we choose some weighting function for the possible x values; in other words, choose a prior and calculate p(x|x’) = p(x’|x)p(x)/Z, where Z is a normalizing factor. You can shove Z into the cost function if you like, since multiplying cost by a positive constant doesn’t change the minimization problem; then you can pretend that p(x’|x)p(x) is not a probability distribution …
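A small Python sketch of the point about Z. It assumes a grid approximation, the binomial likelihood and Beta(3, 97) prior from the fishing example, and two hypothetical actions ("intervene" and "wait") with made-up losses. Dividing every weight by Z rescales each action's expected loss by the same positive constant. The chosen action is therefore the same whether or not the weights are normalized.

```python
import math

# Grid over the escapee fraction x (F in the fishing example).
GRID = [(i + 0.5) / 1000 for i in range(1000)]


def prior(x):
    """Unnormalized Beta(3, 97) density, the informed prior used in the thread."""
    return x**2 * (1 - x) ** 96


def likelihood(k, n, x):
    """Binomial likelihood p(x' | x) of k escapees in a sample of n."""
    return math.comb(n, k) * x**k * (1 - x) ** (n - k)


def loss(action, x):
    """Hypothetical losses; the numbers are placeholders, not estimates."""
    if action == "intervene":
        return 1.0                      # fixed cost of acting
    return 40.0 * max(x - 0.05, 0.0)    # damage grows once x passes 5%


def bayes_action(k, n, normalise):
    """Action minimizing the (optionally normalized) posterior-weighted loss."""
    weights = [likelihood(k, n, x) * prior(x) for x in GRID]
    if normalise:
        Z = sum(weights)
        weights = [w / Z for w in weights]
    risk = {a: sum(w * loss(a, x) for w, x in zip(weights, GRID))
            for a in ("intervene", "wait")}
    return min(risk, key=risk.get), risk


for k in (2, 4, 8, 12):
    print(k, bayes_action(k, 60, True)[0], bayes_action(k, 60, False)[0])
```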

So, why use a Bayesian decision rule? Because it is a function of every possible outcome (via an integral), because it uses all information available to us, because it allows us as much flexibility as we want to define whatever cost function we need, because that cost function has a direct interpretation in terms of “how bad would this outcome be”, and because, in principle, even if you don’t believe in Bayesian statistics but do believe in sampling distributions p(y|x), the only non-dominated rules are the ones where we weight the different possibilities via a function that has all the properties of a prior.

10. Peter Erwin says:

> You’re trying to estimate the level of farmed fish, which varies over time and across locations.

But my impression is that they really do want to estimate the level of farmed fish in individual locations at a specific time (or at least for a specific year), not just some smoothed value across time and different locations (though they probably want to know that, too). The point presumably being that if a particular location has significant contamination by farmed fish, it’s probably because the nearby farm has problems that should be investigated (and possibly prosecuted).

Going back to the original post:

> Here is the statistical problem: In a breeding population of salmon in a river there may be escapees from the fish farms. It is important to know the proportion of farmed escapees. If it exceed 5 % in a given population, measures should made to reduce the number of farmed salmon in that river. [emphasis added]

It’s a bit like someone asking the best way to determine whether individual patients do or do not have a disease, so that the correct treatment can be recommended, and you’re steering them toward methods which will determine the overall disease level in the population instead.

• “varies over time and across locations” basically means “it’s different each year in each river” which is exactly what your point is. So I think this is just a communication issue, not a disagreement in principles.

11. I find this a very interesting example of path dependence in science. We are still, to some extent, locked into the improvised scheme from the 1940s. Sydow et al. (2009) provide interesting organizational explanations of how such a situation can emerge. Sterman and Wittenberg (1999) provide models of how we can get locked into one paradigm in science. Hämäläinen and Lahtinen (2016) discuss path dependence in model-based problem solving in general, mainly within the scope of a single modelling project, but to some extent also from a long-term disciplinary perspective.

Related references:

Hämäläinen and Lahtinen (2016): ‘Path dependence in Operational Research—How the modeling process can influence the results’, http://www.sciencedirect.com/science/article/pii/S2214716015300117

Sterman and Wittenberg (1999): ‘Path Dependence, Competition, and Succession in the Dynamics of Scientific Revolution’, http://pubsonline.informs.org/doi/pdf/10.1287/orsc.10.3.322

Sydow, Schreyögg and Koch (2009): ‘Organizational Path Dependence: Opening the Black Box’, http://www.jstor.org/stable/27760032