In industrial settings this can be the case. There is a small effect, essentially zero, with a relatively large known mean and low variance. However that ‘essentially zero’ effect can be meaningful in terms of production cost or other concerns. That is why equivalence and non-inferiority testing is commonly used.

]]>I’m not sure if that is the same point I am making, but don’t disagree with it. I’m saying the NHST paradigm is based on the principle that correlations/effects are rare. Thus it is somehow exceptional to find such, and studies are designed for this purpose. Instead the principle should be everything is correlated with everything else, and studies should be designed based on that principle.

]]>Hi Bob,

But shouldn’t it be also informative to run the same model with different priors based on previous findings? That is, using only the effects (summary statistics, not posterior draws) from similar studies and just see how much models change/robustness/reliability.

]]>“For a fully informative prior for δ, we might choose normal with mean 0 because we see no prior reason to expect the population difference to be positive or negative and standard deviation 0.001 because we expect any differences in the population to be small, given the general stability of sex ratios and the noisiness of the measure of attractiveness.”

To be fair, I see that narrowing the prior can be justified from a purely probabilistic point of view. If you have the “correct” prior for the “clean” case, for example the effect of true beauty on sex ratio is effectively sampled from a N(0,0.002) distribution, knowing that there is a certain level of attenuation you can easily derive the effect of measured beauty on sex ratio. At least if the “measured beauty” is only partially correlated to the “true beauty” and is not correlated at all to any other factors that could affect the sex ratio. If it is partially measuring beauty and partially measuring something else, the net effect is not trivial to determine. If the “noise” is completely random, you will have in the extreme case (measured beauty uncorrelated to true beauty) a prior equal to zero.

In summary, it’s not impossible that you chose your prior by assuming first a precise prior for the effect of true beauty and then a precise amount of classification error. I guess I cannot accuse you of over-precision, given that you said that’s a “fully informative prior”.

]]>Good point…

]]>> In answer to your first point: noise in x will attenuate the correlation between x and y.

Sure, if there is an effect it will be smaller. The attenuation will result in weaker data and the likelihood will move towards zero. Even if you don’t change the prior, the posterior will change as expected. I guess that if you had a prior centered at some value other than zero it would make sense to move the prior accordingly (to reflect the attenuation in the expected effect). I’m not so sure about changing the variance of the prior.

> In answer to your second point: No, I don’t know there’s no difference.

Ok, let me rephrase it. You know that the difference is small (much lower than 1%) and even the most extreme outcome wouldn’t provide enough evidence to suggest otherwise.

]]>Carlos:

You write:

> Why would the prior depend on the noisiness of the measure of attractiveness? Say I have a prior for some experimental setting. If I had a similar setting with more noise I think I would still use the same prior for the parameter of interest (but maybe there would be a nuisance parameter related to the noise).

> I also find that prior very strong. If the beautiful parents had *only girls*, you would estimate the population difference to be just 0.1%. Maybe that’s your point, that the whole study makes no sense because you know that there is no difference and even in the most extreme outcome you wouldn’t really change your mind?

In answer to your first point: noise in x will attenuate the correlation between x and y. Suppose, for example, that there’s some precisely measured “beauty” variable x for which the more beautiful parents are 0.1% more likely to have girls. Now suppose you don’t observe x, instead you observe z, a noisy measure of x, and then you compare the proportion of girls among parents who have high and low values of z. This difference will then be less than 0.1%. It’s called attenuation in econometrics and it’s easy to show analytically or by simulation.
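A minimal simulation sketch of the attenuation described above. Everything numeric here is an illustrative assumption, not taken from the paper: x is standard normal, "more beautiful" means x > 0, the baseline Pr(girl) is 0.485, and z is x plus independent normal noise. To keep Monte Carlo error from swamping a 0.1% effect, the code averages each parent's Pr(girl) rather than drawing actual births.

```python
import random

def expected_gap(n=100_000, effect=0.001, noise_sd=1.0, seed=1):
    """Gap in average Pr(girl) between parents with high and low measured beauty z."""
    rng = random.Random(seed)
    base = 0.485  # assumed baseline Pr(girl); illustrative only
    sums = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)            # true beauty (never observed)
        z = x + rng.gauss(0.0, noise_sd)   # noisy measurement of x
        p_girl = base + (effect if x > 0 else 0.0)
        high = z > 0                       # groups are formed on z, not x
        sums[high] += p_girl
        counts[high] += 1
    return sums[True] / counts[True] - sums[False] / counts[False]

gap_clean = expected_gap(noise_sd=0.0)  # z equals x, so the full effect is recovered
gap_noisy = expected_gap(noise_sd=1.0)  # noise as large as the signal
```

Under these assumptions the gap measured on z comes out at roughly half the gap measured on x, because a quarter of each z group is misclassified.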

In answer to your second point: No, I don’t know there’s no difference. There *is* a difference, it’s not zero. Older mothers and younger mothers have (small) differences in Pr(girl), white mothers and black mothers have differences in Pr(girl), etc. Take any two groups and you’ll get different probabilities. But, given all the empirical research on sex ratios (and there’s a lot, because N is huge and the data are just out there for free in birth records), we know that these differences are small. Not zero. Small.

+1

]]>I was assuming that each study used in the ‘meta-analysis’ was based on random selection from the same population with a single unknown true mean or proportion. Each study would be performed separately and would have different means or proportions but could be regarded as part of one large study and their data pooled to give a better estimate of the true mean or proportion. If this could not be assumed (at least to be roughly true) then I agree it would not work.

]]>Blog post discussing the accuracy of said video:

https://helix.northwestern.edu/blog/2010/11/cell-biology-animation-and-reality

links to actual atomic force microscopy of the Myosin V molecule, which does have little feet that bind and unbind at a regular interval…

]]>Here’s a pretty cool science based animation of how a particular motor protein works (and a bunch of other stuff too):

https://youtu.be/yKW4F0Nu-UY?t=3m40s

You could suppose for illustration purposes that say the “feet” of this protein could absorb microwaves selectively because they “walk” at some 1000 MHz or whatever (or the microwave energy is a 1st, 2nd, or 3rd harmonic of whatever they do). If you add microwave energy, perhaps they vibrate back and forth rather than moving forward, hence a certain thing doesn’t get transported to its appropriate place as quickly, and so some chemical reaction does or does not occur fast enough to prevent some naturally occurring damage. This is more of a heuristic than anything else, obviously I have no particular candidate process in mind, just the idea that the intricate mechanical processes that large bio-molecules undergo could be selectively disrupted due to resonance at microwave frequencies. The more I learn about biology the more impressed I am at how complex it is, but also robust.

]]>See also Edwards’ ‘prior likelihood’ (eg in his book Likelihood).

This will only work for identifiable parameters though, ie those that just need enough data to estimate.

]]>Yet, your description here, that people think the multiplier could be anything from negative to positive, and O(1) or so… suggests a perfectly fine prior, maybe normal(1.0,10.0)

Nevertheless, I agree with you about your skepticism that economics will begin to do this. I just don’t think this is because doing it is hard, or wrong, or anything like that, it’s because of politics etc.

]]>A fresh unbiased study performed meticulously will continue to converge on the true mean as the number of observations increases. However, unless the prior probability distribution shares the same mean, it will bias the fresh study and delay its convergence on the true mean and thus be counter-productive. It would have a similar biasing effect as ‘P-hacking’. Ideally, the prior data should have been a pilot study for the ‘fresh’ study so that it could be regarded as part of it. In other words, the ‘prior data’ would have to be chosen very carefully. Others reading the study might prefer for the fresh data to be ‘normalized’ on its own to create a ‘fresh’ posterior probability distribution and to use the author’s prior probability as a guide to suggesting his or her own for personal use e.g. deciding to perform another study to replicate or contradict it.

]]>ack, of course the blog ate the angle brackets Stan uses for bounds on p0… sigh.

]]>Bob, a useful construct when you have a region where you really are pretty indifferent, like say +- 1000 but you want to include some weight on the whole real line is something like

parameters{

real<lower=-1000, upper=1000> p0;

real dp;

}

transformed parameters {

real p;

p = p0 + dp; // a convolution of a uniform with a normal

}

model{

dp ~ normal(0,some_scale);

}

thereby giving you a nice flat plateau in -1000,1000 but convolved with some gaussian to give an infinitely smooth prior over the whole real line.
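A small sketch of the prior density this construction implies for p, assuming p0 is uniform on (-1000, 1000) and taking a hypothetical value of 100 for `some_scale` (the comment does not give one). Convolving a uniform with a normal has a closed form in terms of the normal CDF, so no sampling is needed.

```python
from statistics import NormalDist

def plateau_density(p, half_width=1000.0, scale=100.0):
    """Density of p = p0 + dp with p0 ~ Uniform(-half_width, half_width), dp ~ Normal(0, scale)."""
    std = NormalDist()
    upper = std.cdf((p + half_width) / scale)
    lower = std.cdf((p - half_width) / scale)
    return (upper - lower) / (2.0 * half_width)

centre = plateau_density(0.0)     # on the plateau: about 1 / 2000
edge = plateau_density(1000.0)    # at the boundary: about half the plateau height
far = plateau_density(1500.0)     # five scales outside: tiny but nonzero
```

With this scale the density is essentially flat across most of the interval, drops to half height at the edges, and decays smoothly beyond them.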

]]>Oren: here’s the first PubMed link I found on searching “microwave resonance proteins” https://www.ncbi.nlm.nih.gov/pubmed/18240290

taking their abstract at face value (possibly not a good idea, but a starting point since I don’t have access to full text) they suggest that resonant absorption of microwaves can certainly affect proteins selectively. It’s at least plausible. Yet, I fully agree with you in the basic point you’re making that policy is being made by people with strong but uninformed priors.

When it comes to power-lines at 60 Hz I think the results are completely different: such small objects as proteins are likely to see 60 Hz as essentially DC, and resonance absorption should be up in the range of microwave ovens, certainly above 500 MHz etc.

]]>Yes, I think Huw though is imagining “making up” a dataset that you a-priori think might be representative of the range of stuff you expect to see, then do Bayesian inference on this fake dataset, and see which parameter values are consistent with this fake data thereby backing out a prior for a parameter from what you think the data ought to look like… I like this idea a lot as a way to get informative priors, and since it’s not a weighted average of crappy studies it might be more reasonable, certainly doesn’t suffer from file-drawer and poor research practices etc.

]]>Hi Daniel – I knew I recognized your name from somewhere.

Regarding the effect of EM on cells: the problem is that, not only is the radiation non-ionizing, it’s not even comparable to thermal energy. So any effect involving some activation barrier being surmounted by the radiation would already be blown past by ambient thermal noise. Robert Adair (Yale physicist) treated this issue at great length in the early ’90s (back when there were scares about power lines), albeit focusing on lower frequencies where the issue is even more clear cut. (My physics chops are a little rusty, and I don’t have a strong intuition about the resonance idea, except that it seems unlikely at those energies. Cell phone frequencies are below the blackbody peak at room temperature and I’m pretty sure there are a gazillion energy levels accessible to pretty much any large molecule in those ranges, particularly in a liquid environment.)

But at any rate, this is just to emphasize my mechanistic prior, which is evidently different from that of the Chronicle’s health writer and the Berkeley city council, who seem ready to use the uninformed (ha!) prior that every modern technology is carcinogenic unless proven otherwise, and also the studies showing otherwise should be ignored (because they disagree with said prior too strongly).

]]>I’ll put it in more traditional economic terms. The textbook Keynesian model says that if the economy is not at full employment, then the multiplier (the effect on GDP of increasing the gov’t budget deficit by $1) is 1/(1-MPC), where MPC is the marginal propensity to consume (the derivative of total consumption spending with respect to income). Since the MPC is around .8, the multiplier would be around 5. Somewhat more sophisticated models incorporate taxes and imports and these will reduce the size of the multiplier somewhat. This model prevailed until the 1970s. Since then, a portion of the economics discipline would claim that the multiplier is 0: any increase in government spending will squeeze out private investment dollar for dollar. Some other extremists would go so far as to make it negative, claiming nefarious influences on the private sector and worrying about what the government spends money on. And, of course, there are some ideas that the multiplier is not at all stable and it depends on many other things, such as consumers going on strike, etc.
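For readers who have not seen where 1/(1-MPC) comes from, here is the usual textbook spending-rounds argument written out, as a sketch assuming a constant MPC and no taxes or imports: the first dollar is spent, its recipients spend a fraction MPC of it, their recipients spend a fraction MPC of that, and so on.

```latex
\Delta Y \;=\; 1 + \mathrm{MPC} + \mathrm{MPC}^2 + \cdots
        \;=\; \sum_{k=0}^{\infty} \mathrm{MPC}^k
        \;=\; \frac{1}{1-\mathrm{MPC}},
\qquad
\mathrm{MPC} = 0.8 \;\Rightarrow\; \Delta Y = \frac{1}{0.2} = 5 .
```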

But the point is that these differences are deeply rooted in philosophical differences in how people believe the economy works. There is no consensus. We could establish several priors corresponding to different schools of thought and then examine the same evidence in each case. That would be instructive and I would support that. But I don’t think you will see that any time soon – it makes these schools of thought less “scientific” and more “subjective.” If you want to claim these beliefs are wrong, I’m in agreement with you. But I think it is part of the fundamental reason why economists, at least, would resist the advice in Andrew’s post (of course, I could be wrong, since I can’t really speak for most economists).

]]>Huw (and Daniel)- if you have not noticed it yet you might find my comment of interest http://statmodeling.stat.columbia.edu/2017/10/04/worry-rigged-priors/#comment-578656

Multiplying the different likelihood distributions together gives approximately a weighted average, and if the log-likelihoods are quadratic it is exactly the inverse-variance weighted average.

Something more thoughtful is advisable and if such can’t be discerned – flatten the multiplied together likelihood to reflect more uncertainty e.g. raise it to some number less than one (called something like fractional likelihood).
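A minimal sketch of both points for the quadratic (normal) case. The estimates and standard errors are made up for illustration, and `power` is a hypothetical name for the exponent in the fractional-likelihood idea: raising a normal likelihood to a power below one leaves its peak where it was and widens it.

```python
def pooled_normal(estimates, std_errors, power=1.0):
    """Peak and spread of a product of normal likelihoods, each raised to `power`."""
    weights = [power / se**2 for se in std_errors]   # precision of each flattened likelihood
    total = sum(weights)
    mean = sum(w * m for w, m in zip(weights, estimates)) / total
    return mean, (1.0 / total) ** 0.5

# Two hypothetical studies: estimate 0.2 (SE 0.1) and estimate 0.5 (SE 0.2).
mean_full, se_full = pooled_normal([0.2, 0.5], [0.1, 0.2])
mean_flat, se_flat = pooled_normal([0.2, 0.5], [0.1, 0.2], power=0.5)
```

The pooled mean is the inverse-variance weighted average in both calls; only the stated uncertainty changes when the likelihood is flattened.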

There also will be a related post later this afternoon.

]]>How about beauticians, dieticians, and musicians vs. physicists, hypnotists, and barist(a)s?

]]>The usual problem is that you only get summary statistics, not Bayesian posteriors. It’d be great if you could just include data from other studies in one big meta-analysis, but that’s rarely possible.

If you did get some kind of Bayesian posterior downstream, there’s the problem of how to compute with it if it’s not conjugate. That’s one of the reasons working directly with other data is easier.

]]>If we take a uniform prior over the range plus or minus one million, what does it say probabilistically?

1. The probability the parameter is in (-1000, 1000) is only 0.1%

2. The probability the parameter is outside of (-1000, 1000) is 99.9%.

That’s probably not the information you want to provide to your Bayesian model if you don’t expect the parameter to have values outside of (-1000, 1000). I keep meaning to write a case study that shows how this works (along with the truncation you get that Daniel Lakeland describes above if you err on the other side and make the boundaries too tight). Andrew’s already written papers showing how the original diffuse inverse gamma priors suggested in the original BUGS examples led to overinflated variance estimates.
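The two probabilities above are just ratios of interval lengths, which a couple of lines can confirm. The function name is made up for this sketch.

```python
def uniform_mass_inside(inner_half_width, outer_half_width):
    """Prior mass that Uniform(-outer, outer) puts on (-inner, inner)."""
    return (2 * inner_half_width) / (2 * outer_half_width)

inside = uniform_mass_inside(1_000, 1_000_000)   # mass on (-1000, 1000)
outside = 1 - inside                             # mass everywhere else
```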

]]>> If the beautiful parents had *only girls*

For context, I mean if all the 600 kids from beautiful parents in the study were girls.

]]>Daniel,

if the point estimate obtained from a flat prior is ok but the posterior distribution is too wide maybe the problem is with the likelihood function and not with the flat prior. In any case, I don’t think the problem is that the prior specifies a 99.9% chance an effect size has absolute value greater than 10^305.

Andrew,

I agree and I think I said something similar myself (“What matters is what may be the effect on the inference when this prior is used in the context of the model once we include the data.”). Regarding the paper you link to:

“For a fully informative prior for δ, we might choose normal with mean 0 because we see no prior reason to expect the population difference to be positive or negative and standard deviation 0.001 because we expect any differences in the population to be small, given the general stability of sex ratios and the noisiness of the measure of attractiveness.”

Why would the prior depend on the noisiness of the measure of attractiveness? Say I have a prior for some experimental setting. If I had a similar setting with more noise I think I would still use the same prior for the parameter of interest (but maybe there would be a nuisance parameter related to the noise).

I also find that prior very strong. If the beautiful parents had *only girls*, you would estimate the population difference to be just 0.1%. Maybe that’s your point, that the whole study makes no sense because you know that there is no difference and even in the most extreme outcome you wouldn’t really change your mind?

]]>Also Carlos: from a probabilistic perspective, the flat prior assumes the value *is* enormous. Flat on +- 10^308 has 99.9% probability outside the +- 10^305 region. As soon as you add in an assumption of non-probabilistic estimation (ie. point estimation) the prior has a different effect, which is to not change the location of the maximum, so you might argue that pure maximization based point estimation has no real probabilistic content (in the Bayesian sense of probability on the parameter space). The question I have is if the flat prior results in a Bayesian posterior that makes no sense, why would you necessarily think it would be a good idea to take the maximum a posteriori value from this posterior and call it a good estimate. Later we can get into James-Stein estimation and the inadmissibility of this flat-prior point estimator.

]]>Just breaking the math down a bit here in the Bayesian case may help. Suppose you have a prior on a parameter

$latex \theta \sim \mbox{Uniform}(-10^6, 10^6)$.

Just a simple uniform prior on an interval. That prior says it’s very unlikely that the value of $latex \theta$ is small, because

$latex \displaystyle \mbox{Pr}[ | \theta | > 10^5] = 1 - \frac{2 \times 10^5}{2 \times 10^6} = 0.9$.

]]>Oren, re the cell phone stuff, I basically agree with you, but the idea that non-ionizing radiation could cause cancer is I think a little more nuanced. Any enzyme associated with DNA repair or oxidative stress or whatnot that could be activated or inactivated by selective absorption of microwave type radiation could cause cancer over time through this indirect method, basically inhibiting the ability of the cells to cope with naturally occurring processes, or increasing the rate at which those naturally occurring processes occur. If I wanted to study such things I’d be looking at molecular resonances of the proteins to see if their chemical kinetics or protein folding configurations could be affected by absorption of certain wavelengths…

Also, hi from an ex colleague, assuming there aren’t too many Oren Cheyettes in the SF Bay area.

]]>This is a really useful way to get nuanced priors.

]]>Both studies continue to get media attention – out here in the Bay Area, we were just treated to an alarmist story by the SF Chronicle’s health writer on the risk of smart watches, quoting heavily from two go-to figures in the “cell phones will give us all cancer” community and mentioning the NIH/NTP report. Particularly at the local level, a lot of questionable policy gets made based on these sorts of reports – e.g., Berkeley on cell phone warnings and Petaluma on herbicides used by public maintenance staff.

]]>PS. If there are no real prior data sets, then a pseudo-data set could be ‘imagined’ subjectively based on informal experience or theories, its subjective likelihood distribution arrived at and normalized to give a non-baseline prior probability distribution.

]]>The prior probability can be based on a series of data sets each being assumed to share the same ‘true’ mean but each with its own likelihood distribution. The likelihood densities of the different likelihood distributions can be multiplied together to form a joint likelihood distribution, which is then ‘normalised’ so that all the posterior probabilities sum to 1 (normalisation always assumes that the ‘baseline’ prior is uniform or flat for random sampling, which is correct – see my blog: https://blog.oup.com/2017/06/suspected-fake-results-in-science/). The resulting posterior probability becomes the prior probability distribution for the new study. This is multiplied by the likelihood distribution of the new study data and normalised again to give the latest updated posterior probability distribution (to be discussed in the ‘discussion’ section of the paper).
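A grid-based sketch of the procedure just described, using a proportion rather than a mean so the likelihood is a simple binomial kernel. The study counts are hypothetical, and the flat baseline prior is the assumption stated in the comment.

```python
def normalise(weights):
    """Rescale so the values sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def binomial_likelihood(successes, trials, grid):
    """Binomial likelihood kernel evaluated at each candidate proportion."""
    return [p**successes * (1 - p) ** (trials - successes) for p in grid]

grid = [i / 1000 for i in range(1, 1000)]   # candidate values of the true proportion
prior_studies = [(12, 40), (30, 90)]        # hypothetical (successes, trials) pairs
new_study = (25, 60)                        # hypothetical fresh data

prior = normalise([1.0] * len(grid))        # flat baseline prior
for successes, trials in prior_studies:
    like = binomial_likelihood(successes, trials, grid)
    prior = normalise([a * b for a, b in zip(prior, like)])

like_new = binomial_likelihood(*new_study, grid)
posterior = normalise([a * b for a, b in zip(prior, like_new)])
posterior_mode = grid[posterior.index(max(posterior))]
```

With a flat baseline the final mode lands on the pooled proportion (67 out of 190 here), which is the sense in which the earlier studies act as part of one larger study.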

]]>Maybe psych journals should require a retrodesign function execution from Gelman and Carlin’s article, so the reader can assess the Type S/M and retrospective power.
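A rough Python sketch of that kind of calculation, assuming a normally distributed estimate and a positive true effect. It follows the idea of a retrodesign calculation (power, Type S error rate, and exaggeration ratio for a hypothesised true effect and a given standard error) but is not the authors' code, and the inputs in the usage line are made up.

```python
import random
from statistics import NormalDist

def retrodesign(true_effect, std_error, alpha=0.05, n_sims=20_000, seed=0):
    """Return (power, Type S rate, exaggeration ratio); assumes true_effect > 0."""
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)           # two-sided critical value
    ratio = true_effect / std_error
    p_hi = 1 - std.cdf(z - ratio)            # significant with the correct sign
    p_lo = std.cdf(-z - ratio)               # significant with the wrong sign
    power = p_hi + p_lo
    type_s = p_lo / power
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate) > z * std_error:    # keep only "significant" replications
            significant.append(abs(estimate))
    exaggeration = (sum(significant) / len(significant)) / true_effect
    return power, type_s, exaggeration

# Hypothetical design: true effect equal to one standard error.
power, type_s, exaggeration = retrodesign(true_effect=1.0, std_error=1.0)
```

For this made-up design the power is under 20%, so any estimate that reaches significance has to overstate the true effect, which is the Type M problem.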

]]>Very interesting podcast, I’m buying the book by him. The idea that Germans banned coffee to protect beer—this I have got to read about.

Talking about resistance, I spent the morning trying to figure out how to convince an action editor that a bunch of low-powered big effects is not as convincing as a small effect from a large-sample study. First I have to demonstrate how Type M error arises… the news has apparently not reached psychology.

]]>Following on some thoughts on priors for economic “multiplier effects” but we’d run out of reply room above.

Let’s let t be defined in years, and the “one year future total consumption per capita” function be

C1(t) = integrate(C(t+s)ds,s,0,1)

Where C(t) is the sum of all transactions that occur on a given day divided by the population N divided by 1/365 to put C(t) in units of dollars per person per year. C(t) is a piecewise constant function over each day.

Now, I take the 1 year multiplier effect to be

C1(t) if we have the government spend G dollars per capita (Call this C1_G(t)), where G dollars is any number between 0.001 times GDP/capita and 0.01 times GDP/capita (we assume an intermediate asymptotic stability of the effect for these moderately small spending levels)

minus

C1(t) if we don’t spend the G dollars per capita

divided by G

M = (C1_G(t) - C1(t))/G

Now clearly, this quantity depends on our choice of 1 year as the time period of interest, but we might expect that we’d get a similar effect for a range of window lengths from say 1/2 year to 2 years and so it’s *not extremely sensitive* to the window length. This is partly due to the fact that we average over 320 million people, and that we integrate our function over a full year or so, thereby smoothing out short term fluctuations quite a bit.

Next we note that logically we can in fact get quite large negative values, as I say if everyone in the country goes on strike because the Nazi party comes into power and whatnot… then C1_G(t) could go to zero, while C1(t) the counterfactual would have been something like 57000 $/person but… it’s extremely unlikely

In fact, for the most part, we’d expect this number to be something like 1 as the increase in GDP caused by spending G dollars per person would be something like G dollars per person, divided by G we’d get 1. So probably the peak of the prior density should be 1.

Furthermore it also seems like we could easily get 0, where each dollar spent by the govt causes someone to withhold a dollar of spending. This would be the case where we’re pretty much just doing a straight transfer from one group of people to another…. So the prior should be wide enough that 0 has density that is not so much lower than the density at 1. Finally, it’s reasonable that you might activate a lot of activity by your government spending, if it’s targeted properly (maybe you stimulate the economy of a depressed region, where lots of labor is available but little free cash for example). So you should be considering quantities out into the range of 2 or 3.

With all this in mind… an initial prior seems like normal(1.0,2.0) would be a good place to start, including values well into the negative range, and well above 1.0 but giving 1.0 the peak.
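A quick check of what normal(1.0,2.0) implies for the requirements listed above, assuming the second argument is a standard deviation (as in Stan's parameterisation) rather than a variance.

```python
from statistics import NormalDist

prior = NormalDist(mu=1.0, sigma=2.0)   # sigma read as a standard deviation (assumption)

density_ratio = prior.pdf(0.0) / prior.pdf(1.0)   # how much lower the density is at 0 than at the peak
prob_negative = prior.cdf(0.0)                    # prior mass on a negative multiplier
prob_above_3 = 1 - prior.cdf(3.0)                 # prior mass on a multiplier above 3
```

Under that reading the density at 0 is about 88% of the peak, roughly 31% of the mass is on negative multipliers, and about 16% is above 3, so this prior is fairly generous toward negative values.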

]]>However, once you create enough missing data you cannot estimate the models anymore, because some of the statistics are not observed enough anymore to estimate the parameters (e.g. I had posteriors from -100 to +150). Luckily I came across a youtube video of one of Andrew’s presentations about weakly informative priors, where he discussed a similar issue in which a parameter could not be estimated because there was (nearly?) no data for it. Now, using these priors, Normal(0,4), the models converge nicely with 50% of the data missing (which in networks means that for many statistics you have 75% of the data missing). My point is that my main reason to choose this prior is pragmatic: you cannot run the model with a flat prior. I therefore wonder how much of this discussion applies directly to my choice of prior?

*I am not a native English speaker (as you might have guessed), but is there a difference between -icians and -ists? It seems to me the -icians (statist-, econometr-, psychometr-, mathemat-,…) have a better understanding of what they are doing compared to the -ists (psycholog-, sociolog-, biolog-,…).

]]>* the basic idea makes sense… Editing on phone, sentence fragments…

]]>Aha I see we are secretly pointing out the same thing. The truth is that although the basic idea that spending money can induce growth in the economy makes sense, “the multiplier” really doesn’t exist as a well-defined thing. So it’s not surprising that the numerical value is controversial ;-)

]]>Andrew,

Actually, I phrased that last sentence poorly. I hear “let the data speak for itself” a lot, and like you I disagree with it, in two ways:

In a Bayesian/Frequentist context I prefer Bayesian which says that we need to make prior knowledge (common wisdom, our assumptions, etc) explicitly part of the model and then let the data push things around, speaking more loudly or more softly depending on how much and how strong it is.

In a general Data Science context, the methods and models we use will find a signal, if that’s possible. But the signal may not be what we hope it is. It could be a “leak from the future” in the data, which is very common. It could be a bot “clicking” on links rather than a potential customer. Heck, almost every engagement I go into doesn’t have a data dictionary and that data doesn’t speak for itself. (In fact, when I make the mistake of thinking I hear it talking based on the name of a field, I’m often deceived because that name doesn’t mean what I think it means.) So the data doesn’t actually speak for itself in this context either.

Only in the narrow sense of “don’t necessarily believe what ‘experts’ say about the data” does “let the data speak for itself” make sense to me.

]]>You are trying to build a dynamic model – a worthy goal, but not what the multiplier was designed to represent. It is a comparative statics result: if we increase government spending by $1 (without increasing taxes), what is the final increase in GDP after the system equilibrates. There is still a time dimension, as many economists will give different answers if you ask what the change in GDP will be after 6 months or after 1 year, etc. Also, the answer will vary, depending on the initial state of the economy (extent of unemployment, etc.). We can put more detail in and I have no doubt you can provide a prior distribution that will be defensible (as well as open to criticism). But this misses my point. I don’t believe you can provide a prior that can be said to represent “consensus” because there is none. And, while I do think the effort would be worthwhile, I think you will find great resistance to this approach for the very reasons I am trying to convey. I think the resistance to specifying priors is, in large part, a resistance to revealing that the emperor has no clothes. After all, economists pride themselves on being more scientific than the other social sciences.

]]>I think so – and in general that is well argued here – Calestous Juma. From coffee to tractors: Why fear of loss inspires resistance to new technology. And today’s pragmatic Bayesian approaches are new technology – just the theorem is old.

20 minute podcast here http://www.cbc.ca/radio/thecurrent/the-current-for-march-30-2017-1.4045972/march-30-2017-full-episode-transcript-1.4048646#segment2

]]>I get the basic / intuitive idea behind the “multiplier” effect, my big issue is that I don’t see how it can be defined *precisely* to give a universal way of calculating it. Let me explain

Suppose we take C(t) to be the total consumption by all members of the US at time t, a continuous function of time. Well, of course we know, like in the stock market, that consumption is not continuous. When I buy a sandwich a few dollars is transferred all at once. This is not the same thing as saying that all day long I spent a few pennies each hour…

You might think this is pedantic, but it seems to me the “multiplier” effect is some kind of derivative, how much total consumption changes when some particular amount of consumption by a certain party occurs. d something / d something

But the derivative is an unbounded operator, and it doesn’t even exist for a discrete series of transactions… and so we can really only discuss this in terms of taking the real series of discrete transactions, smoothing them in some way, and then defining our derivative of this smoothed thing… Fine, but then the result we get is dependent on the way in which we do the smoothing… Is there a way to define all of this in such a way that the result is largely independent of our choice of smoothing method for a wide range of smoothing methods? If so, we’re in the same situation as we get when trying to represent a steel bar using continuum mechanics, sure it’s atoms, but if we smooth the atoms by a smoothing kernel of width greater than 100 atomic distances and less than 1mm which is quite a few orders of magnitude… the results are nearly the same.

It’s less obvious to me how this would work for consumption. First off, consumption clearly has a very strong daily oscillation. I buy very little at midnight, and quite a bit more at noon. So any smoothing we do must be over a timescale large with respect to a day. But, there are also clearly seasonal effects in consumption, Christmas is big for retail, summer is big for travel… so smoothing seems to need to be large with respect to a year! But over decades technology and policy and things all change a lot. So I don’t think we’re ever in any regime where a smoothing based view of what’s going on really applies very well.

Now of course we’re interested in a causal effect, spending G government dollars causes some change in something, over some time period relative to what it would have been if the G event hadn’t occurred…. So it’s not a simple derivative in time, it’s a counterfactual about how much consumption would occur in some time period after the G event compared to what would have happened in the absence of G… But defining this in a way that is insensitive to the choice of time period still seems impossible. You could for example do a truncated Laplace transform (ie. discount all future consumption out to some window according to some discount rate) but then you’ll wind up with a result that’s very sensitive to the discount rate and the truncation window.

So, if you want to do a particular analysis, and you want to choose a particular way of doing the calculation, then I can give some particulars of the appropriate prior. All this is to back-up the assertion that Andrew made in a recent paper: The choice of prior is intimately connected to the choice of likelihood / data model.

]]>Strange that this is not commonly done – the technical challenges are not that hard http://statmodeling.stat.columbia.edu/wp-content/uploads/2011/05/plot13.pdf

(Actually that was the reason the journal editor gave for rejecting the paper – not enough technical innovation to justify publication in my prestigious journal)

]]>> Constructing a prior is work.

It was the original motivation for the work I did in meta-analysis (to get a prior for cost/benefit analysis of funding for clinical trials).

A little bit of thought about this soon suggests you don’t want some weighted average of the (mostly crappy) studies that happened to get published. Or maybe it takes more than a little thought…

]]>Carlos,

As we discuss in this paper, the prior can often only be understood in the context of the likelihood. In particular, a sample average or maximum likelihood estimate can be “quite reasonable” in some contexts but not in others. In a setting where measurements are accurate and plentiful and the goal is an estimate of a simple parameter whose value is not near the boundary of parameter space, then, sure, the flat prior can work. In a setting where measurements are noisy, sample size is not huge, and the goal is something more specific, then maximum likelihood or Bayesian inference with a flat prior can give bad answers: estimates with bad frequency properties, with high bias, high variance, high type M errors, high type S errors, the whole deal.
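A minimal simulation sketch of the noisy setting, to show what type M and type S errors look like. The true effect (0.1) and standard error (1.0) are made-up illustrative values, and the estimate is simply a normal draw around the true effect, which is what the maximum likelihood or flat-prior estimate amounts to in this toy setup.

```python
import random
import statistics

def simulate(true_effect=0.1, se=1.0, n_sims=100_000, seed=1):
    """Repeat a noisy study many times and look only at the estimates that
    reach |estimate| > 1.96 * se ("statistically significant").
    All default values are illustrative assumptions."""
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, se)  # flat-prior / ML estimate
        if abs(estimate) > 1.96 * se:
            significant.append(estimate)
    power = len(significant) / n_sims
    # Type S: a significant estimate has the wrong sign.
    type_s = sum(1 for e in significant if e * true_effect < 0) / len(significant)
    # Type M: average factor by which significant estimates exaggerate the effect.
    type_m = statistics.mean(abs(e) for e in significant) / abs(true_effect)
    return power, type_s, type_m

power, type_s, type_m = simulate()
print(f"power={power:.3f}  type S rate={type_s:.2f}  exaggeration ratio={type_m:.1f}")
```

With these assumed numbers, only about one study in twenty reaches significance, a large share of those have the wrong sign, and every one of them overstates the effect by a factor of at least 19.6, since a significant estimate must exceed 1.96 when the true effect is 0.1.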

]]>Carlos, this is intimately tied up in the insistence on a point estimate, though. The behavior of a point estimate is of course far less affected by the clearly wrong tails of the prior, because the location of the point estimate is determined by the location of the optimum, which is totally insensitive to the tails.

This is of course by design for the person who distrusts priors. Nevertheless, as soon as you want to construct a measure of uncertainty or make a risk- and utility-based decision, you have a different story.

The risks associated with point estimation when outcomes and their consequences can vary widely are significant. If a posterior distribution is tightly peaked near your point estimate, then things are OK. If there is nontrivial width, then that flat prior can be deadly for your decisions, as you wind up considering possibilities well outside what anyone actually thinks might happen, simply because no one wants to be in charge of justifying a prior choice. Wald’s theorem applies whether the user of statistics likes it or not.

]]>Daniel

You can start here (http://marginalrevolution.com/?s=multiplier). Of course, that is not an authoritative source and it represents the more right-wing side of economics – Krugman would have a somewhat different take. But I have no doubt you can generate a prior – or even two or three. And I believe doing that would be superior to conducting a new study using some data and declaring a confidence interval for the *true* size of the multiplier from that single study. I am not disagreeing with the post or your comments here – I am giving my view of what drives much of the underlying resistance to change and the clinging to these frequentist methods. If our estimates for the size of the multiplier shift depending on which prior you choose – and I believe they would – then it exposes the entire enterprise as a sort of mathematical trick, a way to couch a subjective belief as “scientific.” And who wants to do that? (Only real scientists, perhaps.)

+1

]]>> it automatically assumes the value in question is ridiculously enormous.

A flat prior doesn’t assume that it *is* enormous, it assumes that it *could be* enormous. An informative prior may be better, but an uninformative prior is not obviously stupid. What matters is the effect this prior has on the inference when it is used in the context of the model, once we include the data.

If you say that the flat prior means that you expect the value of interest to be greater than 10^305, you make it look stupid.

If you say that the flat prior means that you will take the mean of the data to estimate the value of interest, it looks much less stupid; actually it looks quite reasonable.

Let’s say you measure the height of a sample of people to estimate the average height in the population and you get mean=170cm. Maybe you have reasons to think you should correct it a bit in either direction, but taking the 170cm at face value is not obviously stupid. If you get mean=512km there are issues with your model or experimental setup much worse than the fact that the prior doesn’t rule out that value.
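As a small sketch of the height example: with a normal data model and known standard deviation, the flat prior returns the sample mean, and a normal prior pulls it toward the prior mean in proportion to the precisions. The sample size (25), the population sd (10cm) and the informative prior (mean 175cm, sd 5cm) are all made-up numbers for illustration.

```python
def posterior_mean(sample_mean, n, sigma, prior_mean=None, prior_sd=None):
    """Posterior mean for a normal mean with known data sd `sigma`.
    With no prior given (flat prior) this is just the sample mean;
    with a normal prior it is the precision-weighted average."""
    if prior_sd is None:
        return sample_mean
    data_precision = n / sigma**2
    prior_precision = 1.0 / prior_sd**2
    return ((data_precision * sample_mean + prior_precision * prior_mean)
            / (data_precision + prior_precision))

# Hypothetical survey: 25 people, sample mean 170cm, population sd 10cm.
flat = posterior_mean(170.0, n=25, sigma=10.0)
informed = posterior_mean(170.0, n=25, sigma=10.0, prior_mean=175.0, prior_sd=5.0)
print(f"flat prior: {flat:.2f}cm, informative prior: {informed:.2f}cm")
```

In this toy case the two answers differ by less than a centimeter, which is the sense in which taking the 170cm at face value is not obviously stupid.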

Of course nothing is normal, all models are wrong, etc. Everyone understands that if we say that the height in a population is normally distributed with such and such mean and standard deviation this is just an approximation. The median and the mode might be different from the mean, the shape of the distribution around the mean might be far from normal, and surely there are no negative heights or heights larger than 10^305.

]]>