Comments on: Good advice can do you bad

By: Andrew

Andrew — Tue, 15 Mar 2016 20:41:03 +0000

In reply to Richard Kennaway. Richard: Of course, if theta is really drawn from a uniform (-A, A) distribution with A very large, and you gather data y~N(theta,1), and you condition on y=1, then, yes, theta will be positive 84% of the time. The point is that, in the situation where you'd be computing such a probability, theta is not really drawn a uniform (-A, A) distribution with A very large. In settings where y~N(theta,1) and you might observe y=1, theta is much more likely to be close to zero and much less likely to be large. The uniform prior is a default model that can cause all sorts of problems (cf. power pose, ovulation and voting, social priming, etc etc etc).

By: Richard Kennaway

Richard Kennaway — Tue, 15 Mar 2016 20:20:19 +0000

I am not seeing the problem with number 3, and a simple simulation confirms that in the given situation, when y is in the range 0.99…1.01, then theta > 0 about 84% of the time. Why is this a problem? Yes, you could lose the bet, but that is what a bet is.

How “large” 84% is depends on what you are going to do with it. It would be a laughably low justification for publishing a claim that theta > 0. I don’t think I’d even consider it suggestive of something worth a further look. On an even money bet, an 84% chance of winning is a good deal. And AlphaGo just beat Lee Sedol in 80% of their five games.

I used a uniform prior over a finite interval large enough to contain almost all of N(y,1); any such finite interval will give the same result. Even taking the prior on theta to be N(0,1) gave a frequency of (theta>0 | y=1) of about 73%. It takes priors like uniform on [-1,0.1] or on [-0.1,1] to drastically change the posterior. So I am not clear on the point of the example. If you don’t know which of those priors to use, then all of them are wrong. Is the real problem here, how to choose a prior when you don’t know how to choose a prior?

By: Keith O'Rourke

Keith O'Rourke — Mon, 14 Mar 2016 14:02:52 +0000

In reply to Christian Hennig.

Christian.

From a _technically sophisticated_ analysis I have to review “for data with well proven normality (p-value according [to some two named test]”…

The Greenland et al paper in the ASA p_value statement supplement does a nice job at getting at this issue http://amstat.tandfonline.com/doi/suppl/10.1080/00031305.2016.1154108

By: Andrew

Andrew — Mon, 14 Mar 2016 13:06:54 +0000

In reply to Christian Hennig. +1

By: Curious

Curious — Mon, 14 Mar 2016 01:43:33 +0000

In reply to Christian Hennig. +1

By: Christian Hennig

Christian Hennig — Mon, 14 Mar 2016 01:25:47 +0000

Good advice: Statistical methods rely on model assumptions and these should not be forgotten when choosing methods and interpreting results.

How it can mislead (I): Use methods that don’t seem to have assumptions and think that this makes the problem go away (usually because people haven’t worked out exactly what the methods are doing, or because people haven’t taught/learnt what the more or less implicit assumptions are, e.g., for classification and regression trees, hierarchical clustering, principal components etc.).

How it can mislead (II): Run a randomly chosen goodness-of-fit/misspecification test, if deemed necessary followed by transformation, and declare the model true afterwards.

By: Paul

Paul — Sun, 13 Mar 2016 19:04:02 +0000

In reply to Paul. Well summarized, I think we can agree on this. Thanks for the links.

By: Martha (Smith)

Martha (Smith) — Sun, 13 Mar 2016 16:52:03 +0000

In reply to Paul.

As you said yourself, most of your first two paragraphs is highly exaggerated.

To elaborate: Researchers need to know enough statistics
a. to be critical readers of research using statistics
b. to be able to make wise choices of statistical consultants
c. to be able to work collaboratively with statistical consultants.

Typical statistics courses in many fields do not do this; instead they usually just teach how to “do” simple statistics.

In fact, there is a whole big issue of “consulting” vs “collaborating”. There was an interesting discussion of it on the American Statistical Association discussion group recently. However, I suspect it is only available to ASA members; here’s the URL in case it is accessible: http://community.amstat.org/communities/community-home/digestviewer/viewthread?GroupId=2653&MID=29962&CommunityKey=6b2d607a-e31f-4f19-8357-020a8631b999&tab=digestviewer#bm16

There is a brief discussion of statistical collaboration at http://magazine.amstat.org/blog/2016/01/01/a-recipe-for-successful-collaborations/, which I believe is generally accessible.

By: Paul

Paul — Sun, 13 Mar 2016 11:44:31 +0000

In reply to Martha (Smith). Well, but then I don't understand the -1 for "most of the preceding".

By: Martha (Smith)

Martha (Smith) — Sun, 13 Mar 2016 04:42:51 +0000

In reply to Paul.

I wasn’t arguing in favor of “Do it yourself” — but for being a critical consumer, just as one ought to be when engaging the services of a dentist, plumber, etc.

By: Daniel Lakeland

Daniel Lakeland — Sat, 12 Mar 2016 23:46:05 +0000

In reply to Phil.

Multiplying forces: Yes I see what you’re talking about now, and while I now agree that the logical requirement may not rule this out… it might be that such a force couldn’t possibly be *local*. Imagine your simulator, it has to essentially talk to all the nearby particles and figure out how many of them there are in order to figure out how big to be.

So, is it conceivable that the reason we don’t see situations like your enormous one basically that there is some kind of locality requirement. Of course, QM doesn’t seem to be exactly local really… so anyway I like this conversation, though I think it’s pretty far off the original topic ;-)

I also wonder about the strong nuclear force. It seems as though it’s a purely theoretical construct that can’t be tested asymptotically at infinity. If you take a couple of quarks and pull them apart, you generate new quarks, so you can’t actually test whether as you pull quarks apart to long distances (like say 1cm or 1m or 1km) the force stays constant (and hence the potential energy grows linearly). So basically, given the untestable nature of it all, you’re in a situation where there’s no need to “trust” the standard model for long-distance strong nuclear forces. In fact, the “net” nuclear force looks like the Reid potential or whatever, and drops off to zero at infinity as you expect given that the whole universe doesn’t collapse down to a single nucleus :-) https://en.wikipedia.org/wiki/Nuclear_force

By: Paul

Paul — Sat, 12 Mar 2016 20:54:43 +0000

In reply to Martha (Smith).

Martha: I agree with you: blind trust is never a good idea – there are black sheep (even with a license) in each profession. However, I think in some situations you have no other choice. And I think a researcher who performs a major study – equipped only with correlation coefficients, t-tests and some basic knowledge about p-values – is in such a situation. This is like playing dentist with a hammer, screwdriver and drilling machine: it may works, but don’t look at the result.

In addition, as a consumer, it’s your choice: If you are not satisfied with a service you can choose a different professional.

And please, don’t get me wrong: I know that there are scientists (non-statisticians) out there who are able to conduct beautiful analyses. But in my experience they are not the majority.

By: Martha (Smith)

Martha (Smith) — Sat, 12 Mar 2016 20:24:51 +0000

In reply to Paul.

Paul:

+1 to your final paragraph; -1 to most of the preceding. In particular, I don’t think it would be wise for me, as a consumer, to automatically trust every dentist, doctor, mechanic, etc.; even though some are “licensed,” that doesn’t automatically mean they have good judgment.

By: Phil

Phil — Sat, 12 Mar 2016 18:07:53 +0000

In reply to Phil.

Daniel:

|“Or there could be forces that don’t add (like gravity) but that multiply”
|
|There actually *CAN’T* be forces that multiply, The dimensions of the quantity have to remain FORCE, and in this situation they wouldn’t.

I think you’re misinterpreting what I mean by “multiply.” What I mean is, if you have one particle you get a force F at a given distance; if you have two then you a force 2*F, and if you have three then instead of a force of 3*F you get a force of 2^3 * F, and so on. Get 10^30 of these together and you’d have a force of 2^(10^30)*F. The force would still have dimensions of force, it’s just that the force would increase very rapidly as you agglomerate particles.

I doubt you would say that this is impossible, but just in case someone else is tempted to say this, they should recognize that we could in fact make devices that work like this. We could make a little electromagnet that senses when it is close to some number of similar devices, and could increase its output accordingly. If they all act like this, they could generate a force like the one I made up. So even in the world as we know it, we could make a sort of simulator of what I’m talking about.

“And there are forces that get stronger with distance not weaker”

| But, I don’t think there can be forces that asymptotically get stronger right? I mean, forces like a nuclear force that have a high potential | barrier at some radius and then fall off to zero at infinity… yes, but not things that increase without bound at infinity.

Of course there’s a sense that the only things that can happen in the universe are things that do happen, and as far as we know there are no forces that get stronger with distance, therefore such a force cannot exist. The expansion of the universe, and the expanding distances between the bits of matter in it, certainly put an upper limit on how strong such a force could be. But…well, we don’t have a Grand Unified Theory yet, and as such I’m not sure we can really rule out that such a force _could_ exist.

The Strong Nuclear Force at least doesn’t _decrease_ with distance, although perhaps one could picture that as a force that falls off like r^-n where n happens to be 0 instead of a positive number, not an indication that n could ever be negative.

I dunno. I’m not even sure there’s a puzzle to speculate about, so I’m not sure my ruminations on the puzzle are productive. But I’m not sure there’s NOT a puzzle, either.

By: Daniel Lakeland

Daniel Lakeland — Sat, 12 Mar 2016 16:06:42 +0000

In reply to Phil.

Phil, I don’t have much argument with you, it certainly is true that the world has to be not too wacky in order for life to exist… so that’s kind of interesting. But I wanted to respond to this bit because it is a really interesting fact:

“Or there could be forces that don’t add (like gravity) but that multiply”

There actually *CAN’T* be forces that multiply, The dimensions of the quantity have to remain FORCE, and in this situation they wouldn’t.

I mean, I suppose you could posit a world where this particular symmetry law doesn’t hold… but in our world, quantities with a particular dimension can’t multiply to produce a quantity with the same dimension. So, this kind of thing isn’t something you could bolt on to our world, it’d be a totally different world, and in that world, there would be NO dimensionless numbers, really there would be no concept of dimension of a quantity.

Also:

“And there are forces that get stronger with distance not weaker”

But, I don’t think there can be forces that asymptotically get stronger right? I mean, forces like a nuclear force that have a high potential barrier at some radius and then fall off to zero at infinity… yes, but not things that increase without bound at infinity.

By: Paul

Paul — Sat, 12 Mar 2016 13:21:36 +0000

In reply to Peter Chapman.

I think the correct solution to this problem is to not teach them statistics. Instead, researchers should be encouraged to get help from trained and experienced (maybe even certified – I have no idea how it is in the US, but in Germany anyone can call him- or herself a statistician) statisticians. Or to stick with your example: I don’t build a bridge myself, I hire an engineer who knows what to do.

When I have a painful tooth, I go the dentist. When I’m sick, I go to the doctor. When my car is broken, I go to a mechanic. When I have mental problems, I go to a therapist. When I have legal trouble, I hire a lawyer. What I want to say: There are things you can do by yourself, but for some you need a professional. And statistics in one of those things where you need a professional. However, our current academic education system does not stick to this principle. Instead, we teach students of any discipline how to calculate t-tests and correlation coefficients. Later, it is required that those researchers are able to build and analyze complex studies. You don’t need to be Einstein in order to realize that this can not end well. Years of practical experience in data analysis can not be replaced by a few courses in statistics.

Of course, my above statement that we should not teach them statistics is highly exaggerated. However, the more I think about what we should teach students the more I stick with this statement: You can’t really interpret statistical results without knowing the methods behind it. Instead, statisticians should (a) be available for each research group and (b) be better trained in translating complex results.

By: Phil

Phil — Sat, 12 Mar 2016 00:42:19 +0000

In reply to Phil.

Daniel: You sort of beat me to it.

The reason gravity is not negligible in spite of being very very weak in a seemingly appropriate comparison is that gravitational forces add while EM forces can cancel (and, in the normal course of things, do cancel). Get 10^30 atoms together, while the 10^30 protons are having their forces canceled out by 10^30 electrons, and suddenly the shoe is on the other foot.

And that real-world example shows of why it’s not obvious to me that physically important dimensionless numbers couldn’t be a thousand orders of magnitude bigger (or smaller) than they are. There could be forces (indeed, there are forces) that fall off exponentially with distance rather than in a power-law, and there are forces that get stronger with distance rather than weaker. There could be a force that falls off like r^-10, or a force that is strong on a distance scale of a meter but negligible at a distance of 10 meters. Or there could be forces that don’t add (like gravity) but that multiply. In some of these cases, if you created a natural-seeming dimensionless quantity you could end up with a ridiculously large number.

…Which isn’t to say I think your argument is wrong, I’m just still not convinced that the “dimensionless numbers are always near 1” observation is trivial. Yes, maybe that “natural-seeming” dimensionless quantity wouldn’t seem natural if you knew it was going to be 10^500,000, I recognize that. But there’s an alternative argument that looks at the question a different way: what if, somehow, all of the important quantities were the same except gravity were, say, 10^100 times weaker compared to all of the other forces. Just turn down G. Well, galaxies and stars wouldn’t form, there would be no fusion, the only element would be hydrogen, there would be no life, etc. If someone outside the universe were to look at the physics, they’d say “there’s no chance of life in that universe; no sentient being will ever exist there who can ponder the relative magnitudes of the forces.” (Indeed, just turn down G by a factor of 1000 and you probably get this case).

I recognize that there is a connection between your way of looking at it and the one that I just raised, but the connection isn’t all that tight. Or maybe it is. I’m still not sure.

By: Daniel Lakeland

Daniel Lakeland — Fri, 11 Mar 2016 20:52:52 +0000

In reply to Phil.

Phil, another relevant number is the number of atoms in the universe. That’s around say 10^80, which means that an effect which is 10^-80 times smaller than say the electric force could conceivably add up to be around the same size as the electric force when considered at the scale of the entire universe… anything smaller than that couldn’t ever “add up” over all the particles in the universe to be non-negligible.

That puts a kind of scale at least for the logarithm of the smallest dimensionless group we could likely detect, at least assuming effects that are linear in the number of particles.

By: Daniel Lakeland

Daniel Lakeland — Fri, 11 Mar 2016 19:31:54 +0000

In reply to Phil.

Phil, I understand your argument too. I think the two arguments have to go together to give the full picture:

1) Things that are really tiny compared to things that we notice are just… too tiny to notice… If gravity were 10^5000000 times weaker than it is, we just wouldn’t even know about gravity.

2) Whenever we have an option on how to calculate a dimensionless ratio we prefer to calculate it so that it makes things O(1) in the problem at hand.

Gravity is really weak at atomic scales G m_e^2/ k q_e^2 = 2.4e-43 (m_e = electron mass, q_e = electron charge)

but Gravity is much stronger when applied to two objects the mass of the earth each with a single electron charge…

If there were no logically achievable scale at which gravity had a nonnegligible effect (ie. if gravity were still terribly weak when two galaxies were considered relative to if the two galaxies had a unit electron net charge…) then we wouldn’t know that gravity existed!

Physics dictates that we are aware of all the “strongest” effects. Since we tend to like to make things O(1) we will tend to compare things to the strongest effect at work in our problem. For nuclear physicists that’s the strong nuclear force, for engineers it’s the electrical force (material strength), and for astrophysicists it’s the gravitational force.

That explains why dimensionless constants aren’t typically bigger than 10^50 (or even maybe 10^2), since we tend to put the big part in the denominator.

If there’s a fundamental thing that is negligible in all these cases… ie. it’s relative size is 10^-50 or smaller in each case… then we don’t know about it because it’s always negligible. That explains why dimensionless groups are never smaller than 10^-50 or whatever.

put together we get this result that all the dimensionless groups we calculate are between about 10^-50 and say 100 (or 10^50 if you like).

By: Phil

Phil — Fri, 11 Mar 2016 18:57:08 +0000

In reply to Phil.

Daniel,
I understand your point. I do. I’m not sure what I can do to convince you of this, other than to say it again.

And I will even say that I am in partial agreement with it. Maybe full agreement. I’m just not quite sure…and it’s not just me but genuinely smart people like, uh, I think it was Fermi but maybe it was somebody else, but it was somebody at that level. But I’m not 100% sure (and neither was Fermi-or-whoever. Looking at the ratio of the strength of the forces in a nucleus is a natural thing to do: what is the ratio of the gravitational force between protons to the electromagnetic force? The force of gravity? And so on. If you do this, you find that gravity is very very weak. There’s a nice story (probably apocryphal, or possibly exaggerated from a related story) about Feynman giving a lecture about this, and he compares the forces and shows that gravity is weaker by a factor of 10^36 and says “…so gravity is very very weak. Really almost incredibly weak.” And just then a speaker that was hanging from a mount on the ceiling fell to the floor with a startling crash. Feynman said “Weak, but not negligible!”

Anyway, OK, you calculate the ratios and you find that gravity is weaker by a factor of 10^36. What if it had come out to be 10^5000000? Your argument is “well, then people wouldn’t talk about that ratio, then. They would calculate some other ratio, like the ratio of the gravitational force between two protons in a nucleus to the electromagnetic force between two protons that are separated by the diameter of the universe,” or something like that. I just don’t buy it. Indeed, even if you do calculate that number I just invented, let’s call it the Phil number, it’s not going to help: the universe is only about 10^16 meters in diameter, that’s about 10^30 atomic nuclei. Square that and you get 10^-60. So if you separate two protons by the width of the universe, you only decrease the force between them by a measly factor of 10^60. If the EM force between them were 10^5000000 times larger than the gravitational force when the distance was a nuclear diameter, then Phil number is still going to be 10^49999904.

Essentially, you are making a “garden of forking paths” argument — there are lots and lots of dimensionless numbers we can calculate, and of course we pick the ones that are in a ‘reasonable’ range. I understand this argument. I do. It may even be right. But I am not convinced by it.

By: Robert Grant

Robert Grant — Fri, 11 Mar 2016 14:34:19 +0000

In reply to Peter Chapman.

I agree with the description of the problem, in other words, too many statisticians function like a service desk within their organisation. Or like a plumber. You try to do stuff yourself, screw it up, call them in for some unpleasant stuff, and after they’ve left you laugh at their mannerisms and chat idly about how they didn’t have a clue (“biological significance!”, “rich qualitative data!”) and you’d have done a better job yourself. And because you always need just one more great idea investigated, and don’t have more money, you do extra bits yourself. Repeat.

But not the solution. It doesn’t matter how you teach because these dilettanti are not in the stats class anyway, nor for that matter are their teachers, or their bosses. Open data is a big help because studies done without a statistician can increasingly be tested to destruction afterwards by people who know what they’re doing. Statisticians reviewing publications severely is another one. A few painful experiences in public (or in front of the board) and your colleagues will soon shape up. We will make enemies along the way, but hey! we are so hard to recruit and so many projects rely on us that we are pretty much unsackable.

By: Keith O'Rourke

Keith O'Rourke — Fri, 11 Mar 2016 13:21:58 +0000

In reply to Andrew. > The garden of forking paths is what makes the perpetual p-value machine run. Nice. Not having an adequate representation (model) of the garden of forking paths one has gone through, the p-value machine appears to run well.

By: Peter Chapman

Peter Chapman — Fri, 11 Mar 2016 09:43:18 +0000

I spent my career interacting with statistics users who were not trained statisticians, They were chemists, biologists, ecologists, agricultural scientists and so on; ie people who had to use statistics to get their work completed. It was obvious to me that they had never been told that the words that statisticians use, most notably significance and error, had quite precise meanings within the context of statistics. My most vivid memory is of an experiment in which I was trying to determine the magnitude of different sources of variation in a factory setting involving a formulated pesticide. The QA samples were showing that a high proportion of batches were out of spec. I eventually pinned the problem down to inadequate mixing in a 5000 liter container at the factory. My report referred to all sources, including laboratory error, which was tiny. However the analytical chemist on the team was insulted by the idea that he was introducing error into the process and complained to his manager. And people were always banging on about biological significance being much more important than statistical significance, not understanding that statistical significance (or lack of it) determined whether they should even look for biological statistics.

The proximate cause of the problem is: we don’t teach the right things. The ultimate cause is: I don’t really know but I suspect that is because statistics is an academic subject not a professional one – if civil engineering relied on poorly trained people to build bridges, many of the bridges would fall down. Solution: a radical change in what and how we teach the subject.

By: Martha (Smith)

Martha (Smith) — Fri, 11 Mar 2016 03:29:15 +0000

In reply to Andrew.

I’m not sure that this is a response to what I said, which was (with emphasis added)
“*perhaps* the problem is really using defaults *as a substitute for thinking*”

By: Daniel Lakeland

Daniel Lakeland — Fri, 11 Mar 2016 02:58:26 +0000

In reply to Phil. In fact, the best dimensional analysis has as its goal making things all have a similar scale, the value 1 is the obvious common reference, so usually we try to make something be "like 1".

By: Fernando

Fernando — Fri, 11 Mar 2016 01:39:54 +0000

In reply to Curious.

Curious:

That is a better way to put it.

I understood the “good advice” was in fact meant to be bad.

By: Andrew

Andrew — Fri, 11 Mar 2016 01:16:00 +0000

In reply to Laplace. Laplace: I do not recommend that people perform hypothesis tests. But people do perform hypothesis tests, and they use them to make strong statements. I point out the garden of forking paths as a way to explain how they could be getting those gaudy p-values even in the absence of any effect. Here's an analogy. Someone comes to you with a perpetual motion machine. You just laugh, but various people keep insisting: hey, I don't care what your theory says, I see that machine spins and spins on its own. So it could be worth your while to take the machine apart and figure out what makes it run. The garden of forking paths is what makes the perpetual p-value machine run.

By: Andrew

Andrew — Fri, 11 Mar 2016 01:12:47 +0000

In reply to Martha (Smith). Martha: No, I wouldn't put it that generally. Defaults are useful. This particular default can be useful too, but not if you use it to make this particular inference. That's why it's a trap. A lion in a lion cage is not a trap. A lion in a koala cage, that's a trap. Default uniform priors are a sort of koala cage, they are used in all sorts of settings and often work well. But not if you use them to compute Pr(theta > 0 | y).

By: Daniel Lakeland

Daniel Lakeland — Thu, 10 Mar 2016 22:37:12 +0000

In reply to Laplace. All that stuff being said, I agree with you that "Garden of Forking Paths" means little more than "p < 0.05 is easy to find"

By: Martha (Smith)

Martha (Smith) — Thu, 10 Mar 2016 22:28:46 +0000

In reply to zbicyclist. +1

By: Martha (Smith)

Martha (Smith) — Thu, 10 Mar 2016 22:25:28 +0000

In reply to Andrew. So perhaps the problem is really using defaults as a substitute for thinking? (I think P. I. Good once asserted that the most common mistake in using statistics is automatically using the default. I believe he was referring specifically to using software, but the maxim seems to apply generally.)

By: Phil

Phil — Thu, 10 Mar 2016 22:21:55 +0000

In reply to Phil.

Daniel:
I understand your point, and could even add another, which is that there are degrees of freedom in how the dimensionless numbers are defined and if they were inconveniently distant from zero we might define them differently. An example is the ratio of the fine structure constant to the gravitational coupling constant, where there’s the choice of choosing “the gravitational coupling between what and what?” It’s normal to use two protons in a nucleus, but if things were very different maybe we would use two galaxies at a typical galactic distance, or two quarks at a proton diameter.
I have gone back and forth about this issue, for these reasons, but I’m not completely convinced that there’s not more to it. At least I think there might be room for the “weak anthropomorphic principle”: if some of these numbers differed by many orders of magnitudes (or many millions of orders of magnitudes) the result would be a universe that can’t support life of any kind.

bxg: Yes, I was just illustrating that something very close to Andrew’s problem could actually make sense — you could reasonably do an infinitely wide prior — and that you would indeed be willing to make the bet. You’re right that there are other situations, less similar to Andrew’s original one, where you should also make the bet.

By: Martha (Smith)

Martha (Smith) — Thu, 10 Mar 2016 22:17:01 +0000

In reply to Noah Motion. +1 to last paragraph (although my impression is that frequentists are more likely than Bayesians to neglect the point that results are conditional on the model.)

By: Daniel Lakeland

Daniel Lakeland — Thu, 10 Mar 2016 22:08:25 +0000

In reply to Laplace.

The Garden Of Forking Paths is in my opinion a heuristic that explains to people why the following things are true:

1) It’s hard (ie. requires a lot of tries) to use a pre-chosen “null” random number generator to generate a random dataset that has small p value.

2) It’s easy to think up a random number generator that people will plausibly consider “null” which nevertheless makes your actual data have a small p value under that RNG.

The first is true because of the frequency properties of random number generators… the second is true because of the large potential search space of “plausibly null RNGs” so that even if you don’t search hard, it’s easy to find one.

The forking paths thing is just showing how the search space grows exponentially with the number of little tweaks you have available to you. something like N = exp(Kn) for n the number of knobs you have to tweak, and K something like log(20) or so for typical conditions in social sciences. So 20^4 = 160000 you have a lot of things to choose from, and you don’t have to search to hard to find one.

Garden of Forking Paths is just pointing out how situation (1) and situation (2) are in no way connected… and yet we’ve been teaching people to expect that they are.

By: Todd Hartman

Todd Hartman — Thu, 10 Mar 2016 21:20:10 +0000

Good advice: Use robust standard errors whenever you worry that regression assumptions are violated.

How it can mislead: Using robust standard errors creates a false sense of security in many instances and does not correct biased estimates resulting from model misspecification, measurement error, etc.

By: Laplace

Laplace — Thu, 10 Mar 2016 20:51:37 +0000

In reply to Andrew.

“that p-value is meaningless because of the garden of forking paths.” That’s like saying “your Cristal Ball is worthless because the tooth fairy told me so.”

It was obvious a long time ago the (hypothetical and unknowable) frequency a procedure yields the truth is not a measure of how strongly the facts/assumptions in the case seen favor different hypothesis. This Garden of Forked Paths stuff should have driven this home so well no one could deny it. Instead statisticians have doubled down and now take it for granted an inference is greatly affected by the hypothetical decisions a researcher would make if they lived in a different universe with different data.

Great job Andrew. That little piece of insanity should keep Frequentists in business for another generation or two.

It would be nice though if Bayesians at least would acknowledge the “Garden of Forked Paths” is a meaningless incantation which can be cast over any bad-looking p-value to explain away it’s failure. It’s no different than when a prophet excuses every prediction failure by saying “you didn’t believe hard enough”. It’s chief purpose is to make p-values unfalsifiable. It explains every embarrassment and is quietly forgotten whenever p-values seem to work reasonably.

By: Wayne Folta

Wayne Folta — Thu, 10 Mar 2016 20:39:44 +0000

In reply to Ram.

zbicyclist: Somewhere in the Training/Validation/Test distinction is the idea of climbing a gradient, with resultant overfitting. If you try even a single model on just training data, you’ll have no idea how well it will generalize, since your training will climb the training data gradient by definition. If you try multiple models (including of course, variations — variable selection, feature engineering, etc — on a single technique or different techniques) with Training/Validation data you will end up climbing the gradient of the Validation data results and you’ll have no idea how it will generalize. (By definition the model/version that learned your validation best is declared the winner.) So you need yet one more level of holdout to determine how your champion model might generalize.

Cross-validation (CV) uses data more efficiently and yields distributions rather than point estimates, so it’s like holdout data, but better. But not fundamentally different: just as you need Training/Validation/Test holdouts, you need nested CV if you’re tweaking on things and comparing models. Some folks seem to think that CV is so magical that non-nested-CV — i.e. the kind provided by their software package’s regression or classification routines — will save them, but it won’t.

By: Curious

Curious — Thu, 10 Mar 2016 20:31:03 +0000

In reply to Fernando.

You have it backwards:

Good advice: Missing data is a causal problem.

How it can mislead: Causal inference is a missing data problem

By: zbicyclist

zbicyclist — Thu, 10 Mar 2016 19:50:30 +0000

In reply to Ram.

Absolutely. In fact, it’s common now with machine learning work to split the data into 3 pieces.

Training (where a wide variety of techniques may be used)
Validation (validation of the training approach)
Test (completely out of sample)

For example, I read a paper this morning that splits the data this way (60%, 15%, 25%).
Lack of data is not their problem; they used 1.5 million observations.

By: Daniel Lakeland

Daniel Lakeland — Thu, 10 Mar 2016 19:17:20 +0000

In reply to Phil.

Phil:

Dimensionless numbers are the ratios between things of the same dimension… obviously. So when it comes to determining the ratio of two things, if one of them is too small for us to measure and even for us to notice that it exists…. then we’re never going to notice it exists… so there could be like a trillion forces each of which is 10^10^10^50 times smaller than say the electromagnetic force… but we wouldn’t notice them.

By: Anonymous

Anonymous — Thu, 10 Mar 2016 19:08:19 +0000

In reply to Laplace. To be clear: if there are rounds whizzing over your head and you assume they're not meant for you, you'll stand up and risk one of those rounds accidentally hitting you in the head? What kind of analogy is this?

By: mark

mark — Thu, 10 Mar 2016 18:57:29 +0000

In reply to Andrew. I think that the average Marine does not need a boost in testosterone - or need to make riskier decisions.

By: Fernando

Fernando — Thu, 10 Mar 2016 18:50:21 +0000

Good advice: Causal inference is a missing data problem

How it can mislead: Missing data is a causal problem

Why it’s misleading: Circularity.

By: bxg

bxg — Thu, 10 Mar 2016 18:50:11 +0000

In reply to Phil.

Suppose the prior was uniform [-100, 100]. Or N(0, 50).
Isn’t the meat of the original example going to hold: you see y = 1, you’d still be quite convinced y > 0. (Or if y = 5, than y > 4, etc.)

I really think that uniform over _all_ possible theta is just a confusing distraction here. It’s tempting to criticize it because what does it really mean?, etc, and we know an improper prior like this can lead to bizarre
problems. Yet I think it’s nearly irrelevant to this actual example.

By: Wayne Folta

Wayne Folta — Thu, 10 Mar 2016 18:42:38 +0000

In reply to Wayne Folta.

Correction: “… it can certainly influence the priors in unexpected ways.” should say “posterior”.

By: Noah Motion

Noah Motion — Thu, 10 Mar 2016 18:39:58 +0000

In reply to Andrew.

I guess my point is that it’s not the advice that’s leading people astray. As you say at the beginning of the post, 1 and 2 (and 3 and 4) are good, solid, and reasonable. If the action is all in the additional assumptions and mistaken implications, it seems strange to say that it’s the advice that’s misleading anyone.

The new item (since my comment) allows for a nice, simple illustration of how the misleading is due to the person not taking the advice for what it’s worth. If A then B does not imply if not-A then not-B.

I wish more Bayesians (and frequentists, for that matter) would clearly state the bit about everything being conditional on the model more often. That seems very important to me, and I try to keep it in mind when I analyze data and build/fit statistical models.

By: Wayne Folta

Wayne Folta — Thu, 10 Mar 2016 18:29:38 +0000

In reply to Andrew.

The uniform prior issue has been looming large in my studies lately. It really seems to be an orphan without a home…

The “objective” Bayesian says, “A uniform prior is one of those ‘subjective’ Bayesian things. It’s certainly not objective; it can certainly influence the priors in unexpected ways.”

The “subjective” Bayesian replies, “No, it can’t possibly be ‘subjective’, since it’s improper and hence cannot reflect any knowledge or belief.”

The “uniform-distribution” Bayesian says, “But a uniform prior seems to be consistent with the Principle of Indifference. Besides, it tends to give the same answers as Frequentist procedures, so I’m more popular with my peers.”

The “empirical” Bayesian starts laughing at this statement, to which the “uniform-distribution” Bayesian retorts, “And who is it that has no problems using Frequentist procedures to give them their priors? Hmmm?”

I really wonder if an (unbounded) uniform prior can fit into any Bayesian philosophy. Seems like it’s a holdover from the early days and it’s sometimes convenient as a first-step when you step up your model from a frequentist regression to a Bayesian regression and want to confirm that the results are similar… If it doesn’t result in your Bayesian regression taking significantly longer to run than it should.

By: Phil

Phil — Thu, 10 Mar 2016 18:25:35 +0000

In reply to Andrew. Just to elaborate on the obvious here: if you're going to be getting a lot of data it can make sense to use a uniform prior like this, but if you're going to try to draw inferences from a few data points then it doesn't. There's virtually no situation in which you really think "the answer could be between 0 and 1, or between 1,000,000,000,000,000 and 1,000,000,000,000,001 and these are equally likely. But let's contrive a situation that is not so far away from the problem. Or, rather, it will still be infinitely far away but it will capture the same essential features. Suppose I had proposed the following proposition: I will generate a random number theta between 10^-10 and 10^10, and will then do one draw from N(theta,1) and tell it to you. Let's call it y. You will then have the option of a wager: if the true value of theta is < (y-1) then you pay me $5, otherwise I pay you $1. I'd say that if we play the game and get y=1 then yes, you should take the bet. Shouldn't you? (But really you should be very very suspicious that I have done something wrong, because what are the odds that I'd come up with a number so close to zero, or indeed a number with fewer than 8 zeros? (This is a hypothetical question, we all know about what the odds are). This puts me in mind of the observation...I think it was Fermi... that there's something screwy about the universe because all of the dimensionless numbers are near 1, by which he meant between about 10^-40 and 10^40 or something. The fine structure constant is about 1/137, for crying out loud! Why don't any of them come out to be 10^10^10^50 or something? There are a lot more numbers outside the range 10^-40 - 10^40 than there are inside it, I think we can all agree on that.

By: bxg

bxg — Thu, 10 Mar 2016 18:23:02 +0000

In reply to Andrew.

I think the way you are presenting #3 is inviting misinterpretation.
You say the prior has a problem, but maybe that points people in the direction of thinking about issues with improper priors – even though (I think) that’s largely irrelevant to your complaint.
You call the result “bad” and “unreasonable”, without much or any qualification, which can sound
as though is intendeded to illustrate how Bayesianism per se is wrong. But the answer here could well be fine (e.g. if, in the actual application domain, 0 was just another number like any other.)

However, it _seems_ your point is that people can be oblivious to how the specific prior can affect results, and therefore pay too little regard to them – and expect universally good results from a default choice. (This isn’t even an example of disturbingly high sensitivity to the prior; the fault of the uniform prior – if “pure noise” is a particular consideration – isn’t subtle or small.)

I don’t think your original post makes this very clear (well, assuming I am understanding your point.)

By: Andrew

Andrew — Thu, 10 Mar 2016 17:58:51 +0000

In reply to Phil L. Phil: It's uniform on (-infinity, +infinity). And yes, the problem is with the prior, not with Bayes' rule.