“The issue of how to report the statistics is one that we thought about deeply, and I am quite sure we reported them correctly.”

Ricardo Vieira writes:

I recently came upon this study from Princeton published in PNAS:

Implicit model of other people’s visual attention as an invisible, force-carrying beam projecting from the eyes

In which the authors asked people to demonstrate how far you have to tilt an object before it falls. They show that when a human head is looking at the object in the direction that it is tilting, people implicitly rate the tipping point as being lower than when the person is looking in the opposite direction (as if the eyes either pushed the object down or prevented it from falling). They further show that no such difference emerges when the human head is blindfolded. The experiment was repeated a few times with different populations (online and local) and slight modifications.

In a subsequent survey, they found that actually 5% of the population seems to believe in some form of eye-beams (or extramission if you want to be technical).

I have a few issues with the article. For starters, they do not directly compare the non-blindfolded and blindfolded conditions, although they emphasize several times that the difference in the first is significant and in the second is not. This point was actually brought up in the blog Neuroskeptic. The author of the blog writes:

This study seems fairly solid, although it seems a little fortuitous that the small effect found by the n=157 Experiment 1 was replicated in the much smaller (and hence surely underpowered) follow-up experiments 2 and 3C. I also think the stats are affected by the old erroneous analysis of interactions error (i.e. failure to test the difference between conditions directly) although I’m not sure if this makes much difference here.

In the discussion that ensued, one of the study authors responds to the two points raised. I feel the first point is not that relevant, as the first experiment was done on mturk and the subsequent ones in a controlled lab, and the estimated standard errors were pretty similar across the board. Now on to the second point, the author writes:

The issue of how to report the statistics is one that we thought about deeply, and I am quite sure we reported them correctly. First, it should be noted that each of the bars shown in the figure is already a difference between two means (mean angular tilt toward the face vs. mean angular tilt away from the face), not itself a raw mean. What we report, in each case, is a statistical test on a difference between means. If I interpret your argument correctly, it suggests that the critical comparison for us is not this tilt difference itself, but the difference of tilt differences. In our study, however, I would argue that this is not the case, for a couple of reasons:

In experiment 1 (a similar logic applies to exp 2), we explicitly spelled out two hypotheses. The first is that, when the eyes are open, there should be a significant difference between tilts toward the face and tilts away from the face. A significant difference here would be consistent with a perceived force emanating from the eyes. Hence, we performed a specific, within-subjects comparison between means to test that specific hypothesis. Doing away with that specific comparison would remove the critical statistical test. Our main prediction would remain unexamined. Note that we carefully organized the text to lay out this hypothesis and report the statistics that confirm the prediction. The second hypothesis is that, when the eyes are closed, there should be no significant difference between tilts toward the face and tilts away from the face (null hypothesis). We performed this specific comparison as well. Indeed, we found no statistical evidence of a tilt effect when the eyes were closed. Thus, each hypothesis was put to statistical test. One could test a third hypothesis: any tilt difference effect is bigger when the eyes are open than when the eyes are closed. I think this is the difference of tilt differences asked for. However, this is not a hypothesis we put forward. We were very careful not to frame the paper in that way. The reason is that this hypothesis (this difference of differences) could be fulfilled in many ways. One could imagine a data set in which, when the eyes are open, the tilt effect is not by itself significant, but shows a small positivity; and when the eyes are closed, the tilt effect shows a small negativity. The combination could yield a significant difference of differences. The proposed test would then provide a false positive, showing a significant effect while the data actually do not support our hypotheses.

Of course, one could ask: why not include both comparisons, reporting on the tests we did as well as the difference of differences? There are at least two reasons. First, if we added more tests, such as the difference of differences, along with the tests we already reported, then we would be double-dipping, or overlapping statistical tests on the same data. The tests then become partially redundant and do not represent independent confirmation of anything. Second, as easy as it may sound, the difference-of-differences is not even calculable in a consistent manner across all four experiments (e.g., in the control experiment 4), and so it does not provide a standardized way to evaluate all the results.

For all of these reasons, we believe the specific statistical methods reported in the manuscript are the simplest and the most valid. I totally understand that our statistics may seem to be affected by the erroneous analysis of interactions error, at first glance. But on deeper consideration, analyzing the difference-of-differences turns out to be somewhat problematical and also not calculable for some of our data sets.

Is this reasonable?

My other issues relate to the actual effect. First, the size of the difference is not clear (the average difference is around 0.67 degrees, which is never described in terms of visual angle). I tried to draw two lines separated by 0.67 degrees on Paint.net, and I couldn’t tell the difference unless they were superimposed, but I am not sure I got the scale correct. Second, they do not state in the article how much rotation is caused by each key-press (is this average difference equivalent to one key-press, half, two?). Finally, the participants do not see the full object rendered during the experiment, but just one vertical line. The authors argue that otherwise people would use heuristics such as moving the top corner over the opposite bottom corner. This necessity seems to undercut their hypothesis (if the eye-beam bias only works on lines, then it seems of little relevance to the 3D world).
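To get a rough sense of that scale, here is a quick sketch. The 300-pixel line length is an arbitrary choice for illustration (the paper does not give one), and the 17.7 degrees is the average tipping-point setting mentioned in the comments below:

```python
import math

# How far does the tip of a line move for a given tilt?
# The 300-pixel line length is an arbitrary illustrative choice.
line_length_px = 300
for angle_deg in (0.67, 17.7):
    offset_px = line_length_px * math.sin(math.radians(angle_deg))
    print(f"{angle_deg:5.2f} deg tilt moves the tip of a {line_length_px}px line "
          f"by about {offset_px:.1f} px")
```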

Okay, perhaps what really bothers me is the last paragraph of the article:

We speculate that an automatic, implicit model of vision as a beam exiting the eyes might help to explain a wide range of cultural myths and associations. For example, in Star Wars, a Jedi master can move an object by staring at it and concentrating the mind. The movie franchise works with audiences because it resonates with natural biases. Superman has beams that can emanate from his eyes and burn holes. We refer to the light of love and the light of recognition in someone’s eyes, and we refer to death as the moment when light leaves the eyes. We refer to the feeling of someone else’s gaze boring into us. Our culture is suffused with metaphors, stories, and associations about eye beams. The present data suggest that these cultural associations may be more than a simple mistake. Eye beams may remain embedded in the culture, 1,000 y after Ibn al-Haytham established the correct laws of optics (12), because they resonate with a deeper, automatic model constructed by our social machinery. The myth of extramission may tell us something about who we are as social animals.

Before getting to the details, let me share my first reaction, which is appreciation that Arvid Guterstam, one of the authors of the published paper, engaged directly with external criticism, rather than ignoring the criticism, dodging it, or attacking the messenger.

Second, let me emphasize the distinction between individuals and averages. In the above-linked post, Neuroskeptic writes:

Do you believe that people’s eyes emit an invisible beam of force?

According to a rather fun paper in PNAS, you probably do, on some level, believe that.

And indeed, the abstract of the article states: “when people judge the mechanical forces acting on an object, their judgments are biased by another person gazing at the object.” But this finding (to the extent that it’s real, in the sense of being something that would show up in a large study of the general population under realistic conditions) is a finding about averages. It could be that everyone behaves this way, or that most people behave this way, or that only some people behave this way: any of these can be consistent with an average difference.
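To make the averages-versus-individuals point concrete, here is a toy simulation in which only a fifth of the simulated subjects carry any gaze bias at all; every number is invented. The group average still comes out around 0.6 degrees, so an average difference of that size says little about how many individuals actually show the effect:

```python
import numpy as np

rng = np.random.default_rng(0)

n_subjects = 200
# Hypothetical numbers: 20% of subjects carry a 3-degree gaze bias,
# the other 80% have no bias at all; everyone has 2 degrees of trial noise.
has_bias = rng.random(n_subjects) < 0.20
true_bias = np.where(has_bias, 3.0, 0.0)
observed_diff = true_bias + rng.normal(0.0, 2.0, n_subjects)

print(f"share of subjects with any real bias: {has_bias.mean():.0%}")
print(f"group-average observed difference:    {observed_diff.mean():.2f} degrees")
```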

Also Neuroskeptic’s summary takes a little poetic license, in that the study does not claim that most people believe that eyes emit any force; the claim is that people on average make certain judgments as if eyes emit that force.

This last bit is no big deal but I bring it up because there’s a big difference between people believing in the eye-beam force and implicitly reacting as if there was such a force. The latter can be some sort of cognitive processing bias, analogous in some ways to familiar visual and cognitive illusions that persist even if they are explained to you.

Now on to Vieira’s original question: did the original authors do the right thing in comparing significant to not significant? No, what they did was mistaken, for the usual reasons.

The author’s explanation quoted above is wrong, I believe in an instructive way. The author talks a lot about hypotheses and a bit about the framing of the data, but that’s not so relevant to the question of what we can learn from the data. Procedural discussions such as “double-dipping” also miss the point: again, what we should want to know is what can be learned from these data (plus whatever assumptions go into the analysis), not how many times the authors “dipped” or whatever.

The fundamental fallacy I see in the authors’ original analysis, and in their follow-up explanation, is deterministic reasoning, in particular the idea that a comparison being “statistically significant” is equivalent to the effect being real.

Consider this snippet from Guterstam’s comment:

The second hypothesis is that, when the eyes are closed, there should be no significant difference between tilts toward the face and tilts away from the face (null hypothesis).

This is an error. A hypothesis should not be about statistical significance (or, in this case, no significant difference) in the data; it should be about the underlying or population pattern.

And this:

One could imagine a data set in which, when the eyes are open, the tilt effect is not by itself significant, but shows a small positivity; and when the eyes are closed, the tilt effect shows a small negativity. The combination could yield a significant difference of differences. The proposed test would then provide a false positive, showing a significant effect while the data actually do not support our hypotheses.

Again, the problem here is the blurring of two different things: (a) underlying effects and (b) statistically significant patterns in the data.
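Here is a minimal simulation of why this blurring matters. It uses invented numbers and, for simplicity, treats the eyes-open and eyes-closed tilt differences as independent samples, which need not match the paper’s actual design. Both conditions are given exactly the same underlying effect, so the true eyes-open-versus-eyes-closed difference is zero; even so, the pattern “significant with eyes open, not significant with eyes closed” shows up in a sizable fraction of simulated experiments, and when it does, the direct difference-of-differences test usually (and correctly) fails to find a difference:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Invented numbers. Crucially, BOTH conditions get the same true effect,
# so the underlying eyes-open vs. eyes-closed difference is exactly zero.
n, true_effect, sd, n_sims = 30, 0.5, 1.0, 10_000

pattern = 0         # "open significant, closed not" -- the reported pattern
pattern_and_dd = 0  # ...and the difference-of-differences test is also significant

for _ in range(n_sims):
    open_diffs = rng.normal(true_effect, sd, n)    # per-subject tilt differences, eyes open
    closed_diffs = rng.normal(true_effect, sd, n)  # per-subject tilt differences, eyes closed
    p_open = stats.ttest_1samp(open_diffs, 0).pvalue
    p_closed = stats.ttest_1samp(closed_diffs, 0).pvalue
    p_dd = stats.ttest_ind(open_diffs, closed_diffs).pvalue  # difference of differences
    if p_open < 0.05 and p_closed >= 0.05:
        pattern += 1
        if p_dd < 0.05:
            pattern_and_dd += 1

print(f"'open sig, closed not sig' in {pattern / n_sims:.1%} of simulated experiments")
print(f"of those, the direct test is also significant in {pattern_and_dd / max(pattern, 1):.1%}")
```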

A big problem

The error of comparing statistical significance to non-significance is a little thing.

A bigger mistake is the deterministic attitude by which effects are considered there or not, the whole “false positive / false negative” thing. Lots of people, I expect most statisticians, don’t see this as a mistake, but it is one.

But an even bigger problem comes in this sentence from the author of the paper in question:

The issue of how to report the statistics is one that we thought about deeply, and I am quite sure we reported them correctly.

He’s “quite sure”—but he’s wrong. This is a big, big, big problem. People are so so so sure of themselves.

Look. This guy could well be an excellent scientist. He has a Ph.D. He’s a neuroscientist. He knows a lot of stuff I don’t know. But maybe he’s not a statistics expert. That’s ok—not everyone should be a statistics expert. Division of labor! But a key part of doing good work is to have a sense of what you don’t know.

Maybe don’t be quite so sure next time! It’s ok to get some things wrong. I get things wrong all the time. Indeed, one of the main reasons for publishing your work is to get it out there, so that readers can uncover your mistakes. As I said above, I very much appreciate that the author of this article responded constructively to criticism. I think it’s too bad he was so sure of himself on the statistics, but even that is a small thing compared to his openness to discussion.

I agree with my correspondent

Finally, I agree with Vieira that the last paragraph of the article (“We speculate that an automatic, implicit model of vision as a beam exiting the eyes might help to explain a wide range of cultural myths and associations. . . . The myth of extramission may tell us something about who we are as social animals.”), quoted in full above, is waaaay over the top. I mean, sure, who knows, but, yeah, this is story time outta control!

P.S. One amusing feature of this episode is that the above-linked comment thread has some commenters who seem to actually believe that eye-beams are real:

If “eye beam” is the proper term then I have no difficulty in registering my belief in them. Any habitué of the subway is familiar with the mysterious effect where looking at another’s face, who may be reading a book or be absorbed in his phone, maybe 20 or 30 feet away, will cause him suddenly to swivel his glance toward the onlooker. Let any who doubt experiment.

Just ask hunters or bird watchers if they exist. They know never to look directly at the animal’s head/eyes or they will be spooked.

I have had my arse saved by ‘sensing’ the gaze of others. This ‘effect’ is real. Completely subjective…yes. That I am here and able to write this comment…is a fact.

No surprise, I guess. There are lots of supernatural beliefs floating around, and it makes sense that they should show up all over, including on blog comment threads.

34 thoughts on ““The issue of how to report the statistics is one that we thought about deeply, and I am quite sure we reported them correctly.””

  1. “Let any who doubt experiment.” I can’t find a reference right now, but Dan Simons explicitly tested the ‘can you feel when someone is looking at you’ idea. People cannot.

  2. The eyes look like they are emitting (+ reflecting?) more IR than the rest of the body here. Especially look during exercise at ~34 seconds: https://www.youtube.com/watch?v=vE3DVtJSmx4

    Also, I have a 4 watt ham radio that can transmit UHF/VHF (wavelengths on the order of the height of an adult human: 0.5 – 2.2 meters) a couple of miles in an urban setting. The human body supposedly uses ~100 W total (20 W for brain). It wouldn’t surprise me if we eventually discover that humans are emitting and detecting some form of highly compressed/encrypted information to each other this way.

  3. Andrew,
    You write “Look. This guy could well be an excellent scientist. He has a Ph.D. He’s a neuroscientist. He knows a lot of stuff I don’t know. But maybe he’s not a statistics expert. That’s ok—not everyone should be a statistics expert. Division of labor! But a key part of doing good work is to have a sense of what you don’t know.”

    That’s all well and fine. But you already put your finger on the problem with this diagnosis above,

    “A bigger mistake is the deterministic attitude by which effects are considered there or not, the whole “false positive / false negative” thing. Lots of people, I expect most statisticians, don’t see this as a mistake, but it is one.”

    So the deeper trouble IMO is that a lot of card-carrying experts, or folks who are looked to as such, continue to operate on some variant of this NHST paradigm. If the Big Names in the field don’t have a functional consensus on how to approach statistical inference problems – or even agree on fatal flaws in e.g. NHST – simply recommending collaboration with statisticians (division of labor) won’t help much.

    My $0.02.

  4. Andrew said:
    “Consider this snippet from Guterstam’s comment:

    The second hypothesis is that, when the eyes are closed, there should be no significant difference between tilts toward the face and tilts away from the face (null hypothesis).

    This is an error. A hypothesis should not be about statistical significance (or, in this case, no significant difference) in the data; it should be about the underlying or population pattern.”

    I’m not quite sure how to make the point I want to make, so here’s a try that might not succeed:

    I think that there is a big problem out there with people confusing “statistical significance” with some type of “practical significance” or “meaningful significance”. One aspect of this is Andrew’s point that the “real” hypothesis is about what happens in the underlying or population pattern, not about what happens in the data. But in practice, we can’t get our hands on the underlying or population pattern, so any conclusion from the data has uncertainty attached to it. So I think two things are needed: One is to point out better methods of inference, and explain why they are better. A lot of people involved in this blog do try to point out better methods of inference, but maybe don’t spend as much time explaining why the alternative methods are better. But also, the idea of “practical significance” (independent of the method of or results of inference) needs to be given more attention — The basic idea of “practical significance” (“What difference in this parameter makes a difference in real world functioning?”) is of course core, but “How closely can we measure?” is also relevant — as well as “How closely can we estimate from the data?”

    • This is something that comes up often in my collaborations. A big part of it is that frequently scientists feel uncomfortable deciding what level of effect would be practically meaningful. In fact, I do not think it is hyperbolic to say that at least half of them are positively phobic about the idea. Now, in some cases this is justifiable. A PhD physiologist declining to decide what constitutes a practically meaningful effect (for, say, clinical purposes) might be being appropriately modest about his/her ability to make that judgment. But a clinical researcher in the same position is a victim of the myth that all research should be “objective.” Indeed, in my experience, “we can’t do that, it’s subjective” is the usual response given when this question is posed.

      Also, what is practically meaningful in one context may differ from what is practically meaningful for some other purpose. That’s why I think the best approach in most situations is simply to report the estimated effect size along with some measure of the uncertainty associated with it and let each “consumer” of the results draw their own conclusions about the practical importance of the finding.

      Finally, sometimes there are external forces that drive this problem. On one occasion I worked on a project studying the ability of a certain procedure to detect when a worker had skipped a required step in a certain process. The procedure actually worked fairly well, with an ROC area of about 0.9. Then came the question of what cutoff to operate it at in practice, which means trading off sensitivity and specificity. My direct collaborator went pale as a ghost when I explained what was involved. He bumped the decision up to his boss, who bumped it up to his. It ended up going all the way to the CEO, who came back demanding a “statistically significant, objective, determination of the cutoff point.” When the reply came from me that there was no such thing and that somebody had to take responsibility for a judgment call on how many false negatives are acceptable to avoid one false positive, the project was abandoned, because “we will never get this past the union if it’s not objective.”

      • Addendum to the above: based on this and a couple of other such experiences, be very wary of getting involved in any “research” that is really a labor-management dispute in disguise. And if you do get involved, you should have very little expectation that either side is interested in truth.
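A minimal sketch of the “estimate plus uncertainty” style of reporting suggested a couple of comments up; the data are invented, and the practical threshold is exactly the kind of subject-matter judgment call being discussed, not something the statistics can supply:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Made-up per-subject difference scores, just to show the reporting style:
# an estimate, an interval, and a separately stated practical threshold.
diffs = rng.normal(0.6, 2.0, 60)
practical_threshold = 1.0   # smallest difference anyone would care about -- a judgment call

est = diffs.mean()
se = diffs.std(ddof=1) / np.sqrt(len(diffs))
lo, hi = stats.t.interval(0.95, len(diffs) - 1, loc=est, scale=se)

print(f"estimated difference: {est:.2f} (95% CI {lo:.2f} to {hi:.2f})")
print(f"stated practical threshold (a subject-matter judgment, not a statistic): {practical_threshold}")
```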

  5. Statistics aside, what really bothers me is that researchers in top universities think that this is an important subject matter to study and that top journals are willing to publish the findings. What am I missing here?

    • My thought exactly. What does it matter how the statistics are reported when the entire experiment is nonsense?

      I thought the motivating idea behind this blog was to avoid letting bad statistics get in the way of good science. This invisible eye beam stuff isn’t good science or even science at all in the first place.

      • Brent:

        Bad statistics allows this paper to be published which in turn crowds out good science. In addition, bad statistics is one reason that people think this sort of thing is good science. In our statistics classes we’ve trained scientists to think that causal identification + statistical significance = scientific discovery.

  6. It seems to me that some fields are like “gardens of forking paths” in that the prior knowledge in the field provides very little prior on where to go to look for meaningful discovery. A lack of training in statistics certainly exacerbates the problem, but something else has gone terribly wrong. Maybe I am just too pessimistic.

  7. I don’t actually care about this paper, but if one did, wouldn’t a better comparison be people vs. mannequins “looking” at the blocks, rather than people vs. blindfolded people? Presumably the question of interest is whether one unintentionally mis-uses gaze as a physical cue, or whether one unintentionally mis-uses the gazer’s *attention* as a physical cue, in which case one would want to have the same gaze without the intent behind it?

    • Raghuveer:

      Just in general, it’s my impression that researchers’ goals in such studies are: (1) causal identification and (2) statistical significance. There’s a belief that with (1) + (2), you’ve proved something and, if it hasn’t appeared in the literature already, you’ve made a discovery.

      The goal is to make a discovery. What exactly has been discovered is less important.

  8. My biggest problem with the study: they explicitly use the blindfold condition as a control for purposes of interpreting the implications of study results for their theory, but they analyze the data as if it were an independent test of their theory. Their argument is that they have conducted a test of the impact of angle minus blindfold, and then separately conducted a test of the impact of angle plus blindfold, and that they kept these separate because each provides “independent confirmation” of their base theory. This is absurd. The two hypotheses did not emerge independently from the theory–each implies the other. If you get an effect of angle in the no-blindfold condition, this is meaningless (in the context of their theory) unless you assume that you wouldn’t get one in the yes-blindfold condition (and vice versa). As a counterexample, had they conducted a test that people would estimate the temperature of a glass of water to be warmer when stared at (i.e., substituting superman’s heat vision for Jedi force vision), this would be an independent test.

    My other problem is one of design: the authors designed a factorial study, with factors angle X blindfold X direction, and then arbitrarily set one set of cells aside and called it an “independent” study. The study factors are nested so that head angle is in all the cells, which are:

    [A. no-blindfold, no-facing] [B. no-blindfold, yes-facing] [C. yes-blindfold, no-facing] [D. yes-blindfold, yes-facing]

    From a pure design perspective, based on the presence or absence of factors between cells, you would contrast (A X B), (C X D), (A+B X C+D). Not testing the latter implies you don’t think the blindfold has an effect. What’s more, their theory is supported in the yes-blindfold conditions (C and D) only if they get a null effect in both–thus, they are *literally* testing a null hypothesis in those cells!

    (I say the choice of where to split the design into two studies was arbitrary because I am giving them the benefit of the doubt that they weren’t trying to have their cake [conducting a conceptual contrast] and eat it, too [avoiding a statistical contrast]. It did catch my attention, though, to read “we carefully organized the text to lay out this hypothesis” and “We were very careful not to frame the paper in that way.” They may well have done this for the statistical reasons they give, but it’s also likely they came up with those reasons after looking at the data.)

    Maybe I’m being too tough on the authors, at least in tone–as Andrew said, good for them that they actually responded constructively–but this is nothing compared to the reaction they’re going to get from the “watched pot never boils” research group!!!
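For what it’s worth, the contrast described in the comment above is just the interaction term in an ordinary two-way analysis. Here is a sketch on simulated data; the all-between-subjects layout and every number are invented for illustration and are not claimed to be the paper’s actual design:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Simulated data for the four cells A-D named above; all numbers are invented.
cell_means = {("no", "no"): 0.0, ("no", "yes"): 0.6,    # no blindfold: facing shifts the tilt
              ("yes", "no"): 0.0, ("yes", "yes"): 0.0}  # blindfold: no facing effect
rows = [{"blindfold": b, "facing": f, "tilt": t}
        for (b, f), mu in cell_means.items()
        for t in rng.normal(mu, 2.0, 50)]
df = pd.DataFrame(rows)

# The blindfold-by-facing interaction coefficient is the "difference of differences".
model = smf.ols("tilt ~ C(blindfold) * C(facing)", data=df).fit()
print(model.summary().tables[1])
```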

  9. I suspect you can get 5% of a sample to say just about anything on a survey. A small number of people will (1) misunderstand the question and guess, (2) give a goofy answer just to be wise-asses, or (3) be random goofballs who actually believe what they say.

    I also don’t find it odd that there are some people who sincerely believe eyes emit rays. It is not ridiculous. Bats emit sonar pings, dolphins emit something or other, which I forget exactly, and some animals actually emit light, e.g., fireflies and those weird fish on the bottom of the ocean. Plus, some real scientists used to believe in eye-beams before the Ibn guy disproved it. Also, you can kind of tell what kind of environment you are in by making sounds — being in a closed room with hard walls sounds different than being outside.

    • Human eyes (and the rest of the body) do emit “rays”. Just look at the video I posted above. Whether anything interesting happens with these rays is another issue.

        • Sure, it is just that “do human eyes emit rays or not?” isn’t actually the question of interest. If I wasn’t so familiar with the NHST dominance of modern-day research I would think this is a pedantic distinction… but apparently no one is pointing it out.

        • Sure, it’s even true that the radiation pressure caused by the infrared emissions from the heat of the blood circulating next to the skin around the eye sockets actually does cause forces on objects… just several orders of magnitude too small to measure in any way.

        • It is possible to detect extremely weak EM signals with the right equipment though.

          Eg, my ham radio example. A cheap radio with 0.25 microvolt sensitivity (at 12 db SINAD)[1] and 50 ohm antenna[2] can detect a (0.25e-6)^2/50 = 1.25e-15 watt (~1 fW) signal[3]. This is about 1/1000 the power consumption of a single human cell.[4] I’m just saying there could be very weak signals being emitted/detected that look like noise.

          [1] https://baofengtech.com/uv-5r
          [2] https://baofengtech.com/nagoya-na-771
          [3] http://www.cantwellengineering.com/calculator/convert/uV
          [4] https://en.wikipedia.org/wiki/Orders_of_magnitude_(power)

        • It’s probably off topic at this point, but here is someone who claims a sensitivity of 0.085 uV (~0.15 fW):

          Having constructed these receive systems I decided to check the sensitivity and found that the 12 dB SINAD sensitivity, with the GaAsFET preamplifier inline, was approximately 0.085 microvolts as verified on several different pieces of test equipment!

          https://ka7oei.blogspot.com/2015/09/the-most-sensitive-repeater-on-earth.html

          They go on to estimate this is very near the theoretical ideal possible on Earth:

          This means that in a 50 ohm receive system that is terrestrially based (that is, the antenna receives a signal originating from the Earth’s surface) or in a test setup involving dummy loads/attenuators that are at room temperature the maximum sensitivity possible – no matter how good your receive system might be (using a standard 15 kHz FM voice channel) is approximately 0.09 microvolts (more or less) for 12 dB SINAD!

          Why are people so sure a similar biologically-based system cannot exist? The influence on surrounding organisms wouldn’t be drastic, more like a weak pheromone signal.
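Checking the arithmetic in this sub-thread, assuming the usual P = V²/R for a matched 50-ohm load:

```python
# Received power implied by the quoted sensitivity figures (P = V^2 / R, matched load).
R_ohms = 50.0
for label, volts in [("0.25 uV (stock radio spec)", 0.25e-6),
                     ("0.085 uV (GaAsFET preamp claim)", 0.085e-6)]:
    watts = volts ** 2 / R_ohms
    print(f"{label}: {watts:.2e} W  (~{watts / 1e-15:.2f} fW)")
```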

        • I have no problem with a statement like “people could plausibly sense Infrared emission from human faces” but it’s another thing entirely to say that humans can apply nonnegligible forces to objects through radiation pressure.

          From the Wikipedia page on radiation pressure

          “The radiation pressure of sunlight on earth is equivalent to that exerted by about a thousandth of a gram on an area of 1 square metre (measured in units of force: approx. 10 μN/m2).”

          So if you powder a Tylenol pill, take 1/1000 of it and then sprinkle that over a square meter of the earth, its weight is the pressure exerted by the sun, which is a gazillion times brighter than your eyes’ IR emissions.

          so whether you are looking at an object or not doesn’t affect its weight or sliding friction force or whatever
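A back-of-envelope version of that comparison, using pressure = intensity/c for absorbed light; the 1 W/m² figure for infrared actually reaching the object from a nearby face is an invented order-of-magnitude guess, not a measurement:

```python
# Radiation pressure from intensity: pressure = intensity / c for absorbed light
# (roughly doubled for a perfect reflector, which gets you near the ~10 uN/m^2
# figure quoted from Wikipedia above).
c = 3.0e8            # speed of light, m/s
sunlight = 1361.0    # W/m^2, solar constant
face_ir = 1.0        # W/m^2 at the object -- invented order-of-magnitude guess

for name, intensity in [("sunlight", sunlight), ("face/eye IR (guess)", face_ir)]:
    print(f"{name:>20}: {intensity / c:.1e} N/m^2")
```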

        • so whether you are looking at an object or not doesn’t affect its weight or sliding friction force or whatever

          I thought the issue was people sensing somehow that someone else nearby is looking at them or the same object they are focused on, not actual telekinesis:

          “when people judge the mechanical forces acting on an object, their judgments are biased by another person gazing at the object.”

        • My understanding is that people act as though other people looking at an inanimate object causes a force on that object… so for example if I see you looking at a block on a inclined plane in such a way that the radiation pressure from beams out of your eyes would push that block up the plane, I will tend to judge that the object would require more force to push it down the plane and cause it to slide… if your gaze comes from the other side, I will judge that less force is required to push it down the slide…

          in other words my estimates of forces on an object will be incorrectly biased in a way that is consistent with me subconsciously applying a kind of erroneous “gaze pressure” coming from your eyes.

        • I read the paper. They had subjects look at a picture of a person looking at a “tube”, then the tube was replaced by a vertical line. The subject then hit a key on the keyboard to tilt is as far as they thought the tube that had been their earlier would tilt before falling over. They don’t report the results in detail but say on average the subjects tilted it 17.7 degrees.

          Also that when the picture of a person’s head had uncovered eyes looking at the tube they tilted it ~0.6 degrees more on average when told to tilt towards the head vs away.

          So it isn’t really what I thought. Also, I would like to know the increments of tilt they used. How many key presses does 0.6 degrees correspond to?

        • Typos: “The subject then hit a key on the keyboard to tilt it [the line] as far as they thought the tube that had been there earlier would tilt before falling over.”

        • I tried tapping a key and imagining what it would be like. It becomes very tiresome after about 35 taps, which corresponds to 0.5 degree increments to get the average result. So I would bet the increment was at most half that (0.25 degrees) or else people would have quit early. In that case the difference would be 2-3 extra taps.
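The same arithmetic, with the 17.7 and 0.6 degree figures quoted from the paper above and a couple of guessed per-press increments (the paper apparently does not report one):

```python
average_tilt = 17.7    # degrees, average tipping-point setting (quoted above)
gaze_difference = 0.6  # degrees, toward-vs-away difference (quoted above)

for increment in (0.5, 0.25):  # degrees per key press -- guesses, not reported
    print(f"at {increment} deg/press: ~{average_tilt / increment:.0f} presses to reach the "
          f"average setting, and ~{gaze_difference / increment:.1f} extra presses for the effect")
```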

  10. I’m a little uncertain why eye-beams are even a part of the hypothesis. Would it not be more reasonable to have a hypothesis that people judge angles differently depending on their point of view? I’ve only read the article summaries in this post, and not the article itself so I fully admit I could have missed something that is obvious to everyone else.

  11. In all seriousness, it’s no wonder people can make a career of this kind of p-value fishing when apparently very intelligent, thoughtful and scientifically literate (or at least numerically and statistics literate) people are willing to debate possible mechanisms of “eye beams” and the like.

    For all the flaws in my educational indoctrination into the “NHST” framework, I clearly remember being told in my first Masters-level applied stats class that the framework only works when restricted to testing hypotheses with some clinically plausible mechanism associated with each hypothesis.

    There’s no statistical framework that can be created to make sense of “theories” that don’t even pass a basic smell test.

    • the framework only works when restricted to testing hypotheses with some clinically plausible mechanism associated with each hypothesis.

      What was the reasoning given behind this?

      I think it is true, in the sense that the null hypothesis is always false, so all statistical significance does is measure the collective prior probability that something interesting is going on. If it is too low, the project won’t get funded well enough to have a sufficient sample size. Then you won’t “get significance”, and hence not publish a paper, which is a failure of the method to “work”.

  12. I took it at the time to mean you can’t “significance test” your way through a bunch of meaningless data and say you’re doing science. Of course that was back before the phrase “Big Data” was ever coined.

    • I took it at the time to mean you can’t “significance test” your way through a bunch of meaningless data and say you’re doing science.

      This is correct, but the same is true for meaningful data.
