Christian, at least in psycholinguistics a surprising number of people argue for null results, drawing strong conclusions that \mu=0 from a low-power test (power lowered even more by violations of normality of the residuals). So it’s a real problem.

I’ll look at all the other things K? O’Rourke and fred linked to, thanks for that.

I’m not sure what property called “independence” you think you’re testing for with all that data (all of which is taken in overlapping light cones), but whatever it is, there is a possibility it isn’t actually being relied on:

Or maybe you mean that the original authors tried to get statistical significance from the data; I think that the probability of that being the case is near 0; I think it was just an oversight. The point I was trying to make in this exchange on this blog was that assumptions (or maybe it’s just the role of outliers or influential values) are not taken sufficiently seriously, at least in some circles.

Again, this may not be a big issue, but how big an issue it actually is depends on how precisely you make your decisions. If you know in advance that you may try out logs if the data look quite skewed, but nothing else, it won’t do much harm. Doing something very flexible such as Box-Cox with parameter estimation makes the problem somewhat bigger but may still not hurt that much, at least if the data set isn’t very small. Trying out transformations until the t-test comes out significant, on the other hand, is seriously evil.
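A minimal simulation sketch of that last point (made-up distributions and sample sizes, not anyone’s real data): under the null, picking whichever transformation gives the smallest t-test p-value rejects at least as often as a single pre-specified test, and typically more.

```python
# Simulation: under H0 (both groups drawn from the same skewed distribution),
# "transformation hunting" -- keeping the smallest p-value over several
# candidate transformations -- inflates the type-I error rate relative to
# one pre-specified test. All numbers here are illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sim, n, alpha = 2000, 20, 0.05
transforms = [lambda x: x, np.log, np.sqrt]  # identity, log, square root

rejections_honest = 0   # one pre-specified t-test on the raw data
rejections_hunting = 0  # best p-value over all candidate transformations

for _ in range(n_sim):
    x = rng.lognormal(size=n)  # same skewed distribution for both groups,
    y = rng.lognormal(size=n)  # so the null of equal means is true
    if stats.ttest_ind(x, y).pvalue < alpha:
        rejections_honest += 1
    p_min = min(stats.ttest_ind(f(x), f(y)).pvalue for f in transforms)
    if p_min < alpha:
        rejections_hunting += 1

print(f"type-I error, pre-specified test:     {rejections_honest / n_sim:.3f}")
print(f"type-I error, transformation hunting: {rejections_hunting / n_sim:.3f}")
```

The inflation here is modest because the three tests are highly correlated; the more transformations you let yourself try, the worse it gets.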

Then, loss of power is often not such a big problem because it only really hurts in borderline situations, and also, if you interpret a non-significant outcome carefully (particularly not taking it to say that the null hypothesis is true), low power won’t lead you to a wrong conclusion but rather only means that you may miss a possible conclusion.

Apart from permutation testing, the Wilcoxon rank test is also a good option in such a situation (actually that’s probably what I’d do).
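Both options can be sketched in a few lines; here is a toy example on made-up skewed data, reading “Wilcoxon rank test” as the two-sample rank-sum (Mann–Whitney) version, and assuming a SciPy recent enough to have `permutation_test` (1.8+):

```python
# Two assumption-light alternatives to the two-sample t-test:
# a permutation test on the difference in means, and the
# Wilcoxon rank-sum (Mann-Whitney) test. Data are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=15)  # skewed sample, group 1
y = rng.exponential(scale=2.0, size=15)  # skewed sample, group 2

# Permutation test: null distribution built by reshuffling group labels.
perm = stats.permutation_test(
    (x, y),
    lambda a, b: np.mean(a) - np.mean(b),
    permutation_type="independent",
    n_resamples=9999,
    random_state=rng,
)

# Rank-based test on the same data.
ranksum = stats.mannwhitneyu(x, y)

print(f"permutation p-value: {perm.pvalue:.3f}")
print(f"rank-sum p-value:    {ranksum.pvalue:.3f}")
```

Note the two tests answer slightly different questions (difference in means vs. stochastic ordering), which may matter when distributions differ in shape.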

It’s a big issue to get into in a blog comment, but (with large samples) there are ways to justify most well-known regression estimates, and tests of whether the corresponding parameters are zero, without parametric modeling assumptions. A quick and rather general introduction is here; the same authors also wrote a recent book that’s good.

These justifications are no cure-all, but understanding how and why they work (and how they relate to more model-based justifications) should make it easier to understand which assumptions are actually doing the work – i.e. which assumptions we can basically ignore, depending on the circumstances, and which are critical.

Would be found too wrong if enquired into sufficiently.

I have been thinking about this quote from Thomas Hoskyns Leonard’s A personal history of Bayesian statistics:

“It was used [by Ramsey] to demonstrate that Bayesian probability measures can be falsified, and so met an empirical criterion of Charles S. Peirce, whose work inspired Ramsey.”

At least, this does seem consistent with my understanding of Peirce, who was nonetheless very disparaging of Laplace’s indifference priors.

The implication of this is that although much can be tested, in order to say anything you need to make some kind of i.i.d. (or exchangeability) assumption that cannot be tested, either for the data themselves or at least on some level where it produces a regular pattern of dependence or non-identity, as in regression or standard time series models.

Exactamundo.

Good point about assumptions being testable (or, as I sometimes like to say, grounded in reality). This is one of Deborah Mayo’s big points about the philosophy of statistics: what makes a method “objective” is not that it is conventional (for example, we can’t simply label a Jeffreys prior as “objective” just because it is a standard choice, and we can’t simply label a maximum likelihood analysis as “objective” just because it is the default textbook thing to do) but rather that it is tied (ideally, in multiple ways) to reality, for example by being motivated from some combination of logical argument and historical data, and being checkable in some way (possibly not until the future) by comparison to data.

That is a real challenge and why I don’t think it’s just folks reading carefully.

(And MScs and PhDs in statistics just provide the math and skills to start learning how to apply statistics, or at least this was my impression from watching two batches of bright students doing the MSc at the Oxford Statistics department.)

This quote points to the problem.

“Many of the problems with students learning statistics stem from too many concepts having to be operationalized almost simultaneously. Mastering and interlinking many concepts is too much for most students to cope with. We cannot throw out large numbers of messages in a short timeframe, no matter how critical they are to good statistical practice, and hope to achieve anything but confusion.”

http://www.rss.org.uk/pdf/Wild_Oct._2010.pdf

There are many reasons for taking transformations other than to get “residuals that are approximately Normal”, e.g. getting the additivity and linearity that Andrew referred to, which are about getting commonness* of underlying parameters. Additionally, many features of techniques need to be considered, and arguably getting the coverage of intervals correct is usually focussed on first and taken as being required, not just desired.

To Fisher, his t-test math was just a way to get an approximation to the permutation test in the early 1900s – that’s all – and it’s very good (not perfect) for large sample sizes.
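That approximation is easy to see numerically; a rough sketch on one made-up dataset (assuming SciPy 1.8+ for `permutation_test`):

```python
# Fisher's point in simulation form: at moderate sample sizes the two-sample
# t-test p-value closely tracks the permutation-test p-value computed from
# the same statistic. The data below are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=30)
y = rng.normal(0.5, 1.0, size=30)

p_t = stats.ttest_ind(x, y).pvalue
p_perm = stats.permutation_test(
    (x, y),
    lambda a, b: np.mean(a) - np.mean(b),  # same statistic, label-reshuffled null
    permutation_type="independent",
    n_resamples=9999,
    random_state=rng,
).pvalue

print(f"t-test p-value:      {p_t:.4f}")
print(f"permutation p-value: {p_perm:.4f}")
```

The two p-values typically agree to two decimal places here; the gap widens as samples get small or heavily contaminated.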

I believe you have read history (early papers), conceptual papers by leading statisticians and blogs like this.

* for the brave and adventurous I have written on how to more fully focus on the parameter space here http://statmodeling.stat.columbia.edu/wp-content/uploads/2011/05/plot13.pdf

Additionally, in medical research you have an “unbiased estimate” only when there is absolutely no effect, even internally, given non-compliance, drop-outs, missing data, patients being clearly told the experimental treatment may have no benefit at all, loss of blinding, etc.

One way of putting this is that randomised clinical trials are really good at identifying a treatment effect but often horrible at estimating the size and/or variability of the effect.

The type of objection you raise here is one point I’ve been trying to understand for the last three years, so help me out here. I’ve been doing an MSc in statistics from Sheffield, and everything I’ve learnt so far indicates that what you call my confused statement :) is correct. I understand that normality might be a minor issue compared to other issues. But the hypothesis test relies on it. I have raised this point before in more detail (Aug 5 2013):

http://statmodeling.stat.columbia.edu/2013/08/04/19470/

If my reasoning in that post (sorry for the LaTeX) is wrong, I really would appreciate being told so (and why).

Responding to Corey, I had linear mixed models in mind.

(But perhaps Shravan has something else in mind…)

I think this is the assumption that most often causes problems. You have an “unbiased estimate”, but of what? What is the population? In most cases the population is really only the set of people/animals/cells you performed the study on.

You also write that “Estimating the variance of parameters… depend[s] on distributional assumptions being satisfied”, but unless we’re being Bayesian we don’t estimate variances of parameters; we estimate variances of estimators.

Really do think that depends on the statistician.

Also I don’t think most people can read stuff properly unless they almost already understand what is being written about.

I would phrase it a little differently:

Theory about the world ⇒ assumptions behind measurement and structural model + data = inference.

I would focus more on theory rather than assumptions. The latter ought to be the restrictions imposed by the former. (But granted plenty of assumptions are made out of convenience).

Still, if somebody makes “no model assumptions matter at all” out of my initial comment, they don’t bother to read stuff properly and shouldn’t complain if they get things wrong.

Regarding the data you linked, you’re right: this plot shouldn’t make people feel comfortable about a t-test, though Andrew or others would need to comment on whether using Gelman and Hill as a reference for this also means that they didn’t read properly, which I suspect.

“What exactly happens and how much it is a problem if model assumptions are violated (which they pretty much always are) depends on what we’re doing and on how exactly the assumptions are violated (which is sometimes hard to find out)”

needs to be fleshed out, not with a laundry list of do-this-don’t-do-that (“recommendations”), but with concrete examples that lead to real understanding of the issues.

Even Christian’s first point about outliers, which seems pretty obvious, needs to be stressed in books (I haven’t seen much of that). See, for example, what happens in figure 1 of a published dataset I re-analyzed:

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0077006

Would you agree that doing a paired t-test on the top left data might be a problem? Yet that’s exactly what was done, with p < 0.05; without that result the paper I got from that data would be unpublishable, because it was the basis for the whole paper. I have lots of data sets like these, so it’s a pretty common situation. When I talk to people about such problems in my field, they tell me that they’re just following Gelman and Hill 2007.
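A sketch of the general phenomenon at issue, with invented numbers rather than the published data: a single wild pair can swamp a paired t-test while a rank-based check barely moves.

```python
# Illustration of the outlier problem for the paired t-test: twenty paired
# differences with a modest real effect, then the same data plus one
# extreme pair. All values are made up for the example.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
diffs = rng.normal(0.8, 1.0, size=20)  # paired differences, genuine effect
diffs_out = np.append(diffs, -15.0)    # same data plus one wild pair

p_clean = stats.ttest_1samp(diffs, 0.0).pvalue
p_out = stats.ttest_1samp(diffs_out, 0.0).pvalue
p_rank = stats.wilcoxon(diffs_out).pvalue  # signed-rank test, outlier included

print(f"paired t, clean data:       {p_clean:.4f}")
print(f"paired t, with outlier:     {p_out:.4f}")
print(f"signed-rank, with outlier:  {p_rank:.4f}")
```

The single outlier inflates the estimated standard deviation so much that the t statistic collapses, while the signed-rank test only sees one more extreme rank.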

I don’t consider myself an expert by any means, and I’m happy to be corrected on all this. It’s great to be able to talk to real and experienced statisticians on this blog.

Assumptions are the levers that allow us to move the world.

So long as you stop to check once in a while whether you are using the right levers…

Well, if people can’t read, they can misinterpret whatever anyone writes…

We should not get too hung up on arbitrary violations of assumptions, because there is always something, but quite a bit of this is actually harmless. Some isn’t, though.

People tend to have a wrong understanding of the meaning of a model assumption. If we do something that is optimal under normality, it doesn’t mean we are not allowed to use it if normality isn’t fulfilled. What exactly happens and how much it is a problem if model assumptions are violated (which they pretty much always are) depends on what we’re doing and on how exactly the assumptions are violated (which is sometimes hard to find out).