Yes, well put on the “gut feelings” thing. The prior is part of the model and should not be based on “gut feelings” any more than the likelihood is.

]]>Agreed. Leek’s non-technical description of Bayes’ rule bugged me: “Final opinion on headline = (initial gut feeling) * (study support for headline)”. I think I’d have written something like “educated guess” rather than “gut feeling”. Not all gut feelings are of equal merit. One can have gut feelings about technical matters where they have some experience and are reasonably well informed. (I’d call that an educated guess.) Conversely, one can have gut feelings re matters where they are thoroughly uninformed. (See, for example, the global warming discussion in Andrew’s previous post. Wow.) Is it appropriate to give those gut feelings equal weight?

With that in mind, we talk about uninformative priors. Does anyone refer to “anti-informative” priors, i.e. priors which assign non-zero weight to beliefs that are demonstrably false?
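To make the quoted one-liner concrete: the "final opinion = prior * study support" description corresponds to Bayes' rule in odds form, posterior odds = prior odds * likelihood ratio. Here is a minimal sketch of that, assuming a made-up likelihood ratio of 5 for the study and two made-up priors (the function name and all numbers are illustrative, not from Leek's article).

```python
def posterior_probability(prior_prob, likelihood_ratio):
    """Update a prior probability with a likelihood ratio (Bayes' rule in odds form)."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# Same hypothetical study support (likelihood ratio 5), two different priors.
educated_guess = posterior_probability(0.5, 5.0)     # prior odds 1:1
uninformed_hunch = posterior_probability(0.01, 5.0)  # prior odds 1:99

print(f"prior 0.50 -> posterior {educated_guess:.3f}")
print(f"prior 0.01 -> posterior {uninformed_hunch:.3f}")
```

The same evidence leaves the two readers in very different places, which is why the quality of the prior matters as much as the quality of the likelihood.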

]]>Street-Fighting Mathematics sounds like an awesome book. Thank you for the link!

]]>I’m going to plug an educational article by one of the mildly frequent commenters here, David Hogg: http://arxiv.org/abs/physics/0412107

as well as a decent book by Mahajan (Hogg’s coauthor) http://mitpress.mit.edu/sites/default/files/titles/content/9780262514293_Creative_Commons_Edition.pdf

]]>If you’d like to discuss the use of dimensionless analysis in particular social science contexts I would be willing to use your “quality of life” type examples.

One thing to note, though: there is no such unit symmetry in 1-10 scales for social “quality”. Unlike measurements in inches vs meters, the results of the measurement WILL depend on the choice of scale and how it’s interpreted by the questioner. So in that type of question, the symmetry analysis may be less compelling. However, if you are interested in something like dependence on age, on years since some life event, or on how tall a person is, those things DO still have the unit-independence symmetry.

In any case, I think it’s very useful to think about the variation in “moral attitudes” about sex on a scale that is determined by some naturally near-constant thing like, say, honesty. The variability in honesty, or the average of honesty and legality or whatever, provides a scale for “typical” variations. A regression in which the variation among countries on these factors is constrained to be 1 by rescaling will allow you to determine via a *useful* yardstick whether, say, sex variation is practically large (> 1), typical (~ 1), or small (< 1).

This is the essence of rescaling, to create a yardstick that measures in units that are relevant to the problem at hand.
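A minimal sketch of that yardstick idea, assuming made-up country-level mean scores on a 1-10 scale (the numbers and variable names are hypothetical, purely for illustration). The between-country spread of the reference item is used as the unit, so the spread of the other item reads directly as "how many typical variations".

```python
from statistics import stdev

# Hypothetical country-level mean scores on a 1-10 scale (made-up numbers).
honesty = [7.9, 8.1, 8.0, 8.2, 7.8]
sex_attitudes = [3.0, 6.5, 5.0, 8.0, 4.5]

# Yardstick: between-country standard deviation of the reference item.
yardstick = stdev(honesty)

honesty_scaled = [h / yardstick for h in honesty]
sex_scaled = [s / yardstick for s in sex_attitudes]

reference_variation = stdev(honesty_scaled)  # 1 by construction
relative_variation = stdev(sex_scaled)       # > 1 means larger than "typical"

print(f"honesty variation in yardstick units: {reference_variation:.2f}")
print(f"sex-attitude variation in yardstick units: {relative_variation:.2f}")
```

Note that this keeps the information that one item varies much more than the other, which standardizing each Y separately would throw away.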

]]>I’d be very interested to read that! While I do think that I can understand the reasoning for models in physics (and I remember some discussion about it in a “mathematical modeling” course I took), I wonder if and how far it might be a useful idea for Social Sciences, too. There’s always the question of what actually constitutes a strong effect. Let’s say Y is a subjective evaluation of one’s own “satisfaction with life” on a scale from 1 to 10. This is *the* primary dependent variable in the “Quality of Life”-Research, on which dozens if not hundreds of regression analyses with all kinds of more or less well reasoned models have been done. The question is also part of most of the Social Surveys I know.

Sometimes there is some kind of intuition about what constitutes a strong or medium effect, but mostly people are just discussing signs and significance. If you’re lucky, articles will mention when an effect is statistically significant but practically irrelevant. If you’re really lucky, effect sizes will be discussed relative to some meaningful differences in the independent variables, and so there is some – intuitive – idea of relative effect sizes between different independent variables in a model.

In a project I’m working on currently the dependent variables are different kinds of “moral attitudes”, also measured on a 1-10 scale, and I’m really struggling to think about what constitutes “practically relevant” effects, especially as other research as far as I know just ignores this question! But I also think that this might constitute a problem for dimensionless forms, too. For example, moral attitudes toward legality and honesty do not vary a lot between persons (or countries) but moral attitudes toward sexual issues vary much more. I do think that this is a quite meaningful difference and I don’t want to lose that information if I model both variables in separate regressions, so standardizing Y does *not* seem to be a good idea to me, but maybe other kinds of dimensional reduction (?) might still be a good idea.

Anyway, dimensionless forms seem to be a possible way to help with this quite unsatisfying state of affairs, but I fear there won’t be many scholars who would want to try to understand this. Quality of Life research is usually done by the more method-savvy crowd and even there I don’t think many would be interested. Reading how many blatantly wrong uses and interpretations of models are published in Sociology makes me skeptical that more advanced and more formal-mathematical approaches will practically help the Social Sciences. I’m still very interested if you should write something about the dimensionless forms in your blog!

]]>Finally, because everything you find out about the world needs to be true independent of the units you use, there is a symmetry which can be exploited to reduce the dimensionality of many problems. The classic example is drag coefficients. If you think of the drag on an object as a function of, say, its shape, its velocity, the density of the fluid, the viscosity of the fluid, etc., then it seems like the drag is a fairly complicated function. But in truth, by exploiting symmetry we find that the drag, once made dimensionless as a drag coefficient, is a function of the shape of the object and the Reynolds number (a dimensionless group). So for a given object, we only need to collect data over a range of Reynolds numbers, a SINGLE parameter.
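A minimal sketch of the unit symmetry, using the standard definition Re = density * velocity * length / viscosity. The inputs are round illustrative values for water (about 1000 kg/m^3 and 1e-3 Pa·s) and a hypothetical 10 cm object moving at 2 m/s; the function name is mine. The same physical situation is written once in SI units and once in CGS units, and the dimensionless group comes out the same.

```python
def reynolds_number(density, velocity, length, viscosity):
    """Reynolds number; all inputs must share one consistent unit system."""
    return density * velocity * length / viscosity

# SI units: kg/m^3, m/s, m, Pa*s (round illustrative values for water).
re_si = reynolds_number(density=1000.0, velocity=2.0, length=0.1, viscosity=1e-3)

# The same situation in CGS units: g/cm^3, cm/s, cm, poise.
re_cgs = reynolds_number(density=1.0, velocity=200.0, length=10.0, viscosity=1e-2)

print(f"Re in SI units:  {re_si:.3g}")
print(f"Re in CGS units: {re_cgs:.3g}")
```

Four dimensional inputs collapse into one number that does not care which unit system you picked, which is the whole point.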

This same kind of thing IS exploitable in statistical regression, but as mentioned above I don’t have a great example offhand. I should probably think up one and stick it on my blog. If I do, I’ll try to get Andrew to link to it.

]]>Hoping to see you down here in SoCal soon. We can talk models, dimensionless analysis, dynamics, and so forth in person.

]]>The dimensionless form is an important point, thanks for the detailing.

*For one, kids in western cities nowadays have more limited motor abilities, which has a plethora of physical consequences (and cognitive ones too, since feedback loops are being broken): digestive, hormonal and metabolic, muscular, sensory…

]]>I’ll try to stick to stats, and this post also suggested some developments to me.

]]>I think timeseries are probably a good place to look as well. I’ve seen models in which what looks like uncertainty in a coefficient can be re-interpreted by defining the coefficient to be 1, and considering uncertainty in the duration of the experiment in dimensionless time. That can be a very useful way of looking at things.

]]>There exist some unknown units in which we could measure Foo such that the average value of Foo is 1, and the effects of the rescaled Bar and Baz are 1.2 +- .1 and 2.4 +- .2, so Baz has the larger effect… which maybe is all you wanted to know.

This whole scheme also really helps when assigning priors. It’s a lot harder to assign a prior to, say, the absolute mass of some toxin in the diet than it is to, say, the relative amount of that toxin compared to some other known non-toxic trace element, for example the quantity of lead in the diet as a fraction of the quantity of calcium. I have no idea how many milligrams of calcium there are in a typical diet, but I am pretty damn sure that almost every single person on the face of the earth consumes less than 1/10 as much lead as calcium.

]]>Y/C = 1 + (Beta/C) * X + noise_2

It may seem that this is meaninglessly the same. But consider that X has some units as well. Over what range does X vary? Suppose that there’s some scale X_0 so that X/X_0 is O(1) for almost all achievable values of X. Rewrite the equation:

Y/C = 1 + (Beta*X_0/C) * (X/X_0) + noise_2.

Now Beta * X_0/C must be dimensionless and is called a “dimensionless group” in most applications in physics (examples include the Reynolds number, the Mach number, the Péclet number, etc.).

The new noise is dimensionless as well, but its scale can be interpreted in terms of “as a fraction of C”. Note that we’ve intentionally chosen X_0 such that X/X_0 is “about 1”, so the size of Beta*X_0/C is determining how important X is in the overall outcome.

Your model now reads “Scaled Y is about 1 plus a correction of approximate size Beta*X_0/C plus some noise of size S (where S is sigma(noise)/C)”.
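A minimal sketch of this rescaling, assuming made-up values for C, Beta, and X_0 and a tiny made-up data set (none of these numbers come from a real problem; `ols` is a hand-rolled helper, not a library call). It fits the same line in raw units and in dimensionless units and checks that the dimensionless slope is exactly the raw slope times X_0/C.

```python
def ols(x, y):
    """Least-squares (intercept, slope) for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical scales and data (made-up numbers).
C = 50.0     # typical size of Y, in Y's units
beta = 0.2   # Y units per X unit
X0 = 100.0   # typical size of X, in X's units
x = [20.0, 60.0, 100.0, 140.0, 180.0]
noise = [1.0, -2.0, 0.5, 1.5, -1.0]
y = [C + beta * xi + e for xi, e in zip(x, noise)]

a_raw, b_raw = ols(x, y)
a_dimless, b_dimless = ols([xi / X0 for xi in x], [yi / C for yi in y])

print(f"raw fit:           intercept {a_raw:.3f}, slope {b_raw:.5f}")
print(f"dimensionless fit: intercept {a_dimless:.4f}, slope {b_dimless:.4f}")
print(f"raw slope * X0 / C = {b_raw * X0 / C:.4f}")
```

With these numbers the dimensionless slope comes out near 0.4, so you can read off directly that X moves Y by a bit less than half of Y's typical size over X's typical range.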

Consider case 2. This is the same model as above, but you’ve already made it dimensionless. In effect, you’ve done:

Y/S = C/S + Beta * T + noise(sigma == 1) + noise2(sigma == ??) T

where T is already O(1) because it’s either 0 or 1, Beta is already dimensionless, and S is the standard deviation of the control group. So when T=0 the noise has a stddev DEFINED to be 1, instead of the constant offset being defined to be 1 as in the first model. If the treatment group is noisier, then the size of sigma(noise2) automatically tells you how much noisier it is than control, by definition.
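A minimal sketch of case 2, assuming made-up test scores for five control and five treated students (all numbers hypothetical). Everything is divided by the control-group standard deviation S, so the control noise has SD 1 by construction. To back out sigma(noise2) I additionally assume the two noise terms are independent so that variances add; that assumption is mine, not part of the model as stated.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical raw test scores (made-up numbers, not from any study).
control = [48.0, 52.0, 50.0, 46.0, 54.0]
treated = [50.0, 60.0, 55.0, 47.0, 63.0]

S = stdev(control)  # yardstick: control-group standard deviation

control_scaled = [score / S for score in control]
treated_scaled = [score / S for score in treated]

# With a 0/1 dummy, the OLS coefficient on T is the difference in group means.
beta = mean(treated_scaled) - mean(control_scaled)

control_sd = stdev(control_scaled)  # 1 by construction
treated_sd = stdev(treated_scaled)  # treated spread, in control SDs

# Assumption: noise and noise2 are independent, so their variances add.
sigma_noise2 = sqrt(max(treated_sd ** 2 - control_sd ** 2, 0.0))

print(f"treatment effect: {beta:.2f} control SDs")
print(f"treated-group SD: {treated_sd:.2f} control SDs")
print(f"implied sigma(noise2): {sigma_noise2:.2f}")
```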

Playing around with the scales of the various variables can greatly improve the theoretical interpretability of a model. When things are scaled to be “about the size 1” or to vary over “a range of about 1” you can immediately see by the size of the coefficients whether the effects are small, medium, or large. Any coefficient whose size is much smaller than 1 is “small”, anything about 1 is “medium” (as big as other important things you’ve already scaled to be 1) and anything much larger than 1 is probably surprising.

In certain cases where things like the scales for X_0 or sigma are already approximately known, just doing this analysis can immediately tell you a lot about what you should expect from the data before you even see data.

]]>Oddly enough, we were just talking about this problem in our research meeting a few hours ago, in the context of setting up prior distributions for regression models to estimate population cells to use in MRP. I think this is a wide-open area and I anticipate that the first few papers that demonstrate a way to do this well should and will be highly influential.

]]>I feel like the statistical learning people and the risk perception people have probably thought about this issue as well (how to elicit and/or interpret probabilistic knowledge from people).

]]>I think I agree with you, but I want to make sure. So two quick examples, and maybe you can re-configure them to match your idea of a dimensionless regression with coefficients that can be interpreted as “relative to some group”.

1 – Consider a covariate X that is distributed N(0, sigma), and Y = constant + Beta*X + noise, for some Beta that isn’t 0. Does the intercept here, by de-meaning the data, pass your test? Or what would you have to do?

2 – Suppose a randomized controlled trial with binary treatment T and outcome test score Y measured in units of standard deviations of the control group. We regress Y on a constant and a treatment dummy. We interpret our coefficient on T as the marginal effect of treatment – which is to say, the treatment effect is how many SDs higher the mean outcome score is in the treatment group _relative_ to the control group.

In my mind, all of these regression coefficients are “relative” to some group and that group is clearly defined – either those with X=0 or those with T=0. I am sure we could be more clear about this in our writing, but I think there is something substantive I’m missing in your idea. Or is it just about getting the units right – by choosing X dist N(0,sigma) (in the first example) and using the control group distribution of scores as a standardization (the second example) I am tacitly embracing your position?

]]>Think about probably one of the most rigorously checklist-driven activities in the world — piloting a commercial passenger aircraft. What were the selection pressures leading to obsessive checklisting? What’s the historical trend in air travel fatalities due to pilot error? Any likely connection there? ;)

One thing I learned from “Thinking Fast & Slow” is that humans often have trouble estimating probabilities. My teenage son got in the habit of spouting off subjective probabilities about this & that, sometimes spectacularly & provably off-the-mark. We decided to start calibrating our judgments by estimating & then measuring our success rate at making trash-basket throws with a mini ball, from a relatively close distance. Simple exercises like this can help put cognitive skills into humbling perspective!
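For anyone who wants to try the same exercise, here is a minimal sketch of how the bookkeeping could go. The throw log is made up, and the two helper functions are my own names. It groups attempts by the probability stated beforehand, compares that with the observed hit rate, and also reports the Brier score (mean squared gap between stated probability and outcome; 0 is perfect, and always saying 50% scores 0.25).

```python
from collections import defaultdict

# Hypothetical log: (stated probability of making the throw, 1 if it went in else 0).
throws = [
    (0.9, 1), (0.9, 0), (0.9, 1), (0.9, 0), (0.9, 1),
    (0.5, 1), (0.5, 0), (0.5, 0), (0.5, 1),
]

def calibration_table(records):
    """Map each stated probability to (number of attempts, observed success rate)."""
    outcomes_by_claim = defaultdict(list)
    for stated, outcome in records:
        outcomes_by_claim[stated].append(outcome)
    return {
        stated: (len(outcomes), sum(outcomes) / len(outcomes))
        for stated, outcomes in sorted(outcomes_by_claim.items())
    }

def brier_score(records):
    """Mean squared gap between stated probability and outcome (0 is perfect)."""
    return sum((stated - outcome) ** 2 for stated, outcome in records) / len(records)

table = calibration_table(throws)
score = brier_score(throws)

for stated, (attempts, hit_rate) in table.items():
    print(f"said {stated:.0%}, made {hit_rate:.0%} of {attempts} throws")
print(f"Brier score: {score:.3f}")
```

In this made-up log the "90%" throws only went in 60% of the time, which is exactly the kind of overconfidence the exercise is meant to expose.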

]]>I wrote the post a while ago and did not see those other comments. We’re on a 1 or 2 month delay here! Also, just in case it wasn’t clear in my post above, I like the idea of assigning numerical priors, as it’s a way of forcing us to come face to face with our assumptions.

]]>If you don’t know what I’m referring to, this might be a suitable starting point: http://statmodeling.stat.columbia.edu/2013/02/07/philosophy-and-the-practice-of-bayesian-statistics-with-discussion/

]]>https://ksj.mit.edu/tracker/2014/03/nate-silvers-new-fivethirtyeight-dishes

http://www.statschat.org.nz/2014/03/18/your-gut-instinct-needs-a-balanced-diet/

https://twitter.com/hildabast/status/445699741830365184

You are right that this was a dramatically simplified version of the approach for a general audience with space constraints. That being said, I think I learned the hard way the problem of subjective priors :-).

]]>Wagenmakers et al. came out and said it when they replied to Bem (2011), giving a prior of .00000000000000000001 (give or take a zero). But of course, by the time you get to those kinds of numbers, diplomatic relations are badly broken (although they may at least have been replaced by a form of mutual understanding, cf. the US and the Soviet Union) and nobody’s really trying.

]]>All this is to fully agree with Andrew, and to once again emphasize something I’ve been saying for a long time. Every model, including statistical regression models etc, should be specified in dimensionless form, where every quantity is actually a ratio of the actual quantity to a carefully chosen reference level.

Sometimes just the act of deciding on the reference levels is the most important part of the analysis. Also, in doing this you can eliminate extraneous coefficients and simplify your models.

]]>What you say is part of the story. But the other part is that “increase your risk of cancer” is not defined quantitatively. I can only assume that “increase” would only count if it is larger than some threshold (for example, an increase of 10^-6 in lifetime cancer risk would not count), but that threshold has not been defined. By saying this, I’m not trying to be a pain in the ass; I just don’t see how it makes sense to assign a probability number to an event that is so vaguely defined.

]]>Are you saying all priors ought to be internally consistent with one another? I see the beauty of that but in practice we humans work with a very coarse scale, in increments of 0.1 say. This includes a lot of error.

For example 1/10 could be anything from 0 to .15 say. Accordingly the prior on checklists could be many times more likely than the one on Facebook.

Put differently, the ideal ratio is included in the coarsened set. This ought to provide _some_ solace.

]]>1. Regarding your first paragraph: No, I was not writing about what I “wish” but rather what I think is true. The point of my post is that if you want to be quantitative about prior probabilities, you should be quantitative about what you are assigning probabilities to. If you want to say Pr(A) = 1/4, then it’s a good idea to give the event “A” a quantitative definition.

2. Regarding your last paragraph: Yes, as new information comes in, the probabilities will change. That’s how it’s supposed to go!

]]>I believe that we need to use log scales to express our expectations. The 1-100% scale used in the GJP simply does not have enough dynamic range. There is almost no case where a prediction of between 20% and 80% is a useful expression of the prior.

Instead, if it isn’t simply a coin toss, then one starts to express priors as 10%, 1%, 0.1%, etc. Perhaps a great predictor can use base 5 instead of 10.
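One way to put numbers on the dynamic-range point is to look at probabilities on a log-odds scale. This is my own choice of log scale (log base 10 of the odds), swapped in as an illustration, and not necessarily the scale meant above. A minimal sketch:

```python
import math

def log10_odds(p):
    """Log base 10 of the odds p / (1 - p); 0 corresponds to a coin toss."""
    return math.log10(p / (1.0 - p))

# Each step of 1 on this scale is a factor of 10 in the odds.
for p in (0.001, 0.01, 0.1, 0.2, 0.5, 0.8, 0.9, 0.99, 0.999):
    print(f"p = {p:<5} -> log10 odds = {log10_odds(p):+.2f}")

# Width of the whole 20%-80% band on this scale.
mid_span = log10_odds(0.8) - log10_odds(0.2)
print(f"20% to 80% spans {mid_span:.2f} units")
```

On this scale the entire 20% to 80% band is only a little over one unit wide, while 0.1% sits about three units away from a coin toss.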

As an aside – checklists are more valuable than almost anyone thinks because people are highly distractible. Without a doubt there is some saturation density at which checklists stop helping because nothing else gets done but the checklists. Avoiding that extreme is not a problem for hospitals at this point. If someone went into a situation that had measurable medical outcomes at stake and introduced a checklist where there had not been one, my prior would be 90% certainty that the checklist would improve outcomes over the course of a month or year. If you want to talk about this more, the person to read is Paul Levy at Not Running a Hospital.

]]>I guess it’s easy to have a similar attitude towards things like http://science-beta.slashdot.org/story/14/04/28/1823207/male-scent-molecules-may-be-compromising-biomedical-research

“OMG, we can’t allow this to be true!” (regardless of actual study quality).

Anyway, what if the FB association with cancer risk merely meant bringing the cancer forward in an otherwise “predestined” subject? When would you call it significant? Days? Weeks? Months? (the “otherwise” is indeed tricky to specify, but let’s skip it for now…).

Only after that choice should one ponder how high the risk is for each one of us “someday”, and draw conclusions. I bet the result would then differ.

The impact of checklist use is measured relative to not using a checklist- simple. X or not X.

Whereas Facebook use is compared to…what? Twitter use? Gaming? Playing outdoors? A weighted average basket of time spent by people doing other stuff.

So the issue is that “Not X” isn’t defined in the Facebook case. That right?

]]>