More seriously, there’s more than one definition of weighted.

https://www.google.com/search?ie=UTF-8&source=android-browser&q=weighted

]]>What if it’s a very thick coin?

]]>Regarding the 5% swing, of course an actual swing of that magnitude is major, but how are you going to determine it? By sampling, of course, and my comments were based on the Pew change in PID mentioned in your paper.

Regarding Rivers, and academic political science research more generally, the main problem from a practitioner’s point of view is that it can’t be trusted, and that is clear both in the heterogeneity paper and in the swing voter myth paper (I won’t believe non-response is the driving force behind most poll changes until results like the 2000 polling can be explained by it). No one fakes the data (at least as far as I can tell), but the analysis is typically wrong or incomplete, and in a manner that benefits the author. This is apparently necessary to get the paper published. In heterogeneity, would it have been published if a statistically valid analysis had been done on Table 2? In swing (and I don’t know whether it has been published), would a title such as “Evidence of Non-response biased by partisan affiliation after seminal campaign events” get any attention (or be published)? I think not, yet that is how I read the results in your paper. The phenomenon I am describing is analogous to the .05 significance problem, and it hurts the science.

With regard to the applicability of your paper to pollsters: in states where you can obtain a registered voter list, a very simple way to keep response unbiased across partisan/demographic categories is to sort the registered voter list by various criteria (partisan ID, sex, age, ethnicity (through surname matching)) and then create clusters of, say, 50 names for 400, 800, or however many n you want to contact. This gives a rough equivalence to the actual composition of the district/state, and you start with number one in the cluster and call until you get a respondent. There might be increased non-response in some of these clusters after a campaign event (I’ve never looked), but even if you have to call 4 names in a Democratic cluster (say) as opposed to 3 in a Republican cluster (presume this is after a Republican convention), it doesn’t matter (unless you want to claim there is a vote-choice bias in the Democrats who do respond as opposed to those who have “slumped”; I’ve never seen this either). Anyway, this is why non-response isn’t typically a problem in states where you can get registered voters with party identification (obviously, you have to pay to get the phone numbers matched, since they typically won’t be on the registered voter file).
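To make the procedure concrete, here is a rough Python sketch of one way to read it. The comment does not say exactly how the clusters are cut, so the details are assumptions: the sorted file is split into n contiguous blocks, up to 50 names are drawn at random from each block, and each cluster is dialed in order until someone responds. The field names, the synthetic voter file, and the response rates are all made up for illustration.

```python
import random

# Hypothetical field names; a real voter file will use its own.
STRATA = ("party", "sex", "age_band", "ethnicity")

def build_clusters(voters, n_clusters, cluster_size=50, seed=0):
    """Sort on the stratifying fields, cut the file into n contiguous
    blocks, and draw one calling cluster from each block."""
    rng = random.Random(seed)
    ordered = sorted(voters, key=lambda v: tuple(v[f] for f in STRATA))
    clusters = []
    for i in range(n_clusters):
        start = i * len(ordered) // n_clusters
        end = (i + 1) * len(ordered) // n_clusters
        block = ordered[start:end]
        clusters.append(rng.sample(block, min(cluster_size, len(block))))
    return clusters

def call_clusters(clusters, responds):
    """Dial each cluster in order and keep the first person who responds."""
    completes, dials = [], []
    for cluster in clusters:
        for attempt, voter in enumerate(cluster, start=1):
            if responds(voter):
                completes.append(voter)
                dials.append(attempt)
                break
    return completes, dials

# Synthetic file: 60% Democrats, 40% Republicans (made-up numbers).
rng = random.Random(1)
voters = [{"party": "D" if i < 24000 else "R",
           "sex": rng.choice("FM"),
           "age_band": rng.choice(["18-34", "35-64", "65+"]),
           "ethnicity": rng.choice(["A", "B"])}
          for i in range(40000)]

# Made-up response rates: Democrats are harder to reach than Republicans.
rate = {"D": 0.25, "R": 0.33}
completes, dials = call_clusters(build_clusters(voters, 400),
                                 lambda v: rng.random() < rate[v["party"]])
dem_share = sum(v["party"] == "D" for v in completes) / len(completes)
print(len(completes), round(dem_share, 3), round(sum(dials) / len(dials), 2))
```

In this toy setup the Democratic clusters need more dials on average, but the party mix of the completed interviews still tracks the file, which is the point of the design. It says nothing about whether the Democrats who do answer differ in vote choice from those who don't.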

I suppose my final comment is a plea that you adopt a more stringent scientific approach to political science research so that it is more science than politics. I know it is difficult to get published in poli sci journals and your reference group is your fellow academic political scientists, but you are better than them: you have rigorous scientific training in statistics that they don’t. I suppose it is a public goods problem: it would hurt you but help the field. You’re senior enough that the cost to you is bearable. Think of your legacy.

]]>Most of Gelman’s critique of forking paths etc is critique of people using frequentist ideas and p values to make claims. His point is that the claimed error frequency isn’t anything like 0.05 etc because of the post-data post-hoc nature of the testing.

I’m pretty sure his recommendation to consulting clients etc wouldn’t be to do different frequentist p value based testing, but rather to build a model with some causal content, and use bayesian methods to fit it.

]]>I think you should also let the author know that the Monty Hall problem has nothing to do with Bayesianism vs. frequentism – it isn’t even a statistics question at all!

]]>And another thing. Within a given model the likelihood principle holds and updating is done through Bayes theorem. If the model is tentative, or based on assumptions, then Bayesians are free to change the model whenever they want after reexamining those assumptions. In particular they can change it because the model makes poor predictions.

Doing so doesn’t mean they’re secretly frequentists. They’re still using probability distributions which aren’t frequencies. When they go to check the accuracy of the model they still compute the model’s implications using Bayes theorem and all the rest.

The idea that you can make assumptions, work out their implications, and see if those implications are true isn’t owned by frequentists in any sense whatsoever. People were doing this long before frequentism came along. Bayesians can do this just like everyone else. Bayesians were doing this from day 1 (i.e. Laplace), just like everyone else.

And if some of those assumptions involve distributions which represent uncertainties rather than frequencies, and if they use things like Bayes theorem to get those “implications” in order to compare them to reality, they aren’t secretly frequentists or relying in any way on any part of frequentist ideology.
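One common way to "work out the implications and compare them to reality" inside a Bayesian model is a posterior predictive check. The sketch below is only an illustration and nothing in it comes from the comment: it assumes independent coin flips with a uniform Beta(1, 1) prior on the heads probability, uses the number of head/tail switches as the test quantity, and runs on made-up streaky data.

```python
import random

def count_switches(flips):
    """Number of places where consecutive flips differ."""
    return sum(a != b for a, b in zip(flips, flips[1:]))

def posterior_predictive_pvalue(flips, n_rep=2000, seed=0):
    """Share of replicated datasets with as few switches as the observed data.

    Assumed model: independent flips with a Beta(1, 1) prior on P(heads).
    """
    rng = random.Random(seed)
    heads, n = sum(flips), len(flips)
    observed = count_switches(flips)
    as_extreme = 0
    for _ in range(n_rep):
        # Draw theta from the posterior, then simulate a replicated dataset.
        theta = rng.betavariate(1 + heads, 1 + n - heads)
        rep = [rng.random() < theta for _ in range(n)]
        if count_switches(rep) <= observed:
            as_extreme += 1
    return as_extreme / n_rep

# Made-up data: long runs, so only 3 switches in 40 flips.
streaky = [1] * 10 + [0] * 10 + [1] * 10 + [0] * 10
p = posterior_predictive_pvalue(streaky)
print(p)
```

A value near zero says the independence assumption predicts far more switching than was observed, so the model should be revised. Every step uses probability distributions over an unknown parameter; the check itself is just a comparison of implications with data.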

]]>If you use probability distributions which aren’t frequencies then you’re Bayesian. No ifs, ands, or buts about it.

If you use probability distributions which aren’t frequencies to estimate the speed of light, you can compare your estimate to the actual speed if known, thereby checking the model assumptions.

If you use probability distributions which aren’t frequencies to estimate frequencies, you can compare the estimated frequencies to actual frequencies if known, thereby checking the model.

DOING SO DOESN’T MAKE YOU A FREQUENTIST. YOU’RE STILL USING PROBABILITY DISTRIBUTIONS WHICH AREN’T INTERPRETABLE AS FREQUENCIES.

]]>I’m glad for poor Faye who had to struggle to get this published, and kick out whatever was not deemed P.C., but I don’t see much clarity on the issues here. I find it very odd that Gelman is presented as holding a view that is “the opposite” of frequentism, while he writes a nice article about the relevance, in criticizing the case at hand, of outcomes not observed and paths not taken, merely because they might have been taken. This is the very definition of an error statistician and is wildly at odds with what other Bayesians claim.

We are roundly criticized daily for considering such “could have beens” in reasoning from what has occurred. If Gelman wanted to have a pop article like this have a real impact, he’d focus on the need to critique the reliability of the methodology, taking into account outcomes and paths that might have been followed, thereby allowing leeway that gives grounds to question the actual conclusion. Merely calling what he advocates Bayesian just skirts what has always been the key issue: the need for a method to live up to controlling error probabilities. The mere fact that there’s background knowledge to make you suspect a result (e.g., political allegiances don’t change so fast or whatever) fails to constitute the actual grounds for criticizing the methodology. Why not just say what the real criticism is?

1. When I say a 5% swing in party ID is large, I mean a 5% actual swing in the population. A 5% swing in the sample is no big deal as it can be caused by a combination of actual swing, nonresponse swing, and sampling variability.

2. I was not familiar with that particular paper by Doug. I have to admit I’m not knowledgeable about much of the political science literature.

]]>If you recall the 2000 election, the polls varied dramatically over the course of the campaign (see http://en.wikipedia.org/wiki/Historical_polling_for_U.S._Presidential_elections#United_States_presidential_election.2C_2000). Yet there was no indication of large non-response problems by party from any of the polling organizations (maybe there was and I didn’t see it/it wasn’t reported, but my impression is there wasn’t and I didn’t see any of this in the polling I was doing). This is why I think your findings are probably not true (slump versus changing responses). Also, when you state that a 5% difference is “major”, you are ignoring that the Pew estimates are random and that the 55 to 47 percent difference in two successive samples may very well be within expected sampling error bounds, depending upon the Pew n (which I don’t know). Isn’t ignoring randomness in estimation a type M error?
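Whether an 8-point gap between two successive samples sits inside sampling error depends heavily on n, so a quick calculation helps. The sketch below assumes two independent simple random samples with no design effect, and the sample sizes are hypothetical (the Pew n is not given above).

```python
from math import sqrt

def diff_margin(p1, p2, n1, n2, z=1.96):
    """Approximate 95% margin of error for the difference between two
    independent sample proportions (simple random sampling assumed)."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return z * se

p1, p2 = 0.55, 0.47
gap = 0.08  # observed difference between the two samples
for n in (250, 500, 1000, 1500):  # hypothetical sample sizes
    margin = diff_margin(p1, p2, n, n)
    verdict = "within" if gap <= margin else "outside"
    print(n, round(margin, 3), verdict)
```

Under these assumptions the gap is inside the margin only when each sample is a few hundred respondents. Weighting and clustering in a real poll would widen the margin, so this is a lower bound on the noise rather than a verdict on the Pew numbers.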

Here’s the problem with “Heterogeneity in Models of Electoral Choice”. Rivers tests two competing regressions (discrete choice) of vote choice (MNL and COLOGIT) and wishes to make inferences on the difference of weights of ideology and party between the two models.

He estimates both models and calculates weights and standard errors for each model.

Here is Table 2:

Table 2 (standard errors in parentheses)

            MNL               COLOGIT
Ideology    -0.214 (0.051)    -0.112 (0.774)
Party       -0.313 (0.085)    -0.730 (2.995)

Here is his comparison:

“Table 2 compares the standard multinomial logit estimates of (19) which impose the homogeneity assumption of equal party and ideology weights for each voter with the average COLOGIT estimates. The discrepancies between the MNL and average COLOGIT estimates are striking. The average party weight estimated by the COLOGIT procedure is more than twice as large as the MNL estimate, while the average ideology weight is 50 percent less than the MNL estimate. In neither case is the average COLOGIT estimate within two standard errors of the MNL estimates.”

This sheds an interesting light on how a leading political methodologist performs statistical inference! In actual fact, of course, one cannot treat the COLOGIT estimate as fixed and use the standard error of the MNL estimator to establish statistical difference. Rather, both are random and what one needs to do is calculate the difference of the coefficients over the standard deviation of that difference. Letting the estimate of party be a under the MNL and that of party be b under COLOGIT, the ratio (a – b)/root(var(a) + var(b) – 2 cov(a,b)) is what should be calculated. This can be done by the method of Cox (1961), but we can put some bounds on this ratio by the following. A minimization of the denominator is obtained by noting that cov(a,b) = sd(a)sd(b) when the correlation of a and b is set to one, so the denominator becomes root(sd(a)^2 + sd(b)^2 – 2 sd(a) sd(b)), which is |sd(a) – sd(b)|. Hence the ratio of the difference of the party weights to its standard deviation cannot exceed (-0.313 – (-0.730))/|2.995 – 0.085| = .14 (approximately). So Rivers’s Table 2 actually presents statistical evidence against his hypothesis of unequal weights.
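The bound is easy to check numerically. The short sketch below plugs the Table 2 party estimates into the ratio for a few assumed correlations between the two estimators; the correlation values are illustrative, since the paper's covariance is not reported here.

```python
from math import sqrt

def z_ratio(a, b, sd_a, sd_b, rho):
    """(a - b) divided by the standard deviation of (a - b),
    for an assumed correlation rho between the two estimators."""
    var_diff = sd_a ** 2 + sd_b ** 2 - 2 * rho * sd_a * sd_b
    return (a - b) / sqrt(var_diff)

# Party weights and standard errors from Table 2.
a, sd_a = -0.313, 0.085   # MNL
b, sd_b = -0.730, 2.995   # COLOGIT

for rho in (0.0, 0.5, 1.0):  # illustrative correlations
    print(rho, round(z_ratio(a, b, sd_a, sd_b, rho), 3))
```

The ratio is largest at a correlation of one, where it is about 0.14, and it shrinks as the correlation falls. Either way it is nowhere near the usual threshold of 2.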

]]>Numeric:

I appreciate your taking the time to comment. It’s true that my immediate reaction to these sorts of comments is irritation, but that’s really my problem, not yours. I recognize that not all of my research is convincing to everyone—this is social science, after all—and, given that you do have these issues with our project, I think it’s good for myself and others to see your objections.

I don’t have time to respond to every comment but I will respond to your points.

1. I consider a swing of 5% of party ID during a one-week period to be a major swing, and for various reasons I do not consider it plausible. My colleagues and I have discussed this issue in our 1993 and 2001 papers that addressed party ID and voting. In the Xbox study it is not an issue at all since we have panel data and the party ID question is asked only once. So even if party ID were changing by these large amounts, it would not be an explanation for what we saw in our data.

2. We will do our best work and then worry about whether any pollsters will take it seriously. I’ve talked to various people involved in politics and polling and they do take our findings seriously. Adoption of any new method takes time and won’t be universal; indeed, it was 13 years ago that my colleagues and I published our paper on poststratification by party ID and this is still not standard practice. It takes a while. But I do think we are making progress and, as we continue to improve the methods and find interesting empirical results, I think pollsters and political professionals will move in this direction.

A lot depends on the goals of a polling organization. In the short term, a poll gets headlines by producing more fluctuations: noise = news = publicity. Longer-term, though, I think accuracy has to be the way to go.

3. I actually have no idea what you’re talking about regarding heterogeneity in models of electoral choice. But if you think this is an important topic, you should feel free to write such an article yourself. There’s no need to wait on me to do it.

4. I do criticize political science claims when they come to my attention and when they bother me. Recently here and on the sister blog I criticized the party-id-and-smell paper and I criticized what I considered to be exaggerated claims regarding political effects of subliminal stimuli.

If you’d like to see criticisms of work by my own colleagues, again, you should feel free to write and publish such criticisms yourself. I’m not planning any time soon to write withering criticisms of the work of Jeff Lax, Justin Phillips, Bob Erikson, Bob Shapiro, etc etc., for the simple reason that I think their work is good!

]]>Yes, it would be a major project — but I think it would be worth the effort.

Speaking for myself, I would be enthusiastic about contributing to something like this. And I am pretty sure there are others.

]]>“Regarding the possibility that there are major swings in party ID during a one-week period during the campaign: For many reasons I don’t think this is plausible.”

Major is an incorrect word and not my argument. Your argument is that short-term changes in candidate support following an “event” (a convention, a debate) are caused by some sort of anomie leading to a “slump” (decline in response rate). My argument is that self-reported partisan identification is endogenous to vote choice to some extent. In particular, the Pew self-report figure going from 55 to 48% is probably as much a result of changes in self-reported PID as it is of slumping. This is not major (given the small sample size on surveys such as Pew, a 7% difference for one party is probably just within the bounds of confidence, a concept you don’t like). But it may be systematic, and I believe this endogeneity is what is behind your results.

In particular, this is an empirically testable hypothesis, as you can cross-reference back to the registered voter file for states that report partisan affiliation. You should be able to do this for a number of states from your X-box data, since you must have the names and addresses of these individuals or could relatively easily get them (note this is not doing a state-wide analysis, as you claim–this is substituting an exogenous variable for an endogenous variable–an obvious analogue is instrumental variables). Then it is trivial to match these back to the registered voter file and get the partisan identification at time of registration.

I think the overall point is that there is an obvious counter hypothesis to the “slump” hypothesis, one that is testable, and one that is not mentioned in your paper. I doubt if any pollsters will take it seriously otherwise (if the point is another academic trope, well, be my guest). As far as cynicism goes, though, I’m still waiting for the blog entry on the statistical methods in “Heterogeneity in Models of Electoral Choice.” As another somewhat cynical observation, your targets for statistical ire appear to rarely hit your political science colleagues.

Numeric:

Lots of comments you have here, let me respond very quickly:

1. I don’t know anything about the polls or the election in Scotland. This is not to dismiss your remarks, just to say that I can’t really address them, one way or another.

2. You write, “Xbox sampling, actually, though one wonders how representative this is of the population as a whole.” The Xbox sample is indeed *not* representative of the population; we discuss this in our article.

3. Regarding the possibility that there are major swings in party ID during a one-week period during the campaign: For many reasons I don’t think this is plausible.

4. You suggest a state-level analysis. That could be a good idea. Go for it.

5. In the Xbox panel, party ID was asked only once, when the participant joined the survey. It was not asked repeatedly.

6. I don’t see the conflict of interest of which you speak. The survey was done using Xbox, not YouGov. I guess there’s a conflict of interest in that one of the authors works at Microsoft and we used Microsoft data, but this seems pretty clear; I don’t really see this as any worse than a Columbia University researcher writing a paper using Columbia University data, etc.

7. You can be as cynical as you want. I think you could probably find better targets for your cynicism than Doug and me, but that’s your call.

]]>Hcg:

Sure, but the words “odds,” “likelihood,” “chance,” and “probability” are commonly used interchangeably in colloquial writing so this doesn’t really bug me.

]]>“A Bayesian calculation would start with one-third odds that any given door hides the car, then update that knowledge with the new data: Door No. 2 had a goat. The odds that the contestant guessed right — that the car is behind No. 1 — remain one in three. Thus, the odds that she guessed wrong are two in three”

She is conflating the probability of 1/3 with odds. Odds are p /(1 – p) or 1/3 divided by 2/3 here, so the odds are 1 to 2.
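For anyone who wants to see the arithmetic, here is a small sketch of the update and the probability-to-odds conversion. It assumes the usual Monty Hall rules, which the quoted passage does not spell out: the contestant picked door 1, the host knows where the car is, always opens a goat door other than the contestant's, and picks at random when he has two choices.

```python
from fractions import Fraction

# Prior: the car is equally likely to be behind each door.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# P(host opens door 2 | car location), given the contestant picked door 1.
likelihood = {1: Fraction(1, 2),  # host picks door 2 or 3 at random
              2: Fraction(0),     # host never reveals the car
              3: Fraction(1)}     # door 2 is the only goat door left

evidence = sum(prior[d] * likelihood[d] for d in prior)
posterior = {d: prior[d] * likelihood[d] / evidence for d in prior}

def odds(p):
    """Convert a probability to odds in favor, p / (1 - p)."""
    return p / (1 - p)

print(posterior[1], posterior[3])                # 1/3 2/3
print(odds(posterior[1]), odds(posterior[3]))    # 1/2 2
```

So the probability that the first guess was right stays at 1/3, which corresponds to odds of 1 to 2, and switching has probability 2/3, or odds of 2 to 1.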

]]>Also, I see no conflict of interest disclaimers in this study. As Polimetrix was acquired by YouGov, Rivers in particular probably still owns a chunk of stock in YouGov (in fact, a cynical comment on this paper is that it is an attempt to drum up business for internet sampling). Every reputable scientific journal requires this, and if political science wants to be more scientific than political, this should be included as a matter of course in any article (whether required by the

]]>“anything you can do with Bayesian inference you can do in other ways”

I think Bayesian methods have come into their own as computing power has increased. There are modern approaches that are computationally tractable only with Bayesian methods. E.g. Bayesian Markov Chain Monte Carlo for parameter estimation?

]]>Certainly, my beef is not with Andrew’s recommendations, which make sense to me. And no, of course I don’t do the same thing in reviews. I understand the process you describe for instituting change. But that is a major project compared to the much more immediate goal I was talking about: simply implementing Andrew’s recommendations. I guess what you are saying is that one needs to take bigger steps to get to that point. Maybe.

]]>If you suspect your friend has a biased coin… well, you ought to just insist on the von Neumann protocol. :)
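For anyone who hasn't seen it, the von Neumann trick is to flip the coin twice, keep the first flip if the two differ, and start over if they match. A minimal sketch follows, assuming the flips are independent with a constant bias; the 80% heads coin is made up.

```python
import random

def fair_flip(biased_flip):
    """Von Neumann's trick: flip twice, keep the first result if the two
    flips differ, otherwise discard both and try again."""
    while True:
        first, second = biased_flip(), biased_flip()
        if first != second:
            return first

rng = random.Random(0)

def biased():
    """Hypothetical coin that lands heads (True) 80% of the time."""
    return rng.random() < 0.8

flips = [fair_flip(biased) for _ in range(20000)]
share = sum(flips) / len(flips)
print(round(share, 3))
```

It works because heads-then-tails and tails-then-heads both have probability p(1 - p), whatever p is. The cost is wasted flips: the more biased the coin, the more pairs get thrown away.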

]]>It sounds to me like your beef is with the reviewers in your field(s), not Andrew’s applied statistics recommendations. Ironically, of course, you’re a member of the group you’re complaining about. Do you do the same thing in reviews? Could you edit a special volume of a journal? Or become an area editor and invite some specific papers to address the shortcomings? It’s not even that hard to start a whole new journal; you could talk to Michael Collins and crew about their experience starting TACL (largely started, I believe, to deal with the incredibly slow and picky reviewing in the pre-existing Computational Linguistics journal).

It’s a chicken-and-egg problem. Once papers exist to cite and once journals see them being cited, they’ll want more such papers. And it’s easier to convince the grad students than the tenured faculty. The upside is that they’re the future reviewers whereas the current reviewers are future retirees.

The field of psychology is changing, as witnessed by the success of Kruschke’s book, the number of tutorials on Bayesian models I’ve seen at psych conferences (mainly psycholinguistics; I have a biased selection), and the new book by Lee and Wagenmakers. I don’t remember seeing anything like this 30 years ago in my first romp through stats for social science.

Ironically, academia is very conservative and very slowly paced compared to what I expected going in. Getting tenure is supposed to give professors all the freedom in the world, but the whole enterprise winds up reinforcing very narrow and traditional research. Professors go along with it for tenure, promotion, and grant funding. Students are forced to go along with it under the reasoning that they need to get publications in order to get jobs. I think it may have something to do with the reward structure encouraging people to concentrate on narrow research areas and the age bias toward older professors on editorial boards.

]]>> real change happening and I just don’t see it

Speculated a bit to Fernando about this last week.

> if they plan to make a living as academics

Life is not fair, but it may get fairer in research as preregistration, reproducibility and replication become more prevalent than enhancing one’s reputation with work for which the quality cannot be assessed. (As Don Rubin once said, “smart people do not like being repeatedly wrong”, and some smart people with resource control are starting to realise that believing much of the published research will do exactly that.)

But it will likely be the high (and difficult) road for some time.

Also why I worry about people reading too much into the successes of many who _might_ have taken the low road.

]]>This is very sad and is what the NY Times should be explaining to the lay public.

]]>+1

Bayesian is a great technique but *just not a cure-all* (some pieces make it sound like that).

e.g. In applications like spell check, or spam filters, or expert systems or handwriting / image recognition, or voice recognition, fault diagnosis etc. Bayesian is perfectly awesome.

In certain other areas I’m not so sold on it.

]]>According to the article, you also have to bend the coin in a very obvious way to get a coin that the author was quite confident is unfair. His power is not very big, though.

]]>True. I think I was actually reading the “weighted” more loosely as “something funky is going on with this coin” instead of literal weight changes. Weightedness as a metaphor, so to speak.

]]>Christoph:

Yes, bending. But not weighting, which is what was stated in the news article.

]]>But I have a strong feeling that Andrew’s recommendations for psychology and related areas come from a non-practitioner’s perspective. Editors routinely reject our papers *because* we have replications of key results in the paper (“replications add nothing new”); if we mark a speculative and tentative conclusion as such, the paper is rejected because the result is not convincing. Top journals routinely publish low-power null results as if they are positive findings. Post-hoc explanations are dressed up as predictions made before the experiment was even run.

If one were to really implement Andrew’s ideas, there would be no publication possible because the gatekeepers are not on board. From the perspective of a user of statistics in my research, I’m pretty frustrated that I don’t see any point in following Andrew’s advice, unless I want to self-publish my work on my home page. I could do that; but not my students.

I just wanted to point out how ineffective Andrew’s advice is in changing real practice. I guess if enough people adopted saner statistical practice things would change. But right now I see Andrew’s advice as good to know and good for real understanding, but practically not useful. I’m happy to be corrected; maybe there is real change happening and I just don’t see it; I certainly experience the absence of change in my daily drudgery of revise-and-resubmit actions.

]]>I was criticizing the article. Not you.

]]>Rahul:

Please read carefully! In the post above I clearly state that I did *not* actually say that. My suggested replacement was, “This could well be an even bigger problem with prominent journals.” Weasel words, sure, but that’s cos I honestly have no idea. But I do think it *could* be a bigger problem, especially if “problem” is defined to include impact of the errors.

“The proportion of wrong results published in prominent journals is probably even higher”

I think the crap is all spread out. I’d love to see evidence that top journals are more wrong than others.

]]>Excellent summary I think. I agree.

I just hate it that newspaper stories have to exaggerate so much. They could’ve done without the *“impossible problems made possible now”* bit, I think.

Rahul:

In my opinion, anything you can do with Bayesian inference you can do in other ways. To me, Bayesian inference is a bit like calculus: You can do derivatives and integrals without calculus (indeed, mathematicians in pre-Newtonian times were able to compute limits, with care), but calculus makes it a lot easier. Similarly, I find that Bayesian inference makes it a lot easier to combine information. For example, I’m sure that someone *could* do MRP non-Bayesianly (and indeed there is a non-Bayesian tradition of partial pooling for small-area estimation in sample surveys), but I think it’s no coincidence that the widespread use of MRP has come along with the Bayesian approach.

If you look at my applied research papers, you’ll see a lot of analyses that maybe could’ve been done in non-Bayesian ways but in fact which my colleagues and I did Bayesianly, and which I suspect would never have been solved had we not had Bayesian tools.

There are also a lot of non-Bayesian success stories in statistics, but that’s fine, I don’t think that news article claims otherwise.

Bayesian inference is many things. It’s a set of tools for solving problems, also a framework for understanding statistical methods. Other statistical approaches similarly serve this dual duty, for example classical hypothesis testing is a set of methods and also a framework in which statistical inference is viewed as a set of testing problems. I don’t find that particular framework very helpful—indeed, I think it often gets in the way—but I do recognize that there are many problems for which methods developed in that tradition can be useful. Recall my recent discussion of lasso.

]]>“Now Bayesian statistics are rippling through everything from physics to cancer research, ecology to psychology. Enthusiasts say they are allowing scientists to solve problems that would have been considered impossible just 20 years ago.”

What are some examples of problems considered impossible in 1995 that are solvable today just because science embraced Bayesianism?

]]>