There are a bunch of methods floating around for estimating ideal points of legislators and judges. We’ve done some work on the logistic regression (“3-parameter Rasch”) model, and it might be helpful to see some references to other approaches.

I don’t have any unified theory of these models, and I don’t really have any good reason to prefer any of these models to any others. Just a couple of general comments: (1) Any model that makes probabilistic predictions can be judged on its own terms by comparing to actual data. (2) When a model is multidimensional, the number of dimensions is a modeling choice. (In our paper, we use 1-dimensional models but in any given application we would consider that as just a starting point. More dimensions will explain more of the data, which is a good thing.) I do not consider the number of dimensions to be, in any real sense, a “parameter” to be estimated.
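Point (1) can be sketched in a few lines. This is only an illustration, not code from any of the papers mentioned here: it assumes you already have a fitted probability for each observed vote, and the names `log_score`, `brier_score`, `p_model`, `p_coin`, and `y`, along with the numbers, are made up for the example.

```python
import math

def log_score(p_hat, y):
    """Average log predictive probability of the observed outcomes (higher is better)."""
    eps = 1e-12  # guard against log(0) when a prediction is exactly 0 or 1
    total = 0.0
    for p, obs in zip(p_hat, y):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if obs == 1 else math.log(1.0 - p)
    return total / len(y)

def brier_score(p_hat, y):
    """Mean squared gap between predicted probability and outcome (lower is better)."""
    return sum((p - obs) ** 2 for p, obs in zip(p_hat, y)) / len(y)

# Hypothetical data: five votes (1 = yea, 0 = nay) and two sets of predictions.
y = [1, 0, 1, 1, 0]
p_model = [0.9, 0.2, 0.7, 0.6, 0.1]  # predictions from some fitted model
p_coin = [0.5] * 5                   # baseline that knows nothing

print(log_score(p_model, y), log_score(p_coin, y))
print(brier_score(p_model, y), brier_score(p_coin, y))
```

Any model that outputs probabilities, whatever its internal structure, can be scored this way on held-out votes and compared against a simple baseline.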

Now, on to the models.

Most of us are familiar with the Poole and Rosenthal model for ideal points in roll-call voting. The website has tons of data and some cool dynamic graphics.

For a nice overview of distance-based models, see Simon Jackman’s webpage on ideal-point models. This page has a derivation of the model from first principles along with code for fitting it yourself.

Aleks Jakulin has come up with his own procedure for hierarchical classification of legislators using roll-call votes and has lots of detail and cool pictures on his website. He also discusses the connection of these measures to voting power.

Jan de Leeuw has a paper on ideal point estimation as an example of principal component analysis. The paper is mostly about computation but it has an interesting discussion of some general ideas about how to model this sort of data.

Any other good references on this stuff? Let us know.

Simon Jackman commented:

here is a quick litany of thoughts/responses on your paper with Bafumi & Park & Kaplan.

* if you don't know about it, Doug Rivers has a very nice paper on identification for multidimensional item-response models (with roll call analysis as a special case). he is in NYC with CBS for tonight's election call, so is probably busy (as well you might be too); let me know if you haven't seen the paper.

* your Figure 1 says:

This graph also illustrates the nonidentifiability in the model: the probabilities depend only on the relative positions of the ability and difficulty parameters; thus, a constant could be added to all the alpha_j's and all the beta_k's, and the model would be unchanged. One way to resolve this nonidentifiability is to constrain the alpha_j's to have mean 0.

Not quite. Constraining the ability parameters this way is a first step (buying you invariance to translation) but I think you're not identified until you also do something to rule out scale invariance: i.e., set the ability parameters to have mean zero AND standard deviation one, or set two of the abilities to constants (e.g., Kennedy at -1, Helms at +1).

* Clinton, Jackman and Rivers, which you cite as a 2003 technical report, is now in the APSR. It appeared this past summer, I believe. I think we are clearer about identification in the final version in APSR than in the intermediate working papers, citing Rivers etc.

* one of the things I like about the setup in my work with Doug & Josh is that (1) our statistical model falls out of a fairly standard formal-theoretic approach to roll call voting (the Euclidean spatial voting model with quadratic utilities over outcomes and local/conditional independence); (2) our setup maps directly onto the 2 parameter IRT model from educational testing, about which much is known… In this sense our approach is a little more model-driven than data-driven (i.e., contrast naive MDS or factor analysis or clustering etc). My experience is that anything too data-driven in this field tends to run into trouble within political science because, while it is one thing to toss more elaborate statistical setups at the roll call data, they tend to lack the clear theoretical underpinnings of the Euclidean spatial voting model. Put differently, what is the model of legislative decision-making that underpins any given statistical model?

This point is relevant even to something as seemingly innocuous as hierarchical modeling or robust fitting. What behavioral/political assumptions or processes suggest that we ought to do this when we model the data? The point here is that things that sometimes make good sense or seem attractive from a statistical perspective will often run into a wall of skepticism from Congress people who like to see things built up from ground zero (legislators' utility functions…).

* the educational testing people are very big into nomenclature. I've been reliably advised not to call anything other than the 1 parameter model a Rasch model: i.e., Pr[y_{ij}=1] = F(a_j - x_i). anything more elaborate than that is NOT Rasch, and you need to call them 2-parameter IRT models. By the way, the number of parameters in the nomenclature doesn't include the ubiquitous ability parameter: hence, Rasch is a one-parameter IRT model, the usual roll call setup (with varying ability and discrimination parameters) is a 2-parameter IRT model; a three-parameter IRT model is usually reserved for applications in educational testing that try to deal with guessing; e.g., Pr[y_{ij} = 1] = g_j + (1-g_j)F(a_j - b_j x_i). So, while your model (like our model) has 3 parameters in it, I think you might confuse readers by calling it a 3 parameter logistic regression, when it is a re-parameterized 2 parameter IRT model.

In this light, your footnote 3 is misleading. Political science does not use Rasch models (1 parameter IRT models) to analyze roll call data, at least not the people you cite. Poole and Rosenthal, Clinton, Jackman and Rivers, and Martin and Quinn are all 2 parameter IRT. The 1 parameter IRT model imposes restrictions on bill locations that I think many political scientists would find implausible (although presumably one could test this with data).

* should Bayesians care about identification? Bayesian analysis can proceed with or without the model parameters being identified, since identification is a property of a likelihood. the point is that priors don't really "solve" identification problems, save in the degenerate case of point-mass spike priors (i.e., parameter restrictions by any other means). I've been rapped over the knuckles by Bayesians and non-Bayesians alike for saying things like "priors solve the identification problem…".

* post-processing is something I've been doing more of in my own work. That is, run Gibbs on the unidentified model, and then, with each iteration's output, translate/scale/rotate etc back into the identified parameter space. This is often helpful since WinBUGS grinds away rather slowly sometimes when you impose mean-deviations in the program, but it is a breeze to impose ex post on the WinBUGS output in R.

* reflection is trivial — with scale and location fixed you've got local identification, while the reflection invariance indicates lack of global identification (see the Rivers paper). I think this is not a big problem in one dimension; e.g., I have no problem asserting that Stephens

* your paper talks about "improper variance estimates" in the abstract and conclusion, but I didn't see anything on it in the body of the paper.

cheers

— simon
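The nomenclature in Simon's comment can be laid out side by side in code. The sketch below is illustrative only: it takes F to be the logistic CDF (an assumption; Simon leaves F generic), the function names are made up, and `logistic_3param` assumes the "3-parameter logistic" model has the form F(gamma_k (alpha_j - beta_k)), which this post does not spell out. The last lines check the claim that this form is a reparameterized 2-parameter IRT model, using a = -gamma * beta and b = -gamma.

```python
import math

def F(z):
    """Logistic CDF, used here as the link function F (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-z))

def rasch(a_j, x_i):
    """1-parameter (Rasch) model: Pr[y_ij = 1] = F(a_j - x_i)."""
    return F(a_j - x_i)

def irt_2pl(a_j, b_j, x_i):
    """2-parameter IRT model: Pr[y_ij = 1] = F(a_j - b_j * x_i)."""
    return F(a_j - b_j * x_i)

def irt_3pl(g_j, a_j, b_j, x_i):
    """3-parameter IRT model with a guessing floor g_j."""
    return g_j + (1.0 - g_j) * F(a_j - b_j * x_i)

def logistic_3param(alpha_j, beta_k, gamma_k):
    """Assumed '3-parameter logistic' form: F(gamma_k * (alpha_j - beta_k))."""
    return F(gamma_k * (alpha_j - beta_k))

# Made-up values for one legislator (alpha) and one bill (beta, gamma).
alpha, beta, gamma = 0.8, -0.3, 1.5
p_logistic = logistic_3param(alpha, beta, gamma)
p_2pl = irt_2pl(a_j=-gamma * beta, b_j=-gamma, x_i=alpha)
print(p_logistic, p_2pl)  # the two parameterizations give the same probability
```

Setting b_j = 1 in `irt_2pl` recovers `rasch`, and setting g_j = 0 in `irt_3pl` recovers `irt_2pl`, which is the nesting the nomenclature is meant to convey.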

Andrew commented:

Hi Simon. Thanks for the comments. In response:

1. Nonidentifiabilities. The 3-parameter logistic model we fit has three nonidentifiabilities: the translational invariance that you quoted, the scale invariance you mention right after the paragraph you quoted, and the reflectional invariance you mention later in your comment. In the paragraph you cited, we were only referring to the first of the nonidentifiabilities (hence we wrote, "One way to resolve this nonidentifiability…").

2. I agree with you completely on the desirability of an individual-level "agent" model, which ideally (as in your work) can imply a statistical model. In our paper, we took the statistical model as given and focused on how to fit, display, and check the model. This is not meant to detract from the goal of having an underlying model with a plausible story.

In fact, from the Bayesian point of view, having that good story is the first step toward being able to set up realistic informative prior distributions. (We discuss this, in a completely different context, in our JASA paper, "Physiological pharmacokinetic analysis using population modeling and informative prior distributions" from 1996.)

3. I will change from "3-parameter Rasch model" to "3-parameter logistic model".

4. Yes, identification is relevant in Bayesian inference. When building and improving a model, it's important to know what part of the model the information is coming from. From that perspective, looking at identifiability is a form of sensitivity analysis.

5. At the end of your note you mention the reflection invariance problem. We will have some new ideas (motivated by suggestions of Eric Loken) on this in the revision that should be done by mid-Nov.
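The three invariances in point 1, and the ex post normalization Simon describes, can be sketched together. This is an illustration rather than code from the paper under discussion: it assumes the model form logistic(gamma_k (alpha_j - beta_k)), uses made-up parameter values, and the helper `normalize_draw`, with its rule of fixing the sign by forcing one chosen legislator to the left, is hypothetical.

```python
import math

def prob(alpha, beta, gamma):
    """Assumed model form: Pr(yes) = logistic(gamma * (alpha - beta))."""
    return 1.0 / (1.0 + math.exp(-gamma * (alpha - beta)))

def prob_matrix(alphas, betas, gammas):
    """Vote probabilities for every (legislator, bill) pair."""
    return [[prob(a, b, g) for b, g in zip(betas, gammas)] for a in alphas]

def max_diff(p, q):
    """Largest absolute difference between two probability matrices."""
    return max(abs(x - y) for rp, rq in zip(p, q) for x, y in zip(rp, rq))

def normalize_draw(alphas, betas, gammas, left_index=0):
    """Map one draw into an identified space: mean(alpha) = 0, sd(alpha) = 1,
    and legislator `left_index` forced to a negative position."""
    n = len(alphas)
    m = sum(alphas) / n
    s = math.sqrt(sum((a - m) ** 2 for a in alphas) / n)
    sign = -1.0 if alphas[left_index] - m > 0 else 1.0
    return ([sign * (a - m) / s for a in alphas],
            [sign * (b - m) / s for b in betas],
            [sign * g * s for g in gammas])

# Made-up draw: four legislators, three bills.
alphas = [-1.2, 0.3, 0.9, 2.0]
betas = [0.0, 0.5, -0.7]
gammas = [1.0, -2.0, 0.5]
base = prob_matrix(alphas, betas, gammas)

# Each transformation changes the parameters but not the probabilities.
shifted = prob_matrix([a + 3 for a in alphas], [b + 3 for b in betas], gammas)
scaled = prob_matrix([2 * a for a in alphas], [2 * b for b in betas],
                     [g / 2 for g in gammas])
reflected = prob_matrix([-a for a in alphas], [-b for b in betas],
                        [-g for g in gammas])

# A draw that is shifted, scaled, and reflected all at once...
moved = ([-2 * a + 5 for a in alphas], [-2 * b + 5 for b in betas],
         [-g / 2 for g in gammas])
# ...lands on the same identified point as the original after normalization.
norm_base = normalize_draw(alphas, betas, gammas)
norm_moved = normalize_draw(*moved)
print(max_diff(base, shifted), max_diff(base, scaled), max_diff(base, reflected))
print(norm_base[0], norm_moved[0])
```

In a real analysis the same normalization would be applied to each saved simulation draw in turn, so the sampler itself can run on the unconstrained parameters.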

Aleks commented:

From what I have seen, ideal points still seem to be best suited to the problem of explaining the correlations between voters, even beyond mere interpretability. However, one has more faith in a particular pattern in data if a number of models show it, and this is indeed what has happened.

Perhaps the data mining/information retrieval researchers will nevertheless find it interesting that the models designed for very different kinds of data yield 'useful' results in roll call analysis.

Aleks commented:

I'm not judging the methods either. In fact, ideal points still seem to be best-suited to this kind of data. However, one has more faith in a particular pattern in data if a number of models show it. We have tried to write the paper partly for the data mining crowd, and they probably wouldn't want to get into the congressional decision making models.

Agreed about the number of dimensions. I find it a question of pragmatically choosing the "level of detail". However, it is important to verify that the model isn't overparametrized. I really like the Jackman et al curved error contours on the Flash page.

Your model is interesting: I familiarized myself with item-response, which I didn't know before; I could also compare de Leeuw's approach with your Bayesian one – these two models are quite similar. BTW, an obvious next step in this stream is to try to characterize both the votes and the voters, not just the voters.

As for interactions, one way of dealing with this might be information geometries, an emerging toolkit that was originally thought up by R. A. Fisher, has been pinpointed in computational neuroscience and information theory, and is now slowly entering into machine learning. The main idea is that one should express the parameter space as a topologically well-behaved geometric object. In estimation, one then benefits from being able to move around this manifold effectively. In a Bayesian context, however, one can sample more intelligently by partitioning the manifold. Two relevant papers: http://people.csail.mit.edu/u/j/jrennie/public_ht… archy-01.pdf http://arxiv.org/abs/nlin.AO/0408040 An MCMC approach to modelling a network of overlapping 2-way interactions is by Murray and Ghahramani at http://www.gatsby.ucl.ac.uk/~iam23/pub/04blug_uai… These papers might be obscure from your point of view.