A question about poker

I’m too tired to think about this one, but maybe some of you out there have some ideas.

Chaz Littlejohn writes:

My question has to do with how to model selection bias when the outcome is an unordered categorical variable that is observed only on the basis of an auxiliary selection equation.

I’m working with a dataset of Texas hold’em poker hands and I’m trying to build a model for predicting the probability of a player holding each of 4 hand types pre-flop – high pair (AA-99), small pair (88-22), high cards (AK-JT & Ax), and small cards (everything else) – from a set of X observable game characteristics. It seemed appropriate to model the outcome variable as an unordered logit/probit rather than an ordered one, since different factors likely influence the four different hand probabilities. For example, in some situations a player is likely to be holding a high pair or a complete bluff, while in other situations the ranking may be more straightforward.
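To make the four categories concrete, here is a minimal sketch of how hole cards might be mapped to a hand type. The function name `hand_type` and the card encoding (rank character followed by suit, e.g. "Ah") are hypothetical, and reading "AK-JT" as "two unpaired cards both ten or higher" is an assumption about the email's shorthand, not something it spells out.

```python
RANKS = "23456789TJQKA"  # low to high; index gives rank order


def hand_type(card1, card2):
    """Classify two hole cards (e.g. "Ah", "Td") into one of the four pre-flop types.

    Assumption: "AK-JT" means two unpaired cards that are both ten or higher,
    and "Ax" means any unpaired hand containing an ace.
    """
    r1, r2 = RANKS.index(card1[0]), RANKS.index(card2[0])
    if r1 == r2:
        # Pairs: 99 and above are "high", 88 and below are "small"
        return "high pair" if r1 >= RANKS.index("9") else "small pair"
    if "A" in (card1[0], card2[0]) or min(r1, r2) >= RANKS.index("T"):
        return "high cards"
    return "small cards"


print(hand_type("9h", "9d"))
print(hand_type("Ah", "3d"))
print(hand_type("Kh", "9d"))
```

Suits are ignored here because the categories in the email do not distinguish suited from unsuited hands.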

The second factor in my model decision is that we only observe hands that are shown at the showdown. Thus, if the error term of the ‘showdown’ selection equation is correlated with the error term of the ‘hand type’ categorical equation then estimation using only the complete data would yield biased coefficients. This is similar to what you’d see from trying to determine the wage offers made to individuals if you only observed wage rates for the individuals who worked. The ‘Heckit’ method of first estimating a probability of observing the outcome and then incorporating a transformation of these predicted probabilities as an additional explanatory variable can be used to correct for this bias.
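The selection problem and the Heckit-style correction can be illustrated with a small simulation. This is only a sketch under strong assumptions: the outcome is continuous (as in the wage example, not the categorical hand type), the errors are bivariate normal, and the first-stage probit is skipped by treating the selection index as known. All names and parameter values (`simulate`, `mu`, `rho`, and so on) are made up for illustration.

```python
import random
from statistics import NormalDist, mean


def inverse_mills(index):
    """Inverse Mills ratio phi(index) / Phi(index) for a selected observation."""
    nd = NormalDist()
    return nd.pdf(index) / nd.cdf(index)


def simulate(n=20000, mu=1.0, rho=0.6, seed=0):
    """Return (z, y) pairs for records that pass the selection rule z + u > 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)  # observed driver of selection
        u = rng.gauss(0, 1)  # selection error
        # Outcome error, built to have correlation rho with u and unit variance
        e = rho * u + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        if z + u > 0:  # outcome is seen only when selected
            rows.append((z, mu + e))
    return rows


def ols_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


rows = simulate()
ys = [y for _, y in rows]

# Naive estimate: average the outcomes we happen to observe
naive = mean(ys)

# Correction: add the inverse Mills ratio of the (assumed known) selection index
lam = [inverse_mills(z) for z, _ in rows]
intercept, slope = ols_line(lam, ys)

print("naive mean of observed y:", round(naive, 3))
print("corrected intercept (estimate of mu):", round(intercept, 3))
print("coefficient on inverse Mills ratio (estimate of rho):", round(slope, 3))
```

With a positive `rho`, the naive mean should sit above the true `mu`, while the intercept from the regression that includes the inverse Mills ratio should land close to it. In a real application the selection index would be estimated by a first-stage probit rather than assumed.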

Provided I figure out a clever way of estimating the ‘showdown’ selection equation with variables uncorrelated with the ‘hand type’ outcomes, what options do I have for combining the selection model with the multinomial logit ‘hand type’ model? There is a Heckman probit function (heckprob) in Stata, but I don’t know of anything like that in R other than the two-step heckit for continuous outcomes. Neither statistical package has a native function for multinomial logit with selection.

If I fit a separate heckprob model in Stata for each hand type – that is, a heckprob regression for y = 1 vs. y = 2, …, K; then, if y > 1, a heckprob regression for y = 2 vs. y = 3, …, K; and so on (as described in ARM p 123) – should I be worried about excluding observed values of y = 1 when estimating y = 2 vs. y = 3, …, K, when I have no way of excluding the records with unobserved outcomes? I’ve also read that selection can be modeled with a logit instead of a probit equation (Vella 1998). Would that better lend itself to a multilevel multinomial logit model with selection correction?
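The nested scheme described above can be sketched in a few lines, which also makes the asymmetry in the question visible. This is an illustration only: the function names are hypothetical, `None` stands for a hand that never reached showdown, and no selection model is fitted here.

```python
def nested_stages(y, K):
    """Build the data for each binary stage k = 1..K-1 of the sequential scheme.

    Observed records enter stage k only if y >= k and are coded 1 when y == k.
    Unobserved records (None) cannot be filtered on y, so they stay in every stage.
    """
    stages = []
    for k in range(1, K):
        stage = []
        for i, value in enumerate(y):
            if value is None:
                stage.append((i, None))  # outcome unseen: always kept
            elif value >= k:
                stage.append((i, int(value == k)))
        stages.append(stage)
    return stages


def category_probs(cond):
    """Convert P(y = k | y >= k), k = 1..K-1, into P(y = k), k = 1..K."""
    probs, remaining = [], 1.0
    for p in cond:
        probs.append(remaining * p)
        remaining *= 1 - p
    probs.append(remaining)  # last category takes what is left
    return probs


y = [1, 2, None, 3, 4]  # toy hand types; None = no showdown
for k, stage in enumerate(nested_stages(y, K=4), start=1):
    print("stage", k, stage)

print(category_probs([0.1, 0.25, 0.5]))
```

In the toy data, the observed y = 1 record drops out after the first stage while the unobserved record remains in all of them, so the mix of observed and unobserved records changes from stage to stage.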

2 thoughts on “A question about poker”

  1. I think the endogMNP package might be what you are looking for. It currently handles unordered probit selection and outcomes.

  2. This is a good question and I don't have a good answer for him.

    But he has another problem: I'm not sure about his hand categories. 99 and 88 are more similar than 88 and 22 (as an example), but 99 and 88 are in separate groups while 88 and 22 are in the same group.

    All of his groups have that kind of overlap where the variance within the group is higher than at least some of the hand-to-hand variance between groups.
