Comments on: Adiabatic as I wanna be: Or, how is a chess rating like classical economics?

By: Sherman Dorn

Sherman Dorn — Sun, 05 Apr 2015 16:25:28 +0000

Similar phenomenon in the intellectual history of demography — the mid-20th century is full of great analyses with a “stable population model” as the starting point. I put that phrase in quotation marks because it’s a term of art referring to equilibrium in fertility and mortality rates. In reality, a stable population can be growing or shrinking. (For the constant-sized population with constant fertility and mortality rates, you get an even more simplified stationary-population model.)

During the 1970s, several demographers worked to extend the stable population model to cases where mortality rates fluctuate across time and age — i.e., the real world. Eventually, Ansley Coale and Sam Preston came up with a new synthesis (published in 1982 https://www.jstor.org/stable/2735961 ). Only cited 184 times, according to Google Scholar. Sometimes the citation gods do not really want a closer approximation to reality.

By: Eric Rasmusen

Eric Rasmusen — Sat, 28 Mar 2015 02:55:21 +0000

Economics has a terminology problem. “Classical economics” refers to the economics from Adam Smith to Marx, before the idea of marginal utility and marginal product was developed around 1880. Before that, one of the very biggest questions was how to define “value” (e.g., the labor theory of value); after that, the question was seen as vacuous. People use “neoclassical economics” for modern price theory, but that’s an odd name, I think. So I don’t know what to call standard economics.

By: Bob Carpenter

Bob Carpenter — Thu, 26 Mar 2015 01:40:20 +0000

In reply to Rahul.

There was a Kaggle competition to beat Elo at predicting matches. I don’t think you got a lot of covariate info other than which color pieces each player had. I entered a hierarchical Bradley-Terry model I built in BUGS (I was just learning stats back then, so it was pretty naive), but it didn’t do so well.

There’s a fundamental problem in the Elo model with the basics — white has an advantage and there can be ties. Black should get a bump up for a tie and white a bump down, and the bump up for winning as white should be less than winning as black.

Here’s the competition:

https://www.kaggle.com/c/chess

I was entry HiBa at 0.72, #65 of 252 on the public leaderboard, which was scored by root mean square error; the winner was at 0.64. In more recent competitions, Kaggle’s moved to using log loss, which is what statistical models typically fit. I wonder what the leaderboard would’ve looked like under that eval.
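For readers who want the two evaluation metrics side by side, here is a minimal Python sketch. It is not the competition's actual scoring code: the function names, the clipping constant `eps`, and the four example games are all made up for illustration. Predictions are treated as white's expected score, with a draw counted as 0.5.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between predicted and observed scores."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

def log_loss(predicted, actual, eps=1e-15):
    """Mean cross-entropy; predictions are clipped away from 0 and 1."""
    total = 0.0
    for p, a in zip(predicted, actual):
        p = min(1.0 - eps, max(eps, p))
        total -= a * math.log(p) + (1.0 - a) * math.log(1.0 - p)
    return total / len(actual)

# Hypothetical data: predicted expected score for white, and the result
# (1 = white win, 0.5 = draw, 0 = black win).
predicted = [0.9, 0.6, 0.5, 0.2]
actual = [1.0, 0.5, 0.5, 0.0]

print("RMSE:", rmse(predicted, actual))
print("log loss:", log_loss(predicted, actual))
```

Log loss punishes confident wrong predictions much more heavily than RMSE does, so the two metrics can rank the same set of entries differently.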

Here’s a description of the winning system:

https://blog.kaggle.com/wp-content/uploads/2011/02/kaggle_win.pdf

I was amused by the comment in the conclusion of the note that L-BFGS tends to overfit. It’s just an optimization procedure and you either find the optimum of the function you’re trying to optimize or you don’t. Overfitting has nothing to do with it. Technically, what’s going on is that early stopping in stochastic updates doesn’t actually fit the model in question, but performs a kind of ad-hoc regularization. This kind of procedure (and confusion with model fitting) is pretty widespread in machine learning, where, unlike Andrew, people are quite happy with procedures if they have good predictive properties.

Here’s the BUGS model I used:

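model {
  # Half-normal priors: I(0,) truncates each normal at zero
  delta ~ dnorm(0,1) I(0,)   # draw propensity
  gamma ~ dnorm(0,1) I(0,)   # advantage for playing white
  tau ~ dnorm(0,1) I(0,)     # precision of player abilities
  for (j in 1:J) {
    # dnorm takes a precision, not a variance, as its second argument
    alpha[j] ~ dnorm(0,tau)
  }
  for (n in 1:N) {
    # Unnormalized weights for white win, black win, and draw
    qw[n] <- exp((alpha[W[n]] + gamma) - alpha[B[n]])
    qb[n] <- exp(alpha[B[n]] - (alpha[W[n]] + gamma))
    qd[n] <- exp(delta)
    z[n] <- qw[n] + qb[n] + qd[n]
    # Normalize, clamping to [0,1] to guard against rounding error
    p[n,1] <- min(1.0,max(0.0,qw[n]/z[n]))
    p[n,2] <- min(1.0,max(0.0,qb[n]/z[n]))
    p[n,3] <- min(1.0,max(0.0,qd[n]/z[n]))
    # y[n] is 1 (white win), 2 (black win), or 3 (draw)
    y[n] ~ dcat(p[n,])
  }
}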
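
To see what the likelihood in that model does with the white advantage and the draw term, here is a rough Python transcription of the three outcome probabilities for a single game. The function name `outcome_probs` and the example parameter values are made up for illustration; only the formulas come from the BUGS code.

```python
import math

def outcome_probs(alpha_white, alpha_black, gamma, delta):
    """Return (P(white wins), P(black wins), P(draw)) for one game."""
    qw = math.exp((alpha_white + gamma) - alpha_black)
    qb = math.exp(alpha_black - (alpha_white + gamma))
    qd = math.exp(delta)
    z = qw + qb + qd  # normalizing constant
    return qw / z, qb / z, qd / z

# Hypothetical values: two equally able players, a modest white advantage.
p_white, p_black, p_draw = outcome_probs(0.0, 0.0, gamma=0.2, delta=0.5)
print(p_white, p_black, p_draw)
```

With equal abilities, any positive `gamma` makes a white win more likely than a black win, and a larger `delta` shifts probability toward draws for every pairing.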

By: Bob Carpenter

Bob Carpenter — Thu, 26 Mar 2015 01:29:27 +0000

In reply to Andrew.

Thanks for the clarification. I understood that it’s described (at least on the Wikipedia) as a static Bradley-Terry model (though not using that terminology — Wikipedia’s not very good at connecting different definitions of the same thing).

But given the Elo procedure, couldn’t I come up with another model that it matched? Off the top of my head, I’d say it’s a way of estimating ability[t+1] given ability[t]. Of course, time isn’t really time, but number of games played. In that way, it looks like any other autoregressive model to me. Or something like how a Kalman filter is typically conceived (an update procedure over time rather than as a static hidden Markov model) or how sequential Monte Carlo attempts to fit general models with its updating procedure.

By: Rahul

Rahul — Wed, 25 Mar 2015 20:23:48 +0000

So how much better is Glickman’s rating system than Elo?

Or, to start at the basics, what’s a good quantitative metric to assess a rating system’s performance with?

By: Andrew

Andrew — Wed, 25 Mar 2015 14:54:35 +0000

In reply to Bob Carpenter. Bob: Elo as applied is a dynamic procedure---as you say, ratings change over time---but its estimates can be derived from a static statistical model in which underlying abilities are fixed. Glickman's point is that a dynamic model can yield improved inferences in the real dynamic world.

By: Bob Carpenter

Bob Carpenter — Wed, 25 Mar 2015 13:09:15 +0000

On a more serious note, I'm not sure what Andrew means by saying that Elo is a static model. It seems to me that its major trait in practice is how scores are dynamically updated after matches. Many chess fan sites track the ratings of players over time with the understanding that abilities change. From a modern (i.e., 1950s) perspective, Elo looks like the Robbins-Monro stochastic update algorithm (1951) applied to a (rescaled) Bradley-Terry model (1952). I have no idea when Elo invented Elo, but if it was later than 1952, it's another piece of evidence for Andrew's claim that every model someone dreams up was already developed by a psychometrician in the 1950s.
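
A minimal sketch of the update being described, using the standard Elo expected-score formula. The step size `k=32` is a common choice but is only an assumption here, and the function names are made up for illustration.

```python
def expected_score(r_a, r_b):
    """Expected score of player A against player B on the usual Elo scale."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32.0):
    """Move both ratings by k * (observed - expected) after one game.

    score_a is 1 for a win by A, 0.5 for a draw, and 0 for a loss.
    """
    step = k * (score_a - expected_score(r_a, r_b))
    return r_a + step, r_b - step

# Two hypothetical players rated 1500; A wins.
new_a, new_b = elo_update(1500.0, 1500.0, 1.0)
print(new_a, new_b)
```

The step is proportional to (observed minus expected), which is the gradient of the logistic log-likelihood with respect to the rating up to a constant factor. That is why the procedure reads as a stochastic-approximation update on a rescaled Bradley-Terry model.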

By: hjk

hjk — Wed, 25 Mar 2015 12:52:49 +0000

Interesting post, thanks.

Given the title though I was hoping for more ‘quasi-’ prefixes…

By: Bob Carpenter

Bob Carpenter — Wed, 25 Mar 2015 12:42:28 +0000

In reply to Andrew. re. #2: Don't listen to Andrew, "ELO" is just an acronym for "evaluating levels of." Seriously, though, I wish Bugs [sic] was named after Mr. Bunny the way Stan is named after Mr. Ulam.

By: Andrew

Andrew — Wed, 25 Mar 2015 11:18:28 +0000

In reply to Rahul.

Rahul:

As I wrote: “improved predictions, more sensible estimates, and perhaps less need for fudge factors to correct problems.” All of these relate to the outputs. Again, the point is not that the model should include various aspects for its own sake, the point is that the previously-existing method had problems, and it makes sense that these problems can be reduced by adding a dynamic component to a model.

To draw a simple analogy, suppose you had a linear model that was behaving poorly at the extremes, and suppose that in addition you had good theoretical reasons for thinking that the underlying relation is nonlinear. Then a natural step would be to move to a nonlinear model.

And, yes, of course Glickman has compared his rating system to Elo. That’s the point. This whole field is much more empirical than you seem to imagine. The improvements in the model are directly motivated by problems with the existing approach. And, of course, it is a tribute to the existing approach that it has been used extensively enough for these flaws to become apparent.

By: Rahul

Rahul — Wed, 25 Mar 2015 03:50:15 +0000

In reply to Andrew.

The core of my disagreement is with your #3. You seem to be judging the quality of a model on the basis of what goes into it and what the model itself looks like. I would judge a model more on the fidelity of its outputs than on the richness of its inputs and structure.

To me it is not at all self-evident that "A model that allows abilities to change should do better." Maybe it "should," but does it in practice?

PS. No offense intended, Mister Arpad Elo.

PPS. When you say that in chess "complex ratings models outperform the simple," do you have a specific model in mind? If so, it might make sense to compare that specific model's features and performance to Elo ratings.

By: Andrew

Andrew — Wed, 25 Mar 2015 03:32:14 +0000

In reply to Rahul.

Rahul:

1. In the chess ratings example, the complex model outperforms the simple model. But if a simple model that has evident problems outperforms the complex model, then there is clearly a problem with the complex model and it should be fixed in some way, perhaps via soft constraints on parameters (i.e., informative prior distributions).

2. It’s Elo, not ELO. No big deal but Mr. Elo should get the credit here.

3. I’m no expert on this one. But, as I noted above, the Elo ratings are implicitly based on a model of unchanging abilities. A model that allows abilities to change should do better. This should show up as improved predictions, more sensible estimates, and perhaps less need for fudge factors to correct problems. The Elo system needs a bunch of fudges to keep it working, and some of these difficulties are analogous to the familiar age-period-cohort problem in demography.

To put it another way, the Glicko system was, I believe, motivated by various well-known empirical problems with the Elo system. It was not a case of art for art’s sake.

By: Rahul

Rahul — Wed, 25 Mar 2015 03:14:46 +0000

In reply to Andrew.

Sometimes when a simple model outperforms a complex one, it may be time to give the complex model a shave with Occam’s razor?

Do we dwell too much on the philosophical structure of a model when we ought to focus on its empirical performance instead?

e.g. In the context of a post mentioning ELO ratings, what’s the baseline performance metric? What was the predictive performance of ELO in (say) predicting all competition games last year? What is it exactly that we seek to improve and how bad is the current model?

By: Andrew

Andrew — Wed, 25 Mar 2015 02:21:48 +0000

In reply to zbicyclist.

Zbicyclist:

Agreed. In writing the above post, I’m not intending to say that classical economics or the Elo rating system is useless—far from it! Rather, I find it interesting that these two very useful frameworks have in common that they are static models built for the purpose of understanding and tracking changes.

By: zbicyclist

zbicyclist — Wed, 25 Mar 2015 00:35:10 +0000

In reply to Rahul.

Also, there’s a question of what logically is likely to come first. To oversimplify, Elo was a simple and elegant model that worked well enough to get the various warring factions of chess to adopt it. And it had flaws, but these (a) were relatively minor compared with the metrics that existed before, and (b) didn’t all become apparent until the system was used.

If Glickman had come first, would his system have been widely adopted? Or did he need to stand on the shoulders of giants?

Similarly, equilibrium models are simpler and predate dynamic models — but I know too little about the history of economics to comment further.

By: Andrew

Andrew — Tue, 24 Mar 2015 21:46:44 +0000

In reply to Rahul. Rahul: In the immortal words of Radford Neal,

Sometimes a simple model will outperform a more complex model . . . Nevertheless, I believe that deliberately limiting the complexity of the model is not fruitful when the problem is evidently complex. Instead, if a simple model is found that outperforms some particular complex model, the appropriate response is to define a different complex model that captures whatever aspect of the problem led to the simple model performing well.

By: Rahul

Rahul — Tue, 24 Mar 2015 20:29:47 +0000

A more complex structure in a model does not necessarily lead to a more useful model, right? i.e. including non-equilibrium effects in a chess rating model is not guaranteed to make it any more accurate?

By: Corey

Corey — Tue, 24 Mar 2015 19:44:37 +0000

Reminds me of a little physics problem from my chemical engineering education: a large cylindrical tub with a small drain pipe at the bottom is filled with water to a given height, and the instantaneous flow rate of water out of the drain pipe is to be calculated. A technically correct calculation would model height, pressure at the exit, and flow rate as time-varying; the static approximation is that the height of water is constant, and this gives a highly accurate answer if the height of water is barely changing.
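
A rough numerical version of that comparison, assuming ideal (Torricelli) outflow with exit speed sqrt(2·g·h) and ignoring viscosity and exit losses. The tank dimensions and function names are made up for illustration.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def flow_rate(height, drain_area):
    """Instantaneous volumetric outflow under Torricelli's law."""
    return drain_area * math.sqrt(2.0 * G * height)

def height_at(t, h0, tank_area, drain_area):
    """Water height after t seconds, from dh/dt = -(a/A) * sqrt(2*g*h)."""
    root = math.sqrt(h0) - (drain_area / tank_area) * math.sqrt(G / 2.0) * t
    return max(root, 0.0) ** 2

# Hypothetical tub: 1 m of water, 1 m^2 cross-section, 1 cm^2 drain.
h0, tank_area, drain_area = 1.0, 1.0, 1e-4

static_q = flow_rate(h0, drain_area)  # treats the height as constant
dynamic_q = flow_rate(height_at(10.0, h0, tank_area, drain_area), drain_area)

print("static:", static_q, "after 10 s:", dynamic_q)
```

With the drain this small relative to the tub, the two flow rates differ by well under one percent after ten seconds, which is the sense in which the static approximation is highly accurate.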

By: Phil

Phil — Tue, 24 Mar 2015 19:13:37 +0000

In reply to Clyde Schechter.

Yes, objects wear out and food is consumed, so unless new stuff is built and new food is grown — “energy is added to the system” — the system will stop. I don’t think Andrew is claiming any profound insights here.

By: michał

michał — Tue, 24 Mar 2015 18:40:18 +0000

Poland <3
https://pl.wikipedia.org/wiki/Glicko

By: Andrew

Andrew — Tue, 24 Mar 2015 17:58:42 +0000

In reply to Dale Lehman.

Dale:

I can well believe I’m mixing up some terms and ideas here. I think the concepts discussed in the above post are related to each other, but I’m neither an expert nor well-read in these areas (except for the part about statistical models for time-varying parameters), so I’m just trying to emit some thoughts. I don’t see this post as any kind of definitive statement. Comments like yours are helpful.

By: Clyde Schechter

Clyde Schechter — Tue, 24 Mar 2015 16:43:37 +0000

“Person A sells object X to person B at price Y because A and B have different resources and preferences; once the object is sold, under the usual theory it will not be sold back. In that sense, economic transactions go “downhill,” and the economy would grind to a halt if new “energy” were not added into the system in the form of individuals moving, growing, being born and dying, and so forth.”

Maybe I’m missing something here. But for many, maybe even most objects, X will either be consumed (e.g. food) or will eventually wear out (e.g. clothing, cars, appliances), necessitating new transactions to replace or repair X. So I don’t see the economy grinding to a halt as described.

By: Dale Lehman

Dale Lehman — Tue, 24 Mar 2015 15:20:56 +0000

I find this post confusing. I think you are confusing statics/dynamics and equilibrium/disequilibrium. Nothing in economic theory requires that things be static. There can be an equilibrium and it will adjust as circumstances change. Thus, we move from one equilibrium to another and the usual tools of economics try to trace and disentangle these paths – such as identifying the supply and demand curves from historical equilibrium points.

There are other schools of thought (notably the work of Kornai) that critique the emphasis on equilibrium models. Equilibrium models generally say little about the adjustment to equilibrium (dynamics and disequilibrium). Arguably, these may be more interesting and more important than the static equilibria that are more typically analyzed by economists. But both types of work exist and it is debatable which is more important or relevant.

I think it is your comments about the economy grinding to a halt in equilibrium or becoming irrelevant if the economy were ever in equilibrium that confuse these matters. In equilibrium things only grind to a halt until something changes – and it always does. This need not undermine the usefulness of equilibrium theories. However, it might, as Kornai and others have argued. The issue is whether the adjustments from one equilibrium to another are more or less important than the properties of the equilibrium points themselves. And, that may differ depending on circumstances. In markets where information is symmetric and readily available, analyzing equilibria may be appropriate. The more imperfect and asymmetric information becomes, the more the dynamics and transition are really of interest and not the equilibrium points themselves.

How this affects empirical work is beyond my limited capacities at this point in my career, but I think that is what you should be focusing on. It may well be that some types of empirical analyses are more or less appropriate depending on whether you are studying disequilibrium paths or using equilibrium points to estimate static functions that have been perturbed by exogenous factors. But I think the focus on “methods that are used to study change but are based on static models” largely misses the point.