What pieces do chess grandmasters move, and when?

The above image, from T. J. Mahr, is a cleaned-up version of this graph:

which in turn is a slight improvement on a graph posted by Dan Goldstein (with R code!) which came from Ashton Anderson.

The original looks like this:

This is just fine, but I had a few changes to make. I thought the color scheme could be improved, and I wanted to change the order of the pieces on the graph: it didn’t seem quite right to start with the bishop. I’d do some order such as Pawn, Knight, Bishop, Queen, Rook, Castling, King, which is roughly the order that pieces get moved (except that I’ve put Castling between Rook and King because it seems to make sense to go there).

I wonder how Anderson came up with the order in the above graph. Let me look at the code . . . OK, I see, it’s alphabetical! (B, K, N, O, P, Q, R). We don’t like alphabetical order.

But enuf complaining: I should be able to go to the code and clean things up. And, a half hour later, here it is, my (slightly) adapted code:

library(tidyverse)
theme_set(theme_minimal())

mt <- read.csv(url('https://gist.githubusercontent.com/ashtonanderson/cfbf51e08747f60472ee2132b0d35efb/raw/80acd2ad7c0fba4e85c053e61e9e5457137e00ee/moveno_piecetype_counts'))

mt$piece_type <- factor(mt$piece_type, levels=c("P","N","B","Q","R","O","K"))

mt <- mt %>%
  group_by(move_number) %>% 
  mutate(tot = sum(count),frac = count/tot)

p <- ggplot(mt %>% filter(move_number <= 125),aes(move_number,frac)) + 
  geom_area(aes(fill = piece_type), position = 'stack') + 
  scale_fill_brewer(type='qual',palette=3,name='Piece type', labels=c("Pawn","Knight","Bishop","Queen","Rook","Castling","King")) + 
  theme(panel.border=element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + 
  xlab('Move number') + ylab('') + 
  scale_y_continuous(labels = scales::percent, breaks=seq(0,1,0.2))

p

The only things I did were change the order of the pieces and cut down on the y-axis labeling (I'd also like to add tick marks and change the sizes and locations of the axis labels but I don't know how to do that in ggplot2). Also, just for laffs, I extended the x-axis to 125 moves, cos why stop at 80?

The result is the second graph above.

I prefer it to the original. To me, the generally monotone pattern allows me to see what's happening more clearly, whereas in the original, I had to spend a lot of time going back and forth between the legend and the curves. Even better would be to label the filled area directly; I don't know how to do that in ggplot2 either, but I'm sure it's easy enough for those who know the proper function call, and indeed Mahr did that for us to produce the graph shown at the top of the page.

Also, there's a glitch that leaves some white space in some of the early moves. I don't know where that's coming from, but I see some of it in the original graph too.

In any case, hats off to Anderson for posting his data and code (and Goldstein for sharing) so that the rest of us can easily play with it all.

P.S. Anne Pier Salverda made some new graphs for us:

And here's the code:

library(tidyverse)
library(ggridges)
theme_set(theme_minimal())

mt <- read.csv(url('https://gist.githubusercontent.com/ashtonanderson/cfbf51e08747f60472ee2132b0d35efb/raw/80acd2ad7c0fba4e85c053e61e9e5457137e00ee/moveno_piecetype_counts'))

mt$piece_type <- factor(
  mt$piece_type,
  levels = c("P","N","B","Q","R","O","K"),
  labels = c(
    "Pawn", "Knight", "Bishop", "Queen", "Rook", "Castling", "King"
  )
)

n_games = sum(mt[mt$move_number == 1, "count"])

mt <- mt %>%
  group_by(move_number) %>%
  mutate(
    tot = sum(count),
    frac = count / tot,
    frac_games = count / n_games
  ) %>%
  ungroup()

ggplot(mt %>% filter(move_number < 100),
       aes(x = move_number, y = piece_type, height = frac)) +
  geom_ridgeline(
    stat = "identity",
    col = "gray60",
    fill = "gray90"
  ) +
  labs(
    title = "Normalized by number of surviving games",
    x = "Move number",
    y = ""
  )
ggplot(mt %>% filter(move_number < 100),
       aes(x = move_number, y = piece_type, height = frac_games)) +
  geom_ridgeline(
    stat = "identity",
    col = "gray60",
    fill = "gray90"
  ) +
  labs(
    title = "Normalized by total number of games",
    x = "Move number",
    y = ""
  )
mt %>%
  filter(move_number < 125) %>%
  group_by(move_number) %>%
  summarize(n_games = sum(count)) %>%
  ggplot(aes(x = move_number, y = n_games)) +
  geom_line(col = "gray60") +
  scale_y_continuous(labels = scales::comma) +
  labs(
    x = "Total number of moves",
    y = "Number of games"
  )

Maybe we'd also like to see that first set of plots redone, normalizing by the number of pieces of that type remaining in the game.
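A minimal sketch of that normalization, on made-up numbers. The posted data only have `move_number`, `piece_type`, and `count`, so the `pieces_on_board` column here is hypothetical: it stands for the number of pieces of each type still available to the player on move, summed over surviving games, and would have to be computed from the full game records.

```r
# Toy numbers, made up for illustration; not from the real data
toy <- data.frame(
  move_number     = c(40, 40, 40),
  piece_type      = c("Pawn", "Rook", "King"),
  count           = c(300, 250, 200),   # moves made by this piece type
  pieces_on_board = c(2000, 500, 500)   # hypothetical column, see above
)

# Moves per available piece of that type, rather than share of all moves
toy$moves_per_piece <- toy$count / toy$pieces_on_board
toy
```

In this toy example pawns account for the most moves in total but the fewest per available piece, which is the kind of reversal the normalization is meant to reveal.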

31 thoughts on “What pieces do chess grandmasters move, and when?”

    • I would guess white spaces are grandmasters knocking the board over and declaring they are too good to waste their time on such a stupid game.

      As one can clearly notice, grandmasters only practice this move at the second or fourth turn, otherwise it does not look so effective.

    • The white space is caused by the smoothing scale.
      It can be fixed by adding the following code after defining the piece_type factors:

      unique_moves <- expand.grid(move_number = unique(mt$move_number), piece_type = unique(mt$piece_type))

      mt <- merge(unique_moves, mt, all.x = T)

      mt[is.na(mt)] <- 0
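To see what this fix does, here is a small base-R sketch on made-up counts (not the real data). The toy table has no row for castling ("O") at move 1, which is the kind of gap that seems to leave white space in the stacked area plot; filling in the full move-by-piece grid turns the missing row into an explicit zero.

```r
# Toy counts (made up): there is no "O" row for move 1
toy <- data.frame(
  move_number = c(1, 1, 2, 2, 2),
  piece_type  = c("P", "N", "P", "N", "O"),
  count       = c(8, 2, 6, 3, 1)
)

# Every combination of move number and piece type
grid <- expand.grid(
  move_number = unique(toy$move_number),
  piece_type  = unique(toy$piece_type),
  stringsAsFactors = FALSE
)

# Left join onto the full grid; missing combinations come back as NA
filled <- merge(grid, toy, all.x = TRUE)

# Replace the NA counts with explicit zeros
filled$count[is.na(filled$count)] <- 0
filled <- filled[order(filled$move_number, filled$piece_type), ]
filled
```

The filled table has six rows instead of five, and the total count is unchanged.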

  1. Whitespace is caused by the smoothing to continuous scale. Fixed it here:

    library(tidyverse)
    theme_set(theme_minimal())

    mt <- read.csv(url('https://gist.githubusercontent.com/ashtonanderson/cfbf51e08747f60472ee2132b0d35efb/raw/80acd2ad7c0fba4e85c053e61e9e5457137e00ee/moveno_piecetype_counts'))

    mt$piece_type <- factor(mt$piece_type, levels=c("P","N","B","Q","R","O","K"))

    unique_moves <- expand.grid(unique(mt$move_number), unique(mt$piece_type))
    names(unique_moves) <- c('move_number', 'piece_type')

    mt <- merge(unique_moves, mt, all.x = T)

    mt[is.na(mt)] <- 0

    mt <- mt %>%
    group_by(move_number) %>%
    mutate(tot = sum(count),frac = count/tot)

    p <- ggplot(mt %>% filter(move_number <= 125),aes(move_number,frac)) +
    geom_area(aes(fill = piece_type), position = 'stack') +
    scale_fill_brewer(type='qual',palette=3,name='Piece type', labels=c("Pawn","Knight","Bishop","Queen","Rook","Castling","King")) +
    theme(panel.border=element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
    xlab('Move number') + ylab('') +
    scale_y_continuous(labels = scales::percent, breaks=seq(0,1,0.2))

    p

  2. This would be the quick and dirty way to get direct labeling.

    library(ggplot2)
    library(magrittr)
    library(dplyr)
    theme_set(theme_minimal())

    mt <- read.csv(url('https://gist.githubusercontent.com/ashtonanderson/cfbf51e08747f60472ee2132b0d35efb/raw/80acd2ad7c0fba4e85c053e61e9e5457137e00ee/moveno_piecetype_counts'))

    mt$piece_type <- factor(mt$piece_type, levels=c("P","N","B","Q","R","O","K"))

    mt <- mt %>%
    group_by(move_number) %>%
    mutate(tot = sum(count),frac = count/tot)

    p <- ggplot(mt %>% filter(move_number <= 125),aes(move_number,frac)) +
    geom_area(aes(fill = piece_type), position = 'stack') +
    scale_fill_brewer(type='qual',palette=3,name='Piece type', labels=c("Pawn","Knight","Bishop","Queen","Rook","Castling","King")) +
    theme(panel.border=element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
    xlab('Move number') + ylab('') +
    scale_y_continuous(labels = scales::percent, breaks=seq(0,1,0.2))

    p +
    annotate("label", x = 60,
    y = c(.15, .4, .60, .70, .80, .95),
    label = c("King", "Rook", "Queen", "Bishop", "Knight", "Pawn")) +
    annotate("label", x = 10, y = 0.05, label = "Castling")

    https://imgur.com/niQ0KpC

    It would require some more manual fiddling with positions, though.

  3. After all the posts and comments we’ve had about choosing colors for graphs, in these graphs I cannot distinguish the colors for castling from queen (they both look like the same shade of red to me). And I can just barely distinguish king from bishop. I have moderate but not total red-green color blindness.

    For me, black or dark gray for castling would have worked. Its small area would prevent its darkness from dominating the visual impression.

  4. I am pretty sure the white space is the result of drawing smooth lines for the discrete data when some regions can have a value of zero. For whatever reason (maybe a bug, maybe a deliberate design choice), the software refuses to fill in the color at all prior to the move at which it first attained a nonzero value. For example, the leftmost white triangle in the first chart “should” be light green (bishop).

  5. I wonder about the oscillations increasing at higher move numbers. I can understand fewer pawn moves (fewer pawns, many blocked) but why the visible ups and downs for the percentages for the other pieces?

  6. I hate stacked area graphs. Whenever I see a y-axis that goes from 0% to 100%, my instinct is to interpret it just as I would the identical axis in a line or bar graph: points with higher y-coordinates represent higher percentages. So I start out thinking, for example, that the frequency of pawn use ranges from 0% to 100% as the game begins but converges to near 100% by the end. That makes no sense, of course, but in a world where people frequently use terrible Word/Excel defaults or even make up custom formats, it takes a moment for me to decide what I’m looking at. The problem is that, unique to stacked area graphs, the orientation of the y-axis is arbitrary: you’d interpret the graph the same way if the zero-point were at the bottom or the top or even in the middle.

    I guess the easiest fix would be if we all agree that the graph title of all such graphs should start with “Stacked area graph: …” and I’d know to shift my interpretation. But warning me of the arbitrariness of the scale doesn’t make it less arbitrary. I actually think that the most informative and intuitive scale, at least in this case, would be on the right side of the graph, with a bracket for each piece/area/color indicating its absolute proportion of the total at the end. Something like (%’s made up):

    } Pawn: 5%
    } Knight: 10%
    } Bishop: 15%
    } Queen: 20%
    } Rook: 20%
    } King: 30%

    Anyway, that’s my graphical soapbox.

    • I hate stacked area graphs.

      I’m not a big fan, either. It’s really hard not to see this as a bunch of upward trends, whereas it’s nothing of the kind. I think just a plot for each piece would work just fine here, faceted or overlaid.

        • Also, notice how deceptive the stacked-area chart is about the oscillations to the right. The oscillations look to be mostly due to king moves, but it ripples upwards because of the stacking. The areas above king might be very smooth, but who can tell?
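The one-panel-per-piece idea from this thread could be sketched roughly as follows. `plot_by_piece` is a hypothetical helper, not something from the post; it assumes a data frame with the columns `move_number`, `piece_type`, and `frac` as computed in the code above, and the toy data frame is made up just to show the call.

```r
library(ggplot2)

# Hypothetical helper: one line per piece type, each in its own panel.
# Expects columns move_number, piece_type, frac.
plot_by_piece <- function(df) {
  ggplot(df, aes(move_number, frac)) +
    geom_line() +
    facet_wrap(~ piece_type) +
    scale_y_continuous(labels = scales::percent) +
    labs(x = "Move number", y = "")
}

# Made-up values, only to show the call
toy <- data.frame(
  move_number = rep(1:3, times = 2),
  piece_type  = rep(c("Pawn", "King"), each = 3),
  frac        = c(0.9, 0.8, 0.7, 0.1, 0.2, 0.3)
)

p <- plot_by_piece(toy)
```

Because each panel has its own baseline at zero, ripples in one piece's curve no longer propagate into the curves drawn above it.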

    • In addition to the color problems, this kind of graph is simply bad because you can tell almost nothing quantitative from it, not even most of the trends. Which piece(s) tend to get moved more and more often above 75th move? We can’t tell. What percentage of moves at move 100 are knight moves? We can’t tell. Are bishops moved more often than queens in the end game? Hard to tell.

      • Tom:

        I disagree with your assessment of the graph as “bad.”

        Yes, more can be learned with more graphs, that’s fine. Indeed, one of the things that I say a lot about statistical graphics is that it’s a mistake to think that one graph can do it all. So, sure, make lots more graphs—that would be a great idea.

        But I, for one, found the top graph above to be fascinating and informative. It gives me a sense of the rate of pawn moves, it tells me that king moves stabilize at a bit under 40%, it tells me that rook moves are eventually more common than queen and bishop moves combined, it tells me that castling occurs mostly between moves 5 and 15, etc.

        You may be more of an expert in chess than I am, so maybe these patterns I noticed were things you already knew.

        In any case, yes, more graphs would be great. I’d guess there’s lots more that can be learned from this data archive.

        • Andrew, I shouldn’t have said “bad”. That was an effort to keep the comment condensed, but I agree, it’s too simplified. And I was never much of a chess player, either (I liked Go better, and wasn’t good at it either). But compare your version with the third graph, which you called “the original”. They look quite different, especially in the earlier parts of the game. I find it hard to decipher the differences and understand that it’s really the same data (that’s the case, isn’t it?).

          And to give your version (and even better, T. J. Mahr’s version) some credit, it looks attractive, and has the feel of presenting a coherent narrative. Those are worthy attributes.

        • It’s interesting to me because it conveys the standard approach to and development of a game. The surprising thing in it to me is the relatively small number of moves by queens. Is this because they’re used to pin down sections of the board from a protected distance?

        • There is only one queen. Sounds like the analysis should be normalized to account for this.

          The more I think about what is being measured, the less I understand what it means.

    • Anon:

      I know, I know. I’ve always played by first moving the king a bunch, then the rooks, then the queen, etc., not really moving my pawns much at all until move 100 or so—that’s when things start to get interesting.

      Next time I’ll have to change the order of moves and see how things go.

      • Looking at the data, it looks like players castled in about 83% of the games, which is higher than I would have guessed. I always assumed that my affinity for what felt like a gimmicky move marked me as the novice I am. Which, okay, it still might do, but if the path to mastery is fine-tuning the order of moves, I’m further along than I thought.

  7. I can’t find the original lineage of that move data, but it would be interesting to differentiate the sets by the ultimate winner and the loser of the games… might dig into that myself sometime; I would be curious if it made a difference. Also maybe black vs. white? After that I have a million questions, but they’re all more involved. Very cool tho, full props.

  8. The graph might be a bit more technically correct if the horizontal axis were changed from “Move Number” to “Point Value of Remaining Pieces” (PVRP). The stage of the game is more important than the raw number of moves played. The opening is going to have a lot of pawn, knight, and bishop moves, while the end game is going to be more dominated by kings and rooks. You could also put vertical lines to delineate Opening, Middle Game, and End Game.

    Also, is sample attrition (because games end at different points) interesting in any way? When you get to the endgame with a queen, the endgame ends really quickly, while king-and-pawn endgames go on longer. So maybe queen endgames are underrepresented. You could normalize the horizontal axis to run from 0% to 100% of endgame moves rather than just raw moves to adjust for this.

  9. Would it also be interesting to have something in here about the number of pieces on the board as the moves progress? I would expect the pawns to have a high proportion of moves early on as there are lots of them. Similarly, the only piece that can be guaranteed to be present while the game is ongoing is the king, so you might expect a higher number of king moves later on. Or is this just a silly train of thought?

    • “the only piece that can be guaranteed to be present while the game is ongoing is the king, so you might expect a higher number of king moves later on. Or is this just a silly train of thought?”

      Looks to be a very subtle observation!

      Let’s simplify to get some intuition. Say there are only two types of ending, king-and-pawn and king-and-rook. Let’s also say the king only moves 40% of the time in each game, while the pawn or rook moves 60% of the time. Play is dominated by the non-king piece, but the calculated move counts will be:

      King: 40%
      Pawn: 30%
      Rook: 30%

      This could rock the foundations of Chess-Piece-Move-Frequency-Study departments around the world!
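The arithmetic behind those numbers can be checked in a few lines. The sketch assumes the two ending types are equally common, which the comment does not state but which the 30/30 split implies.

```r
# Assumption for illustration: the two ending types are equally common
share_of_games <- c(king_pawn = 0.5, king_rook = 0.5)

king_rate  <- 0.40  # the king moves 40% of the time in either ending
other_rate <- 0.60  # the pawn or rook moves the other 60%

# Pool the move shares across both kinds of ending
pooled <- c(
  King = sum(share_of_games * king_rate),
  Pawn = share_of_games[["king_pawn"]] * other_rate,
  Rook = share_of_games[["king_rook"]] * other_rate
)
round(100 * pooled)
```

The king comes out on top in the pooled shares even though, within every single game, the other piece moves more often.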

    • Your point is kind of profound.

      What are we actually measuring? We are measuring a combination of the probability a piece remains on the board and the probability the piece is moved given it remains on the board. I don’t know what this means or why we should care. Intuitively we really want the second, the probability a piece is moved given it remains on the board.
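Written out, the decomposition described in this comment is the following. The notation is introduced here for illustration: $k$ is a piece type and $t$ is a move number.

```latex
\Pr(\text{move } t \text{ is made by } k)
  = \Pr(k \text{ is on the board at move } t)
    \times
    \Pr(\text{move } t \text{ is made by } k \mid k \text{ is on the board at move } t)
```

The identity holds because a piece can only be moved if it is still on the board. The graphs show the left-hand side; the second factor on the right is the quantity the comment argues we actually care about.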
