World record running times vs. distance

Julyan Arbel plots world record running times vs. distance (on the log-log scale):

The line has a slope of 1.1. I think it would be clearer to plot speed vs. distance—then you’d get a slope of -0.1, and the numbers would be more directly interpretable.
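
To spell out the arithmetic behind the two slopes, here is a short derivation. It assumes the fitted line is an exact power law; the constant $a$ is a symbol introduced here for illustration and is not taken from Arbel's plot.

```latex
t = a\,d^{1.1}
\quad\Longrightarrow\quad
v = \frac{d}{t} = \frac{1}{a}\,d^{-0.1},
\qquad
\log v = -\log a - 0.1\,\log d .
```

Read this way, doubling the distance multiplies the record speed by about $2^{-0.1} \approx 0.93$, a drop of roughly 7%.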

Indeed, this paper by Sandra Savaglio and Vincenzo Carbone (referred to in the comments on Julyan’s blog) plots speed vs. time. Graphing by speed gives more resolution:

The upper-left graph in the grid corresponds to the human running records plotted by Arbel. It’s funny that Arbel sees only one line whereas Savaglio and Carbone see two—but if you remove the 100m record at one end and the 100km at the other end, you can see two lines in Arbel’s graph as well. The bottom two graphs show swimming records. Knut would probably have something to say about all this.

21 thoughts on “World record running times vs. distance”

  1. I think most folks who participate in or observe endurance events are used to thinking in terms of times and distances, so would like the existing plot more than they would like speed versus distance.
    (e.g. I know my best half-marathon and marathon times but couldn’t tell you the corresponding paces off the top of my head; similarly for world records).

  2. This is exactly analogous to metal fatigue, where lifetime is related to cyclic stress (force) or strain (displacement) through a power law relationship. Other failure mechanisms follow similar, if less well-defined, relationships.

  3. I’m missing what’s new here? Isn’t this well studied in exercise physiology? I am out of my area of expertise, but I won’t let that stop me.

    Instead of running, why not use cycling, where the energy expenditure (and therefore the physiological stress) is routinely and precisely measured with power meters? Then the relation could get even better resolution. Do a regression of power vs. time and work backwards to derive the aerobic beta of Savaglio and Carbone from Monod and Scherrer’s critical power model: http://www.tandfonline.com/doi/abs/10.1080/00140136508930810. See it in personal records rather than world records.

    In bike racing we calculate our training zones from “functional threshold power” which is roughly the same as critical power. You can even find online calculators: http://www.cyclingpowermodels.com/MonodCriticalPower.aspx
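
    As a rough sketch of the regression suggested above, the critical power model can be written in its linear form, total work = W' + CP × duration, and fitted with ordinary least squares. The efforts below are made-up numbers generated from an assumed CP of 250 W and W' of 20 kJ, not measurements from any rider, and the variable names are only illustrative.

```r
## Hypothetical best efforts for one rider (made-up numbers, not measurements):
## generated from CP = 250 W and W' = 20 kJ so the fit can be checked.
duration <- c(180, 300, 720, 1200)          # seconds
power    <- 250 + 20000 / duration          # mean power in watts
work     <- power * duration                # total work in joules

## Linear form of the critical power model: work = W' + CP * duration
fit <- lm(work ~ duration)
w_prime <- unname(coef(fit)[1])             # intercept: W' in joules
cp      <- unname(coef(fit)[2])             # slope: critical power in watts
print(c(CP = cp, W_prime = w_prime))
```

    With real power-meter data the points would not fall exactly on a line, and the choice of effort durations would affect the estimates.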

  4. I like the speed v. time better as well. It appears to show that people can sprint for about 100 seconds.

    It would be interesting to see similar graphs for other animals. Is this ~100s limit a universal physiological limit (a result of the ways our cells generate anaerobic energy?) or is the limit different in different species?

    • Yes. After ~2 mins all-out, you’ve completely used up the energy of both your phosphagen (5-15 sec) and your lactic acid/glycolytic systems, and energy can only be delivered to muscles through aerobic metabolism.

  5. It seems plausible to me that the 100 km WR might be a little “soft”, which is why it falls a bit above the line extrapolated from other long distance WRs on the Arbel plot. The pool of people who attempt 100 km is much smaller than the pool of people who attempt marathon and sub-marathon distances. The world’s best ultramarathoners may actually be people who’ve never attempted an ultramarathon.

    It seems less plausible to me that the 100 m WR would also be a little “soft” compared to what you’d predict by extrapolating the WRs from other shorter distances. Looking at the Arbel plot, I wonder if the 200 m WR isn’t actually a slight outlier on the low side, rather than the 100 m WR being a slight outlier on the high side.

    • Epanechnikov:

      The linked paper is great, but I think it’s a different topic. Clauset et al. are talking about the probability distribution of a single variable, whereas the graphs above are plots of y vs. x for two variables. The plots look the same but, as far as I can tell, the problems are completely different. (I’m open to clarification on this, though.)

  6. Power-law regressions are indeed a separate problem from power-law distributions, calling for quite different methods. I have seen claims that when the noise in the response is additive, using nonlinear least squares is much more efficient than log-transforming both variables, but haven’t looked into that deeply. I have looked at some examples of power-law regressions where there is really very little evidence in favor of the power law as opposed to, say, a logistic response curve.
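
    For readers who want to try the two approaches side by side, here is a minimal simulation sketch. The true parameter values, noise level, and sample size are all assumptions chosen for illustration; a single run like this says nothing about relative efficiency, which would need many replications.

```r
set.seed(1)
a_true <- 2; b_true <- 0.7                  # assumed "true" values for the simulation
x <- seq(1, 50, length.out = 40)
y <- a_true * x^b_true + rnorm(length(x), sd = 0.2)   # additive noise

## (1) log-transform both variables, then ordinary least squares
fit_log <- lm(log(y) ~ log(x))
b_log <- unname(coef(fit_log)[2])

## (2) nonlinear least squares on the original scale,
##     started from the log-log estimates
start <- list(a = exp(unname(coef(fit_log)[1])), b = b_log)
fit_nls <- nls(y ~ a * x^b, start = start)
b_nls <- unname(coef(fit_nls)["b"])

print(c(loglog = b_log, nls = b_nls))
```

    Wrapping this in a loop over many simulated data sets and comparing the spread of the two exponent estimates would be one way to check the efficiency claim.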

  7. Nice plots. Because I wanted to play around with the data a bit more and potentially also use it for a classroom example, I extended the original Arbel data somewhat. In addition to distance and time, I have included gender (and collected female records as well), whether or not the distance is included in the Olympic events, the type of race (track vs. road), athlete’s name, and date of record. All data was taken from the current “List of world records in athletics” Wikipedia page (see http://en.wikipedia.org/w/index.php?title=List_of_world_records_in_athletics&oldid=459175337). The data are provided below in CSV format (I hope the line breaks are preserved in this post…). And I also included a few very simple ideas for a first analysis in R. Hopefully, other readers will enjoy playing around with this as well.

    — Run2011.csv —

    distance,olympic,time,gender,type,name,date
    100,yes,9.58,male,track,Usain Bolt,2009-08-16
    200,yes,19.19,male,track,Usain Bolt,2009-08-20
    400,yes,43.18,male,track,Michael Johnson,1999-08-26
    800,yes,101.01,male,track,David Rudisha,2010-08-29
    1000,no,131.96,male,track,Noah Ngeny,1999-09-05
    1500,yes,206.00,male,track,Hicham El Guerrouj,1998-07-14
    1609.344,no,223.13,male,track,Hicham El Guerrouj,1999-07-07
    2000,no,284.79,male,track,Hicham El Guerrouj,1999-09-07
    3000,no,440.67,male,track,Daniel Komen,1996-09-01
    5000,yes,757.35,male,track,Kenenisa Bekele,2004-05-31
    10000,yes,1577.53,male,track,Kenenisa Bekele,2005-08-26
    10000,no,1604,male,road,Leonard Patrick Komon,2010-09-26
    15000,no,2473,male,road,Leonard Patrick Komon,2010-11-21
    20000,no,3386,male,track,Haile Gebrselassie,2007-06-27
    20000,no,3321,male,road,Zersenay Tadese,2010-03-21
    21097.5,no,3503,male,road,Zersenay Tadese,2010-03-21
    21285,no,3600,male,road,Haile Gebrselassie,2007-06-27
    25000,no,4345.4,male,track,Moses Mosop,2011-06-03
    25000,no,4310,male,road,Samuel Kosgei,2010-05-09
    30000,no,5207.4,male,track,Moses Mosop,2011-06-03
    30000,no,5257,male,road,Peter Cheruiyot Kirui,2011-09-25
    42195,yes,7418,male,road,Patrick Makau,2011-09-25
    100000,no,22413,male,road,Takahiro Sunada,1998-06-21
    100,yes,10.49,female,track,Florence Griffith-Joyner,1988-07-16
    200,yes,21.34,female,track,Florence Griffith-Joyner,1988-09-29
    400,yes,47.60,female,track,Marita Koch,1985-10-06
    800,yes,113.28,female,track,Jarmila Kratochvilova,1983-07-26
    1000,no,148.98,female,track,Svetlana Masterkova,1996-08-23
    1500,yes,230.46,female,track,Qu Yunxia,1993-09-11
    1609.344,no,252.56,female,track,Svetlana Masterkova,1996-08-14
    2000,no,325.36,female,track,Sonia O’Sullivan,1994-07-08
    3000,no,486.11,female,track,Wang Junxia,1993-09-13
    5000,yes,851.15,female,track,Tirunesh Dibaba,2008-06-06
    10000,yes,1771.78,female,track,Wang Junxia,1993-09-08
    10000,no,1821,female,road,Paula Radcliffe,2003-02-23
    15000,no,2787.70,female,road,Tirunesh Dibaba,2009-11-15
    18517,no,3600,female,road,Dire Tune,2008-06-12
    20000,no,3926.60,female,track,Tegla Loroupe,2000-09-03
    20000,no,3756,female,road,Mary Keitany,2011-02-18
    21097.5,no,3950,female,road,Mary Keitany,2011-02-18
    25000,no,5225.84,female,track,Tegla Loroupe,2002-09-21
    25000,no,4793,female,road,Mary Keitany,2010-05-09
    30000,no,6350,female,track,Tegla Loroupe,2003-06-07
    30000,no,5903,female,road,Liliya Shobukhova,2011-10-09
    42195,yes,8125,female,road,Paula Radcliffe,2003-04-13
    100000,no,23591,female,road,Tomoe Abe,2000-06-25

    — Run2011.R —

    ## read and transform data
    Run2011 <- read.csv("Run2011.csv")
    Run2011 <- transform(Run2011,
      date = as.Date(date),
      age = as.numeric(Sys.Date() - as.Date(date)) / 365.25,  # years since the record was set
      speed = 3.6 * distance/time)                            # km/h (distance in m, time in s)

    ## plot full data
    library("lattice")
    xyplot(log10(speed) ~ log10(distance), groups = ~ gender, data = Run2011)

    ## similar to Savaglio and Carbone
    panel_smooth <- function(x, y) {
      panel.xyplot(x, y)
      panel.loess(x, y, span = 1)
    }
    xyplot(log10(speed) ~ log10(time) | gender, data = Run2011,
      subset = distance %in% c(200, 400, 800, 1000, 1500, 1609.344, 3000, 5000, 10000, 42195),
      panel = panel_smooth)

    ## first regression analysis (ignoring the changes in distance coefficient)
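    ## NOTE: part of this line was lost in posting; the call and formula below are an assumed reconstruction
    m <- lm(log10(speed) ~ log10(distance) + gender, data = Run2011,
      subset = distance > 400 & distance < 100000)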
    summary(m)
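
    One crude way to look at the "one line or two" question from the post is to compare a single straight-line fit with two separate lines on either side of a split. The sketch below uses the men's records from Run2011.csv above (track records, plus the road marathon) and tries every split that leaves at least three points per side. It is not Savaglio and Carbone's method: the two lines are not forced to join, and the helper name `rss` is made up for this example.

```r
## Men's records copied from Run2011.csv above (track, plus the road marathon)
distance <- c(200, 400, 800, 1000, 1500, 1609.344, 3000, 5000, 10000, 42195)
time     <- c(19.19, 43.18, 101.01, 131.96, 206.00, 223.13, 440.67, 757.35, 1577.53, 7418)
x <- log10(time)
y <- log10(distance / time)                 # log10 speed in m/s

## Residual sum of squares of a straight line fitted to the points in idx
rss <- function(idx) sum(resid(lm(y[idx] ~ x[idx]))^2)

## One line through everything
rss_one <- rss(seq_along(x))

## Two separate lines: try every split that leaves >= 3 points per side
n <- length(x)
splits <- 3:(n - 3)
rss_two <- sapply(splits, function(k) rss(1:k) + rss((k + 1):n))
best <- splits[which.min(rss_two)]

cat("one line RSS:", rss_one, "\n")
cat("best split after", time[best], "s; two-line RSS:", min(rss_two), "\n")
```

    Two lines always fit at least as well as one, so a smaller residual sum of squares is not evidence of a real break on its own; with only ten points this is a way to explore the data, not a test.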

  8. What about looking at the relationship between time, distance, speed, gender and age? For instance, the NYC Marathon posts the best times by age and gender. There must be similar sites for other events, maybe the Wikipedia site posted above has it. Where is the kink in the performance curve by age? How different are the best times that humans can produce by age by event? How different are men and women by age?
