World record running times vs. distance

Posted on November 15, 2011 9:18 AM by Andrew

Julyan Arbel plots world record running times vs. distance (on the log-log scale):

The line has a slope of 1.1. I think it would be clearer to plot speed vs. distance—then you’d get a slope of -0.1, and the numbers would be more directly interpretable.
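The slope conversion follows from speed = distance / time: on the log-log scale, log(speed) = log(distance) - log(time), so a time vs. distance slope of b becomes a speed vs. distance slope of 1 - b. Here is a minimal sketch of that check in Python, using a subset of the men's records from the Run2011.csv table posted in the comments. The subset and the helper name `ols_slope` are my own choices for illustration, so the fitted slope need not match Arbel's 1.1 exactly.

```python
import math

# Men's world records (distance in m, time in s), a subset copied from
# the Run2011.csv table in the comments. Choice of subset is an assumption.
records = [
    (100, 9.58), (200, 19.19), (400, 43.18), (800, 101.01),
    (1500, 206.00), (5000, 757.35), (10000, 1577.53),
    (42195, 7418), (100000, 22413),
]

def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

log_d = [math.log10(d) for d, t in records]
log_t = [math.log10(t) for d, t in records]
log_v = [math.log10(d / t) for d, t in records]  # speed in m/s

slope_time = ols_slope(log_d, log_t)   # time vs. distance
slope_speed = ols_slope(log_d, log_v)  # speed vs. distance

print(f"time vs. distance slope:  {slope_time:.3f}")
print(f"speed vs. distance slope: {slope_speed:.3f}")
```

The two slopes always differ by exactly 1, whatever data go in, because least squares is linear in the response.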

Indeed, this paper by Sandra Savaglio and Vincenzo Carbone (referred to in the comments on Julyan’s blog) plots speed vs. time. Graphing by speed gives more resolution:

The upper-left graph in the grid corresponds to the human running records plotted by Arbel. It’s funny that Arbel sees only one line whereas Savaglio and Carbone see two—but if you remove the 100m record at one end and the 100km at the other end, you can see two lines in Arbel’s graph as well. The bottom two graphs show swimming records. Knut would probably have something to say about all this.

21 thoughts on “World record running times vs. distance”

Brent Buckner on November 15, 2011 10:35 AM at 10:35 am said:

I think most folks who participate in or observe endurance events are used to thinking in terms of times and distances, so would like the existing plot more than they would like speed versus distance.
(e.g. I know my best half-marathon and marathon times but couldn’t tell you the corresponding paces off the top of my head; similarly for world records).
Jon Peltier on November 15, 2011 10:38 AM at 10:38 am said:

This is exactly analogous to metal fatigue, where lifetime is related to cyclic stress (force) or strain (displacement) through a power law relationship. Other failure mechanisms follow similar, if less-well-defined relationships.
Jim on November 15, 2011 11:54 AM at 11:54 am said:

I’m missing what’s new here? Isn’t this well studied in exercise physiology? I am out of my area of expertise, but I won’t let that stop me.

Instead of running, why not use cycling, where the energy expenditure (and therefore the physiological stress) is routinely and precisely measured with power meters? Then the relation could get even better resolution. Do a regression of power vs. time and work backwards to derive the aerobic beta of Savaglio and Carbone from Monod and Scherrer’s critical power model: http://www.tandfonline.com/doi/abs/10.1080/00140136508930810. See it in personal records rather than world records.

In bike racing we calculate our training zones from “functional threshold power” which is roughly the same as critical power. You can even find online calculators: http://www.cyclingpowermodels.com/MonodCriticalPower.aspx
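For readers who have not seen the critical power model: in its usual linear form, the total work done in an all-out effort of duration t is W' + CP * t, where CP is the critical power and W' is a fixed anaerobic work capacity. The sketch below fits that line by least squares. The efforts are made-up numbers generated from assumed values (CP = 250 W, W' = 20 kJ), not measurements, and `fit_critical_power` is a hypothetical helper name.

```python
# Assumed "true" values, used only to generate illustrative data.
TRUE_CP = 250.0        # critical power in watts (hypothetical)
TRUE_W_PRIME = 20000.0  # anaerobic work capacity in joules (hypothetical)

# Hypothetical all-out efforts: (duration in s, average power in W),
# generated from P = CP + W'/t. Real data would come from a power meter.
durations = [180, 300, 720, 1200]
efforts = [(t, TRUE_CP + TRUE_W_PRIME / t) for t in durations]

def fit_critical_power(efforts):
    """Fit work = W' + CP * t by ordinary least squares.

    Returns (cp, w_prime): the slope and the intercept of the line.
    """
    ts = [t for t, p in efforts]
    works = [p * t for t, p in efforts]  # total work in joules
    mt = sum(ts) / len(ts)
    mw = sum(works) / len(works)
    sxy = sum((t - mt) * (w - mw) for t, w in zip(ts, works))
    sxx = sum((t - mt) ** 2 for t in ts)
    cp = sxy / sxx
    w_prime = mw - cp * mt
    return cp, w_prime

cp, w_prime = fit_critical_power(efforts)
print(f"critical power: {cp:.1f} W, W': {w_prime:.0f} J")
```

Regressing work on time rather than power on time keeps the fit linear; with real data the points would scatter around the line, and the choice of durations would matter.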
- Andrew on November 15, 2011 12:00 PM at 12:00 pm said:
  
  Jim:
  
  I never claimed this was new. I just like the graphs. It was new to me, and I thought it might interest others, hence the blog post.
  - Jim on November 15, 2011 1:29 PM at 1:29 pm said:
    
    I assumed (incorrectly?) it was presented by Nature as new research.
Jennie Dusheck on November 15, 2011 12:08 PM at 12:08 pm said:

I like the speed v. time better as well. It appears to show that people can sprint for about 100 seconds.

It would be interesting to see similar graphs for other animals. Is this ~100s limit a universal physiological limit (a result of the ways our cells generate anaerobic energy?) or is the limit different in different species?
- Jim on November 15, 2011 1:25 PM at 1:25 pm said:
  
  Yes. After ~2 mins all-out, you’ve completely used up the energy of both your phosphagen (5-15 sec) and your lactic acid/glycolytic systems, and energy can only be delivered to muscles through aerobic metabolism.
Jeremy Fox on November 15, 2011 12:09 PM at 12:09 pm said:

It seems plausible to me that the 100 km WR might be a little “soft”, which is why it falls a bit above the line extrapolated from other long distance WRs on the Arbel plot. The pool of people who attempt 100 km is much smaller than the pool of people who attempt marathon and sub-marathon distances. The world’s best ultramarathoners may actually be people who’ve never attempted an ultramarathon.

It seems less plausible to me that the 100 m WR would also be a little “soft” compared to what you’d predict by extrapolating the WRs from other shorter distances. Looking at the Arbel plot, I wonder if the 200 m WR isn’t actually a slight outlier on the low side, rather than the 100 m WR being a slight outlier on the high side.
- Paul on November 15, 2011 12:29 PM at 12:29 pm said:
  
  The ‘soft’ 100m WR in comparison to the 200m WR is due to the time required to accelerate to full speed from a stationary start.
  - Jeremy Fox on November 15, 2011 2:09 PM at 2:09 pm said:
    
    Good point.
Epanechnikov on November 15, 2011 1:38 PM at 1:38 pm said:

See however

http://arxiv.org/pdf/0706.1062v2

(Lots of distributions give you straight-ish lines on a log-log plot)
- Andrew on November 15, 2011 1:43 PM at 1:43 pm said:
  
  Epanechnikov:
  
  The linked paper is great, but I think it’s a different topic. Clauset et al. are talking about the probability distribution of a single variable, whereas the graphs above are plots of y vs. x for two variables. The plots look the same but, as far as I can tell, the problems are completely different. (I’m open to clarification on this, though.)
Cosma Shalizi on November 15, 2011 2:12 PM at 2:12 pm said:

Power-law regressions are indeed a separate problem from power-law distributions, calling for quite different methods. I have seen claims that when the noise in the response is additive, using nonlinear least squares is much more efficient than log-transforming both variables, but haven’t looked into that deeply. I have looked at some examples of power-law regressions where there is really very little evidence in favor of the power law as opposed to, say, a logistic response curve.
Cosma Shalizi on November 15, 2011 2:13 PM at 2:13 pm said:

(hit post too soon)
More broadly, though, Andrew’s right that the specific techniques in our paper don’t apply here.
Achim Zeileis on November 16, 2011 7:54 AM at 7:54 am said:

Nice plots. Because I wanted to play around with the data a bit more and potentially also use it for a classroom example, I extended the original Arbel data somewhat. In addition to distance and time, I have included gender (and collected female records as well), whether or not the distance is an Olympic event, the type of race (track vs. road), the athlete’s name, and the date of the record. All data were taken from the current “List of world records in athletics” Wikipedia page (see http://en.wikipedia.org/w/index.php?title=List_of_world_records_in_athletics&oldid=459175337). The data are provided below in CSV format (I hope the line breaks are preserved in this post…). I have also included a few very simple ideas for a first analysis in R. Hopefully, other readers will enjoy playing around with this as well.

— Run2011.csv —

distance,olympic,time,gender,type,name,date
100,yes,9.58,male,track,Usain Bolt,2009-08-16
200,yes,19.19,male,track,Usain Bolt,2009-08-20
400,yes,43.18,male,track,Michael Johnson,1999-08-26
800,yes,101.01,male,track,David Rudisha,2010-08-29
1000,no,131.96,male,track,Noah Ngeny,1999-09-05
1500,yes,206.00,male,track,Hicham El Guerrouj,1998-07-14
1609.344,no,223.13,male,track,Hicham El Guerrouj,1999-07-07
2000,no,284.79,male,track,Hicham El Guerrouj,1999-09-07
3000,no,440.67,male,track,Daniel Komen,1996-09-01
5000,yes,757.35,male,track,Kenenisa Bekele,2004-05-31
10000,yes,1577.53,male,track,Kenenisa Bekele,2005-08-26
10000,no,1604,male,road,Leonard Patrick Komon,2010-09-26
15000,no,2473,male,road,Leonard Patrick Komon,2010-11-21
20000,no,3386,male,track,Haile Gebrselassie,2007-06-27
20000,no,3321,male,road,Zersenay Tadese,2010-03-21
21097.5,no,3503,male,road,Zersenay Tadese,2010-03-21
21285,no,3600,male,road,Haile Gebrselassie,2007-06-27
25000,no,4345.4,male,track,Moses Mosop,2011-06-03
25000,no,4310,male,road,Samuel Kosgei,2010-05-09
30000,no,5207.4,male,track,Moses Mosop,2011-06-03
30000,no,5257,male,road,Peter Cheruiyot Kirui,2011-09-25
42195,yes,7418,male,road,Patrick Makau,2011-09-25
100000,no,22413,male,road,Takahiro Sunada,1998-06-21
100,yes,10.49,female,track,Florence Griffith-Joyner,1988-07-16
200,yes,21.34,female,track,Florence Griffith-Joyner,1988-09-29
400,yes,47.60,female,track,Marita Koch,1985-10-06
800,yes,113.28,female,track,Jarmila Kratochvilova,1983-07-26
1000,no,148.98,female,track,Svetlana Masterkova,1996-08-23
1500,yes,230.46,female,track,Qu Yunxia,1993-09-11
1609.344,no,252.56,female,track,Svetlana Masterkova,1996-08-14
2000,no,325.36,female,track,Sonia O’Sullivan,1994-07-08
3000,no,486.11,female,track,Wang Junxia,1993-09-13
5000,yes,851.15,female,track,Tirunesh Dibaba,2008-06-06
10000,yes,1771.78,female,track,Wang Junxia,1993-09-08
10000,no,1821,female,road,Paula Radcliffe,2003-02-23
15000,no,2787.70,female,road,Tirunesh Dibaba,2009-11-15
18517,no,3600,female,road,Dire Tune,2008-06-12
20000,no,3926.60,female,track,Tegla Loroupe,2000-09-03
20000,no,3756,female,road,Mary Keitany,2011-02-18
21097.5,no,3950,female,road,Mary Keitany,2011-02-18
25000,no,5225.84,female,track,Tegla Loroupe,2002-09-21
25000,no,4793,female,road,Mary Keitany,2010-05-09
30000,no,6350,female,track,Tegla Loroupe,2003-06-07
30000,no,5903,female,road,Liliya Shobukhova,2011-10-09
42195,yes,8125,female,road,Paula Radcliffe,2003-04-13
100000,no,23591,female,road,Tomoe Abe,2000-06-25

—Run2011.R —

## read and transform data
Run2011 <- read.csv("Run2011.csv")
Run2011 <- transform(Run2011,
  date = as.Date(date),
  age = as.numeric(Sys.Date() - as.Date(date)) / 365.25,  # years since the record was set
  speed = 3.6 * distance/time)  # km/h (distance in m, time in s)

## plot full data
library("lattice")
xyplot(log10(speed) ~ log10(distance), groups = ~ gender, data = Run2011)

## similar to Savaglio and Carbone
panel_smooth <- function(x, y) {
  panel.xyplot(x, y)
  panel.loess(x, y, span = 1)
}
xyplot(log10(speed) ~ log10(time) | gender, data = Run2011,
  subset = distance %in% c(200, 400, 800, 1000, 1500, 1609.344, 3000, 5000, 10000, 42195),
  panel = panel_smooth)

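## first regression analysis (ignoring the changes in distance coefficient)
## (the lm() call was truncated in the post; the formula here is an assumed reconstruction)
m <- lm(log10(speed) ~ log10(distance) + gender, data = Run2011,
  subset = distance > 400 & distance < 100000)
summary(m)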
Xi'an on November 16, 2011 12:19 PM at 12:19 pm said:

Actually, Julyan has nicer graphs on a more recent post of his. It evaluates the density of the joint variable (difference from average time per km on the nth first km, rank in the category) using SAS KDE proc…
- Julyan on November 16, 2011 1:20 PM at 1:20 pm said:
  
  Xian, these ones are Jérôme’s :) I don’t know SAS actually, what is it?
Julyan on November 16, 2011 4:27 PM at 4:27 pm said:

Thank you, Andrew, for your interest! Thanks for pointing out the different slopes. I’ve worked out why they were less visible on my plot: exponents are more sensitive in the speed vs. time plane than in the time vs. distance plane. Details here: http://statisfaction.wordpress.com/2011/11/16/power-laws-choose-your-x-and-y-variables-carefully/
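One way to see the sensitivity point, as a sketch rather than a summary of the linked post: if time scales as distance^alpha, then speed scales as time^(1/alpha - 1). Two regimes whose alphas differ only slightly therefore have speed vs. time exponents that differ by a large factor. The alpha values below are hypothetical, chosen only to bracket the fitted 1.1, and the function name is mine.

```python
def speed_vs_time_exponent(alpha):
    """Exponent of speed as a power of time, given time ~ distance**alpha.

    distance ~ time**(1/alpha), so speed = distance/time ~ time**(1/alpha - 1).
    """
    return 1.0 / alpha - 1.0

# Hypothetical exponents for a short-distance and a long-distance regime.
alpha_short, alpha_long = 1.05, 1.15

# Relative change in the exponent when moving between regimes,
# seen in each plane.
ratio_time_distance = alpha_long / alpha_short
ratio_speed_time = (speed_vs_time_exponent(alpha_long)
                    / speed_vs_time_exponent(alpha_short))

print(f"time vs. distance exponents differ by a factor of {ratio_time_distance:.2f}")
print(f"speed vs. time exponents differ by a factor of {ratio_speed_time:.2f}")
```

A change in slope of about 10% in the time vs. distance plane becomes a change of more than a factor of two in the speed vs. time plane, which is consistent with the kink being easier to spot there.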
Tom on November 18, 2011 7:25 AM at 7:25 am said:

What about looking at the relationship between time, distance, speed, gender and age? For instance, the NYC Marathon posts the best times by age and gender. There must be similar sites for other events, maybe the Wikipedia site posted above has it. Where is the kink in the performance curve by age? How different are the best times that humans can produce by age by event? How different are men and women by age?
Daniele on November 24, 2011 5:42 AM at 5:42 am said:

This may be of some interest to test cycling and power:
http://connect.garmin.com/activity/116268248
Sandra Savaglio on December 15, 2011 10:18 AM at 10:18 am said:

Here is some more analysis we have done, considering longer distances and gender differences:

http://www.mpe.mpg.de/~savaglio/Sports%20Science_files/scaling_law.pdf

Sandra

Comments are closed.