Dow Jones probability calculation

Here’s a cute one for your intro probability class.

Karen Langley from the Wall Street Journal asks:

What is the probability of the Dow Jones Industrial Average closing unchanged from the day before, as it did yesterday?

To answer this question we need to know two things:
1. How much does the Dow Jones average typically vary from day to day?
2. How precisely is the average recorded?

I have no idea about item #1, so I asked Langley and she said that the average might move up or down by about 50 points in a given day. So take that as the range, from -50 to +50: if the average is rounded to the nearest integer, the probability of no change is approximately 1/100.

For item #2, I googled and found this news article that implies that the Dow Jones average is rounded to 2 decimal places (e.g., 27,691.49).

So then the probability of the Dow being unchanged to 2 decimal places is approximately 1/10000.

That’s a quick calculation. To do better, we’d want a better distribution of the day-to-day change. We could compute some quantiles and then fit a normal density, or even more simply compute the proportion of day-to-day changes that fall in the range [-10, 10] and divide that by 2000, the number of cent-sized steps in that range. It should be pretty easy to get this number.
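Here’s a quick sketch in R of what that calculation would look like. The data are simulated here (a made-up drift and a made-up 50-point daily spread), standing in for an actual series of day-to-day changes, so the numbers are illustrative only:

```r
## Sketch of the empirical estimate above, on simulated data.
## `change` stands in for actual day-to-day changes in the Dow.
set.seed(1)
change <- rnorm(2500, mean = 2, sd = 50)   # made-up drift and volatility

## Proportion of daily changes landing in [-10, 10]:
p_small <- mean(abs(change) <= 10)

## Spread that mass over the 2000 cent-sized steps in [-10, 10]
## to approximate the chance of a change of exactly 0.00:
p_flat <- p_small / 2000
c(p_small = p_small, p_flat = p_flat)
```

With a real series you’d replace the simulated `change` with `diff(close)` from downloaded data.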

Yet another complexity is that there’s a small number of stocks in the Dow Jones average, so it might be that not all prices are even possible. I don’t think that’s an issue, as it seems that each individual stock has a price down to the nearest cent, but maybe there’s something I’m missing here.

Another way to attack the problem is purely empirically. According to this link, the Dow being unchanged is a “once-in-a-decade event.” A year has approximately 250 business days, hence if it’s truly once in a decade, that’s a probability of 1/2500. In that case, my 1/10000 estimate is way too low. On the other hand, given that prices have been rising, the probability of an exact tie should be declining. So even if the probability was 1/2500 ten years ago, it could be lower now. Also, an approximate ten-year gap does not give a very precise estimate of the probability. All these numbers are in the same order of magnitude.

Anyway, this is a good example to demonstrate the empirical calculations of probabilities, similar to some of the examples in chapter 1 of BDA but with more detail. I prefer estimating the probability of a tied election, or maybe some sports example, but if you like stocks, you can take this one.

27 Comments

  1. John Hall says:

    The day-to-day differences in the DJIA have much different volatility over time than the day-to-day percent changes in the DJIA. Typically you would calculate the volatility from the log changes and then map that to changes in levels. Even ignoring the fact that the log changes in the DJIA do not exhibit constant volatility over time, the changes in levels are on another level. So the 50 might work today, but not in the past or in the future.
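For example, here is that mapping in a couple of lines of R (the 1% daily log-return volatility is a made-up round number, just to show the scaling):

```r
## The same log-return volatility implies very different point moves
## at different index levels (illustrative 1% daily volatility).
sd_log <- 0.01
levels <- c(1000, 10000, 28000)
sd_points <- levels * sd_log   # first-order map from log changes to levels
data.frame(index_level = levels, typical_daily_move = sd_points)
```

So a “50-point day” means something very different at Dow 1,000 than at Dow 28,000.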

    Here’s something that may also be an issue, but I’m not 100% sure. The DJIA is calculated as the sum of the prices of the underlying index components. Presumably, they are doing the rounding on these stocks first to get the closing prices and then they are aggregating up to the final index calculation. This implies that you actually need to estimate the multivariate distribution of those prices first.

    • Andrew says:

      John:

      1. Yes, the 50 is just a guess. I’d recommend computing some simple measure of volatility, for example the range of the central 1/3 of the probability distribution of day-to-day changes in the averages, each year, and then plotting these over time to get a sense of this. It should not be hard to get a reasonable empirical probability here.

      2. I discuss the discreteness issue in the 3rd-to-last paragraph of the above post. I agree that if they’re rounding the individual stock prices to the nearest dollar (rather than the nearest cent), this could restrict the possible fractional cents that can happen. It would be easy to check this by simply looking at past Dow Jones average numbers and seeing if all hundred values of the cents are possible. In any case, there’d be no need to estimate the multivariate distribution; you can just treat the distribution of the averages as an empirical quantity.

      • John Hall says:

        Andrew, thanks for the reply. On 1, the raw data in financial time series have distributions that change over time. Even the distribution of price changes in 1950-1975 will look very different from the distribution of price changes in 2000-2010. The simplest approximation of stock prices is geometric Brownian motion, under which the volatility of price changes depends on the level of the stock price. Attilio Meucci puts it as focusing on the invariants: focus on distributions of things that are constant over time, then convert that to the market price you care about. Much easier.

  2. Jonathan (another one) says:

    The current Dow is the sum of the prices of the 30 components (each of which is reported to the nearest penny) divided by 0.14744568353097, the so-called Dow divisor. The divisor changes every time a new stock enters the index (replacing an old one) or a stock in the index splits, and is recalibrated to give continuity on the day of the change. Note then that a one-penny change in any component price causes about a .07 change in the index. So the only way you can get no change is for the sum of prices on one day to be exactly the same as the sum of prices on the next day, irrespective of the fractional reporting of the index. Now the stocks are of very different nominal sizes, and their volatilities are different as well, so the probability of any given sized move is different for each stock. But since every price is quoted to the nearest penny, the sum of the 30 changes can always be exactly zero.
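The divisor arithmetic above is easy to check directly:

```r
## Effect on the index of a one-penny move in a single component,
## using the Dow divisor quoted above.
divisor <- 0.14744568353097
index_step <- 0.01 / divisor
index_step   # roughly .07 index points per penny
```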

    • I remember the good old days (about 1998) when I was writing a parser for the S&P Comstock ticker feed (delivered over a custom DSL line, I think) and all the prices were quoted in binary fractions: 1/4, 1/8, 1/16, etc.

      there were a bunch of special symbols for each possible fraction so as to compress the feed by using fewer bytes than it would take to write out things like 7/16, if I remember correctly.

      Sad to think that we had dual redundant Sun workstations dedicated to parsing and storing this feed and it could just barely keep up in real time. Pretty sure a Raspberry Pi could do it in the background while decoding a video today.

  3. I like the way this post approaches the problem from a variety of angles and doesn’t get hung up on excessive precision. Sometimes it’s necessary to do back of the envelope calculations and get order of magnitude estimates. It’s surprisingly valuable.

  4. zbicyclist says:

    I downloaded DJIA data for 8771 days (since 1/29/1985) from Yahoo finance. This has occurred 18 times. This makes the probability .002, or 1/500. The data series has two decimals all the way back.

    4/22/1985
    10/7/1986
    6/17/1987
    3/14/1989
    7/17/1990
    12/19/1990
    7/19/1991
    9/5/1991
    3/12/1992
    1/18/1994
    7/21/1995
    8/8/1995
    12/2/1996
    11/3/1998
    12/24/2001
    4/24/2014
    11/27/2015
    11/12/2019
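Taking these counts at face value, the implied empirical rate, with a rough 95% binomial interval (normal approximation), is:

```r
## Empirical flat-day rate implied by the counts above.
n_days <- 8771
n_flat <- 18
p_hat  <- n_flat / n_days
se     <- sqrt(p_hat * (1 - p_hat) / n_days)
round(c(estimate = p_hat, lower = p_hat - 1.96 * se, upper = p_hat + 1.96 * se), 5)
```

so the data are consistent with anything from roughly 1/900 to 1/300.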

    • hmmm… about once every year from 1985 to 2001 and then a 12.5 year gap from 12/24/2001 to 4/24/2014

      Obviously as the index increases exponentially while the precision stays to 2 decimal places, the prices are quoted to an increasing relative precision. You expect the events to become less common, but a sudden gap of that length seems suspicious. Not out of the question, but suspicious.

      Now I’m going to simulate a geometric brownian motion and see what the distribution of events looks like…

      • library(ggplot2)

        ## how often does a stock index close the same when rounded to 2
        ## decimal places? Start with a geometric brownian motion type
        ## simulation, round the data and see what happens over the long haul

        initprice=100.0
        logvol = log(1+100/20000) # about 0.5% daily log volatility
        tdays = 5/7*365 # about 260 trading days per year
        growth = .12 #per year
        lx0 = log(initprice)
        lx = rnorm(round(tdays*400),0,logvol)
        drift = log(1+growth/tdays)
        lxsum = cumsum(c(lx0,drift+lx))
        closes = exp(lxsum)
        dat=data.frame(t=1:NROW(closes),close=closes)
        ggplot(data=dat,aes(t/tdays,closes))+geom_line()
        rcloses = round(closes,digits=2)
        rclosediff = diff(rcloses)

        whichtimes = which(rclosediff==0)

        ggplot(data=data.frame(x=diff(whichtimes)),aes(x/tdays))+geom_density()+labs(title="Distribution of times between unchanged days",x="Time in Days/TradingDaysPerYear")

        So from that density plot at the end, it looks like you’d expect an unchanged close around every 1 to 5 years, and a 12+ year gap is a real outlier!

    • A different point of view says:

      My hypothesis is that the lower index values of the past came with smaller absolute deviations: for the same percentage change, the absolute move in the price is smaller when the price itself is lower. Whether stocks move by percentages or by absolute amounts is purely conjectural, though; I suspect it is an interplay of both. This may be the reason this phenomenon has occurred more often in the past.

      Market psychologists would have studied this phenomenon: Whether the markets look for “good numbers” or “good percentages” when making choices.

      As a trader I have looked for good numbers in the past, but, ironically, an historical analysis is always based on the percentage changes. Whether what rules markets obey is open to question…

    • A different point of view says:

      I would say that there was a smaller range of daily prices in the past (as price points were lower). Lower index values, and lower daily ranges, in the past would imply fewer numbers for traders to “hit”. Hence, perhaps, that is why the event has happened with more frequency in the past.

    • David says:

      This is interesting.

      I would imagine that the chance of this event happening would increase as you go further back in time.

      I would reason as follows:

      In the long-term the values of stocks increase.

      In the past we are dealing with smaller numbers. If these numbers move by a certain percentage daily (or whatever the probability distribution of the daily returns may be), then the sample space of attainable prices is smaller, so the chance of hitting any particular number in that space is greater, generally speaking. That would imply that the chance of the closing price exactly matching the previous close would be higher when there are fewer numbers in the set. I say generally speaking because the actual situation is more complex.
      Therefore, if we were to draw a trend line over time of this event happening it would have a negative slope.

    • David says:

      Looking at this data, it is curious that the index remained unchanged 4 times between 1985 and 1989 and 10 times between 1990 and 1999. That works out to roughly one flat day per year in both periods.

      This sounds contrary to the hypothesis that, generally speaking, flat days would become uncommon as the values of the stocks increase.

      But, I suppose, there is a second factor operating here. In the 1980s, the stock market roughly quadrupled; this was a period of great volatility in the stock market. If the average daily volatility is higher, or much higher in this case, the chance of having a flat day would decrease (since the range of plausible closing prices widens even though the price levels are lower). The 1990s were a relatively less volatile period for the stock markets.

      This paper says that the 80s were a period of great volatility.

      https://www.jstor.org/stable/2632471?seq=1#page_scan_tab_contents

      It would be interesting to go back further in time and see if the regression line indeed has a negative slope.

      Another important point to note is that there was a recession in 2001, followed by the great recession of 2007. The latter lasted for years, and stock prices were very volatile during both the early and later parts of the first decade of the 21st century. I suggest the higher volatility made a flat day far less likely between 2001 and 2014.

    • David says:

      This is interesting:

      https://www.fool.com/investing/general/2016/02/22/is-todays-market-more-volatile-than-in-the-past.aspx

      The gap in the first decade of the 21st century can be explained by the fact that the daily volatility was very high in the 2000s.

  5. zbicyclist says:

    Note that until April 9, 2001, prices for individual stocks were quoted in fractions (sixteenths), as Daniel Lakeland alluded to above. At that point, stocks changed to decimal pricing. I note that 14 of the instances above occurred in the 16 years 1985-2000 and only 4 in the 19 years 2001-2019.

    That 11/27/2015 date I listed above looks suspicious to me. Not only is the adjusted close the same as the previous day’s (which is 11/25/2015, due to Thanksgiving), the volume is the same. (The Open, High, and Low are different, but it doesn’t make sense that both the close and the volume would be the same.)

  6. zbicyclist says:

    Although all the data prior to April 9, 2001 have a decimal close, there are definite spikes at .00, .10, .20 etc which occur about 2% of the time, roughly twice as often as the expected 1%. There’s no such blindingly obvious pattern since.
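Here’s a quick way to check for that kind of spike (a sketch; `close` is simulated here, so by construction it shows no rounding bias, but with the real Yahoo series you’d read in the actual closes instead):

```r
## Share of closes whose cents value is a multiple of 10 (.00, .10, ...).
## With no rounding bias this should be close to 10%.
set.seed(2)
close <- round(cumsum(c(1000, rnorm(5000, 0, 1))), 2)
cents <- round(100 * (close %% 1))     # cents part of each close, 0..99
round_share <- mean(cents %% 10 == 0)
round_share
```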

    Enough. Signing off now.

  7. As the Dow index goes up in value, the odds of an unchanged close go down. The odds also decreased markedly when stocks went to penny increments (as noted above, in 2001). Looking at zbicyclist’s list, we see 15 zero-change days in the first 20 years or so, and just 3 in the last 15 years (because of a gap in the event, it is 3 for any period from 6 years to almost 18 years).

    There are an average of 253 trading days per year. So at 15 years we have seen about one in 3800 days.

    The standard deviation of daily Dow changes is way more than 50: It is 242.1 for the last year.
    The distribution of daily changes looks fairly normal in a histogram, although it is somewhat left-skewed and has positive kurtosis.

    As noted above (by Jonathan), the index moves in increments of about .07.

    Calculating in either of two ways (below), we have an expected probability of 0.000115, or one in 8670.

    SD = sd(df$Change)

    > 0.5 - pnorm(.07, mean=0, sd=SD, lower.tail=FALSE)
    [1] 0.0001153502
    > dnorm(0, mean=0, sd=SD) * .07
    [1] 0.0001153502

    So we have seen these events at about double the expected (from SD) rate. This may be random chance, or it may be due to a modest tendency to have rounded closing prices (to .1 or .05 instead of .01).

    • My simulation of a geometric Brownian motion rounded to 2 digits suggests that you should see this condition every few years or so (averaged over 400 years of simulation including a 12% annual growth rate). I reran it with a different volatility, similar in size to your 242.1 here. The 12-year gap in zbicyclist’s data is *very weird* under this class of model. Do you have any idea why it occurred?

    • Carlos Ungil says:

      I agree with this analysis the most. Ignoring pre-2001 levels (i.e. before the underlying stocks traded in cents) the 12-month rolling standard deviation of daily changes in the index varies between 63 and 286 points (and is now in the high end of the range). The distribution is approximately normal and while it’s not centered the bias is a fraction of the standard deviation (using 12-month rolling statistics again, the drift is between -22 and +26 points per day, between -0.12 and +0.27 times the standard deviation), so we can assume that the probability of the daily change is more or less constant in the region of interest (well within a 5% tolerance).

      For the 63 or 286 standard deviation bounds above we expect the daily change to be within a 1 point interval around 0 with probability 0.63% or 0.14%. This is lower than the 1% assumed by Andrew (by a factor of 6 using the volatility over the last year). However, his second error brings him close to the right solution! Not all the cents between -0.50 and 0.50 are equally likely. The index divisor has ranged between 0.12 and 0.16, which means that the minimum movement in the index has ranged between 0.06 and 0.08 points. Taking 0.07 there are 14 possible values in the 1 point interval so the probability of a flat day is in the 0.01% – 0.05% region. Once every 9-38 years. Interestingly, my range goes from Andrew’s estimate to the once-in-a-decade estimate.
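In code, the two bounds above work out as follows (taking the 0.07 step, so roughly 14 attainable index values per point, as in the reasoning above):

```r
## Probability of a flat day: normal mass within half a point of zero,
## divided by the ~14 attainable index values per 1-point interval.
flat_prob <- function(sd_daily) {
  p_window <- pnorm(0.5, 0, sd_daily) - pnorm(-0.5, 0, sd_daily)
  p_window / 14
}
sapply(c(63, 286), flat_prob)               # probabilities at the two sd bounds
1 / (253 * sapply(c(63, 286), flat_prob))   # implied years between flat days
```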

      Looking at the actual changes since 2001, there have been 3 flat days. This is a relatively high count, because there have been only a few “one-increment-away” days (3 below, at -0.07, -0.07, and -0.08, but 0 above). And there was only 1 “two-increments-away” day (+0.15). In total 36 days between -1 and +1 points out of around 4900 days, giving a probability of being in a 1-point interval of 0.36%, which is quite in line with the 0.14%-0.63% estimate above. We would expect then only one of those 36 days to be flat (36 occurrences in a 2-point interval / 28 steps in that interval), but three is not so unlikely. (I agree that the prices of the constituents may have a bias towards rounded figures, but I think the effect will be marginal.)

    • I made a simple error in my long post above. There would have been 3 unchanged closes in the last 3800 days, not 1 in 3800 days.

      Given that the 11/27/2015 unchanged close in the Yahoo Finance data was apparently faulty, we now have 2 unchanged closes in 3800 days, or 3 unchanged closes from 4/9/2001 through 11/13/2019 (this seems the least arbitrary period), so 3 in 4680 days, or 1 in 1560.

      The Dow Divisor was 0.144521 on 1/1/2002 and 0.147445 recently, so it hasn’t changed much, and I will ignore this change. But point volatility has tended to go up with share prices. While there was an October 2008 peak of volatility, most of the period before and since has had lower volatility than the past 12 months. The standard deviation of daily change for the whole 4/9/2001 through 11/13/2019 period has been 142.

      As the index moves in increments of about .07, we have an expected probability of an unchanged day of 0.000196, or one in 5087 days:

      > SD = sd(df$Change)
      > SD
      [1] 142.0487
      > dnorm(0, mean=0, sd=SD) * .07
      [1] 0.0001965943
      > 1 / (dnorm(0, mean=0, sd=SD) * .07)
      [1] 5086.617

      Given a Poisson distribution where each day has a 1 / 5087 chance of unchanged close, the odds of having at least 3 unchanged closes in 4680 days is just 6.6%:

      > 1 – ppois(2, 4680/5087)
      [1] 0.06618886

      So either we have a somewhat unlikely number of unchanged closes, or there is some stickiness factor or tendency for stocks to more closely correlate when the market is approximately flat.

  8. zbicyclist says:

    Sometimes the longer you look at data, the more curious it seems (Alice would say “curiouser and curiouser”).

    This is the distribution of DJIA changes that are near zero (to two decimal places). Note there’s a spike at exactly 0 (18 cases), but only scattered single observations nearby, and certainly no other spikes.

    (0.22) 1
    (0.19) 1
    (0.19) 2
    (0.18) 1
    (0.18) 2
    (0.14) 1
    (0.12) 1
    (0.11) 2
    (0.08) 1
    (0.07) 1
    (0.07) 1
    0.00 18
    0.12 1
    0.14 1
    0.15 1
    0.17 1
    0.19 1
    0.22 3

    That oddity will be tough to model :)

    • Is there also a spike at 0.5 and 0.25 ? If you graph the size of these near-zero differences through time, are they different in different eras, like before the switch to decimal stock prices in 2001 vs after?

      I wonder if this isn’t really a data quality issue in terms of Yahoo’s source data, which might have rounded to the nearest integer or half-integer at some point for example.

      • zbicyclist says:

        Daniel, you are on to something. I didn’t find a spike at .25, but note the spikes at -3.50, +.50, +3.75, +9.50 and 15.00.

        Only changes repeated 4 or more times are shown below. Blog comments are really not ideal for showing data.

        Note also that these clusters are much more likely to happen during the fractional price era (1/16th, etc.).

        Change   Decimal pricing   Fractional pricing   Total
        (4.71)         0                  5                5
        (3.50)         0                  4                4
        (1.49)         0                  4                4
         0.00          4                 14               18
         0.28          0                  4                4
         0.50          0                  6                6
         3.75          0                  6                6
         9.50          0                  4                4
        15.00          1                  3                4

    • Terry says:

      Was the stock exchange closed on some of the zero-change days? Perhaps the database reported the previous day’s closing prices for the closed day.

      The NYSE has had a fair number of unplanned closures, e.g., on July 14, 1977, it was closed due to a blackout in New York City.

      Here is a historical list of closures: https://s3.amazonaws.com/armstrongeconomics-wp/2013/07/NYSE-Closings.pdf

  9. Jonathan (another one) says:

    Langley published her article, without Andrew’s help. https://www.wsj.com/articles/when-the-dow-closed-at-27691-4854488934just-like-the-day-before-11573748858?mod=hp_featst_pos3 Salil Mehta did what I probably would have done, using volatility estimates for each of the 30 components, and comes up with an estimate of 1/2000, but admits that that’s an approximation as well. I did like the line: “Today the Dow finished sharply unchanged.”

    • Carlos Ungil says:

      “Salil Mehta, an independent statistical and risk consultant, analyzed each Dow component’s daily volatility as of late to evaluate the odds of an unchanged day. Using that method, he estimates the odds of such an event at less than 1 in 2,000 any given day, though he cautioned that arriving at the true probability is more complicated.”

      That’s all the explanation given in the article. I don’t think looking at the 30 correlated volatilities and using them to calculate the volatility of the sum adds anything to the much simpler analysis of the index changes once the divisor is properly accounted for. The 1/2000 probability seems a bit too high to me, but to be fair he says “less than,” and it is not bad as an upper bound.
