More on martingale property of probabilistic forecasts and some other issues with our election model

Edward Yu writes:

I’m wondering if you’ve seen Nassim Taleb’s article arguing that we should price election forecasts as binary options. You seem to be generally fine with this approach, as when Nate Silver asked your colleague:

On the off-chance our respective employers would allow it, which they almost certainly wouldn’t in my case, could I get some Trump wins the popular vote action from you at 100:1? Or even say 40:1?

You did not critique the assumption that your predictions could be interpreted as betting odds, but instead your reply was:

I wouldn’t be inclined to bet 99-1 on Biden winning the national vote, and apparently Elliott wouldn’t either?

But OK, if we’re interpreting your model’s predictions as options or betting odds, then Taleb’s critique makes sense and we must factor in the possibility of game changers occurring in the period between your prediction date and the election. I argue that you should either document that your predictions are not interpretable as betting odds but instead answer the question “Who would win a hypothetical election held today?”, or adjust the model to account for future uncertainty.

My [Yu’s] full thoughts are in an article here.

My reply:

My colleagues and I discussed some of these issues in sections 1.6 and 2.6 of this article.

Regarding Nate: I have no reason to think that Nate is “clueless” about probability, but it’s my impression that Nate is following journalistic, rather than academic rules, in the sense that, when a problem is revealed with his forecast, his inclination seems to be to ignore it rather than investigate it deeply.

Regarding your final point: No, our model is very explicitly a forecast of public opinion on the day of the election. We do address the question of opinion today, but our main forecast is of opinion on election day itself.

Relatedly, commenter Fogpine points out some problems with our model when fit to past elections: apparently the tails of our state-level predictions are too narrow, in that some events with stated 0.001 probability have been happening. I can believe it. In evaluating our model on past and current data, we’ve been looking a lot at national electoral college and popular vote predictions but not so much at tail probabilities for individual states. When tweaking the model a few months ago, we tried some longer-tailed distributions but they didn’t do much to the forecast of who would win nationally or by state, so we didn’t pursue it. But in retrospect, our evaluations were too focused on winners and losers, and we weren’t fully making use of information about the continuous vote outcomes. I don’t think we have any plans to change the model before the election, but it makes sense for us to go back and look more carefully at these extreme events. We wouldn’t go so far as to put Trump winning California within “the range of scenarios our model thinks is possible,” but some slightly wider tails would make sense.

124 thoughts on “More on martingale property of probabilistic forecasts and some other issues with our election model”

  1. No real comment on the merits of your model vs. Silver’s, but in case you’re wondering if Taleb’s criticism has any substance, you can stop wondering. His argument is incoherent nonsense, as I explain in my response paper here:
    https://arxiv.org/pdf/1907.01576.pdf

    The main points:
    1) Nothing is gained by thinking of forecast probabilities as binary option prices, and doing so only confuses the issue. For example, as you mention in your paper, forecast probabilities must be martingales if they’re calculated with Bayesian updating. Option prices in the real world are not martingales because of the existence of risk premia. He uses a martingale model in the risk-neutral measure to price the options (standard financial math technique) but needn’t have bothered.
    2) His intuition that option prices/forecasts should price in more future volatility than they typically do may have some basis in fact, but he claims that if forecasts did so they would, counterintuitively, be less volatile. So the fact that, say, 538’s forecasts bounce around a bunch means they’re not properly accounting for future volatility. This intuition is wrong. If the underlying “asset,” meaning public opinion, were more volatile, and the prices correctly anticipated the future volatility, the effects would actually cancel. Bottom line: there’s no way to tell if 538’s forecasts include too much or too little volatility.
    3) His choice of pricing model is completely arbitrary and silly, chosen out of thin air because it produced a bounded process for which he knew how to compute prices. Absolutely no justification is given for why it’s more reasonable than any other model, and in fact it produces the same dynamic behavior as the 538 forecasts.

    • Aubrey – we’ve responded to you in QF (https://www.tandfonline.com/doi/abs/10.1080/14697688.2019.1639803) before, but I did want to dig into this statement.

      “there’s no way to tell if 538’s forecasts include too much or too little volatility.”

      That was exactly the volatility trade in our response. There seem to be a few issues in the language we use – so let me try a statistician’s. We seem to disagree on ex-ante analyses of a forecast, but you seem to also say there are no ex-post tests for martingality of a time series?

      • Yes, I was suggesting we can simply check FiveThirtyEight’s time series of predictions for martingality. One easy check: is the difference sequence stationary?

        • What does it even mean for a vote prediction to be a martingale? We have exactly one observation of the vote… so what are we observing? The prediction itself? But the prediction isn’t a number, it’s a posterior distribution. If you’re publishing the expected value, say, of the vote split, then obviously these are all expected values, but they’re expected values from a different measure each time (since we have new information each time).

          I think this is a confusion between Frequentist/mathematical probabilists and applied Bayesian modelers… Frequentists are always thinking about repeated realizations… in this case there is exactly one election… so what are we “realizing”?

        • The time series of the probability of any one candidate (of two, let’s say) winning is what should be a martingale. I think Aubrey can even agree to that! That time series can then be analyzed for martingality independent of the final outcome.

        • So if someone has a Bayesian model, and always publishes the probability of the candidate to win under this model, then the published value is always the expected value of the WIN variable under the current measure… so it’s trivially a martingale with respect to that measure.

          Martingality isn’t a property of observed predictions, it’s a property of the measure… different people can have different measures based on different models. Nate Silver’s predictions can be a martingale with respect to Nate’s model while being not a martingale with respect to Andrew’s model. You can’t test a sequence of predictions to see if they are a martingale.
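
The “trivially a martingale” claim above is the tower property of conditional expectation. A short sketch, in notation that is assumed here rather than taken from the thread: W is the indicator that the candidate wins, \mathcal{F}_t is the forecaster’s information at time t, and p_t is the published probability under the forecaster’s own measure P.

```latex
\begin{align*}
p_t &= \mathbb{E}_P[\,W \mid \mathcal{F}_t\,], \qquad \mathcal{F}_t \subseteq \mathcal{F}_{t+1},\\
\mathbb{E}_P[\,p_{t+1} \mid \mathcal{F}_t\,]
  &= \mathbb{E}_P\big[\,\mathbb{E}_P[\,W \mid \mathcal{F}_{t+1}\,] \,\big|\, \mathcal{F}_t\,\big]
   = \mathbb{E}_P[\,W \mid \mathcal{F}_t\,] = p_t .
\end{align*}
```

Both expectations use the same P and the same filtration, which is why the identity says nothing about an observer who holds a different measure or a smaller information set.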

        • I believe it’s acknowledged that well-formed Bayesian models are trivially martingales; I think the real contention is that many forecasts aren’t really doing Bayesian updating. Examples of naive strategies that don’t produce martingales, which I expect people are following, are:

          * Regressing historical elections vs polls, GDP, and unemployment, then plugging in those numbers over the course of the campaign season
          * Forecasting current public opinion using polls + MrP to produce a “NowCast” for a hypothetical election today, then applying a shrinkage parameter towards 0.5 to account for uncertainty leading up to November

          In that respect I think the martingale criticism is great! It shows clearly why these strategies are bad in a way that gets down to decision-theoretic fundamentals, it’s grounded in the basic philosophy of interpreting probabilities. Once people start evaluating forecasts, there’s a kind of probability bullshitting people can do along the lines of “well, 1 percent isn’t zero, so I’m not technically wrong” that this cuts through.

          What annoys me is that neither paper will provide a concrete example of which bad, non-martingale forecasts they’re talking about. They’re clearly related to some Twitter fights about 538, but then in the paper they say things like “There is no mention of FiveThirtyEight in Taleb (2018), and Clayton must be confusing scientific papers with Twitter debates.” I guess maybe because we don’t actually know precisely what 538 is doing, and can only infer their methodology from the forecast outputs, they feel the need to carefully circumscribe their claims? But here we are, clearly talking about Nathaniel again. Is this about 538 or not?

        • Somebody:

          Fivethirtyeight’s forecast is clearly not using Bayesian updating so we would not expect it to follow the martingale property. Whether a forecast can be useful as betting odds if it does not follow the martingale property, that’s another question. Even a fully Bayesian forecast won’t be perfect, as it’s only conditional on the information it includes. A major news event won’t change the forecast until it gets reflected in the polls. So we’re always dealing with degrees of imperfection.

        • @somebody, but then what does it mean

          “One easy check: is the difference sequence stationary?”

          This isn’t a necessary property of Bayesian updating.

          The assumptions here are much stronger: that there’s some kind of “predictor die” that 538 is rolling and it gives IID (stationary) increments each time.

          The point isn’t predicting the future prediction from the past predictions, it’s predicting the future VOTE from all available information today.

        • @Andrew

          Right, but it does seem like Nathaniel in 2020 has smoothed it over with a cubic temporal drift uncertainty. That seems fine and comparable to other strategies that have been discussed (linear increase in variance over time, constrained random walk) but it does seem like the motivating factor for that, and for the original Taleb paper, is the large and obvious periodicity in the 2016 forecast.

          Except, I don’t know if that’s the real rationale. Nathaniel’s posts and twitter seem to be actively dismissing that anything was wrong with the 2016 model and take the position that “we’re doing extra work because this one is extra important.” Taleb and Madeka said explicitly that Nathaniel wasn’t the subject, but then keep talking about him anyways. Including rather than obfuscating the motivating examples makes the math much easier to follow–I like to try and follow the line of reasoning back to the observables as a sanity check. I’m not exactly a fan of dry academic passive-aggressive obscurantism either, but I feel like all the bluster has made this conversation hard to follow.

          @Daniel
          Ed cited this
          http://www.planchet.net/EXT/ISFA/1226.nsf/9c8e3fd4d8874d60c1257052003eced6/35822efeb009804cc1257afe006b0063/$FILE/11park.pdf
          which I haven’t read. My quantitative finance is pretty weak, but my best guess is they’re computing the equivalent of a test statistic, something like the probability of realizing some square magnitude of probability drift over time given that the generating sequence is a martingale

        • I think the biggest problem is failure to acknowledge that tomorrow’s prediction of win probability is not just dependent on all the “observed predictions so far”. In fact, it’s not dependent on the observed predictions so far *at all*. It’s dependent on the full information set of the predictors so far.

          Gelman and Nate and so forth are not predicting the future values of the predictors (i.e., what will be the polling tomorrow) though they may have some predictions for the fundamentals.

          This is in contrast to the typical situation with say stock prices, where the theory often just says that the future log stock price will be the current log stock price plus a normally distributed random number with mean 0.

          Anyone with “real” information about the stock (like information about a new product they’ll be releasing soon, or information about fraud committed by the CEO or something) will NOT have martingale predictions for future stock prices, and SHOULD NOT.

        • I should have said, the informed person’s predictions are not a martingale given *only* the history of the actual prices; they require the inside information to make them a martingale.

          And if the inside information is hidden and unobserved and varying, then there’s no way for an external person to verify whether the informed person’s predictions are martingales with respect to their measure or not.

        • > I think the biggest problem is failure to acknowledge that tomorrows prediction of win probability is not just dependent on all the “observed predictions so far”. In fact, it’s not dependent on the observed predictions so far *at all*. It’s dependent on the full information set of the predictors so far.

          The point isn’t to evaluate predictive performance of past predictors on future predictors, but rather that predictability in the sequence of predictors indicates failing to notice a trend in the full information set. In the language familiar to this blog, if things don’t look right and you change your prior distributions, it just means that you had prior information that wasn’t incorporated right the first time. In this case, not looking right means oscillations and the prior is the structure of the model and estimation procedure itself.

          The point of interest here is that it was already knowable from data that polling at 80% probability 10 months out isn’t more informative than polling at 60% probability 1 month out and, according to Taleb, many unnamed pollsters were not correcting for that foreknowledge. The argument, I think, is based on the large sinusoidal oscillations in 538 predictions over time, which should be unlikely with a well specified model of all uncertainty. I think an equivalent Bayesian argument can be invoked without betting: if you presuppose that national opinion *does* drift stochastically but use a “nowcast” without any drift model, you get underestimated variance. The arbitrage conditions and martingale arguments about the sequence of forecast realizations are ways of showing how underestimated uncertainty violates rational probabilistic reasoning in terms of Savage-style philosophical foundations that are intuitive to them as quants: dollars and cents.
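
The “nowcast without a drift model” point can be illustrated with a small calculation. This is only a sketch under assumed numbers: the 3-point lead, the 2-point polling error, the 0.3-point daily drift, and the helper `win_probability` are all hypothetical, and the driftless random walk is just one simple choice of drift model.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_probability(lead: float, poll_sd: float,
                    drift_sd_per_day: float, days_left: int) -> float:
    """P(election-day margin > 0) when opinion follows a driftless random walk.

    The variance of the election-day margin is the polling variance plus
    days_left times the daily drift variance.
    """
    total_sd = math.sqrt(poll_sd ** 2 + days_left * drift_sd_per_day ** 2)
    return normal_cdf(lead / total_sd)


# Hypothetical inputs, chosen only for illustration.
lead, poll_sd, drift_sd = 3.0, 2.0, 0.3

# A nowcast ignores future drift entirely (drift variance set to zero).
nowcast = win_probability(lead, poll_sd, drift_sd_per_day=0.0, days_left=100)
forecast_100 = win_probability(lead, poll_sd, drift_sd, days_left=100)
forecast_10 = win_probability(lead, poll_sd, drift_sd, days_left=10)

print(f"nowcast (no drift model): {nowcast:.3f}")
print(f"forecast, 100 days out:   {forecast_100:.3f}")
print(f"forecast, 10 days out:    {forecast_10:.3f}")
```

With the same lead, the drift-aware forecast is less confident the further out it is made, whereas the nowcast reports the same probability at any horizon.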

        • +1 to Dan Lakeland’s comments, especially ‘martingality isn’t a property of observed predictions it’s a property of the measure… different people can have different measures based on different models.’

        • > you can’t test a sequence of predictions to see if they are a martingale

          However, you can do a test to see if these predictions come from a model of the world that looks compatible with observed reality, in the same way that you can test a set of predictions to see how accurate they are in aggregate. For independent forecasts, there are statistical measures that can be used to check if the model is well calibrated.

          For a sequence of predictions, we know it is expected to behave in some way if the model is right. We can do a statistical test to check how well the sequence of predictions behaves. For example, if my prediction jumps every few days from 1% to 99% and back it may be a martingale when expectations are calculated using my model but my model looks wrong and doesn’t seem to reflect real-world expectations.
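
One rough way to make “how well the sequence behaves” concrete is to look at the correlation between successive forecast changes. Under the forecaster’s own measure, martingale increments are uncorrelated, so a strong correlation is a warning sign; a near-zero value proves nothing. The sketch below is an illustration only: the Brownian forecast, the damped sinusoid, and the helper `lag1_autocorr` are assumptions, and the martingale case pools 2000 simulated paths, a luxury that a single election does not offer.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
normal_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))


def lag1_autocorr(increments: np.ndarray) -> float:
    """Pooled correlation between each increment and the next one (along axis 1)."""
    x = increments[:, :-1].ravel()
    y = increments[:, 1:].ravel()
    return float(np.corrcoef(x, y)[0, 1])


# (a) Forecasts that are a martingale by construction:
#     P(W_T > 0 | W_t) for a standard Brownian motion W.
n_paths, n_steps, T = 2000, 50, 1.0
t = np.arange(n_steps) * (T / n_steps)          # 0, 0.02, ..., 0.98 (stops before T)
dW = rng.standard_normal((n_paths, n_steps - 1)) * math.sqrt(T / n_steps)
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)
martingale_forecasts = normal_cdf(W / np.sqrt(T - t))

# (b) A predictably oscillating forecast (hypothetical damped sinusoid).
days = np.arange(100)
swingy = 0.6 + 0.39 * np.exp(-days / 40.0) * np.cos(2 * np.pi * days / 40.0)

rho_martingale = lag1_autocorr(np.diff(martingale_forecasts, axis=1))
rho_swingy = lag1_autocorr(np.diff(swingy)[None, :])

print(f"lag-1 autocorrelation of changes, martingale forecasts: {rho_martingale:+.3f}")
print(f"lag-1 autocorrelation of changes, oscillating forecast: {rho_swingy:+.3f}")
```

The check is always relative to the observer’s assumptions about what information the forecaster had, which is the objection raised elsewhere in this thread.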

        • They’re saying that the sequence of predictions themselves should satisfy the martingale property since you can buy and sell bets at various points in time using prediction odds. If the sequence of predictions is itself predictably swingy, which they believe Nathaniel’s model is, you can arbitrage it.

          To construct a faux example with the undesirable properties, suppose I make a prediction every day of the probability of Biden winning for 100 days before the election, and the prediction oscillates as a bounded dampened sinusoid to 0.6 for Biden on the day before the election. Even if my predictions converge to a good, calibrated probability that perform well in backtests, a betting man can arbitrage it. When my predictions reach 99% 100 days out, a trader will know that my prediction will almost certainly go back down, so they can buy a large bet against Biden priced using my odds really early on, then wait for a quarter period to sell the bet for a profit.
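
The faux example can be written out numerically. This is a sketch with made-up shape parameters (a 40-day period and a 40-day decay constant), not a description of any real forecast; the only features taken from the comment are the 0.99 starting value, the oscillation around 0.6, and the quarter-period holding time.

```python
import numpy as np

days = np.arange(100)                   # day 0 is 100 days before the election
period, decay = 40.0, 40.0              # hypothetical shape parameters
forecast = 0.6 + 0.39 * np.exp(-days / decay) * np.cos(2 * np.pi * days / period)

# A contract that pays 1 if Biden loses is priced at 1 - forecast.
against_price = 1.0 - forecast

buy_day = 0                             # forecast sits at its 0.99 peak
sell_day = buy_day + int(period // 4)   # a quarter period later
profit_per_contract = against_price[sell_day] - against_price[buy_day]

print(f"buy  on day {buy_day:2d}: forecast {forecast[buy_day]:.2f}, "
      f"price {against_price[buy_day]:.2f}")
print(f"sell on day {sell_day:2d}: forecast {forecast[sell_day]:.2f}, "
      f"price {against_price[sell_day]:.2f}")
print(f"profit per contract: {profit_per_contract:.2f}")
```

The profit is realized before election day and does not depend on who wins. It exists only because the trader can predict the forecaster’s next move, which is exactly what a martingale forecast rules out.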

        • This is reasonable, but a strawman.

          Yes, 99% 100 days out under the election circumstances we’ve observed would surely look arbitrageable. Now if we’re talking Putin or Hussein’s election, not so much (if anything, it would’ve been arbitrageable in the other direction).

          But 538 never predicted 99% at 100 days out. In fact, they never went above 85% for Clinton. And they never even put it as high as 80% at 100+ days out.

          Does 2016’s periodicity seem unlikely for a non-arbitrageable model? Yes. Does it seem impossible? No, of course not. I don’t have the statistical tools in my toolbox to test the likelihood that a non-arbitrageable model could have produced 538’s prediction set — in fact, I don’t even know enough stats to say whether such a tool exists.

          Taleb makes the similar strawman argument in this video: https://youtu.be/YRvPF__du9w , at one point discussing how a 0% prediction for one side would be clearly arbitrageable. Well, no kidding — but no one is making that sort of prediction.

        • If there’s a probability model used anywhere at all in a prediction scheme and if it is a model that has *any* grounding in the world at all, then it must be grounded somehow or other to someone’s choice (and model thereof) of a class (or classes) of relevant evidence. To the extent that the “reference class” is described in probabilistic terms, the probability nature of the reference class propagates downstream into the predictions made conditional upon that class. The “repeated realizations” are implicit in any probabilistic model to the extent that the model really does stand upon some reference class or states-of-the-world which is posited to contain the evidence supporting the model. My “beliefs” about the matter at hand, if they are at all rational, indeed must be anchored in my experience (first, second, third hand … etc); and my experience, if it be robust enough on which to support a useful model, must comprise histories or collections of events, phenomena, situations. I.e. classes of evidence used ceteris-paribus. That *classes* are at bottom the root of a rationally devised probability model, irrespective of whether one prefers the adjective Bayesian or not, is the reason why “repeated realizations” cannot be swept away. Probabilistic modeling is forever supervened under the larger head of “induction”: that is, learning from experience.

        • That’s all fine, but it doesn’t change the fact that a sequence of predictions can be trivially a martingale with respect to one measure, and not a martingale with respect to another.

          You can’t look at a sequence of numbers and determine “if it is a martingale”

        • The polls are mean reverting (see here) but the forecast should not be. The expected mean reversion should be accounted for in the forecast (which is what Bayesian inference does).

        • I think most of my confusion comes from the difference between a Bayesian Martingale concept and a Frequentist martingale concept.

          Suppose I have a presidential election model. It includes in the model a piece of space junk which I am taking very careful measurements of. I have every reason to believe it will hit the earth in North America Oct 25th say and wipe a couple states off the map. Of course as an evil genius I have decided to hang out in my lair in Australia and get rich betting on prediction markets…

          Now each morning I have a supercomputer cluster projecting the trajectory forward in time and figuring out which states will be obliterated. The orbit of this junk is chaotic, and so my estimate of win probability is quite volatile. However, according to my probability measure, the probability to win is always the expected value **under my measure** of the variable “Trump wins the electoral college”.

          Unbeknownst to me, Vladimir Putin in his Bat Cave has been rigging election computers so that key states who have still not got auditable paper trails will fall to his intelligence asset Trump… furthermore he knows about the space junk and brings much more serious radiotelescope measurements to bear. To him his prediction is Trump wins 100% every day til the election

          Now, do you deny that my predictions are a martingale, since they are based on the expected value of the future event under my measure?

          Do you deny that Putin, who knows the outcome with certainty, can arbitrage me, and everyone else?

        • Daniel, it seems to me that Taleb and Madeka are getting a lot of mileage out of a mean-reversion intuition or assumption or result. I can’t quite tell which it is, tbh. Andrew appears to disagree and locates the mean-reversion in polls (i.e. the conditioned on predictors) rather than the forecasts per se.

          What’s interesting is how stable this year has been by contrast to 2016. For instance Biden/Trump started close to 50-50 in Gelman model, and then diverged north of 75/25 and haven’t been below since.
          https://projects.economist.com/us-2020-forecast/president

          My Q is: having just seen the April-> June/July movement, would the mean-reversion intuition have applied? Would that imply, according to Taleb and Madeka, an arbitrage opportunity against Gelman model because it was deviating unreasonably from 50-50? To me it seems that if you had placed such bets, you would have lost big, because in fact the probability went up even a little higher and has turned out to be remarkably stable and non-volatile.

          Anyhow, I agree that martingality as a property only makes sense to me conditional on a model, which in turn could very well differ between different observers based on understanding and information. To the extent that Gelman “believes in” this model, these predictions have to be trivially a martingale.

        • Chris,

          In my opinion, the Martingality results are only really of interest when there’s a single fundamental shared physical aspect of the uncertainty in the thing being predicted.

          For example, I have a cryptographically strong random number generator, and every day I output a new value which is the previous day’s value plus a random normally distributed increment generated using the crypto-RNG.

          Now, we stipulate that there’s no one who has hacked my crypto RNG and knows its internal state… Therefore, all people are *in the same state of information*, which is that all anyone in the whole world knows is that tomorrow’s number will be today’s number plus a fundamentally unpredictable Normal(0,1) random number.

          In this context, there’s no question about “different probability measures” since the whole world has the same state of information.

          In finance, say with large exchange traded funds, this approximation often holds, since there are literally billions of participants and each one has a changing internal state, but no one has access to even a small fraction of the information necessary to predict the future value of the ETF better than “a normal random increment”.

          But with prediction of one-off real-world processes, where different people have different levels of understanding of the mechanisms of how polling works, and what kinds of shenanigans there are going on in the election… it’s totally legitimate to think that there are MANY different probability measures that legitimately could exist… If each one is Bayesian for example, then the predictions will all be the expected value of the probability for tomorrow… but these will be vastly different across the different participants.

          What makes each one a Martingale is that tomorrow’s prediction is the expected value of today’s probability measure…

          I think it’s possible to argue that the meaning of Martingale is that *all the information that’s available is the past “price”* and if that’s the case, then the Bayesian model we’re talking about *is not a Martingale* because it relies on other information than its own history of predictions…

          I’d be ok with that, and maybe that really is the proper definition. In which case we can simply say: Bayesian predictions of future events based on polling and physical modeling and so forth *are not and should not be martingales*.

          That’d be fine

        • For example, suppose there’s some “theorem” about martingales being non-arbitrageable …

          Now we take my scheme where someone uses a crypto RNG and outputs a new number every day for 100 days… you can place bets on the final price.

          Except this time, someone first sucked down the Crypto RNG data and wrote it to a disk… and a hacker has infiltrated this computer and read that file… The hacker now has 100% certainty about the entire path that will be output in the future. His probability measure is very different from everyone else’s.

          So we have “everyone else” and “the hacker”.

          The hacker can clearly buy contracts when they are under the final price and sell them later when they are above the final price. The hacker has ZERO risk, and makes a guaranteed profit.

          And yet everyone agreed that the crypto RNG output was a martingale and couldn’t be arbitraged… So whatever that proof is it either assumes something like “there is fundamentally only a single ‘real’ source of randomness that is shared by everyone and no-one has knowledge of the internals of it” which is just *clearly wrong* in elections and in the crypto-rng with hacker case… or the theorem is wrong.
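
The crypto-RNG scenario is easy to simulate. The sketch below is an illustration under stated assumptions: NumPy’s seeded generator stands in for the crypto RNG, trades happen once on a hypothetical `entry_day`, and a claim on the final value is priced at the current value (its expected final value given only the public history).

```python
import numpy as np

rng = np.random.default_rng(0)          # stand-in for the crypto RNG in the comment
n_paths, n_days, entry_day = 2000, 100, 50

# Each row is one 100-day history: a cumulative sum of Normal(0, 1) increments.
paths = np.cumsum(rng.standard_normal((n_paths, n_days)), axis=1)
entry_price = paths[:, entry_day]       # fair public price of a claim on the final value
final_value = paths[:, -1]

# Public trader: buys one contract at the fair price and holds to the end.
public_pnl = final_value - entry_price

# Hacker: has read the whole path, so goes long when the final value is above
# the current price and short when it is below.
hacker_position = np.sign(final_value - entry_price)
hacker_pnl = hacker_position * (final_value - entry_price)

print(f"public trader: mean P&L {public_pnl.mean():+.3f}, "
      f"share of losing trades {(public_pnl < 0).mean():.2f}")
print(f"hacker:        mean P&L {hacker_pnl.mean():+.3f}, "
      f"share of losing trades {(hacker_pnl < 0).mean():.2f}")
```

The same published numbers are a fair game for the public and a riskless profit for the hacker, so whether the sequence “can be arbitraged” depends on the information set of whoever is trading against it.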

        • By the way Edward, I vastly disagree with this statement:

          “I believe Nate Silver is answering a subtly different question with his election forecasts. Each data point that Silver produces is answering the question: if the election were to happen today, what is the probability of each candidate winning? I argue that this is a valid and useful formulation. ”

          That’s just not true. That’s the *explicit* difference between 538’s nowcast and forecast in 2016. I’m interested in his *forecast*; whether the nowcast is a feature to that or not is irrelevant. It’s information, but it’s evaluating a true counterfactual.

      • No, that is not what I’m saying. My point is that you cannot tell from how variable the 538 forecast probabilities were whether they included too little uncertainty about the future state of the vote share on election day, which acts as the underlying asset for option pricing. If the realized volatility of the polls matches the amount of volatility that’s baked into the forecasts, the two will cancel. Simple example: if the underlying asset followed a Brownian motion then halfway between the inception date and maturity the binary option price has a uniform distribution, regardless of the volatility. So it’s possible that 538’s forecasts embed just as much uncertainty about the future as has been observed in the past.

        Also, your argument is not that the 538 forecasts failed to be martingales; it was the stronger claim that they violated no-arbitrage constraints. As I say in the paper, any set of probability forecasts will form an F_t-martingale with respect to the filtration used to compute the expected values; this makes them automatically arbitrage-free unless you can argue that your probability measure and the forecaster’s are not equivalent (which you don’t).

        • Let me ask in a different way, if you have a bounded box (y is [0,1], x is [0, T]) do you believe the square of the box is unbounded?

        • I don’t know how you can argue from a single realized path of the 538 forecasts that the quadratic variation is unbounded, but good luck with all that. It’s not the argument from Taleb’s original paper, which doesn’t contain the phrase “quadratic variation” at all.

          My point concerns the relationship between the QV of the forecast probabilities and that of the underlying process. Do yourself a favor and compute a simple example; maybe you’ll see what I mean:
          Y_t = \sigma * W_t, with W_t being standard Brownian Motion
          K = winning threshold at time T; forecast probability B(t,T) = P_t[Y_T > K]
          If, say, K = W_0 = 0, then B(t,T) doesn’t depend on \sigma, so has QV that doesn’t depend on how volatile the asset is. In other words, the paths of B(t,T) look just as volatile no matter how much “uncertainty” you price into them, as long as you use that same uncertainty to generate the realized paths.
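
The “simple example” can be checked by simulation. A minimal sketch, assuming T = 1, K = 0, and a snapshot halfway to maturity; the helper `forecast_at_t` and the two sigma values are hypothetical choices, and the same Brownian draws are reused for both so that only sigma differs.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
normal_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

T, t, K = 1.0, 0.5, 0.0
n_paths = 100_000
W_t = rng.standard_normal(n_paths) * math.sqrt(t)   # Brownian motion sampled at time t


def forecast_at_t(sigma: float) -> np.ndarray:
    """B(t,T) = P_t[Y_T > K] for Y = sigma * W.

    Given Y_t, the terminal value Y_T is Normal(Y_t, sigma^2 * (T - t)).
    """
    Y_t = sigma * W_t
    return normal_cdf((Y_t - K) / (sigma * math.sqrt(T - t)))


b_calm = forecast_at_t(0.5)    # low-volatility "asset"
b_wild = forecast_at_t(5.0)    # high-volatility "asset"

print("max difference between the two sets of forecasts:",
      float(np.max(np.abs(b_calm - b_wild))))
print("deciles of the forecast at t = T/2:",
      np.round(np.quantile(b_calm, np.linspace(0.1, 0.9, 9)), 2))
```

The two sets of forecasts agree up to floating-point error, and their deciles sit close to 0.1, 0.2, ..., 0.9, consistent with the uniform distribution mentioned earlier in the thread.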

        • I’m not really sure who you’re arguing with – I’m trying to build up to an argument to our paper in QF 2019. You seem to think I need “luck” to argue a path-wise result? Do you not believe in those? Do you think that an ex-post test of martingality for a single series doesn’t exist?

          To your second point – I’ll “do myself a favor” (no idea why but I’ll trust your infinite wisdom) and ask: doesn’t B(t,T) go to 0.5 as \sigma -> \infty *at each point*? (Aware of your unconditional vs conditional argument.)
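
The conditional and unconditional statements in this exchange can be written side by side. A sketch in the notation of the example above, with \Phi denoting the standard normal CDF (a symbol introduced here, not used by either commenter):

```latex
\begin{align*}
B(t,T) &= P_t[\,Y_T > K\,] = \Phi\!\left(\frac{Y_t - K}{\sigma\sqrt{T-t}}\right),\\
\text{fixed } Y_t = y:\quad
  & \lim_{\sigma\to\infty} \Phi\!\left(\frac{y - K}{\sigma\sqrt{T-t}}\right) = \Phi(0) = \tfrac{1}{2},\\
K = 0,\ Y_t = \sigma W_t:\quad
  & B(t,T) = \Phi\!\left(\frac{W_t}{\sqrt{T-t}}\right)
  \quad\text{(no dependence on } \sigma\text{)}.
\end{align*}
```

So the limit of 0.5 holds for a fixed observed level, while along paths generated with the same \sigma the observed level scales with \sigma and the dependence drops out.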

        • (…a paper that also does not use the phrase “quadratic variation.”) I’m just saying if you want to argue that 538’s forecasts are not martingales, you’ve picked a funny way to go about it. “Having bounded quadratic variation” isn’t a property you can hope to establish by looking at a single path because the property of being bounded isn’t testable with a single sample from a distribution. “Having quadratic variation bounded by X” for some specific X would be more possible, as would “Having increments with mean 0,” but I don’t see that anywhere either.

          Maybe someday you’ll build up to an argument that makes sense, but I’m not holding my breath on that one. In the meantime, I just want to reassure Andrew and anyone doing election forecasting that they can safely ignore any Taleb-esque sermons about the essential lessons of quantitative finance, because there really are none that are applicable here. There’s only vaguely threatening rhetoric about “stark errors” and “arbitrage boundaries” without any clear specification of what those boundaries are, how to tell if some model has crossed them, and why that should matter.

        • Do you really not think that \int_{t=0}^{T} \sigma_t^2 dt isn’t the total QV over a period for that process? (bounded by the strategy to the right in the equation). I’m not sure why “words” is important as opposed to “equation”. It’s one way to test for martingality of a time series. There are many others (maximal inequalities, Vovk and Shafer’s book has a ton more). The general approach is tests for martingality – I’m not sure how that’s not an “argument that makes sense”.

          I’d say, if you’re willing to have a real conversation, happy to. But so far, you’ve said “tests for martingality” don’t exist and don’t “make sense” and asked me to do myself a favor. Very poised!

        • Ok, I concede. You and your co-author have a secret test for martingality (which according to you is equivalent to being arbitrage-free), involving the quadratic variation in some yet-unspecified way, that you can use to determine if a set of forecasts crossed the “arbitrage boundary,” despite the fact that you can’t reveal where that boundary is. Can you help me out, though? I have this set of forecast probabilities and I can’t determine whether they’re arbitrageable. Is this a martingale?

          https://docs.google.com/spreadsheets/d/1u8tn_PULyZ9UotNPmGluzLTFj8sf1ZB0TnVQdXBmoDg/edit?usp=sharing

        • Yes, the famous “arbitrage by eyeball” test :). Surely there’s more to the method than just that, though, right? Surely these assessments are based on some precise criteria and not just on Taleb’s gut feel about what a martingale “looks like.”

        • Ahh, it’s rare for me to say this – but it’s a deep shame you seemed to have largely missed the argument. I had no idea that saying that there are events that occur with diminishing probability for martingales was so contentious. Good luck! I hope we can have a more productive discussion when you actually read the paper and the bounds.

        • Dhruv, I think it’s great that you are willing to participate on this board. One of the things that I most appreciate about Gelman’s blog is that I get to learn from some really smart people, especially when they disagree!
          Anyhow, from my point of view, the two most important objections to your paper are this:
          1. The point that Lakeland, myself and others have been making that the property of martingality/arbitrage-ability, in this context, is totally conditional on your model (including the goal of the model), and your set of information. Daniel has provided several colorful scenarios whereby a trivially martingale set of forecasts could be arbitraged by an agent with superior information. The property does not inhere only in a realized set of forecast probabilities!
          2. With that in mind, you and Aubrey seem to be talking past each other a bit about the issue of bounds. Ignoring the philosophical objection in #1, is there or is there not a specific quantitative test for martingality/arbitrage-ability for a realized time series of forecast probabilities in your view? (This is different than the general assertion that such bounds must exist). If yes, could you walk us through its application in Aubrey’s simulated fake time series of forecast probabilities? If you are only arguing that such bounds must exist in general, but there is no specific test, then how does this not just come down to eye-balling such a time series and going on intuition whether or not it has the properties you would like to see?

        • Dhruv, I think the contentious part at a Bayesian blog is the clear underlying assumption that there is only one “true” model and everyone knows what it is.

          the model y[i+1] = y[i] + Normal(0,1) is clearly a Martingale, yet if someone has information about what specific draw will come out of the Normal(0,1) process, they will arbitrage the heck out of you.

          The concept “martingality = non arbitrageable” can only hold if you assume something like “all participants have the same state of information about the future”

          Bayesians don’t believe that the world really actually is a RNG, the Martingality is located *inside their head* as a description of how much they know, not *in the world* as a description of the fundamental unpredictability of a process.
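To make the random-walk point above concrete, here is a minimal sketch in Python. It assumes a hypothetical insider who learns the sign of each Normal(0,1) draw before it happens; the function name and the number of steps are made up for illustration. The process itself is a martingale, yet the insider's profit is the sum of the absolute increments, while an uninformed bettor who is always long one unit has mean-zero P&L.

```python
import random

def simulate_bets(n_steps=1000, seed=0):
    """Compare an uninformed bettor with one who sees each draw in advance."""
    rng = random.Random(seed)
    uninformed_pnl = 0.0
    insider_pnl = 0.0
    for _ in range(n_steps):
        eps = rng.gauss(0.0, 1.0)            # increment y[i+1] - y[i]
        uninformed_pnl += 1.0 * eps          # always long one unit: mean-zero P&L
        position = 1.0 if eps > 0 else -1.0  # hypothetical insider knows the sign
        insider_pnl += position * eps        # equals |eps|, never negative
    return uninformed_pnl, insider_pnl

uninformed, insider = simulate_bets()
print(f"uninformed P&L: {uninformed:.1f}, insider P&L: {insider:.1f}")
```

Nothing about the sequence y changes between the two bettors; only their information does, which is the sense in which martingality is relative to a state of knowledge.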

        • I’ve not followed the discussion closely but all the talk about arbitraging seems to bring more confusion than insight.

          The martingale property comes from the basic fact that when you apply your probabilistic model, today’s prediction for some future event is the average of tomorrow’s predictions over all the possible scenarios, weighted by their probability according to the model.

          You can calculate the distribution of some statistics (like the volatility in the sequence of predictions) according to the model and use a statistical test to see if it looks as expected. If it doesn’t it may suggest that your model is wrong but you may also be unlucky.

          Say for example that we predict the number of heads at the end of a sequence of ten flips. If our model is that the coin is fair, we would start predicting 5. If we get four heads in a row, our prediction would climb to 7 at that point. Such a big change in our prediction was unexpected. If we then get four tails in a row, the prediction will drop again to 5. That was also unexpected. But it’s not a proof that our model is wrong (though it can strongly suggest it). It doesn’t necessarily create an arbitrage opportunity, no matter how we define it. Even statistical arbitrage depends on our model being wrong.

          By the way, having the right model doesn’t mean we don’t open arbitrage opportunities for others. I could have a perfect model predicting exchange rates and someone else could make consistent profits taking my bets and hedging them on the market.
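The coin-flip example above can be checked with a few lines of Python. This is only a sketch under the fair-coin model; the function name `forecast_total_heads` and the string encoding of flips are assumptions made for illustration. It reproduces the 5, 7, 5 sequence of forecasts and verifies that each forecast is the average of the two possible forecasts one flip later.

```python
from fractions import Fraction

def forecast_total_heads(flips_so_far, n_total=10, p_heads=Fraction(1, 2)):
    """Expected final number of heads given the flips seen so far ('H'/'T')."""
    heads = flips_so_far.count("H")
    remaining = n_total - len(flips_so_far)
    return heads + remaining * p_heads

path = "HHHHTTTT"  # four heads in a row, then four tails in a row
forecasts = [forecast_total_heads(path[:k]) for k in range(len(path) + 1)]
print([float(f) for f in forecasts])

# Martingale check: today's forecast is the model-weighted average of tomorrow's.
for k in range(len(path)):
    seen = path[:k]
    avg_tomorrow = (forecast_total_heads(seen + "H") + forecast_total_heads(seen + "T")) / 2
    assert avg_tomorrow == forecast_total_heads(seen)
```

The path is surprising under the model, but every step still satisfies the martingale identity, so the swings by themselves do not show the model is wrong.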

        • Thanks everyone – I’ll try one more post. I was willing to acknowledge that how quants use the word “arbitrage-free” is different from how statisticians use it (happy to go into this but this is getting tiring). The difference in language was the benefit of the doubt I was willing to give Clayton – but he doesn’t seem to have really understood that. I tried to explain this as an ex-post test here, including references for other *single-path* tests (Foster and Stine is *explicitly* a single path test for drift). It’s *tougher* to reject on a single path (for ex. Foster and Stine need a *twenty-fold* beat of the market on a single manager’s returns), but not “magical”. I’m beginning to think Clayton has largely missed the point – and won’t engage because of that. That’s fine! But I’m with Andrew that Twitter is largely useless, and Clayton’s responses have resembled that more than usefulness – I would *strongly* encourage you to read the Rabin paper, Foster and Stine and our response to Clayton.

          The filtration does not change the measure. The triple is defined (\Omega, \mathcal{F}, P) for a reason. The output of the model has to be a martingale – this can be achieved in many ways (Nassim’s, Nate seems to have cubic stochastic vol). This is not a question of the measure you’re generating from.

          Clayton incorrectly seems to think a single path does not have a test. He may not like our volatility bound, maybe he prefers Foster and Stine. Corollary here on Page 7: https://core.ac.uk/download/pdf/132271344.pdf

          “Corollary. A manager’s performance over a given period of time allows us to infer that he can beat the market with 95% confidence provided his portfolio grows by a factor of at least twenty-fold relative to the total growth of the market portfolio over the same period.”

          This is a *single path* test for drift. There are tests for autocorrelation, quadratic var/variance. (The test is in this paper: run a trade proportional to deviation from the initial point (or the market) and see the money it generates. That’s a test. Do you get significant alpha from this trade? But again this is getting absurd.)
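A rough sketch of why a twenty-fold threshold pairs with 95% confidence, assuming the bound rests on the maximal inequality for nonnegative martingales: a no-skill wealth ratio that starts at 1 reaches a level c with probability at most 1/c, and 1/20 = 0.05. The toy return process below (double with probability 1/3, halve with probability 2/3) is made up for illustration and is not taken from the cited paper.

```python
import random

def max_relative_growth(n_steps, rng):
    """Running maximum of a toy wealth ratio that is a nonnegative martingale.

    Each period the ratio doubles with probability 1/3 or halves with
    probability 2/3, so its expected one-step growth factor is exactly 1.
    """
    ratio, peak = 1.0, 1.0
    for _ in range(n_steps):
        ratio *= 2.0 if rng.random() < 1 / 3 else 0.5
        peak = max(peak, ratio)
    return peak

def fraction_exceeding(threshold=20.0, n_paths=5000, n_steps=50, seed=0):
    """Share of no-skill paths whose relative growth ever reaches the threshold."""
    rng = random.Random(seed)
    hits = sum(max_relative_growth(n_steps, rng) >= threshold for _ in range(n_paths))
    return hits / n_paths

alpha = 0.05
threshold = 1 / alpha  # twenty-fold growth
frac = fraction_exceeding(threshold)
print(f"threshold {threshold:.0f}x reached on {frac:.3%} of no-skill paths")
```

In this toy setting the simulated share should stay below 5%, which is what makes a twenty-fold outperformance on a single path count as evidence against the no-skill hypothesis.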

          If you prefer the Bayesian interpretation of uncertainty reduction: http://faculty.haas.berkeley.edu/ned/AugenblickRabin_MovementUncertainty.pdf

          yes, an unbounded martingale can have an arbitrary path with an arbitrary probability. A bounded martingale cannot.

          @Carlos – yes you might be unlucky. This is what p-values and rejection regions are for.

        • So I take it you’re still not willing/able to show how your test works with actual data? Hint: you may recognize the forecasts I provided if you look at them the right way.

          Just to be clear, here is the complete description of the “volatility test” from Taleb and Madeka (2019):

          “Essentially, if the price of the binary option varies too much, a simple trading strategy of buying low and selling high is guaranteed to produce a profit.”

          (for which a reference is given to Dupire’s notes for the final exam in his class at NYU in spring 2019. Such a handy reference!)

          If you really do want to have a productive discussion, I’d like you to please specify exactly what constitutes varying “too much,” what the trading strategy is to take advantage, and how you know that it is guaranteed to produce a profit. Then, please apply that test to the sample data I provided and/or some historical forecast data of your choosing. Until you do that, I agree this discussion is mostly useless and I don’t think we’re going to make any progress here.

          I think it *is* entirely possible that you and I are using the word “arbitrage” differently. I mean arbitrage in the usual quant finance sense of, e.g., Shreve (2004): an arbitrage is an admissible portfolio with zero initial investment, no probability of loss, and positive probability of profit. This does not, in particular, mean that forecasts have to be martingales to be arbitrage-free (nor does it preclude mean-reversion), despite what you and Taleb seem to be hung up on. The Fundamental Theorem of Asset Pricing says that arbitrage is impossible iff there’s an *equivalent probability measure* under which (discounted) prices are martingales. My point (1) in the initial comment above is that if a set of forecasts is performed using Bayesian updating, meaning that they represent the conditional probability of some future outcome at time T with respect to a filtration (meaning some E[I(T)|F_t] where I(T) is the indicator function of the event at time T), then those forecasts are automatically martingales with respect to the probability measure used to construct the forecasts, i.e., E[E[I(T) | F_t] | F_s] = E[I(T) | F_s] by the tower property. So, if you want to claim that some forecasts violated the no-arbitrage condition, you’d need to show they were constructed using a probability measure that’s not equivalent to yours/the market’s. Meaning this whole test of martingality is an unnecessary tangent to begin with, but go ahead and prove me wrong.

          By the way, for Daniel, Chris and others feeling confused, this is, I think, a partial resolution to the thought-experiments provided upthread here. There are certainly arbitrage opportunities if one party has information making an outcome certain when the other does not; this means their probability measures are not equivalent (differ on null sets). But unless the profit is certain, there’s no arbitrage. The probabilities used to construct forecasts can be completely personal to the forecaster, and whether this produces forecasts that “actually are” martingales is immaterial. For example, the fortunes of a person betting on a casino game with negative expected value don’t represent arbitrage, even though it seems like “surely” you could profit by betting against them. In any finite time, your profit is highly probable (according to realized frequencies/the “physical measure”) but not guaranteed.
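The tower-property argument above can be verified exactly on a small example. The sketch below assumes a toy event (at least 6 heads in 10 fair coin flips, a choice made purely for illustration) and checks that the conditional probability at every reachable state equals the model-weighted average of the conditional probabilities one flip later.

```python
from fractions import Fraction
from math import comb

def prob_event(heads, flips_seen, n_total=10, need=6):
    """P(at least `need` heads in n_total fair flips | heads seen so far)."""
    remaining = n_total - flips_seen
    still_needed = max(need - heads, 0)
    ways = sum(comb(remaining, j) for j in range(still_needed, remaining + 1))
    return Fraction(ways, 2 ** remaining)

# One-step martingale property, the building block of the tower property:
# E[ P(event | F_{t+1}) | F_t ] == P(event | F_t) for every reachable state.
for t in range(10):
    for h in range(t + 1):
        tomorrow = Fraction(1, 2) * prob_event(h + 1, t + 1) + Fraction(1, 2) * prob_event(h, t + 1)
        assert tomorrow == prob_event(h, t)

print(prob_event(0, 0))  # forecast before any flips
```

The identity holds by construction under the forecaster's own measure, whatever that measure is, which is why the forecasts are automatically martingales with respect to it.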

        • No I’m not willing to engage what seems like an insulting link – I haven’t really even looked at it (seeing your words and the responses to it).

          I described the trade above – feel free to run it if the link wasn’t a joke. tell me the alpha it generates, and we can go from there.

          For context, the trades are given *literally* a sentence after that. (in scrolling order): “The stochastic integral
          …. can be replicated at zero cost, indicating that the value of … bounded by the maximum value of the square difference on the right hand side of the equation”

          The trade is obvious from that: it’s given by the stochastic integral! What is missing here for you? I described it above too! It’s a mean reversion trade that “replicates” variance (this also holds if vol isn’t necessarily constant – it replicates the total QV \int_t^T \sigma_u^2 \, du); we even describe it in the paper!

          Thanks for the description – but when I say that the SVI smile is not arbitrage free, I think you and I mean different things. This is what I’m willing to acknowledge is the language difference. It means *if* you know that the trader is using an SVI model then it would fit into your description. This is not how quants use it when they say, “the model is not arbitrage free.”

          To the second point, the outputs must be a martingale. So yes, tomorrow you or Nate could adopt the betting market and according to Augenblick and Rabin – it wouldn’t work. That’s not the point we’re making – and the language difference hits here again. I was willing to acknowledge that – but that doesn’t defuse the point that bounded martingales cannot vary excessively.

          I was willing to make the concession that this is a different language – and that’s the disconnect. But now you’re talking about words (“you don’t use the word quadratic variation”) and you have now said what amounts to, once again, not reading any of the papers mentioned above – do you agree Foster and Stine is a single path result? Do you also think A and R’s result is “magical”??

        • Actually, let’s tie this out since you seem to be more interested in Twitter-style dialogue than in understanding – check the P&L on the trade described above and in our response (you’ll have to read further than where you seem to have stopped given your previous incorrect response – hold proportional to deviation!) and apply Foster and Stine to the resulting P&L. r_f would be 0 here to adopt the notation of their paper!

          Let me know if you actually decide to read! You don’t seem to have scrolled past page 1 of our response and given your lack of engagement with the other papers, I’m guessing you haven’t opened (or perhaps understood) them yet.

          Good luck!

          Dhruv

        • Aubrey, thanks for your explanation, and for the definition of arbitrage that you are using. This is helpful. I have only one question: if there should be zero probability of loss and positive probability of profit, according to whose Bayesian probability measure?

          Suppose a trader gets an inside tip that a certain event seriously affecting the value of the stock will be announced tomorrow for sure. He assigns 100% probability to a lower close tomorrow and shorts the stock today… he believes an arbitrage opportunity has occurred.

          Then tomorrow the news is announced, and the government also announces a big bailout of the company… the stock price rises slightly…

          It’s an admissible state of Bayesian information to believe a crash is a sure thing… but the real world doesn’t have to play along. A priori there is no way to disprove the sure thing; a posteriori it is too late.

          For a Bayesian, it seems that we can only ever talk about whether our information and assumptions lead us to believe arbitrage is possible, not whether it “truly in fact is possible”

        • This is actually, despite all the venom, surprisingly helpful. Believe it or not, I think we’re making progress toward uncovering the real issue here — which is that you and I use the word “arbitrage” to mean different things — and I’m beginning to understand your argument. Honestly, thank you for the explanations. Maybe we can call a truce and discuss further with the temperature lowered a bit?

          Here is your “volatility test” argument as I now understand it. Tell me if I’ve got this right: Treating the forecast probabilities as prices, the portfolio that holds 2(B_0 – B_t) units of the “asset,” continuously rebalanced, has zero initial cost and final value given by ⟨B⟩_T – (B_T – B_0)^2, where ⟨B⟩ is the quadratic variation (that’s just Ito). And since the process is bounded in [0,1] this means the total quadratic variation ⟨B⟩_T can’t be greater than 1 with probability 1, otherwise this portfolio certainly has a positive value and earns a guaranteed profit. So in the end it’s not actually about the process being a martingale, it is about it being arbitrage-free. Is that about it?

          The problem I have then remains that it requires you to know ahead of time that the forecast probabilities were going to be that variable. In order to be strict arbitrage, you’d need to know the QV would cross the required threshold with probability 1. For an arithmetic Brownian motion as in your paper, that’s maybe ok (although I think you run into problems with the convergence of the QV process and the assumption that B is bounded — basically you’re saying a BM can’t be a.s. bounded, which is obvious — probably you want B_t to be a doubly reflected BM instead or something), but in general the QV_T is going to have some probability distribution that includes some probability of being greater than the threshold and some below. Observing a single sample from the process isn’t going to establish whether it was “arbitrageable,” even if the path is so variable that the particular realized QV_T is > 1.

          The data I provided is not insulting, I promise. I think we can all benefit from a concrete worked example, as mentioned below in reply to Carlos as well. In that spirit I’ve calculated the portfolio values using your strategy, and I’ll also tell you where the forecasts came from: this is a sample path from Taleb’s binomial option pricing model, and not at all an atypical one. As you’ll see it has a final QV around 0.89, but many of the random samples have QV this high. The 538 forecasts in 2016 were nowhere near this variable, so I don’t know how you can look at one and say it’s evidence of arbitrage while the other is a rigorously updated arbitrage-free model. As I say in my note, even for a simple random walk model it’s easy to produce forecast probabilities that varied more than 538’s, and by construction they’re necessarily martingales.
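For readers who want to try the strategy on their own numbers, here is a discrete-time sketch in Python. The forecast path below is made up for illustration (it is not the spreadsheet data or any 538 series), and the function names are assumptions. In discrete time the identity is exact: the P&L of holding 2(B_0 – B_t) units over each step equals the sum of squared increments minus (B_T – B_0)^2.

```python
def realized_qv(path):
    """Sum of squared increments of a sequence of forecast probabilities."""
    return sum((b1 - b0) ** 2 for b0, b1 in zip(path, path[1:]))

def mean_reversion_pnl(path):
    """P&L from holding 2 * (B_0 - B_t) units over each step, at zero initial cost."""
    b_start = path[0]
    return sum(2 * (b_start - b0) * (b1 - b0) for b0, b1 in zip(path, path[1:]))

# Hypothetical forecast path, made up for illustration only.
path = [0.50, 0.60, 0.45, 0.70, 0.40, 0.65, 0.35, 0.80, 1.00]
qv = realized_qv(path)
pnl = mean_reversion_pnl(path)
print(f"QV = {qv:.4f}, P&L = {pnl:.4f}, QV - (B_T - B_0)^2 = {qv - (path[-1] - path[0]) ** 2:.4f}")
```

Because the path stays in [0,1], the P&L is at least QV minus 1, so it is guaranteed positive only when the realized QV exceeds 1. Whether that outcome was knowable in advance is a separate question that the computation does not answer.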

          But anyway, you’re not actually talking about arbitrage in that strict sense. You’re defining arbitrage in some looser sense of earning higher average returns than should be reasonably attainable, like “alpha.” So this is like saying “this fund strategy outpaced the market by a lot this year, therefore this is evidence of some inefficiency in the market.” The Foster-Stine-Young argument only appears to apply using this definition as well (again, this is not referenced in your paper, so I’m playing catch-up here). With the particular strategy above I’m not sure how you’d use their test here: since the initial portfolio value is 0, how do you compute the return? And what’s the comparable market index? Maybe you can fill in the rest, if this is now the real test you want to use. What’s the result of applying the same test to the 538 data, and how do you assess that their forecasts were wrong?

        • (Sorry, formatting got messed up here, I put brackets around B to represent the quadratic variation but it turned the text bold. Hopefully it’s clear what I meant.)

  2. Maybe I’m missing the point, but it seems to me that the fact that I wouldn’t take 100:1 odds on Trump winning the popular vote doesn’t say anything at all about the validity of the model. I am risk-averse and there are lots of fair bets that I’m sure I wouldn’t take. (Personally, I wouldn’t even take even odds on a coin toss.) I don’t know how to back out the risk-aversion from my personality in order to generate my “real” intuitive estimate of what a fair bet would be. I rather doubt that professional statisticians are different: that’s basically what you’re saying when you say “That wouldn’t be much of a fun bet.” So I just don’t see the relevance of subjective intuitions about remote probabilities. Unless there is some reason to believe that betting markets are risk neutral, I don’t see why they are any different.

    • Norman:

      There are all sorts of problems with betting. Most simply: some people bet for fun, other people don’t like to bet, and any bet implies that there’s someone on the other side who might have private information. But probabilities can be calibrated in other ways. I can be concerned about coverage of my predictive intervals even if I never make a direct bet on the probabilities. Also we can see where a prediction’s probability models come from.

    • Norman: Let’s say, as a thought experiment, one had to bet. That takes care of the risk aversion/betting for fun argument. (I also don’t think in a US election one has to seriously worry about information asymmetry at all, to Andrew’s point.) Which side would you rather take at 100 to 1? Or does 100 to 1 seem like fair odds, such that you would be indifferent between the two sides? And if so, would 100 to 1 also have seemed fair to you back in June? That is the key point. The betting argument, which imho statisticians are far too quick to dismiss, is not about substituting intellectual discussion for a money wager, or “putting your money where your mouth is”. It’s a thought experiment, and it’s just about the only “real world” test one has for a US election model – since we won’t be able to actually run an election 40k times to find out whether the model is properly calibrated or not. It’s totally fine to be risk averse – or even, say, to have a moral objection to betting. But 100 to 1 odds against Trump winning the popular vote in June, or – even worse – 40k to 1 odds against Biden winning Ohio yet losing the electoral college is evidently so far off from “fair odds” that it seems to me, with all due respect, that the only proper thing to do is to throw away the model and start again. There are only two other options: a) claim that those *are* actually fair odds. One cannot help but suspect this argument being made in bad faith, and here one does feel tempted to suggest a real-world bet to call the person out. Or b) claim that those probabilities do not actually represent fair odds. But what, pray tell, do they represent then?

      • I don’t think 100 to 1 odds against Trump winning the popular vote, even in June, would be terribly unreasonable. He didn’t win the popular vote in 2016, and basically every factor seems worse for him – Biden is much less disliked than Clinton, Trump is more disliked than he was in 2016 (and doesn’t have the “let’s try an outsider” factor anymore; he’s a known quantity), and the economy is bad, etc.

        The *only* way I could see Trump possibly winning the popular vote would be some event that massively depressed Democratic turnout, but I’m not sure I can think of anything plausible that’s significant enough. Even if Biden caught COVID, much of the Democratic voting is probably as much anti-Trump as pro-Biden. (And from a “back in June” perspective, there’d be the possibility for that to be resolved before the election, anyway; it’s only really a problem if his health is uncertain *while people are voting*. If Biden had caught it in June and been recovered by July…).

  3. I would like to see a follow up post summarizing the tail probability issues. Is it already discussed in your election modeling paper? You had a couple posts that seemed to claim 538’s forecast had some substantive technical issues with fat tails/lower between state correlations. Now it sounds like you are saying you have not done the research to see how well calibrated your forecast is for rare events.

  4. Edward:

    Keep in mind that reporting a 99% winning probability for Biden doesn’t mean that they expect Biden to win with 99% probability: they expect Biden to win with 99% probability unless something unexpected happens or their model is wrong.

    “In the election context, these could relate to shifts in the polls or to unexpected changes in underlying economic and political conditions, as well as the implicit assumption that factors not included in the model are irrelevant to prediction. No model can include all such factors, thus all forecasts are conditional.”

    https://www.youtube.com/watch?v=pjvQFtlNQ-M

    • Carlos:

      Just to be clear, (a) we don’t give Biden a 99% win probability, and (b) shifts in the polls or to unexpected changes in underlying economic and political conditions are part of our model’s error terms. Our forecast does not assume no shifts in the polls and no unexpected changes in underlying economic and political conditions. Yes, all models are conditional, but our model is not conditional on the polls not shifting! Indeed, a big part of our model is the model for how polls can shift.

      • “Indeed, a big part of our model is the model for how polls can shift.”

        So how do you account for this? Presumably you’ve calibrated your model against past elections; some of those past elections have major shifts in polls; thus your model implicitly accounts for shifts in polls via past elections.

        Is this correct or do you have some explicit model for potential shifts? if so I can’t imagine how you would do that.

        • > do you have some explicit model for potential shifts? if so I can’t imagine how you would do that.

          This wouldn’t be hard at all. For example, if you’re focusing on sudden strange events, you could have a model for the rate at which weird things happen, and the size of the poll shift caused by sudden weird things. Then you can project forward in time how much polls will shift based on the sum total of all the weird things that happen between now and the election.
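One way to sketch the kind of model described above is a compound Poisson process: events arrive at some rate, and each one shifts the polls by a random amount. Everything numeric below (0.05 events per day, a 2-point jump standard deviation, 100 days to the election) is a made-up assumption for illustration, as are the function names; this is not the model used in any actual forecast.

```python
import math
import random

def simulate_total_shift(days, events_per_day, jump_sd, rng):
    """Net poll shift (in points) from random 'weird events' over `days` days.

    Events arrive as a Poisson process; each one moves the polls by an
    independent Normal(0, jump_sd) amount.
    """
    t, total = 0.0, 0.0
    if events_per_day <= 0:
        return total
    while True:
        t += rng.expovariate(events_per_day)  # waiting time to the next event
        if t > days:
            return total
        total += rng.gauss(0.0, jump_sd)

def theoretical_shift_sd(days, events_per_day, jump_sd):
    """Standard deviation of the compound-Poisson total shift."""
    return math.sqrt(events_per_day * days) * jump_sd

rng = random.Random(1)  # hypothetical parameters below, for illustration only
draws = [simulate_total_shift(100, 0.05, 2.0, rng) for _ in range(20000)]
sim_sd = math.sqrt(sum(d * d for d in draws) / len(draws))
print(f"simulated sd {sim_sd:.2f} vs formula {theoretical_shift_sd(100, 0.05, 2.0):.2f}")
```

The uncertainty in the total shift grows with the square root of the time remaining, so a forecast built this way is automatically less certain earlier in the campaign. Choosing a rate and jump size that reflect reality is the hard part.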

        • “This wouldn’t be hard at all.. ”

          Of course it’s not hard to make a model! :) See COVID-19.

          What’s hard is making one that reflects reality. Don’t see COVID-19.

        • Speaking of bad COVID modelling, I’m getting a hoot out of the “herd immunity” crowd, running around the world and finding places to apply their aspirational models, only to have them shot down time and time again as a second wave of infections washes across the supposedly herd-immune community.

          I guess it just goes to show you…[your answer here]

        • Well, a couple of things:

          – In regards to COVID, people are using the phrase “herd immunity” to mean two different things; the essentially total elimination of the disease from an area, or the reduction of the disease from “pandemic/epidemic” to “endemic” status (which doesn’t mean zero cases or deaths, just a lower sustained level which doesn’t grow exponentially and is either closer to constant or seasonal like colds/flu).

          I agree the first hasn’t happened; the second may well have happened in many places.

          – Have we actually seen a second wave of infections “time and time again”? I’ll give you Madrid (which *shouldn’t* have herd immunity; its % infected does not seem to actually have been that high) but most of the second waves in the US (even if they are in the same *state*) either don’t seem to actually be happening in the same place, or are small localized upticks.

          I mean, pandemics ended before vaccines were invented. So some form of ‘herd immunity’ is certainly real. That doesn’t mean that any particular place has reached it yet for COVID – though it’s hard to believe that e.g. some parts of Latin America haven’t.

        • “the second may well have happened in many places.”

          I guess that’s my point. So far, from what I’ve seen, each time the “herders” claim they’ve found a new place where they think “the second” has happened, another wave of infection rolls through that community and kills the model.

          It may well be that it has happened in many places. AFAIK, we just haven’t found them yet. There may be a pot of gold at the end of the rainbow. Really I don’t know, I’ve never been there.

        • See I don’t think current numbers tell us much, we’ll need another month or two to see what happens, because…

          – endemic doesn’t mean zero cases/deaths, it just means a steady rate rather than unusual outbreaks/exponential growth. So even if there are places where COVID has become endemic rather than pandemic/epidemic, we should *still* observe some rise as people start going back to normal. (Because endemic + no restrictions would be a higher “steady state” than endemic + restrictions).

          – not all communities within a large jurisdiction (like a US state or European nation) will have the same levels of immunity

          Are there places that both had a high % infected in spring, and are seeing widespread high levels of infection now? Madrid wasn’t up to even highly-optimistic estimates for herd immunity (20% or so); NYC current levels aren’t high enough to be distinctly a “second wave” rather than localized outbreaks.

          And what do we expect “endemic” COVID to look like?

        • IE, how many places have actually had two distinct waves (in the same place)?

          The US as a whole, and some states like Louisiana, appear to – but on the local level, each individual place only had one wave. Some Great Plains states (eg South Dakota) had early meatpacking-plant outbreaks and are now being hit more widely.

          The only places I’m aware of that have seen anything like a true “second wave” are places like Madrid in Western Europe, that shouldn’t be expected to have herd immunity even under highly optimistic expectations (20-25% infected).

          Am I missing something?

        • Confused:

          You’re following this in more detail than I am. However, Tyler Cowen at Marginal Revolution has pointed to several instances with what he claims are repeated waves.

          However, the fact that no second “wave” strikes a specific community doesn’t require herd immunity by any means. It could just mean people are taking more precautions, combined with some level of existing immunity. Where I live, people are obviously taking more precautions. Masks are everywhere. Bars are open but have strict capacity limits and outside seating. Even out hiking it’s rude to pass someone without putting your mask on *and* social distancing. Despite the lack of any serious centralized effort to understand the transmission of COVID, people are figuring out how to prevent it.

          And while my reading on herd immunity is admittedly light, most of what I’ve read is just like everything else regarding the pandemic: wild guessing. People running around applying poorly thought out models with crappy data to anything they can apply it to.

          Maybe I’m wrong.

        • > NYC current levels aren’t high enough to be distinctly a “second wave” rather than localized outbreaks

          Localized mostly in places where people just don’t care about masks and other preventive measures because they think they have herd immunity already.

          And contained by closing a hundred schools and a thousand businesses.

        • “However, the fact that no second “wave” strikes a specific community doesn’t require herd immunity by any means.”

          Absolutely!

          I’m not claiming that herd immunity *is* active in NYC or anywhere else. Just that current data doesn’t seem to provide much information one way or another.

          As Carlos Ungil points out, even if these outbreaks die out, that could just be re-imposition of measures, rather than anything about immunity.

          I don’t think we can say anything useful about “herd immunity” until more time has passed in some of the places with very limited measures. Sweden seems to have rising cases but extremely low deaths; in a month or two we might have a better picture.

          (I also think death rates in March/April were ‘anomalously’ high due to poor understanding of the disease/treatment. So even if immunity protection is insignificant we should not see March/April style death rates in any fall wave, as predicted e.g. by the IHME model – though their numbers have dropped somewhat over the last few weeks – IMO from “completely ridiculous” to “very implausible”.)

        • I had correspondents from Michigan in April announce to me, “See it’s just petering-out by itself”. They could not bear the thought that the mitigations (everyone staying home) had anything to do with that. Why? Because they quite naturally resented the mitigations. Once the thinking takes this turn, it is capable of overturning any antecedent understandings that might get in the way. So along those lines, the same correspondent “discovered” that Dr. Snow’s investigations in London ‘did not actually cure cholera — it petered out by itself’. The substratum of this attitude is: either scientists are wizards (and can fix *anything*) or they are worthless charlatans — and there is *no* middle ground between.

        • Yeah.

          It’s kind of like the Sweden thing; it’s either a disaster or a dramatic success. Whereas I think the more accurate statement is that Sweden did significantly worse than its immediate neighbors (at least so far, pending fall waves) but not nearly as badly as March-April predictions suggested (eg, overwhelmed hospitals).

          I am not making any particular claim about herd immunity being achieved or not achieved in any particular place; in fact I don’t think the data supports any strong claims (the places which I think likely have the highest % infected, and thus the best prospects for herd immunity, are generally those with rather poor data; and we’re not far enough into fall to tell what is happening).

          One other point: government actions on public health are only as effective as the public’s following of them. I think this messes up the correlation between “measures” and “actual results” because some places have e.g. mask mandates in place but they are widely ignored, while other places do not have them but people are largely careful.

        • > The only places I’m aware of that have seen anything like a true “second wave” are places like Madrid in Western Europe, that shouldn’t be expected to have herd immunity even under highly optimistic expectations (20-25% infected)

          As opposed to the places that should be expected to have herd immunity? What places are those?

        • Depends what you think the herd immunity threshold is.

          Some cities in South America seem to have hit 60%+ though (and there are probably a lot more that haven’t had serology studies done…)

          But some of the “network” models have suggested herd immunity thresholds can be much lower in less homogeneous social networks.

          But I think serology studies suggest Madrid was only a bit over 10%, so no way – but it had a disproportionately high death rate due to lack of understanding of treatment back in March + age of population.

          Again, though, I think there is a lot of confusion as to what people mean by “herd immunity”. When pandemics end naturally (e.g. 1918-19, before vaccines) the virus doesn’t disappear, it just stops being “at epidemic levels” and becomes part of the normal background.

        • I guess those South American cities are not yet out of the first wave so they could hardly see anything like a true “second wave” yet.

          In summary, the places with a second wave are places that shouldn’t be expected to have herd immunity while the places without a second wave are places that shouldn’t be expected to have herd immunity. Makes sense.

        • The idea that there are always multiple waves is just wrong. I mean, if you run a standard SEIR model it has *one* wave, everyone gets infected until herd immunity.

          Of course in real epidemics, where the disease is something more than a simple sniffle/cold, people take steps to protect themselves, and maybe immunity wears off, and there are mutations, and we get a complex feedback between cases and actions, and you see a certain kind of chaotic dynamics resulting in “waves”. But those waves aren’t just “natural” they’re a result of the choices and actions of the people.

          The fundamental issue with “Herd Immunity” ideas is that if you induce the “Herd Immunity” by giving people the virus… you’ve just capitulated completely. It’s like here comes the Panzer division, let’s all lie down in front of the tanks and make it easy for them to roll over us.

          All the theoretical “we could get to herd immunity with only 26% of people having the virus if the connectivity graph just happens to have this particular form and we just give it to this particular subset” stuff is total garbage. It’s like “you could hit the golf ball into the hole in one shot”…yes you could… but almost never does that happen.

          Herd Immunity is only an idea that makes sense with respect to vaccines, or an endemic virus where everyone already had something like it in the past.
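The claim above that a standard SEIR model produces exactly one wave can be checked with a small simulation. What follows is a minimal sketch, not anything used by the commenters: the parameter values (`beta`, `sigma`, `gamma`, the initial infectious fraction) are illustrative assumptions, and the integration is a plain Euler step.

```python
def simulate_seir(beta=0.5, sigma=0.2, gamma=0.2, days=400, dt=0.1, i0=1e-4):
    """Euler-integrate a basic SEIR model on population fractions.

    beta: transmission rate, sigma: rate of leaving the exposed state,
    gamma: recovery rate. All values are illustrative assumptions.
    Returns the daily infectious fraction and the final recovered fraction.
    """
    s, e, i, r = 1.0 - i0, 0.0, i0, 0.0
    steps_per_day = round(1 / dt)
    infectious = []
    for step in range(days * steps_per_day):
        new_exposed = beta * s * i * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        if step % steps_per_day == 0:
            infectious.append(i)  # record one value per simulated day
    return infectious, r


def count_peaks(series):
    """Count strict local maxima in a series."""
    return sum(
        1
        for k in range(1, len(series) - 1)
        if series[k - 1] < series[k] >= series[k + 1]
    )


infectious, final_recovered = simulate_seir()
print("number of waves:", count_peaks(infectious))
print("fraction ever infected:", round(final_recovered, 3))
```

With fixed parameters the infectious curve has a single peak. Multiple waves only appear once something in the model changes over time, for example a `beta` that responds to behavior, waning immunity, or new variants, which is the feedback the comment describes.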

        • @ Carlos Ungil:

          Well, I’m mostly saying that I don’t think anyone knows enough to make firm statements about herd immunity for COVID, due to the lack of places that have all of:
          1) decent data showing a high % infected
          2) significant time since end of first wave/surge
          3) low enough restrictions that lack of resurgence can’t be attributed to restrictions alone

          >>The idea that there are always multiple waves is just wrong.

          Yep.

          >> The fundamental issue with “Herd Immunity” ideas is that if you induce the “Herd Immunity” by giving people the virus… you’ve just capitulated completely.

          Thing is, though, there are two different statements here. If this is an ethical statement that we *shouldn’t* pursue herd immunity by natural infection, fine.

          What I object to is statements/implications that herd immunity by natural infection *can’t* happen (or *won’t* happen in the absence of measures or vaccination). If this were true, humanity wouldn’t have survived before the invention of vaccines.

        • @confused: yes of course herd immunity does occur naturally, I don’t deny that. But most likely it will require upwards of 50% of people to have had the virus within a 6 month period, and possibly upwards of 60 or 70%.

          But yes, I was making both a practical and an ethical statement: we shouldn’t pursue it. We have strong evidence, based on the many vaccines now in trials, that there will be an at least moderately effective vaccine within a few more months (say 3 to 8). So pursuing a herd immunity strategy prior to that means maybe up to tripling the number of people who’ve had the virus in the US over the next 3 to 6 months, rather than waiting until that vaccine is available and we can use it instead. A terrible strategy.

        • Oh, I didn’t mean that *you* thought it was impossible. But I’ve seen that claim repeatedly – and the original claim in the thread by jim seemed to at least be saying that no place had achieved herd immunity (I don’t think we can be *sure* of any place since not enough time has passed – but I would be *very* surprised if the hard-hit parts of Latin America with 60%+ seroprevalence haven’t, for example, and I wouldn’t rule out even some parts of the US).

          And yeah I am fairly optimistic about vaccines at this point.

        • Quite possibly none yet, though I’m not keeping close track of results outside the US. Though *if* you believe the 20-25% estimates, possibly NYC, and maybe Stockholm? (Do we actually have any decent seroprevalence from Stockholm, or only estimates?)

          My point is exactly that the data isn’t sufficient to make useful statements one way or another.

          Also, I think one would need a fairly strong expectation about what “post-pandemic/endemic” COVID would look like, to make any testable statements, anyway. What kind of pattern of hospitalizations and deaths would be compatible with an “endemic” state, and what kind of pattern would disprove it?

          Clearly continued exponential growth, or anything stressing hospital capacity, would disprove it. But what about short of that?

        • Jim:

          We have a random-walk time series model for the state-level opinions with a pre-specified variance. So public opinion is allowed to slowly drift during the campaign. We played around with a model allowing occasional large jumps, but based on past elections this did not seem necessary.

          Our model also allows for nonsampling error in polls, and we also have a time series for differential nonresponse error for polls that don’t adjust for partisanship of their sample.

        • So you’re not throwing in some term for earthquakes or even modest tremors. You’re allowing modest random variation.

          How do you come by your pre-specified variance? Based on past elections? Cuz if that’s it, it’s lard instead of shortening or vegetable oil; in the end it’s the same cake. No?

          Oo! I spy vast literary and rhetorical potential in the mixed cake/earthquake metaphor.

        • Jim:

          Yes, our model allows for variation over time of about the same amount as past elections. Or maybe a bit less, as we are also aware that political polarization is associated with more stability in public opinion.

          Allowing for variation that’s consistent with the level of variation in the past is not the same as setting that variation to zero. To go with your analogy: using the same fat in the cake as before is not the same as baking a cake with no fat.

        • “Allowing for variation that’s consistent with the level of variation in the past is not the same as setting that variation to zero.”

          I agree.

          My intended analogy was that allowing variation through an explicit specified variance that’s derived from previous election variance is about the same thing as building the model to replicate past elections without an explicit term for variance.

        • Except that, as you pointed out, you have or can reduce the specified variance to account for trends in variance due to polarization.

        • 538 shows their model results for an election on Nov 3rd. As we get closer to election day, if the polling remains exactly the same, Biden’s chance of winning will go up simply because there is less time for something to go wrong. The other week they actually disclosed a second model result, which was the probability of Biden winning if the election were held THAT day as opposed to Nov 3rd. The difference was something like 12%: a 91% chance of Biden winning that day versus a 79% chance on Nov 3rd. That difference or discount or fudge factor or whatever you want to call it is for all of the stuff that could happen between then and election day. Every day the fudge factor will get smaller until on election day it becomes 0.
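The shrinking "fudge factor" in the comment above can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the 538 or Economist model: the margin is treated as normally distributed, with a fixed polling error (`poll_sd`) plus random-walk drift whose variance grows linearly with the days remaining (`daily_drift_sd`). Both values are made up for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_probability(lead, days_left, poll_sd=2.0, daily_drift_sd=0.35):
    """Probability that the final margin is positive.

    lead: current polling margin in percentage points.
    poll_sd: assumed error that remains even on election day.
    daily_drift_sd: assumed per-day sd of a random walk in opinion.
    """
    total_sd = math.sqrt(poll_sd ** 2 + daily_drift_sd ** 2 * days_left)
    return normal_cdf(lead / total_sd)


# Same lead, fewer days left: the win probability rises toward the
# "election held today" value.
for days_left in (90, 21, 7, 0):
    print(days_left, round(win_probability(3.0, days_left), 3))
```

Holding the lead fixed, the probability rises as `days_left` falls, and the gap between the "today" and "election day" numbers goes to zero on election day. The same formula shows why large uncertainty pushes any binary forecast toward 50%.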

        • I’m sorry for giving a misleading quote. It seems that what you mean is that when you communicate those probabilities (91% of winning, 99% for the vote share [*]) you do take into account shifts in the polls, unexpected changes in underlying economic and political conditions and model inadequacy.

          I thought I remembered you writing something like “the forecast is good except as long as nothing unexpected happens” but I either read that elsewhere or I did misunderstand this quote when I read it the first time.

          [*] By coincidence these are the same probabilities that you didn’t believe in https://statmodeling.stat.columbia.edu/2020/07/31/thinking-about-election-forecast-uncertainty before you updated the model to increase the level of uncertainty. Of course they do not seem so over-confident now that the election is three weeks away rather than three months.

  5. I do think there is something off with the tiebreaker in your senate model–apparently, in the event of a 50-50 split, there’s a 73% chance a republican will break the tie, i.e., Trump will win. This is inconsistent with the presidential model–it essentially assigns an 8% probability to republicans winning exactly 50 seats AND Donald Trump winning the presidency. Seeing as Trump has a 9% chance of winning overall, I’m skeptical that 8 out of 9 scenarios in which Trump wins involve republicans winning no more and no less than 50 senate seats.
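The arithmetic behind the comment above can be written out explicitly. This sketch uses only the three numbers quoted in the comment. It assumes, as a reading of the commenter's reasoning, that the 8% figure is the probability of a 50-50 Senate multiplied by the 73% tiebreak probability.

```python
# Numbers as quoted in the comment above.
p_rep_tiebreak_given_50_50 = 0.73   # chance a Republican breaks a 50-50 tie
p_50_50_and_trump_wins = 0.08       # joint probability implied by the Senate model
p_trump_wins = 0.09                 # presidential model's overall Trump win probability

# Assumption: joint = P(50-50 Senate) * P(Republican tiebreak | 50-50).
implied_p_50_50 = p_50_50_and_trump_wins / p_rep_tiebreak_given_50_50

# Share of all Trump-win scenarios that would have to coincide with a 50-50 Senate.
share_of_trump_wins = p_50_50_and_trump_wins / p_trump_wins

print(round(implied_p_50_50, 3), round(share_of_trump_wins, 3))
```

The second ratio is the "8 out of 9" in the comment: if both models were consistent, nearly every Trump victory would have to come with exactly 50 Republican seats.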

  6. Out of curiosity, how does voter registration fit into your model? Big net gains for Republicans in PA and FL but you have those going strongly Democratic. Also, any thoughts on white-collar workers who are working from home due to COVID tending to be Democrats and being more willing to answer surveys? There may be sampling issues if the baseline is 2012 and 2016.

    • Tom:

      1. We do not have voter registration in our model: your guess is as good as mine.

      2. Regarding survey response: good surveys do adjust for age and education, but, sure, there’s always the possibility of bias. Indeed, the big reason why our model has the uncertainty it has is that we are allowing for the possibilities of major polling bias (presumably from differential nonresponse not accounted for by the survey organizations) or a big swing between now and the election.

  7. For those (“confused”) arguing that there is never more than one local peak, in my county there have been three so far. (https://www.c-uphd.org/champaign-urbana-illinois-coronavirus-information.html). One could argue that the third one represents a new population, incoming students, but that still leaves two very distinct ones well outside statistical fluctuations. Now you could say that one was centered in one zipcode and the other was more centered in another zipcode, but heading down that rabbit hole turns your claim into an unfalsifiable prediction.

    • I am not arguing that “there is never more than one local peak”; I specifically said that Madrid is having a second.

      I said that I am not aware of a second local peak *in places that are very hard-hit*, which (from that page) doesn’t seem to be the case in Champaign County. (27 deaths; googling shows ~210,000 total population, so well below the US overall ~0.07%.)

      Also, that page shows three peaks in deaths, but the sample size is small enough (27 deaths total) that I am not sure that means anything.

  8. Thanks for considering my previous comments — I appreciate it and I know many researchers ignore issues that are pointed out to them, so it is great to see you and your team respond.

    I do think the tail overconfidence is more of a problem for the Economist model than you suggest. Here’s why:

    The Economist model assigns Trump 8% probability of winning the election and 1% probability of winning the popular vote. How reliable are those predictions? When I checked model performance on past years, I found that state events assigned 8% tail probability actually occurred in 18%, 16%, and 18% of instances in 2016, 2012, and 2008. Similarly, state events assigned 1% tail probability actually occurred in 10%, 8%, and 6% of instances in 2016, 2012, and 2008. That is worrying‼ At first look, it suggests the model should not be trusted.

    However, state results are nonindependent, which means the problem may not be as bad as first looks. To address this, I analyzed rare events, which can be used to sidestep the nonindependence issue, as explained in comments from me that you link to in your blog post. I found that the model assigned less than 1 in 50 000 probability to state outcomes being as far from the mean model predictions as actually happened in 2008, 2012, and 2016. In summary, analyses that avoid the nonindependence issue still show bad model tail performance for states.

    A model that assigns incorrect probabilities to state results can still assign correct probabilities to national results, like the probability of Trump winning — I absolutely agree with you on this. But given that Economist model’s state election predictions are incorrect, why should you or I have confidence in the reliability of its national election predictions? This is the issue for me.

    If you have any questions about my analyses I’m happy to answer them here or at the email address submitted with my comments.
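The kind of tail-calibration check described in the comment above might look roughly like the following. This is a hypothetical sketch, not Fogpine's actual code: the function names, the two-sided tail definition, and the toy data are all assumptions made for illustration. As the comment notes, it also ignores correlation between states.

```python
def tail_probability(draws, actual):
    """Two-sided tail probability of `actual` under simulated `draws`.

    Uses the empirical CDF of the draws. The two-sided definition is an
    assumption; other definitions of "tail event" are possible.
    """
    cdf = sum(1 for d in draws if d <= actual) / len(draws)
    return 2.0 * min(cdf, 1.0 - cdf)


def tail_hit_rate(draws_by_state, actual_by_state, alpha):
    """Fraction of states whose actual result falls in the outer `alpha` tail.

    For a calibrated model with independent states this would be close to
    `alpha`. A much larger value suggests the tails are too narrow.
    """
    hits = sum(
        1
        for state, actual in actual_by_state.items()
        if tail_probability(draws_by_state[state], actual) <= alpha
    )
    return hits / len(actual_by_state)


# Toy data: each hypothetical state has the same 100 simulated vote shares.
toy_draws = {name: list(range(1, 101)) for name in ("A", "B", "C")}
toy_actual = {"A": 50, "B": 100.5, "C": 3}
print(tail_hit_rate(toy_draws, toy_actual, alpha=0.08))
```

In the toy data, two of the three results land in the 8% tail, so the hit rate is far above 8%. That is the pattern the comment reports for 2008, 2012, and 2016.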

    • Fogpine:

      We’re working on it right now. Short answer is that we are indeed concerned about tail probabilities; also I don’t think this will affect our main forecast much, because if you get the tails right by adding another percentage point or two, that might just change Biden’s win probability from 0.91 to 0.89 or whatever. In any case, it’s worth tracking this down. Model failures are an opportunity to learn.

      • Andrew:

        Ah, thanks. I agree it could leave the main forecast basically the same — I tried checking myself, but I’m new to stan and have difficulty understanding the .stan file.

        I was also thinking about whether there is some way to check calibration against not-extremely-rare state election outcomes (10% tails, etc) by working under the assumption that your state covariance matrix is correct. Maybe that thought is useful to you, maybe not.

      • After more analyses, I think the stated probability of Trump winning would be changed by using longer tails to calibrate state predictions. However, the analysis methods that I used are hacky, so maybe better work from your team would be somewhat different.

        Here’s what I did:

        I logit-transformed all state result simulations to work on a scale where they are normal.

        For each combination of state and previous election year, I replaced the state’s normal distribution of predictions with a shifted-and-scaled t distribution of the same mean and sd, as well as high degrees of freedom (DoF). Since the t approaches the normal distribution for high DoF, this left model predictions unchanged. I then reduced DoF until state tail predictions appeared generally calibrated. That happened for about 3.5 DoF.

        For 2020, I took the simulated outcomes matrix (electoral_college_simulations.csv: 40 000 rows of draws x 51 columns of states). For each entry in the matrix, I computed its CDF value from the state’s normal distribution of outcomes, then replaced it with the same percentile from the shifted-and-scaled t distribution of 3.5 DoF. For example, if the matrix entry was 1.96 and the state’s 2020 simulations had a standard normal distribution, then I replaced the matrix entry with the 97.5th percentile of a t distribution with 3.5 DoF, which is 2.94. This procedure preserves mean predictions for each state and the Spearman correlations between states, but not Pearson correlations between states.

        Altogether, these steps increased Trump’s 2020 win probability by 50% — from 8% to 12%.

        My methods are hacks and will be different from the correct approach of working t distributions directly into the stan model, which I still can’t get running. But I think my results should be similar to the correct approach because non-tail parts of predictions shouldn’t change much between normal and t distributed predictions. The end message is that a DoF low enough to calibrate state results is likely to substantially increase Trump’s stated win probability.

        I also learned a useful lesson: As Trump’s chances reduce, the assigned Trump win probabilities become increasingly different for narrow- vs. wide-tailed models. This may explain why you did not see changes when trying wide tails in the past (Trump’s win probability was higher then). Moreover, if Trump’s chances continue to reduce, then using normal vs. wide tails will produce larger differences in Trump’s win probability in the future. So wide tails may become necessary even if you don’t think they matter now.

        I’m happy to answer any questions here or at the submitted email. Thanks.
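The percentile-matching step described in the comment above (compute each draw's normal CDF value, then replace it with the same percentile of a t distribution) can be sketched in plain Python. This is an illustration under assumptions, not the commenter's code: the t quantile is found by numerical integration and bisection rather than a statistics library, and the function names and the `match_sd` switch are hypothetical.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def t_pdf(x, dof):
    c = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return c * (1.0 + x * x / dof) ** (-(dof + 1) / 2)


def t_cdf(x, dof, steps=2000):
    """Student t CDF via Simpson's rule on [0, |x|], using symmetry."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, dof) + t_pdf(a, dof)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, dof)
    area = total * h / 3.0
    return 0.5 + area if x > 0 else 0.5 - area


def t_ppf(p, dof, bound=60.0):
    """Student t quantile by bisection; results are clamped to +/- bound."""
    lo, hi = -bound, bound
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if t_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def remap_to_t(value, mean, sd, dof, match_sd=False):
    """Replace a normal draw with the same percentile of a t distribution.

    match_sd=True rescales the t so its standard deviation equals `sd`
    (requires dof > 2); match_sd=False uses `sd` as the t scale directly.
    """
    p = normal_cdf((value - mean) / sd)
    scale = sd * (math.sqrt((dof - 2) / dof) if match_sd else 1.0)
    return mean + scale * t_ppf(p, dof)


print(round(remap_to_t(1.96, 0.0, 1.0, 3.5), 2))
```

With `match_sd=False` this sketch gives a value close to the 2.94 quoted in the example, which suggests that example used the t quantile without rescaling. Because the mapping is monotone, it preserves the ranks of the draws within each state, which is why Spearman correlations between states are unchanged while Pearson correlations are not.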

  9. [the thread was too deep, this is a quote from Aubrey Clayton above https://statmodeling.stat.columbia.edu/2020/10/12/more-on-martingale-property-of-probabilistic-forecasts-and-some-other-issues-with-our-election-model/#comment-1549544 ]
    > Just to be clear, here is the complete description of the “volatility test” from Taleb and Madeka (2019):
    > “Essentially, if the price of the binary option varies too much, a simple trading strategy of buying low and selling high is guaranteed to produce a profit.”

    I also find it a bit confusing what’s being claimed. The term “guaranteed” in that quote suggests it’s about arbitrage in the sense you mention of “zero initial investment, no probability of loss, and positive probability of profit” (as do the references to “arbitrage-free” and “arbitrage pricing” in that paper or the original one from Taleb).

    However, the “simple trading strategy of buying low and selling high” is not about replicating portfolios if it means buying now and selling later. This interpretation is unambiguous in the original paper:

    intertemporal arbitrage is created, by “buying” and “selling” from the assessor

    someone can “buy” from the forecaster then “sell” back to him, generating a positive expected “return”

    A “high” price can be “shorted” by the arbitrageur, a “low” price can be “bought”, and so on repeatedly.

    If it’s about trading with the forecaster at different times to generate a positive expected return it’s not guaranteed. And what’s the relevance in that context of arbitrage-free pricing using a risk-neutral measure?

    If it’s about true arbitrage, where the arbitrageur can trade with the forecaster and with other market participants to build a zero-investment portfolio with a guaranteed non-negative payoff, this is unrelated to the forecasts being a martingale or the forecaster being somehow wrong.

    Imagine the forecaster has a perfect insight and produces a sequence of forecasts that passes all the tests for real-world martingality. He thinks, and he’s right, that the true price of the binary option is 0.52. However, the market/risk-neutral price is 0.48. The market-implied probability is lower than the real-world probability. The forecast is right.

    Arbitrage is possible even though the forecaster is accurate. The forecaster will be happy to buy at 0.50 (for an expected profit of 0.02) and the arbitrageur will be happy to sell at 0.50 (for a sure profit of 0.02, as he can just buy it from someone else at 0.48). The arbitrageur is not making his profit at the expense of the forecaster.

    Conversely, if this was a bad forecaster with a bad model and a sequence of predictions failing the martingality tests who valued the option at 0.48, in line with the market price, arbitrage wouldn’t be possible. (But someone with a better prediction may make money, on expectation, trading with the forecaster.)
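The numbers in the example above (true probability 0.52, market price 0.48, trade at 0.50) can be tabulated to show who earns what in each outcome. This is a minimal sketch; the function names are made up for illustration, and the binary option is assumed to pay 1 if the event occurs and 0 otherwise.

```python
def arbitrageur_profit(event_occurs, sell_price=0.50, market_price=0.48):
    """Sell one option to the forecaster and buy one in the market.

    The long and short positions cancel in every outcome, so the profit
    is just the price difference.
    """
    payoff = 1.0 if event_occurs else 0.0
    long_leg = payoff - market_price   # bought from the market
    short_leg = sell_price - payoff    # sold to the forecaster
    return long_leg + short_leg


def forecaster_expected_profit(true_prob=0.52, buy_price=0.50):
    """Expected profit from buying at buy_price an option worth true_prob."""
    return true_prob * 1.0 - buy_price


for outcome in (True, False):
    print(outcome, round(arbitrageur_profit(outcome), 2))
print("forecaster (expected):", round(forecaster_expected_profit(), 2))
```

The arbitrageur earns 0.02 whether or not the event happens, while the forecaster earns 0.02 only in expectation. Both gains come from the gap between the market price and the true probability, not from each other.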

    • Thanks Carlos. this was a very useful illustration of some of the issues with the whole conversation.

      To me intertemporal arbitrage strictly speaking is not possible. At any moment an asteroid could strike your house killing you before you close out your position causing you to lose money, so the risk of loss is never zero intertemporally.

      true arbitrage must always be between two different people at essentially the same time (buy low from Joe, sell high to Fred).

      however statistical intertemporal moneymaking is a real thing… there are strategies we can discuss where having better information results in a very high probability of making money. are quants calling this arbitrage?

    • +1. This whole conversation would be easier to follow with a fully worked example – from Dhruv, or Dhruv + Aubrey – starting from unambiguous definitions of the relevant terms and demonstrating careful workflow.

    • Carlos:

      Thank you for this comment. I hope the new additions in the thread above are helpful. Per your (and others’) request, here’s another simple example to illustrate some of the key concepts here as I understand them:

      Suppose someone is going to play 3 spins of roulette and bet on Red each time for 1 unit per spin. Before each spin he’s also going to forecast the probability of finishing the game above where he started. Initially he computes the probability to be p^3 + 3*p^2*(1-p), where p is the chance of winning each spin. Call this the forecast probability at time 0, F_0.

      Then, supposing he wins the first spin, his new forecast probability is p^2 + 2*p*(1-p), since he needs to win at least 1 out of the following 2. Call this the forecast at time 1, F_1(W). If he loses the first spin his updated forecast is p^2, because he needs to win both of the remaining times. Call this F_1(L).

      It happens then that F_0 = p*F_1(W) + (1-p)*F_1(L), and this pattern holds in general, meaning the forecast probabilities are a martingale. However, this is true no matter what probability p he uses. According to his probability measure, the forecast is always a martingale, by the tower property of conditional expectation.

      Now, imagine an investor sees this and wants to bet on whether the forecasts are going to go up or down. Maybe the investor knows the real probability of winning roulette. So maybe the gambler/forecaster thinks his forecasts are going to behave like a martingale, but the investor knows otherwise. They can earn a *likely* profit by betting the right direction — say the gambler thought the game was 50/50, so his realized forecast win probabilities are going to steadily drift downwards. The investor would sell the initial forecast short and buy it later when it’s low. So, what I said above is that strategy may earn a likely profit but it’s not arbitrage in the strict sense of a guaranteed chance of profit with no possibility of loss. If the forecaster’s probabilities are equivalent to the investor’s, meaning they agree on what events are possible just with different probabilities, there’s always some probability the investor will lose money. The Fundamental Theorem of Asset Pricing guarantees this under general conditions.

      But if, say, the player thought there was some probability of winning 2 units on the final spin, say, when that really is impossible, the forecasts he came up with would be a martingale but with respect to a probability measure that’s not equivalent to the “true” measure. This is why I said it’s not actually a requirement that the forecasts “actually are” martingales to be arbitrage-free. However, they should be martingales if the probabilities they embed are properly calibrated, meaning they agree with the frequencies with which the events occur.

      Obviously, all this talk about true probabilities and frequencies runs afoul of subjective probability assignments based on information, but the usual setup in this domain is that there is some true underlying “physical” probability measure, maybe just representing the consensus of the market. There’s also an assumed filtration that represents the “information” available to all market participants, so they’re all able, say, to assign probabilities to the same events at the same time.

      Hope that helps.
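The roulette example in the comment above can be checked numerically. The sketch below uses the formulas for F_0, F_1(W), and F_1(L) exactly as stated there. The "true" probability of 18/38 assumes an American double-zero wheel, which the comment does not specify.

```python
def f0(p):
    """Forecast before any spin: win at least 2 of 3."""
    return p ** 3 + 3 * p ** 2 * (1 - p)


def f1_win(p):
    """Forecast after winning the first spin: win at least 1 of 2."""
    return p ** 2 + 2 * p * (1 - p)


def f1_loss(p):
    """Forecast after losing the first spin: win both remaining spins."""
    return p ** 2


def expected_next_forecast(p_model, p_true):
    """Expected value of F_1 when the forecaster uses p_model but spins
    actually win with probability p_true."""
    return p_true * f1_win(p_model) + (1 - p_true) * f1_loss(p_model)


# Under the forecaster's own measure the forecast is a martingale for any p.
for p in (0.1, 18 / 38, 0.5, 0.9):
    assert abs(expected_next_forecast(p, p) - f0(p)) < 1e-12

# A forecaster who believes p = 0.5 on a wheel where p is 18/38 (assumed)
# sees forecasts drift down on average.
print(f0(0.5), round(expected_next_forecast(0.5, 18 / 38), 4))
```

The first loop confirms the tower-property identity for every `p`, which is why the martingale property alone cannot tell a good model from a bad one. The last line shows the downward drift the investor could bet on, which yields a likely profit but not a guaranteed one.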

      • Thanks for your reply. I think I agree: a wrong model (which produces a series of predictions which is not a true martingale) is not guaranteed to lose in general when confronted with a good model (based on the true probabilities) but only when it’s so bad that it bets on impossible outcomes. (I’m not sure I agree with the last paragraph but anyway it seems unrelated to the preceding discussion.)

        • I said “a wrong model (which produces a series of predictions which is not a true martingale)” but it’s not a necessary condition. A sequence of constant predictions would be a martingale and that doesn’t make a model right. Also, the idea that “not being a true martingale” indicates that a model is wrong has to be understood in a statistical sense. The model can only be proved wrong when things that are supposed to be impossible are observed.

      • This is helpful. Clearly, if someone assigns nonzero probability to an impossible state of affairs (say, somehow “winning twice on one spin” as you mention, or let’s say trading in a stock below a price that the rules of the market prohibit for example) then it’s risk free to bet against that and true arbitrage becomes possible.

        in any other situation, the question of “who is right” is not decidable ahead of time, and so we always have a risk of loss, and hence no arbitrage.

        However I wonder if the notion of “arbitrage” being used by Dhruv is a “free lunch with vanishing risk” (https://en.wikipedia.org/wiki/No_free_lunch_with_vanishing_risk)

        I don’t understand the technical definition given there exactly.

        • The processes being considered here are all bounded on a finite interval, so no need for the no-free-lunch version. The ordinary kind of arbitrage and Fundamental Theorem applies.

          I should have mentioned, intertemporal arbitrage is definitely possible, if the forecaster is also asked for conditional probability forecasts (e.g. “What will your ultimate win probability be if you win in the next spin?”). It’s possible to show that any combination of prices for the three contracts — event A, event B given A, and event A and B — has to satisfy Bayes’ rule in order not to admit arbitrage.

          In practice, though, we never get those forecasts from say 538. Instead we just get all the conditional probability updates based on whatever happened, without the missing pieces to see if it was consistent with some Bayesian updating for some probability measure.
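The three-contract claim above can be made concrete with a small Dutch-book construction. This is a sketch of one standard portfolio, not something taken from the thread: it assumes each contract pays 1 if its event occurs, that the conditional bet on B given A is refunded if A does not occur, and that the forecaster will both buy and sell at the quoted prices.

```python
def dutch_book_payoffs(price_a, price_b_given_a, price_a_and_b):
    """Net payoff in each state of a portfolio that exploits
    price(A and B) != price(A) * price(B given A).

    Portfolio when the joint price is too high: buy one conditional bet on
    B given A, buy price_b_given_a units of A, and sell one unit of A-and-B.
    When the joint price is too low, take the opposite side of each trade.
    """
    a, c, j = price_a, price_b_given_a, price_a_and_b
    direction = 1.0 if j > a * c else -1.0
    payoffs = {}
    for label, a_occurs, b_occurs in (
        ("A and B", True, True),
        ("A and not B", True, False),
        ("not A", False, False),
    ):
        both = a_occurs and b_occurs
        # Conditional bet: stake c, wins 1 if A and B, refunded if not A.
        conditional = ((1.0 if both else 0.0) - c) if a_occurs else 0.0
        long_a = c * ((1.0 if a_occurs else 0.0) - a)
        short_joint = j - (1.0 if both else 0.0)
        payoffs[label] = direction * (conditional + long_a + short_joint)
    return payoffs


print(dutch_book_payoffs(0.5, 0.5, 0.30))
```

The payoff is the same in every state and equals the absolute gap between the joint price and the product of the other two prices. It is zero, so no arbitrage, exactly when the prices satisfy the product rule behind Bayes' rule.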

        • > It’s possible to show that any combination of prices for the three contracts — event A, event B given A, and event A and B — has to satisfy Bayes’ rule in order not to admit arbitrage.

          Sure, that makes sense. I guess I was already assuming a Bayesian probability measure.

          Thanks for the clarifications.

          As far as your “Obviously, all this talk about true probabilities and frequencies runs afoul of subjective probability assignments based on information, but the usual setup in this domain is that there is some true underlying “physical” probability measure, maybe just representing the consensus of the market.”

          I think there is no problem with Bayesian states of information. It’s always the case that a person with *better* predictions about the future can make money from someone with worse predictions, but it’s not arbitrageable so long as the Bayesian isn’t assigning positive probability to things that logically can’t happen. If that occurs, then the counter-party can know for sure they will make money by betting the thing *won’t* happen.

          The worst case of course is where the outcomes are perfectly predictable (ie. a completely rigged game). When that’s the case then the rigger knows that certain outcomes have zero probability, and can arbitrage the Bayesian.

        • About assuming the existence of a probability measure: yeah, just wanted to include that for completeness. Taleb talks about it as though it’s something new, but the argument goes back to de Finetti. I think it just underscores my point above that framing things in terms of option prices and applying the machinery of no-arbitrage pricing really adds nothing of substance to forecasting, since probabilistic forecasts already satisfy all the desired properties those prices are supposed to have anyway. The hard direction is entirely the other way (that is, starting with prices and seeing them as probabilities): if there’s no arbitrage there must be an equivalent martingale measure, intertemporal arbitrage in particular implies prices satisfy Bayes’ rule, etc. But conditional expectation with respect to a probability measure automatically produces martingales and satisfies Bayes’ rule. So we can dismiss the arbitrage concerns out of hand and get back to the real questions of how to come up with probability models based on data and how to evaluate them against outcomes.

        • > framing things in terms of option prices and applying the machinery of no-arbitrage pricing really adds nothing of substance to forecasting

          I agree.

          “Thus we are able to show that when there is a high uncertainty about the final outcome, 1) indeed, the arbitrage value of the forecast (as a binary option) gets closer to 50% and 2) the estimate should not undergo large changes even if polls or other bases show significant variation.”

          Sure, if there is high uncertainty about the final outcome the probability is close to 50% (because the uncertainty is high) and changes are not large (because the probability is close to 50%).

          “a binary option reveals more about uncertainty than about the true estimation, a result well known to traders”

          I think non-traders also understand that when there is high uncertainty about a binary outcome the probability is close to 50%. (Disclaimer: I took a course on stochastic processes with applications to finance in grad school.)

        • I think the harder case is something like my asteroid hitting the earth… Conditional on the data today we can project forward the trajectory and see which states will be wiped out… but there are many bits of mass we can’t see that perturb the orbit so tomorrow the trajectory will be different… on the other hand we may not have a model for the perturbations… someone with better information may have such a model and they can make money by betting against us, but they can’t make *guaranteed* money.

          In the context of Andrew’s model, the state-level polls play the role of the asteroid. Sure, someone may say “hey, the polls are highly variable, this person is always updating on them without projecting their future variability… I can make money off this…” Perhaps true. But you can’t make GUARANTEED money unless you can guarantee something about the future of the polls.

          Not all money-making on the basis of better information is arbitrage.

        • +1 to Carlos’ comments, and thanks all for the discussion. Every time we follow a point through, this whole thing starts to really seem like a Motte and Bailey situation. The Motte is the intuition that high uncertainty -> 50% (in a binary setting) and a couple other uncontroversial assertions; the Bailey is the ‘no-arbitrage pricing’ machinery. The unconvincing link is how their argument doesn’t just come down to a variant of, ‘an agent with better information/models will be able to generate positive returns against a forecaster with worse information/models’. Everyone agrees that correct Bayesian updating is necessary, and thus that a forecaster’s forecasts must trivially be martingales. To my understanding, that is actually just De Finetti’s derivation for subjective probability – the axioms are the only way to not get Dutch-booked!

      • Ahh I started this thread when a coworker sent it to me (with your tweets) with such hope of a truce. But it seems like it’s a false hand extended – especially after reading comments like “Quant finance apparatus isn’t useful” (er, how would someone even validate something like this? Isn’t the point here to bridge the two? Isn’t the apparatus of martingales best developed in QF?)

        And seeing your tweets, Aubrey. Like I said, let me know via email once you’ve read and understood (google Dambis Dubins Schwarz etc.) and are willing to engage productively. I’m not sure you’re ready to reduce the temperature and actually try to have a conversation – willing to engage when you are!

        • One last question, maybe… When you say

          “Essentially, if the price of the binary option varies too much, a simple trading strategy of buying low and selling high is guaranteed to produce a profit”

          do you really mean that when you buy “low” (whatever it means) you are *guaranteed* to have the opportunity to sell higher in the future?

        • Exactly, guaranteed, as in it’s logically impossible for it to be otherwise, probability = 1. That seems like a much stronger statement than is actually warranted.

          If he just said “by incorporating the patterns in variation in price I can come up with a trading strategy such that if my description of the pattern is reasonably good, and it appears to be, I will have a high probability of making money” then I think we could just all move on, because there’s nothing particularly controversial about that. More good/real information about the future = making money is widely understood.

  10. WARNING: toy model and pointless ramblings ahead

    The forecasting problem: it’s 8:00 on January 1st. I give you a widget with an LED, which is off. Every midnight, the state of the LED may change. I ask you to give a probabilistic forecast of the state of the light on January 31st at noon. Every morning, you will have an opportunity to check the current state and update your forecast.

    On the last day the prediction will be easy: if it’s on (off) in the morning, the probability that it’s on at noon is 100% (0%). On previous days you need some kind of model.

    One simple model is to assume that there is correlation in the sequence of states and it flips each night with probability x. If x is less than 0.5 we will forecast that it’s more likely than not to end in the current state and our confidence gets higher as we approach the final day.
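
    A minimal sketch of this model in Python, assuming the nightly flips are independent and counting 30 midnights between the morning of January 1st and noon on January 31st (my count, not stated above). The function names are made up for illustration. With n nights left and flip probability x, the light ends in its current state with probability (1 + (1 - 2x)^n) / 2.

```python
def prob_same_state(x: float, nights_left: int) -> float:
    """Probability that the final state equals the current state when each
    remaining night flips the light independently with probability x."""
    return 0.5 * (1.0 + (1.0 - 2.0 * x) ** nights_left)


def forecast_on(x: float, nights_left: int, currently_on: bool) -> float:
    """Forecast probability that the light is on at the end of the month."""
    same = prob_same_state(x, nights_left)
    return same if currently_on else 1.0 - same


# Morning of January 1st: light is off, 30 midnights still to come (assumed count).
for x in (0.01, 0.02, 0.03):
    print(x, round(forecast_on(x, 30, currently_on=False), 3))
```

    With no nights left the forecast collapses to 0 or 1, and with x = 0.5 it stays at 50% until the last morning, which matches the description above.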

    The charts in https://imgur.com/a/4UbZWHN show the sequence of assigned probabilities produced by forecasters that use the model above with x=1%, 2% and 3% in five different scenarios. The predictions flip from one side to the other each time the state of the light changes. They end at 0 or 1, as there is no uncertainty left on the final day.

    The models do not seem very good, as they would expect on average one flip or less over the month, not several. But even if we don’t know anything about the model used, from the sequence of predictions we can sometimes say that the predictions seem to move too much. Whatever the model, the sequence of predictions should be a martingale. The expected value of the sum of the squared daily increments equals the difference between the initial and final values of p*(1-p). The final value is 0 here, since the last prediction in the sequence is either 0 or 1.
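
    For completeness, a short derivation of that identity. The notation is introduced here and is not from the comment: p_t is the forecast on day t and F_t is the information available that morning. The only input is the martingale property E[p_{t+1} | F_t] = p_t.

```latex
\begin{align*}
\mathbb{E}\big[(p_{t+1}-p_t)^2 \mid \mathcal{F}_t\big]
  &= \mathbb{E}\big[p_{t+1}^2 \mid \mathcal{F}_t\big] - p_t^2 \\
  &= p_t(1-p_t) - \mathbb{E}\big[p_{t+1}(1-p_{t+1}) \mid \mathcal{F}_t\big], \\[4pt]
\mathbb{E}\Big[\sum_{t=0}^{T-1}(p_{t+1}-p_t)^2\Big]
  &= p_0(1-p_0) - \mathbb{E}\big[p_T(1-p_T)\big].
\end{align*}
```

    The last line follows by taking expectations and summing, since the right-hand side telescopes. When p_T is 0 or 1 the final term vanishes, leaving p_0(1-p_0).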

    The expected value of sum(delta^2) is 0.18 for the forecaster with the 1% model who gives the most confident forecasts. Since the movement is non-negative, Markov's inequality bounds the value that can be exceeded with more than 5% probability at 20 times the expected value (0.18*20). If we observe a value over 3.6 we can calculate a p-value<0.05, etc. (That sum is called "movement" in the Augenblick and Rabin paper mentioned in another comment; they mention that tighter bounds can be obtained but I've not really looked beyond the introduction yet.)
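
    A small sketch of that calculation, again assuming 30 nights and independent flips (the helper names are hypothetical). It computes p0*(1-p0) for a forecaster who believes in flip probability x, and the threshold that Markov's inequality gives for a chosen tail probability.

```python
def expected_movement(x: float, nights: int) -> float:
    """p0 * (1 - p0): expected sum of squared forecast increments,
    valid if the forecaster's model (flip probability x) is correct."""
    p_same = 0.5 * (1.0 + (1.0 - 2.0 * x) ** nights)
    return p_same * (1.0 - p_same)


def markov_threshold(mean: float, alpha: float = 0.05) -> float:
    """Value m with P(movement >= m) <= alpha by Markov's inequality,
    which needs only that the movement is non-negative with the given mean."""
    return mean / alpha


for x in (0.01, 0.02, 0.03):
    mean = expected_movement(x, 30)
    print(x, round(mean, 3), round(markov_threshold(mean), 2))
```

    For x = 1% the unrounded expected movement is a little under 0.18, so the unrounded threshold is slightly below the 3.6 quoted above. The bound is crude by design, since it uses nothing about the distribution beyond its mean.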

    The charts were generated from sequences of states that changed at midnight with probability 25%. I took five realizations at random. The dashed lines show the predictions of a forecaster that had used the right model.
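
    A rough simulation along these lines, not the code behind the charts. It assumes 30 nights, a light that starts off, independent flips with probability 25%, and an arbitrary seed; all function names are made up. For each realization it prints the movement of forecasters using x = 1%, 2%, 3% and the true 25%.

```python
import random


def forecast_on(x, nights_left, currently_on):
    """Forecast that the light ends on, for a believer in flip probability x."""
    same = 0.5 * (1.0 + (1.0 - 2.0 * x) ** nights_left)
    return same if currently_on else 1.0 - same


def simulate_states(q, nights, rng):
    """Morning states: start off, then flip each night with probability q."""
    states = [False]
    for _ in range(nights):
        flip = rng.random() < q
        states.append(states[-1] != flip)
    return states


def movement(states, x):
    """Sum of squared day-to-day changes in the forecast of a believer in x."""
    nights = len(states) - 1
    preds = [forecast_on(x, nights - day, on) for day, on in enumerate(states)]
    return sum((b - a) ** 2 for a, b in zip(preds, preds[1:]))


rng = random.Random(0)  # arbitrary seed, only for reproducibility
for _ in range(5):
    states = simulate_states(0.25, 30, rng)
    print([round(movement(states, x), 2) for x in (0.01, 0.02, 0.03, 0.25)])
```

    Every flip moves the x = 1% forecaster from one side of 50% to the other, so a handful of flips is enough to push the movement well past what that forecaster's own model would expect.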

    This kind of test doesn't detect all the "bad" models but it detects "seemingly inconsistent" predictions (sequences with an improbably high movement). A forecaster that gave a constant prediction of 50% (except on the final day) may not be good if the sequence of observed states is really informative but wouldn't be inconsistent. The movement would be 0.5^2, just as expected, whatever the true data-generating process is.
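
    The constant-50% case is easy to check directly. A tiny sketch, assuming 30 mornings at 50% followed by a final forecast of 0 or 1 (`movement` is a made-up helper):

```python
def movement(preds):
    """Sum of squared day-to-day changes in a sequence of forecasts."""
    return sum((b - a) ** 2 for a, b in zip(preds, preds[1:]))


for final in (0.0, 1.0):
    preds = [0.5] * 30 + [final]
    print(final, movement(preds))  # 0.25 in both cases
```

    The only non-zero increment is the last one, so the movement is 0.25 whichever way the light ends up, equal to p0*(1-p0) for p0 = 0.5.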
