If you don’t want to read the whole thing, skim down about halfway to the table that says “polling bias is not very consistent from cycle to cycle”.

https://quadrant.org.au/opinion/qed/2019/06/election-fraud-and-the-aec/

https://morningmail.org/electoral-rorting/

What you want to look at is the probability that you assigned to the outcome that happened. So if you assign 100% to “heads” and you get tails… you assigned 0 to that, and it should count against you.

If you always assign 100% and you’re always right… you’re obviously doing a good job: you have a lot of information and need very little correction. -log(1) = 0, so you have 0 bits of error per prediction.

suppose you assign 75% to the outcome that occurs each time..

-log(.75)/log(2) = .415 bits of error per prediction.

suppose you assign 1 to 50% of outcomes, and you assign .5 to the other half of the outcomes..

.5 * (-log(.5)/log(2)) + .5 * (-log(1)/log(2)) = .5, so you have about half a bit of error per prediction.

so I think what you want is 1/N * sum(-log(p(actual_outcome[i]))/log(2)) for the average bits of prediction error
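That average can be sketched directly in Python (the function name `avg_bits_of_error` is mine; the input is the probability each model assigned to whichever outcome actually occurred):

```python
import math

def avg_bits_of_error(probs_assigned):
    """Mean of -log2(p) over the probability p that was
    assigned to each outcome that actually occurred."""
    return sum(-math.log2(p) for p in probs_assigned) / len(probs_assigned)

# Always certain and always right: 0 bits of error per prediction.
print(avg_bits_of_error([1.0, 1.0, 1.0, 1.0]))        # 0.0

# 75% on the realized outcome each time: ~.415 bits.
print(round(avg_bits_of_error([0.75] * 4), 3))        # 0.415

# Certain on half, 50/50 on the other half: .5 bits.
print(avg_bits_of_error([1.0, 0.5, 1.0, 0.5]))        # 0.5
```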

There’s probably some metric already well defined, but intuitively a perfect model predicts a long string of bits without any “additional” information, and the worse the model is, the more “correction bits” you need to get from the prediction to the right answer… If you already know how to define such a thing, let me know; otherwise you might think about it and figure it out, or I can come back to it maybe Monday.

We are in the process of opening everything up, including data and model. If you’re in Australia, I’m presenting a paper about the model on 2 and 3 December at Monash and the ANU, respectively.

Always keen to involve anyone who is interested, just get in touch.

Rohan

1 0 1 1 0 1 0 0 1 1

.8 .2 .8 .8 .2 .8 .2 .2 .8 .8

Every prediction here is of the same kind as your “.9 .1 .9 .9 .1 .9 .1 .1 .9 .9”, but one set is more certain, the other less certain.

The first one is clearly the best… if the outcome is actually 1011010011

But what if the outcome doesn’t always match the favoured prediction? Say they get seven right and three wrong.

Would you say that the model doing predictions with certitude is better, because it makes “better predictions” (converging to 100%, as you said) in most cases?

1011010011

Now clearly 60% of these outcomes were 1…

Now suppose prior to finding out what happened I had two models, one of which gave the probability of a 1 result as:

.9 .1 .9 .9 .1 .9 .1 .1 .9 .9

the other gave

.58 .58 .58 .58 .58 .58 .58 .58 .58 .58

Clearly the first one is slightly off in frequency… its expected number of 1 outcomes is .9*6 + .1*4 = 5.8, versus the 6 that actually occurred.

The other one is off in frequency too… but its expected number of 1 outcomes is .58*10 = 5.8 as well…

The per-prediction frequency error in both is (5.8 - 6)/10 = -.02, which seems really close, so perhaps they’re both basically just as good as each other?

Now, what happens if we bet according to these models, backing whichever outcome each model favours? Specifically, suppose when you win you get .4 dollars, and when you lose you lose .6 dollars.

What would your outcomes have been? Under the first model you’d have backed the right side all 10 times, winning .4*10 = $4

Under the second model you’d have won 6 of the times, and lost 4 of the times…

0.4 * 6 - 0.6 * 4 = $0

From the perspective of the *purpose* of a prediction market, the goal should be to predict accurately… we should measure goodness in terms of information content… the second model here has essentially no information content, other than a good approximation of the ultimate frequency. The first model has a LOT of information content, it’s almost completely accurate at each event.
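The information-content comparison can be checked numerically. A sketch, using the outcome sequence and model probabilities from the comment (the function name is mine):

```python
import math

outcomes = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]            # 1011010011
model_1  = [.9, .1, .9, .9, .1, .9, .1, .1, .9, .9]  # P(result is 1)
model_2  = [.58] * 10

def bits_per_prediction(outcomes, probs_of_one):
    total = 0.0
    for y, p in zip(outcomes, probs_of_one):
        # probability the model assigned to what actually happened
        p_actual = p if y == 1 else 1 - p
        total -= math.log2(p_actual)
    return total / len(outcomes)

print(round(bits_per_prediction(outcomes, model_1), 3))  # ~0.152
print(round(bits_per_prediction(outcomes, model_2), 3))  # ~0.972
```

Both models have the same frequency error, but the first needs about 0.15 correction bits per event and the second nearly a full bit, which matches the intuition that the second carries almost no event-level information.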

What criteria do you propose then to evaluate prediction markets?

Say that for a series of 100 football matches two different prediction markets A and B make the same predictions, and get it right 80 times, but with different levels of certitude:

A) predicts the winner for each match as a sure event (100% certitude)

B) predicts the winner for each match as a likely event (75% certitude)

Is prediction market A better?

Do you think that any prediction market can do a better job of predicting simply by rounding to 0% or 100%?
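Under a log score, rounding to 100% is catastrophic rather than helpful: any miss means the market assigned probability 0 to what happened, and -log2(0) is unbounded. A sketch with the 80-of-100 example (the name `avg_bits` is mine):

```python
import math

def avg_bits(ps):
    """Average -log2(p) over the probability p each market
    put on the result that actually happened."""
    return sum(-math.log2(p) for p in ps) / len(ps)

# Market B: 75% certitude; right 80 times (p=.75), wrong 20 times (p=.25).
b = [0.75] * 80 + [0.25] * 20
print(round(avg_bits(b), 3))  # ~0.732 bits per match

# Market A: 100% certitude; on its 20 misses it assigned p=0
# to the result that occurred, and -log2(0) blows up.
a = [1.0] * 80 + [0.0] * 20
try:
    avg_bits(a)
except ValueError:
    print("A's error is unbounded: it assigned 0 to outcomes that occurred")
```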

Fundamentally, every prediction market question is *always* an N=1 event. There can be no question of “calibration”; all we have is: did you put high credence on the event that occurred, or not? It’s a question of whether there existed good information in the world that was aggregated in a good way, or whether the information didn’t exist, was highly biased, or didn’t get aggregated well.

when you place 92% credence on an outcome and instead an outcome with 8% credence occurs, you have done a poor job predicting

Suppose one makes N probabilistic event predictions of 92%. To be calibrated, y ~ binomial(N, 0.92) of those events should obtain. If N = 100 and y = 100 or y = 80, you have a good reason to suspect your estimate of 92% is off. But in this case, we only have 0 ~ binomial(1, 0.92). We can’t even reject that forecaster by the p < 0.05 criterion.
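That check can be sketched with the binomial pmf, stdlib only (the name `binom_pmf` is mine):

```python
from math import comb

def binom_pmf(y, n, p):
    """P(exactly y successes in n trials with success probability p)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

# N = 100 predictions at 92%: y = 80 successes is vanishingly
# unlikely under a calibrated forecaster, so 92% looks off.
print(binom_pmf(80, 100, 0.92) < 0.001)  # True

# N = 1: the single miss has probability 0.08 > 0.05, so the
# forecaster can't be rejected at the usual threshold.
print(round(binom_pmf(0, 1, 0.92), 2))   # 0.08
```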

I don’t know about Australia. In the U.S. we looked at state polls over several elections and didn’t find any systematic error toward either party. But in any given election there is error. We found non-sampling error to be of the same order of magnitude as sampling error. So I think it should be easy to find errors in both directions, and it could be that some errors are more salient to you, or to some other observers.

Fundamentally, the purpose of a prediction market is Bayesian, and when you place 92% credence on an outcome and instead an outcome with 8% credence occurs, you have done a poor job predicting.

Obviously prediction markets will not always converge to 100% on the correct outcome seconds before the outcome is revealed, but the way we should evaluate them is the extent to which they approximate that… If they converge to 50/50, at least they are honest about no one having any idea what will happen. If they converge to 92/8 and the 8 happens… it’s an indication people have done a poor job evaluating the question.

Also, there is the possibility that polling itself has become an unreliable technique, as those polled may be refusing to engage or outright lying. Any phone call I get that starts talking about a survey I cut off immediately, because so many “polls” are push-polls: disguised slanders, or loaded questions intended to be passed off as popular disapproval or support.

But, overall, given enough elections, isn’t it likely that even statistical projection with a properly estimated margin of error will sometimes fail?

Check out this track record:

https://electionbettingodds.com/TrackRecord.html

Looks fine to me
