I don’t know if it’s your conservative bias, but it’s you: https://fivethirtyeight.com/features/the-state-of-the-polls-2019/

If you don’t want to read the whole thing, skim down about halfway to the table that says “polling bias is not very consistent from cycle to cycle”.

Multiple voting, false enrolments, fraudulent how-to-vote ads, party “advisers” permitted to “assist” voters in nursing homes, and so on, among a compliant, negligent public, a politically infiltrated electoral commission, and a skewed judiciary… Your point is valid. The Australian Electoral Commission is even presently trying to defend, in court, its inaction regarding ads by Liberals that were admittedly designed to look like AEC ads, with no party branding, telling Chinese voters that the “correct way to vote” is to Vote 1 Liberal. The AEC is siding with the Liberals in this case. The Commission admits to thousands of multiple votes each election, because Australia has no identity checks when voting, and the Commission does not validate addresses anyway.

https://quadrant.org.au/opinion/qed/2019/06/election-fraud-and-the-aec/

https://morningmail.org/electoral-rorting/

I think what I’m proposing below is the surprisal per prediction… that makes sense to me

But this also shows that you *shouldn’t* round off to 1/0 as you asked about… because you *will* get some error, and then you’ve got infinite bit cost. From this perspective it makes some sense to use a prior that excludes certainty/zero, like a beta(1.1,1.1) rather than the standard beta(1,1), for inference on a binomial outcome…
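To make that concrete, here’s a minimal sketch (my own illustration; the `beta_mode` helper and the ten-successes example are assumptions, not from the thread): after a run of successes, the flat beta(1,1) prior puts the posterior mode exactly at 1, so a MAP-style prediction pays infinite bits the first time a failure occurs, while beta(1.1,1.1) keeps the mode off the boundary.

```python
def beta_mode(a, b):
    """Mode of a Beta(a, b) density, valid for a, b >= 1 (not both equal to 1)."""
    return (a - 1) / (a + b - 2)

# After observing 10 successes and 0 failures in a binomial:
# flat Beta(1, 1) prior -> posterior Beta(11, 1), whose mode is exactly 1
print(beta_mode(11, 1))      # 1.0: a point estimate that bets everything
# Beta(1.1, 1.1) prior  -> posterior Beta(11.1, 1.1), mode just short of 1
print(beta_mode(11.1, 1.1))  # ~0.990
```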

Breaks down quite a bit if you assign p=0 to the actual outcome, obviously, since this costs you infinite bits; but the point is that it does indicate a big error… and again, assigning say 0.08 to the actual outcome is a bad situation: it costs you about 3.6 bits.

Daniel, cross-entropy (or log-loss) is a popular metric. The Brier score is another. But if you don’t like calibration you may not like them either.

See the comment here: https://statmodeling.stat.columbia.edu/2019/11/09/australian-polls-failed-they-didnt-do-mister-p/#comment-1160696

What you want to look at is the probability that you assigned to the outcome that happened. So if you assign 100% to “heads” and you get tails… you assigned 0 to that outcome, and it should count against you.

If you always assign 100% and you are always right… you’re obviously doing a good job: you have a lot of information, and you need very little correction… -log(1) = 0, so you have 0 bits of error per prediction.

Suppose you assign 75% to the outcome that occurs each time:

-log(.75)/log(2) = .415 bits of error per prediction.

Suppose you assign 1 to 50% of outcomes, and .5 to the other half of the outcomes:

-(.5 * log(.5)/log(2) + .5 * log(1)/log(2)) = .5, so you have half a bit of error per prediction.

So I think what you want is 1/N * sum(-log(p(actual_outcome[i]))/log(2)) for the average bits of prediction error.
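A minimal sketch of that formula in Python (the function name is mine):

```python
import math

def avg_bits_of_error(p_actual):
    """Average surprisal in bits: mean of -log2 of the probability
    assigned to the outcome that actually occurred."""
    return sum(-math.log2(p) for p in p_actual) / len(p_actual)

# Always assign 0.75 to the outcome that occurs:
print(avg_bits_of_error([0.75] * 4))            # ~0.415 bits per prediction

# Assign 1.0 to half the outcomes and 0.5 to the other half:
print(avg_bits_of_error([1.0, 0.5, 1.0, 0.5]))  # 0.5 bits per prediction
```

The two toy inputs reproduce the .415-bit and half-bit cases above.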

Australian elections are run by the Australian Electoral Commission and are widely regarded as very reliable. Pacific Islanders shifted as a group, particularly in Queensland, and that may have been missed by the pollsters’ samples.

I’ve been thinking about what the proper information metric is, but with it being a weekend and a lot of other stuff going on, I haven’t come up with it yet. Intuitively, for the binary case I want to do something with log base 2 of the error…

There’s probably some metric already well defined, but intuitively a perfect model predicts a long string of bits without any “additional” information, and the worse the model is, the more “correction bits” you need to get from the prediction to the right answer… If you already know how to define such a thing, let me know; otherwise you might think about it and figure it out, or I can come back to it maybe Monday.

We are in the process of opening everything up, including data and model. If you’re in Australia, I’m presenting a paper about the model on 2 and 3 December at Monash and the ANU, respectively.

Always keen to involve anyone who is interested, just get in touch.

Rohan

By the way, the comment above was also me.

Say there are two additional models giving predictions with the following probabilities:

1 0 1 1 0 1 0 0 1 1

.8 .2 .8 .8 .2 .8 .2 .2 .8 .8

Every prediction is in the same direction as your “.9 .1 .9 .9 .1 .9 .1 .1 .9 .9”, but one of the models is more certain, the other less certain.

The first one is clearly the best… if the outcome is actually 1011010011

But what if the outcome doesn’t always match the favoured prediction? Say they get seven right and three wrong.

Would you say that the model doing predictions with certitude is better, because it makes “better predictions” (converging to 100%, as you said) in most cases?
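For what it’s worth, scoring that scenario with the average-surprisal metric proposed elsewhere in this thread suggests the answer is no (a sketch; the 7-right/3-wrong split is the hypothetical above, and `avg_bits` is my own helper name):

```python
import math

def avg_bits(p_actual):
    """Average -log2 of the probability assigned to the realized outcome."""
    return sum(-math.log2(p) for p in p_actual) / len(p_actual)

# Each model assigns its high probability to the realized outcome 7 times
# ("right") and its low probability to the realized outcome 3 times ("wrong").
more_certain = [0.9] * 7 + [0.1] * 3
less_certain = [0.8] * 7 + [0.2] * 3

print(avg_bits(more_certain))  # ~1.10 bits per prediction
print(avg_bits(less_certain))  # ~0.92 bits per prediction
```

The three near-certain misses cost about 3.3 bits each, so the more certain model comes out worse.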

Imagine for example that I have 10 binary outcomes, and they turn out to happen as:

1011010011

Now clearly 60% of these occurred…

Now suppose that prior to finding out what happened I had two models; one of them gave the probability of occurrence (a 1 result) as:

.9 .1 .9 .9 .1 .9 .1 .1 .9 .9

the other gave

.58 .58 .58 .58 .58 .58 .58 .58 .58 .58

Clearly the first one is off by frequency… the expected number of 1s is .9*6 + .1*4 = 5.8.

The other one is obviously off by frequency too… but its expected number of 1s is .58*10 = 5.8 as well.

The frequency error in both of these is -.02 (a predicted frequency of .58 vs. the observed .60), which seems really close, so perhaps they’re both basically just as good as each other?

Now, what happens if we bet on the first model? Specifically, suppose when you win you get .4 dollars, and when you lose you get -.6 dollars

What would your outcomes have been? Under the first model, betting its favoured outcome each time, you’d have won all 10 bets: .4*10 = $4.

Under the second model you’d have won 6 times and lost 4 times:

0.4*6 - 0.6*4 = 0

From the perspective of the *purpose* of a prediction market, the goal should be to predict accurately, so we should measure goodness in terms of information content. The second model here has essentially no information content other than a good approximation of the ultimate frequency. The first model has a LOT of information content: it’s almost completely accurate on each event.
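The same comparison in bits, using the surprisal metric from my other comment (a sketch; `avg_bits` is my own helper name):

```python
import math

outcomes = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
sharp = [0.9, 0.1, 0.9, 0.9, 0.1, 0.9, 0.1, 0.1, 0.9, 0.9]  # P(outcome = 1)
flat = [0.58] * 10                                           # constant P(1)

def avg_bits(probs_of_one, outcomes):
    # Probability each model assigned to the outcome that actually occurred.
    p_actual = [p if y == 1 else 1 - p for p, y in zip(probs_of_one, outcomes)]
    return sum(-math.log2(p) for p in p_actual) / len(p_actual)

print(avg_bits(sharp, outcomes))  # ~0.15 bits: lots of information
print(avg_bits(flat, outcomes))   # ~0.97 bits: little beyond the base rate
```

The sharp model costs about 0.15 bits per event; the flat one costs nearly a full bit, i.e. barely better than a coin flip.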

> Obviously prediction markets will not always converge to 100% on the correct outcome seconds before the outcome is revealed, but the way we should evaluate them is the extent to which they approximate that…

What criteria do you propose then to evaluate prediction markets?

Say that for a series of 100 football matches two different prediction markets A and B make the same predictions, and get it right 80 times, but with different levels of certitude:

A) predicts the winner for each match as a sure event (100% certitude)

B) predicts the winner for each match as a likely event (75% certitude)

Is prediction market A better?

Do you think that any prediction market can do a better job of predicting simply by rounding to 0% or 100%?
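For reference, the average-surprisal scoring proposed elsewhere in this thread gives a concrete answer to that last question (a sketch with the numbers from this comment; the helper name is mine):

```python
import math

def avg_bits(p_actual):
    """Average -log2 of the probability assigned to the realized outcome;
    infinite if a certainty (p = 0) is ever wrong."""
    return sum(-math.log2(p) if p > 0 else math.inf for p in p_actual) / len(p_actual)

# Market A: 100% certitude, right 80 times, wrong 20 times.
a = [1.0] * 80 + [0.0] * 20
# Market B: 75% certitude, right 80 times, wrong 20 times.
b = [0.75] * 80 + [0.25] * 20

print(avg_bits(a))  # inf: one certain miss costs infinitely many bits
print(avg_bits(b))  # ~0.73 bits per match
```

Rounding to certainty is catastrophic under this metric: a single certain miss makes the score infinite, while market B pays a finite ~0.73 bits per match.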

My experience has been that young people are the least likely to respond to surveys. However, if these samples are quota-sampling young people, then I would expect the more educated young people to be over-sampled relative to the less well educated ones. But I would have expected them to be more conservative than the less educated young people, which would make the latter more likely to be working class and so to vote Labour.

This is exactly the “calibrated” (frequency) criterion that I rejected, though. Those other events have *nothing* to do with the event in question, and the people involved in betting on them are totally different people with different information.

Fundamentally, every prediction market question is *always* an N=1 event. There can be no question of “calibration”; all we have is: did you put high credence on the event that occurred, or not? It’s a question of whether there existed good information in the world that was aggregated in a good way, or whether the information didn’t exist, was highly biased, or didn’t get aggregated well.

> when you place 92% credence on an outcome and instead an outcome with 8% credence occurs, you have done a poor job predicting

Suppose one makes N probabilistic event predictions of 92%. To be calibrated, y ~ binomial(N, 0.92) of those events should obtain. If N = 100 and y = 100 or y = 80, you have a good reason to suspect your estimate of 92% is off. But in this case, we only have 0 ~ binomial(1, 0.92). We can’t even reject that forecaster by the p < 0.05 criterion.
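A quick check of those tail probabilities (a sketch; `binom_cdf` is my own helper, using only the standard library):

```python
from math import comb

def binom_cdf(y, n, p):
    """P(Y <= y) for Y ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(y + 1))

# One event: 0 ~ binomial(1, 0.92); P(miss) = 0.08 > 0.05, cannot reject.
print(binom_cdf(0, 1, 0.92))     # ~0.08

# With N = 100 predictions and only y = 80 hits, the 92% claim is rejectable:
print(binom_cdf(80, 100, 0.92))  # tiny (well below 0.05)
```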

Eric:

I don’t know about Australia. In the U.S. we looked at state polls over several elections and didn’t find any systematic error toward either party. But in any given election there is error. We found non-sampling error to be of the same order of magnitude as sampling error. So I think it should be easy to find errors in both directions, and it could be that some errors are more salient to you, or to some other observers.

I disagree; the point of a prediction market is to aggregate information about a specific event. It’s not to make sure that the p assigned to each of a wide range of events is, on average, equal to the frequency with which the events occur.

Fundamentally, the purpose of a prediction market is Bayesian, and when you place 92% credence on an outcome and instead an outcome with 8% credence occurs, you have done a poor job predicting.

Obviously prediction markets will not always converge to 100% on the correct outcome seconds before the outcome is revealed, but the way we should evaluate them is the extent to which they approximate that… If they converge to 50/50, at least they are honest about no one having any idea what will happen. If they converge to 92/8 and the 8 happens… it’s an indication that people have done a poor job evaluating the question.

That certainly seems to be the case in the UK. Conservatives did better than polls expected in the 1992 and 2015 general elections. While I’m no expert, I believe the problem is partly about differential non-response bias, with young, highly educated people, who are disproportionately unlikely to vote Conservative, being more likely to respond to surveys. All that said, in the most recent election in 2017, Labour did better than most pollsters expected. (From my left-wing perspective, I think the polls are over-interpreted by left-wing media and commentators due to some kind of wishful thinking.)

Also, there is the possibility that polling itself has become an unreliable technique, as those polled may be refusing to engage or outright lying. Any phone call I get that starts talking about a survey I cut off immediately, because so many “polls” are push-polls: disguised slanders, or loaded questions intended to be passed off as popular disapproval or support.

But, overall, given enough elections, isn’t it likely that even statistical projection with a properly estimated margin of error will still fail?

Check out this track record:

https://electionbettingodds.com/TrackRecord.html

Looks fine to me
