Rebecca sent in this example of a common statistical error:

Public opinion about the project seems guardedly supportive, with a majority of residents saying they favor it, though more than a quarter want its size to be reduced. The polls, taken for a local newspaper, use small samples, 500 people, limiting their usefulness as a gauge of popular sentiment in a city of one million.

Actually, if it’s a random sample, then it’s not a problem that the sample size is only a small fraction of the population size.

Indeed, one should trust this poll (if random) much more than a TV poll with many thousands of respondents who are self-selected.

Interestingly, the offending language seems to have been removed from the story as now posted. But the offending language is in the paper version of my morning Times.

I don't see the second line of the above quote in the original article, the line with the mistake. Perhaps it was removed after publication?

Another example: here.

[Self publicity] Which I blogged about here

When is a small n a problem here? Is 50 okay? Any sources to look to on this?

Could someone disabuse me of my statistical ignorance here?

I thought that the standard formula for the error variance of a sample mean as a measure of a population mean was ((1-f)/n)*Variance (where f is the sampling fraction, and n is the sample size).

Doesn't the fact that this is decreasing in f (and n) suggest that a small sampling fraction does limit the usefulness of the sample? What if the sample were a random sample of 1? Surely that's not very useful as a measure of sentiment?

Conchis,

Your formula is correct. My point is that once f is small, it doesn't really matter how small it is. For example, a sample of 500 out of 5000 gives 1-f=.9, a sample of 500 out of 500,000 gives 1-f=.999. These aren't really so different. The formula still works even as the sampling fraction approaches zero.

The dependence on n is more of a big deal. When n=1, your variance will be pretty big, as you can see from the formula.
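To put rough numbers on this, here is a minimal Python sketch of the formula above applied to a poll proportion. It makes some assumptions that are not in the thread: the Variance term is approximated by p(1-p), p is set to the worst case of 0.5, and the function name se_proportion is just an illustrative choice.

```python
import math

def se_proportion(p, n, N):
    """Standard error of a sample proportion under simple random sampling.

    Uses the thread's formula ((1-f)/n)*Variance with f = n/N, and
    approximates Variance by p*(1-p) (an assumption for illustration).
    """
    f = n / N  # sampling fraction
    return math.sqrt((1 - f) * p * (1 - p) / n)

if __name__ == "__main__":
    # Same sample size, very different population sizes.
    for N in (5_000, 500_000, 1_000_000):
        se = se_proportion(0.5, 500, N)
        print(f"n=500, N={N:>9,}: standard error = {se:.4f}")
    # A sample of one person: the dependence on n dominates.
    print(f"n=1,   N=1,000,000: standard error = {se_proportion(0.5, 1, 1_000_000):.4f}")
```

With n=500, the standard error is about 0.021 when N=5000 and about 0.022 when N is 500,000 or one million, so the population size barely matters. With n=1 it is about 0.5, which is why a sample of one tells you almost nothing.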

"…if it's a random sample, then it's not a problem"

_____________

Hmmm…. exactly what would be the problem with 'non-random' samples in "opinion polling" ?

Most well-known polls by American news organizations rely upon ~1,000 distilled national telephone-samples.

Because non-response rates are high (typically >60%), many more than 1,000 contacts must be attempted to achieve the desired sample size.

A 60% non-response-rate would seem to indicate a non-random sample; however, professional pollsters say they adequately compensate for that anomaly by use of various scientific weighting & stratification techniques.

[..folks over at Pollster.com frequently address this issue: (http://www.pollster.com/blogs/cell_phones_and_political_surv.php#more) ]

_________

Is a random-sample 'always' required for valid statistical sampling ?

If not, what are the basic process alternatives ?

Stratton,

I said "if", not "only if"!

They printed a correction! (See the bottom of the page). Perhaps b/c of this blog?

Thanks for the clarification Andrew.

'Hmmm…. exactly what would be the problem with 'non-random' samples in "opinion polling" ?'

Dewey Defeats Truman?

The Times story now has this correction at the bottom of the article:

Correction: July 7, 2007

An article on Thursday about a German backlash against plans for a mosque in Cologne, known for its Gothic cathedral, referred incorrectly to the size of polls taken for a local newspaper there, assessing the popularity of the mosque. The sample of 500 people was sufficient for a scientific poll; that sample was not “small,” nor did its size limit the poll’s “usefulness as a gauge of popular sentiment in a city of one million.”

A very small, non-random sample of a population could still reveal accurate overall data about that target population.

Polling 30 people today at your local supermarket might perfectly predict the 2008 U.S. Presidential election results. But the fundamental problem with non-scientific sampling is that one cannot be sure how accurate the result is.

Andrew correctly stated the sample size issue, but ducked the issue of a mandatory random sample requirement for scientific statistical sampling.
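The contrast drawn earlier in the thread, between a random sample of 500 and a much larger self-selected sample, can be illustrated with a small simulation. Everything numeric here is a made-up assumption for illustration, not data from any poll: a city of one million with 55% in favor, and a self-selected poll in which opponents are three times as likely to respond as supporters (3% versus 1%). The function name simulate_polls is also hypothetical.

```python
import random

def simulate_polls(seed=0):
    """Compare a random sample of 500 with a large self-selected sample.

    All parameters are illustrative assumptions, not real polling data.
    Returns (random_estimate, self_selected_estimate, self_selected_n).
    """
    rng = random.Random(seed)

    # Hypothetical city of one million: 1 = favors the project, 0 = opposes.
    population = [1] * 550_000 + [0] * 450_000

    # Simple random sample of 500 residents.
    random_sample = rng.sample(population, 500)
    random_estimate = sum(random_sample) / len(random_sample)

    # Self-selected poll: assume opponents respond at 3%, supporters at 1%.
    self_selected = [
        x for x in population
        if rng.random() < (0.01 if x == 1 else 0.03)
    ]
    self_estimate = sum(self_selected) / len(self_selected)

    return random_estimate, self_estimate, len(self_selected)

if __name__ == "__main__":
    rand_est, self_est, self_n = simulate_polls()
    print(f"True share in favor:        0.550")
    print(f"Random sample (n=500):      {rand_est:.3f}")
    print(f"Self-selected (n={self_n}): {self_est:.3f}")
```

Under these assumptions the self-selected poll ends up with many thousands of respondents yet lands far from the true 55%, while the random sample of 500 is typically within a few percentage points. The size of the bias depends entirely on the assumed response rates, which in a real self-selected poll are unknown.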