The article makes some strong claims about the statistical validity of their res...

shalmanese · on June 26, 2009

Counterintuitively, if the sample is truly random, you gain very little additional information as you go beyond 300 samples. This is why every political poll has an error margin of plus or minus 3%.
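For anyone who wants to check the numbers, here is a minimal sketch. It assumes a simple random sample, a 95% confidence level (z = 1.96), the worst-case proportion p = 0.5, and the usual normal approximation. The function name `margin_of_error` is just an illustrative choice.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation interval for a proportion.

    Assumes a simple random sample of size n; p=0.5 is the worst case.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Print the margin of error, in percentage points, for a few sample sizes.
for n in (300, 1000, 3000, 10000):
    print(f"n={n:>6}: +/- {100 * margin_of_error(n):.1f}%")
```

Under these assumptions, 300 samples give roughly plus or minus 5.7%, and plus or minus 3% corresponds to about 1,000 samples. The diminishing-returns point still holds: the margin shrinks with the square root of n, so quadrupling the sample only halves it.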

jules · on June 27, 2009

Right, but that doesn't mean that 300 (or 3000) samples total is enough. You can't make the detailed map about burning the national flag with 3000 samples. More data is helpful until you have 300 samples per pixel.

Retric · on June 26, 2009

The real problem is that most samples are not random, so you are bound by the bias of your methods and can't really get all that accurate. In theory, when you double your sample size you do reduce your margin of error by a reasonable degree, but reality does not match the theory until you start sampling a large percentage of the population.

Think of it like a coin that has a 1% bias. If you want the percentage to some accuracy (say 4 digits), how many flips do you need? Now what if the problem is not the coin but the person doing the flipping? At some point more testers help more than more flips.
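To put a rough number on the coin question, here is a sketch that inverts the standard margin-of-error formula. It assumes independent flips, a 95% confidence level (z = 1.96), p close to 0.5, and it reads "4 digits" as a margin of 0.0001, which is only one possible interpretation. `flips_needed` is a made-up helper name.

```python
import math

def flips_needed(margin, p=0.5, z=1.96):
    """Flips needed so the normal-approximation margin is at most `margin`.

    Solves z * sqrt(p * (1 - p) / n) <= margin for n and rounds up.
    """
    return math.ceil((z / margin) ** 2 * p * (1 - p))

# Two decimal places versus four decimal places of accuracy.
print(flips_needed(0.01))
print(flips_needed(0.0001))
```

Under those assumptions, two digits takes on the order of ten thousand flips and four digits takes roughly 96 million, since each extra digit costs a factor of 100. None of this touches a systematic bias introduced by the flipper: that offset stays the same no matter how large n gets, which is why more testers eventually help more than more flips.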

Eliezer · on June 26, 2009

Glad someone pointed this out.

jimboyoungblood · on June 26, 2009

The claim it makes is this:

And a word about statistical validity: the best questions on OkCupid have been answered over a million times. Therefore we have unique insights into the American mindset

Yeah, so OKCupid users aren't representative of the average American, but somehow I don't think a post titled "Rape Fantasies and Hygiene By State" is meant to be a serious exercise in statistics.

jrockway · on June 26, 2009

Why not? Are sex and (not) bathing somehow different from other topics in a way that is relevant mathematically?

byrneseyeview · on June 26, 2009

The way the post is marketed indicates that it's entertainment, not analysis.

cabalamat · on June 27, 2009

Can't it be both?

DannoHung · on June 26, 2009

Whether or not the data is statistically valid across the general populace may or may not be relevant to people who are concerned with the sample that is represented.

hamidp · on June 26, 2009

Arguing that OKC data is better than Gallup's (as the article implies) isn't a strong claim of statistical validity; it's ignorance of the basic principles of statistics.

pfedor · on June 27, 2009

It's more likely a joke. The guy has a mathematics degree from Harvard. http://www.okcupid.com/about-us/