On Interpreting Polls
by Lost Thought, Wed May 28, 2008 at 04:57:40 PM EDT
Disclaimer: The following is pure conjecture from someone who knows naught about anything. Any facts or insight are completely coincidental.
There is so much speculation about general election polls that I felt it was necessary to throw in my 42 cents on how to read them.
First, let's start with the basics. Every poll has an inherent margin of error. This is a purely statistical measure, and usually asserts that if you ran the poll exactly the same way at exactly the same time, but with a different random sample each time, about 95% of those polls would land within the margin of error of the true number. Everyone knows this. I mention it only because it is important to note the temporal aspect of the margin of error (it describes opinion at the moment the poll was taken), and that this is the last measure I will mention which is mathematically valid.
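For the curious, here is a rough sketch of where a number like "plus or minus 4%" comes from. It uses the standard textbook formula for a simple random sample at 95% confidence. The function name and the sample size of 600 are my own assumptions for illustration; real pollsters weight and adjust their samples, so their published margins will not match this exactly.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error, as a fraction, for a simple random sample.

    share       -- the measured proportion (0.5 is the worst case)
    sample_size -- number of respondents (hypothetical here)
    z           -- 1.96 corresponds to a 95% confidence level
    """
    return z * math.sqrt(share * (1.0 - share) / sample_size)


# Hypothetical poll of 600 respondents, using the worst-case share of 50%.
moe = margin_of_error(0.5, 600)
print(f"margin of error: +/- {moe * 100:.1f} points")
```

Note that nothing in the formula knows how far away the election is. It only describes sampling noise on the day the poll was taken.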
But there is a nasty aspect of polls, which is that damn undecided vote. Keith Olbermann has tried to capture this aspect of polls in his so-called "Keith Number", which simply adds the undecided voters to the margin of error. The thought process on this is simple: there are no undecideds at the ballot box, so they must vote for someone. Now, statistically, this is complete bullshit. Undecided voters can, and do, remain undecided by not voting. Not to mention that by these calculations, the margin could be widened by as much as 2 times the number of undecideds.
So the math behind the Keith Number needs to be refined, but that's not the point. The point is we are now measuring the Uncertainty of a poll, which is above and beyond the margin of error. We are measuring the extent to which a poll cannot capture the outcome at the ballot box, no matter how perfectly it is run. We are admitting that we simply do not know, to some extent, what will happen.
Now for my contribution to this little thought experiment. What I propose when trying to interpret polls is to add an additional temporal factor to the Uncertainty. For the sake of argument, make it plus or minus 1% for every week between now and the election. The math isn't important, pick whatever numbers you want. The fact we want to capture here is that the poll is taking the pulse of people's opinions today, but the election does not happen today, and things tend to happen during the passage of time. This number reflects the assumption that competent politicians can realistically move the electorate at a rate of about 1% a week.
This additional 1% per week is on top of the Keith Number, which includes the poll's margin of error. So let's test this out on a completely hypothetical poll I will make up right now.
Let's say we are 3 months (12 weeks) before an election, and we get a poll that looks like the following:
Candidate A: 60%
Candidate B: 30%
Margin of Error: 4%
That looks like a massive 30-point win for Candidate A. Candidate B should just give up, right? Well, let's start with the Keith Number: Undecided (the remaining 10%) + Margin of Error (4%) = 14%. Even the most optimistic reading still has Candidate B losing 44% to 46% (note: the undecideds were used twice; as I said, mathematical bullshit). But let's add in our new temporal term, 1% per week for 12 weeks, or 12%, for a total Uncertainty of 26%. Are things still dim for Candidate B? Oh hell yeah, but now there is reasonable cause for optimism: there is now a scenario where you can read the poll and have him coming out ahead, 56% to 34%.
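If you want to play with the numbers yourself, the arithmetic above can be sketched in a few lines of Python. The function names are made up for this post, and the 1-point-per-week drift is the same pulled-from-thin-air assumption as in the text, so treat this as a toy and not a model.

```python
def keith_number(undecided, margin_of_error):
    """Undecided share plus margin of error, in percentage points."""
    return undecided + margin_of_error


def total_uncertainty(undecided, margin_of_error, weeks_out, drift_per_week=1.0):
    """Keith Number plus the assumed drift of 1 point per week until the election."""
    return keith_number(undecided, margin_of_error) + drift_per_week * weeks_out


def best_case_for_trailer(leader, trailer, uncertainty):
    """Take the whole uncertainty away from the leader and hand it to the trailer.

    This double counting is exactly the mathematical sin admitted in the text.
    """
    return leader - uncertainty, trailer + uncertainty


# The made-up poll: A at 60, B at 30, margin of error 4, twelve weeks out.
leader, trailer, moe, weeks = 60.0, 30.0, 4.0, 12
undecided = 100.0 - leader - trailer  # the remaining 10 points

keith = keith_number(undecided, moe)
total = total_uncertainty(undecided, moe, weeks)

print(keith, best_case_for_trailer(leader, trailer, keith))  # 14.0 (46.0, 44.0)
print(total, best_case_for_trailer(leader, trailer, total))  # 26.0 (34.0, 56.0)
```

Change `weeks` or `drift_per_week` and watch the same poll go from a sure thing to a coin flip.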
But when we look at the poll, what isn't important is that Candidate B has a narrow shot. What's important is that the poll now has a margin of plus or minus 26%! Essentially, the poll becomes useless as a predictor, because the only way you can guarantee the outcome is if the original margin was more than twice that, over 50 points!
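To put that "useless as a predictor" claim in concrete terms, here is a small sketch of the rule implied above: a lead only counts as safe if it is bigger than twice the total Uncertainty, since the Uncertainty gets subtracted from one candidate and added to the other. The function name is hypothetical, and the rule inherits all the bad math it is built on.

```python
def poll_is_decisive(lead, undecided, margin_of_error, weeks_out, drift_per_week=1.0):
    """True if the lead survives the worst-case reading of the poll.

    The uncertainty is taken from the leader and given to the trailer,
    so the gap can shrink by twice the uncertainty.
    """
    uncertainty = undecided + margin_of_error + drift_per_week * weeks_out
    return lead > 2 * uncertainty


# Same made-up poll (30-point lead, 10 undecided, margin of error 4),
# checked at different distances from election day.
for weeks_out in (12, 4, 1, 0):
    print(weeks_out, poll_is_decisive(30, 10, 4, weeks_out))
```

By this yardstick, the 30-point lead in the made-up poll does not count as safe until election day itself, which is rather the point.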
Am I saying, in a really really roundabout and overly complicated way, that all polls this far out are useless? Kind of, but not necessarily. If you see a poll with a very small number of undecideds and a huge margin between the candidates, you can probably take that as a predictor. What I really want you to take away from this is that polls this far out are almost always useless, along with a pseudo-mathematical argument for why. Not only that, but this bullshit math will allow you to see the same poll results, taken at different times, and draw completely different conclusions from them without feeling like a complete idiot. Just a partial idiot. And who can ask for more than that?
Oh, and Candidate B lost.