My thanks to both of you for your thoughtful responses. I'm no big expert on polls so my response is based partly on impressions I've formed without being 100% sure of them. I should've gone to bed hours ago, though, so I'll have to content myself with this.
DCKit emphasizes the charge that the pollsters have an incentive to cheat in McCain's favor, so as to keep selling poll results. My guess is that the media outlets buying poll results or commissioning polls or whatever will keep doing so regardless of whether the race is close. You think that, if Obama showed a 15% lead in August, the media would decide to stop reporting poll numbers? Of course not. The MSM vastly prefer reporting poll numbers and other "horse-race" aspects, as opposed to analyzing the substance of public issues. A pollster would best help its business by building a reputation for accuracy, not by trying to hype a close race.
I agree with DCKit that election fraud is a huge concern. There are many legitimate arguments to be made on that score, about EVM companies and Republican officials. To go further and say that all the major polling firms are also in on the conspiracy (with the assigned role of deliberately lying about pre-election poll results, on a large scale) is a little too TFH for me, though.
Assuming for the sake of the argument that the pollsters are sincerely trying to get it right, they still have two problems: They're taking only a small sample, and they can't ensure randomness.
On that first point, Nance points out that "poll participants are invariably a tiny fraction of the voting public." That's true. Nevertheless, the fraction isn't so tiny that two Hispanic responders might represent the whole Hispanic community. A typical national poll of the Gallup - Zogby - CNN variety samples about 800 to 1,200 people (either registered voters or likely voters, depending on the methodology). They'll have more than two Hispanics.
Sure, there can still be errors. Even if they were able to sample with perfect randomness, it would be possible to pick a random sample of 1,000 people, half of whom happened to be Bob Barr supporters. It's just unlikely. Exactly how unlikely any error is can be determined with fair mathematical precision, given the assumption of randomness in sampling. It turns out that the size of the sample, expressed as a fraction of the total universe, isn't very important. If a pollster gets responses from 1,000 New Hampshireites about their Senate race, and a different pollster gets responses from 1,000 Americans about the presidential race, the figures for Shaheen and Obama will be about equally reliable. The key to the "margin of error" of a poll is the absolute number of people sampled, together with an estimate of the standard deviation of the population. (Yeah, I know, some of them are pretty deviant. Thank you, Groucho. Let's move on.)
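For anyone who wants to check that claim, here's a rough sketch in Python. It assumes a perfectly random sample and uses the standard textbook formula for the 95% margin of error of a percentage, with the usual correction for a finite population. The two electorate sizes are round numbers I made up for illustration, not real turnout figures, and the function name is mine.

```python
import math

def margin_of_error(sample_size, population, p=0.5, z=1.96):
    """Approximate 95% margin of error for a percentage from a simple random sample.

    p=0.5 is the worst case (largest spread); z=1.96 corresponds to 95% confidence.
    """
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    # Finite population correction: shrinks the error slightly when the
    # sample is a noticeable fraction of the whole population.
    correction = math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error * correction

# Hypothetical electorate sizes, chosen only to show the contrast.
state_moe = margin_of_error(1000, population=1_000_000)
national_moe = margin_of_error(1000, population=130_000_000)

print(f"state poll:    +/- {state_moe * 100:.2f} points")
print(f"national poll: +/- {national_moe * 100:.2f} points")
```

Both come out at roughly three points either way. The population size barely moves the answer; what matters is the 1,000 in the sample.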
For these reasons, I discount these two problems. First, I think it's reasonable to assume that the pollsters strive for accuracy. Second, there's a mathematical basis for believing they'll achieve accuracy if they can get a truly random sample. That leaves the biggest problem -- the possible bias in sampling.
Nance writes:
You've stated that "Experienced pollsters can compensate for these problems, at least to some extent," and therein lies the problem. How do you 'compensate'? How do you base assumptions on a complete lack of data?
They don't have a complete lack of data. They have demographic information about the population and about past voting trends. For example, let's say that, based on Census data and on historic turnout, we conclude that 25% of the votes will be cast by people below a certain age. We then look at our sample and find that only 22% of it is below that age. (This is what would happen if, for example, younger people are more likely to have cell phones and no land lines.) I think the standard procedure is to give greater weight to the responses of those 22%. Similar fixes are applied for factors like gender, income level, etc. It's not perfect, because young people with land lines aren't necessarily representative of young people as a whole, but it's a lot better than what you'd get if you just recorded the responses of the first thousand people who answered their phones and reported those raw numbers.
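Here's a minimal sketch of that kind of reweighting, in Python. The 25% and 22% figures are the ones from my example above; the support percentages within each age group are invented purely for illustration, and I'm not claiming this is exactly how any particular polling firm does it.

```python
def raw_share(sample):
    """Candidate's share if every response counts equally."""
    total = sum(count for count, _ in sample.values())
    return sum(count * support for count, support in sample.values()) / total

def weighted_share(sample, expected_turnout):
    """Candidate's share after reweighting each group to its expected turnout share."""
    total = sum(count for count, _ in sample.values())
    share = 0.0
    for group, (count, support) in sample.items():
        sample_fraction = count / total
        # A group that is under-represented in the sample gets a weight above 1.
        weight = expected_turnout[group] / sample_fraction
        share += sample_fraction * weight * support
    return share

# group -> (number of respondents, fraction supporting the candidate)
# The support fractions are hypothetical.
sample = {"younger": (220, 0.60), "older": (780, 0.45)}
expected_turnout = {"younger": 0.25, "older": 0.75}

raw = raw_share(sample)
weighted = weighted_share(sample, expected_turnout)
print(f"raw: {raw:.4f}, weighted: {weighted:.4f}")
```

With these made-up numbers, each younger respondent counts for about 1.14 people, and the candidate who does better among the young picks up about half a point.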
So is polling "a flawed non-science", as Nance argues? It certainly doesn't have the precision of rocket science, where a very tiny percentage error would mean that the spacecraft either crashes into Titan or flies right past it. Success means entering orbit, which requires accuracy well to the right of the decimal point. That will never happen with polling. On the other hand, it has the basic characteristics of a science, namely that hypotheses can be formulated, subjected to experimental testing, and discarded if found false. Polling today is much more accurate than in, say, 1936, the year of the famous Literary Digest poll predicting an Alf Landon victory. Polling isn't rocketry but it's also better than an educated guess.
What would throw the polls off most significantly would be the presence of novel factors that they haven't yet been able to adjust for based on experience. The most obvious possibility, in my mind, would be that, this year, the long-heralded explosion in the youth vote finally occurs. Pollsters try to identify likely voters based on past trends. It's possible that Obama really will inspire huge turnout gains among young people. If so, then some polls could correctly predict McCain's percentage among each age group, yet overestimate McCain's percentage of the total. Of course, we've heard this before. I think there was an Onion headline about the 2004 election: "Young People Totally Intended to Vote in Record Numbers". You have to figure, though, that Obama has a better chance of pulling this off than Kerry did. Even those much-maligned polls showing that the percentage of support for the two major candidates is close also show that the percentage of enthusiastic support is a blowout for Obama. That certainly translates into those lawn signs that Nance sees. We can hope it will translate into turnout as well.
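To make the turnout point concrete, here's a small Python sketch. Every number in it is hypothetical. The idea is just that a poll can have McCain's support right within each age group and still miss the total if it guesses the mix of voters wrong.

```python
def overall_share(turnout_mix, support_by_group):
    """Candidate's total share, given each group's share of the electorate."""
    return sum(turnout_mix[g] * support_by_group[g] for g in turnout_mix)

# Hypothetical McCain support within each age group, assumed to be polled correctly.
mccain_support = {"younger": 0.35, "older": 0.52}

# Turnout mix the pollster assumes from past elections (made up).
assumed_mix = {"younger": 0.18, "older": 0.82}
# Turnout mix if the youth vote really does surge (also made up).
surge_mix = {"younger": 0.25, "older": 0.75}

predicted = overall_share(assumed_mix, mccain_support)
actual = overall_share(surge_mix, mccain_support)
print(f"poll predicts {predicted:.4f}, election delivers {actual:.4f}")
```

In this toy case the poll overstates McCain by a bit more than a point, even though nobody gave a wrong answer and nobody was sampled badly within their own age group.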
So, here's my prediction:
The election will be roughly comparable to 1980 (Carter vs. Reagan). The people are unhappy about the way the country is going, especially the economy. They want change. They're ready to vote out the incumbent party. Nevertheless, they have some doubts about the challenger -- about whether he's really Presidential caliber, especially at a time of foreign crisis. That concern will keep public opinion close until late October, and the polls will reflect that. Over the course of the campaign, though, the people will become more familiar with the challenger. They'll see him in televised debates, holding his own (or better) against his supposedly more knowledgeable opponent. They'll overcome their initial worries about him and decide that he can be trusted with the Presidency. The polls in the last week before the election will reflect those opinions, showing a significant shift away from the incumbent party, and the challenger will win with a popular-vote margin in the range of eight to ten percent.
No analogy is perfect. McCain isn't the incumbent seeking re-election, as Carter was, but I think McCain's efforts to distance himself from Bush will largely fail. The more important difference is that election fraud is a much bigger concern this year. Some votes will be stolen. The lesson I draw from the closeness of the polls is not that the pollsters are incompetent or are lying, but rather that we can't count on having that eight-to-ten percent cushion, which would probably be cheatproof. If Obama does indeed now have a slight lead, and if he doesn't benefit from a huge shift in his direction, then his lead on Election Day might still be small enough that it could be wiped out by fraud. And, yes, I'm now wishing I'd paid more attention to this problem beginning in 2004, but that's a topic for another post.