When you mislay a certain something,
keep your cool, and don't get hot,
calculatus eliminatus is the best friend that you've got,
calculatus eliminatus always helps an awful lot,
the way to find a missing something is to find out
where it's not.
I don't think it is in the exit polls.
Fraud, I mean. Let me explain in one post what I have been trying to explain on many.
As most of you will know, I have been an exit poll skeptic for some time. My background, though in all sorts of odd things, is largely in social sciences, so while I was highly suspicious of Bush's apparent win (and devastated) last November, and seriously wondered whether the exit polls were an indication of massive fraud (especially in the light of the Ukraine story that followed close on its heels), I did not share the assumption of many DUers that the discrepancy could not have been due to polling bias. I am not a statistician as such (i.e. it is not my primary speciality) but I have had a rigorous statistical training, and use statistics heavily, daily. I also teach statistics at university level. And I know enough about statistics to know that the "margin of error" refers to "sampling error" - in other words, it is the margin of error that could have occurred simply by chance. If I measure my son's height today, and again tomorrow, I may find he's grown. And he sure is growing. But because he wriggles a bit, I have to allow for random wriggles. A jump beyond the margin of wriggles will mean he's grown. A jump (or a drop) inside the margin of wriggles will just indicate he's wriggling.
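For anyone who wants to see the "margin of wriggles" as arithmetic, here is a minimal sketch. It assumes a simple random sample and the usual 95% normal approximation, which is cruder than a real exit poll design (those are clustered by precinct, so their true margins are wider). The function names and the numbers fed to them are made up for illustration.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% sampling margin for a proportion p
    estimated from a simple random sample of size n (an assumption)."""
    return z * math.sqrt(p * (1 - p) / n)

def beyond_chance(poll_share, count_share, n):
    """True when the poll-versus-count gap is bigger than
    sampling error alone would plausibly produce."""
    return abs(poll_share - count_share) > margin_of_error(poll_share, n)

# Hypothetical numbers, for illustration only.
moe = margin_of_error(0.5, 1000)
print(round(moe, 3))                      # roughly 0.031, i.e. about 3 points
print(beyond_chance(0.52, 0.51, 1000))    # 1-point gap: inside the wriggle
print(beyond_chance(0.56, 0.50, 1000))    # 6-point gap: outside it
```

A gap outside the margin only says "this was not chance". It does not say whether the cause was a biased sample or a corrupted count.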
OK. So I knew that the discrepancy between poll and count wasn't chance. So what was it? It could have been fraud. It could have been bias in the poll - i.e. the poll might have had a biased sample. And I was perfectly prepared to believe in sampling bias. I meet it every day. You can do what you can to minimize it, but you can't avoid it. So when I read the E-M report in January, and saw that apparently the discrepancy WAS at precinct level (not because of poor selection of precincts) I realised the only alternatives were: fraud; or biased sampling of voters. And although the report was frustrating to read (short on geeky bits like standard errors, F values, probability estimates, degrees of freedom), the apparent finding that "redshift" was associated with precinct characteristics likely to make adherence to strict random sampling protocol difficult convinced me that sampling bias probably played a role. However, nothing in the E-M report indicated how much, except the assertion that the evaluation had determined that "non-response bias" (which CAN mean that one group of voters was more likely to agree to participate than the other, but can also mean that one group of voters was more likely to be selected than the other) accounted for the discrepancy.
So far, so not very good. Unsatisfactory in fact. In order to establish whether non-response bias was sufficient to account for the exit poll discrepancy a number of analyses need to be done (and though I would like to say I'd thought of all these months ago, I didn't and they've been growing on me slowly - however, the first was obvious).
The first would be a proper test of the polling bias hypothesis. A multiple regression analysis needed to be done, in which all the precinct/interviewer characteristics hypothesised to be contributors to the discrepancy were entered into the same model, i.e. not a series of separate analyses where WPE in one kind of precinct is compared with WPE in another. What needs to be done is an analysis that takes into account the fact that in some precincts several of these factors may be present together, and may even interact. If such a model could not be shown to account for the discrepancy - or, to put it differently, if after accounting for factors likely to give rise to polling bias there was still a residual discrepancy - then one might deduce fraud. And actually, even if the factors did account for the full discrepancy, one might also wonder whether some of the variables were proxies for fraud variables. So that wouldn't be conclusive, but it would be of interest. In my paper, a few months back, I called for this kind of analysis to be done.
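To make the difference between "one model" and "a series of separate analyses" concrete, here is a minimal sketch on synthetic, noise-free data. Everything in it is an assumption for illustration: the two 0/1 precinct factors (`far`, `big`), the effect sizes, and the tiny least-squares solver, which stands in for whatever statistics package would really be used. It is not the E-M analysis or its data.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical factors: far = interviewer kept far from the door,
# big = large precinct. Discrepancies are built from known effects.
precincts = [(far, big) for far in (0, 1) for big in (0, 1)] * 5
y = [1.0 + 2.0 * f + 3.0 * b + 1.5 * f * b for f, b in precincts]

# One model: intercept, both factors, and their interaction.
X = [[1.0, f, b, f * b] for f, b in precincts]
beta = ols(X, y)
print([round(v, 3) for v in beta])   # recovers the built-in effects

# A "separate analysis" of far alone folds half the interaction into it.
mean = lambda vals: sum(vals) / len(vals)
naive_far = (mean([v for (f, _), v in zip(precincts, y) if f == 1])
             - mean([v for (f, _), v in zip(precincts, y) if f == 0]))
print(naive_far)                      # 2.75, not the true 2.0
```

With 0/1 coding, the intercept is the discrepancy left in precincts where none of the suspected factors is present, which is the "residual" the argument turns on.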
But we also need fraud hypotheses to test. I am aware of two, both important.
1. This was a test of a hypothesis originally formulated by USCV (aka NEDA) as the "Bush Strongholds have more Vote-Count Corruption" hypothesis. This in itself seemed an odd hypothesis (why, a priori, would we expect more corruption in Bush precinct strongholds?). However, a later formulation expressed by USCV, and suggested to me by Josh Mitteldorf, was that if fraud was responsible for the exit poll discrepancy, you would expect a "bunching" of precincts with highly discrepant poll results at the "high Bush" end of the spectrum. In other words, precincts that ought, if the vote count had been honest, to have been in the centre of the spectrum where the distribution of precincts is fattest, would have been shifted Bush-wards by fraud. The distribution of precincts by Bush's vote share is roughly bell-shaped (actually there are more moderately Bush precincts than moderately Kerry precincts, but more extreme Kerry precincts than extreme Bush precincts). So if fraud was randomly distributed across the spectrum, a thickish swarm of precincts from the middle of the plot should move to the right (literally and metaphorically) and give rise to a positive correlation between discrepancy in the poll and Bush's share of the vote.
Unfortunately this test was complicated by the fact that the traditional measure of exit poll discrepancy (WPE) does weird things in relation to the way the vote count is distributed. I devised a measure that I think does a better job - and Mitofsky performed the correlation. The hypothesis was not supported - there was no linear tendency for the discrepancy to be greater at the Bush end of the plot.
Plots can be viewed at the links in this DKos diary by HudsonValleyMark (Mark Lindeman):
http://www.dailykos.com/story/2005/5/24/213011/565
We cannot, however, conclude from this null finding that fraud was not responsible for the total discrepancy. Simply that it does not seem to have been randomly distributed through the precincts. Maybe it was concentrated at the Kerry end.
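The logic of hypothesis 1 can be sketched as a toy simulation. Every number here is invented: the bell-shaped spread of honest vote shares, the size of the polling noise, the fraud rate and the size of the shift. The discrepancy is taken as plain count-minus-poll share, which is an assumption for simplicity and is neither WPE nor the alternative measure mentioned above.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate(fraud_rate, shift=0.10, n=5000, seed=1):
    """Correlation between Bush's counted share and the count-minus-poll
    gap when a random fraction of precincts has its count shifted."""
    rng = random.Random(seed)
    counts, gaps = [], []
    for _ in range(n):
        honest = min(max(rng.gauss(0.5, 0.15), 0.0), 1.0)  # true Bush share
        poll = honest + rng.gauss(0.0, 0.05)               # unbiased but noisy poll
        fraud = shift if rng.random() < fraud_rate else 0.0
        count = min(honest + fraud, 1.0)                   # what gets reported
        counts.append(count)
        gaps.append(count - poll)
    return pearson(counts, gaps)

print(round(simulate(0.0), 2))   # no fraud: correlation near zero
print(round(simulate(0.3), 2))   # fraud in 30% of precincts: clearly positive
```

In this toy world, randomly scattered fraud shows up as a positive correlation. A flat result therefore counts against randomly scattered fraud, though not against fraud arranged some other way.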
2. So here is a second hypothesis, this time formulated by ESI for Ohio. This hypothesis says: if fraud was responsible for the exit poll discrepancy, as well as for Bush's apparent increase in support in the election (the presumed purpose of fraud) then precincts with greater "redshift" in the poll ought to have a greater shift to Bush in the count. In other words the two effects should be correlated. But how do we measure Bush's gain? One way of doing it is to baseline it from 2000, a year in which the exit polls were relatively accurate (and therefore a year in which fraud, if it occurred, seems to have played a relatively minor role in Bush's vote count - after all, Gore won).
ESI performed the correlation for Ohio, and found no association between what Brits call "swing" to Bush and redshift in the poll. But the trouble is that states, in exit poll terms, are small, and therefore do not give you a lot of statistical power. And to demonstrate a null you need A LOT of statistical power. In fact you can never demonstrate a null. What you can do instead is to demonstrate that if there was an "effect" it was, to a given degree of probability, less than a certain size. And the power in the Ohio study would have left fairly wide confidence limits for the "true" association between redshift and swing to Bush. (We have yet to see a proper peer-reviewed report of the ESI study - it is apparently in the pipeline).
So Mitofsky repeated the ESI analysis on the entire dataset, and it was the results of this analysis that he presented at the debate with Steve Freeman in Philadelphia last week. He presented it in the form of two scatter plots, which are posted here, by Mark Lindeman, with some informative text. If you click on the plots, you can examine them more closely.
What they indicate is that there is no discernible association between redshift and Bush's performance relative to 2000 (remember, a year in which the exit poll was fairly accurate, although there was a small net red-shift). Confidence limits are not given, but geeks among you can ballpark them given the precinct N, which is 1250. The limits are fairly tight.
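For the geeks, one way to do the ballpark is the Fisher z-transformation. This is a rough sketch, not the analysis anyone actually ran. It assumes an observed correlation of about zero (the plots come with no coefficient here) and treats the precincts as independent observations. The N of 50 is a made-up stand-in for a state-sized sample, not the real Ohio figure.

```python
import math

def r_confidence_limits(r, n, z=1.96):
    """Approximate 95% limits for a Pearson correlation r from n
    independent observations, via the Fisher z-transformation."""
    centre = math.atanh(r)
    half_width = z / math.sqrt(n - 3)
    return math.tanh(centre - half_width), math.tanh(centre + half_width)

# N = 1250 is the precinct count given in the text; r = 0 is an assumption.
low, high = r_confidence_limits(0.0, 1250)
print(round(low, 3), round(high, 3))       # about -0.055 to 0.055

# Hypothetical state-sized sample, to show how much wider the limits get.
s_low, s_high = r_confidence_limits(0.0, 50)
print(round(s_low, 2), round(s_high, 2))   # about -0.28 to 0.28
```

On those assumptions the national data would rule out anything but a very weak association. A sample of a few dozen precincts could not rule out even a moderate one.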
So what's with the Cat in the Hat?
I DON'T want to prove that Bush won a fair election. I WANT to prove that he didn't. But I think it is very hard, given that plot, to view fraud as the explanation for the exit poll discrepancy. But we don't NEED to demonstrate that fraud was the cause of the exit poll discrepancy. We need to demonstrate that fraud was the cause of Bush's victory. Actually, as far as I'm concerned, we don't need to demonstrate even that - what we need to do is to demonstrate that he did not win a fair race. And he didn't. The race was unfair from beginning to end, from the moment he stepped into his Poppy's size nines, to the attack ads, to the lies about WMD, to the felon purges, to the voter suppression tactics, to the rationing of voting machines to Democratic precincts in key states, to the monkeying about with regulations on voter challenges, to the monkeying about with provisional ballot regulations, to the refusal to expedite a recount in Ohio, to the media mockery of those who doubted as "tinfoilers" (a new word to me) - and maybe to the electronic corruption of the actual vote, made absurdly possible by the absurdly insecure software installed on the voting machines.
But if we want to find that certain something - the smoking gun, the evidence that Bush, far from spreading freedom and democracy is the president of a democracy only in name - we need to FIND OUT WHERE IT'S NOT.
And if it's in the exit polls, it's bloody well hidden.
On edit: link added, and I should also say that my name is Elizabeth Liddle, for those who don't know, aka Lizzie.