It has not escaped my notice that many people are puzzled as to how exactly polls go about determining “likely voters” (LVs). There’s a good reason for this: polling firms or sponsors rarely put much effort into explaining, clearly and precisely, the mechanics of how they select these LVs.
So, as a public service, here’s how they do it. Let’s start with Gallup. According to David Moore of Gallup:
Gallup asks each [RV] respondent seven LV screening questions, and gives each person an LV score of 0 to 7. [Assuming a turnout of 55 percent], the top 55% are classified as likely voters.
In practice that typically means all of the “7”s, given full weight, plus some proportion of those with lower scores (usually the “6”s), who are weighted down so that the size of the likely voter sample matches the projected turnout for the year (apparently 55 percent this year). All other respondents are discarded from the sample.
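Since the mechanics can be hard to picture, here’s a minimal sketch of that cutoff-and-down-weight procedure in Python. To be clear, this is my own illustration of the logic Moore describes, not Gallup’s actual code, and the function name and details are assumptions:

```python
def gallup_style_lv_weights(scores, turnout=0.55):
    """Cutoff model: fill a projected-turnout quota from the top
    screen scores down, fractionally weight the boundary score
    group, and discard everyone below it.

    scores  -- one LV screen score (0-7) per RV respondent
    turnout -- projected turnout, as a fraction of the RV sample
    """
    slots = turnout * len(scores)     # how many LV "slots" to fill
    weights = [0.0] * len(scores)     # 0.0 = dropped from the LV sample
    for s in range(7, -1, -1):        # highest scorers first
        group = [i for i, sc in enumerate(scores) if sc == s]
        if len(group) <= slots:       # whole group fits: full weight
            for i in group:
                weights[i] = 1.0
            slots -= len(group)
        else:                         # boundary group (usually the 6s):
            for i in group:           # weight down so the LV sample
                weights[i] = slots / len(group)  # matches projected turnout
            break
    return weights
```

With scores [7, 7, 6, 6, 5, 3] and 55 percent turnout, the two 7s get weight 1.0, the two 6s get 0.65 each, and everyone else gets zero; the weighted sample size comes out to exactly 55 percent of the RV sample, which is the whole point of the down-weighting.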
What are the Gallup likely voter questions? Unfortunately, the exact questions and their wording are not released by Gallup along with their polling data, but the questions apparently involve past voting behavior, interest in the election, intention to vote in the election and knowledge of things like the location of the local polling place.
That’s how Gallup does it. What about other organizations–do they select likely voters in the same way? Nope, they don’t. CBS News doesn’t use a cut-off model, where low-scoring respondents are thrown out altogether, but instead includes everyone in their RV sample, in some form, in their LV sample. They do this by asking respondents a series of voting-related questions and then assigning each respondent a weight based on their score on these questions, from very high weights for high-scoring respondents to very low weights for low-scoring respondents.
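Schematically, a weighting model like that differs from the cutoff model in just one respect: no weight is ever zero. CBS doesn’t publish its actual weight function, so the linear mapping below is purely a placeholder meant to show the shape of the approach:

```python
def cbs_style_lv_weights(scores, max_score=7):
    """Weighting model: every RV respondent stays in the LV sample,
    with a weight that rises with their screening score.

    The (score + 1) / (max_score + 1) mapping is a made-up
    placeholder; CBS's real weight function is not public.
    """
    return [(s + 1) / (max_score + 1) for s in scores]
```

Under this toy mapping a “7” counts eight times as much as a “0”, but even the lowest scorers still nudge the topline number, which is exactly what distinguishes the CBS approach from Gallup’s cutoff.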
Finally, by far the most common way is simply to ask a few screening questions and then terminate the interview with those respondents who give the “wrong” answers. Sometimes it’s just one question: some likely voter screens are as simple as asking an RV how likely they are to vote in the upcoming election; if they don’t say “almost certain” or “probably”, out they go.
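That kind of screen reduces to a boolean filter. A hypothetical one-question version, with made-up respondent data:

```python
rv_sample = [
    {"id": 1, "vote_likelihood": "almost certain"},
    {"id": 2, "vote_likelihood": "probably"},
    {"id": 3, "vote_likelihood": "maybe"},  # gives a "wrong" answer
]

def passes_screen(answer):
    """Keep the RV only if they give one of the 'right' answers."""
    return answer.strip().lower() in {"almost certain", "probably"}

# Respondent 3's interview is simply terminated; only 1 and 2 become LVs.
lv_sample = [r for r in rv_sample if passes_screen(r["vote_likelihood"])]
```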
So that’s how they get the likely voters in the polls you read about. How do they know that likely voters, months before the election, are actually the voters who will show up on election day? They don’t.
Here’s David Moore from Gallup again: “We simply do not know, nor can we know, which model is better during the campaign itself.” Exactly. So why does he think the Gallup LV model works so well months and months before the election? Because “if it is the most accurate model just before the election, it is probably the most accurate during the campaign as well”.
But that doesn’t follow at all. The Gallup LV model could work perfectly right before the election (not that it really does, but that’s another discussion) and still be quite a biased instrument earlier in the campaign. Pretty much by definition, Gallup’s LVs months before the election are not the same voters as Gallup’s LVs right before the election, since voters answer the LV questions differently at different stages of the campaign. And if there is any kind of partisan dimension to “tune-in”, so that, say, Democratic partisans or groups that lean strongly Democratic (like minorities) tend to tune in later, then the LV model will have a systematic tendency to favor the party (here, the Republicans) whose partisans tune in earliest.
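To see how differential tune-in turns into a mechanical skew, here’s a toy simulation with entirely made-up numbers. It assumes only what the argument requires: a 50/50 party split among RVs, with Democratic respondents drawing lower screen scores early in the campaign because they haven’t tuned in yet. A simple top-55-percent cutoff then produces a Republican-leaning LV sample all by itself:

```python
import random

random.seed(0)

def draw_scores(n, weights):
    """Draw n LV screen scores (0-7) from a hypothetical distribution."""
    return [random.choices(range(8), weights=weights)[0] for _ in range(n)]

# Made-up early-campaign distributions: Republicans mostly high scores,
# Democrats the mirror image because they tune in later.
rv_sample = (
    [("R", s) for s in draw_scores(500, [1, 1, 1, 2, 3, 4, 6, 8])]
    + [("D", s) for s in draw_scores(500, [8, 6, 4, 3, 2, 1, 1, 1])]
)
random.shuffle(rv_sample)  # so ties at the cutoff aren't broken by party

# Cutoff model: the top-scoring 55% of the RV sample become the LVs.
lvs = sorted(rv_sample, key=lambda r: r[1], reverse=True)[:550]
rep_share = sum(party == "R" for party, _ in lvs) / len(lvs)
print("Republican share of RV sample: 50.0%")
print(f"Republican share of LV sample: {rep_share:.1%}")  # well above 50%
```

Run the same cutoff after the two score distributions converge late in the campaign and the skew disappears, which is the point: the early-campaign bias comes from who has tuned in, not from anyone’s actual vote intention.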
Of course, my hypothesis here about Gallup LV bias might be completely wrong. But to evaluate it, Gallup would have to make available the demographics and partisan breakdown of both its RV and LV samples for the polls it releases, plus, ideally, the results (including demographics and partisan breakdowns) of the various screening questions it uses. I’m not holding my breath.