Measures Of Difficulty In Election Polling
By
Joseph Shipman, PhD
Director of Election Polling, SurveyUSA
And
Jay H. Leve
Editor, SurveyUSA
Abstract: Some elections are harder to forecast than others.
Therefore, it is not fair across disparate elections to calculate one
pollster’s average “error” and compare it to another’s, since one pollster may
have polled disproportionately “easy” contests and the other may have polled
disproportionately “hard” contests. Data from 104 polling organizations in 120
statewide November 2004 election contests are analyzed. The 15 most active
statewide pollsters are ranked. Two measures of pollster performance that
take “degree of difficulty” into account are developed. One measure borrows
from golf, and assigns a “par” score for each different type of contest
(President, State Office, Ballot Measure). Pollsters are judged to have scored under
par or over par. The most active 15 pollsters are ranked on a Leader Board. A
second measure borrows from Olympic Figure Skating, and assigns an ordinal rank
to the accuracy of each poll that a pollster conducted, relative to all of the
other polls conducted in the same contest. The average of each pollster’s
ordinal rankings is then compared to determine, using this approach, which statewide
pollsters were Medalists in 2004.
Contents:
1. Introduction
2. Methods
3. Measuring The Accuracy Of An Individual Poll
4. Average M5 Error As A Measure Of Pollster Performance
5. Problem: Pollsters Don’t All Poll The Same Elections
6. Some Types Of Elections Are More Difficult Than Others
7. “Par” As A Benchmark For Pollster Performance
8. Problems With “Par”
9. A Rank-Order Based Measure For Pollster Performance
10. Pollster Performance Using Rank-Order Measure
11. Conclusion
1. Introduction
Various measures have been
proposed for evaluating the accuracy of individual election polls.
But the problem of evaluating fairly the performance of election pollsters
is more complicated, because pollsters don’t all poll the same contests.
Olympic athletes are judged
not only on the precision of their performance but on its inherent difficulty.
Some election contests present more of a polling challenge than others. In this
paper we investigate how to account for “degree of difficulty” when evaluating pollster
performance.
After selecting a measure of
poll accuracy, we construct two “difficulty-adjusted” measures of pollster
performance, an “absolute measure” (which adjusts for the difficulty of each
type of election), and a “relative measure” (derived from the rank-ordering of
polls by accuracy for each individual contest).
2. Methods
The 2004 General Election was
Tuesday 11/2. We include in our examination all statewide general election
contests for which at least one predictive poll was released between 10/1/04
and the time voting started on 11/2/04, counting only the final poll conducted
in each state by each research organization.[1]
There were qualifying Presidential
polls in 45 states (no qualifying poll in AK, DE, ID, MS, WY), qualifying U.S.
Senate polls in 28 states, qualifying Gubernatorial polls in 10 states, polls
for another statewide office in 3 states, and polls for 34 statewide ballot
measures in 12 different states – a total of 120 distinct election contests.
For these 120 contests, 446 polls were conducted by 104 different research
organizations.
Summarizing, our universe is:
• 120 election contests, in which
• 446 election polls were conducted by
• 104 different pollsters
Information on election polls
was obtained and triangulated from 3 websites:
In the few cases where these
sources were not in agreement, an individual pollster’s website was searched,
and/or www.nexis.com was searched. To
qualify for inclusion, a poll must have been conducted in an entire state,
provided information on Margin of Error or sample size, provided information on
field period, and predicted the outcome of a statewide contest. If more than
one set of percentages was given (some pollsters release separate numbers for “Registered
Voters” and “Likely Voters”), the numbers for “Likely Voters” were used. Some pollsters
released separate results for President with and without Independent candidate
Ralph Nader. In states where Nader
appeared on the ballot, the numbers which included Nader were used; in states
where Nader did not appear on the ballot, pollster numbers which excluded Nader
were used. Information on actual election returns was obtained in each state
from the Secretary of State or the State
Data was compiled by seven
SurveyUSA employees. Every effort was made to include all qualifying statewide
polls; no known polls were excluded. The database of polls underlying this
paper was released by SurveyUSA on
3. Measuring The Accuracy Of An Individual Poll
After the 1948 presidential
election, a commission was formed to study the failure of polls to predict
Truman's reelection. The resulting book, The Pre-election Polls of 1948:
Report to the Committee on Analysis of Pre-Election Polls and Forecasts, by
Frederick Mosteller et al, was published by the Social Science Research Council
(New York, 1949). In this book, eight ways to measure the accuracy of a
pre-election poll were proposed. Other methods have since been proposed
(Traugott et al, 2003 AAPOR conference; Shipman et al, 2005 AAPOR conference).
Of the eight Mosteller
Measures, “Mosteller 5” (M5) has been widely used by academics.[2] This
is the absolute difference between predicted and actual spread between the top
two choices, measured in percentage units. (Mosteller originally defined his
measure for an election with one Democratic and one Republican candidate; here
we extend the definition to cover ballot measures, where the choices are “Yes”
and “No.”)[3]
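As a minimal sketch of the definition above, M5 can be written as |(poll share of first choice − poll share of second choice) − (actual share of first choice − actual share of second choice)|. The function name, argument names, and example numbers below are illustrative assumptions, not part of the original paper. The sketch also assumes the same two choices are compared in the poll and in the returns.

```python
def m5_error(poll_first, poll_second, actual_first, actual_second):
    """Absolute difference, in percentage points, between the spread a poll
    predicted and the spread in the actual returns for the same two choices."""
    predicted_spread = poll_first - poll_second
    actual_spread = actual_first - actual_second
    return abs(predicted_spread - actual_spread)


# Hypothetical candidate contest: poll shows 49-46, returns are 51-47.
candidate_error = m5_error(49.0, 46.0, 51.0, 47.0)

# Hypothetical ballot measure: poll shows Yes 55, No 40; returns are 60-40.
ballot_error = m5_error(55.0, 40.0, 60.0, 40.0)
```

Because only the spread enters the calculation, a poll can miss both choices' shares by the same amount and still have an M5 error of zero.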
In what follows, we will adapt the M5 measure, which measures the accuracy of
an individual poll in a single election, to measure the performance of a
pollster in a group of elections, and also to measure the difficulty of a
particular type of election.
The terminology is tricky: we
measure the accuracy of a poll, the performance of a pollster, and the difficulty of an election.
4. Average M5 Error As A Measure Of Pollster Performance
To begin, we identify the 15
most active state pollsters in 2004.[4] For
each contest, we calculate an M5 Error for each pollster. We then average the
M5 Errors across all the contests an individual pollster worked. We then rank
the 15 Most Active Pollsters from lowest M5 error to highest. To make it easier
to review the data unemotionally, we label the pollsters A through O.[5]
Here are the 15 most active statewide pollsters ranked by size of M5 Error:
| Pollster | # Contests | President Contests | State Office Contests | Ballot Measure Contests | Avg M5 Error (ranked ▼) |
|---|---|---|---|---|---|
| Pollster A | 58 | 30 | 20 | 8 | 3.01% |
| Pollster B | 17 | 11 | 6 | 0 | 3.09% |
| Pollster C | 39 | 29 | 10 | 0 | 3.13% |
| Pollster D | 10 | 8 | 2 | 0 | 3.34% |
| Pollster E | 42 | 23 | 13 | 6 | 3.66% |
| Pollster F | 9 | 4 | 4 | 1 | 3.82% |
| Pollster G | 26 | 12 | 14 | 0 | 4.02% |
| Pollster H | 7 | 5 | 2 | 0 | 4.47% |
| Pollster I | 12 | 9 | 2 | 1 | 4.49% |
| Pollster J | 28 | 14 | 11 | 3 | 5.45% |
| Pollster K | 7 | 1 | 6 | 0 | 6.95% |
| Pollster L | 6 | 1 | 1 | 4 | 7.43% |
| Pollster M | 6 | 1 | 1 | 4 | 8.46% |
| Pollster N | 7 | 1 | 0 | 6 | 8.48% |
| Pollster O | 8 | 4 | 1 | 3 | 11.34% |
• 6 pollsters have an M5 error between 3 and 4 points.
• 3 pollsters have an M5 error between 4 and 5 points.
• 6 pollsters have an M5 error between 5 and 12 points.
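The averaging and ranking steps used in this section can be sketched as follows. This is an illustrative outline only: the function name, the data layout (one pollster/error pair per contest polled), and the sample errors are assumptions, and the sample values are not drawn from the paper's database.

```python
from collections import defaultdict


def rank_by_average_m5(poll_errors):
    """Average each pollster's M5 errors over the contests it polled and
    return (pollster, average) pairs sorted from lowest to highest average."""
    errors_by_pollster = defaultdict(list)
    for pollster, error in poll_errors:
        errors_by_pollster[pollster].append(error)
    averages = {
        name: sum(errors) / len(errors)
        for name, errors in errors_by_pollster.items()
    }
    return sorted(averages.items(), key=lambda item: item[1])


# Hypothetical M5 errors, one entry per contest polled.
sample = [("X", 2.0), ("X", 4.0), ("Y", 1.0), ("Y", 9.0), ("Z", 3.5)]
ranking = rank_by_average_m5(sample)
```

Note that this unadjusted average treats every contest as equally hard to poll.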
But is this a fair
comparison? Does it unfairly penalize pollsters who polled “hard” elections and
unfairly reward other pollsters who polled “easy” elections?
5. Problem: Pollsters Don’t All Poll The Same Elections
Not all pollsters work the
same types of elections. Pollsters B, C, D and I polled primarily on Presidential
contests. Pollsters L, M and N polled mostly Ballot Measures. Pollster K polled
mostly Senate elections. Pollsters A and E polled a mix of Presidential, State
Office, and Ballot Measures.
Even pollsters who polled a
similar mix of contest types polled different actual contests. Averaging
the M5 error across all polls is a valid way of comparing pollsters in
theory, but in practice we can do better.
6. Some Types Of Elections Are More Difficult Than Others
Even though the voters in
each state are approximately the same for President, State Office, and Ballot
Measures, contests on the ballot may have differing levels of voter interest,
awareness, or knowledge, and turnout may vary from state to state. Of our
universe of 446 statewide polls, 52% were presidential polls, 36% were for
statewide office, and 12% were for ballot measures.
Averaging all polls for all
contests, we confirm that some contests are indeed harder to poll than others:
| Type Of Contest | # Polls | Avg M5 Error |
|---|---|---|
| President | 233 | 3.42 |
| State Office | 161 | 4.62 |
| Ballot Measure | 52 | 9.15 |
| Summary For All Contests | 446 | 4.52 |
The average M5 Error for
Presidential contests is 3.42 points. The average M5 Error for State Office is
4.62 points, 35% greater. The average M5 error for Ballot Measures is 9.15
points, 168% greater than the M5 Error for Presidential polls.[6] These
numbers will be regarded as “difficulty measures” for each election type. A
pollster who polls only ballot measures, if he were average, would have an M5
error of 9.15. A pollster who polls only presidential contests would have an M5
Error of 3.42. The two cannot be compared side-by-side fairly without an
adjustment to account for difficulty.
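The arithmetic behind these difficulty measures can be checked with a short sketch. The poll counts and average errors are copied from the table above; the variable names are illustrative assumptions.

```python
# (number of polls, average M5 error in points) for each contest type
difficulty = {
    "President": (233, 3.42),
    "State Office": (161, 4.62),
    "Ballot Measure": (52, 9.15),
}

# Overall average error: per-type averages weighted by number of polls.
total_polls = sum(count for count, _ in difficulty.values())
overall_error = sum(count * err for count, err in difficulty.values()) / total_polls

# Percentage by which each type's error exceeds the Presidential error.
baseline = difficulty["President"][1]
pct_above_president = {
    kind: round(100 * (err / baseline - 1))
    for kind, (_, err) in difficulty.items()
}
```

Weighting each type's average by its poll count gives the 4.52 summary figure, and the ratios to the Presidential average give the 35% and 168% figures quoted in the text.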
7. “Par” As A Benchmark For Pollster Performance
The game of golf provides an inspiration.
Consider two golfers. Golfer
A plays the front nine and shoots a 35. Golfer B plays the back nine and shoots
a 38. They meet at the clubhouse and compare notes. Is it knowable who is doing
better, since they have not played one common hole? Yes. Using scores from
other golfers, we know (for purposes of this illustration) that Par for the
Front 9 is 33 and Par for the Back 9 is 39. Golfer A is 2 over par. Golfer B is
1 under par. Golfer B is doing better, though B has 3 more strokes than A.
Let’s apply this concept to pollsters.
Each type of contest (President, State Office, Ballot Measure) has an average
error. An average pollster should score Par. A better-than-average
pollster should be under Par. A worse-than-average pollster should be
over Par. If a pollster polls only Presidential elections, Par is the average
error for all Presidential polls. If a pollster polls a mix of elections, Par
is the average error for each type of contest, weighted by the number of
contests of each type that the pollster polled.
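A minimal sketch of this Par calculation follows. It assumes the weights are the number of contests of each type that the pollster polled, which is one reading of "weighted average" and may differ in detail from the authors' own computation. The function names are hypothetical. The per-type averages and Pollster A's contest counts and 3.01 average are taken from the tables earlier in this paper.

```python
# Average M5 error by contest type, used here as the per-type Par values.
PAR_BY_TYPE = {"President": 3.42, "State Office": 4.62, "Ballot Measure": 9.15}


def par_for_mix(contest_counts):
    """Par for a pollster: per-type averages weighted by contests polled."""
    total = sum(contest_counts.values())
    weighted = sum(PAR_BY_TYPE[kind] * n for kind, n in contest_counts.items())
    return weighted / total


def score_vs_par(avg_m5, contest_counts):
    """Negative means under Par (better than average); positive means over."""
    return avg_m5 - par_for_mix(contest_counts)


# Pollster A: 30 President, 20 State Office, 8 Ballot Measure; Avg M5 of 3.01.
pollster_a_mix = {"President": 30, "State Office": 20, "Ballot Measure": 8}
par_a = par_for_mix(pollster_a_mix)
delta_a = score_vs_par(3.01, pollster_a_mix)
```

Under this assumed weighting, Pollster A's Par works out to roughly 4.62, so an average error of 3.01 is under Par. A pollster who polled only Presidential contests would have a Par of exactly 3.42.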