Measures Of Difficulty In Election Polling.

 

By

Joseph Shipman, PhD

Director of Election Polling, SurveyUSA

 

And

Jay H. Leve

Editor, SurveyUSA

 

Abstract: Some elections are harder to forecast than others. Therefore, it is not fair across disparate elections to calculate one pollster’s average “error” and compare it to another’s, since one pollster may have polled disproportionately “easy” contests and the other may have polled disproportionately “hard” contests. Data from 104 polling organizations in 120 statewide November 2004 election contests are analyzed. The 15 most active statewide pollsters are ranked. Two measures of pollster performance that take “degree of difficulty” into account are developed. One measure borrows from golf, and assigns a “par” score for each different type of contest (President, State Office, Ballot Measure). Pollsters are judged to have scored under par or over par. The most active 15 pollsters are ranked on a Leader Board. A second measure borrows from Olympic Figure Skating, and assigns an ordinal rank to the accuracy of each poll that a pollster conducted, relative to all of the other polls conducted in the same contest. The average of each pollster’s ordinal rankings is then compared to determine, using this approach, which statewide pollsters were Medalists in 2004.

 

Contents:

 

1.        Introduction

2.        Methods

3.        Measuring The Accuracy Of An Individual Poll

4.        Average M5 Error As A Measure Of Pollster Performance

5.        Problem: Pollsters Don’t All Poll The Same Elections

6.        Some Types Of Elections Are More Difficult Than Others

7.        “Par” As A Benchmark For Pollster Performance

8.        Problems With “Par”

9.        A Rank-Order Based Measure For Pollster Performance

10.     Pollster Performance Using Rank-Order Measure

11.     Conclusion

 

1.       Introduction

 

Various measures have been proposed for evaluating the accuracy of individual election polls. But the problem of evaluating fairly the performance of election pollsters is more complicated, because pollsters don’t all poll the same contests.

 

Olympic athletes are judged not only on the precision of their performance but on its inherent difficulty. Some election contests present more of a polling challenge than others. In this paper we investigate how to account for “degree of difficulty” when evaluating pollster performance.

 

After selecting a measure of poll accuracy, we construct two “difficulty-adjusted” measures of pollster performance, an “absolute measure” (which adjusts for the difficulty of each type of election), and a “relative measure” (derived from the rank-ordering of polls by accuracy for each individual contest).


 

2.       Methods

 

The 2004 General Election was Tuesday 11/2. We include in our examination all statewide general election contests for which at least one predictive poll was released between 10/1/04 and the time voting started on 11/2/04, counting only the final poll conducted in each state by each research organization.[1]
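The selection rule above (one final poll per research organization per contest, released inside the window) can be sketched as follows. This is only an illustration: the field names `contest`, `pollster` and `released`, and the sample rows, are hypothetical, and a day-level date comparison only approximates the paper's cutoff of "the time voting started."

```python
from datetime import date

WINDOW_START = date(2004, 10, 1)   # earliest qualifying release date
ELECTION_DAY = date(2004, 11, 2)   # day-level stand-in for "time voting started"

def final_polls(polls):
    """Keep the latest in-window poll for each (contest, pollster) pair."""
    latest = {}
    for poll in polls:
        if not (WINDOW_START <= poll["released"] <= ELECTION_DAY):
            continue  # released outside the qualifying window
        key = (poll["contest"], poll["pollster"])
        if key not in latest or poll["released"] > latest[key]["released"]:
            latest[key] = poll
    return list(latest.values())

# Hypothetical records, for illustration only.
sample = [
    {"contest": "OH President", "pollster": "Org X", "released": date(2004, 9, 28)},
    {"contest": "OH President", "pollster": "Org X", "released": date(2004, 10, 20)},
    {"contest": "OH President", "pollster": "Org X", "released": date(2004, 11, 1)},
    {"contest": "OH President", "pollster": "Org Y", "released": date(2004, 10, 30)},
]
kept = final_polls(sample)
```

With these sample rows, two polls survive: the 11/1 poll from the hypothetical "Org X" and the single poll from "Org Y".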

 

There were qualifying Presidential polls in 45 states (no qualifying poll in AK, DE, ID, MS, WY), qualifying U.S. Senate polls in 28 states, qualifying Gubernatorial polls in 10 states, polls for another statewide office in 3 states, and polls for 34 statewide ballot measures in 12 different states – a total of 120 distinct election contests. For these 120 contests, 446 polls were conducted by 104 different research organizations.

 

Summarizing, our universe is:

 

•  120 election contests, in which

•  446 election polls were conducted by

•  104 different pollsters

 

Information on election polls was obtained and triangulated from 3 websites:

 

•  www.realclearpolitics.com

•  www.pollingreport.com

•  www.nationaljournal.com

 

In the few cases where these sources were not in agreement, an individual pollster’s website was searched, and/or www.nexis.com was searched. To qualify for inclusion, a poll must have been conducted in an entire state, provided information on Margin of Error or sample size, provided information on field period, and predicted the outcome of a statewide contest. If more than one set of percentages was given (some pollsters release separate numbers for “Registered Voters” and “Likely Voters”), the numbers for “Likely Voters” were used. Some pollsters released separate results for President with and without Independent candidate Ralph Nader.  In states where Nader appeared on the ballot, the numbers which included Nader were used; in states where Nader did not appear on the ballot, pollster numbers which excluded Nader were used. Information on actual election returns was obtained in each state from the Secretary of State or the State Elections Department, and was checked against counts maintained by the Associated Press.

 

Data was compiled by seven SurveyUSA employees. Every effort was made to include all qualifying statewide polls; no known polls were excluded. The database of polls underlying this paper was released by SurveyUSA on 11/10/04 to the Associated Press, to the National Council on Public Polls (NCPP) and to the polling ombudsman www.mysterypollster.com.


 

3.       Measuring The Accuracy Of An Individual Poll

 

After the 1948 presidential election, a commission was formed to study the failure of polls to predict Truman's reelection. The resulting book, The Pre-election Polls of 1948: Report to the Committee on Analysis of Pre-Election Polls and Forecasts, by Frederick Mosteller et al, was published by the Social Science Research Council (New York, 1949). In this book, eight ways to measure the accuracy of a pre-election poll were proposed. Other methods have since been proposed (Traugott et al, 2003 AAPOR conference; Shipman et al, 2005 AAPOR conference).

 

Of the eight Mosteller Measures, “Mosteller 5” (M5) has been widely used by academics.[2] This is the absolute difference between the predicted and the actual spread between the top two choices, measured in percentage points. (Mosteller originally defined his measure for an election with one Democratic and one Republican candidate; here we extend the definition to cover ballot measures, where the choices are “Yes” and “No.”)[3]
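As a minimal sketch of the M5 calculation described above, the function below takes the poll's percentages and the actual percentages for the same two choices, in the same order. The function name, argument names and example numbers are assumptions made for illustration; they do not come from the paper's database.

```python
def m5_error(poll_first, poll_second, actual_first, actual_second):
    """Absolute difference between predicted and actual spread, in points.

    "first" and "second" must refer to the same two choices in both the
    poll and the result (for a ballot measure: Yes and No).
    """
    predicted_spread = poll_first - poll_second
    actual_spread = actual_first - actual_second
    return abs(predicted_spread - actual_spread)

# Hypothetical candidate race: poll 49-46, result 51-48.
# Both spreads are 3 points, so the M5 error is 0.
same_spread = m5_error(49, 46, 51, 48)

# Hypothetical race where the poll has the wrong leader: poll 47-50, result 51-48.
# Predicted spread is -3, actual spread is +3, so the M5 error is 6.
wrong_leader = m5_error(47, 50, 51, 48)

# Hypothetical ballot measure: poll Yes 52 / No 44, result Yes 58 / No 42.
# Predicted spread is 8, actual spread is 16, so the M5 error is 8.
ballot_measure = m5_error(52, 44, 58, 42)
```

Note that a poll can miss both candidates' shares and still score an M5 error of zero, because only the spread is compared.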

 

In what follows, we will adapt the M5 measure, which measures the accuracy of an individual poll in a single election, to measure the performance of a pollster in a group of elections, and also to measure the difficulty of a particular type of election.

 

The terminology is tricky: we measure the accuracy of a poll, the performance of a pollster, and the difficulty of an election.

 

4.       Average M5 Error As A Measure Of Pollster Performance

 

To begin, we identify the 15 most active state pollsters in 2004.[4] For each contest, we calculate an M5 Error for each pollster. We then average the M5 Errors across all the contests an individual pollster worked. We then rank the 15 Most Active Pollsters from lowest M5 error to highest. To make it easier to review the data unemotionally, we label the pollsters A through O.[5] Here are the 15 most active statewide pollsters ranked by size of M5 Error:


 

 

 

 

 

 

Ranked From Lowest To Highest

Pollster      # Contests   President Contests   State Office   Ballot Measure   Avg M5 Error
Pollster A    58           30                   20             8                3.01%
Pollster B    17           11                   6              0                3.09%
Pollster C    39           29                   10             0                3.13%
Pollster D    10           8                    2              0                3.34%
Pollster E    42           23                   13             6                3.66%
Pollster F    9            4                    4              1                3.82%
Pollster G    26           12                   14             0                4.02%
Pollster H    7            5                    2              0                4.47%
Pollster I    12           9                    2              1                4.49%
Pollster J    28           14                   11             3                5.45%
Pollster K    7            1                    6              0                6.95%
Pollster L    6            1                    1              4                7.43%
Pollster M    6            1                    1              4                8.46%
Pollster N    7            1                    0              6                8.48%
Pollster O    8            4                    1              3                11.34%

 

•  6 pollsters have an M5 error between 3 and 4 points.

•  3 pollsters have an M5 error between 4 and 5 points.

•  6 pollsters have an M5 error between 5 and 12 points.
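The averaging and ranking steps described in this section can be sketched as follows. The pollster labels "X", "Y", "Z" and their per-contest errors are hypothetical; the sketch assumes each pollster's M5 errors have already been computed, one per contest.

```python
from collections import defaultdict

def average_m5_by_pollster(errors):
    """errors: iterable of (pollster, m5_error) pairs, one pair per contest."""
    totals = defaultdict(lambda: [0.0, 0])  # pollster -> [sum of errors, count]
    for pollster, err in errors:
        totals[pollster][0] += err
        totals[pollster][1] += 1
    return {p: total / count for p, (total, count) in totals.items()}

def rank_lowest_to_highest(averages):
    """Sort pollsters by average M5 error, best (lowest) first."""
    return sorted(averages.items(), key=lambda item: item[1])

# Hypothetical per-contest errors, for illustration only.
sample = [("X", 2.0), ("X", 4.0), ("Y", 1.0), ("Y", 9.0), ("Z", 3.5)]
ranking = rank_lowest_to_highest(average_m5_by_pollster(sample))
```

In this sample, "Y" has the single most accurate poll (1.0) but ranks last on average (5.0), behind "X" (3.0) and "Z" (3.5).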

 

But is this a fair comparison? Does it unfairly penalize pollsters who polled “hard” elections and unfairly reward other pollsters who polled “easy” elections?

 

5.       Problem: Pollsters Don’t All Poll The Same Elections

 

Not all pollsters work the same types of elections. Pollsters B, C, D and I polled primarily Presidential contests. Pollsters L, M and N polled mostly Ballot Measures. Pollster K polled mostly Senate elections. Pollsters A and E polled a mix of Presidential, State Office, and Ballot Measure contests.

 

Even pollsters who polled a similar mix of contest types polled different actual contests. Averaging the M5 error across all polls is a valid way of comparing pollsters in theory, but in practice we can do better.

 

6.       Some Types Of Elections Are More Difficult Than Others

 

Even though the voters in each state are approximately the same for President, State Office, and Ballot Measures, contests on the ballot may have differing levels of voter interest, awareness, or knowledge, and turnout may vary from state to state. Of our universe of 446 statewide polls, 52% were presidential polls, 36% were for statewide office, and 12% were for ballot measures.

 

Averaging all polls for all contests, we confirm that some contests are indeed harder to poll than others:

 

Type Of Contest            # Polls   Avg M5 Error
President                  233       3.42
State Office               161       4.62
Ballot Measure             52        9.15

Summary For All Contests   446       4.52

 

The average M5 Error for Presidential contests is 3.42 points. The average M5 Error for State Office is 4.62 points, 35% greater. The average M5 error for Ballot Measures is 9.15 points, 168% greater than the M5 Error for Presidential polls.[6] These numbers will be regarded as “difficulty measures” for each election type. A pollster who polls only ballot measures, if he were average, would have an M5 error of 9.15. A pollster who polls only presidential contests would have an M5 Error of 3.42. The two cannot be compared side-by-side fairly without an adjustment to account for difficulty.
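The arithmetic behind these difficulty measures can be checked with a short sketch. The poll counts and average errors are the rounded values from the table in this section; the variable names are assumptions made for illustration.

```python
# Rounded values taken from the table of contest types.
POLLS = {"President": 233, "State Office": 161, "Ballot Measure": 52}
AVG_M5 = {"President": 3.42, "State Office": 4.62, "Ballot Measure": 9.15}

total_polls = sum(POLLS.values())

# Share of all polls by contest type, in percent.
share = {kind: 100 * count / total_polls for kind, count in POLLS.items()}

# Overall average error: each type's average weighted by its number of polls.
overall = sum(POLLS[kind] * AVG_M5[kind] for kind in POLLS) / total_polls

# How much harder each type is than Presidential polling, in percent.
pct_greater = {
    kind: (AVG_M5[kind] / AVG_M5["President"] - 1) * 100 for kind in AVG_M5
}

print(f"total polls: {total_polls}, overall average M5: {overall:.2f}")
for kind in POLLS:
    print(f"{kind}: {share[kind]:.0f}% of polls, {pct_greater[kind]:.0f}% greater")
```

Run against the rounded table values, this reproduces the 446 polls, the 4.52 overall average, the 52% / 36% / 12% split, and the 35% and 168% differences quoted in the text.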

 

7.       “Par” As A Benchmark For Pollster Performance

 

The game of golf provides an inspiration.

 

Consider two golfers. Golfer A plays the front nine and shoots a 35. Golfer B plays the back nine and shoots a 38. They meet at the clubhouse and compare notes. Is it knowable who is doing better, since they have not played one common hole? Yes. Using scores from other golfers, we know (for purposes of this illustration) that Par for the Front 9 is 33 and Par for the Back 9 is 39. Golfer A is 2 over par. Golfer B is 1 under par. Golfer B is doing better, though B has 3 more strokes than A.

 

Let’s apply this concept to pollsters. Each type of contest (President, State Office, Ballot Measure) has an average error. An average pollster should score Par. A better-than-average pollster should be under Par. A worse-than-average pollster should be over Par. If a pollster polls only Presidential elections, Par is the average error for all Presidential polls. If a pollster polls a mix of elections, Par is the weighted average of the average errors for each type of contest, weighted by the number of contests of each type that the pollster polled.
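A minimal sketch of the Par calculation follows, assuming Par is weighted by the number of contests of each type a pollster polled. It uses the rounded difficulty measures (3.42, 4.62, 9.15) and the contest counts and average M5 errors listed for Pollsters A and N earlier in the paper. The function names are hypothetical, and because the inputs are rounded, the results need not match any figures the authors report elsewhere.

```python
# Difficulty measure (average M5 error) for each type of contest.
DIFFICULTY = {"President": 3.42, "State Office": 4.62, "Ballot Measure": 9.15}

def par_for(mix):
    """Weighted average difficulty for a pollster's mix of contests.

    mix maps contest type -> number of contests of that type polled.
    """
    contests = sum(mix.values())
    return sum(count * DIFFICULTY[kind] for kind, count in mix.items()) / contests

def score_vs_par(avg_m5, mix):
    """Negative means under Par (better than average); positive means over Par."""
    return avg_m5 - par_for(mix)

# Contest mixes and average M5 errors for Pollsters A and N.
pollster_a = {"President": 30, "State Office": 20, "Ballot Measure": 8}
pollster_n = {"President": 1, "State Office": 0, "Ballot Measure": 6}

a_score = score_vs_par(3.01, pollster_a)
n_score = score_vs_par(8.48, pollster_n)
```

Under these assumptions, Par for Pollster A's mix is roughly 4.62, putting A about 1.6 points under Par, while Par for Pollster N's ballot-measure-heavy mix is roughly 8.33, putting N about 0.15 points over Par. N's raw error is nearly three times A's, but most of that gap is explained by the difficulty of the contests N polled.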