Measures Of Difficulty In Election Polling.

 

By

Joseph Shipman, PhD

Director of Election Polling, SurveyUSA

 

And

Jay H. Leve

Editor, SurveyUSA

 

Abstract: Some elections are harder to forecast than others. Therefore, it is not fair across disparate elections to calculate one pollster’s average “error” and compare it to another’s, since one pollster may have polled disproportionately “easy” contests and the other may have polled disproportionately “hard” contests. Data from 104 polling organizations in 120 statewide November 2004 election contests are analyzed. The 15 most active statewide pollsters are ranked. Two measures of pollster performance that take “degree of difficulty” into account are developed. One measure borrows from golf, and assigns a “par” score for each different type of contest (President, State Office, Ballot Measure). Pollsters are judged to have scored under par or over par. The most active 15 pollsters are ranked on a Leader Board. A second measure borrows from Olympic Figure Skating, and assigns an ordinal rank to the accuracy of each poll that a pollster conducted, relative to all of the other polls conducted in the same contest. The average of each pollster’s ordinal rankings is then compared to determine, using this approach, which statewide pollsters were Medalists in 2004.

 

Contents:

 

1.        Introduction

2.        Methods

3.        Measuring The Accuracy Of An Individual Poll

4.        Average M5 Error As A Measure Of Pollster Performance

5.        Problem: Pollsters Don’t All Poll The Same Elections

6.        Some Types Of Elections Are More Difficult Than Others

7.        “Par” As A Benchmark For Pollster Performance

8.        Problems With “Par”

9.        A Rank-Order Based Measure For Pollster Performance

10.     Pollster Performance Using Rank-Order Measure

11.     Conclusion

 

1.       Introduction

 

Various measures have been proposed for evaluating the accuracy of individual election polls. But the problem of evaluating fairly the performance of election pollsters is more complicated, because pollsters don’t all poll the same contests.

 

Olympic athletes are judged not only on the precision of their performance but on its inherent difficulty. Some election contests present more of a polling challenge than others. In this paper we investigate how to account for “degree of difficulty” when evaluating pollster performance.

 

After selecting a measure of poll accuracy, we construct two “difficulty-adjusted” measures of pollster performance, an “absolute measure” (which adjusts for the difficulty of each type of election), and a “relative measure” (derived from the rank-ordering of polls by accuracy for each individual contest).


 

2.       Methods

 

The 2004 General Election was Tuesday 11/2. We include in our examination all statewide general election contests for which at least one predictive poll was released between 10/1/04 and the time voting started on 11/2/04, counting only the final poll conducted in each state by each research organization.[1]

 

There were qualifying Presidential polls in 45 states (no qualifying poll in AK, DE, ID, MS, WY), qualifying U.S. Senate polls in 28 states, qualifying Gubernatorial polls in 10 states, polls for another statewide office in 3 states, and polls for 34 statewide ballot measures in 12 different states – a total of 120 distinct election contests. For these 120 contests, 446 polls were conducted by 104 different research organizations.

 

Summarizing, our universe is:

 

  • 120 election contests, in which

  • 446 election polls were conducted by

  • 104 different pollsters

 

Information on election polls was obtained and triangulated from 3 websites:

 

  • www.realclearpolitics.com

  • www.pollingreport.com

  • www.nationaljournal.com

 

In the few cases where these sources were not in agreement, an individual pollster’s website was searched, and/or www.nexis.com was searched. To qualify for inclusion, a poll must have been conducted in an entire state, provided information on Margin of Error or sample size, provided information on field period, and predicted the outcome of a statewide contest. If more than one set of percentages was given (some pollsters release separate numbers for “Registered Voters” and “Likely Voters”), the numbers for “Likely Voters” were used. Some pollsters released separate results for President with and without Independent candidate Ralph Nader.  In states where Nader appeared on the ballot, the numbers which included Nader were used; in states where Nader did not appear on the ballot, pollster numbers which excluded Nader were used. Information on actual election returns was obtained in each state from the Secretary of State or the State Elections Department, and was checked against counts maintained by the Associated Press.

 

Data was compiled by seven SurveyUSA employees. Every effort was made to include all qualifying statewide polls; no known polls were excluded. The database of polls underlying this paper was released by SurveyUSA on 11/10/04 to the Associated Press, to the National Council on Public Polls (NCPP) and to the polling ombudsman www.mysterypollster.com.


 

3.       Measuring The Accuracy Of An Individual Poll

 

After the 1948 presidential election, a commission was formed to study the failure of polls to predict Truman's reelection. The resulting book, The Pre-election Polls of 1948: Report to the Committee on Analysis of Pre-Election Polls and Forecasts, by Frederick Mosteller et al, was published by the Social Science Research Council (New York, 1949). In this book, eight ways to measure the accuracy of a pre-election poll were proposed. Other methods have since been proposed (Traugott et al, 2003 AAPOR conference; Shipman et al, 2005 AAPOR conference).

 

Of the eight Mosteller Measures, “Mosteller 5” (M5) has been widely used by academics.[2] This is the absolute difference between the predicted and the actual spread between the top two choices, measured in percentage points. (Mosteller originally defined his measure for an election with one Democratic and one Republican candidate; here we extend the definition to cover ballot measures, where the choices are “Yes” and “No.”)[3]
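As a minimal sketch of the M5 calculation, the Python function below takes the predicted and actual percentages for the same two choices and returns the absolute difference between the two spreads. The function and argument names are illustrative assumptions, not part of Mosteller's definition; the figures are those of the hypothetical election in footnote 3.

```python
def m5_error(pred_first, pred_second, actual_first, actual_second):
    """Absolute difference between predicted and actual spread, in points.

    "first" and "second" must be the same two choices in the poll and in the
    returns (for example Republican and Democrat, or Yes and No).
    """
    predicted_spread = pred_first - pred_second
    actual_spread = actual_first - actual_second
    return abs(predicted_spread - actual_spread)


# Hypothetical election: the Republican wins 51.1% to 48.3% (spread 2.8).
right_winner = m5_error(54, 45, 51.1, 48.3)   # poll had the Republican +9
wrong_winner = m5_error(47, 49, 51.1, 48.3)   # poll had the Democrat +2
print(round(right_winner, 1), round(wrong_winner, 1))
```

Taking the absolute value plays the role of the sign change described in footnote 3, so the error is never negative regardless of which candidate the poll favored.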

 

In what follows, we will adapt the M5 measure, which measures

 

the accuracy of an individual poll in a single election,

 

to measure

 

the performance of a pollster in a group of elections,

 

and also to measure

 

the difficulty of a particular type of election.

 

The terminology is tricky: we measure the accuracy of a poll, the performance of a pollster, and the difficulty of an election.

 

4.       Average M5 Error As A Measure Of Pollster Performance

 

To begin, we identify the 15 most active state pollsters in 2004.[4] For each contest, we calculate an M5 Error for each pollster. We then average the M5 Errors across all the contests an individual pollster worked. We then rank the 15 Most Active Pollsters from lowest M5 error to highest. To make it easier to review the data unemotionally, we label the pollsters A through O.[5] Here are the 15 most active statewide pollsters ranked by size of M5 Error:


 

 

 

 

 

 

Ranked from lowest to highest Avg M5 Error:

Pollster     # Contests   President Contests   State Office   Ballot Measure   Avg M5 Error
Pollster A   58           30                   20             8                3.01%
Pollster B   17           11                   6              0                3.09%
Pollster C   39           29                   10             0                3.13%
Pollster D   10           8                    2              0                3.34%
Pollster E   42           23                   13             6                3.66%
Pollster F   9            4                    4              1                3.82%
Pollster G   26           12                   14             0                4.02%
Pollster H   7            5                    2              0                4.47%
Pollster I   12           9                    2              1                4.49%
Pollster J   28           14                   11             3                5.45%
Pollster K   7            1                    6              0                6.95%
Pollster L   6            1                    1              4                7.43%
Pollster M   6            1                    1              4                8.46%
Pollster N   7            1                    0              6                8.48%
Pollster O   8            4                    1              3                11.34%

 

  • 6 pollsters have an M5 error between 3 and 4 points.

  • 3 pollsters have an M5 error between 4 and 5 points.

  • 6 pollsters have an M5 error between 5 and 12 points.

 

But is this a fair comparison? Does it unfairly penalize pollsters who polled “hard” elections and unfairly reward other pollsters who polled “easy” elections?

 

5.       Problem: Pollsters Don’t All Poll The Same Elections

 

Not all pollsters work the same types of elections. Pollsters B, C, D and I polled primarily on Presidential contests. Pollsters L, M and N polled mostly Ballot Measures. Pollster K polled mostly Senate elections. Pollsters A and E polled a mix of Presidential, State Office, and Ballot Measures.

 

Even pollsters who polled a similar mix of contest types polled different actual contests. Averaging the M5 error across all polls is a valid way of comparing pollsters in theory, but in practice we can do better.

 

6.       Some Types Of Elections Are More Difficult Than Others

 

Even though the voters in each state are approximately the same for President, State Office, and Ballot Measures, contests on the ballot may have differing levels of voter interest, awareness, or knowledge, and turnout may vary from state to state. Of our universe of 446 statewide polls, 52% were presidential polls, 36% were for statewide office, and 12% were for ballot measures.

 

Averaging all polls for all contests, we confirm that some contests are indeed harder to poll than others:

 

Type Of Contest            # Polls   Avg M5 Error
President                  233       3.42
State Office               161       4.62
Ballot Measure             52        9.15

Summary For All Contests   446       4.52

 

The average M5 Error for Presidential contests is 3.42 points. The average M5 Error for State Office is 4.62 points, 35% greater. The average M5 error for Ballot Measures is 9.15 points, 168% greater than the M5 Error for Presidential polls.[6] These numbers will be regarded as “difficulty measures” for each election type. A pollster who polls only ballot measures, if he were average, would have an M5 error of 9.15. A pollster who polls only presidential contests would have an M5 Error of 3.42. The two cannot be compared side-by-side fairly without an adjustment to account for difficulty.
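The arithmetic behind these difficulty measures can be checked with a short Python sketch. It uses only the poll counts and average errors from the table above; the variable names are illustrative assumptions. It recomputes the all-contest average as a poll-weighted mean and expresses each type's error relative to the Presidential figure.

```python
# (number of polls, average M5 error) by contest type, from the table above
difficulty = {
    "President": (233, 3.42),
    "State Office": (161, 4.62),
    "Ballot Measure": (52, 9.15),
}

total_polls = sum(n for n, _ in difficulty.values())

# Poll-weighted mean of the three type-level averages.
overall = sum(n * err for n, err in difficulty.values()) / total_polls

# How much harder each type is than Presidential polling, in percent.
baseline = difficulty["President"][1]
pct_harder = {
    kind: round(100 * (err / baseline - 1))
    for kind, (_, err) in difficulty.items()
}
print(total_polls, round(overall, 2), pct_harder)
```

The weighted mean comes out at 4.52, matching the summary row, and the relative differences are 35% for State Office and 168% for Ballot Measures.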

 

7.       “Par” As A Benchmark For Pollster Performance

 

The game of golf provides an inspiration.

 

Consider two golfers. Golfer A plays the front nine and shoots a 35. Golfer B plays the back nine and shoots a 38. They meet at the clubhouse and compare notes. Is it knowable who is doing better, since they have not played one common hole? Yes. Using scores from other golfers, we know (for purposes of this illustration) that Par for the Front 9 is 33 and Par for the Back 9 is 39. Golfer A is 2 over par. Golfer B is 1 under par. Golfer B is doing better, though B has 3 more strokes than A.

 

Let’s apply this concept to pollsters. Each type of contest (President, State Office, Ballot Measure) has an average error. An average pollster should score Par. A better-than-average pollster should be under Par. A worse-than-average pollster should be over Par. If a pollster polls only Presidential elections, Par is the average error for all Presidential polls. If a pollster polls a mix of elections, Par is the weighted average of the average errors for each type of contest, weighted by the number of contests of each type that the pollster polled.

 

Because there are no “strokes” in polling, the Leader Board that follows is not expressed with Integers “under par” or “over par”. However, by converting each pollster’s performance into a percentage of Par, we are able to rank the 15 most active pollsters, where 100% is Par, percentages below 100% are Under Par and percentages above 100% are Over Par.
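A minimal Python sketch of the Par calculation follows. It assumes that the weights are the number of contests of each type that a pollster worked; the function names are hypothetical. The inputs are the type-level average errors from Section 6 and Pollster A's contest counts and average M5 error from Section 4.

```python
# Average M5 error by contest type (the "difficulty measures" in Section 6).
TYPE_PAR = {"President": 3.42, "State Office": 4.62, "Ballot Measure": 9.15}


def expected_par(contests):
    """Weighted average of type-level errors, weighted by contests polled."""
    total = sum(contests.values())
    return sum(TYPE_PAR[kind] * n for kind, n in contests.items()) / total


def percent_of_par(actual_error, contests):
    """Actual average M5 error as a percentage of the pollster's Par."""
    return 100 * actual_error / expected_par(contests)


# Pollster A: 30 Presidential, 20 State Office and 8 Ballot Measure contests,
# with an actual average M5 error of 3.01.
pollster_a = {"President": 30, "State Office": 20, "Ballot Measure": 8}
print(round(expected_par(pollster_a), 2), round(percent_of_par(3.01, pollster_a)))
```

Under this assumption Pollster A's Par is 4.62 and the actual error is 65% of Par. A pollster with a different mix of contests gets a different Par from the same three type-level numbers.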

 


Here is the Leader Board for 2004: [7]

 

 

 

 

Ranked from lowest to highest % of Par:

Pollster     Expected Par Error   Actual M5 Error   % of Par   % of Par Rank, Best To Worst   Change from M5 Error
Pollster A   4.62%                3.01%             65%        1                              Unchanged
Pollster E   4.61%                3.66%             79%        2                              +3
Pollster B   3.84%                3.09%             80%        3                              -1
Pollster F   4.59%                3.82%             83%        4                              +2
Pollster C   3.72%                3.13%             84%        5                              -2
Pollster D   3.66%                3.34%             91%        6                              -2
Pollster G   4.06%                4.02%             99%        7                              Unchanged
Pollster L   7.44%                7.43%             100%       8                              +4
Pollster N   8.33%                8.48%             102%       9                              +5
Pollster I   4.09%                4.49%             110%       10                             -1
Pollster M   7.44%                8.46%             114%       11                             +2
Pollster H   3.76%                4.47%             119%       12                             -4
Pollster J   4.50%                5.45%             121%       13                             -3
Pollster K   4.44%                6.95%             156%       14                             -3
Pollster O   5.72%                11.34%            198%       15                             Unchanged

 

Pollster E jumps from 5th place to 2nd. Pollsters B, C and D do worse. Pollsters L and N jump from the bottom of the pack to the middle. Pollster A is still 1st. Pollster O is still last.

 

8.       Problems With “Par”

 

The concept of Par is helpful. It is simple to calculate, and has a clear interpretation. But it is open to several criticisms.

 

a)       Elections are not all weighted equally. 

 

b)       It does not account for any polling difficulties unrelated to the simple 3-way classification of elections by type. Some ballot measures are more complex than others; some states may be harder to poll than others because of turnout differences or sampling issues.

 

c)       A pollster’s score can be influenced by contests the pollster did not work, since “Par” is calculated using the scores of all pollsters in all like contests.

 

 

The first point has the biggest impact. For example: if I poll one Senate election and one Ballot Measure, Par is 6.9. If in fact I score Par on the Senate election (4.6 point Error) and get the Ballot Measure exactly right (0.0 point Error), my Error is 2.3, which is 33% of Par. But if I score Par on the Ballot Measure (9.2 pt Error) and get the Senate election exactly right (0.0 point Error), my Error is 4.6 pts, or 67% of par. Getting the Senate election right is worth only half as much as getting the Ballot Measure right.
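The unequal weighting in this example can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name is hypothetical, and it uses the rounded type-level errors quoted in the paragraph above (4.6 and 9.2).

```python
SENATE_PAR, BALLOT_PAR = 4.6, 9.2   # rounded type-level errors used in the text


def pct_of_par(senate_error, ballot_error):
    """Percent of Par for a pollster with one Senate poll and one Ballot Measure poll."""
    par = (SENATE_PAR + BALLOT_PAR) / 2
    actual = (senate_error + ballot_error) / 2
    return round(100 * actual / par)


print(pct_of_par(SENATE_PAR, 0.0))  # Par on Senate, perfect on the Ballot Measure
print(pct_of_par(0.0, BALLOT_PAR))  # perfect on Senate, Par on the Ballot Measure
```

The two cases give 33% and 67% of Par: a perfect Ballot Measure poll removes twice as much error from the average as a perfect Senate poll does.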

 

Is there another approach that overcomes these limitations?

 

9.       A Rank-Order Based Measure For Pollster Performance

 

When Olympic judges want to determine who is the best overall figure skater, they use the ordinal rankings for the contestants in each part of the competition. Olympic judges do not just average the scores of skaters from different judges, or across events, but rather, they determine the winner based on the rank-order finish of  skaters for each part, which are themselves determined by the rank-orderings of each judge. 

 

The reason the Olympics use ordinal rankings is to make sure each judge has an equal influence, and each part of the competition gets the appropriate weight.  Otherwise, a very high or very low score could have a disproportionate influence.[8]

 

Is there a polling equivalent? For each contest, we can examine all of the competing polls and rank them from best to worst. Why does this advance our understanding? Well, a pollster might poll contest x and have an M5 error of 3.0, and be the best pollster, but might poll contest y and have an identical M5 error of 3.0, but be the worst pollster. This is not a contrived example; it happened: In the contest for Governor of Washington, pollster B’s 3-pt M5 error was the best of 5 competing pollsters. In the contest for U.S. Senate in California, pollster B’s 3-point error was the worst of 5 competing pollsters. If polling were an Olympic sport, pollster B would finish 1st in Washington and last in California, though he had the exact same M5 score in both events.

 

Now, the number of competing pollsters varies by contest. To ensure that all contests have equal weight, we devised a system that takes all of the pollsters for any contest and converts their ordinal rank into a number from 0.0 to 1.0, where 0.0 is the best possible score and 1.0 is the worst possible score. If 2 pollsters work a contest, the one with the smaller M5 Error gets a score of 0.0 and the one with the larger M5 Error gets a score of 1.0. If 3 pollsters work a contest, the best gets a score of 0.0, the middle gets a score of 0.5, and the worst gets a score of 1.0; and so on. Ties are accommodated by averaging the scores of the places tied for.

 

Here is how this works in our two examples (actual returns are rounded for illustrative purposes):


 

 

WA Governor (pollsters ranked from lowest to highest M5 error):

WA Governor      Vote   Pollster B   Pollster E   Pollster A   Pollster P   Pollster Q
Gregoire (D)     49%    47%          48%          45%          45%          51%
Rossi (R)        49%    44%          43%          51%          38%          39%
Victory Margin   0      3            5            -6           7            12
M5 error         n/a    3            5            6            7            12
Ordinal rank     n/a    1st          2nd          3rd          4th          5th
Ordinal score    n/a    0            0.25         0.5          0.75         1

 

 

 

 

 

 

 

 

 

CA U.S. Senate (pollsters ranked from lowest to highest M5 error):

CA U.S. Senate   Vote   Pollster A    Pollster R    Pollster O    Pollster S    Pollster B
Boxer (D)        58%    57%           53%           55%           53%           53%
Jones (R)        38%    36%           34%           33%           35%           36%
Victory Margin   20     21            19            22            18            17
M5 error         n/a    1             1             2             2             3
Ordinal rank     n/a    Tie for 1st   Tie for 1st   Tie for 3rd   Tie for 3rd   5th
Ordinal score    n/a    0.125         0.125         0.625         0.625         1

 

In Washington, there are no ties, so the 5 pollsters receive 5 equally spaced scores: 0.0, 0.25, 0.5, 0.75, 1.0.

 

In California, the 1st and 2nd place pollsters tie, and the 3rd and 4th place pollsters tie. Ordinal scores are awarded: 0.125, 0.125, 0.625, 0.625, 1.0.
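The conversion from M5 errors to ordinal scores, including the tie rule, can be sketched in Python as follows. The function name and data layout are illustrative assumptions rather than the authors' own implementation; the inputs are the M5 errors from the Washington and California tables above.

```python
def ordinal_scores(errors):
    """Map each pollster's M5 error in one contest to a score in [0.0, 1.0].

    `errors` maps pollster name -> M5 error. The best poll scores 0.0 and the
    worst 1.0; tied polls share the average of the places they occupy.
    """
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two pollsters to rank a contest")
    ranked = sorted(errors, key=errors.get)   # smallest error first
    scores = {}
    i = 0
    while i < n:
        # Extend j to the end of the run of pollsters tied at this error.
        j = i
        while j + 1 < n and errors[ranked[j + 1]] == errors[ranked[i]]:
            j += 1
        # 0-based places i..j are tied: average them, then scale to 0..1.
        shared = ((i + j) / 2) / (n - 1)
        for name in ranked[i:j + 1]:
            scores[name] = shared
        i = j + 1
    return scores


wa_governor = ordinal_scores({"B": 3, "E": 5, "A": 6, "P": 7, "Q": 12})
ca_senate = ordinal_scores({"A": 1, "R": 1, "O": 2, "S": 2, "B": 3})
print(wa_governor)
print(ca_senate)
```

Dividing the 0-based place by the number of pollsters minus one is what keeps every contest on the same 0.0 to 1.0 scale, whatever the size of the field.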

 

The new Ordinal Rank Measure for each pollster is the average of the pollster’s scores for each multi-pollster contest he polled. Contests with only one pollster will be ignored.[9]
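A sketch of the averaging step is shown below, again in Python with hypothetical names. It is fed only the two example contests above plus one made-up single-pollster contest, so the resulting numbers illustrate the mechanics and are not the pollsters' actual 2004 scores.

```python
from collections import defaultdict


def contest_scores(errors):
    """Ordinal scores (0.0 best, 1.0 worst) for one contest; ties share places."""
    n = len(errors)
    values = sorted(errors.values())
    scores = {}
    for name, err in errors.items():
        first = values.index(err)              # first 0-based place with this error
        last = first + values.count(err) - 1   # last 0-based place with this error
        scores[name] = ((first + last) / 2) / (n - 1)
    return scores


def ordinal_rank_measure(contests):
    """Average each pollster's ordinal score over multi-pollster contests."""
    collected = defaultdict(list)
    for errors in contests.values():
        if len(errors) < 2:                    # single-pollster contests are ignored
            continue
        for name, score in contest_scores(errors).items():
            collected[name].append(score)
    return {name: sum(s) / len(s) for name, s in collected.items()}


contests = {
    "WA Governor": {"B": 3, "E": 5, "A": 6, "P": 7, "Q": 12},
    "CA U.S. Senate": {"A": 1, "R": 1, "O": 2, "S": 2, "B": 3},
    "Hypothetical measure": {"A": 4},          # made-up contest with one pollster
}
measure = ordinal_rank_measure(contests)
print(measure["A"], measure["B"])
```

In this toy input Pollster A averages 0.5 and 0.125, and Pollster B averages 0.0 and 1.0; the single-pollster contest contributes nothing to A's score.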

 

The Ordinal Rank Measure defined here fixes all three problems with Par. Election contests are weighted equally, difficulty is determined individually for each contest, and a pollster’s score does not depend on what happened in contests he did not poll.

 

10.    Pollster Performance Using Rank-Order Measure

 

Here is how the pollsters stack up using the Rank Order Measure, where a score of 0.0 means the pollster was consistently the best in every event he/she entered, and a score of 1.0 means the pollster was consistently the worst in every event he/she entered.


 

 

 

 

Ranked from lowest to highest Avg Ordinal Score:

Pollster name   # Multi-Pollster Contests   Avg Ordinal Score   Ordinal Ranking Best to Worst   Compared To % of Par Ranking   Compared To M5 Error Ranking
Pollster A      55                          0.333               1                               Unchanged                      Unchanged
Pollster E      39                          0.362               2                               Unchanged                      +3
Pollster D      10                          0.388               3                               +3                             +1
Pollster C      39                          0.399               4                               +1                             -1
Pollster B      17                          0.458               5                               -2                             -3
Pollster F      9                           0.477               6                               -2                             Unchanged
Pollster G      22                          0.529               7                               Unchanged                      Unchanged
Pollster H      7                           0.589               8                               +4                             Unchanged
Pollster L      5                           0.633               9                               -1                             +3
Pollster I      12                          0.636               10                              Unchanged                      -1
Pollster J      28                          0.660               11                              +2                             -1
Pollster O      5                           0.733               12                              +3                             +3
Pollster K      6                           0.887               13                              +1                             -2

Pollster M      2                           0.367               n/a
Pollster N      2                           0.714               n/a

 

Pollster M and Pollster N are included in the table, but are listed separately and not given an Ordinal Rank, because the number of contests each polled where there was more than one pollster is now just 2 – too few cases in our judgment to merit an Ordinal Rank.

 

The following differences are noted:

 

a)       Compared with the original “M5 error” performance measure, the pollsters in 2nd through 5th place have reversed their relative order.

 

b)       Compared with the “% of par” measure, Pollster D has vaulted past Pollsters B, C, and F.  One possible way to interpret this is that Pollster D chose relatively “difficult contests” to do, even though its “Par” error value was lower than anyone else’s because 8 of its 10 polls were Presidential.

 

c)       Pollster L, whose 100% of par was average, has a below-average ordinal score of 0.633. This is because other pollsters usually did better on the specific elections Pollster L worked: Pollster L had an above-average rank in only 1 of 5 individual contests.

 

If Medals were awarded (using the date-inclusion criteria here adopted) and including the 15 most active state pollsters, the winners for 2004 would be: [10]

 

  • Gold Medal           Pollster A
  • Silver Medal          Pollster E
  • Bronze Medal       Pollster D

 

 

11.    Conclusion

 

In measuring the performance of election pollsters, it is inappropriate to average a measure of accuracy for individual polls, because some elections are harder to poll than others. 

 

Calculating a “% of Par” score gets us closer to an understanding of how well pollsters performed.

 

Limitations of the Par score approach can be addressed by using a Rank Order Measure based on the ordinal rankings of how pollsters performed relative to other pollsters.

 


Appendix

 

Sorted by ID:

ID   Pollster Name                    Location           Website
A    SurveyUSA                        Verona, NJ         www.surveyusa.com
B    Strategic Vision                 Atlanta, GA        www.strategicvision.biz
C    Rasmussen Reports                Ocean Grove, NJ    www.rasmussenreports.com
D    American Research Group          Manchester, NH     www.americanresearchgroup.com
E    Mason-Dixon Polling & Research   Washington, DC     www.mason-dixon.com
F    Opinion Dynamics                 Cambridge, MA      www.opiniondynamics.com
G    Research 2000                    Olney, MD          www.research2000.us
H    Market Shares                    Mt. Prospect, IL   www.marketsharescorp.com
I    Gallup Organization              Princeton, NJ      www.gallup.com
J    Zogby International              Utica, NY          www.zogby.com
K    Global Strategy Group            New York, NY       www.globalstrategygroup.com
L    Wilson Research Strategies       McLean, VA         www.w-r-s.com
M    Schroth & Associates             Washington, DC
N    Davis & Hibbits                  Portland, OR       www.dhmresearch.com
O    Los Angeles Times                Los Angeles, CA    http://www.latimes.com

 

 

Sorted by Pollster Name:

ID   Pollster Name                    Location           Website
D    American Research Group          Manchester, NH     www.americanresearchgroup.com
N    Davis & Hibbits                  Portland, OR       www.dhmresearch.com
K    Global Strategy Group            New York, NY       www.globalstrategygroup.com
I    Gallup Organization              Princeton, NJ      www.gallup.com
O    Los Angeles Times                Los Angeles, CA    http://www.latimes.com
H    Market Shares                    Mt. Prospect, IL   www.marketsharescorp.com
E    Mason-Dixon Polling & Research   Washington, DC     www.mason-dixon.com
F    Opinion Dynamics                 Cambridge, MA      www.opiniondynamics.com
C    Rasmussen Reports                Ocean Grove, NJ    www.rasmussenreports.com
G    Research 2000                    Olney, MD          www.research2000.us
M    Schroth & Associates             Washington, DC
B    Strategic Vision                 Atlanta, GA        www.strategicvision.biz
A    SurveyUSA                        Verona, NJ         www.surveyusa.com
L    Wilson Research Strategies       McLean, VA         www.w-r-s.com
J    Zogby International              Utica, NY          www.zogby.com

 

 

 



 

 

 

 

 

 

 

 



[1] There is a tension in selecting a start date for inclusion. If you choose a date that is too early, you risk including in your data set “stale” polls that may well have reflected opinion at the time they were conducted but not opinion at the time of the election. If you choose a date that is too late, your data set is so small that you may have trouble drawing conclusions from it. The authors examined three possible cutoff dates. A) October 1, 2004, which is the date ultimately used in the preparation of this paper. B) October 19, 2004, which is two weeks to Election Day, and C) October 26, 2004, which is one week to Election Day. Had we chosen 10/19/04 as the cutoff date for inclusion, the universe of polls available for examination would have shrunk from 446 to 385 (14% decline), and the universe of elections available for examination would have shrunk from 120 to 115 (4% decline). Had we chosen 10/26/04 as the cutoff date for inclusion, the universe of polls available for examination would have shrunk from 446 to 285, and the universe of elections available for examination would have shrunk from 120 to 98.  Also, the universe of elections with more than one competing pollster, which is what is used for the Ordinal Rank measure, would have shrunk from 84 to 62. In addition, unique to 2004, there was a tension in choosing the back-end date for inclusion. In 2004, one pollster released polls at 5 pm ET on Election Day, after voting had begun and after early exit polls were widely available. The authors excluded these polls from consideration, and included the pollster’s penultimate predictions.


[2] Analysis herein is based on M5, but the steps would work the same way had we chosen any numerical measure of poll accuracy.

[3] In a hypothetical election: if the Republican wins 51.1% to 48.3%, the actual spread is 2.8 points. A pollster who predicted the Republican would win 54% to 45% had a 9-point spread; his M5 Error is (9 - 2.8) = 6.2 points. A pollster who predicted the Democrat would win 49% to 47% had a spread of -2 points, so his M5 Error is -((-2) - 2.8) = 4.8 points (we change the sign so the error measure always comes out greater than or equal to zero).

[4] The same analysis is possible choosing a different cutoff for “minimum number of contests worked.” In this case, we chose “more than 5” rather than a higher or lower cutoff, because it provided a large enough group of pollsters to illustrate all significant aspects of the proposed measures, while ensuring that each pollster’s ranking was based on a large enough sample of polls to give statistically representative averages.

[5] The names of the polling firms are included in the Appendix.

[6] These errors assume a Release Date inclusion period beginning 10/1/04. Had we chosen 10/19/04 as the Release Date inclusion cutoff, the M5 Average Error for President would decline from 3.42% to 3.21%, the M5 Average Error for State Office from 4.62% to 4.34%, the M5 Average Error for Ballot Measures from 9.15% to 8.99%, and the M5 Average Error for all contests from 4.52% to 4.31%. Had we chosen 10/26/04 as the Release Date inclusion cutoff, the M5 Average Error for President would decline to 2.99%, the M5 Average Error for State Office to 4.00%, the M5 Average Error for Ballot Measures to 8.20%, and the M5 Average Error for all contests to 3.89%.

[7] Had 10/19 been used as the Release Date inclusion cutoff, instead of 10/1, the Leader Board would be: A, E, B, C, F. Had 10/26 been used as the Release Date inclusion cutoff, the Leader Board would be: A, E, B, C, D.

[8] For example, suppose that in the “short program” there are five judges, all of whom give skater x a score of 5.5 out of 6.0.  Four judges give skater y a score of 5.4, but the remaining judge gives skater y a perfect score of 6.0.  If the raw scores were averaged, skater y would beat skater x 5.52 to 5.50, even though four out of five judges prefer x to y.  When the scores are converted to ordinal rankings, x gets four 1st-place votes and one 2nd place vote, for an average ordinal of 1.2, while y gets one 1st place vote and four 2nd-place votes, for an average ordinal of 1.8. x wins. With ordinals, it is impossible for one judge to unfairly increase his influence by inflating or depressing scores – the most a judge can do is put one skater 1st and another skater 2nd.  

[9] This is an unavoidable omission. Contests with only 1 pollster don’t give any information about whether that pollster was better or worse than average for that election. It would be possible to assign each pollster a score of 0.5 for all contests where there were no competing polls; but this would unfairly penalize a better-than-average pollster for taking on elections nobody else is doing, no matter how accurate he is on them, and it would also unfairly reward a worse-than-average pollster for taking on elections nobody else is doing, no matter how inaccurate he is on them.

[10] Had a 10/19 Release Date inclusion cutoff been used, instead of 10/1, the medalists would have been: Gold: Pollster A; Silver: Pollster C; Bronze: Pollster D. Had a 10/26 Release Date inclusion cutoff been used, the medalists would have been: Gold: Pollster C; Silver: Pollster A; Bronze: Pollster D.