Measures Of Difficulty In Election Polling
By
Joseph Shipman, PhD
Director of Election Polling, SurveyUSA
And
Jay H. Leve
Editor, SurveyUSA
Abstract: Some elections are harder to forecast than others.
Therefore, it is not fair across disparate elections to calculate one
pollster’s average “error” and compare it to another’s, since one pollster may
have polled disproportionately “easy” contests and the other may have polled
disproportionately “hard” contests. Data from 104 polling organizations in 120
statewide November 2004 election contests are analyzed. The 15 most active
statewide pollsters are ranked. Two measures of pollster performance that
take “degree of difficulty” into account are developed. One measure borrows
from golf, and assigns a “par” score for each different type of contest
(President, State Office, Ballot Measure). Pollsters are judged to have scored under
par or over par. The most active 15 pollsters are ranked on a Leader Board. A
second measure borrows from Olympic Figure Skating, and assigns an ordinal rank
to the accuracy of each poll that a pollster conducted, relative to all of the
other polls conducted in the same contest. The average of each pollster’s
ordinal rankings is then compared to determine, using this approach, which statewide
pollsters were Medalists in 2004.
Contents:
1. Introduction
2. Methods
3. Measuring The Accuracy Of An Individual Poll
4. Average M5 Error As A Measure Of Pollster Performance
5. Problem: Pollsters Don’t All Poll The Same Elections
6. Some Types Of Elections Are More Difficult Than Others
7. “Par” As A Benchmark For Pollster Performance
8. Problems With “Par”
9. A Rank-Order Based Measure For Pollster Performance
10. Pollster Performance Using Rank-Order Measure
11. Conclusion
1. Introduction
Various measures have been
proposed for evaluating the accuracy of individual election polls.
But the problem of evaluating fairly the performance of election pollsters
is more complicated, because pollsters don’t all poll the same contests.
Olympic athletes are judged
not only on the precision of their performance but on its inherent difficulty.
Some election contests present more of a polling challenge than others. In this
paper we investigate how to account for “degree of difficulty” when evaluating pollster
performance.
After selecting a measure of
poll accuracy, we construct two “difficulty-adjusted” measures of pollster
performance, an “absolute measure” (which adjusts for the difficulty of each
type of election), and a “relative measure” (derived from the rank-ordering of
polls by accuracy for each individual contest).
2. Methods
The 2004 General Election was
Tuesday 11/2. We include in our examination all statewide general election
contests for which at least one predictive poll was released between 10/1/04
and the time voting started on 11/2/04, counting only the final poll conducted
in each state by each research organization.[1]
There were qualifying Presidential
polls in 45 states (no qualifying poll in AK, DE, ID, MS, WY), qualifying U.S.
Senate polls in 28 states, qualifying Gubernatorial polls in 10 states, polls
for another statewide office in 3 states, and polls for 34 statewide ballot
measures in 12 different states – a total of 120 distinct election contests.
For these 120 contests, 446 polls were conducted by 104 different research
organizations.
Summarizing, our universe is:
120 election contests, in which
446 election polls were conducted by
104 different pollsters.
Information on election polls
was obtained and triangulated from 3 websites:
In the few cases where these
sources were not in agreement, an individual pollster’s website was searched,
and/or www.nexis.com was searched. To
qualify for inclusion, a poll must have been conducted in an entire state,
provided information on Margin of Error or sample size, provided information on
field period, and predicted the outcome of a statewide contest. If more than
one set of percentages was given (some pollsters release separate numbers for “Registered
Voters” and “Likely Voters”), the numbers for “Likely Voters” were used. Some pollsters
released separate results for President with and without Independent candidate
Ralph Nader. In states where Nader
appeared on the ballot, the numbers which included Nader were used; in states
where Nader did not appear on the ballot, pollster numbers which excluded Nader
were used. Information on actual election returns was obtained in each state
from the Secretary of State or the State
Data was compiled by seven
SurveyUSA employees. Every effort was made to include all qualifying statewide
polls; no known polls were excluded. The database of polls underlying this
paper was released by SurveyUSA on
3. Measuring The Accuracy Of An Individual Poll
After the 1948 presidential
election, a commission was formed to study the failure of polls to predict
Truman's reelection. The resulting book, The Pre-election Polls of 1948:
Report to the Committee on Analysis of Pre-Election Polls and Forecasts, by
Frederick Mosteller et al, was published by the Social Science Research Council
(New York, 1949). In this book, eight ways to measure the accuracy of a
pre-election poll were proposed. Other methods have since been proposed
(Traugott et al, 2003 AAPOR conference; Shipman et al, 2005 AAPOR conference).
Of the eight Mosteller
Measures, “Mosteller 5” (M5) has been widely used by academics.[2] This
is the absolute difference between the predicted and actual spread between the top
two choices, measured in percentage points. (Mosteller originally defined his
measure for an election with one Democrat and one Republican; here we extend
the definition to cover ballot measures, where the choices are “Yes” and
“No.”)[3]
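As a minimal sketch of the M5 calculation, the function below reproduces the hypothetical election in footnote [3]. The function name and argument order are illustrative assumptions, not part of the original analysis.

```python
def m5_error(poll_first, poll_second, actual_first, actual_second):
    """Absolute difference between predicted and actual spread, in points.

    "first" and "second" are the top two choices, given in the same order
    for the poll and for the returns (e.g. Republican/Democrat or Yes/No).
    """
    predicted_spread = poll_first - poll_second
    actual_spread = actual_first - actual_second
    return abs(predicted_spread - actual_spread)


# Hypothetical election from footnote [3]: Republican wins 51.1% to 48.3%.
# Poll 1 predicts Republican 54, Democrat 45; poll 2 predicts Republican 47, Democrat 49.
print(round(m5_error(54, 45, 51.1, 48.3), 1))
print(round(m5_error(47, 49, 51.1, 48.3), 1))
```

Taking the absolute value plays the role of the sign change described in footnote [3], so the measure is never negative.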
In what follows, we will adapt the M5 measure, which measures the accuracy of an individual poll in a single election, to measure the performance of a pollster in a group of elections, and also to measure the difficulty of a particular type of election.
The terminology is tricky: we measure the accuracy of a poll, the performance of a pollster, and the difficulty of an election.
4. Average M5 Error As A Measure Of Pollster Performance
To begin, we identify the 15
most active state pollsters in 2004.[4] For
each contest, we calculate an M5 Error for each pollster. We then average the
M5 Errors across all the contests an individual pollster worked. We then rank
the 15 Most Active Pollsters from lowest M5 error to highest. To make it easier
to review the data unemotionally, we label the pollsters A through O.[5]
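The averaging and ranking step can be sketched as follows. The pollster labels and error values here are hypothetical placeholders, not figures from the 2004 data set.

```python
from collections import defaultdict

# Hypothetical (pollster, M5 error) pairs, one per contest polled.
polls = [("X", 2.0), ("X", 4.0), ("Y", 1.0), ("Y", 9.0), ("Z", 3.5)]


def average_m5_by_pollster(polls):
    """Average the M5 errors across all contests each pollster worked."""
    errors = defaultdict(list)
    for pollster, m5 in polls:
        errors[pollster].append(m5)
    return {p: sum(v) / len(v) for p, v in errors.items()}


averages = average_m5_by_pollster(polls)
# Rank from lowest average M5 error (best) to highest (worst).
ranking = sorted(averages, key=averages.get)
print(ranking)
```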
Here are the 15 most active statewide pollsters ranked by size of M5 Error:
| Pollster | # Contests | President Contests | State Office Contests | Ballot Measure Contests | Avg M5 Error (Ranked ▼) |
| Pollster A | 58 | 30 | 20 | 8 | 3.01% |
| Pollster B | 17 | 11 | 6 | 0 | 3.09% |
| Pollster C | 39 | 29 | 10 | 0 | 3.13% |
| Pollster D | 10 | 8 | 2 | 0 | 3.34% |
| Pollster E | 42 | 23 | 13 | 6 | 3.66% |
| Pollster F | 9 | 4 | 4 | 1 | 3.82% |
| Pollster G | 26 | 12 | 14 | 0 | 4.02% |
| Pollster H | 7 | 5 | 2 | 0 | 4.47% |
| Pollster I | 12 | 9 | 2 | 1 | 4.49% |
| Pollster J | 28 | 14 | 11 | 3 | 5.45% |
| Pollster K | 7 | 1 | 6 | 0 | 6.95% |
| Pollster L | 6 | 1 | 1 | 4 | 7.43% |
| Pollster M | 6 | 1 | 1 | 4 | 8.46% |
| Pollster N | 7 | 1 | 0 | 6 | 8.48% |
| Pollster O | 8 | 4 | 1 | 3 | 11.34% |
6 pollsters have an M5 error between 3 and 4 points.
3 pollsters have an M5 error between 4 and 5 points.
6 pollsters have an M5 error between 5 and 12 points.
But is this a fair
comparison? Does it unfairly penalize pollsters who polled “hard” elections and
unfairly reward other pollsters who polled “easy” elections?
5. Problem: Pollsters Don’t All Poll The Same Elections
Not all pollsters work the
same types of elections. Pollsters B, C, D and I polled primarily on Presidential
contests. Pollsters L, M and N polled mostly Ballot Measures. Pollster K polled
mostly Senate elections. Pollsters A and E polled a mix of Presidential, State
Office, and Ballot Measures.
Even pollsters who polled a
similar mix of contest types polled different actual contests. Averaging
the M5 error across all polls is a valid way of comparing pollsters in
theory, but in practice we can do better.
6. Some Types Of Elections Are More Difficult Than Others
Even though the voters in
each state are approximately the same for President, State Office, and Ballot
Measures, contests on the ballot may have differing levels of voter interest,
awareness, or knowledge, and turnout may vary from state to state. Of our
universe of 446 statewide polls, 52% were presidential polls, 36% were for
statewide office, and 12% were for ballot measures.
Averaging all polls for all
contests, we confirm that some contests are indeed harder to poll than others:
| Type Of Contest | # Polls | Avg M5 Error |
| President | 233 | 3.42 |
| State Office | 161 | 4.62 |
| Ballot Measure | 52 | 9.15 |
| Summary For All Contests | 446 | 4.52 |
The average M5 Error for
Presidential contests is 3.42 points. The average M5 Error for State Office is
4.62 points, 35% greater. The average M5 error for Ballot Measures is 9.15
points, 168% greater than the M5 Error for Presidential polls.[6] These
numbers will be regarded as “difficulty measures” for each election type. A
pollster who polls only ballot measures, if he were average, would have an M5
error of 9.15. A pollster who polls only presidential contests would have an M5
Error of 3.42. The two cannot be compared side-by-side fairly without an
adjustment to account for difficulty.
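The comparisons in this section can be checked with a short calculation. This is a sketch that uses the rounded averages from the table in this section; the variable names are illustrative.

```python
# (number of polls, average M5 error) by contest type, from the table in this section.
difficulty = {
    "President": (233, 3.42),
    "State Office": (161, 4.62),
    "Ballot Measure": (52, 9.15),
}

# Percentage by which each type's average error exceeds the Presidential average.
baseline = difficulty["President"][1]
pct_greater = {
    contest_type: (avg_error / baseline - 1) * 100
    for contest_type, (_, avg_error) in difficulty.items()
}

# The all-contest average is the poll-weighted mean of the three type averages.
total_polls = sum(n for n, _ in difficulty.values())
overall = sum(n * e for n, e in difficulty.values()) / total_polls

for contest_type, pct in pct_greater.items():
    print(f"{contest_type}: {pct:.0f}% greater than President")
print(f"All contests: {overall:.2f} across {total_polls} polls")
```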
7. “Par” As A Benchmark For Pollster Performance
The game of golf provides an inspiration.
Consider two golfers. Golfer
A plays the front nine and shoots a 35. Golfer B plays the back nine and shoots
a 38. They meet at the clubhouse and compare notes. Is it knowable who is doing
better, since they have not played one common hole? Yes. Using scores from
other golfers, we know (for purposes of this illustration) that Par for the
Front 9 is 33 and Par for the Back 9 is 39. Golfer A is 2 over par. Golfer B is
1 under par. Golfer B is doing better, though B has 3 more strokes than A.
Let’s apply this concept to pollsters.
Each type of contest (President, State Office, Ballot Measure) has an average
error. An average pollster should score Par. A better-than-average
pollster should be under Par. A worse-than-average pollster should be
over Par. If a pollster polls only Presidential elections, Par is the average
error for all Presidential polls. If a pollster polls a mix of elections, Par
is the average error for each type of contest, weighted by the number of contests of each type that the pollster polled.
Because there are no
“strokes” in polling, the Leader Board that follows is not expressed with
integers “under par” or “over par”. However, by converting each pollster’s
performance into a percentage of Par, we are able to rank the 15 most active pollsters,
where 100% is Par, percentages below 100% are Under Par and percentages above
100% are Over Par.
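A minimal sketch of the Par calculation follows, using the rounded per-type averages from Section 6 and Pollster A's contest counts from Section 4. The function names are illustrative. Because the per-type averages are rounded, results for some pollsters may differ from the published Leader Board in the last digit.

```python
# Average M5 error by contest type (the "difficulty measures" from Section 6).
PAR_BY_TYPE = {"President": 3.42, "State Office": 4.62, "Ballot Measure": 9.15}


def expected_par(contest_counts):
    """Average of the per-type errors, weighted by the pollster's contest counts."""
    total = sum(contest_counts.values())
    return sum(PAR_BY_TYPE[t] * n for t, n in contest_counts.items()) / total


def percent_of_par(actual_m5, contest_counts):
    """Actual average M5 error as a percentage of Par (100 is exactly Par)."""
    return 100 * actual_m5 / expected_par(contest_counts)


# Pollster A: 30 President, 20 State Office, 8 Ballot Measure; average M5 of 3.01.
pollster_a = {"President": 30, "State Office": 20, "Ballot Measure": 8}
print(round(expected_par(pollster_a), 2))
print(round(percent_of_par(3.01, pollster_a)))
```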
Here is the Leader Board for
2004: [7]
| Pollster | Expected Par | Actual M5 | % of Par (Ranked ▼) | % of Par Rank, Best To Worst | Change from M5 Error Rank |
| Pollster A | 4.62% | 3.01% | 65% | 1 | Unchanged |
| Pollster E | 4.61% | 3.66% | 79% | 2 | +3 |
| Pollster B | 3.84% | 3.09% | 80% | 3 | -1 |
| Pollster F | 4.59% | 3.82% | 83% | 4 | +2 |
| Pollster C | 3.72% | 3.13% | 84% | 5 | -2 |
| Pollster D | 3.66% | 3.34% | 91% | 6 | -2 |
| Pollster G | 4.06% | 4.02% | 99% | 7 | Unchanged |
| Pollster L | 7.44% | 7.43% | 100% | 8 | +4 |
| Pollster N | 8.33% | 8.48% | 102% | 9 | +5 |
| Pollster I | 4.09% | 4.49% | 110% | 10 | -1 |
| Pollster M | 7.44% | 8.46% | 114% | 11 | +2 |
| Pollster H | 3.76% | 4.47% | 119% | 12 | -4 |
| Pollster J | 4.50% | 5.45% | 121% | 13 | -3 |
| Pollster K | 4.44% | 6.95% | 156% | 14 | -3 |
| Pollster O | 5.72% | 11.34% | 198% | 15 | Unchanged |
Pollster E jumps from 5th
place to 2nd. Pollsters B, C and D each drop one or two places. Pollsters L and N jump
from the bottom of the pack to the middle. Pollster A is still 1st.
Pollster O is still last.
8. Problems With “Par”
The concept of Par is
helpful. It is simple to calculate, and has a clear interpretation. But it is
open to several criticisms.
a) It does not weight election contests equally: performance in a type of contest with a larger Par (such as Ballot Measures) counts for more than performance in a type with a smaller Par.
b) It does not account for any polling difficulties unrelated to the simple 3-way classification of elections by type. Some ballot measures are more complex than others; some states may be harder to poll than others because of turnout differences or sampling issues.
c) A pollster’s score can be influenced by contests the pollster did not work, since “Par” is calculated using the scores of all pollsters in all like contests.
The first point has the
biggest impact. For example: if I poll one Senate election and one Ballot
Measure, Par is 6.9 points. If in fact I score Par on the Senate election (4.6 point
Error) and get the Ballot Measure exactly right (0.0 point Error), my Error is
2.3 points, which is 33% of Par. But if I score Par on the Ballot Measure (9.2 point
Error) and get the Senate election exactly right (0.0 point Error), my Error is
4.6 points, or 67% of Par. Getting the Senate election right is worth only half as
much as getting the Ballot Measure right.
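The asymmetry in this example can be verified directly. The sketch below uses the rounded Par values quoted in the example (4.6 and 9.2 points); the function name is an illustrative assumption.

```python
# Rounded Par values used in the example above, in points.
PAR = {"Senate": 4.6, "Ballot Measure": 9.2}


def percent_of_par_one_each(errors):
    """% of Par for a pollster who polled each listed contest type once."""
    par = sum(PAR[c] for c in errors) / len(errors)
    actual = sum(errors.values()) / len(errors)
    return 100 * actual / par


# Par on the Senate contest, perfect on the Ballot Measure.
perfect_ballot = percent_of_par_one_each({"Senate": 4.6, "Ballot Measure": 0.0})
# Par on the Ballot Measure, perfect on the Senate contest.
perfect_senate = percent_of_par_one_each({"Senate": 0.0, "Ballot Measure": 9.2})
print(round(perfect_ballot), round(perfect_senate))
```

A perfect Ballot Measure poll moves the score from 100% to about 33% of Par, while a perfect Senate poll moves it only to about 67%.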
Is there another approach
that overcomes these limitations?
9. A Rank-Order Based Measure For Pollster Performance
When Olympic judges want to
determine who is the best overall figure skater, they use the ordinal rankings
for the contestants in each part of the competition. Olympic judges do not just
average the scores of skaters from different judges, or across events. Rather,
they determine the winner based on the rank-order finish of skaters in each part,
and each of those finishes is itself determined by the rank-orderings of the individual judges.
The reason the Olympics use
ordinal rankings is to make sure each judge has an equal influence, and each
part of the competition gets the appropriate weight. Otherwise, a very high or very low score
could have a disproportionate influence.[8]
Is there a polling
equivalent? For each contest, we can examine all of the competing polls and
rank them from best to worst. Why does this advance our understanding? Well, a
pollster might poll contest x and have an M5 error of 3.0, and be the
best pollster, but might poll contest y and have an identical M5 error
of 3.0, but be the worst pollster. This is not a contrived example; it
happened: In the contest for Governor of Washington, Pollster B’s 3-point M5 error
was the best of 5 competing pollsters. In the contest for U.S. Senate in California,
Pollster B’s 3-point M5 error was the worst of 5 competing pollsters.
Now, the number of competing
pollsters varies by contest. To ensure that all contests have equal weight, we
devised a system that takes all of the pollsters for any contest and converts
their ordinal rank into a number from 0.0 to 1.0, where 0.0 is the best
possible score and 1.0 is the worst possible score. If 2 pollsters work a
contest, the one with the smaller M5 Error gets a score of 0.0 and the one with
the larger M5 Error gets a score of 1.0. If 3 pollsters work a contest, the
best gets a score of 0.0, the middle gets a score of 0.5, and the worst gets a
score of 1.0; and so on. Ties are accommodated by averaging the scores of the
places tied for.
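The conversion from M5 errors to ordinal scores, including the tie rule, can be sketched as follows. The function name is illustrative. Tied pollsters are given the average of the places they occupy, which is equivalent to averaging the scores of those places because the score is linear in rank. The inputs are the M5 errors from the two example contests in this section.

```python
def ordinal_scores(m5_errors):
    """Map each pollster's M5 error in one contest to a score in [0.0, 1.0].

    Ranks run from 1 (smallest error) to n; tied pollsters share the average
    of the places they occupy; the score is (rank - 1) / (n - 1).
    """
    n = len(m5_errors)
    if n < 2:
        raise ValueError("need at least two pollsters to rank a contest")
    ordered = sorted(m5_errors.values())
    scores = {}
    for pollster, error in m5_errors.items():
        first_place = ordered.index(error) + 1            # best place in the tie
        last_place = first_place + ordered.count(error) - 1
        average_rank = (first_place + last_place) / 2
        scores[pollster] = (average_rank - 1) / (n - 1)
    return scores


# M5 errors from the two example contests in this section.
wa_governor = {"B": 3, "E": 5, "A": 6, "P": 7, "Q": 12}
ca_senate = {"A": 1, "R": 1, "O": 2, "S": 2, "B": 3}
print(ordinal_scores(wa_governor))
print(ordinal_scores(ca_senate))
```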
Here is how this works in our
two examples (actual returns are rounded for illustrative purposes):
| WA Governor | Vote | Pollster B | Pollster E | Pollster A | Pollster P | Pollster Q |
| Gregoire (D) | 49% | 47% | 48% | 45% | 45% | 51% |
| Rossi (R) | 49% | 44% | 43% | 51% | 38% | 39% |
| Victory Margin | 0 | 3 | 5 | -6 | 7 | 12 |
| M5 error | n/a | 3 | 5 | 6 | 7 | 12 |
| Ordinal rank | n/a | 1st | 2nd | 3rd | 4th | 5th |
| Ordinal score (Ranked Low to High ►) | n/a | 0 | 0.25 | 0.5 | 0.75 | 1 |

| CA U.S. Senate | Vote | Pollster A | Pollster R | Pollster O | Pollster S | Pollster B |
| Boxer (D) | 58% | 57% | 53% | 55% | 53% | 53% |
| Jones (R) | 38% | 36% | 34% | 33% | 35% | 36% |
| Victory Margin | 20 | 21 | 19 | 22 | 18 | 17 |
| M5 error | n/a | 1 | 1 | 2 | 2 | 3 |
| Ordinal rank | n/a | Tie for 1st | Tie for 1st | Tie for 3rd | Tie for 3rd | 5th |
| Ordinal score (Ranked Low to High ►) | n/a | 0.125 | 0.125 | 0.625 | 0.625 | 1 |
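In Washington, Pollster B’s 3-point M5 error ranks 1st of 5, for an ordinal score of 0.0.
In California, the same 3-point M5 error ranks 5th of 5, for an ordinal score of 1.0.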
The new Ordinal Rank Measure
for each pollster is the average of the pollster’s scores for each
multi-pollster contest he polled. Contests with only one pollster will be
ignored.[9]
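A minimal sketch of the averaging step follows, assuming the per-contest ordinal scores have already been computed. The function name and the single-pollster contest are hypothetical; the other scores come from the two example contests above.

```python
from collections import defaultdict


def ordinal_rank_measure(contest_scores):
    """Average each pollster's ordinal scores over multi-pollster contests.

    contest_scores maps a contest name to {pollster: ordinal score}.
    """
    per_pollster = defaultdict(list)
    for scores in contest_scores.values():
        if len(scores) < 2:          # single-pollster contests are ignored
            continue
        for pollster, score in scores.items():
            per_pollster[pollster].append(score)
    return {p: sum(s) / len(s) for p, s in per_pollster.items()}


# Ordinal scores from the two example contests, plus a hypothetical
# single-pollster contest that the measure should skip.
contest_scores = {
    "WA Governor": {"B": 0.0, "E": 0.25, "A": 0.5, "P": 0.75, "Q": 1.0},
    "CA U.S. Senate": {"A": 0.125, "R": 0.125, "O": 0.625, "S": 0.625, "B": 1.0},
    "Hypothetical solo contest": {"A": 0.0},
}
measure = ordinal_rank_measure(contest_scores)
print(measure["A"], measure["B"])
```

On these two contests alone, Pollster B's best and worst finishes average to 0.5, the midpoint of the scale.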
The
Ordinal Rank Measure defined here fixes all three problems with Par. Election contests are weighted equally, difficulty is
determined individually for each contest, and a pollster’s score does not
depend on what happened in contests he did not poll.
10. Pollster Performance Using Rank-Order Measure
Here is how the pollsters
stack up using Rank Order Measure, where a score of 0.0 means the pollster was
consistently the best in every event he/she entered, and a score of 1.0 means
the pollster was consistently the worst in every event he/she entered.
| Pollster name | # Multi-Pollster Contests | Avg Ordinal Score | Ordinal Ranking Best to Worst (Ranked ▼) | Compared With % of Par Rank | Compared With M5 Error Rank |
| Pollster A | 55 | 0.333 | 1 | Unchanged | Unchanged |
| Pollster E | 39 | 0.362 | 2 | Unchanged | +3 |
| Pollster D | 10 | 0.388 | 3 | +3 | +1 |
| Pollster C | 39 | 0.399 | 4 | +1 | -1 |
| Pollster B | 17 | 0.458 | 5 | -2 | -3 |
| Pollster F | 9 | 0.477 | 6 | -2 | Unchanged |
| Pollster G | 22 | 0.529 | 7 | Unchanged | Unchanged |
| Pollster H | 7 | 0.589 | 8 | +4 | Unchanged |
| Pollster L | 5 | 0.633 | 9 | -1 | +3 |
| Pollster I | 12 | 0.636 | 10 | Unchanged | -1 |
| Pollster J | 28 | 0.660 | 11 | +2 | -1 |
| Pollster O | 5 | 0.733 | 12 | +3 | +3 |
| Pollster K | 6 | 0.887 | 13 | +1 | -2 |

| Pollster M | 2 | 0.367 | n/a | | |
| Pollster N | 2 | 0.714 | n/a | | |
||
Pollster M and Pollster N are
included in the table, but are listed separately and not given an Ordinal Rank,
because the number of contests each polled where there was more than one
pollster is now just 2 – too few cases in our judgment to merit an Ordinal
Rank.
The following differences are
noted:
a) Compared with the original “M5 error” performance measure, the pollsters in 2nd through 5th place have reversed their relative order.
b) Compared with the “% of par” measure, Pollster D has vaulted past Pollsters B, C, and F. One possible way to interpret this is that Pollster D chose relatively “difficult contests” to do, even though its “Par” error value was lower than anyone else’s because 8 of its 10 polls were Presidential.
c) Pollster L, whose 100% of par was average, has a worse-than-average ordinal score of 0.633 (above the 0.5 midpoint). This is because other pollsters usually did better on the specific elections Pollster L worked: Pollster L had an above-average rank in only 1 of 5 individual contests.
If Medals were awarded (using
the date-inclusion criteria here adopted) and including the 15 most active
state pollsters, the winners for 2004 would be: [10]
Gold: Pollster A; Silver: Pollster E; Bronze: Pollster D.
11. Conclusion
In measuring the performance
of election pollsters, it is inappropriate to average a measure of
accuracy for individual polls, because some elections are harder to poll than others.
Calculating a “% of Par”
score gets us closer to an understanding of how well pollsters performed.
Limitations of the Par score
approach can be addressed by using a Rank Order Measure based on the ordinal
rankings of how pollsters performed relative to other pollsters.
Appendix
Sorted by ID:
| ID | Pollster Name | Location | Website |
| A | SurveyUSA | | |
| B | Strategic Vision | | |
| C | Rasmussen Reports | | |
| D | American Research Group | | |
| E | Mason-Dixon Polling & Research | | |
| F | Opinion Dynamics | | |
| G | Research 2000 | | |
| H | Market Shares | | |
| I | | | |
| J | Zogby International | | |
| K | Global Strategy Group | | |
| L | Wilson Research Strategies | | |
| M | Schroth & Associates | | |
| N | Davis & Hibbits | | |
| O | | | |

Sorted by Pollster Name:
| ID | Pollster Name | Location | Website |
| D | American Research Group | | |
| N | Davis & Hibbits | | |
| K | Global Strategy Group | | |
| I | | | |
| O | | | |
| H | Market Shares | | |
| E | Mason-Dixon Polling & Research | | |
| F | Opinion Dynamics | | |
| C | Rasmussen Reports | | |
| G | Research 2000 | | |
| M | Schroth & Associates | | |
| B | Strategic Vision | | |
| A | SurveyUSA | | |
| L | Wilson Research Strategies | | |
| J | Zogby International | | |


[1] There is a
tension in selecting a start date for inclusion. If you choose a date that is
too early, you risk including in your data set “stale” polls that may well have
reflected opinion at the time they were conducted but not opinion at the time
of the election. If you choose a date that is too late, your data set is so
small that you may have trouble drawing conclusions from it. The authors
examined three possible cutoff dates: A) October 1, 2004, which is the date
ultimately used in the preparation of this paper; B) October 19, 2004, which is
two weeks before Election Day; and C) October 26, 2004, which is one week before
Election Day. Had we chosen 10/19/04 as the cutoff date for inclusion, the
universe of polls available for examination would have shrunk from 446 to 385
(14% decline), and the universe of elections available for examination would
have shrunk from 120 to 115 (4% decline). Had we chosen 10/26/04 as the cutoff
date for inclusion, the universe of polls available for examination would have
shrunk from 446 to 285, and the universe of elections available for examination
would have shrunk from 120 to 98. Also, the universe of elections with
more than one competing pollster, which is what is used for the Ordinal Rank
measure, would have shrunk from 84 to 62. In addition, unique to 2004, there
was a tension in choosing the back-end date for inclusion. In 2004, one
pollster released polls at 5 pm ET on Election Day, after voting had begun and
after early exit polls were widely available. The authors excluded these polls
from consideration, and included the pollster’s penultimate predictions.
[2] Analysis herein
is based on M5, but the steps would work the same way had we chosen any numerical
measure of poll accuracy.
[3] In a hypothetical election: if the Republican wins
51.1% to 48.3%, the spread between them is 2.8 points. A pollster who predicted the
Republican would win 54% to 45% had a 9-point spread. His M5 Error is (9-2.8) =
6.2 points. A pollster who predicted the
Democrat would win 49% to 47% has an M5 Error of -((-2)-2.8) = 4.8 points (we
change the sign so the error measure always comes out greater than or equal to
zero).
[4] The same analysis
is possible choosing a different cutoff for “minimum number of contests worked.”
In this case, we chose “more than 5” rather than a higher or lower cutoff,
because it provided a large enough group of pollsters to illustrate all
significant aspects of the proposed measures, while ensuring that each
pollster’s ranking was based on a large enough sample of polls to give
statistically representative averages.
[5] The names of the polling firms are included in the Appendix.
[6] These errors
assume a Release Date inclusion period beginning 10/1/04. Had we chosen 10/19/04 as the Release Date inclusion
cutoff, the M5 Average Error would have declined from 3.42% to 3.21% for President,
from 4.62% to 4.34% for State Office, from 9.15% to 8.99% for Ballot Measures,
and from 4.52% to 4.31% for all contests. Had we chosen 10/26/04 as the
Release Date inclusion cutoff, the M5 Average Error would have declined to
2.99% for President, 4.00% for State Office, 8.20% for Ballot Measures, and
3.89% for all contests.
[7] Had 10/19 been used as the Release Date inclusion cutoff, instead of 10/1, the Leader Board would be: A, E, B, C, F. Had 10/26 been used as the Release Date inclusion cutoff, the Leader Board would be: A, E, B, C, D.
[8] For example, suppose that in the “short program” there are five judges, all of whom give skater x a score of 5.5 out of 6.0. Four judges give skater y a score of 5.4, but the remaining judge gives skater y a perfect score of 6.0. If the raw scores were averaged, skater y would beat skater x 5.52 to 5.50, even though four out of five judges prefer x to y. When the scores are converted to ordinal rankings, x gets four 1st-place votes and one 2nd place vote, for an average ordinal of 1.2, while y gets one 1st place vote and four 2nd-place votes, for an average ordinal of 1.8. x wins. With ordinals, it is impossible for one judge to unfairly increase his influence by inflating or depressing scores – the most a judge can do is put one skater 1st and another skater 2nd.
[9] This is an
unavoidable omission. Contests with
only 1 pollster don’t give any information about whether that pollster was
better or worse than average for that election. It would be possible to
assign each pollster a score of 0.5 for all contests where there were no
competing polls; but this would unfairly penalize a better-than-average
pollster for taking on elections nobody else is doing, no matter how accurate
he is on them, and it would also unfairly reward a worse-than-average pollster for
taking on elections nobody else is doing, no matter how inaccurate he
is on them.
[10] Had a 10/19 Release Date inclusion cutoff been used, instead of 10/1, the medalists would have been: Gold: Pollster A; Silver: Pollster C; Bronze: Pollster D. Had a 10/26 Release Date inclusion cutoff been used, the medalists would have been: Gold: Pollster C; Silver: Pollster A; Bronze: Pollster D.