Ballot Image Test Data: Minnesota Challenged Ballots

Obtaining access to hand-marked ballots created by voters in real elections has been problematic due to various legal constraints. Fortunately, a recent turn of events created an unprecedented opportunity to address this situation. We have assembled a large-scale dataset of voter-marked ballot images from the 2008 General Election in the State of Minnesota; these ballots were challenged during the recount of the contested U.S. Senate race. So far as we are aware, this is the first such collection ever to be made openly available, and hence it is an invaluable resource for those wishing to develop better image processing and pattern recognition methods for reading optical scan (op-scan) ballots. We make this data freely available as a service to the research community. Further details appear below.

Because of the size of this dataset (6,737 ballots), ground truth is not yet available. As described in a paper we have submitted to the Ninth IAPR International Workshop on Document Analysis Systems (DAS 2010), we are in the process of collecting ground truth, but this task is laborious and likely to take a long time. If you wish to become involved in the truthing effort, please let us know.

The 2008 General Election in the United States took place on November 4. Among the races decided that day was the presidential election, in which Barack Obama was elected the 44th President of the United States. In Minnesota, in addition to the presidential race and a number of statewide and local races, citizens also voted to elect a U.S. Senator. Five candidates were listed on the ballot. In the initial tally, Republican Norm Coleman received 1,211,590 votes (41.988% of the votes cast) while Democrat Al Franken received 1,211,375 votes (41.981% of the votes cast), a margin of only 215 votes. Because the race was so close, a mandatory recount was ordered.

During the recount, representatives of either candidate could challenge individual ballots as not meeting the legal requirements set by the state. Challenged ballots were scanned and posted online so that the public could view them. Several websites made the ballot images available, including that of Minnesota Public Radio (MPR). A short video demonstrating the scanning process can be found on YouTube, but only minimal technical details can be deduced from it. The ballots were first photocopied and the originals stored in a secure location. The photocopies were then scanned to PDF using a flatbed scanner equipped with an automatic document feeder. Each ballot was two-sided, and both sides were scanned simultaneously.

To collect all of the ballots from the MPR website, we wrote a simple web crawler that automatically downloaded the files, saving them under their original file names. Another program then extracted the images from each PDF, saving the front and the back of each ballot as separate TIF files. There are a total of 6,737 ballots in the set. Examination of the TIF files suggests that the ballots were scanned bitonal at 300 dpi and that lossy compression was never used in handling the files. Hence, they form an ideal dataset for document analysis research.
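
Checks like the ones above can be reproduced without an image library by reading a TIF file's header directly. The Python sketch below is our own illustration, not the program used to build the dataset. It reads the first image file directory (IFD) of a classic TIFF file and reports its bit depth, resolution, and compression code; it assumes single-valued fields and does not handle BigTIFF or multi-page files. The sketch flags only the JPEG compression codes 6 and 7 as lossy; the other common codes (1 for no compression, 2 to 4 for the CCITT bilevel schemes, 5 for LZW, 32773 for PackBits) are lossless. A bitonal, losslessly stored 300 dpi scan would therefore report one bit per sample, `lossy` as False, and a resolution of (300.0, 300.0).

```python
import struct

# Tag numbers from the TIFF 6.0 specification.
TAG_BITS_PER_SAMPLE = 258
TAG_COMPRESSION = 259
TAG_X_RESOLUTION = 282
TAG_Y_RESOLUTION = 283
TAG_RESOLUTION_UNIT = 296

# This sketch treats only the JPEG codes (6 = old-style, 7 = new-style) as lossy.
LOSSY_COMPRESSION = {6, 7}

# Sizes in bytes of the field types read here: BYTE, SHORT, LONG, RATIONAL.
TYPE_SIZES = {1: 1, 3: 2, 4: 4, 5: 8}


def inspect_tiff(data: bytes) -> dict:
    """Report bit depth, resolution, and compression of the first image in a TIFF."""
    if data[:2] == b"II":
        order = "<"  # little-endian ("Intel")
    elif data[:2] == b"MM":
        order = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file (BigTIFF is not handled)")

    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(count):
        entry = data[ifd_offset + 2 + 12 * i:ifd_offset + 14 + 12 * i]
        tag, ftype, n = struct.unpack(order + "HHI", entry[:8])
        size = TYPE_SIZES.get(ftype)
        if size is None or n != 1:
            continue  # only single-valued fields of the types above are read
        if size <= 4:
            raw = entry[8:8 + size]  # small values are stored inline, left-justified
        else:
            (offset,) = struct.unpack(order + "I", entry[8:12])
            raw = data[offset:offset + size]  # RATIONALs are stored at an offset
        if ftype == 5:
            num, den = struct.unpack(order + "II", raw)
            tags[tag] = num / den if den else None
        else:
            fmt = {1: "B", 3: "H", 4: "I"}[ftype]
            (tags[tag],) = struct.unpack(order + fmt, raw)

    # Missing fields fall back to the defaults given in the TIFF specification.
    compression = tags.get(TAG_COMPRESSION, 1)
    unit = tags.get(TAG_RESOLUTION_UNIT, 2)  # 1 = none, 2 = inch, 3 = centimeter
    scale = {2: 1.0, 3: 2.54}.get(unit)  # no absolute unit -> dpi unknown

    def to_dpi(value):
        return None if value is None or scale is None else value * scale

    return {
        "bits_per_sample": tags.get(TAG_BITS_PER_SAMPLE, 1),
        "compression": compression,
        "lossy": compression in LOSSY_COMPRESSION,
        "dpi": (to_dpi(tags.get(TAG_X_RESOLUTION)), to_dpi(tags.get(TAG_Y_RESOLUTION))),
    }


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, inspect_tiff(f.read()))
```

Passing one or more TIF paths on the command line prints one summary per file; the file names are up to the reader.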

We have built a graphical tool, BallotTool, to support the ground-truthing of ballot images. It contains a collection of software components for manipulating ballot images and their associated metadata. The BallotTool graphical user interface (GUI) is written in the popular Tcl/Tk scripting language, runs under both Linux and Microsoft Windows, and uses the standard open-source Netpbm toolkit for manipulating image files. Below is a screenshot of BallotTool displaying a partially annotated ballot image.

BallotTool system displaying a partially annotated ballot from Aitkin County
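
BallotTool relies on Netpbm for image manipulation. For readers unfamiliar with Netpbm, its bitonal format, binary PBM (magic number `P4`), is simple enough to handle directly: a short ASCII header giving the width and height, a single whitespace byte, and then the raster, packed eight pixels per byte with the most significant bit first, each row padded to a whole byte, and 1 meaning black. Netpbm converters such as `tifftopnm` can produce this format from bitonal TIFF input. The Python sketch below is our own illustration and is not taken from BallotTool; it assumes headers without `#` comments and represents an image as rows of 0/1 integers.

```python
def write_pbm(rows: list[list[int]]) -> bytes:
    """Encode a bitonal image (1 = black, 0 = white) as binary PBM (P4)."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    out = bytearray(f"P4\n{width} {height}\n".encode("ascii"))
    for row in rows:
        # Pack 8 pixels per byte, most significant bit first; the last
        # byte of each row is padded with zero (white) bits.
        for start in range(0, width, 8):
            byte = 0
            for bit, pixel in enumerate(row[start:start + 8]):
                if pixel:
                    byte |= 0x80 >> bit
            out.append(byte)
    return bytes(out)


def read_pbm(data: bytes) -> list[list[int]]:
    """Decode a binary PBM (P4) image whose header has no comments."""
    pos = 0
    tokens = []
    while len(tokens) < 3:  # magic number, width, height
        while pos < len(data) and data[pos:pos + 1].isspace():
            pos += 1
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    if tokens[0] != b"P4":
        raise ValueError("not a binary PBM file")
    width, height = int(tokens[1]), int(tokens[2])
    pos += 1  # exactly one whitespace byte separates the header from the raster
    row_bytes = (width + 7) // 8
    rows = []
    for y in range(height):
        row = data[pos + y * row_bytes:pos + (y + 1) * row_bytes]
        rows.append([(row[x // 8] >> (7 - x % 8)) & 1 for x in range(width)])
    return rows
```

Because the raster may begin with a byte that happens to be whitespace, the reader skips exactly one byte after the height rather than all trailing whitespace; with that rule, `read_pbm(write_pbm(rows)) == rows` holds for any rectangular 0/1 image.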

Below we provide links to the draft of our DAS 2010 paper describing the dataset and the ground-truthing process. We also provide a link to the current guidelines we are using for ground-truthing. Finally, we provide a link to the raw ballot image files in both PDF and TIF format.

"Document Analysis Issues in Reading Optical Scan Ballots" (draft)

Instructions for Ground-Truthing OpScan Ballot Images

Minnesota Challenged Ballot Image Files

PERFECT is an acronym that stands for "Paper and Electronic Records for Elections: Cultivating Trust." PERFECT is a multidisciplinary research effort aimed at studying the reliable processing of paper ballots and other hardcopy election records. Participating institutions include Lehigh University, Boise State University, Muhlenberg College, and Rensselaer Polytechnic Institute.

PERFECT is funded in part by the National Science Foundation under award numbers NSF-0716368, NSF-0716393, NSF-0716647, NSF-0716543. Any opinions, findings, and conclusions or recommendations expressed on this website are the investigators' and do not necessarily reflect those of the National Science Foundation.