On Nov. 17, 2020, Professor of Mathematics Steven J. Miller signed an affidavit providing estimates of how many ballots in Pennsylvania were requested in the name of a registered Republican by someone other than that person or requested and returned but not counted. The estimate he provided was shockingly high and purported to point to electoral fraud in the state. Miller’s statement was not only picked up by right-wing media but tweeted by President Trump himself to millions of his followers. I believe Miller’s analysis to be irresponsible, wrong and an example of statistical malpractice.
Miller received the original data set from Matt Braynard. Braynard was the leader of the data team for Trump’s 2015 and 2016 campaigns, and recently set up a voter fraud project asking for half a million dollars on the website GoFundMe before GoFundMe removed the page, criticizing it for attempting “to spread misleading information about the election.” When it was pointed out to Miller that he had neither asked for any verification of the data nor inquired about the procedures by which it was collected, he updated his statement with meaningless phrases such as “the analysis is based on the data I received being accurate and a representative sample of the population.”
When receiving data from such an obviously biased source, a simple disclaimer that “I believe the data” is meaningless. The statistician who performs the analysis is required to either validate the data or to provide disclaimers and limits to the analysis when that is not possible. To not do so is unethical. It is not enough to simply state that the analysis is based on (untenable) assumptions.
After receiving this and similar feedback on both statements, Miller apologized for his “lack of clarity and due diligence.” While this was an admirable step forward, he continued to repeat his disclaimer and still concluded: “the extrapolated numbers here are significant.” This amounts to statistical malpractice.
Perhaps an analogy will help:
Imagine a company, “Alien Travel,” that sets out to prove that space travel is not only possible but that many people in their town of 100,000 have already flown in flying saucers with aliens.
To do this, they find a list of 1,000 people (by means that they keep secret) who have already expressed interest in aliens and UFOs. They try to survey these citizens to ask about experiences with alien travel, but not surprisingly, most hang up on them. Undaunted, they continue down the list until they manage to convince 100 people to answer some questions. Of these 100, they find 10 who enthusiastically say, “Yes, I have been in a flying saucer with aliens.”
Not content to tell the media that they’ve found 10 people who have been in space, the company finds a consultant willing to help them with their mission. They tell the consultant that the data are from a random sample of the town. The consultant takes their word for it and gets to work, proudly asserting in his report that “the analysis is predicated on the assumption that the responders are a representative sample of the population.” He then applies a simple formula that provides an estimate of the true percentage of alien travelers in the town, under the assumption that the 100 people who answered are representative not only of the 900 people on the list who didn’t answer the calls, but also of the 99,000 who were never surveyed! Flaunting his expertise as a respected mathematician, he claims, with great fanfare, that “Thus we estimate there are between 4,000 and 16,000 people in our town who have actually been in a flying saucer with aliens.”
The consultant has managed to successfully apply a formula taught in STAT 101 to these data. Unfortunately, he apparently didn’t take the rest of the course that teaches that applying this formula to a biased sample is not only irresponsible and unethical, but wrong. And worse, most of the public will breeze past the caveats about assumptions and focus on the sensational and ridiculous conclusion, especially since the consultant has been paraded as a “respected mathematician.” This would be a comical mistake if it didn’t have real-world consequences. Unfortunately, the public, especially those who want to believe in alien travel, will not understand the mistake or the difference between mathematical and statistical reasoning and expertise.
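For readers who want to see the consultant’s arithmetic, here is a minimal sketch of the kind of STAT 101 calculation that produces his range. The analogy does not say which formula he used, so this assumes the textbook interval for a proportion (the estimate plus or minus two standard errors); the function name and parameters are purely illustrative.

```python
import math


def naive_interval(yes, n, population, z=2.0):
    """Textbook interval for a proportion, scaled up to a population count.

    Assumes the n respondents are a simple random sample of the population,
    which is exactly the assumption that fails in the Alien Travel survey.
    z=2.0 is the common "two standard errors" rule of thumb (an assumption).
    """
    p_hat = yes / n                              # sample proportion: 10 / 100
    se = math.sqrt(p_hat * (1 - p_hat) / n)      # standard error of p_hat
    low = max(0.0, p_hat - z * se)               # clip to valid proportions
    high = min(1.0, p_hat + z * se)
    return low * population, high * population


low, high = naive_interval(yes=10, n=100, population=100_000)
print(f"{low:,.0f} to {high:,.0f} alleged alien travelers")
```

The arithmetic is flawless and reproduces the consultant’s range. Nothing in the formula, however, can tell whether the 100 respondents were drawn at random from the town or hand-picked from a list of UFO enthusiasts; it simply returns a tidy-looking range either way.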
Returning to Earth for a moment, we similarly find the key statement in Miller’s report — “I estimate that almost surely (based on the data I received) that the number of ballots requested by someone other than the registered Republican is between 37,001 and 58,914, and almost surely the number of ballots requested by registered Republicans and returned but not counted is in the range from 38,910 to 56,483” — to be completely without merit. A confidence interval built from biased data is worthless.
To summarize, I disagree with Miller’s assessment and believe that any estimates based on unverifiable or biased data are inaccurate, wrong and unfounded. To apply naïve statistical formulas to biased data and publish the results is both irresponsible and unethical. The onus is on the statistician to report the limits and problems with the analysis, not to proceed with an inappropriate methodology based on false assumptions while trying to pass the buck back to the person sponsoring the project. To do otherwise is dangerous and violates at least seven of the following 11 guidelines for ethics published by the American Statistical Association, specifically #1, 2, 4, 5, 6, 8 and 9:
The ethical statistician:
1. Acknowledges statistical and substantive assumptions made in the execution and interpretation of any analysis. When reporting on the validity of data used, acknowledges data editing procedures, including any imputation and missing data mechanisms.
2. Reports the limitations of statistical inference and possible sources of error.
3. In publications, reports, or testimony, identifies who is responsible for the statistical work if it would not otherwise be apparent.
4. Reports the sources and assessed adequacy of the data, accounts for all data considered in a study, and explains the sample(s) actually used.
5. Clearly and fully reports the steps taken to preserve data integrity and valid results.
6. Where appropriate, addresses potential confounding variables not included in the study.
7. In publications and reports, conveys the findings in ways that are both honest and meaningful to the user/reader. This includes tables, models, and graphics.
8. In publications or testimony, identifies the ultimate financial sponsor of the study, the stated purpose, and the intended use of the study results.
9. When reporting analyses of volunteer data or other data that may not be representative of a defined population, includes appropriate disclaimers and, if used, appropriate weighting.
10. To aid peer review and replication, shares the data used in the analyses whenever possible/allowable and exercises due caution to protect proprietary and confidential data, including all data that might inappropriately reveal respondent identities.
11. Strives to promptly correct any errors discovered while producing the final report or after publication. As appropriate, disseminates the correction publicly or to others relying on the results.
Richard D. De Veaux, C. Carlisle and M. Tippit professor of statistics and associate chair of statistics, has been at the College since 1994. He is the 2018-2021 Vice President of the American Statistical Association.