How overconfidence hurts the polling and forecasting industries
Some reflections on an attention-grabbing forecasting method and thoughts about how election forecasting actually works
How quantitative forecasting works:

The Takeaway: As we approach the 2020 election, beware of attention-grabby, overconfident “forecasts” that purport to be better than all of the other ones. Listen only to them and you might just get burned. In this email I’ll lay out my objections to an existing forecasting model and explain my passion for proper calibrations of (un)certainty.
Editor’s note:
Thanks for reading my thoughts on this subject. And thanks for subscribing! Your membership adds up and makes all this newslettering possible (reminder: I do all this work independently). Please consider sharing online or with a friend; the more readers, the merrier. Remember that, apart from getting special articles, subscribers can also comment below each post and participate in exclusive threads.
As always, send me your tips about what you’d like to read about next. Or what you don’t want to read. Or your feedback otherwise. Also, cat pictures are nice, so please send them to me! I’m elliott@thecrosstab.com, or @gelliottmorris on Twitter.
Thanks all,
—Elliott
I have been putting off the discussion of a prominent political scientist’s forecasting work for a while, but now I feel I need to properly explain my objections and intent. I fear I have been harmfully unclear. I will try to lay out my criticism without inflaming passions or invoking biases. My conversation with this scholar has so far been quite ugly and I want to use this forum to re-focus on the methods and get away from personal feuds. Please note that I have directed similar criticisms at other analysts before.
Dr Rachel Bitecofer is an assistant director of the Wason Center at Christopher Newport University in Virginia, pollster, and the developer of a new forecasting model for the 2020 election. Her model uses linear regression to predict Democrats’ two-party vote share using various demographic and political variables. This is how most forecasting models work—specifically, the “fundamentals” stage which seeks to provide a prior or baseline expectation for the race (polls are later added on top of these fundamentals).
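To make the "fundamentals" idea concrete, here is a minimal sketch of a one-variable linear regression. It is not Dr Bitecofer's model: the single predictor (`college_share`), the numbers, and the function names are all made up for illustration, and a real model would use many variables and real election returns.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: y ~ intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: share of college graduates in four districts and
# the Democratic two-party vote share there. These numbers are invented.
college_share = [0.20, 0.30, 0.40, 0.50]
dem_share = [0.40, 0.45, 0.50, 0.55]

slope, intercept = fit_line(college_share, dem_share)


def predict(x):
    """Predicted Democratic two-party vote share for a district."""
    return intercept + slope * x


print(slope, intercept, predict(0.35))
```

The fitted line becomes the baseline expectation for each district; a poll-based forecast would then adjust that baseline as surveys come in.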
Dr Bitecofer’s method performed admirably in 2018, foreseeing 42 pickups for the Democrats (they got 41) months ahead of the contest. But it is also worth noting that the final fit of her predictions has an R-squared (a measure of the match between two variables) of just 0.7. For context, using the results from the 2016 election alone would get you an R-squared of 0.94. And poll-based forecasts, even early, explained similar shares of variance (mine included).
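For readers who want to see what that number measures, here is a small sketch of R-squared computed as the share of variance in the actual results that the predictions explain. This assumes the "coefficient of determination" definition (one minus residual variation over total variation); the vote shares below are invented and are not the 2018 results or anyone's real forecast.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical Democratic vote shares in three districts (made-up numbers).
actual = [0.40, 0.50, 0.60]
close_forecast = [0.41, 0.50, 0.59]  # misses by one point at most
rough_forecast = [0.45, 0.50, 0.55]  # right direction, larger misses

print(r_squared(actual, close_forecast))
print(r_squared(actual, rough_forecast))
```

A value of 1 means the predictions match the results exactly, and a value of 0 means they do no better than guessing the average everywhere. Comparing two forecasts on the same set of races, as above, is the fair way to read the figure.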
Further, I am not sure how well a model trained to predict the 2018 congressional elections can predict the 2020 presidential contest. Specifically, though Dr Bitecofer asserts she has added a correction to account for the fact that the 2018 election was atypically pro-Democratic (it is unlikely we will see Democrats win the 2020 popular vote by 9 percentage points), she hasn’t said how. Additionally, such a model necessarily assumes that the relationship between demographics and vote share will hold constant from election year to election year. I am not so sure about this. How well would a model trained to predict the 2010 midterms do at predicting the elections in 2012?
Yet, despite this uncertainty, Dr Bitecofer has characterized her prediction as a near blowout victory for Democrats. She wrote an op-ed for the New York Times last month titled “Why Trump Will Lose in 2020” (my emphasis). She told me via email that, barring a ground war in Iran or an alien assuming Trump’s body, he will not be re-elected. I believe this confidence to be even more misleading than the method itself. Dr Bitecofer’s model shows that Democrats are projected to win a slim victory with 279 electoral votes next fall. If you allocate electoral votes probabilistically (so we can add toss-ups to their tally as well), that jumps to 300. Even if you ignore all the uncertainty that results from (a) producing an estimate this early, (b) assuming that model coefficients that connect demographics to vote choice won’t change over time, (c) predicting state-level demographic change, and (d) predicting presidential elections using congressional election results, such a projection is certainly not the guaranteed victory for the Democrats that she is advertising.
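Here is a rough sketch of what allocating electoral votes probabilistically means, and why an expected tally above the winning threshold is not the same as a sure win. The four "states", their win probabilities, and the 21-vote threshold are invented for illustration; they are not taken from Dr Bitecofer's model. The win-probability calculation also assumes states are independent, which is a strong simplification, since real state outcomes move together.

```python
def expected_electoral_votes(states):
    """Probabilistic allocation: each state contributes p * electoral votes."""
    return sum(p * ev for ev, p in states)


def called_electoral_votes(states, threshold=0.5):
    """Winner-take-all tally: count only states favored above the threshold."""
    return sum(ev for ev, p in states if p > threshold)


def win_probability(states, needed):
    """Chance of reaching `needed` votes, ASSUMING independent states."""
    total = sum(ev for ev, _ in states)
    dist = [0.0] * (total + 1)  # dist[k] = probability of exactly k votes
    dist[0] = 1.0
    for ev, p in states:
        new = [0.0] * (total + 1)
        for k, prob in enumerate(dist):
            if prob == 0.0:
                continue
            new[k] += prob * (1 - p)  # state is lost
            new[k + ev] += prob * p   # state is won
        dist = new
    return sum(dist[needed:])


# Hypothetical toy map: (electoral votes, probability the party wins).
toy_states = [(10, 0.9), (10, 0.6), (10, 0.5), (10, 0.2)]

print(expected_electoral_votes(toy_states))  # expected tally out of 40
print(called_electoral_votes(toy_states))    # states favored outright
print(win_probability(toy_states, 21))       # chance of a majority
```

In this toy map the expected tally sits above the 21 votes needed for a majority, yet the chance of actually getting there is well under one in two. That gap between an expected total and a probability of winning is the uncertainty I am asking to see reported.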
I have sent these criticisms to Dr Bitecofer directly and shared them on Twitter, as I think the public needs to hear them. Here is what I tweeted. In retrospect, I believe I came on too strong, and I apologized to Dr Bitecofer for doing so. Our conversation plays out in the mentions.


Here is why I am so passionate about this.
Pollsters and forecasters alike got burned badly in the aftermath of 2016, even though polls were very accurate in historical context and probabilistic forecasts (the good ones, anyway) gave Trump a pretty good chance of winning. I would like to avoid getting burned this badly again, so I am raising the point that Dr Bitecofer is advertising her model with too much confidence, especially given how it is designed under the hood.
If polling and forecasts are misinterpreted again in 2020, I (and others in the industry with whom I’ve spoken) fear that the credibility of polling will be dramatically undermined—again. I would like to avoid this.
The bottom line is that there is a lot of uncertainty in Dr Bitecofer’s approach that she is not addressing. I have asked her to do so. Hopefully, she will listen or prove me wrong.