An earth scientist colleague wrote to me this week to ask about the election. In the climate-forecasting business, he wrote, one often uses “persistence”—that is, the assumption that conditions remain unchanged from one year to the next—as a control condition and basis for comparisons. He wanted to know what would happen if you applied the same logic to electoral politics: Were this year’s poll-based predictions any better than what you’d get by guessing that the 2016 results would repeat themselves?
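
For readers who want to see how bare-bones this baseline is, here's a minimal sketch in Python; the series values are invented for illustration, not real measurements.

```python
# A persistence baseline, as in climate forecasting: predict that next
# period's value equals this period's. Series values are made up.

series = [14.1, 14.3, 14.2, 14.6, 14.5]  # e.g., annual averages of some quantity

def persistence_predictions(values):
    """Each prediction is just a copy of the previous observation."""
    return values[:-1]  # predicts values[1:], shifted back one step

preds = persistence_predictions(series)
errors = [abs(p - actual) for p, actual in zip(preds, series[1:])]
print(f"Mean absolute error of persistence: {sum(errors)/len(errors):.2f}")
```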

My quick answer was no: the persistence method would not have worked. If you’d just copied the 2016 results, you would have had a Republican victory, and as of Thursday it looks like Joe Biden won the presidential election with victories in many key states and a slightly higher share of the national vote than Hillary Clinton received four years ago. But we can do better than that. Political scientists have developed models that do a good job of forecasting the national vote based on so-called “fundamentals”: key variables such as economic growth, approval ratings, and incumbency. If we’d taken one of these models and adjusted it based on the parties’ vote shares from 2016 (as opposed to using recent polling data), we would have projected a narrow Biden win, and likely ended up closer to the mark than any guess derived from the famous poll averages. Even better, we would have done so at a fraction of the cost.
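
To give a flavor of how these models work, here's a stylized sketch. The coefficients are placeholders I made up for illustration; real fundamentals models are fit by regression on decades of past elections.

```python
# A toy fundamentals-style forecast: the incumbent party's two-party vote
# share as a linear function of economic growth, net approval, and incumbency.
# The coefficients below are illustrative placeholders, not a fitted model.

def fundamentals_forecast(gdp_growth, net_approval, incumbent_running):
    """Predict the incumbent party's two-party vote share, in percent."""
    intercept = 47.0   # baseline share (placeholder)
    b_growth = 0.6     # points of vote share per point of GDP growth
    b_approval = 0.1   # points per point of net approval
    b_incumbent = 2.0  # bonus for a sitting president on the ballot
    return (intercept
            + b_growth * gdp_growth
            + b_approval * net_approval
            + b_incumbent * incumbent_running)

# Example: weak economy, negative net approval, incumbent on the ballot.
share = fundamentals_forecast(gdp_growth=-1.0, net_approval=-10.0, incumbent_running=1)
print(f"Incumbent party's forecast share: {share:.1f}%")  # 47.4: a narrow challenger win
```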

I say this as a co-creator of one of those famous—or maybe I should say “notorious”—poll averages. Our election forecast at The Economist ended up predicting Biden would win more than 54 percent of the two-party vote, and gave him a 97 percent chance of winning the electoral college. Given the closeness of the election, we’re now feeling a bit uncomfortable with that latter claim. On the other hand, the popular vote, electoral vote, and vote shares in all or almost all the states (including Florida!) seem to have fallen within our 95 percent uncertainty intervals—so maybe it’s fairer to say that we successfully expressed our uncertainty.
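
As for how a vote-share forecast becomes a headline probability, here's a stylized popular-vote version of the calculation. The mean and spread are illustrative stand-ins, and the electoral-college figure requires simulating state-by-state outcomes rather than this one-line shortcut.

```python
# From a vote-share forecast to a win probability: treat the two-party share
# as (approximately) normally distributed and ask how much of the
# distribution sits above 50 percent. Numbers are illustrative.
from statistics import NormalDist

mean_share = 54.3  # forecast two-party vote share, percent (illustrative)
spread = 2.3       # forecast uncertainty, percent (illustrative)

win_prob = 1 - NormalDist(mean_share, spread).cdf(50.0)
print(f"Probability of winning the two-party popular vote: {win_prob:.0%}")  # ~97%
```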

The question here, though, is whether polling and forecasting are a waste of time and resources, given that, at least in this election, we could’ve done better with no polls at all. We should be able to study this using our forecasting model. It’s Bayesian, meaning that it combines information from past elections, a fundamentals-based forecast, and polls during the campaign.
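
Here is a toy version of that idea: a precision-weighted combination of a fundamentals prior and a poll average, with invented numbers. The real model is far richer, but the logic of weighting each source of information by how much you trust it is the same.

```python
# A toy Bayesian update: combine a fundamentals-based prior with a poll
# average, weighting each by its precision (1 / variance). Numbers are invented.

def combine(prior_mean, prior_sd, poll_mean, poll_sd):
    """Posterior mean and sd under a normal prior and normal poll likelihood."""
    w_prior = 1 / prior_sd**2
    w_poll = 1 / poll_sd**2
    mean = (w_prior * prior_mean + w_poll * poll_mean) / (w_prior + w_poll)
    sd = (w_prior + w_poll) ** -0.5
    return mean, sd

# Fundamentals suggest 52 +/- 3; the poll average says 54.5 +/- 1.5.
mean, sd = combine(52.0, 3.0, 54.5, 1.5)
print(f"Posterior forecast: {mean:.1f} +/- {sd:.1f}")  # lands between the two
```

Early in the campaign the fundamentals term dominates; as high-quality polls accumulate, their combined precision grows and the posterior tracks the poll average.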

One thing I can say with some confidence is that we currently have too many polls—too many state polls and too many national polls. At some point, polling a state or the country over and over again has diminishing returns, because all the polls can be off—as we saw in several states this election.
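
A quick calculation shows why. If every poll shares a common bias, averaging more of them shrinks the sampling noise but leaves the bias untouched; the standard deviations below are illustrative, in percentage points.

```python
# Diminishing returns from more polls: the sampling noise of an average of
# n polls shrinks like 1/sqrt(n), but a shared systematic bias does not
# shrink at all.

def average_error_sd(n_polls, sampling_sd=3.0, shared_bias_sd=2.0):
    """Typical error of the average of n polls, in percentage points."""
    return (sampling_sd**2 / n_polls + shared_bias_sd**2) ** 0.5

for n in (1, 5, 25, 100):
    print(f"{n:>3} polls: typical error {average_error_sd(n):.2f} points")
```

Past a handful of polls the shared bias dominates the error, which is the arithmetic behind “all the polls can be off.”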

Then again, I’m not paying for the polls. Many election surveys are done by commercial pollsters who make their money asking business-related questions; election polling serves as a loss leader, a way for the firm to get some publicity. The good thing about this system is that pollsters have an economic motivation to get things right. For example, the fact that the Selzer poll performed so well in Iowa, predicting a strong Republican finish, should be good for its business.

But this logic led me and others to be too sanguine about poll performance in this election. Sure, some key state polls bombed in 2016, but the pollsters learned from their mistakes, right? They did fine in 2018. The wide uncertainties in our 2020 forecast were based on our historical analysis of state-level polling errors, and they came in handy this time, as they allowed our prediction intervals to include the ultimate election outcomes despite the poll failures.
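
To put rough numbers on the idea, here's a simplified sketch of interval construction: each state's result is modeled as its poll average plus a systematic error plus state-specific noise. The error magnitudes are made up; this is the general shape of the approach, not our actual procedure.

```python
# Widening intervals to absorb polling error: simulate a state's vote share
# as poll average + systematic error + state-level noise, then read off the
# 2.5th and 97.5th percentiles. Error magnitudes are invented.
import random

random.seed(0)
N_SIMS = 10_000

def interval_95(poll_share, systematic_sd=2.0, state_sd=2.5):
    """95% interval for a state's vote share, in percentage points."""
    sims = sorted(
        poll_share + random.gauss(0, systematic_sd) + random.gauss(0, state_sd)
        for _ in range(N_SIMS)
    )
    return sims[int(0.025 * N_SIMS)], sims[int(0.975 * N_SIMS)]

lo, hi = interval_95(52.0)
print(f"Poll average 52.0, 95% interval roughly [{lo:.1f}, {hi:.1f}]")
```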

What went wrong with the polls this year? It wasn’t just Donald Trump. The polls also systematically understated the vote for Republican congressional candidates. We can’t be sure yet, but my guess is that the big factors were differential nonresponse (Republicans being less likely than Democrats to respond to polls) and differential turnout (Republicans being more likely to go out and vote). Turnout set a record this year, and part of that was Republicans coming out on Election Day after hearing about record early voting by Democrats. Other possible sources of the discrepancy between the polls and the vote include differential rates of ballot rejection and last-minute shifts among undecided voters.
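
A worked miniature shows how little differential nonresponse it takes to produce a sizable polling miss; the response rates below are invented for illustration.

```python
# Differential nonresponse in miniature: if one side's supporters answer
# surveys at even a slightly higher rate, the raw poll tilts their way.
# All shares and response rates are invented.

true_dem_share = 0.50      # electorate split evenly, for simplicity
true_rep_share = 0.50
dem_response_rate = 0.06   # Democrats answer 6 polls per 100 contacts
rep_response_rate = 0.05   # Republicans answer 5 per 100

dem_respondents = true_dem_share * dem_response_rate
rep_respondents = true_rep_share * rep_response_rate
poll_dem_share = dem_respondents / (dem_respondents + rep_respondents)

print(f"True Democratic share: {true_dem_share:.1%}")
print(f"Poll's Democratic share: {poll_dem_share:.1%}")  # ~54.5%: a 4.5-point miss
```

A one-point gap in response rates is enough to turn an even race into what looks like a comfortable lead, and no amount of additional polling with the same skew will correct it.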