Analytics Journal: Three Lessons for Pollsters and Business Analytics Leaders from the U.S. Election

Nov 11, 2016

Like many other Americans who went to bed on election night prematurely, I learned about Donald Trump’s stunning victory in the U.S. presidential election on my phone early in the morning. The result was unambiguous but shocking and hard to process, especially at 5 a.m.

But, like many other Americans, I wasn’t shocked because I had missed the prevailing anti-establishment mindset and desire for change that tilted the vote (I’ve resided in three of the four key Midwestern states that “flipped”). I was shocked by the disconnect between the final result and the longstanding, data-driven expectation we had trusted too much.

In short, Americans and the world were surprised by the election’s outcome primarily because the data we saw, and the vast majority of statistical models based on that data, kept many of us from anticipating (or even imagining) the outcome we got.

What an unthinkably wild swing for data and analytics in just one week! Five days after baseball pundits conceded the new value of analytics in their old sport, fueled by the Chicago Cubs’ stats-minded approach to winning a historic World Series title, political writers quickly decried data geeks and statistical models for grossly erring in their predictions.

After the news sank in, my mind quickly went to the polls, pollsters, and statisticians whose detailed analyses and beautiful data visualizations I had been tracking for months. Election forecasting sites like FiveThirtyEight and The New York Times’ Upshot, with their daily ingestion of new polling data and frequently updated probability tables, were my gateway to the election and the diverse, disparate American population underlying it.

Undeniably, most forecasts missed badly on the expected election outcome, owing to a variety of polling factors and voter behaviors that will be dissected for months.

But the more I considered the criticism raining down on the polling community, the more I could look past their data and forecasts and see a different, overarching failure. I reflected on some of my own career experiences in science, analytics, and teaching, and I thought about the pressing challenges facing many analytics leaders I work with at the International Institute for Analytics.

I realized that business analytics leaders and pollsters often face the same basic challenge: how to deliver statistically rich data and forecasts most clearly and effectively. Considering how polling data was presented, utilized, and digested by “stakeholders” (i.e., journalists and voters), analytics folks certainly can relate to the gaps in understanding along the journey to increasing data-driven decisions in their organizations.

I see three critical lessons from the election prediction cycle that are equally relevant to business analytics leaders.

First, present data in ways that are mindful of most consumers’ needs and capacities for digesting the information. A natural reaction by many to vast amounts of data and hundreds of scenarios is to focus on the most expected outcome (and maybe more so when the outcome meets our preference). We teach in statistics that the mean is usually the best single indicator of a distribution, and in this election the mean forecast was quite consistent in its support of Hillary Clinton. But a single central value says nothing about the spread of outcomes around it. This “storytelling” aspect of data can be hard to master, but it can be the most important aspect.

For example, a busy frequency distribution that tries to account for every electoral possibility looks impressive and smart, but it might be better presented as a few buckets of scenarios, with an emphasis on relatable analogies that illustrate the “1 in 5” (for example) chance of a Trump win. I would partly attribute the post-election protests in some cities to a failure to tell the story of the polls and forecasts clearly, which helped create the shock.
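The bucketing idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the simulated electoral-vote totals, the bucket labels, and the 50-vote cutoff for a "comfortable" win are all hypothetical and are not taken from any real forecast.

```python
from collections import Counter

# Hypothetical simulated electoral-vote totals for candidate A.
# Illustrative values only, not real forecast output.
SIMULATED_EV = [352, 340, 323, 310, 302, 295, 288, 279, 265, 251]

def bucket(ev, to_win=270):
    """Map one simulated electoral-vote total to a coarse scenario label."""
    if ev >= to_win + 50:  # assumed cutoff for a "comfortable" win
        return "A wins comfortably"
    if ev >= to_win:
        return "A wins narrowly"
    return "B wins"

def summarize(simulations):
    """Return {scenario label: share of simulations} for the coarse buckets."""
    counts = Counter(bucket(ev) for ev in simulations)
    total = len(simulations)
    return {label: n / total for label, n in counts.items()}

def as_odds(share):
    """Phrase a probability as 'about 1 in N' for a lay audience."""
    if share <= 0:
        return "essentially never"
    return f"about 1 in {round(1 / share)}"

for label, share in summarize(SIMULATED_EV).items():
    print(f"{label}: {as_odds(share)}")
```

Three labeled scenarios with "1 in N" phrasing carry the same information as the full histogram's tails, but in a form a non-statistician can repeat to someone else.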

(A self-aware note on this lesson: I and other stats-trained data visualization experts might not be the best judges of how well gorgeous data visualizations are truly understood.)

Second, analytics leaders and pollsters need to increase their stakeholders’ comfort level with forecast uncertainty. In an election context, the margin of error is usually clearly stated, but that doesn’t mean it’s understood. For a business analogy, consider a retail analytics team’s weekly sales forecasts for products with a lot of sales volatility, and the perception issues with company leaders who don’t appreciate the innate challenge of forecasting the future based on the past.

Said another way: if FiveThirtyEight’s Nate Silver, a respected election forecaster who rose to fame in 2008 for correctly calling nearly every state in the presidential election, were a chief analytics officer or product owner in a company with less tolerance or understanding of model uncertainty, he might be out of a job this week. The most trying days of my early career in analytics arose from the need to clearly communicate the uncertainty in my recommendations. Education is key here, too, but knowing the audience matters even more.
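To make the retail analogy concrete, here is a minimal sketch of reporting a forecast as a range rather than a single number. It assumes, purely for illustration, that weekly sales are roughly normal and independent from week to week, which volatile products often violate. The sales figures and the function name are hypothetical.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical weekly unit sales for one volatile product.
WEEKLY_UNITS = [120, 95, 140, 80, 130, 110, 150, 75]

def forecast_with_interval(history, coverage=0.80):
    """Return (low, center, high) for next week's sales.

    The center is the historical mean. The range assumes roughly normal,
    independent weekly sales, which is a simplifying assumption.
    """
    center = mean(history)
    spread = stdev(history)
    # z-score that leaves (1 - coverage) / 2 probability in each tail
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return center - z * spread, center, center + z * spread

low, center, high = forecast_with_interval(WEEKLY_UNITS)
print(f"Expect about {center:.0f} units; 4 weeks in 5 should land between "
      f"{low:.0f} and {high:.0f}.")
```

Leading with the range, and saying how often reality should fall outside it, sets the expectation that a miss is not automatically a model failure.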

Finally, systematic error that arises from unforeseen factors or biases can be a killer in both polling and business analytics. In the case of the election, the 2-4% systematic shift in sentiment between polling samples and actual voters seems to have driven the difference between expected and actual outcome, and that shift was enough to tilt the election. Cases abound in business analytics of disruptive factors greatly affecting a model’s accuracy, and the business’s performance as a result.
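A small sketch shows why a shared shift is so damaging: unlike independent sampling errors, which tend to offset one another across states, a systematic shift moves every margin in the same direction at once. The state names and polled margins below are hypothetical, and the shifts are chosen only to mirror the 2-4 point range mentioned above.

```python
# Hypothetical polled margins: candidate A minus candidate B, in percentage points.
POLLED_MARGINS = {"State W": 3.5, "State X": 2.0, "State Y": 1.0, "State Z": 6.0}

def flipped_states(margins, shift):
    """Return the states whose leader changes when every margin moves by `shift`.

    A margin of exactly zero after the shift is treated as no longer leading.
    """
    return sorted(
        state for state, margin in margins.items()
        if (margin > 0) != (margin + shift > 0)
    )

for shift in (-2.5, -3.0, -4.0):
    print(f"shift {shift:+.1f}: flipped {flipped_states(POLLED_MARGINS, shift)}")
```

In this toy setup, a shift well inside each poll's stated margin of error flips several close states together, which is the pattern a per-poll margin of error does not warn about.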

How to handle such biases? One approach is to develop analytical models in short, frequent iterations, so that new information can be incorporated quickly and the data, assumptions, and methods do not become stale. An “agile” approach to analytics work can be a potent tool for reducing the biases that arise in models over time. As a bonus, iterative model development can bring other experts into the process, which can also increase accuracy.

As Americans adjust to the election’s shocking outcome, polling organizations will be digging into why the outcome was so surprising to most of us. Recognizing that some of the surprise wasn’t directly due to the polls or models themselves, but rather to how they were shared with and understood by the public, will be crucial to staving off future widespread shock. For business analytics leaders as well, the same lessons are vital.