Are Democratic primary polls actually rigged against non-white voters?
Probably not, but people are still making dishonest arguments that they are
Last week, the Washington Post’s political science blog, the Monkey Cage, published an article by Matt Barreto, a professor at UCLA, alleging that polls of the Democratic primary were “undersampling” black and Hispanic voters, decreasing aggregate support for the candidates they liked and thus keeping non-white candidates off the debate stage. All respect to Dr Barreto, but this claim is dubious at best and clearly diverges from the evidence we have available. The article is misleading in several ways, which I will discuss presently:
Representativeness depends almost entirely on the benchmark, and Barreto’s might not be the best one
Barreto “unskews” the polls by first using the American National Election Study (ANES) to create a baseline expectation for the racial composition of the Democratic primary electorate and then re-weighting public polling data to match his targets. The math here isn’t necessarily wrong, but the benchmark very well might be.
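To make the mechanics concrete, here is a minimal sketch of that kind of re-weighting arithmetic. The subgroup support numbers are invented purely for illustration (they are not Barreto’s, and this is not his actual model); only the composition targets, the roughly 55% white share he takes from the ANES and the 62% figure from the CCES/YouGov discussed below, reflect the real dispute. The point is simply that the re-weighted topline is a weighted average of subgroup support, so the benchmark you pick does all the work.

```python
# A minimal sketch of re-weighting a topline to an assumed electorate mix.
# The support-by-race numbers are invented for illustration; only the
# white-share targets (55% per the ANES, 62% per the CCES/YouGov) come
# from the figures discussed in this piece.

def reweighted_topline(support_by_group, composition):
    """Candidate topline implied by subgroup support and an assumed electorate composition."""
    return sum(support_by_group[group] * share for group, share in composition.items())

# Hypothetical support for one candidate among white and non-white Democrats.
support = {"white": 0.20, "nonwhite": 0.30}

# Two competing benchmarks for the racial makeup of the primary electorate.
anes_benchmark = {"white": 0.55, "nonwhite": 0.45}  # Barreto's target
cces_benchmark = {"white": 0.62, "nonwhite": 0.38}  # CCES/YouGov estimate

print(round(reweighted_topline(support, anes_benchmark), 3))  # 0.245
print(round(reweighted_topline(support, cces_benchmark), 3))  # 0.238
```

Under these made-up numbers, swapping benchmarks moves the candidate by less than a point; the real question is which composition target is right, which is what the next few paragraphs are about.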
According to Barreto, the ANES shows that roughly 55% of 2016 Democratic primary voters were white. This number differs wildly from both the Cooperative Congressional Election Study (CCES), a huge voter-validated survey that interviews plenty of Spanish-speaking voters and puts the white percentage at 62%, and data from YouGov, a reliable online pollster that polled both the 2016 and 2020 Democratic primaries and also finds the white share to be 62%. This seems to be the primary disagreement between Barreto and his critics (myself included): he argues that the CCES and YouGov aren’t representative of Spanish-speakers because they are online surveys. That may be true for some non-probability polls, but YouGov is one of the best, if not the best, at producing estimates of attitudes among people of color. The 2018 exit poll also put the Democratic electorate at 60% white.
So, starting off, Barreto is using the benchmark (the ANES) that serves his narrative best, and he’s treating it with far too much certainty. It is not at all clear to me that his central premise, that “most polls misrepresent the Democratic electorate”, is true.
It’s worth noting that YouGov’s 2020 primary polls for The Economist match the racial composition of their 2016 polls very, very well.
Complaining about “undersampling” ignores that polls are weighted, often in complex ways
Throughout the piece, Dr Barreto claims that polls have too few people of color in them to be representative of the electorate. I have already argued that it is hard to pin down what “representative” means, given the disagreement between surveys and methods, but I’ll also make the point that his arithmetic rests on the wrong numbers. Barreto repeatedly uses polls’ unweighted racial frequencies as the baseline for his comparisons, when the relevant numbers are the weighted frequencies (the ones that actually determine a poll’s topline).
For example, he claims that a poll from Monmouth University is 71% white, based on the publicly available tables on its website, when the actual, weighted figure is 58%. That’s much, much closer to Barreto’s “target”, and it decreases the “bias” he estimates by about 20%.
What Barreto is doing is not too uncommon, to be sure. Every time I share a poll online, I get tons of comments from people saying the pollster sampled too few Democrats, or too few conservatives, and so on. But polls are weighted to benchmarks precisely so that, e.g., drawing a sample that is 70% white is not that big a problem.
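For readers who want to see what that adjustment looks like, here is a stylized example of simple post-stratification on race. The counts and targets are made up, and this is not Monmouth’s actual weighting scheme (real pollsters rake on age, education, region and more at the same time); it just shows why the unweighted composition is the wrong thing to audit. The raw sample below is 70% white, and after weighting the composition matches the target by construction.

```python
# A stylized sketch of post-stratification on race alone. The counts and
# targets are hypothetical, not any pollster's actual data or weighting scheme.

raw_counts = {"white": 700, "black": 120, "hispanic": 110, "other": 70}      # hypothetical raw sample
targets    = {"white": 0.58, "black": 0.19, "hispanic": 0.15, "other": 0.08}  # hypothetical target composition

n = sum(raw_counts.values())
unweighted_share = {g: c / n for g, c in raw_counts.items()}

# Post-stratification weight for each group: target share / sample share.
weights = {g: targets[g] / unweighted_share[g] for g in raw_counts}

# Weighted composition: each respondent counts as their group's weight.
weighted_total = sum(raw_counts[g] * weights[g] for g in raw_counts)
weighted_share = {g: raw_counts[g] * weights[g] / weighted_total for g in raw_counts}

print({g: round(s, 2) for g, s in unweighted_share.items()})  # ~70% white, unweighted
print({g: round(s, 2) for g, s in weighted_share.items()})    # matches the targets exactly
```

The disagreement worth having, again, is over what the targets should be, not over the raw interview counts.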
The relevant criticism is about using polls to determine debate qualification at all
In sum, I think Barreto’s article is pretty misleading. But he also missed a good opportunity to make a broader (and, in my opinion, correct) argument about why it is foolish for party committees to use polls to decide who makes it on the debate stage. We have seen how polls-based qualifications cause issues several times in this year’s primary:
Polls-based thresholds keep unpopular, but qualified, candidates off the debate stage. Just ask Julián Castro, or Michael Bennet, or Steve Bullock. If it wants to win elections and promote good governance, the Democratic Party should incentivize qualifications and electability beyond topline poll numbers 10 months before people are voting.
Polls-based thresholds let popular, but unqualified, candidates on the debate stage. It is not in the Democratic Party’s interest to let unqualified candidates like Tom Steyer, Mike Bloomberg, Marianne Williamson or Andrew Yang on the debate stage. When such candidates do qualify, it is sometimes clear that they are gaming the polling/donor thresholds by (a) spending tens of millions of dollars in select states to shore up support or (b) mobilizing an intense, but small, network of online and social media activists to donate to their campaigns; 200k donors is nothing for Mr Yang, for example, who has 1.1m Twitter followers and a Reddit forum of nearly 100k constant posters and activists.
Polls-based thresholds magnify the impacts of statistical noise. It is worth stating the obvious: polls are not perfect. Sometimes they produce errant results, whether because of a poor selection of weighting variables, small samples, bad likely-voter filters or something else. Polls also jump around a bit by random chance alone; if the “true” support for a candidate is 25%, we should expect 19 out of 20 polls to land anywhere between roughly 20% and 30%, as the simulation sketch after this list illustrates. If a candidate is truly polling at 1%, then, there is a chance they clear a 3% threshold through luck alone. Do we really want chance deciding who gets to run for president?
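Here is the quick simulation sketch promised above. The sample size of 300 likely primary voters is my assumption for illustration (the argument above doesn’t specify one); with true support at 25%, roughly 95% of simulated polls land between about 20% and 30%, and smaller samples widen that band further.

```python
# Sampling-noise sketch: how much a poll of n=300 bounces around a true value of 25%.
# The sample size is an assumption chosen for illustration.
import random

def simulate_poll(true_support, n):
    """Share of n simulated respondents who back the candidate."""
    return sum(random.random() < true_support for _ in range(n)) / n

random.seed(42)
results = [simulate_poll(0.25, 300) for _ in range(10_000)]
inside = sum(0.20 <= r <= 0.30 for r in results) / len(results)
print(f"Share of simulated polls landing between 20% and 30%: {inside:.0%}")  # roughly 95%
```

The same arithmetic implies that, for samples of this size, a gap of a point or two around a low qualifying threshold is comfortably within ordinary sampling error.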
…
Now back to the article at hand.
It is not an honest criticism of polling to assert that unweighted numbers are unrepresentative. That’s the whole point of survey weights, after all. I’m surprised that Dr Barreto thought that doing so would fly. But I’m also floored that the Monkey Cage thought these comparisons were sound and worth publishing, given how many of its editors are experts in survey research. It’s true, as Barreto points out, that polls need to be adjusted to be representative of the electorate. But pollsters are mostly making these adjustments already (a fact that Barreto ignores), so this argument about “undersampling” is pretty moot.
The relevant discussion is about population targets. If we want to debate the correct benchmarks for the racial composition of the Democratic primary electorate, I’m happy to have that conversation. For now, I would discard, for lack of evidence, claims such as Dr Barreto’s that polls are rigged against voters and candidates of color. There’s no proof of that.
Editor’s note:
Merry Christmas and happy holidays to you all. I hope you enjoyed today’s letter, written to you whilst I sit at my parents’ dining table and stare at the cows and horses that live on the farm across the street.
Thanks for reading my thoughts on this subject. And thanks for subscribing! Your membership adds up and makes all this newslettering possible (reminder: I do all this work independently). Please consider sharing online or with a friend; the more readers, the merrier!
As always, send me your tips about what you’d like to read about next, or your feedback otherwise. You can reach me via email at elliott@thecrosstab.com or @gelliottmorris on Twitter.
—Elliott