AI can't replace polling
Using AI models such as ChatGPT to mimic human respondents betrays the public and provides inaccurate answers to key questions
In Isaac Asimov’s 1955 short story “Franchise,” a dystopian future America (set in 2008) has turned into an “electronic democracy” where a supercomputer called Multivac replaces the need for people to vote. Each year, Multivac selects one citizen — the mathematically most representative person in America, according to its “billions” of data points — to be the “Voter of the Year.” Multivac conducts an extensive interview with this individual and uses the answers to predict the votes of every other American. Multivac then assigns the winners of every elected office in the country, with no actual votes being cast; the opinions of every voter are inferred from a single interview.
We are edging toward this dystopia Asimov imagined — not (yet) for democracy, but for public opinion polling. Over the past year, several AI startups have begun using large language models (such as ChatGPT) as a stand-in for human interviewees in products they call “AI polls.” One New York-based company, Aaru, generated headlines in 2024 for publishing a forecasting model for the presidential election made up entirely of AI-generated respondents. Aaru created these respondents by first entering descriptions of real people’s demographic traits and news diets (representative data for which can be gathered from the Census Bureau and reputable pollsters) into a large language model, then prompting the model to assume those personas and answer questions as if it were each person.
The idea does not stop at startups. This month, the global polling company Ipsos announced a new partnership with Stanford’s Politics and Social Change Lab to build and validate a dataset of “digital twins” of its real respondents — AI simulations of real people that can then be used to predict answers on questions pollsters or organizations don’t have the money to ask real humans. And one week ago, Google announced a similar initiative.
As a data-driven political journalist, author of a book about polls, and part-time pollster, I understand the appeal of a quick AI-generated solution to polling’s problems. Response rates are in a total free-fall, approaching half a percent for a traditional phone poll, making political surveys take longer and cost much, much more.
But replacing a sample of real people with a so-called “synthetic sample” is not the answer. AI-powered simulations of public opinion are inaccurate compared to real polls, and they betray the crucial role polling plays in our modern democratic process. They are, in short, a terrible idea. This piece explains why.
1. AI-generated “synthetic samples” are inaccurate
Over the last two months, I have been working with the (human) polling company Verasight on a set of white papers assessing the accuracy of these LLM-generated synthetic samples. In the first of these white papers, I benchmarked multiple large language models against a high-quality, representative survey of 1,500 U.S. adults conducted on Verasight’s panel in June 2025. For each real respondent, we created a “persona” (based on the person’s sex, age, race, education, income, state of residence, and self-described ideology and partisanship) and asked the LLM to answer the same questions those people answered, as if it were each person.
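To make that setup concrete, here is a minimal sketch of what this kind of persona prompting looks like in code. It is an illustration only, not the actual pipeline from the white paper: the Respondent fields mirror the traits listed above, but `ask_model` is a hypothetical stand-in for whatever LLM call a researcher would plug in.

```python
# A minimal sketch of the persona-prompting approach described above.
# Illustrative only: `ask_model` is a hypothetical stand-in for
# whatever LLM a researcher would actually query.

from dataclasses import dataclass


@dataclass
class Respondent:
    sex: str
    age: int
    race: str
    education: str
    income: str
    state: str
    ideology: str
    partisanship: str


def build_persona_prompt(r: Respondent, question: str) -> str:
    """Turn a real respondent's traits into a role-playing prompt."""
    persona = (
        f"You are a {r.age}-year-old {r.race} {r.sex} living in {r.state}, "
        f"with {r.education} and a household income of {r.income}. "
        f"You describe yourself as {r.ideology} and identify as a {r.partisanship}."
    )
    return (
        f"{persona}\n\nAnswer the following survey question as that person, "
        f"choosing exactly one response option:\n{question}"
    )


def ask_model(prompt: str) -> str:
    """Placeholder for a call to the LLM being benchmarked."""
    raise NotImplementedError("Swap in a real model call here.")


question = (
    "Do you approve or disapprove of the way Donald Trump is handling his job "
    "as president? [Strongly approve / Somewhat approve / Somewhat disapprove / "
    "Strongly disapprove / Don't know]"
)
respondent = Respondent("woman", 46, "Black", "a college degree",
                        "$60,000 to $80,000", "Georgia", "moderate", "Democrat")
prompt = build_persona_prompt(respondent, question)
# synthetic_answer = ask_model(prompt)
```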
We then compared the AI-generated results (toplines, crosstabs, and even person-level agreement) to our real polling data on three questions: Trump approval, the generic congressional ballot, and a question about zoning reform. The last question is meant to serve as a test of the LLM’s ability to predict opinions on new, “out-of-sample” questions it shouldn’t have any information about, since this issue emerged after the knowledge cutoff of the LLM’s training data (mid-2024).
We find that the AIs cannot successfully replicate real-world data. Across models, the LLMs missed real population proportions for Trump approval and the generic ballot by between 4 and 23 percentage points. Even the best model we tested overstated disapproval of Trump, and almost never produced “don’t know” responses despite roughly 3% of humans choosing that option.
For core demographic subgroups, the average absolute subgroup error was ~8 points; errors for some key groups (e.g., Black respondents) were as large as 15 points on Trump disapproval, and smaller groups had larger errors still (30 percentage points for Pacific Islanders). This is unusable for serious analysis.
At the person level, models often flip a respondent’s views. Roughly one in five to one in four real Trump disapprovers were predicted to be approvers (and vice versa) by the LLM — even before splitting by intensity. Results were even worse when respondents were separated by intensity (e.g., “strongly” vs. “somewhat” approve).
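For readers who want to see how these comparisons work mechanically, here is a simplified sketch, under the assumption that each respondent’s real and synthetic answers sit side by side in a pandas DataFrame. The column names (`real_approval`, `synthetic_approval`, `race`) are illustrative, not the white paper’s actual code.

```python
# A simplified sketch of the comparisons reported above, assuming a
# DataFrame with one row per respondent and hypothetical columns
# `real_approval`, `synthetic_approval`, and `race`.

import pandas as pd


def topline_error(df: pd.DataFrame, answer: str = "Disapprove") -> float:
    """Percentage-point gap between the synthetic and real toplines."""
    real = (df["real_approval"] == answer).mean() * 100
    synthetic = (df["synthetic_approval"] == answer).mean() * 100
    return synthetic - real


def subgroup_mae(df: pd.DataFrame, group_col: str,
                 answer: str = "Disapprove") -> float:
    """Mean absolute subgroup error, in percentage points."""
    errors = df.groupby(group_col).apply(
        lambda g: abs(((g["synthetic_approval"] == answer).mean()
                       - (g["real_approval"] == answer).mean()) * 100)
    )
    return errors.mean()


def flip_rate(df: pd.DataFrame,
              real: str = "Disapprove", predicted: str = "Approve") -> float:
    """Among people who really answered `real`, the share the model flips to `predicted`."""
    subset = df[df["real_approval"] == real]
    return (subset["synthetic_approval"] == predicted).mean()
```

Under this framing, the roughly one-in-five flip rate described above corresponds to `flip_rate()` returning a value around 0.20 for real Trump disapprovers.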
And on an out-of-sample policy question (the zoning reforms proposed by Derek Thompson and Ezra Klein in their 2025 book “Abundance”), the model got the plurality leader wrong, overstated support for reform by 30 percentage points, and again produced 0% “don’t know” versus nearly a third of real respondents selecting uncertainty.
This last point is very important in the debate over AI versus human polling. When we tested the ability of a set of LLM “digital twins” to replicate population proportions on questions about topics the AI hadn't already been trained on — issues that emerged after the model's knowledge cutoff — the synthetic polling completely fell apart. On the question about local zoning reform, the AI not only got the percentages wrong but actually flipped which position had majority support.
You can ask a human being how they feel about a new political issue. You cannot do this with a robot.
Another important finding is that the AI models systematically eliminate uncertainty from human responses. While 3% of our real respondents said "don't know" on Trump approval — representing about 8 million American adults — the AI models almost never said they didn’t know the answer to a question. This isn't a minor technical glitch; it's a fundamental misrepresentation of how people think about complex political questions. We know that human political psychology is messy, and (as demonstrated by our out-of-sample question) uncertainty is an important feature of people’s attitudes about politics. AIs eliminate that uncertainty because they are fundamentally not trained to mimic the behavior of a human brain; they’re trained to predict the next word in a sequence of text.
The bottom line of our research is that LLMs are not accurate replacements for human respondents. In this openly available research, we show that you cannot rely on AI “polls” to create accurate predictions of public opinion. Indeed, in the 2024 elections, Aaru’s predictions performed on par with or worse than other poll-based models (and it’s possible, maybe even likely, that they augmented their predictions with publicly available real survey data anyway).
Support independent political journalism!
I usually reserve my in-depth Tuesday essays for paying subscribers, but this piece is too important to keep behind a paywall. Thanks to the generous support of Strength In Numbers members, I can share this analysis with everyone.
If you believe in the necessity of centering real human data in reporting and analysis about our democracy, consider joining our community of supporters. You'll not only fuel independent, data-driven journalism but also gain premium access to exclusive stories and insights you won't find anywhere else.
2. AI-generated “polls” betray the democratic purpose and process of polling
After the election, a co-founder at Aaru justified the company’s overconfident predictions to Semafor by saying, “If you’re going to pay for polling data that gets the wrong result, you might as well use AI and save money.”
This reflects a fundamental misunderstanding of what polling is for. In my book Strength In Numbers, I explain both how polls work and why they are important to the democratic process. One quote I always come back to is from the late political scientist Sidney Verba:
Surveys produce just what democracy is supposed to produce — equal representation of all citizens. The sample survey is rigorously egalitarian; it is designed so that each citizen has an equal chance to participate and an equal voice when participating.
Even if so-called “AI polls” were occasionally “close enough” on the horse race, they would still betray the fundamental democratic purpose of polling.
In Strength In Numbers, I argued that polls give ordinary people a structured way to be heard — a scientific wrapper around the principle of one person, one voice — so our elected leaders, the media, and the public can understand the country in as close to an objective manner as possible. Polls are a civic technology: a way to measure public preferences about the most public and high-profile matters of political life, not manufacture them for the sake of prediction.
Properly conducted, polls mimic the social contract we have all drawn between each other and our government. Swapping in AI “digital twins” for real people ruptures that social contract in several ways.
First, representation suffers. Synthetic respondents are not people; they are statistical pastiches assembled from training data and prompt engineering. That makes them especially prone to flattening or stereotyping minority perspectives and to mischaracterizing small, intersectional groups — the very voices good surveys work hard to include. When those misstatements are later weighted up to represent the country, the errors compound, yielding confident but illusory precision. AI systems also stereotype partisans, underestimating the amount of disagreement Democrats or Republicans may have with their party leader.
Second, uncertainty and humility disappear. The answer options “don’t know” and “it depends” in polling are not nuisances; they are valuable signals about attentiveness, ambivalence, and agenda salience. Humans express that uncertainty all the time, especially on new or low-information issues. LLMs, by design, tend to produce fluent, definitive answers even when a real person would hesitate. The result is brittle, polarized outputs that overstate conviction and understate how preferences actually form and evolve in civic life.
Finally, accountability and legitimacy erode. With real samples, researchers can document recruitment, response rates, weighting choices, instrument wording, and nonresponse bias — and those factors can be audited when polls err, as they did in the 2016 and 2020 presidential elections. With synthetic samples, the chain of inference runs through a proprietary model the pollster does not control, turning “opinion” into whatever the model (and the prompts a user inputs into it) happens to emit. That is not a public instrument; it’s a black box. Polling draws its legitimacy from participation itself: people are asked and heard. Replacing them with simulations severs that link and invites distrust at a moment when democratic institutions can least afford it.
Good polling helps citizens understand their fellow citizens, reveals the complexity of public opinion, and provides accountability mechanisms for elected officials. It's a tool for self-knowledge and democracy.
AI polling undermines all of these functions, even when its numbers happen to be accurate. When AI-predicted responses replace human voices, we lose the authentic diversity of public opinion. When algorithms smooth over the genuine uncertainty and ambivalence that characterize real human political thinking, we get a false picture of democratic consensus. When minority voices are systematically distorted by models trained on majority perspectives, we perpetuate rather than illuminate existing biases.
The Asimov story resonates not because electronic democracy is technically impossible, but because it represents the ultimate abstraction of democratic participation — reducing the messy, contradictory, beautifully human process of collective decision-making to algorithmic optimization.
Want more essays like this about politics and technology? Or to stay in the loop with fresh data-driven political coverage? Support our independent news website where we believe that the views of the American people — and we mean real people — matter.
3. The cost of synthetic shortcuts
Proponents of AI polling argue that even imperfect synthetic data is better than no data at all, especially given rising survey costs. But this misses the point entirely. Bad polling isn't just useless — it's actively harmful to democratic discourse.
The pro-AI crowd will also often retort that large language models are just another way of modeling polling data, similar to statistical techniques such as multilevel regression and post-stratification (MRP) that transform national surveys into estimates of opinion for states or congressional districts. This is a false equivalence; statistical models of actual polling data assess the real-world relationship between the demographic traits of real humans and their real responses, then project those relationships onto other people based on their demographics as recorded by the U.S. Census Bureau.
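To illustrate the difference, here is a simplified sketch of that post-stratification logic: fit a model on real survey responses, then project it onto census cell counts. An ordinary logistic regression stands in for a full multilevel model here, and all of the column names are illustrative rather than taken from any actual MRP codebase.

```python
# A simplified sketch of post-stratification: learn the relationship
# between real demographics and real responses, then project it onto
# census cell counts. Ordinary logistic regression stands in for a
# full multilevel model; column names are illustrative.

import pandas as pd
from sklearn.linear_model import LogisticRegression


def poststratified_estimate(survey: pd.DataFrame,
                            census_cells: pd.DataFrame,
                            features: list,
                            outcome: str) -> float:
    """Estimate population support by weighting cell predictions by cell size."""
    # Learn the real relationship between demographics and responses.
    X = pd.get_dummies(survey[features], dtype=float)
    model = LogisticRegression(max_iter=1000).fit(X, survey[outcome])

    # Project those relationships onto census demographic cells.
    cells = pd.get_dummies(census_cells[features], dtype=float)
    cells = cells.reindex(columns=X.columns, fill_value=0.0)
    cell_probs = model.predict_proba(cells)[:, 1]

    # Weight each cell's prediction by its share of the population.
    weights = census_cells["n_people"] / census_cells["n_people"].sum()
    return float((cell_probs * weights).sum())
```

The key point of the sketch is that every number the model learns comes from real people’s recorded answers; the LLM approach has no analogous grounding step.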
In contrast, when news organizations report, for example, that "Americans support" or "oppose" a policy based on synthetic polling, they're not informing the public about genuine public opinion. They're amplifying the outputs of an algorithm that has no knowledge of real traits and real responses, only the correlations between words (like “Black” and “Trump”) as they appear in the LLM’s training data. When researchers use synthetic data to study political behavior, they're building social science on a foundation of computational guesswork.
4. Polling and the future
To be clear, I am not opposed to all applications of AI in survey research. In my own polls, for example, I’m enthusiastic about the use of AI tools to recode interviewees’ “open-ended” responses (long text answers to questions such as “How are you feeling about the state of the country today? Feel free to answer in your own words, with whatever comes to mind”) into digestible text. That is the type of thing pollsters used to rely on intern labor for.
But letting AI replace the people we aim to represent is a wrong turn.
There are a number of ways to bring down the costs of survey research today, before turning to AI replacements. For example, pollsters can use representative online panels, recruiting hard-to-reach voters and keeping them around for multiple surveys. Organizations can also band together, cooperating on surveys and sharing the data. Creative solutions like these honor both accuracy and the democratic ideal that every person deserves an equal voice.
Real polling, whatever its limitations, captures something essential about human political thinking that no algorithm can replicate: the genuine voice of citizens grappling with choices that matter to their lives. When we replace those voices with synthetic approximations, we don't just lose statistical accuracy. We lose the human element that makes democracy meaningful.
The path forward isn't to abandon polling in the face of rising costs and methodological challenges. It's to preserve and strengthen the tools that help us hear from real people about real choices. In an age of increasing algorithmic mediation, that human connection becomes more valuable, not less.
Asimov's electronic democracy was dystopian not because the machines were inaccurate, but because they eliminated the need for human participation altogether. AI polling takes us down the same troubling path — substituting computational efficiency for democratic legitimacy, synthetic convenience for human authenticity and complexity.
We still have time to choose a different direction. But only if we remember that in polling, as in democracy, there's no substitute for listening to real people.[1]
[1] I do not argue that LLM-powered prediction machines cannot be useful for some tasks. For example, new research shows that if you give the LLM much richer data on a real person — including their real responses to other real poll questions — the correlations embedded in the model can be used to impute missing variables in a survey. You might be able, for example, to impute with some accuracy how someone feels about the One Big Beautiful Bill if you already know their sex, age, race, education, income, approval of Donald Trump, vote in the 2024 election, feelings about Medicaid and SNAP benefits, and so on. That could be helpful — though existing statistical methods can already do this!
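As an illustration of the kind of imputation that off-the-shelf statistical tools already handle, here is a brief sketch using scikit-learn's IterativeImputer. The DataFrame and column names are hypothetical, and responses are assumed to be numerically coded.

```python
# A sketch of the kind of survey imputation existing statistical tools
# already handle, using scikit-learn's IterativeImputer. The DataFrame
# and column names are hypothetical; responses are assumed to be
# numerically coded (e.g., 1 = approve, 0 = disapprove).

import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer


def impute_missing_item(responses: pd.DataFrame) -> pd.DataFrame:
    """Fill in a partially missing survey item from respondents' other answers."""
    imputer = IterativeImputer(random_state=0)
    filled = imputer.fit_transform(responses)
    return pd.DataFrame(filled, columns=responses.columns, index=responses.index)


# Example: a hypothetical `obbb_support` column with missing values would be
# imputed from known columns such as `trump_approval`, `vote_2024`, and
# `medicaid_view`.
# completed = impute_missing_item(survey_responses)
```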
The only reason this seems like it would be helpful is to get an immediate answer to some new question or situation that, most likely, a 24-hour news organization wants to know about right away rather than waiting a week or so for a real poll. But that is exactly the kind of question an AI model is likely to fail on, because it falls outside its training data. At that point, why not just be a journalist, look up polling on similar issues, and report on that? It has worked that way for a long time, and I don’t think anyone has been actively complaining about it.