11 Comments
Philip

"AIs eliminate that uncertainty because they are fundamentally not trained to mimic the behavior of a human brain; they’re trained to predict the next word in a sequence of text."

---

This made me chuckle a bit, though it's accurate (with a nuance).

The idea of a neural network was originally inspired by the design of a human brain's neurons. It's not unlike saying a computer's memory is named for human memory, or that computer files derive from physical file folders. In all those cases, the novel concept diverged quickly from its namesake. No one wastes a second pondering how to write on a computer file in Sharpie.

When I first learned about NNs, I expected to find something modeled closely after the human brain. (Apparently there is promising research in this area, actually, but that's not the established meaning of NN.) Turns out that was a major step toward anthropomorphizing artificial "intelligence."

Linda Aldrich

Wow, talk about propagating AI hallucinations with devastating effects! Case in point.

I know a computer scientist who is extremely wary of large language model AI and sees a dot-com-style bust coming in the industry, where the cart is put before the horse in its development, leading to unsound and overly risky investments. The polling companies you reference have a high chance of fitting into this category, but not before muddying the waters tremendously.

Esker

I find it baffling that anyone would even _consider_ using an LLM in this way. Okay, I guess I'm not baffled that anyone would consider it, but I'm baffled that polling professionals who are in principle trained in statistics would consider it.

In addition to all of the reasons you write about here, the statistical relationships an LLM encodes are fundamentally not suited to produce probabilistic claims about anything except sequences of words (that is to say, not about people). The unit of analysis is fundamentally wrong: LLMs are trained on the level of word and phrase and document pattern occurrences, not on the level of people. It should be blindingly obvious that the frequency with which an opinion is expressed in the documents that make up LLMs' training data (more or less "on the internet") is a terrible proxy for how common that opinion is in the population -- but using an LLM to synthesize opinions is doing exactly that!

If a dataset oversamples some people (which LLM training sets absolutely do!), a good statistical model can account for that by representing "token level" parameters distinctly from "person level" parameters. Your uncertainty about a given data source goes down as you collect more data from that source (to the extent that the additional data approaches conditional independence of the previous data, conditioning on the "person-level" traits you're trying to estimate), but to estimate correlations among those traits, you need to look at variability (and covariance) _across_ those person-level traits. In other words, you need your model to explicitly represent sources of variability, and not just throw all the data you have into a big pool as LLM training does.
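To make the unit-of-analysis point concrete, here's a minimal sketch (my own toy numbers, nothing from the article) contrasting a pooled per-document estimate, which is roughly what LLM training frequencies reflect, with a per-person estimate when one person posts far more than the others:

```python
# Toy illustration: per-document frequency vs. per-person opinion share
# when posting volume is uneven. All data below are made up.
from collections import defaultdict

# (person_id, holds_opinion) pairs; one row per document/post.
# Person "a" holds the opinion and posts heavily; "b" and "c" do not.
docs = [("a", 1)] * 80 + [("b", 0)] * 10 + [("c", 0)] * 10

# Pooled, "token level" estimate: fraction of documents expressing the opinion.
pooled = sum(x for _, x in docs) / len(docs)            # 0.80

# "Person level" estimate: average within each person first, then across people.
by_person = defaultdict(list)
for pid, x in docs:
    by_person[pid].append(x)
person_means = [sum(v) / len(v) for v in by_person.values()]
person_level = sum(person_means) / len(person_means)    # ~0.33

print(f"pooled: {pooled:.2f}, person-level: {person_level:.2f}")
```

The pooled figure says 80% when only one person in three holds the opinion. A model that explicitly represents person-level parameters can correct for uneven posting volume; pooled next-token training has no such structure to exploit.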

LLMs aren't just black boxes whose statistical regularities are too complex to inspect; they're also capturing statistical regularities at the _wrong level of granularity_, and as a result the correlations they encode between traits and opinions are essentially meaningless for the purpose of predicting one from the other.

Cameron Lopez

good article

Barry G. Hall

As a scientist I always told my students and postdocs that bad data are worse than no data at all. Your excellent article perfectly illustrates this principle. I hope that the resources are available to permit the replication of this study.

Even doctors who use AI to write up summaries of recorded patient visits tell me that the AI-generated summaries are only about 80-85% accurate. Even with the necessary editing, those AI summaries save doctors hours every day, but review by a human is an absolute necessity. I hope that use of AI to summarise results of open-ended responses will similarly be reviewed by interviewers.

Keep up the good work.

Esker

And summarizing the text in the prompt is one of the _most_ sensible use cases for an LLM!

JDM

What troubles me most is the inherent paradox of this application of AI. The only way the AI can accurately generate opinion data is with robust prior data, which can only come from polling. That is likely something we could get accurate enough in the short term, but future predictive value requires the pattern to stay static, which we know it doesn't. So then we'd have to build a model, which we can already do. So... 🤷‍♂️

Bob Fertik

>“If you’re going to pay for polling data that gets the wrong result, you might as well use AI and save money.”<

This is both remarkably cynical - and perfectly in synch with the Age of Bullshit that Trump has given us.

Tiffany

The only reason this seems like it would be helpful is to get an immediate answer to some new question or situation that, most likely, some 24-hour news organization wants to know about, rather than wait a week or so. But that's exactly what an AI model is likely to fail at, because it's outside the training data. At that point, why not just be a journalist and look up some similar polling issues and report on those? It has worked that way for a long time, and I don't think anyone has been actively complaining about it.

G. Elliott Morris

Or send some reporters out to the streets!

The Coke Brothers

This is scary as sh*t. People don't realize how scary it is. Artificial Intelligence cannot capture Natural Stupidity. In what (virtual) world would anything that is remotely intelligent choose donald trump to lead America?
