11 Comments
Philip

"AIs eliminate that uncertainty because they are fundamentally not trained to mimic the behavior of a human brain; they’re trained to predict the next word in a sequence of text."

---

This made me chuckle a bit, though it's accurate (with a nuance).

The idea of a neural network was originally inspired by the design of a human brain's neurons. It's not unlike saying a computer's memory is named for human memory, or that computer files derive from physical file folders. In all those cases, the novel concept diverged quickly from its namesake. No one wastes a second pondering how to write on a computer file in Sharpie.

When I first learned about NNs, I expected to find something modeled closely after the human brain. (Apparently there is promising research in this area, actually, but that's not the established meaning of NN.) Turns out that was a major step toward anthropomorphizing artificial "intelligence."

Linda Aldrich

Wow, talk about propagating AI hallucinations with devastating effects! Case in point.

I know a computer scientist who is extremely wary of large language model AI and sees a dot-com-style bust coming in the industry, where the cart is put before the horse in its development, leading to unsound and overly risky investments. The polling companies you reference have a high chance of fitting into this category, but not before muddying the waters tremendously.

Esker

I find it baffling that anyone would even _consider_ using an LLM in this way. Okay, I guess I'm not baffled that anyone would consider it, but I'm baffled that polling professionals who are in principle trained in statistics would consider it.

In addition to all of the reasons you write about here, the statistical relationships an LLM encodes are fundamentally not suited to produce probabilistic claims about anything except sequences of words (that is to say, not about people). The unit of analysis is fundamentally wrong: LLMs are trained on the level of word and phrase and document pattern occurrences, not on the level of people. It should be blindingly obvious that the frequency with which an opinion is expressed in the documents that make up LLMs' training data (more or less "on the internet") is a terrible proxy for how common that opinion is in the population -- but using an LLM to synthesize opinions is doing exactly that!

If a dataset oversamples some people (which LLM training sets absolutely do!), a good statistical model can account for that by representing "token level" parameters distinctly from "person level" parameters. Your uncertainty about a given data source goes down as you collect more data from that source (to the extent that the additional data approaches conditional independence of the previous data, conditioning on the "person-level" traits you're trying to estimate), but to estimate correlations among those traits, you need to look at variability (and covariance) _across_ those person-level traits. In other words, you need your model to explicitly represent sources of variability, and not just throw all the data you have into a big pool as LLM training does.
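To make the unit-of-analysis point concrete, here's a minimal sketch (my own toy numbers, nothing from the article) contrasting a pooled per-document estimate, which is roughly what LLM training frequencies reflect, with a per-person estimate when one person posts far more than the others:

```python
# Toy illustration: per-document frequency vs. per-person opinion share
# when posting volume is uneven. All data below are made up.
from collections import defaultdict

# (person_id, holds_opinion) pairs; one row per document/post.
# Person "a" holds the opinion and posts heavily; "b" and "c" do not.
docs = [("a", 1)] * 80 + [("b", 0)] * 10 + [("c", 0)] * 10

# Pooled, "token level" estimate: fraction of documents expressing the opinion.
pooled = sum(x for _, x in docs) / len(docs)            # 0.80

# "Person level" estimate: average within each person first, then across people.
by_person = defaultdict(list)
for pid, x in docs:
    by_person[pid].append(x)
person_means = [sum(v) / len(v) for v in by_person.values()]
person_level = sum(person_means) / len(person_means)    # ~0.33

print(f"pooled: {pooled:.2f}, person-level: {person_level:.2f}")
```

The pooled figure says 80% when only one person in three holds the opinion. A model that explicitly represents person-level parameters can correct for uneven posting volume; pooled next-token training has no such structure to exploit.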

LLMs aren't just black boxes whose statistical regularities are too complex to inspect; they're also capturing statistical regularities at the _wrong level of granularity_, and as a result the correlations they encode between traits and opinions are essentially meaningless for the purpose of predicting one from the other.

Cameron Lopez

good article

Barry G. Hall

As a scientist I always told my students and postdocs that bad data are worse than no data at all. Your excellent article perfectly illustrates this principle. I hope that the resources are available to permit the replication of this study.

Even doctors who use AI to write up summaries of recorded patient visits tell me that the AI-generated summaries are only about 80-85% accurate. Even with the necessary editing, those AI summaries save doctors hours every day, but review by a human is an absolute necessity. I hope that use of AI to summarise results of open-ended responses will similarly be reviewed by interviewers.

Keep up the good work.

Esker

And summarizing the text in the prompt is one of the _most_ sensible use cases for an LLM!

JDM

What troubles me most is the inherent paradox of this application of AI. The only way the AI can accurately generate opinion data is with robust prior data, which can only come from polling. That is likely something we could get accurate enough in the short term, but future predictive value requires the pattern to stay static, which we know it doesn't. So then we'd have to build a model, which we can already do. So... 🤷‍♂️

Bob Fertik

>“If you’re going to pay for polling data that gets the wrong result, you might as well use AI and save money.”<

This is both remarkably cynical - and perfectly in synch with the Age of Bullshit that Trump has given us.

Tiffany

The only reason this seems like it would be helpful is to get an immediate answer to some new question or situation that, most likely, some 24-hour news organization wants to know about, rather than wait a week or so. But that's exactly what an AI model is likely to fail at, because it's outside the training data. At that point, why not just be a journalist and look up some similar polling issues and report on those? It has worked that way for a long time, and I don't think anyone has been actively complaining about it.

G. Elliott Morris

Or send some reporters out to the streets!

The Coke Brothers

This is scary as sh*t. People don't realize how scary it is. Artificial Intelligence cannot capture Natural Stupidity. In what (virtual) world would anything that is remotely intelligent choose donald trump to lead America?
