All models are wrong

Sunset over the Arno in Florence

There’s a famous aphorism attributed to George Box: “All models are wrong, but some are useful.” Since every model is an abstraction of the system being modeled, there are bound to be aspects of that system the model gets wrong.

I’ve been thinking about this a lot in the context of LLM hallucinations, whereby the model gives confident responses that are plausible but incorrect.

Hallucinations are likely an inherent feature of all generative AI, because the model is just that: a model. Not every piece of training data is stored in it. Rather, the model learns relationships between words based on the training data. For a (very) detailed explanation, see this excellent post by Stephen Wolfram.

If that’s not making sense, here’s another way to think about it: if the model is smaller than all of its training data, the pigeonhole principle shows that the model must be lossy. There is no possible way that the 4.2 GB Stable Diffusion model holds every detail of all 2.3 billion images it was trained on.
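To make that concrete, here’s a quick back-of-the-envelope calculation using the figures above (both sizes are approximate):

```python
# Rough arithmetic: how much of each training image could the model
# possibly retain? Both figures are the approximate ones quoted above.
model_bytes = 4.2e9        # ~4.2 GB Stable Diffusion checkpoint
training_images = 2.3e9    # ~2.3 billion training images

bytes_per_image = model_bytes / training_images
print(f"~{bytes_per_image:.1f} bytes of model weights per training image")
# ~1.8 bytes per image: nowhere near enough to store the images
# themselves, so the model must be a lossy compression of its data.
```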

That doesn’t mean that generative AI is not useful, just that we must be cautious about how we use it. Consider two questions we might ask:

  1. What is carbon sequestration?
  2. How many active oil wells are there in Minnesota?

The first is a qualitative question. It has an effectively infinite number of correct answers, which we might rank from most to least useful. The usefulness of any given answer is in the eye of the beholder: the best answer for a ten-year-old is not the best answer for a domain expert. Generative AI does great with these types of questions.

The second is a quantitative question. There is exactly one correct answer for any point in time. This is where we need to be really careful: generative AI might give a confident-sounding answer that is totally incorrect.

Does that mean we can’t use it for the latter type of question? No, but such answers are better provided by knowledge graphs and relational databases, which do a much better job of storing facts and their provenance. The catch is that you need to know how to query them. That’s where generative AI can help: writing the query, in SQL or SPARQL, for us from a natural language prompt.
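As a rough sketch of what that division of labor could look like (the schema, the `ask_llm` helper, and the prompt wording are all hypothetical, not a real pipeline):

```python
import sqlite3

# A hypothetical table of well records, with provenance for each fact.
SCHEMA = """
CREATE TABLE oil_wells (
    id     INTEGER PRIMARY KEY,
    state  TEXT,
    status TEXT,   -- e.g. 'active', 'plugged'
    source TEXT    -- where this record came from
);
"""

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to whatever LLM you use."""
    raise NotImplementedError

def answer(question: str, conn: sqlite3.Connection) -> list:
    # The LLM only writes the query; the database supplies the facts.
    sql = ask_llm(
        f"Given this SQLite schema:\n{SCHEMA}\n"
        f"Write a single SQL query that answers: {question}\n"
        "Return only the SQL."
    )
    return conn.execute(sql).fetchall()

# answer("How many active oil wells are there in Minnesota?", conn)
# should come back with something like:
#   SELECT COUNT(*) FROM oil_wells
#   WHERE state = 'Minnesota' AND status = 'active';
```

Even if the model phrases the query imperfectly, a wrong query fails loudly or returns rows we can inspect, which is a far better failure mode than a confidently hallucinated number.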

An added bonus of this approach is that we can update the facts without retraining the model. We strike oil in Minneapolis? Update the database, not the model.
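Continuing the hypothetical schema from the sketch above, that update is one ordinary statement, with no retraining involved:

```python
import sqlite3

conn = sqlite3.connect("wells.db")  # the same hypothetical database as above

# We struck oil: record the new well and its provenance with one INSERT.
conn.execute(
    "INSERT INTO oil_wells (state, status, source) VALUES (?, ?, ?)",
    ("Minnesota", "active", "hypothetical state filing"),
)
conn.commit()
```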