Data normalization

Layers of complexity on the beach at Pornic

When I worked at an information business, we thought a lot about normalization of data. In some sense, if you’re selling data, especially data that you don’t originate, normalization is a core part of your value add, since, by normalizing, you’re making data easier to consume, understand and analyze.

To my mind, there are three types of normalization:

While these concepts seem natural to those whose business is data, to me they’re just as important in data warehouses, data lakes, and business intelligence.

Long gone are the days (if indeed they ever existed) of a single database running a business: an organization’s data is now federated across numerous SaaS systems. If those systems don’t agree on reference data, use the same names to mean different things, or calculate the same measure differently, it becomes much harder to connect the data and answer the hard questions that give a business an edge.
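To make the reference-data problem concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the two "systems" (a CRM and a billing tool), their field names, the sample records and the alias table are all hypothetical, not taken from any real product. It shows two sources that spell the same country differently, and how a shared mapping to one canonical code lets their data line up.

```python
# Hypothetical exports from two SaaS systems; names and values are illustrative only.
crm_accounts = [
    {"account": "Acme", "country": "UK"},
    {"account": "Globex", "country": "United States"},
]
billing_invoices = [
    {"customer": "Acme", "country_code": "GB", "amount": 1200.0},
    {"customer": "Globex", "country_code": "US", "amount": 800.0},
    {"customer": "Initech", "country_code": "usa", "amount": 500.0},
]

# Assumed reference-data mapping: each local spelling maps to one canonical code.
COUNTRY_ALIASES = {
    "uk": "GB", "gb": "GB", "united kingdom": "GB",
    "us": "US", "usa": "US", "united states": "US",
}


def canonical_country(raw: str) -> str:
    """Return the canonical code for a raw country value, or fail loudly."""
    key = raw.strip().lower()
    if key not in COUNTRY_ALIASES:
        # Surfacing unknown values beats silently creating a new category.
        raise ValueError(f"unmapped country value: {raw!r}")
    return COUNTRY_ALIASES[key]


def revenue_by_country(invoices: list[dict]) -> dict[str, float]:
    """Sum invoice amounts per canonical country code."""
    totals: dict[str, float] = {}
    for invoice in invoices:
        code = canonical_country(invoice["country_code"])
        totals[code] = totals.get(code, 0.0) + invoice["amount"]
    return totals


def accounts_by_country(accounts: list[dict]) -> dict[str, list[str]]:
    """Group CRM account names per canonical country code."""
    groups: dict[str, list[str]] = {}
    for account in accounts:
        code = canonical_country(account["country"])
        groups.setdefault(code, []).append(account["account"])
    return groups


if __name__ == "__main__":
    print(revenue_by_country(billing_invoices))
    print(accounts_by_country(crm_accounts))
```

Without the shared mapping, "UK" in one system and "GB" in the other would look like two different countries, and any report joining accounts to revenue would quietly split or drop rows.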

While normalization is without doubt critical, it’s not without pitfalls: a substantial one being abstraction cost. That’s a post for another day.