In the last post on the graph, we looked at using URIs to identify the subject: the thing we’re discussing.
You’ll recall that RDF models data as a triple: subject, predicate (the attribute of the thing we wish to state something about) and object (the value of the thing).
Like identity, a simplistic view of predicates puts them all in the eye of the beholder. A great example is ‘age’. Nearly everything has an age. For humans, we typically measure age in years. For fruit flies, years lack sufficient granularity: we might choose days. For other things we cannot be precise: the age of a fossil might be described as ‘Late Cretaceous’, and we cannot be more specific than that.
If we use just ‘age’ in our table, we introduce ambiguity:
A human, reading this table, can interpret the age and make reasonable guesses about what we mean based on external world knowledge (perhaps the employee’s age is in hex?). A machine has no such ability. Ask a human to sort these four rows by age and you’ll likely get the right answer. A machine can only sort this data alphanumerically. It has no other insight it can rely on.
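To make the sorting problem concrete, here is a minimal sketch in Python. The four rows and their values are hypothetical stand-ins (they are not the rows from the table above); the only point is that a plain string sort has nothing but character order to go on.

```python
# Hypothetical rows: (thing, age). Values are illustrative assumptions only.
rows = [
    ("employee", "2A"),
    ("fruit fly", "12"),
    ("fossil", "Late Cretaceous"),
    ("child", "7"),
]

# With no knowledge of units or meaning, all a program can do is compare
# the 'age' column as text, character by character.
sorted_rows = sorted(rows, key=lambda row: row[1])

for thing, age in sorted_rows:
    print(f"{thing}: {age}")
```

The result is ordered by the characters in the string, so ‘12’ lands ahead of ‘7’ regardless of what either number is supposed to measure.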
This is how every relational database, Excel, CSV file etc. does it. They give the predicate (column) a name and expect the consumer to make sense of it. Fundamentally, this is a big part of the reason why so much of data science is data engineering: massaging data to get it to ‘fit’ together.
The RDF solution is elegant and simple: just like identity, if we add hierarchy to the predicate, we give ourselves the ability to make more precise statements. Restating the above table, we might get:
|http://nonodename.com/ld/fossil/34||http://nonodename.com/ont/geological/Age||Late Cretaceous|
Notice how each age has a different URI prefix, providing a separate ‘namespace’ for the predicate at the expense of storage volume. Like the subjects, we can choose to make these URIs hyperlinks, giving us a simple way to learn more about the predicate.
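As a rough sketch of why the namespaced predicate helps, the snippet below holds triples as plain Python tuples. Only the fossil row comes from this post; the employee subjects and the `ageInYears` predicate URI are hypothetical names invented for illustration. Because each kind of age has its own predicate, a program can pick out just the values it knows how to interpret.

```python
from collections import defaultdict

GEO_AGE = "http://nonodename.com/ont/geological/Age"      # from the post
HUMAN_AGE = "http://nonodename.com/ont/human/ageInYears"  # hypothetical

# (subject, predicate, object) triples. The employee rows are assumptions.
triples = [
    ("http://nonodename.com/ld/fossil/34", GEO_AGE, "Late Cretaceous"),
    ("http://nonodename.com/ld/employee/1", HUMAN_AGE, "42"),
    ("http://nonodename.com/ld/employee/2", HUMAN_AGE, "9"),
]

# Group objects by predicate so each kind of 'age' is handled on its own terms.
by_predicate = defaultdict(list)
for subject, predicate, obj in triples:
    by_predicate[predicate].append((subject, obj))

# For this predicate we assume the object is a whole number of years,
# so a numeric sort is safe.
human_sorted = sorted(by_predicate[HUMAN_AGE], key=lambda pair: int(pair[1]))
print(human_sorted)
```

The geological ages never reach the `int()` call, because they sit under a different predicate.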
The RDF suite of standards builds on this, giving us ways to describe the permitted range and data type of the object, cardinality, labels, comments and more for each distinct predicate. This is the field of ontology: the description of properties of a subject area and their relations.
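A small sketch of what such a description might look like, again using plain tuples rather than an RDF library. The `rdfs:label`, `rdfs:comment` and `rdfs:range` URIs and the XSD datatype are the real W3C ones; the `ageInYears` predicate and the `conforms` helper are hypothetical. Treating the range as a validation rule is a simplification made for the example.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XSD = "http://www.w3.org/2001/XMLSchema#"
HUMAN_AGE = "http://nonodename.com/ont/human/ageInYears"  # hypothetical

# The predicate is itself a subject we can make statements about.
ontology = [
    (HUMAN_AGE, RDFS + "label", "age in years"),
    (HUMAN_AGE, RDFS + "comment", "Age of a person in whole years."),
    (HUMAN_AGE, RDFS + "range", XSD + "nonNegativeInteger"),
]

def declared_range(predicate):
    """Return the rdfs:range stated for a predicate, or None if absent."""
    for subject, pred, obj in ontology:
        if subject == predicate and pred == RDFS + "range":
            return obj
    return None

def conforms(predicate, value):
    """Rough check (an assumption of this sketch) of a value against the range."""
    if declared_range(predicate) == XSD + "nonNegativeInteger":
        return value.isascii() and value.isdigit()
    return True  # no range we know how to check

print(conforms(HUMAN_AGE, "42"))               # True
print(conforms(HUMAN_AGE, "Late Cretaceous"))  # False
```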