Modeling a type two dimension join in PowerBI
Posted
A common way to model slowly changing dimensions is with the type two technique. The linked Wikipedia article describes this well: instead of overwriting the dimension with the new value (and potentially invalidating the relationship in older facts) one creates a new row and uses from/to effective dates to model when the attributes were correct.
In the SQL world this works well, since we can include the date of the fact and the join with the dimension table.
However, the data modelling feature in Power BI permits relationships on one column only, resulting in a many to many relationship. For instance, consider two tables: one of facts about mileage driven and one about owners:
Fact table
Registration | Date | Miles |
---|---|---|
VEU008 | 1-1-13 | 12 |
VEU008 | 2-1-13 | 10 |
RER959 | 1-1-13 | 11 |
RER959 | 2-1-13 | 9 |
Dimension table
Registration | Owner | From | To |
---|---|---|---|
VEU008 | Dan | 1-1-09 | 1-5-13 |
VEU008 | Bob | 1-6-13 | |
RER959 | Bob | 1-1-06 |
A naive join on the registration column in Power BI will result in a forced many to many relationship since PowerBI will see that the value VEU008
has a cardinality of greater than 1 in both tables. That becomes problematic with aggregation functions.
Depending on your goal, there are multiple ways to resolve this:
If you only care about current values, you could filter the table (using Table.SelectRows
after import to remove all rows that are not current (e.g. have a ’to’ date < now). However, depend on the goal, that may create erroneous aggregates: do that for this model and we would be attributing 22 miles to Bob, not 10.
Another approach would be to create a surrogate key on the dimension table and then enrich the fact table with that key before you ingest the data into Power BI.
But what if the two tables come from different sources? Then, we must make the join in Power BI. Here’s one way to do it. The trick is to join the two tables using Table.Join
and then use Table.SelectRows
to remove the rows that aren’t relevant:
For instance:
|
|
From a performance perspective this isn’t ideal but it’s a clean way to model a complex relationship.