Modeling a type two dimension join in PowerBI

Posted Jan 31, 2021

A common way to model slowly changing dimensions is with the type two technique. The linked Wikipedia article describes this well: instead of overwriting the dimension with the new value (and potentially invalidating the relationship in older facts) one creates a new row and uses from/to effective dates to model when the attributes were correct.

In the SQL world this works well, since we can include the date of the fact and the join with the dimension table.

However, the data modelling feature in Power BI permits relationships on one column only, resulting in a many to many relationship. For instance, consider two tables: one of facts about mileage driven and one about owners:

Fact table

Registration	Date	Miles
VEU008	1-1-13	12
VEU008	2-1-13	10
RER959	1-1-13	11
RER959	2-1-13	9

Dimension table

Registration	Owner	From	To
VEU008	Dan	1-1-09	1-5-13
VEU008	Bob	1-6-13
RER959	Bob	1-1-06

A naive join on the registration column in Power BI will result in a forced many to many relationship since PowerBI will see that the value VEU008 has a cardinality of greater than 1 in both tables. That becomes problematic with aggregation functions.

Depending on your goal, there are multiple ways to resolve this:

If you only care about current values, you could filter the table (using Table.SelectRows after import to remove all rows that are not current (e.g. have a ’to’ date < now). However, depend on the goal, that may create erroneous aggregates: do that for this model and we would be attributing 22 miles to Bob, not 10.

Another approach would be to create a surrogate key on the dimension table and then enrich the fact table with that key before you ingest the data into Power BI.

But what if the two tables come from different sources? Then, we must make the join in Power BI. Here’s one way to do it. The trick is to join the two tables using Table.Join and then use Table.SelectRows to remove the rows that aren’t relevant:

For instance:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19


let
    Source = Table.Join(
    Table.FromRecords({
        [Registration = "VEU008", When = #date(2013,1,1), Miles = 12],
        [Registration = "VEU008", When = #date(2013,2,1), Miles = 10],
        [Registration = "RER959", When = #date(2013,1,1), Miles = 11],
        [Registration = "RER959", When = #date(2013,2,1), Miles = 9]
    }),
    "Registration",
    Table.FromRecords({
        [Registration = "VEU008", Owner = "Dan", From = #date(2009,1,1),To = #date(2013,1,5)],
        [Registration = "VEU008", Owner = "Bob", From = #date(2013,1,6),To = null],
        [Registration = "RER959", Owner = "Bob", From = #date(2006,1,1),To = null]
    }),
    "Registration"),
    Filtered = Table.SelectRows(Source,each [From] <= [When] 
      and ([To] >= [When] or [To] = null))
in
    Filtered

From a performance perspective this isn’t ideal but it’s a clean way to model a complex relationship.