#PowerQuery – Joining to a Slowly Changing Dimension

I blogged previously about how to look up a surrogate key for a slowly changing dimension using DAX. This post is about how to do the same thing using Power Query.

I'm going to start off with the same 2 tables that I used in the previous blog post. One is a fact table and the other is my BU (Business Unit) table. I started by clicking on each table of data in Excel and choosing the "From Table" data source option.

image

And for each table I unchecked the "Load to worksheet" option and then clicked apply & save.

SNAGHTML179a06bf

Once I had done that for both tables my Power Query tool pane looked like the following, I have two queries defined, but neither of them is loading any data directly.

image

Now that we have our two source queries we want to use the Merge option in Power Query to join them together

image

The Merge option in Power Query is how you join matching rows in two tables together. I chose "Fact" as my first table as for each row in the Fact I want to find the matching BU_Key from the BU table.

image

You'll notice that at this point we can only choose columns for an equality match, there are no options for us to test that the Date in Fact is between the StartDate and EndDate in the BU table.

When we click on OK we end up with a result like the following which has our original rows from the Fact table and then a column called "NewColumn" which contains the 1 or more rows from the BU table which matched on the BU_Code column.

image

If we click on the little double arrow button in the header of the NewColumn column you get the following options:

image

We can choose to either expand or aggregate the rows in the nested table. Because we want to lookup the BU_Key we tick that as well as the StartDate and EndDate columns as we will need those later.

That gives us a result like the following:

image

Now we are getting close, but we still have one major issue. We now have 16 rows instead of our original 8 because each row in the Fact table is matching to multiple rows in the BU table as we have not done any filtering based on the start and end dates yet. Clicking on the filter button at the top of the "Date" column it initially looks like doing a date filter and choosing the "Between" option would be a solution.

image

But that only gives us the option to select fixed dates values from our data, not references to another column.

image

One solution would be to put in fixed dates and then manually edit the filter in the formula bar, but I wanted to see how far I could get without resorting to doing any advanced editing. The solution I came up with involved some minor code, but it can be done without manually editing the formula.

What I ended up doing was inserting a new custom column which we can then use to filter out the rows we don't want. So from the "Insert" tab on the ribbon I chose the "Insert Custom Column" option:

image

Then I entered the following expression to create a new column called "DateFilter" which will return a value of True if the Date from the current Fact row was between the StartDate and EndDate from the BU table.

= ( ( [Date] >= [NewColumn.StartDate] ) and ( [Date] <= [NewColumn.EndDate] ) )

image

That gives us the following result:

image

Then to filter down to just the "True" values we just need to click on the dropdown in the header of the "DateFilter" column and select the "TRUE" value in our filter.

image

We are now back to our original 8 rows.

image

Then we just need to do a little clean up. By holding the Ctrl key while clicking on the green columns above we can remove those columns. Then I just renamed "NewColumn.BU_Key" to BU_Key and clicked on the "Date" column and set it's type as date (which somehow did not get correctly get detected) we now end up with our finished table which we could choose to load into Excel or directly into a Power Pivot model.

image

Below is the Power Query Formula that was created as a result of the above steps. (this is just the merge query excluding the 2 source queries for "BU" and "Fact")

let
    Source = Table.NestedJoin(Fact,{"BU_Code"},BU,{"BU_Code"},"NewColumn"),
    #"Expand NewColumn" = Table.ExpandTableColumn(Source
            , "NewColumn"
            , {"BU_Key", "StartDate", "EndDate"}
            , {"NewColumn.BU_Key", "NewColumn.StartDate", "NewColumn.EndDate"}),
    InsertedCustom = Table.AddColumn(
            #"Expand NewColumn", "DateFilter"
            , each ( ( [Date] >= [NewColumn.StartDate] ) and ( [Date] <= [NewColumn.EndDate] ) )),
    FilteredRows = Table.SelectRows(InsertedCustom, each ([DateFilter] = true)),
    RemovedColumns = Table.RemoveColumns(
            FilteredRows,{"BU_Code", "NewColumn.StartDate", "NewColumn.EndDate", "DateFilter"}),
    RenamedColumns = Table.RenameColumns(RemovedColumns,{{"NewColumn.BU_Key", "BU_Key"}})
in
    RenamedColumns

If you want to manually tweak things you can go into the Advanced Editor and manually edit the formula to combine all three queries into one and you can also do away with the custom column and just do the between filtering inline. The following query shows the single query solution.

let
    Fact1 = Excel.CurrentWorkbook(){[Name="Fact"]}[Content],
    BU1 = Excel.CurrentWorkbook(){[Name="BU"]}[Content],
    Join = Table.NestedJoin(Fact1,{"BU_Code"},BU1,{"BU_Code"},"NewColumn"),
    #"Expand NewColumn" = Table.ExpandTableColumn(Join
            , "NewColumn"
            , {"BU_Key", "StartDate", "EndDate"}
            , {"NewColumn.BU_Key", "NewColumn.StartDate", "NewColumn.EndDate"}),
    FilteredRows = Table.SelectRows(#"Expand NewColumn"
            , each ( ( [Date] >= [NewColumn.StartDate] ) and ( [Date] <= [NewColumn.EndDate] ) )),

    RemovedColumns = Table.RemoveColumns(
            FilteredRows,{"BU_Code", "NewColumn.StartDate", "NewColumn.EndDate"}),
    RenamedColumns = Table.RenameColumns(RemovedColumns,{{"NewColumn.BU_Key", "BU_Key"}}),
    ChangedType = Table.TransformColumnTypes(RenamedColumns,{{"Date", type datetime}})
in
    ChangedType

If you are curious you can download the workbook I used for this blog post from my OneDrive: http://1drv.ms/1mux5O5

Print | posted on Monday, May 5, 2014 9:53 PM