When working with DAX measures, it is critical to think in terms of context. Those of you who have taken our course of DAX Fundamentals, or attended one of our conference sessions on ‘Mastering DAX’ (we keep being asked to deliver that talk), know the emphasis we put on understanding context and being able to transform contexts. Indeed, in a simple chart like the one below, each column constitutes its own specific context.
In reality, the visual does not retrieve the results for each data point separately but for everything at once. Which raises the question: what is ‘everything’? How many data points are being calculated? Understanding this can help solve performance problems in your Power BI reports, but it provides new opportunities as well. In this post, we dive into this topic; in another blog post I will give a practical example (I will need two post for that, probably).
Every visual behaves fundamentally like a pivot table in Excel so it helps to start there. As most Excel users know, traditionally, a pivot table in Excel is based on a table of data in a worksheet (nowadays, of course, it could also be based on an embedded Power Pivot model or an external data source). When you put a column of the source table, say Country, in the pivot table as a row or column label, you will see all unique Country values appear in the pivot table. When you add another column, say Sales, to the values of the pivot table, you will typically not see all Countries anymore, but only those Countries that have Sales. How does Excel know which Countries to show? Easy: it calculates the Sales aggregation for each Country, and only shows the Countries for which the calculation returns a non-blank result.
A Power BI visual works the same: when you only add a label, all unique values from the column used are shown. After adding a value from a measure, all unique values are shown that have a non-blank measure result. But still, the calculation is done for all the unique values in the label column.
When you add more measures, you will see that all labels are visible for which at least one measure returns a result. There are some exceptions to this rule, for instance when using measures in a tooltip.
Now, what happens when you add more columns as labels in the visual? In Excel, things are easy: all unique combinations of values in the label columns are shown, and when using a value field, only the combinations that have a value remain (but the calculation is done for all combinations). When working with a Power BI model (or Power Pivot in Excel), it depends:
- For columns that come from the same table, the calculation is done for unique combinations of values that are found in the table.
- For columns from different tables, the calculation is done for all combinations of unique values from the columns.
As an example, consider the model snippet below, where we have a Customer table with a relationship to an Employee table, meaning each Customer has a sales representative who is an Employee. Let’s say we have 1000 Customers and 500 Employees.
When you would use two columns Customer and Representative from the Customer table, the total number of calculations per measure is 1000. When you use a column from Customer and a column from Employee, the total number of calculations per measure is 500.000. Even while most of these combinations results in an empty selection of customers in the query context and would not be shown in the visual.
By only considering combinations of values that are found in the table, Power BI takes a shortcut to reduce the number of calculations needed. This feature is called AutoExist.
Now, you may have questions at this point – let me address some of them:
Should we avoid storing data in separate tables altogether then?
Well, not necessarily – Power BI does not have trouble calculating a well-designed measure many, many times, and there are other reasons to combine or to not combine data in one table. However, it is important to be aware of what combinations of label values are considered by Power BI. Specifically, it is a bad idea to try to mimic pivot tables or tables that you may be familiar with from Excel. I am absolutely no fan of the Table visual in Power BI for this specific reason: people tend to add more and more columns to the visual, unconsciously increasing the number of calculations needed into billions. Combined with less efficient measures, you will run into performance problems at some point.
Why is AutoExist not applied to columns from separate tables as well?
Hmm, that would make sense, wouldn’t it? The truth is, when you know how AutoExist works and when it is applied, you can use this to calculate results that would not be possible otherwise. In another blog post, I will give you an example of this.
I added two columns from different tables to a visual, but still only see valid combinations, not all of them?
This is an interesting one. Power BI tries to only show results that make sense – this means that when you don’t add a measure to the visual, Power BI will secretly add something that helps determine which combinations of label values to show. It does this by looking at which values logically go together through the relationships in the model (like a Customer and some attribute of their representative). Even if you use two columns that are not directly related, but are both linked to the same fact table, Power BI can do this: all combinations of values that have associated facts in the model, are shown in the visual. This goes wrong, however, when the tables are linked to more than one fact table: it becomes ambiguous which values go together and the visual will return an error instead.
As a final remark here, there is another difference between pivot tables in Excel and a Power BI visual, namely, Power BI visuals interact with each other. In fact, the AutoExist behavior not only applies when you have multiple columns of the same table in one visual, but also when these columns are in other, connected visuals, like a slicer. In other words, it is about which filters apply to the visual.