
Master Tableau Approach


In an earlier post, titled Master Tableau Concepts, I summarized and added color to a talk by Joe Mako in which he highlighted the advanced concepts a master practitioner uses to achieve flow by working with, rather than against, Tableau. It was an excellent talk, which he also gave as a repeat at TC14.

In this new post, Master Tableau Approach, I'll summarize & add color to a TC14 session from Bethany Lyons and Alan Eldridge, titled Jedi Level Calculation Techniques. Today's topic is the thought process a master practitioner uses when deciding which technique to employ to answer a complex question with Tableau.

How to Think

While introducing his own TC14 session, Dashboarding at the Edge of Impossible, Dustin Smith described the Tableau Jedi not as the person who knows every tip and trick, but as the person who knows how best to approach the problem.

In a similar vein, Bethany and Alan emphasized during their session that while every problem on the planet may be solvable with a Table Calc, doing so can prove quite painful. The aim of their two-hour lab was not to teach the step-by-step of each exercise, but to convey an understanding of why those steps were appropriate, given the question.

The decision tree below, which originates from Bethany & Alan’s session, is noteworthy because it codifies this way of thinking. And a similar model was referenced during another TC14 session: Mix and Match your Data, presented by Alex Woodcock.

So a common aim of four separate Jedi-level sessions at TC14 was to teach not tactics, but strategy. And what we see in the model below is the emergence of a codified circuitry by which your Master Tableau brain can be wired. This circuitry enables you to choose how and when to take the optimal course of action.

The Model

The model I present below is simply a beautification of the original presented by Bethany and Alan during their session. They (and likely others) have done the difficult work of forging it. So my own contribution is nothing more than a few added pleasantries to ease the communication. All the credit belongs to them!

It is in essence a decision tree of four questions, the answers to which determine the optimal approach for answering a complex question with Tableau.

The first three approaches are a Table Calc, a Calculated Field, or a Set. Should your question lead yet further down the rabbit hole, the last two approaches are a Self Blend or an Aggregated Self Blend. The model is designed to steer you away from Table Calcs in those cases when they aren't required.

As described by Bethany and Alan: Table Calcs, while radically empowering, don't always scale (they're performed by the front end, at run time), and they can dramatically increase the complexity of your solution.

The Four Questions

Let’s walk through them one question at a time and look at a few examples. When offered, my examples differ from those used by Bethany and Alan. I’ve done this mostly to ensure that I understand the scenario myself, before teaching it to others.

Question #1: Can this row's calculation be performed without requiring results from the rows around it?

If the answer is “No", then you need a Table Calc.

To understand this decision, consider all the Table Calcs you know: all of them. They each have one thing in common: every Table Calc performs its operation as a final step in the computation pipeline, and each one operates on the table of data composed from the pills on your canvas.

So, whether you've built a crosstab or a visualization, an underlying canvas table is always formed according to the pill arrangement. A Table Calc is computed by the front-end processor (after data has returned from the database), and it uses the values from the current partition of this canvas table as input for its calculation.

Consider a few examples and you’ll get the idea: RUNNING_SUM, WINDOW_AVG, INDEX, RANK, LOOKUP, LAST, etc. They all depend on the surrounding values from the canvas table partition to compute correctly.

For this reason, if a single row of your desired calculation can compute correctly without first requiring results from the rows around it as input, then a Table Calc is not the best approach for the task at hand. And if your calculation does require the nearby rows as input, then a Table Calc is your option.
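To make the dependency concrete, here's a minimal sketch, assuming the familiar Superstore [Sales] field: a running sum for the current row cannot be known without first visiting the rows before it in the partition.

// Table Calc: each row's result depends on the prior rows in the
// partition, computed by the front-end after the query returns
RUNNING_SUM(SUM([Sales]))

Compare that with plain SUM([Sales]), which each row can answer for itself, with no knowledge of its neighbors.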

Sometimes the canvas table contents can be difficult to imagine in their tabular form, because they're obscured behind the marks of your vis. I find the easiest way to look "under the vis" is to press CTRL-A on the vis itself, then COPY & PASTE the entire canvas table into Excel (special thanks to Joe Mako for this!). Other folks prefer to 'Duplicate as a Crosstab' and rearrange as necessary. Either way, you're able to see the underlying canvas table contents & decide how best to build your Table Calcs from there.

Question #2: Is the level of detail in the view the same as the level I need to answer the question?

If the answer is “Yes”, then you need a Calculated Field.

This also makes sense when you consider all of the Calculated Fields you know. What do they all have in common? They each work at the level of granularity defined by the view.

So with this question, we’re asking: Is the level of granularity that came from the database the very same that I require for my calculation? And if the answer is “Yes”, then a Calculated Field will serve you well.

Of note here: even if your calculated field performs an aggregation, like SUM([Sales]) or MIN([Vacation Hours]), etc, the granularity of the view will not be changed by the presence of this calculation. Why? Because Dimension pills determine granularity by partitioning the view; Measure pills paint marks into those partitions; and an aggregation is always a Measure.
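As a quick illustration, here's a sketch of a Calculated Field, again assuming the standard Superstore fields: a profit ratio that computes at whatever granularity the dimension pills in the view happen to define.

// Calculated Field: aggregates within each partition defined by the
// dimensions in the view; adding it does not change the view's granularity
SUM([Profit]) / SUM([Sales])

Drop this onto a view of Category and it computes one ratio per category; add Region and it computes one ratio per category-region combination. Either way, the pill is a Measure.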

Question #3: Is the calculation binary, with only two possible results (IN / OUT)?

If the answer is “Yes”, then you can build a Conditional Set based on the Grouping Dimension.

Here's where things begin to get interesting. The Jedi approach in this example uses a Dynamic Set within a Calculated Field to answer the business question. Bethany & Alan’s example revolved around suppressing Employee Names for all those departments whose number of employees was less than a parameterized threshold.

IF [Threshold Departments] THEN [Employee Name]
ELSE "Suppressed"
END

Here, [Threshold Departments] is a dynamic set, based on the formula: COUNTD([Employee Name]) > [Parameters].[Employee Threshold]

In my own example, we use this approach to dynamically suppress the detail for SUM([Sales]) into a single value called "ALL" for those Departments with fewer than the threshold number of Items.
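In calculation form, my version looks like the sketch below. The set name [Threshold Items] and the dimension [Item] are my own hypothetical labels; the pattern is identical to Bethany & Alan's.

IF [Threshold Items] THEN [Item]
ELSE "ALL"
END

// Where [Threshold Items] is a dynamic set on Department, with the
// condition: COUNTD([Item]) >= [Parameters].[Item Threshold]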

The approach is the same: a dynamic set based on the grouping dimension is used inside of a subsequent Calculated Field. And the Calculated Field says:

“IF [IN the SET], then show me the detail. IF [OUT of the SET], then summarize or suppress.”

Remember, the important take-away is not the fancy trick. The emphasis here is on the decision to build upon a dynamic set, because the business question includes a binary IN / OUT scenario.

Question #4: Do you need to aggregate a dimension to get the required result?

Having made it this far down the decision tree, we've exhausted the options of a single data source. The last two answers require a data blend and are best described by way of example.

If the answer is “No”, then you can Self Blend Using the Group-By Dimensions as Linking Fields.

Imagine a bar chart of Sales by Category, each bar labeled with a Percent of Total. This is fairly straightforward, and based on the first question in the decision model, you'll build your Percent of Total from a Quick Table Calculation. If this is all your vis needs to do, then it's a Quick Table Calc & you're done!
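Under the covers, that quick table calc is equivalent to a formula along these lines (a sketch, again assuming [Sales]):

// Percent of Total: this row's share of the entire partition,
// so it needs every surrounding row as input
SUM([Sales]) / TOTAL(SUM([Sales]))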

But what happens when you want to color those same bars by region? It breaks the Table Calc. What you want is for your Percent of Total to continue to perform at the original level of detail (by Category), even after an extra level of detail (Region) has been added into the view. The broken vis looks something like this now:

So how can we color the bars by Region, while continuing to label Percent of Total by Category only? The answer is a "self blend": blend to a duplicate of the original data source, using the group-by dimension (in this case Category) as the only linking field.

When blending two data sources, Tableau will by default link on all of the matching fields used in the view. In other words, because Category & Region are both used in the view, Tableau will by default use them both as linking fields during the self blend.

However, this chart requires the Percent of Total to perform at the Category level only. So we can intervene against the default & choose Category as the only linking field when the Table Calc is brought in from the secondary data source.
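In calculation form, the label brought in from the secondary copy looks something like the sketch below, where Superstore (copy) is my hypothetical name for the duplicate source (Tableau prefixes secondary fields with their data source name):

// Percent of Total computed against the duplicate source, which is
// linked on Category only and therefore blind to Region in the view
SUM([Superstore (copy)].[Sales]) / TOTAL(SUM([Superstore (copy)].[Sales]))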

Again, the message here is not the vis or how it’s constructed. It's that we're able to perform a calculation using a duplicate source, and perform that calc at a level of granularity that differs from the granularity of the view. Moreover, we’ve reached this approach thanks to the decision model presented by Bethany and Alan.

So to summarize, our location in the model is:

  1. Yes, it is required to know the results of the surrounding rows to calculate the Percent of Total. → Table Calc ✓
  2. No, the view is not at the level of detail that we need to answer the question: the view is at Category + Region, while the Percent of Total needs to perform at Category only. → Calculated Field ✗
  3. No, the answer cannot be reached by way of a binary IN / OUT result. → Dynamic Set ✗
  4. No, it is not required to pre-aggregate any of the dimensions to reach the desired answer. → Self Blend Using Group-By Dimensions as the Linking Fields ✓

For the last question, notice it wasn't necessary to pre-aggregate any dimensions to achieve the desired result. In fact, if anything, we actually wanted to avoid aggregating our label calculation at Region. This approach can be thought of as a "disaggregated blend".

A Quick Note on Blend Performance

  • Performance in this case is great because we’re blending on only the 17 distinct values for Category
  • Blend performance is affected by the cardinality of the linking fields

Final Scenario

Now let’s look at one final example to explain the last decision in the model. If the answer to Question #4 is “Yes”, then you can Self Blend After Aggregating the Dimension.

In this new scenario, the business wants to analyze profits over time by repeat customers vs. customers who are new. To perform the analysis, one begins from the question, “What is the first date of purchase, per customer?”.

For this question, MIN(Order Date) works just fine when Customer defines the level of detail for the view. However, once you replace the Customer level of detail with an axis for MONTH(Order Date), then things begin to get wonky.

Here’s the problem: we want for MIN(Order Date) to be a Dimension. But, because MIN(Order Date) includes an aggregation, it is treated by Tableau as a Measure.

This is an important subtlety that the Tableau Master understands: calculations that include an aggregation are treated as a Measure. (It is for this reason that Table Calc pills are always treated as a Measure: because Table Calcs always include an aggregation).
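A small pair of sketches makes the distinction visible, using [Order Date]:

// Contains an aggregation, so Tableau treats the pill as a Measure
MIN([Order Date])

// Row-level, with no aggregation, so the pill can behave as a Dimension
DATETRUNC('month', [Order Date])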

If you’re finding yourself even just a bit uncertain here, now will be an excellent time to go back & review The Four Pill Types portion of the Master Tableau Concepts post.

In short, your choice of whether a pill is treated as a Dimension or a Measure will drive how computations are performed.

  • Dimension = GROUPBY
  • Measure = Aggregation

And sometimes, as with this example, the computation does the driving. Thanks to the aggregation, we have no choice but for MIN(Order Date) to be treated as a Measure. And thus, here we are at the bottom of Alan and Bethany's decision model, and the challenge is to GROUPBY MIN(Order Date).

Thankfully, there is an answer. According to the decision model, the answer is to "self blend after aggregating the necessary dimension". So in the context of this example, this means that we can duplicate the data source & blend in a secondary extract which contains only one record per customer, with their first date of purchase.
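The payoff is that, inside the aggregated copy, the answer to "first date of purchase, per customer" is now an ordinary row-level field rather than an aggregation, so it can finally behave as a Dimension. A sketch, with [First Purchase Date] as my hypothetical name for that field:

// In the aggregated extract (one row per customer), what used to be
// MIN([Order Date]) is now a plain row-level field
[First Purchase Date]

// So it can be used as a Dimension, e.g. truncated to the month for
// linking against MONTH(Order Date) in the primary source
DATETRUNC('month', [First Purchase Date])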

There are a few alternative approaches for how to get this done. In the lab at TC14, Alan and Bethany showed us how it's possible to duplicate the data source & create the desired data extract using the options "Hide All Unused Fields" and "Aggregate Data for Visible Dimensions". Myself, I find it easier & less confusing to build the desired crosstab and copy / paste those values into the workbook as a Clipboard Data Source. And a third option would be to export the desired crosstab to an Excel or CSV file. Whichever you choose, the method seems largely a question of preference.

A Quick Note on Blend Performance

  • Performance in this case is great because we’re blending on only 48 values of Month(Order Date)
  • As an exercise: establish the Data Relationships to blend upon the 1,427 distinct values of MDY(Order Date) & watch how terrible the performance becomes!
  • Blend performance is affected by the cardinality of the linking fields

In Summary

As the Tableau Masters emphasized during their Jedi-level sessions at TC14: the Master Practitioner achieves flow with Tableau by way of a relational understanding of complex and interrelated concepts. She knows how and when to take the best approach when answering a data question. In this post, I've walked through a decision model designed for doing just that.

Since I've not provided the step-by-step instructions for each example vis, you might like to recreate them on your own as practice. This supporting workbook can be used as a reference & for reverse engineering. Just let me know if you have any questions!

Jonathan's Caveat :)

Shortly after this was published, Jonathan Drummey reminded me in a chat to add an important caveat to all of this!

In a complex environment, there are bound to be additional factors (many of which are not mentioned in this post), each with the potential to challenge a four-question decision model.

Like all models, this one is a simplification and is incomplete. All of which means, well, it works great right up until the time that it doesn't :). A notion summarized well by the famous quote from George Box: "Essentially, all models are wrong, but some are useful."

George Box, (1919-2013): a wit, a kind man and a statistician

Thankfully, in all such cases the Tableau Community is awesome! It has a Community Forum and Zen Masters!

To end the chat, we laughed & agreed the complete model would look something akin to the wiring diagram of an aircraft carrier. Or, if you're Jonathan Drummey, then you maintain the working draft of such a model & a glimpse of it looks like this =)



References

  1. "Master Tableau Concepts", Keith Helfrich, Red Headed Step Data, June 22, 2014:
    http://redheadedstepdata.io/master-tableau-concepts
  2. "Flowing With Data", Joe Mako, Featured Session, Tableau Conference, September 10, 2014:
    https://tc14.tableausoftware.com/schedule/wednesday#session-1043
  3. "Jedi Calculation Techniques", Bethany Lyons and Alan Eldridge, Tableau Conference, September 9, 2014:
    https://tc14.tableausoftware.com/schedule/tuesday#session-1095
  4. "Unexpected Interactivity: Dashboarding at the Edge of Impossible", Dustin Smith, Tableau Conference, September 10, 2014:
    https://tc14.tableausoftware.com/schedule/wednesday#session-951
  5. "Mix and Match Your Data: Advanced Data Blending", Alex Woodcock, Tableau Conference, September 9, 2014:
    https://tc14.tableausoftware.com/schedule/tuesday#session-1052
  6. "Master Tableau Concepts", Keith Helfrich, Red Headed Step Data, June 22, 2014:
    http://redheadedstepdata.io/master-tableau-concepts
  7. "Master Tableau Approach - Workbook", Examples, Keith Helfrich, Tableau Public, September 19, 2014:
    https://public.tableausoftware.com/views/master-tableau-approach/A3_DynamicSetExample
  8. "Jonathan Drummey", Zen Master, September 19, 2014: http://drawingwithnumbers.artisart.org/
  9. "George Box, (1919-2013): a wit, a kind man and a statistician", by Julian Champkin, Significance Magazine, 2013:
    http://www.significancemagazine.org/details/webexclusive/4566511/George-Box-1919-2013-a-wit-a-kind-man-and-a-statistician.html
  10. "Tableau Community Forums", September 19, 2014:
    http://community.tableausoftware.com/community/forums
  11. "2014 Zen Masters", Photo by KK Molugu, September 11, 2014:
    https://twitter.com/KKMolugu/status/510175720484507648
  12. "Complete Model -- Almost", Photo by Jonathan Drummey, via Twitter, September 19, 2014:
    https://twitter.com/jonathandrummey/status/512934333808930816/photo/1