This post helps you to understand how the granularity & the shape of your underlying data will affect the visualization work you do in Tableau.
I was recently stumped by a common problem, one for which Jonathan Drummey back in 2012 has started a page to document the many scenarios in which this type of complication can occur.
My particular scenario was Headcount, when given the Arrival & Departure Dates. And Jonathan's collection of similar scenarios is Utilization, Queues, Census, Throughput for Blocks, Floors, Units, Rooms, Offices, etc. - What do you do?
The most interesting thing about each of the scenarios in Jonathan's collection is not their individual solution in isolation. But rather, the underlying pattern behind those solutions: the what they share in common.
And when Joe Mako helps me get through something on a Sunday afternoon, you know the answer is worth sharing!
The number one, most important facet of learning Tableau, and learning from Joe, is to recognize the patterns that recur. By recognizing common patterns when working with data, and by learning the behaviors of Tableau, one learns to reach a flow state with similar encounters in the future, even while the details may vary.
So let’s look at the patterns.
- shift starts & shift ends for employees
- utilization rates with start time & end time
- work orders that arrive, take different amounts of time to process, and are then are completed
- patient records with admit date & discharge date
What each scenario has in common is the shape of the data
One row per incident, with two date columns per row. One date for the beginning & another date for the end
And the type of question being asked is always something like:
Among these observations, compare the difference between the two date values
Tableau Prefers Tidy
In his paper on the subject of tidy data in the Journal of Statistical Software³, Hadley Wickham draws this definition:
- Each variable forms a column
- Each observation forms a row
- Each type of observational unit forms a table
To which, I would add:
4. Related tables share a common column that allows them to be linked
Tableau Prefers Tidy Data. And rightly as an analyst, you also prefer tidy data!
A Discussion of Shape
But wait there. My data is tidy.
For all of these scenarios, every data set has a single column per variable. Each collection of observations is comprised in a single table. And each observation is in a new row.
- The survey data has one new row for every survey response
- The work order data has one new row per work order
- The patient records have one new row per patient
So here we make the distinction..
Lookup vs. Transactional
The data structure, or shape, for each of these scenarios is indeed perfectly tidy: as a lookup table. And the tidy lookup shape of these data sets can easily answer lookup questions, like:
- Does Garrett arrive before Cecilia ?
- Who is allergic to Nuts ?
- How many Gluten Free will arrive on Wednesday ?
When comparing the open & close dates, this lookup shape very easily allows Tableau to calculate the number of open work orders for one specific date. On April 26th, for example.
But the lookup shape does not easily answer these transactional questions across multiple dates:
- On each day, across all dates, how many Gluten Free are coming for dinner ?
- On each day, during the entire month of March, what was the count of open work orders ?
Because the observations begin & end at a varying pace, the lookup data shape doesn't allow for Tableau to easily perform these calculations.
In fact, asking transactional questions of a lookup data shape is nerve wracking!
So the common pattern to each solution in Jonathan's collection of scenarios is to reshape the data to make it transactional.
Some scenarios have solved their problem by reshaping with SQL, some may prefer to choose Alteryx. I've reshaped my headcount data using R.
But it doesn’t matter how. The point is, transactional questions call for tidy transactional data.
One record for each transaction
What’s interesting here is that, by the time Joe could set me straight, I had found a way to answer my question without reshaping the data. That approach required a scaffold & complex table calculations.
So while it may be true, that if you torture yourself for long enough then you can get the lookup data to talk; and while it may be fun to explore what's possible with scaffolding & table calculations, those complex solutions are difficult to maintain. And they aren't very flexible.
Reshaping is Easy
Moreover, if you're working in a spreadsheet, then reshaping your lookup data to be transactional for these scenarios needn't require SQL, or Alteryx, or R.
In this case, making your lookup data transactional is literally as easy as duplicating the data set & inserting a new column.
Half of the duplicated records should now have the
date_type = “begin date”, and half the duplicated records should now have the
date_type = “end date”.
In the work orders example:
transaction_type = "open"and
transaction_type = "close"
It’s that easy.
Important to note! Now you’ve now doubled the number of records in your data set. So, for these lookup questions:
- How many total patients have we treated ever
- How many Gluten Free exist in the data set
You must be absolutely careful to avoid duplicating the value of your aggregate measures.
Each of those old-fashioned lookup questions should still be answered from the original lookup table. Or, perhaps, from your new transactional data, but with a filter to consider only half of the duplicated records.
The great Noah Salvaterra once said,
And the great Joshua Milligan writes,
In the end, this post is all about granularity.
Word Count: 989
- "Headcount, when given the Arrival & Departure Dates", Tableau Community Forum, Jul 19, 2015
- "Utilization, Queues, Census, Throughput for Blocks, Floors, Units, Rooms, Offices, etc. - What do you do?", Tableau Community Forum, Nov 21, 2012
- “Tidy Data”, Hadley Wickham, Journal of Statistical Software, MMMMMM YYYY
- "THE FIRST QUESTION TO ASK”, Joshua Milligan, VizPainter, Jan 13, 2015