This post builds upon the theme of designing a performant data architecture for your high-volume solutions in Tableau.
One core performance concept is that good design considers the entire solution stack. If you fail to design for performance at all vertical levels, then the worst performing layer will make the solution slow. A train is only as fast as the slowest car. And worse, if various layers have design problems, then your train likely isn’t moving at all.
We must consider the entire vertical solution, together as a holistic system, from the top to the bottom. And this design investment is best made at the outset. Focusing performance efforts on only a single layer, or returning to a poorly performing system in hindsight in search of "one thing" to fix, is insufficient.
Of the various layers in the typical solution stack, this post is focused on two: User Interface Design and Semantic Data Architecture.
Yes, as a UI designer in Tableau, you are also a Data Architect!
As with any "big problem", the solution to good performance on large data volumes is to break that big problem up into smaller pieces.
Sure, the entire thing may be too large to tackle all at once. But individually the smaller pieces are each manageable on their own. As with all big challenges in life, this is also true in data design!
My previous post, "Advanced Menu as Dynamic Parameter", introduced the concept of building a Guided Analysis along with an innovative solution to multi-select, cross-data source filtering in Tableau.
Here we begin with a high level overview of the Guided Analysis, exploring what it actually looks like in terms of performant design.
I suggested in the advanced menu post that a typical Tableau dashboard built on large data volumes will likely follow Shneiderman's mantra [4]:
"Overview first, zoom and filter, then details-on-demand."
In this way, the Guided Analysis becomes a Yellow Brick Road: one that your audience can follow along, to explore in search of the "Emerald City" within their data.
Especially on large data volumes, for good performance the overview and zoom & filter views should always select from a pre-aggregated data source. Summarization requires computation. And computation takes time.
The granularity of each data source is tailored to the views it will serve, to avoid summarization at runtime.
If a view renders four bars, there is no good reason to build it from a data source with 400 million records. Kate Morris and Dan Cory have demonstrated a performant Tableau dashboard built on three separate data sources (marked with red numbers in their example) [5].
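To make the idea of one data source per view concrete, here is a minimal sketch in plain Python. It is not how Tableau builds extracts; the records, field names, and the `pre_aggregate` helper are all hypothetical, and simply show the same detail data summarized ahead of time at three different grains.

```python
from collections import defaultdict

# Hypothetical transaction-level records (the deepest data source).
transactions = [
    {"region": "Americas", "category": "Furniture", "order_id": 1, "sales": 120.0},
    {"region": "Americas", "category": "Technology", "order_id": 2, "sales": 300.0},
    {"region": "EMEA", "category": "Furniture", "order_id": 3, "sales": 80.0},
    {"region": "EMEA", "category": "Furniture", "order_id": 4, "sales": 50.0},
    {"region": "APAC", "category": "Technology", "order_id": 5, "sales": 200.0},
]


def pre_aggregate(rows, keys, measure="sales"):
    """Summarize rows once, ahead of time, at the grain given by keys."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[measure]
    return dict(totals)


# One data source per stage of the guided analysis.
overview = pre_aggregate(transactions, ["region"])                 # overview first
zoom_filter = pre_aggregate(transactions, ["region", "category"])  # zoom and filter
detail = transactions                                              # details on demand

print(len(overview), len(zoom_filter), len(detail))
```

The overview view now reads one row per region instead of summarizing every transaction each time the dashboard renders.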
This Works Because
As each subsequent data source becomes deeper, the queries against it become more and more narrow. So, even on large data volumes, the dashboard performs because you’re always asking the broad and expansive questions of a shallow data source. And you’re asking only very narrow questions of the deeply granular data.
Combined with an optimized columnar database, these narrow questions of deep data can still perform very, very well! This is something we’ll expand upon in the future.
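The broad-versus-narrow pattern above can be sketched as two small functions. This is only an illustration under assumed names: `SUMMARY`, `DETAIL`, `overview`, and `details_on_demand` are hypothetical, and a real dashboard would express the same thing through its data sources and filters rather than Python.

```python
# Hypothetical data sources: a shallow summary and deep transaction rows.
SUMMARY = {"Americas": 420.0, "EMEA": 130.0, "APAC": 200.0}
DETAIL = [
    {"region": "Americas", "order_id": 1, "sales": 120.0},
    {"region": "Americas", "order_id": 2, "sales": 300.0},
    {"region": "EMEA", "order_id": 3, "sales": 80.0},
    {"region": "EMEA", "order_id": 4, "sales": 50.0},
    {"region": "APAC", "order_id": 5, "sales": 200.0},
]


def overview():
    """Broad question: answered from the shallow source, no filter needed."""
    return SUMMARY


def details_on_demand(region, min_sales=0.0):
    """Narrow question: the deep source is only ever read behind a filter."""
    if region not in SUMMARY:
        raise ValueError(f"unknown region: {region!r}")
    return [
        row for row in DETAIL
        if row["region"] == region and row["sales"] >= min_sales
    ]


print(overview())
print(details_on_demand("Americas", min_sales=200.0))
```

Note that `details_on_demand` has no way to ask for everything at once: the deep data is reachable only after the audience has already zoomed and filtered.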
To weave in another recent suggestion, "An Open Letter to the Wall of Data", it is usually only in the final "drill to detail" portion of Shneiderman's dashboard that an Excel-like cross-tab of rows and columns becomes appropriate. Not before.
In the earlier stages of the guided analysis ("Overview First" and "Zoom & Filter"), rendering summary comparisons visually leverages the pre-cognitive processing powers in the human brain. This allows your audience to "respond to what they see", without having to think about it.
Then, towards the end, only after they have zoomed and filtered, only then is a cross-tab of hard facts perhaps appropriate. These are the emeralds they were searching for!
So now we understand why building a Guided Analysis is relevant to performant data design. Let’s move on to Logical Partitioning.
Logical partitioning reduces search effort by grouping like items together, so the things we’re not searching for don’t get in the way.
We do this all the time, because it works. We logically partition our homes to keep the living room separate from the dining room. We logically partition our cities to keep the Industrial zones separate from Residential & Commercial areas. We logically partition files into folders, photos into albums. Even this blog post is logically partitioned!
Just as with the Guided Analysis, Logical Partitioning is another tactic for breaking that big problem up into smaller, more manageable pieces. And just as in other parts of our lives, logical partitioning also plays an important role in performant data design.
Let’s say our original data set has ~1 billion records. That’s a lot.
Depending on your infrastructure, even a million records could perform slowly! Ten years ago, half a million records was huge. The point is: good design occurs at every layer of the solution stack and good data architecture is key to capturing the best performance from your hardware.
So, in this example, the number is a billion. And we will logically partition those billion records by Region.
We Do This Because
Search takes time. And on a big data set, even the best Guided Analysis with pre-aggregated data sources may not be sufficient.
Yet when the two techniques are combined, the bite-sized pieces become much more responsive. Building a summary & detail data source for each of three distinct regions eliminates a huge amount of runtime search and runtime summarization.
Query times are faster because the data sets are smaller. Rendering is faster because we've saved cross-tabs until the end. Now the extracts are smaller. And not only do they finish faster, but the extracts can also run in parallel.
We've gone from a single solution on a billion records for everything, to three solutions each approximately one-third the size. And each with a summary data source for the summary views. Each rendering only the data required, one step at a time.
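As a rough sketch of the combined approach, the snippet below partitions some hypothetical records by region and then builds a summary and a detail set for each partition in parallel. The `partition_by` and `build_extract` helpers are stand-ins invented for illustration; they are not Tableau's extract mechanism.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical transaction-level records.
transactions = [
    {"region": "Americas", "category": "Furniture", "sales": 120.0},
    {"region": "Americas", "category": "Technology", "sales": 300.0},
    {"region": "EMEA", "category": "Furniture", "sales": 80.0},
    {"region": "EMEA", "category": "Furniture", "sales": 50.0},
    {"region": "APAC", "category": "Technology", "sales": 200.0},
]


def partition_by(rows, key):
    """Group like rows together so each partition can be handled on its own."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return dict(parts)


def build_extract(region, rows):
    """Stand-in for one regional extract: a summary plus its own detail rows."""
    summary = defaultdict(float)
    for row in rows:
        summary[row["category"]] += row["sales"]
    return region, {"summary": dict(summary), "detail": rows}


partitions = partition_by(transactions, "region")

# The partitions share no rows, so each extract can be built independently.
with ThreadPoolExecutor() as pool:
    extracts = dict(pool.map(lambda item: build_extract(*item), partitions.items()))

for region, extract in extracts.items():
    print(region, extract["summary"], len(extract["detail"]))
```

Because no partition depends on another, the per-region builds are free to run side by side, which is the property that lets the real extracts refresh in parallel.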
Enter the world-wide executives. These are high-level folks, whose decisions span across all regions. Their decisions span across all product categories, across all lines of business.
Now that each region is physically separated, have we lost the ability for these high-level individuals to compare across the logically partitioned data? Of course not!
These high-level decision makers don’t ever compare transaction level detail between regions. Rather, they only need a path to get to those transaction records when there is an exception.
The world-wide dashboard is therefore built on a pre-summarized data source at the world-wide level. Dashboard actions guide those consumers along their own Yellow Brick Road, tailored for them, down into the regional views, and eventually down to the record level detail.
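A small sketch of that roll-up, assuming the regional summaries already exist: the world-wide source is just one more level of pre-aggregation, and the drill path leads from a world-wide mark back into its region. The data and the `world_wide_summary` and `drill_down` helpers are hypothetical; in Tableau the drill itself would be a dashboard action.

```python
# Hypothetical regional summaries, one per logical partition.
regional_summaries = {
    "Americas": {"Furniture": 120.0, "Technology": 300.0},
    "EMEA": {"Furniture": 130.0},
    "APAC": {"Technology": 200.0},
}


def world_wide_summary(regional):
    """Roll the regional summaries up one more level, to one row per region."""
    return {
        region: sum(by_category.values())
        for region, by_category in regional.items()
    }


def drill_down(regional, region):
    """Stand-in for a dashboard action: follow one mark into its regional view."""
    return regional[region]


world = world_wide_summary(regional_summaries)

# The executive spots the exception at the top level, then follows the path down.
lowest = min(world, key=world.get)
print(world)
print(lowest, drill_down(regional_summaries, lowest))
```

The world-wide view never touches transaction rows; it only needs enough context to send the executive into the right regional view.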
World-wide executives are busy people. In reality, they likely won't drill down to the gory transaction-level details very often. But we certainly can, and should, build this for them! Imagine how happy that executive will be, late at night with a problem to solve, when they are able to find the detailed answer they're looking for:
- in a visually intuitive way
- with fast response time
- from the highest level summary down to the most granular line-item detail
This is good data design. And it is your job to build it for them.
Good performance is always the culmination of many, many design decisions. Good design must occur at every level of the vertical solution stack.
This post highlights two data design techniques for performance:
- Guided Analysis, at the User Interface Design layer
- Logical Partitioning, at the Semantic Data Architecture layer
Both techniques significantly improve performance by breaking up a large problem into smaller pieces. Combined with the Advanced Menu as Dynamic Parameter, it is possible to achieve both good performance and multi-select, cross-data source filtering in Tableau.
Bad data design on expensive infrastructure is no solution at all. So, as the volume of data in our lives continues to grow, good data design techniques like these are imperative for maximizing hardware dollars.
- [1] "Vertical Technology Stack", Pyramid image built in Tableau by Noah Salvaterra, DataBlick, February 26, 2016
- [2] "Advanced Menu as Dynamic Parameter", Keith Helfrich, Red Headed Step Data, January 13, 2016
- [3] "Guided Analysis", Joshua N. Milligan, Learning Tableau, O'Reilly Media, April 27, 2015, p. 192
- [4] "The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations", Ben Shneiderman, Department of Computer Science, Human-Computer Interaction Laboratory, and Institute for Systems Research, University of Maryland, September 1996
- [5] "TURBO CHARGING YOUR DASHBOARDS FOR PERFORMANCE", Kate Morris + Dan Cory, Tableau Conference 2014, Recorded Session, September 2014
- [6] "Open Letter to the Wall of Data", Keith Helfrich, Red Headed Step Data, October 7, 2015