In a performance-optimized Data Mart, especially one utilizing Massively Parallel Processing (MPP) and Columnar Storage such as Vertica, the creation of temporary tables by a front-end tool like Tableau can be devastating to your response times. Temp tables have long been loathed by database administrators across the board for their negative impact on performance. But the impact is all the more pronounced in Vertica, precisely because the DBMS is so highly optimized for performance.
In this post I summarize some of my findings with respect to the use of temp tables by Tableau. This information is possibly incomplete or imprecise, and is certain to decay in accuracy over time. Still, the broad strokes and considerations should serve as useful context for your own investigations.
One of Three Strategies
To make use of the results from earlier queries during blending, grouping, and filtering operations (such as add to context), Tableau will generally choose between one of three strategies. If the first strategy doesn’t work out, then Tableau quickly tries the next one. If that fails, then Tableau tries the last one.
- temp table in the database
- sub-query in the database
- data engine .tde
Note that the sequence in which these strategies are attempted also varies from data source to data source.
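To make the difference between the first two strategies concrete, here is a minimal sketch using an in-memory SQLite database as a stand-in for the backend (the actual SQL Tableau emits is version- and driver-specific, and the table and column names here are invented for illustration):

```python
import sqlite3

# In-memory SQLite as a stand-in; the real target would be Vertica.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("east", 100.0), ("west", 250.0), ("east", 75.0)])

# Strategy 1: temp table in the database. The intermediate result
# (e.g. a context filter) is materialized, queried, then dropped --
# a write/read/delete round trip on the database side.
con.execute("CREATE TEMP TABLE filtered AS "
            "SELECT * FROM sales WHERE region = 'east'")
temp_total = con.execute("SELECT SUM(amount) FROM filtered").fetchone()[0]
con.execute("DROP TABLE filtered")

# Strategy 2: sub-query in the database. The same intermediate result
# is expressed inline, so the database only ever reads.
sub_total = con.execute(
    "SELECT SUM(amount) FROM "
    "(SELECT * FROM sales WHERE region = 'east')").fetchone()[0]

print(temp_total, sub_total)  # both 175.0
```

Both strategies return the same answer; the difference is purely in the I/O pattern they impose on the backend, which is exactly what matters in Vertica, as we’ll see below.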
Arguably, Tableau would always prefer to offload as much resource-intensive heavy lifting onto the backend as possible. So, while the sequence of the first two strategies may toggle depending on the data source, Tableau will generally resort to creating a temporary extract, and performing any joins or blends against that .tde using the Data Engine, only as a last resort.
Step 1 is Upgrade
In their TC14 session, Turbocharging Your Dashboards for Performance, Kate Morris and Dan Cory emphasize that the #1 best thing you can do for performance is to upgrade to the latest version of Tableau.
This post is in many ways an example of that message with supporting detail. The Tableau engineers are continuously improving the product’s approach to performance. When you delay the upgrade, you delay the benefit of that engineering investment. And if you fail to undo your previous workarounds after the upgrade, then you may never see the benefit at all. Read on.
Why are Temp Tables Atrocious in Vertica?
The Vertica DBMS is designed for INSERT ONLY disk operations (no updates, no deletes), and for MPP read performance. Leveraged correctly, these technologies deliver millisecond response times on billions of records.
The use of temp tables flies in the face of these optimizations for a number of reasons:
INSERT ONLY ≈ READ ONLY
In a high-performance Vertica data mart, non-read disk operations are limited to BULK INSERTS at specific times of day. In other words, with the exception of specific windows (usually at off hours), the disk activity in a high-performance Vertica cluster is READ ONLY.
Temp tables are contrarian in this I/O environment because the temp data is INSERTED (at query time), and then DELETED. Boo!
Think: “petting the kitty backwards”
MPP = Data Access Alternatives
The Vertica MPP cluster optimizes reads across many nodes, each of which is interchangeable.
Temp tables in this environment are contrarian because their data is written onto a SINGLE NODE of the cluster. All queries to it require access to that one node’s resources.
Think: "You’re doing it wrong"
In the log files, or perhaps in an error message, any reference to a schema named "TableauTemp" is a sure sign of the Tableau Server Data Engine. The name "TableauTemp" is never used for any object created in the backend database. In other words, because your database for some reason won’t allow the more preferable strategies, Tableau has been forced to create a temporary .tde extract. And it places that extract into a schema on the Tableau Server called "TableauTemp".
The error message “DataSourceException: Tableau Data Engine Error: 4: no such schema (TableauTemp)” has been observed in a number of situations, and there are apparently multiple reasons it can happen. Basically, the error message is telling you that the Data Engine has deleted the temporary .tde file, but for some reason something still needs it.
Combine this with the fact that resource-intensive heavy lifting is better performed in the database, and a simple rule of thumb emerges:
If you’re seeing the “No Such Schema (TableauTemp)” error message, or even if you’re simply seeing “TableauTemp” in your log files and want better performance, then your first move is to seek ways to avoid being forced into the Data Engine to begin with.
Goldilocks: Just Right
By now, if you’ve read the above carefully, then you’re likely beginning to reach some conclusions about the three strategies that Tableau Server might employ depending upon your data source. With Vertica, only one of them is “just right”.
- yikes!: temp table in the database
- just right: sub-query in the database
- yikes!: data engine .tde (TableauTemp)
And the Tableau engineers agree with you! Starting in version 8.2.2, Tableau changed the preferred sequence for Vertica source systems to use sub-queries by default. Prior to 8.2.2, Tableau would build a temp table in Vertica by default and then fall back to the Data Engine.
Now only a handful of grouping actions still require temp tables in Vertica; sub-queries are used by default almost everywhere, and the Data Engine remains only as a strategy of last resort.
This means that if yours is plain vanilla Tableau on Vertica, then it’s likely you can simply upgrade and marvel at the faster performance.
But if you had previously customized your Vertica connections, then you still need to pay attention. Perhaps you published your data source to the server, which (in the past, at least) would invoke the Data Engine, or you fine-tuned the data connection with a customized .tds file. Those customizations, made to avoid temp tables in Vertica by intentionally diverting Tableau’s strategy decision onto the Data Engine, are likely still taking effect after the upgrade. As a result, you may still be forcing the Data Engine "yikes!", even after the default behavior has evolved to sub-query "just right".
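If you want to audit a saved .tds file for such leftovers, recall that a .tds is XML and connection customizations appear as customization elements inside it. Here is a minimal sketch; the capability names follow the "Customizing and Tuning ODBC Connections" article cited below, but verify them against the documentation for your own Tableau version before relying on them:

```python
import xml.etree.ElementTree as ET

# Capability overrides that steer Tableau's temp-table behavior.
# These names are taken from Tableau's ODBC customization docs;
# treat the exact set as an assumption to verify for your version.
SUSPECTS = {"CAP_CREATE_TEMP_TABLES", "CAP_SELECT_INTO"}

def find_temp_table_overrides(tds_xml):
    """Return {name: value} for any suspect capability overrides."""
    root = ET.fromstring(tds_xml)
    return {c.get("name"): c.get("value")
            for c in root.iter("customization")
            if c.get("name") in SUSPECTS}
```

Any override this turns up in a pre-upgrade .tds is a candidate for removal, so the new sub-query default can take effect.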
- "Data Mart", Wikipedia, January 18, 2015
- "Scale-Out MPP Architecture", Vertica.com, January 18, 2015
- "Columnar Storage & Execution", Vertica.com, January 18, 2015
- "Turbocharging Your Dashboards for Performance", Kate Morris and Dan Cory, Tableau Customer Conference, September 8 - 12, 2014
- "Customizing and Tuning ODBC Connections", Tableau Knowledge Base, October 22, 2014