This project details Twitter mentions from four years of the annual Tableau Conference in a collection of 45 interactive network graphs. It is published in close collaboration with Chris DeMartini, who is also presenting a curated collection of his beautiful hive plots from the same data.
You can find our two pieces on Tableau Public at these links:
Bringing It Together
My interest in the analysis of network graphs was first piqued while studying in the Stanford Online MOOC Social and Economic Networks: Models and Analysis. A graduate-level course intensive in math and theory, it was challenging, and it also left me wanting real-world application of the concepts I had learned.
Bringing together my recent studies in R, Alteryx and Tableau, this project is that application.
If public data from Twitter is relatively benign, then consider the power of enabling visual exploration of other, more highly valued network data sets. Here is a great example:
Once online, our every movement, every click, sent or received email, our every activity produces a vast amount of invisible traces. These traces, once collected, put together and analysed, can reveal our behavioral patterns, location, contacts, habits and most intimate interests. They often reveal much more than we feel comfortable sharing.
The 2015 data for this project was harvested by Ratchahan Sujithan, working for me as an intern during the weeks leading up to TC15. Ratchahan diligently tended the Python scripts, calling the Twitter API each day to collect and reshape the data. Because TC15 was the largest Tableau conference to date, and also the year with the most ample collection of tweets before and after the event, the data volume for 2015 is much larger than for the previous years.
The 2012 to 2014 data was rescued from Excel and Tableau Data Extract files. Although it qualifies as "Red Headed Step Data", it is nevertheless sufficient for our purposes.
Many thanks to the folks from Tableau for providing those older data sets. If you happen to have a more complete or higher quality collection of Twitter data for these years, please reach out to me.
One of the four Alteryx workflows used to bring together these disparate sources is shown here.
Of principal importance to this pipeline is the ability to "vectorize" the data processing for each SubGraph. This means the Alteryx workflows and R scripts had to be built so that any number of SubGraphs can be processed by the same logic, without requiring additional effort.
This vectorization is accomplished in R with only two commands:
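library(plyr)  # provides ldply()
mentionsList <- lapply(runSubset, processMentions)  # run processMentions on each SubGraph in runSubset
mentionsDF <- ldply(mentionsList, data.frame)       # bind the list of results into one data frame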
R is a vectorized language, which is awesome. This makes it easy to "apply" a function to each object in a list of objects with a single command (lapply), and then to consolidate the resulting list back into a single data frame with the second command (ldply, from the plyr package).
Yet the main reason to vectorize inside R, rather than making a more basic call to R from within an Alteryx batch macro, is that, due to open source licensing restrictions, each call to R from a batch macro must start up a completely new R instance, and performance degrades very quickly.
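To make the two-line pattern concrete, here is a minimal, self-contained sketch. The toy `mentions` data and the body of `processMentions` are hypothetical stand-ins (in the real pipeline that function runs the igraph commands on each SubGraph); `split()` builds the list of SubGraphs, and `do.call(rbind, ...)` is used as a base-R substitute for `plyr::ldply()` so the example needs no packages.

```r
# Toy mentions: one row per @mention (from -> to), tagged with a SubGraph topic.
# Hypothetical data for illustration only.
mentions <- data.frame(
  topic = c("keynote", "keynote", "keynote", "datanight", "datanight"),
  from  = c("ann", "bob", "cat", "ann", "dan"),
  to    = c("bob", "cat", "bob", "dan", "ann"),
  stringsAsFactors = FALSE
)

# Hypothetical per-SubGraph function: counts inbound mentions for each user.
# In the real pipeline, this is where the igraph calls would go.
processMentions <- function(subset) {
  counts <- table(subset$to)
  data.frame(
    topic   = subset$topic[1],
    user    = names(counts),
    inbound = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# One list element per SubGraph, however many topics the data contains.
runSubset <- split(mentions, mentions$topic)

# Apply the function to every SubGraph, then bind the pieces into one data frame
# (base-R stand-in for plyr::ldply(mentionsList, data.frame)).
mentionsList <- lapply(runSubset, processMentions)
mentionsDF <- do.call(rbind, mentionsList)
rownames(mentionsDF) <- NULL
print(mentionsDF)
```

Adding a new topic only adds rows to `mentions`; neither line of the apply-and-bind step has to change, which is the point of vectorizing inside one R session.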
Performance aspects aside, the main takeaway is that the entire pipeline is now flexible and resilient: just two Formula tools in Alteryx parse out topics based on either hashtags or time-boxed events, and just two lines of R code run the igraph commands over each of those topics.
It can repeatably process and visualize any number of network SubGraphs end to end, from raw data to interactive Tableau dashboard, without any changes to the logic.
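The Alteryx Formula tools themselves are not reproduced here, but the same two kinds of topic rule can be sketched in R. This is an illustrative analogue, not the actual workflow: the tweets, the `#data15` rule, the keynote time box, and the helper names `hasTag`, `inWindow` and `topicRules` are all hypothetical.

```r
# Hypothetical tweets; timestamps are in UTC.
tweets <- data.frame(
  text = c("Loving the keynote #data15", "Data Night Out! #data15 #DNO",
           "See you there #tableau", "Great session on LODs"),
  created = as.POSIXct(c("2015-10-20 09:15", "2015-10-21 20:30",
                         "2015-10-19 12:00", "2015-10-20 09:45"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Rule type 1: the tweet carries a given hashtag (case-insensitive, whole tag only).
hasTag <- function(text, tag) {
  grepl(paste0("#", tag, "\\b"), text, ignore.case = TRUE, perl = TRUE)
}

# Rule type 2: the tweet falls inside a time box [start, end).
inWindow <- function(created, start, end) {
  created >= as.POSIXct(start, tz = "UTC") & created < as.POSIXct(end, tz = "UTC")
}

# Hypothetical topic definitions; adding a SubGraph means adding one entry here.
topicRules <- list(
  data15  = function(d) hasTag(d$text, "data15"),
  keynote = function(d) inWindow(d$created, "2015-10-20 09:00", "2015-10-20 10:30")
)

# Long table of (topic, tweet row); a tweet may belong to several topics.
tagged <- do.call(rbind, lapply(names(topicRules), function(nm) {
  keep <- topicRules[[nm]](tweets)
  data.frame(topic = rep(nm, sum(keep)), tweet = which(keep),
             stringsAsFactors = FALSE)
}))
print(tagged)
```

Because one tweet can land in several topics, the result is a long (topic, tweet) table, which is exactly the shape that can then be split into SubGraphs and fed to the apply-and-bind step.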
In network analysis, a "node that is central to the network" is in some way a focal point or a main figure. Nodes with high centrality are often able to exert greater influence within that network.
The Tableau work brings this concept of network centrality into focus by providing two alternative centrality measures for navigating, filtering, sorting, and highlighting the data.
Betweenness Centrality sums, over all pairs of other nodes, the fraction of shortest paths between them that pass through a given node. (When each pair has a single shortest path, this is simply the number of shortest paths that pass through the node.) A node with high betweenness centrality has a large influence on the transfer of items or information through the network, under the assumption that item transfer follows the shortest paths. Although the “Betweenness” metric is important, it doesn't necessarily predict how members would rank by other measures of importance.
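igraph computes this with `betweenness()`. To make the definition visible, here is a dependency-free sketch of Brandes' algorithm for an unweighted, directed mention graph. Treating mentions as directed edges (tweeter to mentioned handle) is an assumption, and the graph and the function name `betweenness_brandes` are hypothetical.

```r
# Brandes' algorithm for betweenness on an unweighted, directed graph.
# adj: named list mapping every node to the character vector of nodes it mentions
# (nodes with no outgoing mentions must still appear, with character(0)).
betweenness_brandes <- function(adj) {
  nodes <- names(adj)
  cb <- setNames(numeric(length(nodes)), nodes)
  for (s in nodes) {
    # Breadth-first search from s, counting shortest paths (sigma) and predecessors.
    stack <- character(0)
    pred  <- setNames(vector("list", length(nodes)), nodes)
    sigma <- setNames(numeric(length(nodes)), nodes); sigma[s] <- 1
    dist  <- setNames(rep(-1, length(nodes)), nodes); dist[s] <- 0
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]
      stack <- c(stack, v)
      for (w in adj[[v]]) {
        if (dist[w] < 0) {              # first time w is reached
          dist[w] <- dist[v] + 1
          queue <- c(queue, w)
        }
        if (dist[w] == dist[v] + 1) {   # v lies on a shortest path to w
          sigma[w] <- sigma[w] + sigma[v]
          pred[[w]] <- c(pred[[w]], v)
        }
      }
    }
    # Accumulate each node's share of the shortest paths, farthest nodes first.
    delta <- setNames(numeric(length(nodes)), nodes)
    for (w in rev(stack)) {
      for (v in pred[[w]]) {
        delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      }
      if (w != s) cb[w] <- cb[w] + delta[w]
    }
  }
  cb
}

# Hypothetical mentions: ann -> bob -> cat, and dan -> bob.
adj <- list(ann = "bob", bob = "cat", cat = character(0), dan = "bob")
print(betweenness_brandes(adj))
```

Only `bob` sits between anyone: the shortest paths ann to cat and dan to cat both run through him, so he scores 2 and everyone else scores 0.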
Eigenvector Centrality measures the degree to which a given node is connected to other important nodes in the network, where a node's importance is itself defined by the importance of its neighbors. An “introverted” member, one with little or no “betweenness”, could in fact still be quite important due to its influence on other members who are themselves very well connected.
Recall that the concept of Eigenvector Centrality is at the heart of Google's original PageRank algorithm. A web page that is linked to from other important web pages is, by virtue of those links, more important.
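igraph offers `eigen_centrality()` for this. The sketch below computes the same idea with base R's `eigen()` on a small, hypothetical undirected graph. Treating mentions as undirected, scaling the top score to 1, and the helper name `eigen_centrality_sketch` are assumptions chosen to resemble igraph's defaults, not a description of the project's exact settings.

```r
# Toy undirected mention graph (hypothetical handles); 1 = at least one mention.
nodes <- c("amy", "ben", "cal", "dee", "eve")
A <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
edges <- rbind(c("amy", "ben"), c("amy", "cal"), c("amy", "dee"),
               c("ben", "cal"), c("ben", "dee"), c("eve", "dee"))
A[edges] <- 1          # character-matrix indexing by row and column names
A[edges[, 2:1]] <- 1   # mirror each edge so the matrix is symmetric

# Eigenvector centrality: the leading eigenvector of A, scaled so the top node
# scores 1. Assumes a connected graph, so the leading eigenvector is unique.
eigen_centrality_sketch <- function(A) {
  e <- eigen(A, symmetric = TRUE)   # eigenvalues come back in decreasing order
  v <- abs(e$vectors[, 1])          # the sign of an eigenvector is arbitrary
  setNames(v / max(v), rownames(A))
}

ec <- eigen_centrality_sketch(A)
print(round(sort(ec, decreasing = TRUE), 3))
```

Here `cal` is the "introverted" member: it lies on no shortest path between any other pair (its two neighbors are directly linked), so its betweenness is 0, yet it scores about 0.76 because both of its neighbors are the best-connected nodes.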
A slider in my Tableau workbook enables you to filter by Eigenvector Centrality. This can help in certain analyses by trimming away the users with a lower centrality and "zooming in" to those who are "closer to the center of influence".
When we place these two centrality measures side-by-side, for the 2014 Hans Rosling Keynote, it becomes self-evident that they are different measures, offering distinctive insights.
Here we see the tweeps through whom the information flows most efficiently, with the fewest hops. Notice how @hansrosling himself is not prioritized by the betweenness metric.
Here we see the ranking of tweeps based upon their connectedness to other highly ranked individuals. Notice how both @albertocairo and @hansrosling are prioritized.
Navigating These Views
To navigate these views, it's best to begin with a conference year and then choose your topic of interest. As the volume of tweets has grown significantly, the ability to navigate SubGraphs by topic is vital to making this rich and dense data consumable.
After choosing a year, topic, and a centrality measure, you can then further refine your exploration with any combination of the following mechanics.
Highlight Your Tweeps
Where is Andy
Find your tweeps using the Data Highlighter. For example, here's what it looks like when we play Where's Andy? during the TC15 Data Night Out.
But in an extremely dense haystack, perhaps finding the needle alone is not enough.
For this reason, another key feature of this project is the ability to hover or select any user in the Jump Plot and filter to that person's inbound first degree network.
This makes it possible to remove all the nodes but those involved in conversation with a specific individual. A very different question! So here's what it looks like to play the new game.
Who is Speaking to Andy?
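The inbound first-degree filter can be expressed in a few lines. This is a minimal sketch, assuming the mentions are stored as a from/to edge list; the data and the helper name `inbound_first_degree` are hypothetical, and the workbook itself achieves this through hover/select actions in Tableau rather than through this function.

```r
# Hypothetical mention edges: `from` mentioned `to`.
edges <- data.frame(
  from = c("ann", "bob", "cat", "andy", "dan", "bob"),
  to   = c("andy", "andy", "bob", "ann", "cat", "andy"),
  stringsAsFactors = FALSE
)

# Inbound first-degree network of `user`: only the mentions that point at `user`,
# plus the set of handles involved in them (the user first).
inbound_first_degree <- function(edges, user) {
  inbound <- edges[edges$to == user, , drop = FALSE]
  list(edges = inbound, nodes = unique(c(user, inbound$from)))
}

net <- inbound_first_degree(edges, "andy")
print(net$nodes)              # who is speaking to andy
print(table(net$edges$from))  # how often each of them mentioned andy
```

Everything not pointing at the selected user is dropped, which is why the filtered view answers "who is speaking to Andy?" rather than "where is Andy?".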
If you would like to use this workbook for data mining, please feel free. Since the hover action is slow through the web, consider downloading the workbook to explore hands-on and locally from Tableau Desktop.
As an artifact of reality, the @Tableau Twitter handle tends to be mentioned very frequently during the Tableau Conference. As a result, the @Tableau handle also tends to dominate the network centrality metrics.
In the image above, even when using the Data Highlighter, notice how difficult it can be to hover your mouse exactly over @acotgreave in the Jump Plot. That's because, well, just like everybody else, he has been scrunched into the extreme left of the betweenness centrality axis during the chaotic period of Data Night Out.
To lighten up on that scrunching effect, you might prefer to Hide @Tableau from the centrality measure Jump Plot.
Switching for this example to the TC14 Opening Keynote, here is the difference between deciding whether to show or hide @Tableau:
Filter by Eigenvector Centrality
In a dense network, for certain analyses it can also help to reduce clutter by zooming in on the tweeps with higher Eigenvector Centrality.
For example, in the extremely busy graph of all 37,540 mentions in the #data15 hashtag, by adjusting the Eigenvector Filter we can incrementally remove layers from the outer edges of the network, like an onion.
All of #Data15
Eventually we reach the core nucleus of the #data15 inner circle: those with an Eigenvector Centrality above 0.2.
Of these central tweeps, two are exceptional in that they are neither a Zen Master, nor a Tableau Ambassador, nor an employee. Congratulations to Gregory Lewandowski and Lyndi Thompson, you're on the inside of the velvet rope!
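The onion-peeling effect of the Eigenvector filter can be sketched as a threshold on node scores. Everything here is hypothetical: the scores, the edges, the helper name `peel_by_eigen`, and the rule that an edge survives only when both of its endpoints pass the filter (the workbook's own handling of edges may differ).

```r
# Hypothetical scaled eigenvector centralities (top node = 1) and mention edges.
centrality <- c(tableau = 1.00, ann = 0.45, bob = 0.21, cal = 0.12, dan = 0.05)
edges <- data.frame(
  from = c("ann", "bob", "cal", "dan", "ann"),
  to   = c("tableau", "tableau", "ann", "cal", "bob"),
  stringsAsFactors = FALSE
)

# Keep only nodes above the threshold, and only edges whose ends both survive.
peel_by_eigen <- function(edges, centrality, threshold) {
  keep <- names(centrality)[centrality > threshold]
  edges[edges$from %in% keep & edges$to %in% keep, , drop = FALSE]
}

# Raising the threshold strips the outer layers first, like peeling an onion.
for (t in c(0, 0.1, 0.2)) {
  cat("threshold", t, ":", nrow(peel_by_eigen(edges, centrality, t)), "edges\n")
}
```

At a threshold of 0.2, only the edges among `tableau`, `ann` and `bob` remain: the inner circle of this toy network.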
Highlight by Category
As seen above, it can also be insightful to use the shapes legend to highlight Attendees, Vendors, Zen Masters and Tableau Ambassadors.
If you have any contributions or corrections to these category assignments, please send me a note!
Pan and Zoom
Lastly, for navigation, do also make use of the View Toolbar, especially when exploring the large networks.
Remember, if you pin the XY coordinates to a specific location, then you should also un-pin them when you change Topics.
Data Driven Insights?
Which new insights can we glean from Network Graphs?
Beyond their inherent beauty, the power of a visual network analysis is in the relative ease with which the underlying relationships can be explored and understood.
More than satisfying a curiosity, identifying visual patterns in those relationships offers an improved understanding of the real-world dynamics at play behind vast amounts of social or economic data.
If you're inspired to download the workbook & mine this rich data in greater detail, then I'm curious. Which patterns do you see? Which insights do you discover? Please send them to me. I can write them up along with my own discoveries in a future post.
Thanks and Appreciation
Working across three tools, I have put significant effort into this project. And as is often the case, when working with data we sometimes encounter challenges that are greater than ourselves. This is where the support, help and input of our friends and colleagues is so valuable.
Much appreciation to Joe Mako, for his eternal kindness and, specifically, for his assistance with adding Cartesian joins to my Alteryx workflow for the richer presentation of the "inbound first degree" filter; and again for helping me to understand and work with the final granularity after all of the various forms of data duplication were done.
The data for 2012 - 2014 were provided by Michelle Wallace, Andy Cotgreave, and Mike Klaczynski. Jonathan Drummey was immediately responsive to our questions about conflicting URL filters. And Ali Sayeed graciously helped me to overcome a challenge in Alteryx when vectorizing the workflow.
This project comes to fruition as a wonderful collaboration with Chris DeMartini. We've done our best to assist one another, coordinate efforts, and cross link with URL actions between our visualizations.
Chris has been an absolute pleasure to work with. His hive plots on this same data set are absolutely stunning in their elegance! His valuable input on my data manipulations and the presentation has been prescient. And he was super helpful with troubleshooting the jump plot.
As a result of this recent collaboration, we hope to expand further upon this Twitter work by merging our efforts during TC16 in Austin. Here is a link to Chris' write-up on his hive plots: The Tableau Conference Network.
Be sure to check them out!
- Chris DeMartini, Tableau Public, August 10, 2016
- Tableau Conference Twitter Networks, Keith Helfrich, Tableau Public
- Tableau Conference Over the Years, Chris DeMartini, Tableau Public
- Social and Economic Networks: Models and Analysis, Stanford Online, April 1, 2013
- Metadata Investigation: Inside Hacking Team, Share Lab Investigative Data Reporting Lab, October 29, 2015
- ‘igraph’, R Package Documentation, CRAN, June 26, 2015
- Betweenness Centrality, Wikipedia, August 9, 2016
- Eigenvector Centrality, Wikipedia, August 9, 2016
- Quickly Find Marks in Context with Tableau 10's New Highlighter, Amy Forstrom, Tableau.com, June 2, 2016
- Tableau Conference 2016, Tableau Software, Austin, Texas, November 7 - 11, 2016
- Joe Mako, www.joemako.com, August 10, 2016
- Jonathan Drummey, DataBlick, August 10, 2016
- The Tableau Conference Network, Chris DeMartini, DataBlick, August 10, 2016