Tips for designing interactive visualizations

This is Jessica. This week I was working with some collaborators on a project that involved coming up with an interactive visualization. In this case the specific problem was related to ML error evaluation, but the process reminded me more generally of how chaotic this kind of interactive design work can feel early on. You have some class of datasets you want to support and some high-level tasks or comparisons you want users to be able to make with the tool, but you somehow need to find a starting place without over-fixating on one small corner of the huge space of ways you could encode the relevant subsets of data.

Here’s a process I find helpful. It assumes data with some categorical or ordinal attributes and some quantitative variables: 

  • Start with the idea of a big grid, where rows and columns represent categorical variables, and each cell contains the corresponding subset of data you get from crossing two categorical variables.
  • Imagine each data point is a unique mark in the grid.
  • Now consider the queries you want to support (if you’re designing an interactive vis there are probably multiple). Consider what the outcome variable is for each query. If it is discrete, the levels of the outcome can be assigned to either the rows or columns of the grid, or mapped to the color (hue) of the marks. If it is continuous and the query involves multiple categorical predictors, assume it’s plotted to an axis within each cell. If it is continuous and you only have one categorical variable, you could plot it to a single axis running the length of the grid rather than separate axes within each cell.
  • If you have remaining categorical predictors, assign them to the rows and columns, then to color, and only to shape if you must. If you have remaining continuous predictors, assign them first to the remaining axis in the cells, and then to size or color (defaulting to single-hue sequential color schemes).
  • Ask yourself: what’s missing? If queries involve more than about 5 variables, you will run out of reasonable visual encodings, so interactivity may be necessary. But remember that if you start adding buttons or dropdowns for the remaining categorical variables, and sliders for the remaining quantitative variables, it will be hard to compare across their values, because people have to remember what they just saw. Maybe you need to let people select multiple values for these variables at once and see the corresponding views side by side.
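To make the assignment order concrete, here is a minimal Python sketch of the steps above as a greedy heuristic. It is an illustration under simplifying assumptions, not a tool from the project: variables are tagged only as categorical (ordinal included) or continuous, each channel holds one variable, the channel labels (like "x-axis (per cell)") and the function name `assign_encodings` are made up for readability, and any variable that doesn’t fit lands in a leftover list, i.e., the candidates for interaction.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Var:
    name: str
    kind: str  # "categorical" (ordinal included) or "continuous"; an assumed simplification


def assign_encodings(outcome: Var, predictors: List[Var]) -> Tuple[Dict[str, str], List[str]]:
    """Greedily map variables to channels; return (channel -> variable, leftovers)."""
    channels: Dict[str, str] = {}
    leftover: List[str] = []  # no reasonable encoding left: candidates for interaction

    def place(var: Var, order: List[str]) -> None:
        # Put var in the first free channel from its preference order.
        for ch in order:
            if ch not in channels:
                channels[ch] = var.name
                return
        leftover.append(var.name)

    n_cat = sum(p.kind == "categorical" for p in predictors)
    if outcome.kind == "categorical":
        # Discrete outcome: its levels go to rows or columns, else to hue.
        place(outcome, ["rows", "columns", "color"])
    elif n_cat <= 1:
        # Continuous outcome, at most one categorical predictor: one long axis.
        place(outcome, ["x-axis (shared across grid)"])
    else:
        # Continuous outcome, several categorical predictors: an axis in each cell.
        place(outcome, ["x-axis (per cell)"])

    # Remaining categorical predictors: rows, columns, color, and shape only if we must.
    for p in predictors:
        if p.kind == "categorical":
            place(p, ["rows", "columns", "color", "shape"])
    # Remaining continuous predictors: the other axis, then size, then color.
    for p in predictors:
        if p.kind == "continuous":
            place(p, ["y-axis", "size", "color"])
    return channels, leftover
```

With a continuous outcome and five categorical predictors (say, hypothetical `model`, `dataset`, `class`, `split`, `region`), the fifth one ends up in `leftover`, which is exactly where the advice about multi-select, side-by-side views applies.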

Essentially, this process defaults to the idea of a trellis plot and combines it with the idea of unit visualization. Many visualization tools rely on trellis plots (aka small multiples, facet grids) because they prioritize position encodings, which are generally the most accurate way for people to decode information. Unit visualization, on the other hand, has been less popular as a default in systems, but has been the topic of some research on teaching people how to use visualization effectively; it can be easier to think about data when there’s no confusion about what a data point is. I gravitate toward unit visualization because it is an easy, direct way to try to counteract insensitivity to sample size in human judgments from visualized data. If every mark is a data point, it’s harder to ignore how much data you’re dealing with.
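Combining the two ideas, a unit-visualization trellis draws one mark per data point in every cell, so each cell’s sample size stays visible instead of hiding behind an aggregate. The sketch below is a deliberately low-tech, text-only rendering in plain Python (no plotting library) that only shows the structure; the function name `unit_trellis`, the dict-per-row data format, and the single-character default mark are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def unit_trellis(data: Iterable[dict], row_var: str, col_var: str, mark: str = "o") -> str:
    """Render a text trellis with one mark per data point in each cell (assumes non-empty data)."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    row_levels: List[str] = []
    col_levels: List[str] = []
    for d in data:
        r, c = str(d[row_var]), str(d[col_var])
        # Keep levels in first-seen order so the layout is stable.
        if r not in row_levels:
            row_levels.append(r)
        if c not in col_levels:
            col_levels.append(c)
        counts[(r, c)] += 1  # one unit per data point, never an aggregate value

    # Make every cell as wide as the widest column label or run of marks so columns align.
    width = max([len(c) for c in col_levels] + [n * len(mark) for n in counts.values()])
    label_w = max(len(r) for r in row_levels)

    lines = [" " * label_w + " | " + " | ".join(c.ljust(width) for c in col_levels)]
    for r in row_levels:
        cells = [(mark * counts[(r, c)]).ljust(width) for c in col_levels]
        lines.append(r.ljust(label_w) + " | " + " | ".join(cells))
    return "\n".join(line.rstrip() for line in lines)
```

In a real tool the marks would be positioned along the continuous outcome axis within each cell; here they are simply lined up, which is enough to make a thin cell look thin.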

I like this as a starting point for design because you’ve baked these two rules of thumb in, so any deviation from using trellis plots and presenting sample size directly should be because you had a good reason, not because you started brainstorming with some fancy encodings already in mind. I should probably give this advice in my interactive vis course when students are starting to design! I see a lot of visualizations that overuse interactive widgets and overlook the value of position encodings for enabling comparisons by eye that would otherwise require memory, despite my many reminders that it’s all about making the comparisons as direct as possible.

This process does assume some things about the data, including mutually exclusive levels of categorical variables, but I think it could be helpful even if your data deviates slightly. In the project that inspired this post, for example, the data points are associated with pairs of attributes, but a grid view still works when the same categorical variable is used for both rows and columns in some views.
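For data points tied to pairs of attributes, one way to set up such a view is to index both rows and columns by the same categorical variable, with the same level order on both axes so that matching pairs line up on the diagonal. A minimal sketch, assuming each data point is a `(first, second)` tuple of levels; the function name `paired_grid` and the level names in the test data are hypothetical, and a unit visualization would then draw each count as that many marks.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def paired_grid(pairs: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[List[int]]]:
    """Cross one categorical variable with itself: grid[i][j] counts pairs (levels[i], levels[j])."""
    pairs = list(pairs)
    counts = Counter(pairs)  # missing pairs count as 0
    # Same levels, same order on both axes, so (x, x) pairs fall on the diagonal.
    levels = sorted({x for pair in pairs for x in pair})
    grid = [[counts[(r, c)] for c in levels] for r in levels]
    return levels, grid
```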
