How to Visualize Scientific Information
Tableau Software Tutorial
Scientific data includes text or categorical information, quantitative information, and other qualitative information in the form of observations. With most interactive visualization tools, data that includes multiple categorical variables and a time variable is somewhat easier to work with than data that has only one category or no time factor. In general, most scientific data has a temporal component, while the number of categories can vary. In this project, we generally dealt with quantitative variables measured over years. One data set contained multi-monthly data and had to be processed to generate annual averages.
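The annual averaging mentioned above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the project: the `readings` list, its values, and the `annual_averages` helper are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sub-annual readings as (year, month, value); the numbers are invented.
readings = [
    (2005, 1, 10.0),
    (2005, 5, 12.0),
    (2005, 9, 14.0),
    (2006, 1, 11.0),
    (2006, 5, 13.0),
    (2006, 9, 18.0),
]

def annual_averages(rows):
    """Group readings by year and return {year: mean of that year's values}."""
    by_year = defaultdict(list)
    for year, _month, value in rows:
        by_year[year].append(value)
    return {year: mean(values) for year, values in sorted(by_year.items())}

yearly = annual_averages(readings)
print(yearly)
```

A plain mean treats every reading equally; if readings are unevenly spaced within a year, a weighted average may be more appropriate.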
- After collecting the data, you will need to review the variables to see what may be of interest to visualize.
- When selecting variables, keep in mind that interactive visualization tools usually require complete data sets: for the entire time period under study, there should be no holes in the data for any variable. Multiple sources of data are usually needed to complete a data set.
Formatting the Data
- Data for selected variables must be in a pre-defined structured format to be properly interpreted by information visualization tools. The structure is generally tool dependent. Data found in Excel spreadsheets has usually been cross-tabulated and/or aggregated in rows.
- For Tableau and Google Motion Chart, it is necessary to have complete data sets. Before reformatting data into the proper format for your tool, review your data thoroughly and delete categories, time periods, or numerical variables for which data is missing. For example:
- In one of the data sets studied, CO2 emissions was the main variable. CO2 data generally existed for the years 1971 to 2007; for Armenia, however, data only existed from 1992 onward, so this country had to be deleted from our list.
- CO2 emissions data existed for Kiribati, but no data was available for another variable under study, fossil fuel consumption, so this country had to be deleted from our study list.
- The main data set from the World Bank had data for variables ranging from 1960 to 2009. As CO2 emissions data only existed for a subset of that time period, all data in the years prior to 1971 and after 2007 were deleted.
- Large data sets, including those in this study, usually need to be normalized back to a “raw data” structure, which results in the data being displayed in column format. If you are familiar with macros, you could write a macro to pivot the data into the proper format. For Tableau, the Tableau Data Tool can be downloaded and installed in Excel as an add-in to help reshape the data. Use the Reshape Data add-in to reformat your data from row to column format.
- For analyzing scientific data over time, the reformatted data should contain one row for each combination of categorical value (country, state, vaccine type, client, etc.) and single unit of time (year, month, day), with the numerical variable (such as CO2 emissions) as the value in that row.
- As a check for data set completeness, you should compile the data for all variables across one sheet. On this sheet, verify that there is the same number of rows of data across all variables by going to the last row of data and seeing that all the columns match up with no blank spaces at the end of each column. If one column or variable of data is too short, then there was a time period or periods for which data was missing and you have an incomplete data set. Leaving data in this incorrect format will cause problems with the visualization tool and force the tool to delete the variable, leaving you with the inability to plot the variable and consider it in the study.
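The formatting steps above (reshaping from row to column format, trimming to the common time period, dropping categories with gaps, and checking completeness) can be sketched in plain Python. This is a minimal illustration under assumed names and invented values; it is not the Tableau Reshape Data add-in, and `wide_to_long` and `keep_complete` are hypothetical helpers.

```python
# Hypothetical cross-tabulated ("wide") rows: one row per country, one column per year.
# None marks a missing value. All numbers are invented for illustration.
wide_rows = [
    {"Country": "A", "1970": 1.0, "1971": 1.1, "1972": 1.2},
    {"Country": "B", "1970": 2.0, "1971": None, "1972": 2.2},
    {"Country": "C", "1970": None, "1971": 3.1, "1972": 3.2},
]

def wide_to_long(rows, key, value_name):
    """Turn one row per category into one row per (category, year)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == key:
                continue  # the key column identifies the row; it is not a year
            long_rows.append({key: row[key], "Year": int(column), value_name: value})
    return long_rows

def keep_complete(rows, key, value_name, first_year, last_year):
    """Trim to the study period, then drop categories with any gap inside it."""
    in_range = [r for r in rows if first_year <= r["Year"] <= last_year]
    expected = last_year - first_year + 1
    counts = {}
    for r in in_range:
        if r[value_name] is not None:
            counts[r[key]] = counts.get(r[key], 0) + 1
    # A category is complete only if it has a value for every year in the period.
    return [r for r in in_range if counts.get(r[key], 0) == expected]

long_rows = wide_to_long(wide_rows, "Country", "CO2")
complete = keep_complete(long_rows, "Country", "CO2", 1971, 1972)
for row in complete:
    print(row)
```

In this toy data, country B is dropped because of a gap inside the study period, while country C survives because its only gap falls before the period starts, which mirrors the trimming and deletion decisions described above.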
Importing the Data
- To import data into Tableau Desktop, open a new Tableau worksheet and go to the “Connect to Data” label. Select your data source type, whether it is Microsoft Access, Microsoft Excel, or some other database.
- Browse to the location of your data spreadsheet and select the appropriate spreadsheet file. Tableau will then import the names of the various sheets within the file.
- If all of your data is compiled into one sheet, choose Single Table and choose the desired sheet.
- Select Multiple Tables if your data is spread out across more than one sheet. You will then need to move through other screens to select the additional sheets and the fields for the desired variables from each sheet. You will then need to combine the tables using the “Join” feature.
- After selecting the table type, choose the option that best fits your field names: either the first row of data contains the header names, or Tableau should generate them. Then make sure the connection name for Tableau matches the selected spreadsheet name.
- On the Import screen, select Import All. You can choose to “Connect Live to Data” if your data source is located on a server and you want your visualization to update as the data source is updated.
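To make the idea of combining sheets with a join more concrete, here is a small Python sketch of an inner join on shared key columns. It only illustrates the concept; it does not reproduce Tableau's Join dialog, and the sheet contents and the `inner_join` helper are hypothetical.

```python
# Two hypothetical sheets sharing the key columns Country and Year (invented values).
emissions = [
    {"Country": "A", "Year": 2000, "CO2": 5.0},
    {"Country": "A", "Year": 2001, "CO2": 5.5},
    {"Country": "B", "Year": 2000, "CO2": 7.0},
]
fuel = [
    {"Country": "A", "Year": 2000, "FossilFuel": 80.0},
    {"Country": "A", "Year": 2001, "FossilFuel": 81.0},
    {"Country": "C", "Year": 2000, "FossilFuel": 60.0},
]

def inner_join(left, right, keys):
    """Combine rows from both tables that agree on every key column."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**row, **match})  # merge the fields of both rows
    return joined

combined = inner_join(emissions, fuel, ["Country", "Year"])
for row in combined:
    print(row)
```

Only country A appears in the result, because B and C each lack a matching row in the other sheet. This is one reason incomplete data sets lose rows when sheets are joined.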
Selecting Variables and Visualization Type
- When your sheet is imported, Tableau attempts to recognize the types of variables based on field names and attempts to divide them into Dimensions and Measures. Dimensions are considered categorical type data such as Country, Continent, State, etc. Measures are quantitative data or data that can be added together. Review the variables in each section and then you can choose to move some variables from Dimensions to Measures and from Measures to Dimensions.
- Choose the variables from the Dimensions and Measures areas that you want to try to visualize. If you have an idea of what type of chart you want to create (Line Graph, Bar Graph, Scatter Plot, Pie Chart, Heat Map, etc.), move the variable names into the Columns and Rows fields. If you know that you want a Map chart, move “Longitude” from the Measures area into the Columns field and “Latitude” into the Rows field. If you choose Maps, you may need to edit location names if the imported location names are not recognized by Tableau. To edit location names, right-click on a Dimension variable (State, Country, Continent, etc.) and go to Geographic Role. Choose a geographic standard if appropriate and then go back to choose Edit Locations. A graphic will be displayed with the names of the Data Values and what Tableau thinks is the Location. If the Location is incorrect, it will appear in red type. Tableau will propose a list of locations that may be better suited to the data value.
- If you are not sure what chart type to use, highlight the variables in Measures and Dimensions by selecting a variable and then holding down the CTRL key while clicking additional variables. With the variables highlighted, select “Show Me” from the toolbar. With this option, Tableau highlights the chart types that best fit the variables chosen. Choose one of the highlighted chart types. You can go back and change the type later if the initial chart type does not seem to be the best choice.
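The split between Dimensions and Measures can be illustrated with a rough Python heuristic. This is an assumption for illustration only and not Tableau's actual logic: it classifies fields by value type, and the `split_fields` helper and its `force_dimensions` parameter are hypothetical.

```python
from numbers import Number

# Hypothetical long-format rows (invented values).
rows = [
    {"Country": "A", "Continent": "X", "Year": 2000, "CO2": 5.0},
    {"Country": "B", "Continent": "Y", "Year": 2000, "CO2": 7.0},
]

def split_fields(rows, force_dimensions=()):
    """Treat all-numeric fields as measures and everything else as dimensions."""
    dimensions, measures = [], []
    for name in rows[0]:
        values = [r[name] for r in rows if r[name] is not None]
        numeric = bool(values) and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        if numeric and name not in force_dimensions:
            measures.append(name)
        else:
            dimensions.append(name)
    return dimensions, measures

# Automatic guess: Year is numeric, so it is classified as a measure.
dims, meas = split_fields(rows)
# Manual override, similar in spirit to moving a field from Measures to Dimensions.
dims2, meas2 = split_fields(rows, force_dimensions=("Year",))
print(dims, meas)
print(dims2, meas2)
```

The override shows why reviewing the automatic split matters: a field such as Year is numeric but usually behaves as a category in time-based charts.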
Creating the Visualization
- After the initial visualization is generated, you will want to add clarity and depth to the data by defining the symbol for marks or points, selecting color gradients, and setting mark sizes. Additional variables can be visualized through the Color and Size fields.
- To define the symbol to be used for your data points, go to the Marks shelf and scroll through the choices. You can choose Circle, Bar, Square, etc. It is recommended that you choose a symbol that best aligns with your data. As I am illustrating CO2 emissions, most of the visualizations use circles to depict the gas vapor (in reality, of course, these are invisible to the naked eye). If you want a filled or an unfilled mark, go to Shape and make a selection. Within Shape, you can also pick from different palettes such as Weather, Currency, Arrows, and Gender symbols.
- A single color can be used for data points. Multiple colors can also be used by defining a Measures or Dimensions variable. Drag a variable from the Measures or Dimensions shelf to the Color field. You can edit the color selections by moving down to the Color card and clicking on the arrow on the right-hand side of the card. It is best if the visualization displays no more than 6-10 colors. To leverage the human perceptual system and have the data points stand out, it is also recommended to use bold, strong, highly saturated colors for the marks, assuming that the background is a subtle, low-saturation color. The color choices are important because they help to uncover and convey trends, outliers, and patterns more easily.
- A single mark size can be used for the data points, or multiple sizes can be used by defining a variable. Like color, the sizing of the marks is important to the visualization because differences are easily noticed by the human perceptual system. To define multiple sizes of marks, drag a variable from the Measures shelf to the Size field. Edit the scaling of the marks by moving down to the size card or size legend and clicking on the arrow on the right-hand side. You can let Tableau define the sizing scale, or you can enter the start point, end point, and center point yourself. Adjust the sizing so that users can easily distinguish between the end points and the intermediate levels.
- When a user hovers over a data point, data values appear in a graphic. You can choose to expand the level of detail by dragging over variable names to the Level of Detail card.
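The idea behind a sizing scale can be sketched as a linear mapping from data values to mark sizes. This is a simplified assumption, not Tableau's sizing algorithm; the `scale_sizes` helper, its default sizes, and the sample values are hypothetical.

```python
def scale_sizes(values, min_size=4.0, max_size=40.0):
    """Map each value linearly onto the range [min_size, max_size]."""
    low, high = min(values), max(values)
    if high == low:
        # All values are equal, so every mark gets the midpoint size.
        return [(min_size + max_size) / 2 for _ in values]
    span = high - low
    return [min_size + (v - low) / span * (max_size - min_size) for v in values]

# Hypothetical emission values (invented numbers).
emissions = [0.0, 25.0, 50.0, 100.0]
sizes = scale_sizes(emissions)
print(sizes)
```

Widening the gap between `min_size` and `max_size` makes intermediate levels easier to tell apart, which is the adjustment the sizing step above asks for.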
Make the Visualization Interactive
- Review the visualization for the clarity of the message you are trying to convey. Adding interactivity lets users change the data view and how they look at it. Users may want to look at data for specific years, specific countries, or countries that belong to one or more continents.
- Within Tableau, separate Pages can be created using Dimensions variables. The most common choice for pages is the time variable, but pages can also be created for specific or all countries, continents, and other Dimensions variables. Drag the desired Dimension variable to the Pages shelf and then set the starting place or time and how fast you want to move through the pages.
- Interactivity can also be added by using Filters. Filters allow the data to be narrowed according to the variables chosen for the filters. Filters can be applied locally to a single visualization or across several visualizations on a dashboard. Dashboards can use linked displays in which a user views different graphs or maps showing only the data for the selected filter parameters; for example, if you have a continent filter, you can view data on all charts in a dashboard for one selected continent or for multiple continents, rather than for all of the continents. To add filters, drag variables from the Dimensions card to the Filters card. On dashboards, be sure to select local or global filters.
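Conceptually, a filter keeps only the rows matching the selected values, and pages step through the data one value of a dimension at a time. The Python sketch below illustrates both ideas under assumed names and invented data; `apply_filter` and `pages` are hypothetical helpers, not Tableau features.

```python
# Hypothetical long-format rows (invented values).
rows = [
    {"Country": "A", "Continent": "X", "Year": 2000, "CO2": 5.0},
    {"Country": "A", "Continent": "X", "Year": 2001, "CO2": 5.5},
    {"Country": "B", "Continent": "Y", "Year": 2000, "CO2": 7.0},
    {"Country": "B", "Continent": "Y", "Year": 2001, "CO2": 7.2},
]

def apply_filter(rows, field, allowed):
    """Keep only rows whose value for `field` is in the allowed set."""
    allowed = set(allowed)
    return [r for r in rows if r[field] in allowed]

def pages(rows, field):
    """Yield (page value, rows on that page) in sorted order of the page value."""
    for value in sorted({r[field] for r in rows}):
        yield value, [r for r in rows if r[field] == value]

# A continent filter narrows the view to one continent.
selected = apply_filter(rows, "Continent", ["X"])
# Paging by Year produces one view per year.
page_list = list(pages(rows, "Year"))
for year, page_rows in page_list:
    print(year, [r["Country"] for r in page_rows])
```

Applying the same `selected` rows to every chart corresponds to a global filter, while filtering separately per chart corresponds to a local one.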
Evaluate the Visualization and Question
- After completing a visualization, you should assess whether it conveys an effective message. Ask yourself: Does it illustrate what you want it to convey? Does it make sense? Does it leverage the human perceptual system well?
- Have your questions been answered or theories or observations confirmed or has the illustration brought to mind more questions?
- If new questions have arisen or your original questions have not been answered, you may want to edit your visualization, or create a new visualization of other variables and/or with a different chart type.