Step-by-Step Processes

This page is a tutorial of sorts: it shows how we gathered and scraped our data, refined and cleaned it, and produced visualizations that promote a better understanding of trends and patterns we might otherwise have missed.

Web Scraping - Gathering the Data

Figure: a Web Scraper selector graph, showing the selectors I created and how the scraper automatically moves on to scrape the next page of data.

To begin our project, we found three databases containing data on Civil War monuments, markers, and memorials: the National Park Service Soldiers and Sailors Database, the National Register of Historic Places, and the Historical Marker Database. Using the Web Scraper extension for Google Chrome, I, as Project Manager and Data Scientist, gathered 236 entries of data to be analyzed in our later visualizations.

When using Web Scraper, I first created a new sitemap with the URL of the webpage where I wanted the scraping to start; that starting point serves as the root selector. Once the sitemap was created, I added a new selector under the root. Every new selector needs a title and a type identifying what data or action it will take from the site. The first selector I made, named "Title," targeted the link holding the title of each monument; once clicked, that link led to all the information I needed to scrape. Under the Title selector, I then created another selector to scrape the text I selected, which was only three lines out of about twenty. Finally, under the root selector, I created a selector that would scrape every page of results automatically, without my having to create a selector for each page, which would have taken a while since there were twelve of them. I identified this selector's type as a link, then placed it under the root but also under itself; that is the key step that enables Web Scraper to automatically scrape the following pages. Once the scraping was complete and the sitemap exported to an Excel spreadsheet, I was able to upload the spreadsheet into OpenRefine.
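
For readers who want to reproduce this setup, the sketch below reconstructs such a sitemap as a Python dictionary that can be printed as JSON and pasted into Web Scraper's Import Sitemap dialog. The start URL, the selector IDs other than "Title," and the CSS selectors are hypothetical placeholders, not the exact ones used in this project; the essential part is the pagination selector listing both _root and itself as parents, which is what lets the scraper follow all twelve pages on its own.

```python
import json

# Hypothetical reconstruction of the sitemap described above; the start URL
# and CSS selectors are placeholders, not the ones used in the project.
sitemap = {
    "_id": "civil-war-monuments",
    "startUrl": ["https://example.com/results?page=1"],  # hypothetical
    "selectors": [
        # Pagination link: listed under _root AND under itself, which is the
        # key step that lets Web Scraper keep following the "next page" link.
        {"id": "pagination", "type": "SelectorLink",
         "parentSelectors": ["_root", "pagination"],
         "selector": "a.next", "multiple": True, "delay": 0},
        # "Title" link: the link on each result that opens the record page.
        # Listing "pagination" as a second parent means titles on the
        # following pages are collected as well.
        {"id": "Title", "type": "SelectorLink",
         "parentSelectors": ["_root", "pagination"],
         "selector": "a.monument-title", "multiple": True, "delay": 0},
        # Text selector under Title: grabs only the lines of text needed
        # from each record page.
        {"id": "details", "type": "SelectorText",
         "parentSelectors": ["Title"],
         "selector": "p.description", "multiple": True, "delay": 0,
         "regex": ""},
    ],
}

# Web Scraper's "Import Sitemap" dialog accepts this structure as JSON.
print(json.dumps(sitemap, indent=2))
```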

OpenRefine - Cleaning the Data

Figure: using the Text Facet to edit the data and remove things like extra spaces from the ends of some words.

OpenRefine is a data-editing program that allowed me to clean up the data and identify problems with it. For instance, I used the Text Facet option to delete any trailing spaces after certain data entries, so each entry would group with otherwise identical entries that never had the extra space.

Figure: splitting the data into multiple columns.

I also used OpenRefine to remove columns and to split one column that held multiple sets of data into several columns. Once I was done cleaning the data in OpenRefine, I exported the project and loaded it into Tableau to create visualizations.
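
The same cleanup steps can be expressed in code. The sketch below shows a rough pandas equivalent of the OpenRefine operations described above: trimming stray whitespace (what the Text Facet edits accomplished), splitting one packed column into several, and removing a column. The file name, the column names, and the comma separator are hypothetical stand-ins, not the ones from our actual spreadsheet.

```python
import pandas as pd

# Hypothetical file and column names; the real spreadsheet's headers differ.
df = pd.read_excel("monuments.xlsx")

# Equivalent of trimming trailing spaces so "Gettysburg " and "Gettysburg"
# group together as one facet value.
df["Location"] = df["Location"].str.strip()

# Equivalent of splitting one column that packs several values together
# into multiple columns, here assumed to be comma-separated.
df[["City", "State"]] = df["Place"].str.split(",", n=1, expand=True)

# Equivalent of removing an unneeded column.
df = df.drop(columns=["Notes"])

df.to_excel("monuments_clean.xlsx", index=False)
```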

Tableau - Visualizing the Data

Figure: a Tableau dashboard.

Tableau was the program we used to create visualizations from the data, as a way to illustrate trends and patterns visually, better than numbers in a spreadsheet could. Tableau is a relatively easy program to use, and I began by uploading the Excel spreadsheet with my cleaned data. Once the data is uploaded, Tableau organizes it based on the columns present in the spreadsheet and on whether they hold numerical values. Tableau has areas already designated as columns and rows, so I could place certain fields on the Rows shelf and others on the Columns shelf. Alternatively, I could put two fields on the Rows shelf and have the Columns shelf hold the number of records falling under those two fields.

Figure: a Tableau example of columns and rows.

After I learned the basics, I began experimenting with different combinations of my data and the varying types of visualizations they produce. I adjusted sizes and colors to produce visualizations that best captured the trends in my data I was trying to illustrate. I also created multiple dashboards that place two visualizations side by side to compare or contrast, and, finally, I exported the visualizations as images and as interactive links that are visible on this website.
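
Putting two fields on the Rows shelf and counting records amounts to a simple grouped count. Purely as an illustration of what that shelf arrangement computes, the sketch below performs the same aggregation in pandas; the file name and the column names ("State," "Type") are hypothetical, not the actual fields in our data.

```python
import pandas as pd

df = pd.read_excel("monuments_clean.xlsx")  # hypothetical file name

# Two fields on Rows plus a count of records is equivalent to grouping on
# those two fields and counting the rows in each group.
counts = (df.groupby(["State", "Type"])
            .size()
            .reset_index(name="Number of Records"))
print(counts)
```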