Using Google Drive to make Survey Forms and importing answers into R

I remember when Google Docs first launched. I was still in high school and I immediately became a Google evangelist. I told everyone to start using this wonderful new cloud-based service. I don’t think the term ‘cloud-based’ even existed at the time, although it’s more likely that I was simply not aware of its existence. Since then the service has grown substantially. It includes a lot more features, a significantly better UI, and it even lets people design surveys!

It's even easy to make a survey with branching questions. See:

The form can accessed here. Feel free to submit a response. This is the form we'll be using in this example. (Responses so far.)

The results of such surveys are stored as Spreadsheets on Google Drive (formerly Google Docs & Spreadsheets). But what if we want to access all those answers in R and perform some EDA or analysis? And what if we don’t want to go to the Google Drive page and download the results as a CSV manually? Let me show you how we can achieve all of this with some R code after a quick-and-easy initial setup.

I assume you've designed the form and may or may not have responses already. Open it up and you should see the spreadsheet. Click on File -> Publish to the web. Then click Start Publishing and pick CSV under Get a link to the published data

Copy the link. You will use this as the filename.

filename <- "https://docs.google.com/spreadsheet/pub?key=0ApOyZxZwgCv6dC1uUUVVbl9ieEJSQjhMQWpGZUxuYUE&output=csv"

By default, R has some issues downloading files through https; so we need to use a package called RCurl.

# Checks if RCurl is installed. If not, installs it.
if (sum(installed.packages()[, 1] == "RCurl") < 1) install.packages("RCurl")
require(RCurl)

Then we need to actually download the survey results. This is done through the following script:

txt = tryCatch(getURL(filename), error = function(e) {
    getURL(filename, ssl.verifypeer = FALSE)
})
tc = textConnection(txt)  # Opens a connection.
survey.results <- read.csv(tc, header = T, stringsAsFactors = TRUE, na.strings = "")
close(tc)  # Closes the connection.
rm(txt, tc, filename)  # Cleans up the workspace.

You'll actually get really long column names as well as a column of the dates and times of the submissions. Let's clean this up this really quick:

# remember to modify data frame name to your needs
statisticians <- survey.results[, -1]
# remember to modify as appropriate
names(statisticians) <- c("software", "used.ggplot2", "role")

Here's what the data looks like:

software used.ggplot2 role
R Yes Graduate Student
SAS NA Professional

Thanks for reading! Enjoy :)