Reading Google Sheets from Clojure

One of the user stories I had to tackle in a recent sprint was to import data maintained by a non-technical colleague in a Google Spreadsheet into our analytics database. I quickly found a Java API for Google Spreadsheets that looked promising but turned out to be more tricky to get up and running than expected at first glance. In this article, I show you how to use this library from Clojure and avoid some of the pitfalls I fell into.

Google Spreadsheets API

The GData Java client referenced in the Google Spreadsheets API documentation uses an old XML-based protocol, which is mostly deprecated. We are recommended to use the newer, JSON-based client. After chasing my tail on this, I discovered that Google Spreadsheets does not yet support this new API and we do need the GData client after all.

The first hurdle: dependencies

The GData Java client is not available from Maven, so we have to download a zip archive. The easiest way to use these from a Leiningen project is to use mvn to install the required jar files in our local repository and specify the dependencies in the usual way. This handy script automates the process, only downloading the archive if necessary. (For this project, we only need the gdata-core and gdata-spreadsheet jars, but the script is easily extended if you need other components.)

#!/bin/bash

set -e

function log () {
    echo "$1" >&2
}

function install_artifact () {
    log "Installing artifact $2"
    mvn install:install-file -DgroupId="$1" -DartifactId="$2" -Dversion="$3" -Dfile="$4" \
        -Dpackaging=jar -DgeneratePom=true
}

R="${HOME}/.m2/repository"
V="1.47.1"
U="http://storage.googleapis.com/gdata-java-client-binaries/gdata-src.java-${V}.zip"

if test -r "${R}/com/google/gdata/gdata-core/1.0/gdata-core-1.0.jar" \
        -a -r "${R}/com/google/gdata/gdata-spreadsheet/3.0/gdata-spreadsheet-3.0.jar";
then
    log "Artifacts up-to-date"
    exit 0
fi

log "Downloading $U"
cd $(mktemp -d)
wget "${U}"
unzip "gdata-src.java-${V}.zip"

install_artifact com.google.gdata gdata-core 1.0 gdata/java/lib/gdata-core-1.0.jar

install_artifact com.google.gdata gdata-spreadsheet 3.0 gdata/java/lib/gdata-spreadsheet-3.0.jar

Once we’ve installed these jars, we can configure dependencies as follows:

(defproject gsheets-demo "0.1.0-SNAPSHOT"
  :description "Google Sheets Demo"
  :url "https://github.com/ray1729/gsheets-demo"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [com.google.gdata/gdata-core "1.0"]
                 [com.google.gdata/gdata-spreadsheet "3.0"]])

The second hurdle: authentication

This is a pain, as the documentation for the GData Java client is incomplete and at times confusing, and the examples it ships with no longer work as they use a deprecated OAuth version. The example Java code in the documentation tells us:

// TODO: Authorize the service object for a specific user (see other sections)

The other sections were no more enlightening, but after more digging and reading of source code, I realized we can use the google-api-client to manage our OAuth credentials and simply pass that credentials object to the GData client. This library is already available from a central Maven repository, so we can simply update our project’s dependencies to pull it in:

:dependencies [[org.clojure/clojure "1.8.0"]
               [com.google.api-client/google-api-client "1.21.0"]
               [com.google.gdata/gdata-core "1.0"]
               [com.google.gdata/gdata-spreadsheet "3.0"]]

OAuth credentials

Before we can start using OAuth, we have to register our client with Google. This is done via the Google Developers Console. See Using OAuth 2.0 to Access Google APIs for full details, but here’s a quick-start guide to creating credentials for a service account.

Navigate to the Developers Console. Click on Enable and manage APIs and select Create a new project. Enter the project name and click Create.

Once project is created, click on Credentials in the sidebar, then the Create Credentials drop-down. As our client is going to run from cron, we want to enable server-to-server authentication, so select Service account key. On the next screen, select New service account and enter a name. Make sure the JSON radio button is selected, then click on Create.

Copy the downloaded JSON file into your project’s resources directory. It should look something like:

{
  "type": "service_account",
  "project_id": "gsheetdemo",
  "private_key_id": "041db3d758a1a7ef94c9c59fb3bccd2fcca41eb8",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "gsheets-demo@gsheetdemo.iam.gserviceaccount.com",
  "client_id": "106215031907469115769",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://accounts.google.com/o/oauth2/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/gsheets-demo%40gsheetdemo.iam.gserviceaccount.com"
}

We’ll use this in a moment to create a GoogleCredential object, but before that navigate to Google Sheets and create a test spreadsheet. Grant read access to the spreadsheet to the email address found in client_email in your downloaded credentials.

A simple Google Spreadsheets client

We’re going to be using a Java client, so it should come as no surprise that our namespace imports a lot of Java classes:

(ns gsheets-demo.core
  (:require [clojure.java.io :as io])
  (:import com.google.gdata.client.spreadsheet.SpreadsheetService
           com.google.gdata.data.spreadsheet.SpreadsheetFeed
           com.google.gdata.data.spreadsheet.WorksheetFeed
           com.google.gdata.data.spreadsheet.CellFeed
           com.google.api.client.googleapis.auth.oauth2.GoogleCredential
           com.google.api.client.json.jackson2.JacksonFactory
           com.google.api.client.googleapis.javanet.GoogleNetHttpTransport
           java.net.URL
           java.util.Collections))

We start by defining some constants for our application. The credentials resource is the JSON file we downloaded from the developer console:

(def application-name "gsheetdemo-v0.0.1")

(def credentials-resource (io/resource "GSheetDemo-041db3d758a1.json"))

(def oauth-scope "https://spreadsheets.google.com/feeds")

(def spreadsheet-feed-url (URL. "https://spreadsheets.google.com/feeds/spreadsheets/private/full"))

With this in hand, we can create a GoogleCredential object and initialize the Google Sheets service:

(defn get-credential
  []
  (with-open [in (io/input-stream credentials-resource)]
    (let [credential (GoogleCredential/fromStream in)]
      (.createScoped credential (Collections/singleton oauth-scope)))))

(defn init-service
  []
  (let [credential (get-credential)
        service (SpreadsheetService. application-name)]
    (.setOAuth2Credentials service credential)
    service))

Let’s try it at a REPL:

lein repl

user=> (require '[gsheets-demo.core :as gsheets])
nil
user=> (def service (gsheets/init-service))
#'user/service
user=> (.getEntries (.getFeed service
                              gsheets/spreadsheet-feed-url
                              com.google.gdata.data.spreadsheet.SpreadsheetFeed))
(#object[com.google.gdata.data.spreadsheet.SpreadsheetEntry 0x43ab2a3e "com.google.gdata.data.spreadsheet.SpreadsheetEntry@43ab2a3e"])

Great! We can see the one spreadsheet we granted our service account read access. Let’s wrap this up in a function and implement a helper to find a spreadsheet by name:

(defn list-spreadsheets
  [service]
  (.getEntries (.getFeed service spreadsheet-feed-url SpreadsheetFeed)))

(defn find-spreadsheet-by-title
  [service title]
  (let [spreadsheets (filter (fn [sheet] (= (.getPlainText (.getTitle sheet)) title))
                             (list-spreadsheets service))]
    (if (= (count spreadsheets) 1)
      (first spreadsheets)
      (throw (Exception. (format "Found %d spreadsheets with name %s"
                                 (count spreadsheets)
                                 title))))))

Back at the REPL:

user=> (def spreadsheet (gsheets/find-spreadsheet-by-title service "Colour Counts"))
user=>  (.getPlainText (.getTitle spreadsheet))
"Colour Counts"

A spreadsheet contains one or more worksheets, so the next functions we implement take a SpreadsheetEntry object and list or search worksheets:

(defn list-worksheets
  [service spreadsheet]
  (.getEntries (.getFeed service (.getWorksheetFeedUrl spreadsheet) WorksheetFeed)))

(defn find-worksheet-by-title
  [service spreadsheet title]
  (let [worksheets (filter (fn [ws] (= (.getPlainText (.getTitle ws)) title))
                           (list-worksheets service spreadsheet))]
    (if (= (count worksheets) 1)
      (first worksheets)
      (throw (Exception. (format "Found %d worksheets in %s with name %s"
                                 (count worksheets)
                                 spreadsheet
                                 title))))))

…and at the REPL:

user=> (def worksheets (gsheets/list-worksheets service spreadsheet))
user=> (map (fn [ws] (.getPlainText (.getTitle ws))) worksheets)
("Sheet1")

Our next function returns the cells belonging to a worksheet:

(defn get-cells
  [service worksheet]
  (map (memfn getCell) (.getEntries (.getFeed service (.getCellFeedUrl worksheet) CellFeed))))

This gives us a flat list of Cell objects. It will be much more convenient to work in Clojure with a nested vector of the cell values:

(defn to-nested-vec
  [cells]
  (mapv (partial mapv (memfn getValue)) (partition-by (memfn getRow) cells)))

We now have all the building blocks for the function that will be the main entry point to our minimal Clojure API:

(defn fetch-worksheet
  [service {spreadsheet-title :spreadsheet worksheet-title :worksheet}]
  (if-let [spreadsheet (find-spreadsheet-by-title service spreadsheet-title)]
    (if-let [worksheet (find-worksheet-by-title service spreadsheet worksheet-title)]
      (to-nested-vec (get-cells service worksheet))
      (throw (Exception. (format "Spreadsheet '%s' has no worksheet '%s'"
                                 spreadsheet-title worksheet-title))))
    (throw (Exception. (format "Spreadsheet '%s' not found" spreadsheet-title)))))

With this in hand:

user=> (def sheet (gsheets/fetch-worksheet service {:spreadsheet "Colour Counts" :worksheet "Sheet1"}))
#'user/sheet
user=> (clojure.pprint/pprint sheet)
[["Colour" "Count"]
 ["red" "123"]
 ["orange" "456"]
 ["yellow" "789"]
 ["green" "101112"]
 ["blue" "131415"]
 ["indigo" "161718"]
 ["violet" "192021"]]
nil

Our to-nested-vec function returns the cell values as strings. I could have used the getNumericValue method instead of getValue, but then to-nested-vec would have to know what data type to expect in each cell. Instead, I used Plumatic Schema to define a schema for each row, and used its data coercion features to coerce each column to the desired data type – but that’s a blog post for another day.

Code for the examples above is available on Github https://github.com/ray1729/gsheets-demo. We have barely scratched the surface of the Google Spreadsheets API; check out the API Documentation if you need to extend this code, for example to create or update spreadsheets.

1 comment

Leave a Reply

Your email address will not be published. Required fields are marked *