Tuesday, April 5, 2011

Analytics Canvas v1.1: serious ETL for web analytics

Analytics Canvas from Toronto-based nModal isn't merely a Google Analytics API tool, it's closer to a powerful ETL (Extract, Transform, Load). In fact, it reminds me of SAS Enterprise Miner.

Powerful ETL

I've been on the beta group and v1.1 should be available by the time you read this [1] - James Standen and the team at Analytics Canvas know their stuff! Basically, Analytics Canvas powerful visual query interface allows you, among other things, to:
  • intuitively build complex queries and apply advanced ETL rules on the data (transformation, filtering, join, sort, segment, summarize, etc.)
  • extract more data than the 10k per call imposed by GA API
  • combine multiple GA profiles, even from different accounts
  • easily extract more than the 10 metrics constraint imposed by GA API
  • work with sampled or full data sets and visualize the results 
  • combine with other data sources, be it Excel, MS SQL, Oracle, mySQL, etc.
  • export to flat files, Excel or databases

Why is Analytics Canvas important?

The future of web analytics is dim... if we want to bring business value, we need to be business focused, not merely looking at website visits. That's why I think "web analytics" as we know it today will give way to "business analysis" and "business intelligence" - and to get there we need to achieve higher level of online analytics maturity - and that's where tools like Analytics Canvas become essential.

A real-world scenario

I want to merge online behavior data with back-office data. I want to look at visits grouped by date and hour, days since last visit (recency), visit count (frequency), pageviews, time on site for each visit, as well as 5 different goals for each of the authenticated users to my site [2]. This data will then be merged with a SQL data source containing demographics info and monetary values of past customers (without any PII, so we comply with GA TOS - see my previous post on this topic). The end result is a nice RFM analysis model I can use to analyze customer behavior and make much better recommendations that speaks to business stakeholders.

The challenge
  • the volume of data can be substantial and break the 10k rows limit per GA call,
  • I have more than 10 metrics,
  • I want to group by date/time and get only the rows where there is a known user,
  • data needs to be joined with back-office data stored in a SQL database,
  • and of course, I want to export the result to an Excel file, or back into SQL, for further analysis.
The solution
The final canvas is quite easy to understand, let's look at each component.
The final canvas
We use elements from the Block Library to create a canvas.
Blocks library to build complex ETL
The Data sources are easy to define (here, I'm showing a GA query)
Simple yet powerful query builder
We can use the Join block to create any types of joins (those familiar with SQL will relate to inner, outer, exclusive and full outer joins).
The Join block

The Filter block
The SQL data source isn't shown here, I used the Excel data source for the sake of this example. The picture below shows the Filter block with a simple rule. The output then contains a new column with a True/False result.
Sample output
And finally, when we run our query, we can preview the results that will get exported to Excel.

My take

This simple example took only a few minutes to build. Doing adjustments is a snap and doesn't require a single line of programming or tedious testing rounds. The Analytics Canvas approach is different from Excel plugins like TatvicShufflepoint or Next Analytics (which I'm using for a super-cool dashboard to be presented here in the coming days as part of the "The math behind web analytics" series). Those coming from a database or even stats background will probably be more comfortable with Analytics Canvas than, say, purely marketing people. Analytics Canvas definitely offer very strong capabilities unprecedented in any other GA API tools I've seen.

If you want to extract massive amount of GA data and merge with back-office sources, Analytics Canvas is definitely worth a look!

Analytics Canvas will be exhibiting at eMetrics Toronto, April 25-29 - I will be there too!

---
[1] other than being on the beta list, I have received no compensation for this article.
[2] as per GA TOS, you are not allowed to store any PII info [wikipedia] in GA - but you can use a unique user id (not an email!) or in a similar scenario, you could use the e-commerce transaction id to reconcile with back-office data as long as it's not merged with any PII.