Monday, March 7, 2011

The math behind web analytics: the basics

Introduction

I have tutored about 700 students enrolled in web analytics and business analysis classes at UBC, and nearly a hundred in the new graduate-level class in online analytics I'm teaching at Laval University. Most students at ULaval are enrolled in an MBA specializing in ebusiness or marketing - one day, they will manage organizations and leverage analytics to make better business decisions. In the meantime, their questions and assignments are an endless source of inspiration and challenges to solve.

This is the first post in a series entitled "the math behind web analytics". The idea stems from a question posted by a UBC student to the Yahoo! Web Analytics forum: "what mathematics does a web analyst need to know?" I was somewhat baffled by the replies: "plus, minus, min, max, average... not much practical use of it (mathematic/statistics) within web analytics" or "simple counts or averages" and of course, "percentage... because that's what you'll use most often, e.g. with KPIs, conversion rates, etc".

What?!

Assignment: basic analysis

One of the first assignments in the ULaval class is simply stated as "analyze the visits to website XYZ", and the students are provided with a data set.

Learning point: When referring to a metric broken down by a time-based dimension, we refer to a "time series": a sequence of data points measured at uniform time intervals.
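To make "uniform time intervals" concrete, here is a minimal sketch in Python that stores a time series as (date, visits) pairs and checks that the gaps between consecutive dates are all equal. The visit counts and the helper name is_uniform are made up for illustration; they do not come from the class data set.

```python
from datetime import date, timedelta

def is_uniform(dates):
    """Return True if consecutive dates are all separated by the same interval."""
    gaps = {later - earlier for earlier, later in zip(dates, dates[1:])}
    return len(gaps) <= 1

# Hypothetical daily visit counts, for illustration only.
start = date(2011, 1, 1)
series = [(start + timedelta(days=i), visits)
          for i, visits in enumerate([120, 135, 128, 150, 142])]

dates = [day for day, _ in series]
print(is_uniform(dates))  # every gap here is exactly one day
```

Note that a monthly series is uniform only in calendar terms: months run from 28 to 31 days, so a check based on day counts like this one suits daily or weekly data.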

In this first post we address what appears to be easy and obvious: graphing the data.


All web analytics tools provide basic visualization functions, as in the example shown above from Google Analytics. This graph shows visits by month.

Learning point: Notice how I used thirteen months of data instead of twelve. The extra month makes it possible to compare year-over-year and to spot seasonality more easily. Basically, we should always include at least one additional period in our analysis. Here, we clearly see an upward trend and certainly some year-to-year progress.
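The reason for the thirteenth month can be shown with a small sketch: with thirteen consecutive monthly totals, the latest month can be compared to the same month one year earlier. The figures and the function name year_over_year_change below are hypothetical, chosen only to illustrate the arithmetic.

```python
def year_over_year_change(monthly_visits):
    """Relative change between the last month and the same month a year earlier.

    Expects at least thirteen consecutive monthly totals, oldest first.
    """
    if len(monthly_visits) < 13:
        raise ValueError("need at least thirteen months of data")
    current = monthly_visits[-1]
    year_ago = monthly_visits[-13]
    return (current - year_ago) / year_ago

# Hypothetical monthly visits, February 2010 through February 2011.
visits = [4000, 4200, 4100, 4300, 4500, 4400, 4600,
          4800, 5000, 5100, 4900, 5200, 5000]

print(f"{year_over_year_change(visits):.1%}")  # 25.0%
```

With only twelve months, the function has no matching month to compare against, which is exactly why the extra period matters.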

However, a monthly breakdown hides some interesting elements. When the same data is shown by day, we can see something slightly different:

The trap

This is the extent of visualization you'll get in most tools. And most would-be analysts will report something like "there was X number of visits, and the average was Y visits/day" simply because this is what the tool says. Some will mention an upward trend but won't be able to quantify it. At best, a few will switch from the monthly to the daily view and mention the very common weekday/weekend pattern.

Most importantly, I rarely see an explanation of what happened where we see spikes in traffic - which, in this case, are explained by marketing and external, business-related events.
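As a rough illustration of what "quantifying" the trend and the weekday/weekend pattern could look like, here is a minimal Python sketch. It assumes an ordinary least-squares slope as the trend measure, which is one common choice and not necessarily the method used later in this series. The daily counts and the function names are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean

def trend_slope(values):
    """Least-squares slope: average change in visits per period."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator

def weekday_weekend_means(daily):
    """Average visits on weekdays and on weekends from (date, visits) pairs."""
    weekdays = [v for d, v in daily if d.weekday() < 5]   # Monday to Friday
    weekends = [v for d, v in daily if d.weekday() >= 5]  # Saturday, Sunday
    return mean(weekdays), mean(weekends)

# Hypothetical two weeks of daily visits starting Monday, 3 January 2011.
start = date(2011, 1, 3)
counts = [200, 210, 205, 215, 190, 120, 110,
          220, 230, 225, 235, 210, 130, 125]
daily = [(start + timedelta(days=i), v) for i, v in enumerate(counts)]

weekday_avg, weekend_avg = weekday_weekend_means(daily)
print(f"weekday average: {weekday_avg}, weekend average: {weekend_avg}")
print(f"trend: {trend_slope(counts):+.2f} visits per day")
```

On data with a strong weekly cycle, a slope fitted to daily values is easily distorted by where the series starts and ends within the week, so fitting it to weekly or monthly totals is often the safer reading.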

Coming up: Excel to the rescue

In the next installment of "the math behind web analytics" we'll use Excel to do some basic analysis.