Wednesday, February 14, 2007

WASP detection model

At the heart of WASP, the Web Analytics Solution Profiler, there is some JavaScript code and a set of configuration parameters used to detect the presence of a specific web analytics solution. This post explains the technique used to provide as much information as possible while avoiding false-negative or false-positive results.

What is WASP?

WASP is the Web Analytics Solution Profiler, a Firefox extension aimed at web analytics implementation specialists, web analysts and savvy web surfers who want to understand how their behavior is being analyzed.

Why is it important?

Renowned authors and analysts recognize that the web analytics JavaScript tagging process can be error-prone and sometimes complex. Note: the quotes below are edited excerpts; please refer to each reference for the complete article.

"Some web analytics tools use one standard tag to collect data. Other vendors have a custom tag all over the place... it is important that you validate in QA and production that your ... tags are each capturing exactly what they are supposed to.

I know that Omniture has a nifty utility that you can use to validate and review that data is being collected ... as it should be. This is really nice and helpful and I do like it very much. Please ask your vendor if they have something like this (and they probably do)."
Avinash Kaushik's Web Analytics Technical Implementation Best Practices

Eric Enge: I have heard that one of the largest sources of error in analytics is the accuracy and problems with implementing the Javascript. Does that make sense?
Jim Sterne: First of all, web analytics numbers are not precise... So, the question about precision is a long process. The day you implement a tool you may well get bad data. So, verify everything that you possibly can. But eventually, you are going to reach the point of diminishing returns...
Eric Enge interview of Jim Sterne
WASP aims to ease the quality assurance Avinash is talking about and to make it easier to reach the point of diminishing returns Jim Sterne refers to, regardless of the web analytics solution being used. Furthermore, WASP is significantly more intuitive and easier to use than the vendor-specific solutions available today. Wait! There's still more! (I like it, it sounds like a shopping channel infomercial!) Most sites now include tagging for several different but complementary purposes (check the screen capture of Avinash's site below).

How does it work?

  1. The WASP Firefox extension sidebar is triggered whenever a new page is loaded, a new browser tab comes into view, or when the sidebar itself is first shown.
  2. WASP watches for all HTTP GET requests sent by your browser, regardless of their type (images, scripts, stylesheets, frames).
  3. Once the page is fully loaded (we don't want to slow down page loading and processing!), for each web analytics solution found in the configuration file, we check for the presence of a very specific object only that particular web analytics solution should be setting. This could be a variable, a function, or any unique object defined in JavaScript.
  4. If that object is found, we then look for an HTTP GET request that matches a specific regular expression. Again, this pattern should be something unique to this web analytics solution.
  5. When a match is found, we can check for any particular query string parameters being passed and cookies being sent or retrieved, and display that information in the sidebar.
Using both an object check and a request check reduces the risk of wrongly identifying a product. At the same time, if a product tab is shown in the WASP sidebar but no query string parameters or cookies are detected, this might be an indication that no data is actually being sent, even if the tagging itself seems to be present in that page.
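
To make the two-step check more concrete, here is a minimal sketch in plain JavaScript. It is not the actual WASP source: the configuration fields (`objectName`, `requestPattern`), the function `detectSolutions`, and the vendor "ExampleAnalytics" are all hypothetical names chosen for illustration, and the page's global scope and the list of observed requests are passed in as plain values instead of being read through the Firefox extension hooks.

```javascript
// Hypothetical configuration; field names are illustrative, not WASP's real format.
const solutions = [
  {
    name: "ExampleAnalytics",              // made-up vendor
    objectName: "exampleTracker",          // object only this vendor should define
    requestPattern: /\/collect\.gif\?/     // URL shape only this vendor should use
  }
];

// pageGlobals: stand-in for the page's JavaScript global scope.
// requestUrls: URLs of the HTTP GET requests seen while the page loaded.
function detectSolutions(solutions, pageGlobals, requestUrls) {
  const results = [];
  for (const solution of solutions) {
    // Step 3: is the vendor-specific object defined in the page?
    if (!(solution.objectName in pageGlobals)) {
      continue; // no object, so no tab for this product
    }
    // Step 4: did any observed request match the vendor-specific pattern?
    const hits = requestUrls.filter((url) => solution.requestPattern.test(url));
    // Step 5: extract the query string parameters of each matching request.
    const params = hits.map((url) => Object.fromEntries(new URL(url).searchParams));
    results.push({
      name: solution.name,
      sendingData: hits.length > 0, // tag present but nothing sent => false
      params: params
    });
  }
  return results;
}

// Usage with made-up values:
const report = detectSolutions(
  solutions,
  { exampleTracker: function () {} },
  ["http://stats.example.com/collect.gif?page=%2Fhome&visitor=42"]
);
console.log(JSON.stringify(report, null, 2));
```

A result with `sendingData` set to `false` corresponds to the case described above: the tagging seems to be present, but no matching request was observed. Cookie inspection is left out of this sketch.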

Quite simple, isn't it!

Yes and no. Early prototypes built with Greasemonkey quickly revealed that some solutions were rather complex to detect, while others were very simple. Some are even able to "hide" themselves... or at least, try!
Another aspect that turned out to be more complex than initially thought is the Firefox extension itself. Using XUL (the XML User Interface Language) and building the right "hooks" into the Firefox page loading events is not trivial.

Some examples

Google Analytics: the unique JavaScript object is a function named urchinTracker(), and we look for an image whose path includes the value of the JavaScript variable _ugifpath. In the example below, we see there are several different solutions being used.

Omniture SiteCatalyst: the unique JavaScript object is s_account, and we look for a request that contains that specific variable value.
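
In both examples the request check depends on a value read from the page at run time, so the pattern cannot be fully hard-coded. The sketch below shows one way this could be expressed. The object and variable names (urchinTracker, _ugifpath, s_account) come from the examples above; everything else (the `valueFrom` field, `exampleRules`, `buildRequestPattern`, `escapeRegExp`, and the sample values) is an assumption made for illustration and not WASP's actual configuration format.

```javascript
// Escape a run-time value so it is matched literally inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical rules: which object proves the tag is present, and which
// page variable holds the value the outgoing request should contain.
const exampleRules = [
  { name: "Google Analytics", objectName: "urchinTracker", valueFrom: "_ugifpath" },
  { name: "Omniture SiteCatalyst", objectName: "s_account", valueFrom: "s_account" }
];

// Build the request pattern for one rule from the page's global variables.
// Returns null when the variable is missing or is not a non-empty string.
function buildRequestPattern(rule, pageGlobals) {
  const value = pageGlobals[rule.valueFrom];
  if (typeof value !== "string" || value === "") {
    return null;
  }
  return new RegExp(escapeRegExp(value));
}

// Usage with made-up page values:
const samplePage = {
  urchinTracker: function () {},
  _ugifpath: "/stats/pixel.gif",
  s_account: "examplesuite"
};
for (const rule of exampleRules) {
  console.log(rule.name, String(buildRequestPattern(rule, samplePage)));
}
```

Escaping the value matters because a path such as the one held in _ugifpath contains a dot, which would otherwise match any character and loosen the check.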

Disclaimer: I'm not affiliated with, and do not receive any monetary incentive from, the companies providing the solutions WASP analyzes. Neither Mr. Kaushik nor Omniture endorses WASP, and the screen captures are shown for demonstration purposes only.

Get WASP!

WASP installation instructions are available here.

WASP is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License.

If you use this tool for professional purposes, please think about a donation (look for the "Make a Donation" button in the right sidebar).