Monday, January 26, 2009

Switching Web Analytics solution: a case study

While doing quality assurance of a migration from WebSideStory HBX to Omniture SiteCatalyst, we noticed some variations in the atomic metrics (or what the Web Analytics Association calls "building blocks"): page views, visits and visitors. While such variations are expected, I think it is important to understand why there are discrepancies between various web analytics tools.

In this case study I'm comparing HBX and SiteCatalyst, but similar differences could be noticed with any pair of tools. As you will see, the trends are generally identical, but the scales are different. Let's take an example from daily life. Oven and meat thermometers both measure temperature, and a good cook will be able to use both to make a great dinner; but don't expect both thermometers to show the same reading or you're asking for disaster!

In this article, I will focus only on page views, visits and visitors since almost all other reports are derived from those base metrics. I will consider the following points:
  • Hypothesis behind this analysis
  • The Page View metric
  • The Visits metric
  • The Visitors and Unique Visitors metrics
  • Factors that can influence those metrics

Methodology

The same site was tagged with both solutions and received a significant volume of traffic; we collected data for over a month. The site only had a couple of different templates, so we could easily make sure all pages were tagged and working correctly. WASP was used to audit the site and make sure the data being collected could be trusted. The exclusion filters and other configurations were double-checked to make sure they were similar.

There are few documented examples of companies switching from one product to another. In most reported cases, the observations are anecdotal, the two products were not run in parallel, or the web analytics switch coincided with a site redesign. For the same tools, some people have reported similar results while others have seen very different ones. The word of caution here is: "results may vary". Depending on the nature of your site, the type of traffic you get, and which tools you are switching from and to, the impact will likely be very different.

Page Views

Page Views are the atomic elements of web analytics (now, along with events). As such, the page view is the metric that should be reported most consistently by the various web analytics vendors.

Both HBX and SiteCatalyst define a page view as (ref. Omniture KB article #7824) "a request for a full-page document containing tracking code on a website. Each time the page is loaded, an image request is generated. Refresh, back and forward button traffic as well as pages loaded from cache will be counted as a page view." The WAA definition is rather short, stating simply that it's "the number of times a page was viewed."

Looking at the graph, we see the discrepancy between the two solutions is acceptable, at an average of 2.2% (control limits between 1.5% and 3%, shown on the 2nd vertical axis). In this case, HBX is reporting slightly higher numbers than SiteCatalyst.
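To make the comparison concrete, here is a minimal sketch of how a daily discrepancy and its control limits could be computed. The daily counts are made-up placeholders, not the figures from this study, and the limits are assumed to be the mean plus or minus three sample standard deviations; the graph may well have been built with a different rule.

```python
from statistics import mean, stdev


def discrepancy_pct(a, b):
    """Relative difference of a versus b, in percent (positive when a > b)."""
    return (a - b) / b * 100.0


def control_limits(values, k=3.0):
    """Return (lower, centre, upper) as mean -/+ k sample standard deviations.

    The choice of k=3 is an assumption, not the rule used in the study.
    """
    centre = mean(values)
    spread = k * stdev(values)
    return centre - spread, centre, centre + spread


# Hypothetical daily page views, for illustration only.
hbx = [10220, 10180, 10250, 10200]
sitecatalyst = [10000, 10000, 10000, 10000]

daily = [discrepancy_pct(h, s) for h, s in zip(hbx, sitecatalyst)]
lower, centre, upper = control_limits(daily)
print(f"average {centre:.2f}% (limits {lower:.2f}% to {upper:.2f}%)")
```

A day whose discrepancy falls outside the limits is the one worth investigating (a missing tag, a filter change), rather than the average gap itself.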

Visits

In both products, "a visit begins when a visitor first views a tracked page on a website. The visit will continue until there have been no additional tracking requests from the site for 30 minutes (default) or until the maximum visit length occurs (12 hours)."

If a visitor stays on one page for more than 30 minutes without additional tracking requests being sent and then resumes viewing additional pages, causing additional tracking requests to be sent, both products will register a new visit.
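The visit rule quoted above can be sketched in a few lines. This is a simplified model for one visitor, not either vendor's actual implementation: the function name is hypothetical, and treating a gap of exactly 30 minutes as the same visit is an assumption.

```python
from datetime import datetime, timedelta


def count_visits(hit_times,
                 timeout=timedelta(minutes=30),
                 max_length=timedelta(hours=12)):
    """Count visits for one visitor from the timestamps of tracking requests."""
    visits = 0
    visit_start = None
    last_hit = None
    for t in sorted(hit_times):
        new_visit = (
            visit_start is None                # first tracked page
            or t - last_hit > timeout          # inactivity timeout exceeded
            or t - visit_start > max_length    # maximum visit length exceeded
        )
        if new_visit:
            visits += 1
            visit_start = t
        last_hit = t
    return visits


# Hypothetical hits: the 40-minute pause between 9:10 and 9:50 starts a second visit.
day = datetime(2009, 1, 26)
hits = [day.replace(hour=9, minute=0),
        day.replace(hour=9, minute=10),
        day.replace(hour=9, minute=50),
        day.replace(hour=10, minute=0)]
print(count_visits(hits))
```

Two tools applying the same rule will still disagree if they do not see the same hits, which is why the factors discussed later matter more than the definition itself.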

Both products require persistent cookies to count a visit. If a browser does not accept persistent cookies, neither product will count a visit for that visitor (but both will still track the page view).

This is compliant with the WAA standard definition of a visit: "A visit is an interaction, by an individual, with a web site consisting of one or more requests for a page. If an individual has not taken another action (typically additional page views) on the site within a specified time period, the visit will terminate by timing out."

The results for visits show more variation, at 12.6% (control limits between 11.4% and 13.8%). That is, SiteCatalyst reports more visits than HBX does.

Visitors and Daily Unique Visitors

Since this metric is highly influenced by Visits, the discrepancy is also high, at 10.5% (limits between 9.1% and 11.9%). But here, HBX and SiteCatalyst handle the identification of unique visitors very differently. Although they both use persistent cookies, SiteCatalyst can fall back to IP + User Agent to track unique visitors. The impact is significant:
  • HBX will still record page views for visitors rejecting cookies, but they "will not be counted as a Visitor or Visit and will not generate visit-level data." (KB #7824)
  • SiteCatalyst, however, will still count visitors rejecting cookies as Visitors (via the fallback) and will generate page-level data for them, but will not count a Visit or generate visit-level data.
This explains why the discrepancy in Visitors might be lower than the one for Visits.
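The difference between the two approaches can be illustrated with a simplified model. The hit structure, field names and function name below are hypothetical, and the sample hits are invented; neither vendor's real identification logic is reproduced here.

```python
def count_visitors(hits, ip_ua_fallback):
    """Count unique visitors from a list of hit dicts.

    Each hit has 'cookie_id' (None when cookies are rejected),
    'ip' and 'user_agent'. With ip_ua_fallback=False, cookieless
    hits are page views only and identify no visitor.
    """
    seen = set()
    for hit in hits:
        if hit["cookie_id"] is not None:
            seen.add(("cookie", hit["cookie_id"]))
        elif ip_ua_fallback:
            seen.add(("ip_ua", hit["ip"], hit["user_agent"]))
    return len(seen)


# Hypothetical traffic: two cookied visitors and one who rejects cookies.
hits = [
    {"cookie_id": "a", "ip": "192.0.2.1", "user_agent": "Firefox/3.0"},
    {"cookie_id": "b", "ip": "192.0.2.2", "user_agent": "MSIE 7.0"},
    {"cookie_id": None, "ip": "192.0.2.3", "user_agent": "Safari/3.2"},
    {"cookie_id": None, "ip": "192.0.2.3", "user_agent": "Safari/3.2"},
]
print(count_visitors(hits, ip_ua_fallback=False))  # cookie-only counting
print(count_visitors(hits, ip_ua_fallback=True))   # with IP + User Agent fallback
```

In this toy data the fallback adds one visitor without adding any visit, which is the mechanism described in the two bullets above.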

Factors to consider

Why such differences? Here is a list of various factors to consider:

1st party vs. 3rd party cookies
We can't stress enough the importance of using 1st party cookies. In our case, HBX was historically tagged using 3rd party cookies (served from hitbox.com), while SiteCatalyst was tagged using 1st party cookies (served from the site's own domain). 3rd party cookies are much more likely to be blocked or deleted: some estimates state that up to 40%, or even 50%, of 3rd party cookies are either blocked or deleted on a monthly basis.

Tag code location
Some internal studies conducted by Omniture revealed that, for the same tool, putting the tagging code at the top vs. the bottom of the page could lead to 3% to 8% differences in page views being collected. Depending on the site, visitors quickly sifting through page navigation might click on a link before the page is fully loaded, and thus before the tags have fired.

In our example, the SiteCatalyst code was implemented near the top of the BODY tag, while the HBX code was just before the closing BODY tag. Good practice generally recommends putting tags at the end of the page to alleviate any negative impact on the page loading time. However, there are cases where you will consciously decide to put it at the top.

Session timeout
SiteCatalyst uses a 30-minute default timeout period before considering a visit complete.

In SiteCatalyst, upon request, this time period can be customized by report suite. However, both the WAA and the IAB recommend a 30-minute session timeout.

Visit reporting time period
Suppose I’m in the GMT-5 timezone (Quebec City, Eastern Time), your company is somewhere in GMT-6 (Central Time), and Omniture is at GMT-7 (Utah, Mountain Time)... If my visit starts at 11:55pm (local time) and goes on for 10 minutes after midnight, in which day should my traffic be recorded?

The Omniture KB article #7824 states for both solutions that "a visit is reported only during the time period in which the visit is initiated". Furthermore, article #633 indicates that unique visitors are based strictly on the time zone specified in your report suite. Thus, if my visit starts at 11:55pm (EST) and lasts until 12:05am the next day, it is reported entirely on the day it started: at 10:55pm in a report suite set to Central Time, or at 9:55pm in one set to Mountain Time. Information regarding HBX wasn't available. However, taken globally, this would certainly not explain a difference of over 12%.
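The time zone arithmetic is easy to get wrong, so here is a small sketch of it. Fixed standard-time offsets are assumed (Eastern GMT-5, Central GMT-6, Mountain GMT-7), daylight saving is ignored, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time offsets (an assumption; daylight saving is ignored).
EASTERN = timezone(timedelta(hours=-5))
CENTRAL = timezone(timedelta(hours=-6))
MOUNTAIN = timezone(timedelta(hours=-7))


def reporting_timestamp(visit_start, report_suite_tz):
    """Express the start of a visit in the report suite's time zone.

    The whole visit is attributed to the period containing this timestamp.
    """
    return visit_start.astimezone(report_suite_tz)


# The visit from the example: it starts at 11:55pm Eastern.
start = datetime(2009, 1, 26, 23, 55, tzinfo=EASTERN)
for name, tz in [("Central", CENTRAL), ("Mountain", MOUNTAIN)]:
    local = reporting_timestamp(start, tz)
    print(name, local.strftime("%Y-%m-%d %H:%M"))
```

The more interesting case is a visit that starts shortly after midnight Eastern: it lands on the previous day in a Central or Mountain report suite, so two tools configured with different time zones can disagree on daily totals while agreeing on the monthly one.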

Conclusion

In this case, we know the atomic metric (Page Views) is accurate. We now have a fair explanation for the reported number of Visits & Visitors: different calculation methods, 1st party vs. 3rd party cookies and code location all play an important role in the accuracy of your metrics. Even though the variance in Visits is a bit high, the two tools still move in concert with one another. As David Kirschner, Principal Best Practices Consultant at Omniture, rightly pointed out, "the causes for these discrepancies may be difficult to explain, but they should not prevent you from taking action on the data".

As a web analyst in such situations, your role is to explain in non-technical terms why there are differences in metrics. Always stress the significance of trends and, if you can, run the two solutions in parallel and show the difference in scales. Use the thermometer analogy, or this one: CAD vs. USD. Both report monetary amounts and both use the same terminology (dollars), but their values are different. For most people, trying to understand the intricacies of such a difference is not necessary in order to make good business decisions.

Please contact me if you need professional auditing of your web analytics implementation.

Acknowledgments: I want to thank Penny Tietjen from Quickbooks Support at Intuit, as well as David Kirschner, Paul Kraus, Martin Liljenback, Alex Hill and James Kemp from Omniture for their great support.