Friday, August 22, 2008

WASP: a good tool to audit site tags

There's an interesting thread on the Yahoo! Web Analytics forum: "What is a good tool/methodology to audit a large site for tags?". As is often the case, my answer got long! It started like this:
"I'm starting a new job with a big, complicated site, and want to get a handle on its data collection strengths and weaknesses.  Any suggestions?"

Potential solutions

Even without lots of details, and as others pointed out, we can identify a few potential solutions. Of course, WASP comes to mind first... but let's look at the other alternatives:
  • Maxamine's flagship product, now owned by Accenture, comes to mind. But the price tag is high and it seems it's no longer available as a standalone product, so one would have to hire Accenture's consulting services, which is probably an issue for lots of smaller companies, web and analytics agencies, etc.
  • EpikOne's SiteScanGA is available as a Premium subscription. But unless your site is tagged with Google Analytics, it is of little use... Furthermore, SiteScanGA doesn't actually look at your site; it looks at whatever Google Search has in its cache. So it won't scan any "noindex, nofollow" areas of your site (secured sections, transactions, etc.), and since Google might not have your latest page version, this further lowers the value of SiteScanGA.
  • Debugging proxies and manual methods: if you are a techie, Charles and Firebug are your friends. But who would want to walk through tens of thousands of pages manually?

Poor man's solution

Someone suggested that "if one had access to the actual server, could you not literally download the entire site to a local machine, then do a grep for the relevant text?"

That would be a very poor man's (and very bad) way of doing quality assurance! Here's an analogy: would you trust that your site works correctly just because you can see the company logo on the page?

It would only tell you the source code is there; it wouldn't guarantee that the code works or that it's actually sending the right data. I think that's how EpikOne's SiteScanGA and other basic tag-checking tools do it: they don't run the code, they simply look for a string in the source file.
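To make that limitation concrete, here is a small, hypothetical sketch of the string-matching approach in Python; the URL and the signature strings are made up for illustration. It can confirm that a snippet appears in the HTML source, but since it never executes the JavaScript, it can't tell you whether the tag actually fired or what data it sent.

```python
import urllib.request

# Signatures of common analytics snippets; purely illustrative examples.
TAG_SIGNATURES = ["urchinTracker", "ga.js", "s_code.js"]

def source_contains_tag(url):
    """Return True if the raw HTML of the page contains a known tag string."""
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    return any(signature in html for signature in TAG_SIGNATURES)

# A True result only proves the string is present in the source; it says
# nothing about whether the JavaScript ran or what data it actually sent.
print(source_contains_tag("http://www.example.com/"))
```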

Here comes WASP!

As you can see in a case study I did on a site using SiteCatalyst, there are things you simply won't find unless you actually load the page and run the code. From the start, WASP's goal was to run "in situ", just as a real user's browser would. This makes WASP a uniquely robust way of doing quality assurance.

Someone suggested that since most sites are based on templates (CMS, ecommerce catalogues, etc.), why not check just one page of each template?
  • "You don't know what you don't know": ask site owners if all their pages are tagged; at best they will say "yes", but usually they will say "I think so"... WASP often tells otherwise by crawling all pages and finding whole sections that were missed! Static pages and, more surprisingly, transactions are usually the areas where tags are missing or broken.
  • "All pages using a template are alike": yes and no. Since tags are often populated automatically from the data filled into the template, there can be cases where unexpected values are set: special characters, blanks, missing values, wrong data types, values outside the acceptable range, etc. (see the sketch after this list). Since the person who controls the template is not the same as the one who populates it with data, the likelihood of errors is significant.
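To illustrate that second point, here is a small, hypothetical sketch of the kind of sanity checks an audit can run on the values a template injects into a tag. The field names, the value rules and the acceptable range are invented for this example; they are not WASP's actual checks.

```python
# Hypothetical sanity checks on the values a template injects into a tag,
# illustrating the kinds of template-data errors described above.
def validate_tag_values(values):
    errors = []
    # Blank or missing values
    for field in ("pageName", "channel", "orderTotal"):
        if not values.get(field, "").strip():
            errors.append(f"{field}: blank or missing")
    # Unexpected special characters
    if any(ch in values.get("pageName", "") for ch in "<>\"'"):
        errors.append("pageName: unexpected special characters")
    # Wrong data type or value outside the acceptable range
    try:
        total = float(values.get("orderTotal", "0"))
        if not 0 <= total <= 100000:
            errors.append("orderTotal: outside the acceptable range")
    except ValueError:
        errors.append("orderTotal: wrong data type")
    return errors

# A deliberately broken example: the blank channel, the special characters in
# the page name and the non-numeric order total are all flagged.
print(validate_tag_values({"pageName": "Home <test>", "channel": "", "orderTotal": "abc"}))
```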
Beyond the status bar tag indicator and the sidebar showing the values being sent, WASP includes a powerful site crawling feature that lets you start from any page of the site and recursively crawl to the depth and number of pages you want. You can use include/exclude filters, choose whether to honor robots exclusion rules (robots.txt), bypass alert/confirm/print messages that could hang the crawl, and more. You can also start from an existing sitemap.xml or from a file containing a list of URLs. Future versions of WASP will include incremental crawls and other features that will make it easier to be alerted when something unexpected happens.
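If you're curious about what such a crawl involves, here is a minimal, hypothetical sketch in Python of the general idea: start from one page, follow links recursively up to a depth and page limit, and apply simple include/exclude substring filters. It's only an illustration of the concept, not WASP's implementation; WASP runs inside the browser and actually evaluates the tags on every page it loads.

```python
# Minimal, hypothetical sketch of a recursive site crawl with a depth limit,
# a page limit and include/exclude filters (not WASP's implementation).
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_depth=2, max_pages=100, include="", exclude=None):
    seen, queue = set(), [(start_url, 0)]
    while queue and len(seen) < max_pages:
        url, depth = queue.pop(0)
        if url in seen or include not in url or (exclude and exclude in url):
            continue
        seen.add(url)
        try:
            html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
        except Exception:
            continue  # skip pages that fail to load
        # A real audit would check the tags on this page right here.
        if depth < max_depth:
            parser = LinkExtractor()
            parser.feed(html)
            for link in parser.links:
                absolute = urljoin(url, link)
                if urlparse(absolute).netloc == urlparse(start_url).netloc:
                    queue.append((absolute, depth + 1))
    return seen

# Example: crawl up to 10 pages, following links one level deep from the start.
print(crawl("http://www.example.com/", max_depth=1, max_pages=10))
```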

Of course, you don't want to inflate your site statistics with a visit that would span thousands of page views. WASP already offers different ways to filter out your data: modify the User Agent string or exclude your IP address. A future release will include a "stealth mode" that will actually block the calls from sending any data.
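As a small, hypothetical example of the User Agent approach: whatever client performs the audit, the mechanism is the same, namely sending an easily recognizable User-Agent string with every request so the analytics tool (or a report filter) can exclude that traffic. The string below is made up for illustration.

```python
import urllib.request

# Hypothetical audit request carrying a distinctive User-Agent string so the
# resulting hits can be filtered out of the analytics reports afterwards.
request = urllib.request.Request(
    "http://www.example.com/",
    headers={"User-Agent": "TagAuditBot/1.0 (exclude from analytics)"},
)
html = urllib.request.urlopen(request).read()
```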

Ok, I need WASP now!

Hold on to your hat for a few more weeks. I'm literally working overnight to deliver WASP v1.0. In the meantime, you can get started with the beta release and help identify bugs, issues and feature requests in the WASP support forum.

I'm already receiving lots of requests for the commercial version but I want to make sure it's stable and well tested before selling it. I can tell you I've run scans of 30,000 pages in a single pass without major issues.

Case studies wanted!

I'm also looking to write more case studies. Put WASP (and me) to the test! Send me a note with your web analytics implementation audit challenge and I will consider it for a future case study. For free! You just have to allow me to mention you, your company or your client in the case study. Good deal, isn't it? :)

Note: I'm putting the final touches on a video that will demonstrate WASP's crawl feature in action. Stay tuned!