Google Analytics: Clean Up Your Stats by Grouping Hits from Other Domains

by Chadwick Wood
August 29th, 2006

If you use Google Analytics to track traffic on your websites, then you've probably run into the annoyance of seeing mysterious hits to pages that don't exist on your website. Common examples I've seen include:

/cache.aspx?q=09823...
/search?q=cache:F093...
/translate_c?hl=de&sl=en...

These hits are the result of other websites that pull in the content (and tracking code) of your site, such as translation services and Google's caching system. The hits can be useful data that you don't want to throw away, but perhaps you don't like the way they clutter your All Navigation and Content Drilldown views.

My solution to this clutter is to rewrite every hit recorded from a domain other than my own so that it lands in an imaginary folder (I call mine "otherdomains") in my stats. Every hit from another domain is recorded in the form /otherdomains/requesturl, where requesturl is the full URL (hostname plus path) that caused the hit (e.g. www.google.com/search?q=...). As an added bonus, the hits are grouped by domain within the "otherdomains" folder.

You will need to create two Custom, Advanced filters to use this method, and their order matters: the first filter must be applied before the second.

The first filter:

**Name:** Organize traffic by domain
**Field A -> Extract A:** Hostname | `^(.*)$`
**Field B -> Extract B:** Request URI | `^(.*)$`
**Output To -> Constructor:** Request URI | `/otherdomains/$A1$B1`
**Field A Required:** Yes
**Field B Required:** Yes
**Override Output Field:** Yes
**Case Sensitive:** No
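To make the Extract/Constructor mechanics concrete, here is a minimal Python sketch of the first filter. It assumes that Analytics matches each Extract regex against its field, fills `$A1` and `$B1` with the first capture group of each match, and leaves the hit alone if a Required field fails to match; the function name and the sample hit (`/search?q=cache:abc`) are hypothetical.

```python
import re

# Hypothetical model of the first filter. The regexes and constructor mirror
# the settings above; the matching rules are an assumption about how
# Analytics applies a Custom, Advanced filter.
EXTRACT_A = re.compile(r"^(.*)$", re.IGNORECASE)  # Field A: Hostname
EXTRACT_B = re.compile(r"^(.*)$", re.IGNORECASE)  # Field B: Request URI
CONSTRUCTOR = "/otherdomains/$A1$B1"              # Output To: Request URI


def organize_by_domain(hostname, request_uri):
    """Return the Request URI after the first filter runs."""
    a = EXTRACT_A.match(hostname)
    b = EXTRACT_B.match(request_uri)
    if a is None or b is None:
        # Both fields are Required, so a failed match leaves the hit unchanged.
        return request_uri
    # Fill $A1 and $B1 with capture group 1 of Extract A and Extract B.
    # A single re.sub pass keeps text copied from the hit from being re-expanded.
    return re.sub(
        r"\$([AB])1",
        lambda m: (a if m.group(1) == "A" else b).group(1),
        CONSTRUCTOR,
    )


print(organize_by_domain("www.google.com", "/search?q=cache:abc"))
# /otherdomains/www.google.com/search?q=cache:abc
```

Because Override Output Field is set to Yes, this result replaces the original Request URI, so at this stage every hit, including hits on your own site, carries the prefix.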

The second filter:

**Name:** Don't organize my traffic by domain
**Field A -> Extract A:** Request URI | `^/otherdomains/www\.mydomain\.com(.*)$`
**Field B -> Extract B:** -
**Output To -> Constructor:** Request URI | `$A1`
**Field A Required:** Yes
**Field B Required:** No
**Override Output Field:** Yes
**Case Sensitive:** No
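The second filter can be sketched the same way in Python. This is a hedged model, not Analytics itself: it assumes a Required Field A that fails to match leaves the hit untouched, and `www.mydomain.com` is the placeholder hostname from the settings above. The function name and sample URIs are hypothetical.

```python
import re

# Hypothetical model of the second filter. "www.mydomain.com" is the
# placeholder from the settings above; substitute your own hostname.
EXTRACT_A = re.compile(r"^/otherdomains/www\.mydomain\.com(.*)$", re.IGNORECASE)


def unorganize_own_domain(request_uri):
    """Return the Request URI after the second filter runs."""
    a = EXTRACT_A.match(request_uri)
    if a is None:
        # Field A is Required: hits from other domains don't match and pass through.
        return request_uri
    # Constructor "$A1": keep only what followed your hostname.
    return a.group(1)


print(unorganize_own_domain("/otherdomains/www.mydomain.com/about.html"))
# /about.html
print(unorganize_own_domain("/otherdomains/www.google.com/search?q=cache:abc"))
# /otherdomains/www.google.com/search?q=cache:abc
```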

In the second filter, replace `www\.mydomain\.com` with your site's hostname, keeping a backslash in front of each period (in a regular expression, an unescaped period matches any character).
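If you'd rather not hand-escape the hostname, a small Python helper can build the Extract A pattern for you. It is a hypothetical convenience, not part of Analytics. It escapes only periods, which is what the instruction above asks for, and assumes your hostname contains no other regex-special characters. The comparison at the end shows why the backslashes matter.

```python
import re


def own_domain_pattern(hostname):
    """Build the second filter's Extract A regex from your site's hostname."""
    # An unescaped "." matches any character; "\." matches only a literal period.
    escaped = hostname.replace(".", r"\.")
    return "^/otherdomains/" + escaped + "(.*)$"


pattern = own_domain_pattern("www.example.org")
print(pattern)  # ^/otherdomains/www\.example\.org(.*)$

# The escaped pattern rejects a look-alike host; the unescaped one accepts it.
print(bool(re.match(pattern, "/otherdomains/wwwXexample.org/")))  # False
print(bool(re.match("^/otherdomains/www.example.org(.*)$",
                    "/otherdomains/wwwXexample.org/")))  # True
```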

How it works:

The first filter takes every hit that Analytics records and prepends "/otherdomains/" and the Hostname to the Request URI, which is the path of the page that generated the hit. The second filter examines the Request URI produced by the first filter and, if the Hostname in it is your own site's, strips the "/otherdomains/" and Hostname prefix back off. As a result, only hits generated on other sites are recorded under /otherdomains/ with their Hostname prepended. This is also why the order matters: the second filter only recognizes the prefix that the first filter adds.
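The whole chain, and why the order matters, can be modeled end to end in a few lines of Python. This is a simplified sketch under the same assumptions as before: the first filter is reduced to plain string concatenation (equivalent to its `/otherdomains/$A1$B1` constructor), `www.mydomain.com` stands in for your hostname, and the sample hits are hypothetical.

```python
import re

# Hypothetical end-to-end model of the two filters; "www.mydomain.com" is a
# placeholder hostname. Each hit is a (hostname, request_uri) pair.
OWN_SITE = re.compile(r"^/otherdomains/www\.mydomain\.com(.*)$", re.IGNORECASE)


def first_filter(hostname, uri):
    # Prepend /otherdomains/<hostname> to every hit.
    return "/otherdomains/" + hostname + uri


def second_filter(uri):
    # Strip the prefix again when the hostname is your own site.
    m = OWN_SITE.match(uri)
    return m.group(1) if m else uri


def record(hostname, uri):
    # First filter, then second: the second only matches the first's output.
    return second_filter(first_filter(hostname, uri))


hits = [
    ("www.mydomain.com", "/about.html"),
    ("www.google.com", "/search?q=cache:abc"),
    ("translate.google.com", "/translate_c?hl=de&sl=en"),
]
for host, uri in hits:
    print(record(host, uri))
# /about.html
# /otherdomains/www.google.com/search?q=cache:abc
# /otherdomains/translate.google.com/translate_c?hl=de&sl=en

# Reversed order: the second filter sees the raw URI, never matches,
# and your own pages end up under /otherdomains/ as well.
print(first_filter("www.mydomain.com", second_filter("/about.html")))
# /otherdomains/www.mydomain.com/about.html
```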