Google Analytics: Filtering out Robots and Ghost Referrer Spam

If you run a blog yourself, you'll know just how much of a challenge it is to understand who's actually visiting your site. Amongst the bots and spam, it can be a tall order to keep on top of your site metrics. Fake or incorrect stats are highly damaging when you're attempting to grow your site.

The Russian Connection

Over the past few months I've been trying to find out who, if anyone, is looking at my site, for how long, and where they're coming from. I noticed that just under 50% of my visits are pretty much bogus, and quite a few come with 100% bounce rates, which means measuring user engagement is next to useless. Interestingly, quite a few of them emanate from Russia, which led to a couple of frustrated tweets.

Rather than let the frustration get the better of me, I decided to hunt around and figure out who my real users actually are. Before we can do that, though, we have to understand where the visits that are messing up our stats come from. From what I've seen, they come from three sources:

  1. Legitimate bots / crawlers (e.g. Google crawlers or other search sites)
  2. Fake referrers and crawlers that are not legitimate (this includes screen scrapers etc.)
  3. Ghost referrers (e.g. s.click.aliexpress.com, theguardlan.com or ilovevitaly)

There's a really great article on Analytics Edge that discusses these three sources; I suggest you go and read it when you get a chance.

How not to deal with spammers

I've scoured the internet to find decent ways of excluding such traffic from Analytics reports, but some of the suggestions seem massively draconian:

  1. Block entire IP ranges (e.g. all Russian traffic) - but what about the legitimate users?
  2. Manually block each spam referrer with its own Google Analytics filter - this isn't recommended; it's a huge resource drain and won't scale.

I was pretty unhappy with both of these approaches - there absolutely must be a better way. It appears there is...

  1. Enable the Google Analytics feature to exclude all hits from known bots and spiders

This is done by opening up your Google Analytics Admin panel and clicking View Settings. First, you'll want to duplicate your default view by clicking Copy View, so that you can compare the filtered view against the original and see the effect of removing the bad traffic. Once you've copied the view, tick the Bot Filtering option labelled Exclude all hits from known bots and spiders - Google Analytics will then automatically omit most of the bots it knows about.

Ghost Referral Pain

Unfortunately, this is only half the story. You still have Ghost Referral spam to deal with. Ghost referrers, if you're not familiar with them, work by sending hits straight to Google Analytics with your property's details spoofed, fooling Analytics into recording a visit when in fact no one ever touched your site. The intention is to lure you back to the referring site so they can either:

  • Take advantage of your computer
  • Boost their own traffic - and artificially inflate their search rankings - by having more people hit their site.

They know we look at our Google Analytics metrics so they use this fact to their advantage.
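
To make this concrete: a ghost hit is nothing more than a request fired straight at Google's Measurement Protocol endpoint with a tracking ID filled in - the spammer never requests a single page from your server, so none of your own JavaScript ever runs. The snippet below is purely illustrative (the tracking ID and domains are invented) and shows roughly what such a hit looks like:

// Purely illustrative - a ghost referral "visit" is just a raw hit sent to
// Google's Measurement Protocol; the target site is never actually loaded.
fetch('https://www.google-analytics.com/collect', {
  method: 'POST',
  body: new URLSearchParams({
    v: '1',                         // Measurement Protocol version
    tid: 'UA-00000000-1',           // the (guessed) tracking ID being targeted
    cid: '555',                     // arbitrary client ID
    t: 'pageview',                  // hit type
    dh: 'yoursite.example',         // spoofed hostname
    dp: '/',                        // spoofed page path
    dr: 'http://spam-site.example/' // the referrer they want you to click
  })
});

This is also exactly why the approach below works: a hit that never executes your page can never pick up a value your page sets.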

Weapon of Choice

Thankfully we have another weapon at our disposal which can eliminate this plague for good. It also means the metrics we're left with - including any genuine 100% bounce rates - are ones we actually want to see, so if real people are bouncing immediately we'll know about it.

What we need to do is basically set up our site so that every genuine visit gets stamped with a known value (via a Google Analytics custom dimension) and then get Google Analytics to only look at traffic that has that value set. Because ghost hits never actually load our pages, they never pick up the value, which means all that scammy Ghost Referral traffic will be permanently ignored in our Analytics reports - lovely jubbly.
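
As a rough sketch of where we'll end up - assuming you load analytics.js directly rather than through Tag Manager, and that Google Analytics assigns your new dimension the index dimension1 - the page-side code boils down to something like this. Real browsers run it on every visit; ghost hits never do:

<script>
// Sketch only - the tracking ID below is a placeholder for your own.
// Order matters: the marker must be set after the tracker is created
// and before the pageview is sent, so the hit carries the dimension.
ga('create', 'UA-XXXXXXXX-1', 'auto');
ga('set', 'dimension1', 'april2015');
ga('send', 'pageview');
</script>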

To do this you need to do three things:

  1. Set up an event to fire on your site that will set the marker value. You can do this using Google Tag Manager. This is accomplished by...

    • Sign in to Google Tag Manager
    • Add a new account
    • Set up a container (call it whatever you want - I called mine “Developer Angst Tags” for now)
    • Add a new tag of the Universal Analytics type
    • Add your Google Analytics tracking ID (it looks something like UA-XXXXXXXX-1)
    • Add a Fields to Set entry with the Field Name not-a-bot (it just has to be something unique) and the Value april2015 (again, it just has to be something unique)
    • Add a firing rule for the tag so that it fires only where {{dev-status}} contains april2015 (one way {{dev-status}} can be populated is sketched after this list)
  2. Add a custom dimension in Google Analytics

    • Select Custom Dimensions under the PROPERTY column of the Admin panel
    • Create a new custom dimension with User scope - once you set up this dimension it will be given a unique index (e.g. dimension1)
    • Add a custom script to each page of your site to set this dimension every time a page is visited:
<script>
// Set the marker on the Universal Analytics tracker. This must run after
// ga('create', ...) and before ga('send', 'pageview') so the dimension is
// attached to the pageview hit. Use the index your dimension was assigned.
var dimensionValue = 'april2015';
ga('set', 'dimension1', dimensionValue);
</script>
  3. Set up a Google Analytics filter that includes only the traffic carrying that value
    • Select Filters under the new view you created earlier
    • Add a new filter with the name Include only non Robot / Spam Traffic
    • Set it as an Include filter type
    • In the Filter Field dropdown, select the custom dimension you added in the earlier step
    • Enter april2015 as the Filter Pattern - we're matching on the value that Tag Manager and the page script set on every genuine page view
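
One gap worth flagging: the firing rule above references {{dev-status}}, but nothing so far shows where that variable gets its value. A common way to wire it up (this is an assumption, not something spelled out in the steps above) is to define {{dev-status}} in Tag Manager as a Data Layer Variable reading the key dev-status, and have each page push the marker value before the container snippet loads:

<script>
// Assumption: {{dev-status}} is defined in Tag Manager as a Data Layer
// Variable for the key "dev-status". Push the marker value before the
// GTM container snippet so the firing rule can see it.
window.dataLayer = window.dataLayer || [];
window.dataLayer.push({ 'dev-status': 'april2015' });
</script>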

You can test all of this by putting your Google Tag Manager container into Preview & Debug mode. When you then visit any page on the site, a debug panel appears showing whether the Universal Analytics tag fired and which values were set (e.g. “dev-status”: “april2015”).

So long, Robot / Ghost Spam traffic. If you've followed all of these steps, your Google Analytics reports should now be spam free.

I hope you found this tutorial useful. Please feel free to comment below or subscribe to my RSS feed if you want to receive updates regularly. Of course feel free to share on your social networks so others may benefit.

Have a great “spam free” year!

James Murphy

Java dev by day, entrepreneur by night. James has 10+ years' experience working with some of the largest UK businesses, ranging from the BBC to The Hut Group, finally finding a home at Rentalcars.

Manchester, UK