Google Analytics for Webcomics: Part 2 – Filtering “Ghost” Spam

Alright, as a follow-up to my previous tutorials, I thought it was high time to share a couple of tricks for dealing with one of the banes of webcomic creators: spam.

I think we’ve all had to deal with the pain of looking at our numbers, trying to see if people are reading our comic, and instead seeing our numbers dominated by junk like this:

SpamExample1

Some random, obviously spammy website is polluting your numbers with tons of visits, cranking your bounce rate up and your pages/visit down.  It’s annoying, and it can make tracking your analytics and promotional efforts really difficult.  Luckily, there are some rather simple tools to weed it out right here in Google Analytics, no coding or htaccess tricks required!  If you’d rather not read my simple write-up, you can read the blog where I gathered some of these tips.

First, go to your analytics page, and select the Admin tab up at the top.  There are tons of settings and useful things here, but we’re just going to go over Filters.  Select “All Filters” from the left column.

AdminScreen

To add a new filter, just push the “+Add Filter” button.  I have a couple of filters already created to show you a couple of the simple, useful options.

FiltersScreen

First, let’s focus on blocking “ghost spam”: a spam bot that leaves fake data in your Google Analytics reports without ever actually visiting your site!  To do this, we’ll create a filter based on your website’s hostname, so that your analytics only include visits that actually went to your website.  But first, let’s look at some examples of this ghost spam.

Go to the Reports tab in your Google Analytics, and open the “Acquisition -> All Traffic -> Source/Medium” subsection.  Once there, switch the “Primary Dimension” to “Hostname”.

SwitchToHostname

This will now list all your visits by hostname.  Hopefully, the largest segment uses your actual host.  You may have a surprisingly large segment entitled “(not set)” as well as some other hostnames that seem odd to you.  The majority of this, my friends, is ghost spam.  Adding a secondary dimension of “Source/Medium” and sorting by Bounce Rate may give you a sad picture like this:

HostnameSpam1

Gross.  That’s all junk from spam sites trying to get you to visit their URLs.  Let’s get rid of it.

Hopping back to the Admin->All Filters tab, let’s take a look at the filter I’ve named “My Hosts”, where we’re going to make it so we ONLY see data from our own, valid hosts.

Hosts1

Name it as you wish, change the “Filter Type” to Custom, select “Include”, and set the “Filter Field” to Hostname.  Then, in the filter pattern, type in the name of your host.  For me, that’s demonarchives.com.  The pattern is a regular expression, so you need to put a “\” in front of each “.”, and a “|” (the shift character on the “\” key) between entries, with no “|” at the very start or end.  So I’d write: “demonarchives\.com|new\.demonarchives\.com|translate\.googleusercontent\.com”, etc.  That last one is the hostname that appears if people use Google Translate to view your site.  If you aren’t sure what hostnames to include, just skip back up a couple of steps to the Reporting -> Acquisition step where we looked at all the hostnames you’ve had in use.
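If you’d rather not type all those backslashes by hand, here’s a minimal Python sketch that builds the pattern from a plain list of hostnames.  The function name build_hostname_pattern is made up for this example, and it assumes the filter pattern is an ordinary regular expression in which Python’s re.escape output is valid.  Paste the printed result into the filter pattern box.

```python
import re


def build_hostname_pattern(hostnames):
    """Join hostnames into one pattern: dots escaped, entries separated by '|'."""
    # Drop blank entries so the pattern never starts or ends with a stray '|'.
    cleaned = [h.strip() for h in hostnames if h.strip()]
    return "|".join(re.escape(h) for h in cleaned)


# The hostnames from the example above; swap in your own.
my_hosts = [
    "demonarchives.com",
    "new.demonarchives.com",
    "translate.googleusercontent.com",
]

pattern = build_hostname_pattern(my_hosts)
print(pattern)
```

With the three hostnames above, this prints the same string I typed out by hand.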

Once you add all your hostnames to the include box, scroll down a bit, select and add your site view, and press save.

Hosts2

If you’re worried about messing something up, you can create a separate view first: in the Admin tab, click the drop-down menu under “View” at the top of the right-hand column and select “Create New View”.  I just validate my changes to the existing view instead.  I do this by opening up the “Real Time Overview” reporting tab and visiting my site to make sure I still show up.  This is important, because if the filter pattern is wrong (a typo in a hostname, say), the include filter won’t match your real traffic and all pageviews will stop showing up in the Analytics reports.  So double-check that things are still working normally before you’re done.
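If you want an extra sanity check before saving, you can dry-run the pattern against the hostnames from your Hostname report.  This is only a rough sketch: apply_include_filter is a made-up helper, the sample hostnames are hypothetical, and it assumes the filter does a partial regex match the way Python’s re.search does.

```python
import re


def apply_include_filter(pattern, hostnames):
    """Return the hostnames that an include filter with this pattern would keep."""
    return [h for h in hostnames if re.search(pattern, h)]


# Hypothetical sample, like the rows in the Hostname report.
sample = ["demonarchives.com", "new.demonarchives.com", "(not set)", "spam.example"]

good = r"demonarchives\.com|translate\.googleusercontent\.com"
typo = r"demonarchvies\.com"  # misspelled on purpose

kept_good = apply_include_filter(good, sample)
kept_typo = apply_include_filter(typo, sample)

print(kept_good)  # the real hosts survive, the junk does not
print(kept_typo)  # empty list: this filter would hide every visit
```

Under that partial-match assumption, “demonarchives\.com” also keeps “new.demonarchives.com”, and a pattern that keeps nothing from your own list is the one that would blank out your reports.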

This simple filter will screen out all that ghost spam, which ends up being the majority of the junk showing up in your analytics.  It doesn’t block everything, but it does get most of it.

I tried following directions I found online to block the remaining specific spam sources that I found, using my other Filter, aptly named “Block Spam Bots.”

ExcludeSpamFilter

This time it was an exclusionary filter, blocking all of these specific spammers.  Unfortunately, for some reason this ended up blocking ALL traffic and breaking my reports (which I discovered when I tested it by looking at my real-time segment and seeing everything disappear), so I disabled it.  Luckily, blocking ghost spam (and having CloudFlare active and blocking known spammer IPs) has cleared out 99% of the spam I was getting.  Hopefully it works for you too!  If not, let me know what else you’ve tried that works!
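It isn’t clear from the above why that filter blocked everything, but one way an exclude pattern can end up matching every visit is a stray “|” at the start or end of the list, which leaves an empty entry that matches anything.  Here’s a small Python sketch of that pitfall.  The spam sources are hypothetical, has_empty_alternative is a made-up helper, and the check is naive: it ignores parentheses and escaped pipes.

```python
import re


def has_empty_alternative(pattern):
    """True if splitting on '|' leaves an empty entry (naive check)."""
    return any(part == "" for part in pattern.split("|"))


# Hypothetical spam sources, not taken from any real report.
ok_pattern = r"spammer-one\.example|spammer-two\.example"
bad_pattern = ok_pattern + "|"  # trailing pipe left over from editing the list

for p in (ok_pattern, bad_pattern):
    # Second value: would this exclude pattern also match a legitimate source?
    print(has_empty_alternative(p), bool(re.search(p, "google")))
```

The first pattern leaves “google” alone.  The second one matches it, and would match any other source too.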

EDIT: Made a follow-up tutorial with success at blocking specific spammers!  The basic tip: instead of excluding based on “Filter Field: Referral”, pick “Campaign Source.”  Works much better!