Unwanted visitors


We have a little spam problem on the blog. Not the kind that you can filter out, however. (We had that too, but we filtered it.) Over the last couple of weeks, we’ve gotten a lot of traffic pointing to a spam comment (which we had removed) on a post from last year. Whereas the post received fewer than 30 views in the previous year, it was now getting several hundred hits a day. What happened?

Well, it turned out that some enterprising spambot deposited a comment on the post, and then created links elsewhere on the web to point to that comment. Since the comment doesn’t exist, the browser helpfully displays the page itself, and our WordPress counters record another visit.

I don’t quite understand the purpose of this scheme, but apparently the links to that comment are decorated with all sorts of desirable terms that entice people to click, even though they are searching for something other than discussions of search. I imagine they are quite disappointed with what they find.

While these ephemeral visitors are not a problem per se, they do distort our visit statistics and make it hard to see what’s going on on the site with respect to the visitors we do care about. So what can we do?

We cannot prevent people from creating bogus links, and we cannot prevent people from clicking on them, but we might be able to prevent WordPress from counting those visits. There are at least two places where counting occurs: in WordPress proper, and in Top 10, a plugin on our site. We have complete control of our plugin, but not of WordPress.

One possible way to prevent these bogus visits from polluting our stats is to introduce a plugin further upstream that filters requests just as they arrive on the server. What I am imagining is a black-list approach that triggers a specific response policy. What we black-list are the referer URLs that a user’s browser adds to the HTTP request. Since it’s easy to discover where these people are coming from, it should be easy to check for those URLs.
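The referer check itself could look something like the sketch below. It is in Python purely for illustration (a real WordPress plugin would be written in PHP), and the host names in the black-list are placeholders rather than the actual sources of our traffic. The function name `is_blacklisted` is likewise just an assumption for this example.

```python
from urllib.parse import urlsplit

# Hypothetical black-list; these hosts are placeholders, not real offenders.
BLACKLISTED_REFERER_HOSTS = {"spam-links.example", "bogus-search.example"}


def is_blacklisted(referer):
    """Return True if the Referer header value points at a black-listed host."""
    if not referer:
        return False  # no Referer header was sent: nothing to match
    try:
        host = urlsplit(referer).hostname  # lower-cased host, or None
    except ValueError:
        return False  # malformed URL: treat as not black-listed
    if host is None:
        return False
    # Match the black-listed host itself or any of its subdomains.
    return any(host == bad or host.endswith("." + bad)
               for bad in BLACKLISTED_REFERER_HOSTS)
```

Matching on the host rather than the full URL means the spammer cannot dodge the filter just by varying the path or query string of the linking page.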

Several policies come to mind:

  1. Return an HTTP 403 or 404 code: brutal but effective at reducing collateral clicks once the searcher arrives on our site.
  2. Redirect back to the referer: subtle but amusing.
  3. Redirect to some random site more along the lines of what these people are actually looking for: this may be a bit too user-centered.

I am sure that other possibilities will be suggested by the creative reader; the point of all of these policies, though, is to prevent the rest of the WordPress machinery from counting the bogus visits.
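To make the idea concrete, here is a rough sketch of the three policies as a filter that sits in front of an application and answers black-listed requests before anything downstream can count them. It uses Python's WSGI convention only as a stand-in for the WordPress pipeline, which I have not investigated; `RefererFilter`, the policy names, the black-listed host, and the `ELSEWHERE` URL are all made up for this example.

```python
from urllib.parse import urlsplit

BLACKLIST = {"spam-links.example"}         # hypothetical referer hosts
ELSEWHERE = "https://elsewhere.example/"   # hypothetical target for policy 3


def _blocked(referer):
    """True if the referer's host is on the black-list."""
    try:
        host = urlsplit(referer or "").hostname
    except ValueError:
        return False  # malformed referer: let the request through
    return host in BLACKLIST


class RefererFilter:
    """WSGI middleware: answer black-listed referers before the app runs."""

    def __init__(self, app, policy="forbid"):
        self.app = app
        self.policy = policy  # "forbid", "bounce", or "elsewhere"

    def __call__(self, environ, start_response):
        referer = environ.get("HTTP_REFERER")
        if not _blocked(referer):
            # Ordinary visit: hand it on, so it is counted as usual.
            return self.app(environ, start_response)
        if self.policy == "bounce":
            # Policy 2: send the visitor back where they came from.
            start_response("302 Found", [("Location", referer)])
            return [b""]
        if self.policy == "elsewhere":
            # Policy 3: send the visitor to some other site.
            start_response("302 Found", [("Location", ELSEWHERE)])
            return [b""]
        # Policy 1 (default): refuse the request outright.
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Forbidden\n"]
```

In all three branches the wrapped application is never called for a black-listed referer, which is the whole point: whatever does the counting further down never sees the visit.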

We weren’t able to find a plugin that does exactly this, and I don’t know enough about the WordPress pipeline to know whether it is easy to do with a plugin. On the other hand, this seems like a useful feature that should not be at all difficult to implement. Maybe the good folks at WordPress will include something like this in their next release.
