Webstuff.Inblighty.Com

Website DIY - tricks and solutions

Why your Blog is found by spam bots, and what you can do.

There are plenty of anti-spam plugins that bloggers can use to try to prevent comment spam. These vary in effectiveness, and in the amount of administration needed to ensure that genuine posts are not categorised as spam, or vice versa.

If you can prevent the majority of spammers from targeting your site in the first place, then you will reduce time spent on moderation and the chances of letting spam through.

Most comment spam is generated by automated software with features like these:

  • Google search and link harvesting;
  • options to hide IP address; and
  • auto-filling of comment forms.

Requiring posters to register and enter a Captcha before posting won’t protect you from the “better spambots”; these include pre-registration software and Captcha-cracking plugins.

The sites marketing these “tools” often deny these are comment spambots; but “if it barks like a dog and bites like a dog, it probably has fleas like a dog”.

I looked at information on 5 of these spambot tools and found they all use the same “text Footprint” method to locate blogs. One of the most popular tools claims to search for, and auto-fill comments on, WordPress, Blogengine, Movable Type, B2Evolution and NucleusCMS blogs.

These “blog commenting tools” also allow users to add custom footprints to find other popular blog platforms; and lists of identifying footprints are easy to find.

Removing these footprints will hide your site from most spambot users, without affecting how genuine visitors find your website.

Don’t feel left out if you run a BBS: there are bots looking for your site too. However, they often use different footprinting methods (see the last section).



What is a Footprint?


In this context, a footprint is a piece of text that identifies the type of website, e.g. a WordPress blog.

Most popular blog and BBS platforms will insert some form of tagline on your pages; those already mentioned all insert “powered by ” followed by the package name.

Default text used in comment forms is another identifying footprint.

Google indexes this identifying text along with the rest of your pages’ content.
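
To see which footprints one of your own pages exposes, you could scan its visible text for known phrases. The following is only a rough sketch: the function name `find_footprints` and the sample phrases are assumptions for illustration, and it checks text you hand to it rather than fetching anything from the web.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the visible text of a page, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def find_footprints(html, footprints):
    """Return the footprint phrases that appear in the page's visible text."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Collapse whitespace and lowercase so matching ignores layout and case.
    text = re.sub(r"\s+", " ", " ".join(extractor.parts)).lower()
    return [fp for fp in footprints if fp.lower() in text]


if __name__ == "__main__":
    page = '<footer>Proudly powered by <a href="#">WordPress</a></footer><h3>Leave a Reply</h3>'
    print(find_footprints(page, ["powered by wordpress", "leave a reply"]))
```

Matching on visible text (rather than raw HTML) roughly mirrors what a search engine would index, which is why a tagline split across a link tag is still detected here.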


Footprints continued (how spambots work)


To find targets, a spammer simply includes the relevant footprint(s) and a keyword in a search, e.g. ‘“powered by wordpress” “leave a reply” automobiles’. In this case, most results will be pages with open comment forms on WordPress sites about cars.

Spambots automate such searches, gathering addresses for thousands of pages.

The most sophisticated bots will also autopost the spammer’s comment, and may circumvent registration and anti-spam measures (video demo).
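
The same kind of search is handy for checking whether your own pages still match a footprint. Below is a minimal sketch that composes such a search string; the function name `build_footprint_query` and the domain `example.com` are placeholders, and adding a `site:` limit for your own domain is my suggestion rather than something the spam tools do.

```python
def build_footprint_query(footprints, keyword="", site=""):
    """Compose a search string from quoted footprints, a keyword and a site: limit."""
    # Quote each footprint so the search engine treats it as an exact phrase.
    parts = ['"{}"'.format(fp.strip()) for fp in footprints if fp.strip()]
    if keyword.strip():
        parts.append(keyword.strip())
    if site.strip():
        # Restrict results to one domain, e.g. your own blog.
        parts.append("site:" + site.strip())
    return " ".join(parts)


if __name__ == "__main__":
    # Paste the printed string into a search engine to see which of your pages match.
    print(build_footprint_query(["powered by wordpress", "leave a reply"], site="example.com"))
```

If the search returns none of your pages once they have been re-indexed, that footprint is no longer leading searchers to your site.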


Are there footprints for the package I use?


Try a search that includes the name of the software you use, e.g. ‘comments footprints drupal’. It is also possible to find multi-platform lists; try searching for ‘Scrapebox footprints’ (Scrapebox is a “search and commenting tool”).

The entries in these lists may contain both footprints and Google limiting clauses, e.g. ‘site:.edu “what is the” “word in the phrase” drupal’, which restricts Google’s search for Drupal pages with comment forms to .edu domains. You can ignore the limiting clause in these entries.
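
If you are working through a long list, separating the footprints from the limiting clauses can be automated. This is a sketch under assumptions: the function name `split_footprint_entry` is made up, and the set of operators it treats as limiting clauses is my own short, non-exhaustive list.

```python
import re

# Search operators treated as limiting clauses in this sketch (assumed, not exhaustive).
LIMITING_OPERATORS = ("site", "inurl", "intitle", "filetype")


def split_footprint_entry(entry):
    """Split a footprint list entry into (footprints, limiting_clauses)."""
    # Normalise curly quotes so quoted phrases can be matched uniformly.
    entry = entry.replace("\u201c", '"').replace("\u201d", '"')
    footprints, clauses = [], []
    # Each token is either a quoted phrase or a run of non-space characters.
    for token in re.findall(r'"[^"]*"|\S+', entry):
        operator = token.split(":", 1)[0].lower()
        if ":" in token and operator in LIMITING_OPERATORS:
            clauses.append(token)
        else:
            footprints.append(token.strip('"'))
    return footprints, clauses


if __name__ == "__main__":
    print(split_footprint_entry('site:.edu "what is the" "word in the phrase" drupal'))
```

Only the first item of the returned pair matters when deciding which text to change on your own pages.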



How do I make my site invisible to spambots?


The good news: all the blog commenting software I checked used the text footprint method, and this seems to be the technique used by almost all of the current generation of blog spambots. So the solution is to remove identifiable footprints from your pages, e.g. you could change “Powered by …” to “Running …”, and “Leave a Comment” to “Tell us what you think”.

With only a little knowledge, I needed about 10 minutes to modify two (differently themed) WordPress blogs. See: How to remove Footprints from WordPress blogs.
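
The substitution itself is simple text replacement. The sketch below applies the two example changes to a string of page or template text; the name `remove_footprints` and the replacement table are illustrative assumptions, and where the text actually lives (theme files, settings, language files) depends on your platform and theme.

```python
import re

# Example replacements taken from the text above; adjust them to suit your own site.
REPLACEMENTS = {
    "Powered by": "Running",
    "Leave a Comment": "Tell us what you think",
}


def remove_footprints(text, replacements=REPLACEMENTS):
    """Return text with each footprint phrase replaced, ignoring case."""
    for old, new in replacements.items():
        # A function replacement keeps the new text literal (no backslash processing).
        text = re.sub(re.escape(old), lambda _match, new=new: new, text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    sample = "<p>Powered by WordPress</p><h3>Leave a Comment</h3>"
    print(remove_footprints(sample))
```

The function only returns a new string and never writes to disk, so you can compare the result with the original before changing any real file.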

The bad news: the WordPress solution won’t work for other types of CMS/blog, and some will be easier to modify than others. Popular (and therefore targeted) platforms usually have active support forums, where you can post questions on how to achieve this.

Always back up everything before attempting any changes, and don’t get rid of your anti-spam plugin.


Is it worth modifying my blog?


If your blog is over a year old and still not plagued by spamming attempts, then probably not.

If your blog suffers from spam, or is new, then it may be worthwhile. In many cases, changing its footprint identity could hide it from most users of current-generation spambots, and reduce automated spamming attempts to near zero.

Some blogs are more at risk of spam than others:

  • WordPress, MovableType, Blogengine, B2Evolution, and NucleusCMS: Most mass spammers don’t have time to comment on every site, so they will want to automate as much of the process as possible. The tools I checked claimed to be able to find and auto-fill comment forms on one or more of these platforms.
  • Blogs allowing “do-follow” links: Automated tools may still be used to find such blogs even if it is not possible to autopost comments on them.
    Do-follow links are much more highly valued by Google, and spammers may consider the time to manually paste in comments worthwhile. Links on Drupal sites are “do-follow” by default; and when researching this article I found posts highlighting its “do-followness” and discussing its footprints.
  • High page rank blogs: These are targets for the same reason as “do-follow” blogs. Unfortunately, some tools come with pre-compiled lists of very high ranking sites, so footprint removal will have less of an impact for these sites.

What changing your text footprints won’t do:


It won’t result in an immediate reduction in spam. A page will continue to appear in Footprint search results until it has been re-crawled and re-indexed by Google etc.

It won’t hide your site from hackers, or from every spammer:

  • There are other footprinting techniques, e.g. Googling parts of a web address. This method is less popular for comment spamming on blogs because text footprints are more effective, but URL footprints are often used to find BBS sites, e.g. “phpbb/profile.php”. Removing these types of footprints from your site requires technical knowledge and may be impracticable.
  • Some spammers monitor “news feed” syndications for pages with fresh posts (likely to have open comment forms). However, I do not advocate that you switch off your Remote Publishing (RSS) feeds. Syndication improves the visibility of your site to genuine visitors, and in my own experience the resulting spam is minuscule in comparison to other sources.
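
To check whether your own addresses expose a URL footprint of the kind mentioned in the first point above, you could test a list of your site’s URLs against known path fragments. This is a sketch only: the function name `find_url_footprints` and the `example.com` addresses are placeholders, and the single fragment shown is the one quoted in the text.

```python
from urllib.parse import urlsplit


def find_url_footprints(urls, path_footprints):
    """Map each URL to the path footprints found in its path component."""
    hits = {}
    for url in urls:
        # Only the path is compared; the domain and query string are ignored.
        path = urlsplit(url).path.lower()
        matched = [fp for fp in path_footprints if fp.lower() in path]
        if matched:
            hits[url] = matched
    return hits


if __name__ == "__main__":
    my_urls = [
        "https://example.com/phpbb/profile.php?mode=register",
        "https://example.com/about/",
    ]
    print(find_url_footprints(my_urls, ["phpbb/profile.php"]))
```

Any URL reported here could be found by a search on that path fragment, regardless of what the page text says.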

Andy Wrigley has worked in IT and Computer Audit for 30 years, and loves independent travel.


11 Comments

  1. kevinkellyx

    Quite good information and helpful for me to improve my skills in SEO.Moreover it would help me save my blog from spammers.
    Edit by admin: website http://www.boatgone.com/

  2. rob harris

    Came across your web site while doing research for my blog. Spam and its control are important to loy of folks. Basically helpful tips and insightful tips.

  3. Chris David

    Thanks Andy for the informative article. It sure would be nice to have a plugin that could “shield” your site from the view of spammers. It is a bit time consuming to manually remove footprints, especially when you have a lot of sites.

    Edit by Andy: Yep a plugin would be great but I’m not sure how feasible as display of some of the footprints is theme controlled.
    “especially when you have a lot of sites” -are you by any chance also responsible for the previous comment with author link to a site with remarkably similar content? 🙂

  4. botnet protection

    Very nice information to protect my blogs getting spmamy comments.
    i was wondering how the spammer found our site easily and do lot of comments to get link juice form my high PR sites.

    I always use nofollow tags for commenting but its showed me some pretty cool stuff.Thanks for making a blog like this

  5. Sohail

    I am for it and that is to remove such footprints. My site already suffered such attack and we needed to remove all of our forum due to spamming. The results is that my site is now no where to be seen in google.

  6. Troy

    Interesting article. I was so excited about the first handful of comments on my first blog post, and was so dissappointed when they turned out to be spam-bots. Curse you! Once the let down factor passed, it was a little funny to read through the ridiculous proceedural text they use. Oy vey, what a mess haha.

  7. Bijuterias (link removed)

    Hi AW, just been looking at this very scenario, and found your article a useful support guide. Thanks !

    • AW

      Hi Bijuterias, I’m impressed with the non spamlike appearance of your flattering comment (love the personal touch of including my “author name”) – so I’ve approved it (minus your link).

      But as your comment on another blog shows “Hi (insert author name here), just been looking at this very scenario, and found your article a useful support guide.” it is just another generic sentence designed to be posted as a comment on any article for link building purposes.

    • Bijuterias no atacado

      Ha Ha, thanks AW !

    • AW

      Genuine response + sense of humour = link included. 🙂

  8. James Thornley

    I created my first wordpress site over the weekend (I’m a programmer who normally writes websites by hand) and was astounded to start getting spam comments within hours of creating it.

    The thing is that there were no links to my actual wordpress site – just a holding page as the index. This means that the spammers must be looking for the wordpress folder by default (makes sense really), so my tip is to rename the base folder to something other than wordpress – follow the instructions here http://codex.wordpress.org/Giving_WordPress_Its_Own_Directory but they are naming it to wordpress whereas you want to name it to something else.

Comments are closed.

Copyright © 2012-2024 Webstuff.Inblighty.Com