Website DIY - tricks and solutions

Web-site Security: blocking user agents, requests & query strings

This screenshot of an attempted hack shows three attributes of a request that you can check and block.
[Screenshot: log report]

  • the IP address indicates the “visitor” is from China (a country for which the site has no relevance)
  • it is looking for “phpmyadmin”, something a legitimate visitor would not request
  • it has a known bad user agent (ZmEu).

The 404 result shows that these requests were allowed by the site; but in this case the requested files were not found. We may not be so lucky next time, and need to prevent evil requests even reaching this point.

I have already discussed blocking visitors from specific countries. Blocking bad requests, user agents (bots/crawlers/scripts), and query strings can also form part of your “first line” of defence.

The usual caveat and a warning:
You use the advice and examples below at your own risk. I neither guarantee that the examples shown below will work as expected on your site, nor that they will not harm your site.

Do not neglect other security measures – always assume your access rules have failed.

Access Rules

If you are new to access rules, I suggest you use Jeff Starr’s 5G Blacklist (below) as your starting point and then add additional rules to increase protection against directory traversal and User Agent code injection (see relevant sections below).

Linux Apache sites: add BOTH Jeff’s 5G Blacklist AND his additional rules for WordPress to your .htaccess. Although the additional rules were designed for WordPress sites, they also improve security on other sites.

Microsoft Servers: Scott Stawarz has produced a clone of the 5G blacklist for IIS with simple installation instructions.

The Blacklist is compact, does not overtax your server, and has been refined (5G = fifth generation) and user tested. It has been written to work satisfactorily, without modification, for the majority of web-sites. This involves compromises, and it does not claim to block all evil requests; however, it is easy to add your own additional rules.

This video explaining directory traversal attacks (be very afraid!) includes examples that would not be stopped by 5G rules (see below for additional checks you can apply).
Even if you don’t use it, the 5G Blacklist is simple to understand and great to use as the basis of your own solution.

About User Agents:

The User Agent string identifies the software requesting a page/response from your site. For example, requests from Internet Explorer (and many other browsers) include a User Agent string that starts with “Mozilla”, and Google’s crawler UA string includes the word “Googlebot”.

Some hacking attempts are easy to spot because the User Agent is empty (but see below); or known to be evil e.g. “ZmEu” and “Morfeus Fucking Scanner” (excuse the expletive).

The User Agent string can also be used for code injection exploits; see botsvsbrowsers.com for many examples. Even if you properly sanitise any header information used in your own code; security advisories show that “system” or third party software may not. So unless you block before it reaches this code your site may still be open to exploits.

Blocking by User Agent (UA)

Identifying which User Agents to block is an ongoing task; new ones appear, and old ones may cease to be used. The 5G Blacklist is fairly new, so its rules for User Agents make a useful starting point and example for adding your own rules. Check your logs for evil User Agents to block; you can also use Google, but limit your search to the last year to avoid lists of old obsolete bots.

Many of the code injection attempts identified by Botsvsbrowsers (see above) include one or both of these chunks of text <script> | a href= in the User Agent String.

To identify these threats (and a few variations) I’ve added the following lines to my htaccess:
SetEnvIfNoCase User-Agent (&lt;|<|%3C)(%20|\s)*script(\s|%20|>|&gt;|%3E) keep_out
SetEnvIfNoCase User-Agent a(\s|%20|\+)+href(\s|%20)*(=|%3D) keep_out

These expressions set a flag we’ve named “keep_out” if part of the User Agent matches our unwanted text strings. If you intend to use the 5G Blacklist, you can add the 2 lines immediately above the existing “SetEnvIfNoCase” lines in the “5G:[USER AGENTS]” section. If you aren’t using the Blacklist, you will have to tell htaccess that you want to block the flagged requests, e.g. with Deny from env=keep_out.
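
Before putting expressions like these on a live site, it can help to try them offline. The following is only a rough sketch in Python: it assumes the two expressions are written with the \s and \+ escapes, the sample User Agent strings are made up, and Python's regex engine is not identical to the one Apache uses, so treat the results as a sanity check rather than proof.

```python
import re

# The two expressions from the SetEnvIfNoCase lines, with the whitespace (\s)
# and plus (\+) escapes written out. re.IGNORECASE mirrors the "NoCase" part.
SCRIPT_TAG = re.compile(r"(&lt;|<|%3C)(%20|\s)*script(\s|%20|>|&gt;|%3E)", re.IGNORECASE)
A_HREF = re.compile(r"a(\s|%20|\+)+href(\s|%20)*(=|%3D)", re.IGNORECASE)


def keep_out(user_agent):
    """Return True if either expression matches anywhere in the User Agent."""
    return bool(SCRIPT_TAG.search(user_agent) or A_HREF.search(user_agent))


# Made-up sample strings for illustration, not taken from real logs.
samples = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "<script>alert(1)</script>",
    "%3Cscript%3Ealert(1)%3C/script%3E",
    "<a href=http://example.com>click</a>",
]

for ua in samples:
    print("BLOCK" if keep_out(ua) else "allow", ua)
```

The function name keep_out simply echoes the flag name used in the htaccess lines; it is not part of any library.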

Empty User Agents: the majority of sites can assume that these are evil and block them. However, some sites interface with legitimate software that does not provide a User Agent (I have a vague recollection that payment services like PayPal fall into this category).
The 5G Blacklist blocks empty user agents. If you need to allow access, remove the following line:

SetEnvIfNoCase User-Agent ^$ keep_out (htaccess version)
<add input="{HTTP_USER_AGENT}" pattern="^$"/> (IIS web.config version)

In many attacks, the User Agent string is the same as that used by legitimate software, so your user agent rules won’t identify and block them.

Testing your own User Agent rules

You can use a package like Burp Suite (free version available) to spoof and test your site’s response to good and bad user agent strings.
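
If you would rather script a quick check than use a full package, something like the following Python sketch may do. It is an illustration only: build_probe and status_for are hypothetical helper names, and you would need to supply your own site address. It sends a request with whatever User Agent you choose and reports the status code.

```python
import urllib.error
import urllib.request


def build_probe(url, user_agent):
    """Build a GET request carrying the given User Agent string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for(url, user_agent, timeout=10):
    """Return the HTTP status code the site gives for this User Agent."""
    try:
        with urllib.request.urlopen(build_probe(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses such as 403 or 404 arrive as exceptions.
        return err.code
```

For example, status_for("http://yoursite.com/", "ZmEu") returning 403 would suggest the rule fired, while the same call with an ordinary browser string should return whatever the page normally returns. Only test sites you are responsible for.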

URL, Requests and Query Strings

For the purposes of this post, a URL is the address of a “page”, e.g. “http://google.com” or “http://www.bing.com/search?q=what+i+searching+for&go=”, and a Request String is the part of the URL preceding the question mark (“http://www.bing.com/search” in the second example).

The Query String is the part of the URL that follows the question mark (“q=what+i+searching+for&go=” in the Bing example above). It is used to pass parameters or form data to your server’s dynamic pages and scripts. Note: forms that use the Post method do not generate query strings.
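
To see the split concretely, here is a small Python sketch using the standard urllib.parse module on the Bing example. The variable names are mine, and “request string” follows this post's definition rather than any official term.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://www.bing.com/search?q=what+i+searching+for&go="
parts = urlsplit(url)

# Request string, as defined in this post: everything before the "?".
request_string = f"{parts.scheme}://{parts.netloc}{parts.path}"

# Query string: everything after the "?".
query_string = parts.query

# keep_blank_values=True keeps the empty "go" parameter in the result.
params = parse_qs(query_string, keep_blank_values=True)

print(request_string)
print(query_string)
print(params)
```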

Blocking Request and Query Strings

Always assume that your “firewall” has failed; and check and sanitise any data used within your own code, including that provided in HTTP headers.

However, most attacks and probes are not looking for weaknesses in your personal code, but are targeting weaknesses in system software or popular applications such as Shopping Cart, CMS, Blog and Bulletin Board Systems. Ensure you use the latest stable release of these applications, and block obvious exploit attempts from reaching them.

The 5G Blacklist sections on “Request Strings” and “Query Strings” provide an excellent starting point in shielding against common attacks.
If you do use the blacklist, I recommend you add additional rules to extend protection against directory traversal attacks demonstrated in this video.

replace this line:
RewriteCond %{QUERY_STRING} \.\./ [NC,OR]
by these two lines:
RewriteCond %{QUERY_STRING} (//|%2F%2F|/%2F|%2F/|<|&lt;|%3C|>|&gt;|%3E|%00) [NC,OR]
RewriteCond %{QUERY_STRING} (\.\.|%2e%2e|%2e\.|\.%2e)(/|%2f) [NC,OR]

These lines check for character combinations that servers interpret as “../”, “//”, “<”, “>” or null. I don’t claim to be an expert in htaccess or pattern matching, so feel free to comment if you have a better solution. These checks may also be too strict for some sites, e.g. the first line will block requests like “http://yoursite.com/logout?redir=http://yoursite.com/login.php”. The solution: remove the “//|” and “%2F%2F|” alternatives from the rule. You will also need to remove the existing 5G query string check that tests for a “:”.
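
A quick way to see what the two query string checks would catch is to run them against a few sample query strings. The Python sketch below assumes the dots in the second expression are escaped (\.) and uses made-up samples. As far as I know, Apache matches the query string as sent, without decoding it, which is why the encoded forms are listed explicitly. Python's regex engine is not identical to Apache's, so this is a sanity check only.

```python
import re

# First check: doubled slashes, angle brackets and null, raw or encoded.
SLASHES_BRACKETS_NULL = re.compile(
    r"(//|%2F%2F|/%2F|%2F/|<|&lt;|%3C|>|&gt;|%3E|%00)", re.IGNORECASE
)
# Second check: "../" in raw, encoded and mixed forms.
DOT_DOT_SLASH = re.compile(r"(\.\.|%2e%2e|%2e\.|\.%2e)(/|%2f)", re.IGNORECASE)


def blocked(query_string):
    """Return True if either check matches the raw query string."""
    return bool(
        SLASHES_BRACKETS_NULL.search(query_string)
        or DOT_DOT_SLASH.search(query_string)
    )


# Made-up query strings for illustration.
samples = [
    "q=what+i+searching+for&go=",
    "file=../../etc/passwd",
    "file=%2e%2e%2fetc%2fpasswd",
    "redir=http://yoursite.com/login.php",
]

for qs in samples:
    print("BLOCK" if blocked(qs) else "allow", qs)
```

The last sample is the logout redirect case: it is blocked only because of the “//”, which illustrates why that check may be too strict for some sites.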

Blocking by IP Address

I consider blocking individual IP addresses a waste of time.

However, some countries are the source of more attacks and spam than others, so I do block whole address ranges for 4 or 5 countries where my content should be of no interest. There are sites which will generate self contained rules for you, to block any selected countries. The 5G blacklist also provides a skeleton section where you can add IP addresses/ranges to block.
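
To illustrate what a range check does, here is a minimal Python sketch using the standard ipaddress module. The ranges shown are reserved documentation ranges standing in for real ones, not actual country allocations, and is_blocked is a hypothetical helper name.

```python
import ipaddress

# Placeholder ranges (reserved for documentation); substitute the ranges
# generated for the countries you actually want to block.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(ip):
    """Return True if the address falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_RANGES)


for ip in ("192.0.2.77", "203.0.113.5"):
    print("BLOCK" if is_blocked(ip) else "allow", ip)
```

A list covering whole countries can run to many ranges, so the server has more to check on every request; that is part of the performance trade-off.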

With all the checks provided by 5G I wondered whether denying access from specific Countries was still worthwhile. So, when I recently tried the 5G rules, I removed all IP checking from my site. I found that a small but noticeable number of hacking requests were let through that would have been prevented by my old country blacklisting.

So in my view, it is still worth blocking by country. This page on blocking by IP explains which countries to select, how to find address ranges, performance issues etc.

Comments Welcome

I don’t claim to be an expert, so feel free to leave your corrections, criticisms and suggestions.

Author Andy W+


  1. Jeff Starr

    Great write-up, Andy. Many good points and some smart strategies that I’ve taken the liberty of integrating into the next generation of the g-series blacklist (6G). Totally agree that blocking by IP is a big waste of time, and would add that blocking by user-agent and referrer is becoming equally futile. There are some well-known strings to look for, but it’s trivial these days to spoof just about everything except for the actual request string, which as you’ve explained includes numerous variables such as query string, request method, and so on. Will be mentioning this article in the upcoming “6G-beta” post. Cheers!

  2. AW

    Thanks for the positive feedback Jeff, I’m looking forward to your 6G (and other Perishablepress articles).

    LOL: to quote from my own article “rules may be too strict for some sites e.g. blocking requests like “http://yoursite.com/logout?redir=http: //yoursite.com/login.php”.

    I have found WordPress uses a very similar “GET” request if you are not signed on. So when I tried to approve your comment from the link in WordPress’s “Please Moderate” email the request was 403’d.

    Everything works fine if you sign on in the usual manner and navigate to comments via the dashboard – so I WON’T be removing the block of “//”.

  3. AW

    A beta version of the 6G firewall has now been published at http://perishablepress.com/6g-beta/

  4. Tolga

    A beta version of the 7G firewall has now been published at https://perishablepress.com/7g-firewall/


Copyright © 2012-2024 Webstuff.Inblighty.Com