Block Bad Bots - New Security Feature From KeyCDN

By Brian Jackson
Updated on April 25, 2016

KeyCDN is always looking for ways to improve its service, so we are excited to announce a new security feature: the ability to block bad bots. It keeps bad bots, spiders, and scrapers from crawling your CDN assets, which saves you bandwidth costs. The feature is now available to all customers and can be enabled from the KeyCDN dashboard. No more bots draining your credits!

Bad Bots

When it comes to the web, there are good bots and bad bots. An example of a good bot is Googlebot, Google’s web crawler, which crawls new content and adds it to Google’s search index. An example of a bad bot is Cheesebot. Bad bots can include spiders, crawlers, and scrapers. They are not always malicious, but most of the time there is no need for them to crawl your site. They consume your CDN bandwidth, take up server resources, and can steal your content. You can see a more comprehensive list of bots at botreports.com.


Typically you can block bad bots with your robots.txt file (which you can edit from the KeyCDN dashboard). However, not all robots honor this file, which means blocking must be enforced at the server level. KeyCDN uses a comprehensive list of known bad bots and blocks them based on their User-Agent string. We have run this in our own environment for a while now, and we wanted to open it up to all KeyCDN customers so that everyone can benefit from it.
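
To illustrate what matching on the User-Agent string looks like, here is a minimal Python sketch. It is not KeyCDN's implementation: the blocklist entries are just the sample names used later in this post, and the function name is_bad_bot is made up for the example.

```python
import re

# Hypothetical blocklist for illustration; KeyCDN's real list is not shown in this post.
BAD_BOT_NAMES = ["agent1", "Cheesebot", "Catall Spider"]

# One case-insensitive pattern that matches any listed name anywhere in the string.
BAD_BOT_RE = re.compile(
    "|".join(re.escape(name) for name in BAD_BOT_NAMES),
    re.IGNORECASE,
)


def is_bad_bot(user_agent):
    """Return True if the User-Agent string contains a blocklisted name."""
    return bool(BAD_BOT_RE.search(user_agent or ""))
```

A substring match is used because bot names usually appear inside a longer User-Agent string together with a version number or URL.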

How to Enable the Block Bad Bots Feature

The block bad bots feature is enabled by default on new zones. You can enable it on your existing zones by following the steps below.

  1. Log in to the KeyCDN dashboard and click into zones.


  2. Click “Edit” on the zone you want to enable this new feature on. 


  3. Select “Show Advanced Features.”


  4. Scroll down to “Block Bad Bots” and select “enabled.” Then make sure to save your changes.


451 HTTP Error Status Code

When a blocked bot hits our edge servers, a 451 HTTP error status code is returned. Don’t forget you can always run a live tail on your zone or whole account using our real-time logs and filter for these responses, for example with {"zone":"yourzonename","status":"451"}. The HTTP 451 status code was approved by the IESG on December 18, 2015 and is intended for cases where access to a resource is denied for legal reasons, e.g. censorship or government-mandated blocking. We chose 451 over 403, 404, or 405 because those codes are generally used for troubleshooting, and we thought it best to keep blocked-bot responses separate. Read our more in-depth post on analyzing CDN traffic to your website.
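
If you export log lines for offline analysis, counting the 451 responses per bot could look roughly like the sketch below. It assumes each log line is a JSON object with "zone" and "status" fields, as in the filter above; the "user_agent" field name and the count_blocked function are assumptions made for the example, so adjust them to your actual log format.

```python
import json
from collections import Counter


def count_blocked(lines, zone):
    """Count 451 responses per User-Agent for one zone."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not objects
        # Compare the status as a string so "451" and 451 both match.
        if entry.get("zone") == zone and str(entry.get("status")) == "451":
            # "user_agent" is an assumed field name, not confirmed by this post.
            counts[entry.get("user_agent", "unknown")] += 1
    return counts
```

Calling count_blocked(lines, "yourzonename").most_common(10) would then list the ten bots that were blocked most often on that zone.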

Blocking Bad Bots on Your Origin Server


Using the new feature above will only block bad bots on your CDN assets. You can also block bad bots from accessing your origin server. To block multiple User-Agent strings at once, you could add the following to your .htaccess file.

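# Return 403 Forbidden when the User-Agent contains any listed name (case-insensitive)
RewriteEngine On
# The space in "Catall Spider" is escaped so it is not read as an argument separator
RewriteCond %{HTTP_USER_AGENT} (agent1|Cheesebot|Catall\ Spider) [NC]
RewriteRule .* - [F,L]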

Alternatively, you can use the BrowserMatchNoCase directive like this:

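# Set the "bots" environment variable for matching User-Agents (case-insensitive)
BrowserMatchNoCase "agent1" bots
BrowserMatchNoCase "Cheesebot" bots
BrowserMatchNoCase "Catall Spider" bots

# Apache 2.2 access syntax; Apache 2.4 supports it only through mod_access_compat
Order Allow,Deny
Allow from ALL
Deny from env=bots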

And here is an example on Nginx:

# Quote the pattern because it contains a space; ~* makes the match case-insensitive
if ($http_user_agent ~* "(agent1|Cheesebot|Catall Spider)") {
    return 403;
}
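
If your origin is served by a Python application rather than directly by Apache or Nginx, the same idea can be sketched as a small WSGI middleware. This is only an illustration under that assumption: the class name BlockBadBots and the blocklist are made up for the example and reuse the sample names from the rules above.

```python
import re

# Same sample names as in the Apache and Nginx rules; matched case-insensitively.
BLOCKED_AGENTS = re.compile(r"agent1|Cheesebot|Catall Spider", re.IGNORECASE)


class BlockBadBots:
    """WSGI middleware that answers 403 for blocklisted User-Agents."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_AGENTS.search(user_agent):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else is passed through to the wrapped application.
        return self.app(environ, start_response)
```

Wrapping an existing application is then a one-liner, for example application = BlockBadBots(application).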

If you are running a popular CMS, there are also extensions and plugins available that can be used to block bots. See our security guides for details.

Summary

KeyCDN is committed to giving you more ways to decrease your bandwidth costs while providing additional security. We are excited to open up this new feature to the public. If you have any questions, please feel free to comment below.
