Block Bad Bots - New Security Feature from KeyCDN

By Brian Jackson
Updated on April 25, 2016

KeyCDN is always looking for ways to improve its service, and we are excited to announce a new security feature: the ability to block bad bots. It lets you stop bad bots, spiders, and scrapers from crawling your CDN assets, which also saves on bandwidth costs. The feature is available to all customers and can be enabled from the KeyCDN dashboard. No more bots draining your credits!

Bad bots

When it comes to the web, there are good bots and bad bots. An example of a good bot is Googlebot, Google's web crawler, which discovers new and updated pages and adds them to Google's search index. An example of a bad bot is Cheesebot. Bad bots can include spiders, crawlers, and scrapers. They are not always malicious, but in most cases there is no need for them to crawl your site. They consume your CDN bandwidth, take up server resources, and steal your content.

Typically you can block bad bots with your robots.txt file (which you can edit from the KeyCDN dashboard). However, robots.txt is only a request: not all bots honor it, so the ones that ignore it must be blocked at the server level.
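To see why robots.txt alone is not enough, note that honoring it is entirely up to the crawler. A polite bot downloads the file and checks it before each request, as in this minimal sketch using Python's standard `urllib.robotparser` module. The rules, the example URL, and the helper name `polite_fetch_allowed` are illustrative assumptions; a bad bot simply never runs this check.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block Cheesebot everywhere, allow all other crawlers.
ROBOTS_TXT = """\
User-agent: Cheesebot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_fetch_allowed(user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL.

    A well-behaved crawler calls this before every request. Nothing forces
    a bad bot to do so, which is why blocking at the server level is needed.
    """
    return parser.can_fetch(user_agent, url)


print(polite_fetch_allowed("Cheesebot", "https://cdn.example.com/img/logo.png"))  # False
print(polite_fetch_allowed("Googlebot", "https://cdn.example.com/img/logo.png"))  # True
```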

Block Bad Bots feature

KeyCDN uses a comprehensive list of known bad bots and blocks them based on their User-Agent string. We have used this in our own environment for a while now, and we wanted to open it up to all KeyCDN customers so that everyone can benefit from it.
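The core idea of User-Agent blocking is a case-insensitive check of the request's User-Agent header against a list of known bad-bot names. The Python sketch below assumes a short, hypothetical blocklist (it is not KeyCDN's actual list or matching logic) and matches on substrings, since bots usually embed their name inside a longer User-Agent string.

```python
from typing import Optional

# Hypothetical blocklist for illustration only; a real list is far longer.
BAD_BOT_TOKENS = ("cheesebot", "catall spider", "agent1")


def is_bad_bot(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent contains a known bad-bot token.

    Matching is case-insensitive and substring-based, so
    "Mozilla/5.0 (compatible; CheeseBot/1.0)" is caught as well.
    """
    if not user_agent:
        return False  # assumption: requests without a User-Agent are allowed
    ua = user_agent.lower()
    return any(token in ua for token in BAD_BOT_TOKENS)


print(is_bad_bot("Mozilla/5.0 (compatible; CheeseBot/1.0)"))  # True
print(is_bad_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # False
```

Substring matching keeps the check simple but can misfire on short or generic tokens, so real lists favor distinctive bot names. And because any client can send any User-Agent it likes, this approach only stops bots that identify themselves.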

451 HTTP error status code

When a blocked bot requests an asset from our edge servers, it receives an HTTP 451 status code.

The HTTP 451 status code (Unavailable For Legal Reasons) was approved by the IESG on December 18, 2015, and is intended for cases where access to a resource is denied for legal reasons (e.g. censorship or government-mandated blocking). We chose 451 over 403, 404, or 405 because those codes are commonly used when troubleshooting, and we thought it best to keep bot blocking separate from them.
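If you want an application on your own server to answer blocked bots the same way, returning 451 instead of the resource, here is a minimal WSGI middleware sketch in Python. It is not KeyCDN's implementation; the names `block_bad_bots` and `hello_app` and the inline blocklist are hypothetical.

```python
from typing import Callable, Iterable

# Hypothetical blocklist for illustration only.
BAD_BOT_TOKENS = ("cheesebot", "catall spider")


def block_bad_bots(app: Callable) -> Callable:
    """Wrap a WSGI app so that known bad bots get HTTP 451 instead of content."""

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in ua for token in BAD_BOT_TOKENS):
            body = b"Unavailable For Legal Reasons\n"
            start_response("451 Unavailable For Legal Reasons", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware


def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in application that serves normal content."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = block_bad_bots(hello_app)
```

To try it locally, you could serve `app` with the standard library's `wsgiref.simple_server.make_server` and send requests with different User-Agent headers.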

Blocking bad bots on your origin server

The feature above blocks bad bots only on your CDN assets, but you can also block them from accessing your origin server. To block several User-Agent strings at once on Apache, add the following to your .htaccess file:

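# Requires mod_rewrite. The space in "Catall Spider" must be escaped;
# otherwise Apache splits the pattern and reads the rest as flags.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (agent1|Cheesebot|Catall\ Spider) [NC]
# [F] returns 403 Forbidden; [L] stops processing further rules.
RewriteRule .* - [F,L]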

Alternatively, you can use the BrowserMatchNoCase directive:

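# Requires mod_setenvif: each match sets the "bots" environment variable.
BrowserMatchNoCase "agent1" bots
BrowserMatchNoCase "Cheesebot" bots
BrowserMatchNoCase "Catall Spider" bots

# Apache 2.4: allow everyone except requests flagged as bots.
<RequireAll>
    Require all granted
    Require not env bots
</RequireAll>

# Apache 2.2 equivalent (on 2.4 this needs mod_access_compat):
# Order Allow,Deny
# Allow from ALL
# Deny from env=bots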

And here is an equivalent example for Nginx:

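# Place inside a server or location block. ~* makes the match
# case-insensitive; the quotes are required because the pattern contains a space.
if ($http_user_agent ~* "(agent1|Cheesebot|Catall Spider)") {
    return 403;
}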

If you are running a popular CMS, there are also extensions and plugins that can block bots. See our security guides:

Summary

KeyCDN is committed to giving you more ways to reduce your bandwidth costs while improving security. We are excited to open this new feature up to all customers. If you have any questions, please feel free to comment below.
