Block Bad Bots – New Security Feature From KeyCDN

KeyCDN is always looking for ways to improve its service, so we are excited to announce a new security feature: the ability to block bad bots. It stops bad bots, spiders, and scrapers from crawling your CDN assets, which also saves you bandwidth costs. The feature is now available to all customers and can be enabled from the KeyCDN dashboard. No more bots draining your credits!

Bad Bots

When it comes to the web, there are good bots and bad bots. An example of a good bot is Googlebot, Google’s web crawler, which crawls new content and adds it to Google’s search index. An example of a bad bot is Cheesebot. Bad bots can include spiders, crawlers, and scrapers. They are not always malicious, but most of the time there is no need for them to crawl your site. They consume your CDN bandwidth, take up server resources, and steal your content. You can see a more comprehensive list of bots at botreports.com.

Typically you can block bad bots with your robots.txt file (which you can edit from the KeyCDN dashboard). However, not all robots honor this file, which means the blocking must be done at the server level. KeyCDN uses a comprehensive list of known bad bots and blocks them based on their User-Agent string. We have had this in place in our own environment for a while now, and we wanted to open it up to all KeyCDN customers so that everyone could benefit from it.
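
To illustrate the idea of User-Agent based blocking, here is a minimal Python sketch. It is not KeyCDN's implementation: the blocklist entries are placeholders taken from the examples in this post, and the function name `status_for` is made up for illustration.

```python
import re

# Hypothetical blocklist; the real KeyCDN list is not published in this post.
BAD_BOT_NAMES = ["agent1", "Cheesebot", "Catall Spider"]

# One case-insensitive regex built from the escaped blocklist entries.
BAD_BOT_RE = re.compile(
    "|".join(re.escape(name) for name in BAD_BOT_NAMES),
    re.IGNORECASE,
)


def status_for(user_agent):
    """Return 451 if the User-Agent contains a blocklisted name, else 200."""
    if BAD_BOT_RE.search(user_agent or ""):
        return 451
    return 200


if __name__ == "__main__":
    for ua in ("Mozilla/5.0 (compatible; Googlebot/2.1)", "cheesebot/1.0"):
        print(ua, "->", status_for(ua))
```

Because the check relies only on the User-Agent header, a bot that sends a browser-like string would pass it, which is why a list like this is usually combined with other measures.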

How to Enable the Block Bad Bots Feature

The block bad bots feature is enabled by default on new zones. You can enable it on your existing zones by following the steps below.

  1. Log in to the KeyCDN dashboard and click into zones.
  2. Click “Edit” on the zone you want to enable this new feature on.
  3. Select “Show Advanced Features.”
  4. Scroll down to “Block Bad Bots” and select “enabled.” Then make sure to save your changes.

451 HTTP Error Status Code

When a blocked bad bot hits our edge servers, a 451 HTTP error status code is returned. Don’t forget you can always run a live tail on your zone or whole account using our real-time logs: {"zone":"yourzonename","status":"451"}. The HTTP 451 status code was approved by the IESG on December 18, 2015 and is intended for cases where access to a resource is denied for legal reasons, e.g. censorship or government-mandated blocking. We chose 451 rather than 403, 404, or 405 because those codes are generally used for troubleshooting, and we thought it best to keep bot blocks separate from them. Read our more in-depth post on analyzing CDN traffic to your website.
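
If you export log entries for offline analysis, the same filter can be applied in a few lines of Python. This is only a sketch: it assumes one JSON object per line with `zone` and `status` fields, as the filter above suggests, and the `user_agent` field in the sample data is hypothetical. Check the actual log format before relying on it.

```python
import json


def blocked_requests(lines, zone):
    """Yield parsed log records for `zone` whose status is 451."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        # Compare as strings, since the filter above quotes the status code.
        if record.get("zone") == zone and str(record.get("status")) == "451":
            yield record


if __name__ == "__main__":
    # Made-up sample records; field names other than zone/status are assumptions.
    sample = [
        '{"zone": "yourzonename", "status": "451", "user_agent": "Cheesebot"}',
        '{"zone": "yourzonename", "status": "200", "user_agent": "Mozilla/5.0"}',
    ]
    for rec in blocked_requests(sample, "yourzonename"):
        print(rec["user_agent"])
```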

Blocking Bad Bots on Your Origin Server

Using the new feature above will only block bad bots on your CDN assets. You can also block bad bots from accessing your origin server. To block multiple User-Agent strings at once, you could add the following to your .htaccess file.

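RewriteEngine On
# Return 403 Forbidden when the User-Agent contains any listed name (case-insensitive).
# The space in "Catall Spider" is escaped so it is not read as an argument separator.
RewriteCond %{HTTP_USER_AGENT} ^.*(agent1|Cheesebot|Catall\ Spider).*$ [NC]
RewriteRule .* - [F,L]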

Or you can also use the BrowserMatchNoCase directive like this:

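# Set the "bots" environment variable when the User-Agent matches (case-insensitive).
BrowserMatchNoCase "agent1" bots
BrowserMatchNoCase "Cheesebot" bots
BrowserMatchNoCase "Catall Spider" bots

# Apache 2.2 access control syntax; Apache 2.4 needs mod_access_compat for these directives.
Order Allow,Deny
Allow from ALL
Deny from env=bots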

And here is an example on Nginx.

# The pattern is quoted because it contains a space; ~* matches case-insensitively.
if ($http_user_agent ~* "(agent1|Cheesebot|Catall Spider)") {
    return 403;
}
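
Before deploying a rule like the ones above, it can help to check which User-Agent strings the pattern would actually catch. The following Python sketch does that with the same placeholder names. The function name `would_block` and the sample strings are made up for illustration. Python's regex engine is not identical to the ones Apache and Nginx use, although a simple alternation like this one should behave the same way.

```python
import re

# Same alternation as the server rules above, matched case-insensitively
# like the [NC] flag in the .htaccess example.
PATTERN = re.compile(r"(agent1|Cheesebot|Catall Spider)", re.IGNORECASE)


def would_block(user_agent):
    """Return True if the pattern matches anywhere in the User-Agent string."""
    return PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    # Made-up sample User-Agent strings.
    samples = [
        "Mozilla/5.0 (X11; Linux x86_64) Firefox/45.0",
        "CheeseBot/2.0",
        "catall spider",
    ]
    for ua in samples:
        print("blocked" if would_block(ua) else "allowed", "-", ua)
```

Running a list of legitimate User-Agent strings through a check like this is a cheap way to spot false positives, such as a partner service or an automated test client, before the rule goes live.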

If you are running a popular CMS, there are also extensions and plugins available that can be used to block bots. See our security guides for details.

Summary

KeyCDN is committed to giving you more ways to decrease your bandwidth costs while providing additional security. We are excited to open up this new feature to the public. If you have any questions, please feel free to comment below or join us in our community for a longer discussion on blocking bad bots.

Block Bad Bots – New Security Feature From KeyCDN was last modified: April 25th, 2016 by Brian Jackson
  • takepara

    Awesome!
    Great feature.

    I want custom robots.txt at zone alias, like.

  • Love it!

    One question… are you guys using the bad bot list from botreports.com? Or is it your own, custom list?

    • Hi Dave, we are using our own custom list that has been built up over time from analyzing CDN traffic.

  • Is “DDoS” equivalent to “Bad Bots”?

    • Hi Maxim, bad bots can be used to perform a variety of malicious activities. Indeed, one of which can be to launch a DDoS attack.

  • Akihiro HARAI

    This is an awesome feature. However, it also blocks Outbrain, which is essential to our service. Is there any way to unblock it while keeping this feature enabled?

  • Agets

    Block bad boys are suited for the security system. This eared in style for the real benefits. People love to use this as it’s important for the security action. This leads for the security in style for the people.

  • Silvia

    My automated test to request status code on loaded images ends up with 451 because my requests through selenium are seen as bad bots.
    How can I overcome this?

    Thx.

    • Sebastian

      Hi Silvia,
      you can disable the bot detection in the zone settings.
