
What Is Rate Limiting?

Updated on April 28, 2023

Have you ever experienced a slow-loading website or an error message when trying to access a particular service online? You might have encountered rate limiting, a common technique used by online services to manage traffic and prevent overloading of their systems. In this post, we'll be diving deeper into various types of rate limiting methods, implementation examples, and how rate limiting works in conjunction with KeyCDN.

What is rate limiting?

Rate limiting is used to control the amount of incoming and outgoing traffic to or from a network. For example, let's say you are using a particular service's API that is configured to allow 100 requests/minute. If the number of requests you make exceeds that limit, then an error will be triggered. The reasoning behind implementing rate limits is to allow for a better flow of data and to increase security by mitigating attacks such as DDoS.

Rate limiting also comes in handy if a particular user on the network makes a mistake in a request, asking the server to retrieve an enormous amount of data that could overload the network for everyone. With rate limiting in place, however, these types of errors or attacks are much more manageable.

Rate limiting can be applied at different levels of service, including the network level, the application level, or the API level. For example, a network-level rate limiter could restrict the amount of traffic that a service receives from a particular IP address, while an API rate limiter could limit the number of API calls that a user or an application can make per second.

How does rate limiting work?

Rate limiting works by setting predefined limits on the number of requests that a user or an application can make within a specific time frame. When a user or an application exceeds these limits, the service will either slow down their requests, return an error message, or block them entirely.

There are several different rate limiting algorithms that a service can use to enforce these limits. Some of the most common algorithms include:

  1. Fixed Window: In this algorithm, a fixed time window is set, and the number of requests made within that window is counted. If the number of requests exceeds the limit set for that window, the service will block or throttle the user or application.

  2. Rolling Window: In this algorithm, the time window is moving, and the number of requests made within that window is counted. The window moves forward with each request, and if the number of requests exceeds the limit set for that window, the service will block or throttle the user or application.

  3. Token Bucket: In this algorithm, tokens are added to a bucket at a fixed rate. Each request made by a user or application consumes a token from the bucket. If there are no tokens left in the bucket, the service will block or throttle the user or application.

  4. Leaky Bucket: In this algorithm, requests are fed into a virtual "bucket" at a fixed rate. If the bucket overflows, the service will block or throttle the user or application.
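To make the token bucket algorithm above concrete, here is a minimal single-process sketch in Python. Production rate limiters are typically backed by a shared store such as Redis; the rate and capacity values here are illustrative:

```python
import time

class TokenBucket:
    """Minimal token bucket: `rate` tokens are added per second,
    up to a maximum of `capacity` tokens."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on the time elapsed since the last check.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # Each request consumes one token.
            return True
        return False  # Bucket empty: block or throttle the request.

bucket = TokenBucket(rate=2, capacity=5)  # 2 requests/second, bursts up to 5
results = [bucket.allow() for _ in range(7)]
print(results)  # the first 5 burst requests pass, the rest are rejected
```

Because the bucket starts full, a burst of up to 5 requests is served immediately; after that, requests are only admitted as tokens trickle back in at 2 per second.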

Types of rate limits

There are various methods and parameters that can be defined when setting rate limits. The rate limit method that should be used will depend on what you want to achieve as well as how restrictive you want to be. The section below outlines three different types of rate limiting methods that you can implement.

  1. User rate limiting: The most popular type of rate limiting is user rate limiting. This associates the number of requests a user makes with their API key or IP address (depending on which method you use). If the user exceeds the rate limit, any further requests are denied until they either reach out to the developer to increase the limit or wait for the rate limit timeframe to reset.

  2. Geographic rate limiting: To further increase security in certain geographic regions, developers can set rate limits for particular regions and particular time periods. For instance, if a developer knows that from midnight to 8:00 am users in a particular region won't be as active, then they can define lower rate limits for that time period. This can be used as a preventative measure to help further reduce the risk of attacks or suspicious activity.

  3. Server rate limiting: If a developer has defined certain servers to handle certain aspects of their application then they can define rate limits on a server-level basis. This gives developers the freedom to decrease traffic limits on server A while increasing it on server B (a more commonly used server).
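A user rate limit like the one described in point 1 can be sketched as a fixed-window counter keyed by API key or IP address. The window, limit, and key names below are illustrative, not any particular service's implementation:

```python
import time
from collections import defaultdict

WINDOW = 60   # seconds per window
LIMIT = 100   # requests per window per user

# Maps a user key (API key or IP) to [window start, request count].
counters = defaultdict(lambda: [0.0, 0])

def allow_request(user_key, now=None):
    """Return True if the user is still within their rate limit."""
    now = time.monotonic() if now is None else now
    start, count = counters[user_key]
    if now - start >= WINDOW:
        # New window: reset this user's counter.
        counters[user_key] = [now, 1]
        return True
    if count < LIMIT:
        counters[user_key][1] += 1
        return True
    return False  # Limit reached: deny until the window resets.

print(all(allow_request("alice", now=0.0) for _ in range(100)))  # True
print(allow_request("alice", now=30.0))   # False: 101st request in window
print(allow_request("alice", now=61.0))   # True: window has reset
```

Keying the counter by API key rather than IP avoids penalizing many users behind a shared NAT, at the cost of requiring authentication on every request.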

Why is rate limiting crucial for online services?

Rate limiting is crucial for online services for several reasons:

Preventing bot attacks

In general, rate limiting is an effective technique for preventing bot attacks that rely on making a large number of requests within a short period of time. By limiting the number of requests that can be made, it's possible to prevent bots from overloading the server with requests and ensure that legitimate users are able to access the website or application without interruption.

Rate limiting can help mitigate the following attacks:

  • DDoS Attacks: DDoS (Distributed Denial of Service) attacks are a common threat to online services. By overwhelming a service with a massive amount of traffic, DDoS attacks can cause downtime, slow response times, and even data breaches. Rate limiting can help prevent DDoS attacks by limiting the amount of traffic that a service can receive from a particular IP address or network.

  • Brute force attacks: Brute force attacks are bot attacks that attempt to guess passwords or other authentication credentials by trying multiple combinations of usernames and passwords until a valid match is found. By implementing rate limiting on login endpoints, it's possible to limit the number of login attempts that can be made within a specific time frame, preventing bots from guessing passwords or usernames.

  • Web scraping attacks: Web scraping attacks are bot attacks that attempt to scrape large amounts of data from a website or application. By implementing rate limiting on data retrieval endpoints, it's possible to limit the number of requests that can be made within a specific time frame, preventing bots from overloading the server with requests.

  • Credential stuffing attacks: Credential stuffing attacks are another type of bot attack that attempt to use stolen login credentials to gain access to user accounts. By implementing rate limiting on login endpoints, it's possible to limit the number of login attempts that can be made within a specific time frame, preventing bots from testing large lists of stolen credentials.


  • API abuse attacks: API abuse attacks are bot attacks that attempt to abuse an API by making excessive requests, using stolen API keys, or exploiting vulnerabilities. By implementing rate limiting on API endpoints, it's possible to limit the number of requests that each user or application can make within a specific time frame, preventing bots from overloading the server with requests or exploiting vulnerabilities.

  • Spamming attacks: Spamming attacks are bot attacks that attempt to send large volumes of spam messages or comments to a website or application. By implementing rate limiting on message or comment submission endpoints, it's possible to limit the number of submissions that can be made within a specific time frame, preventing bots from overloading the server with spam messages.
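The login-endpoint protections mentioned above can be sketched as a per-account failure counter with a sliding window. The thresholds are illustrative; real systems typically also track source IPs and persist the counters:

```python
import time
from collections import defaultdict, deque

MAX_ATTEMPTS = 5   # failed attempts allowed...
WINDOW = 300       # ...per 5-minute window

failed = defaultdict(deque)  # username -> timestamps of recent failures

def login_allowed(username, now=None):
    """Return False if the account has too many recent failed attempts."""
    now = time.monotonic() if now is None else now
    attempts = failed[username]
    # Drop failures that have aged out of the window.
    while attempts and now - attempts[0] > WINDOW:
        attempts.popleft()
    return len(attempts) < MAX_ATTEMPTS

def record_failure(username, now=None):
    failed[username].append(time.monotonic() if now is None else now)

for _ in range(5):
    record_failure("alice", now=0.0)
print(login_allowed("alice", now=10.0))   # False: temporarily locked out
print(login_allowed("alice", now=400.0))  # True: failures have expired
```

Counting only failed attempts (rather than all logins) means legitimate users who remember their password are never throttled, while password-guessing bots hit the limit quickly.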

Protecting resources

Online services have finite resources, including server capacity, bandwidth, and memory. By restricting the number of requests that a user or an application can make, rate limiting helps prevent the overloading of these resources, which can cause downtime or slow response times.

Fairness and equality

Rate limiting helps ensure fairness and equality among users of an online service. By limiting the number of requests that a user or an application can make, rate limiting prevents abusive behavior that could give some users an unfair advantage over others.

Compliance with regulations

Some online services are required by law or industry regulations to implement rate limiting to protect users' privacy and security. For example, financial services are required to implement rate limiting to prevent fraudulent transactions.

Rate limit implementation

There are various ways to go about actually implementing rate limits. This can be done at the server level, it can be implemented via a programming language or even a caching mechanism. The two implementation examples below show how to integrate rate limiting either via Nginx or Apache.

Nginx

If you're using Nginx as your web server and would like to implement rate limiting at the server-level then you can take advantage of the module ngx_http_limit_req_module. This can be implemented directly within your Nginx configuration file. Using this method, Nginx rate limits based on the user's IP address.

http {
    limit_req_zone $binary_remote_addr zone=one:10m rate=2r/s;
    ...

    server {
        ...
        location /promotion/ {
            limit_req zone=one burst=5;
        }
    }
}

The snippet above allows no more than 2 requests per second on average, with bursts not exceeding 5 requests.

Apache

Similarly, Apache users can implement throttling within the Apache configuration file. Note, however, that Apache's mod_ratelimit module limits client bandwidth rather than request rate: throttling is applied to each HTTP response individually instead of being aggregated at the IP/client level.

<Location "/promotion">
    SetOutputFilter RATE_LIMIT
    SetEnv rate-limit 400
    SetEnv rate-initial-burst 512
</Location>

In the snippet above, the rate-limit environment variable, which specifies the connection speed to simulate, is set to 400 KiB/s, while rate-initial-burst allows the first 512 KiB of each response to be sent at full speed before throttling begins.

Rate limiting and KeyCDN

KeyCDN also makes use of rate limiting for our API. API requests are currently rate limited at 20 queries per 60 seconds. KeyCDN also provides two HTTP headers to provide you with additional information regarding where your utilization is for the given rate limit.

  • X-Rate-Limit-Limit HTTP header: the rate limit ceiling for the given request.
  • X-Rate-Limit-Remaining HTTP header: the number of requests remaining in the current 60-second window.

In the event that you reach the rate limit, KeyCDN will return a 429 Too Many Requests response code.
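On the client side, a 429 response is best handled by backing off and retrying rather than immediately resending the request. The sketch below wraps an arbitrary request function with exponential backoff; the simulated API is a stand-in for a real HTTP call:

```python
import time

def with_backoff(request_fn, max_retries=3, base_delay=1.0):
    """Call request_fn(); on a 429 status, wait and retry with
    exponential backoff (base_delay, 2x, 4x, ...)."""
    for attempt in range(max_retries + 1):
        status, body = request_fn()
        if status != 429:
            return status, body
        if attempt < max_retries:
            time.sleep(base_delay * (2 ** attempt))
    return status, body

# Simulated API: rate limited for the first two calls, then succeeds.
calls = {"n": 0}
def fake_api():
    calls["n"] += 1
    return (429, "Too Many Requests") if calls["n"] <= 2 else (200, "OK")

status, body = with_backoff(fake_api, base_delay=0.01)
print(status, body)  # 200 OK after two retries
```

A production client would also honor a Retry-After header if the API returns one, and stop retrying once X-Rate-Limit-Remaining indicates the window is exhausted.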

You may also feel the effects of rate limiting on your origin server after implementing KeyCDN. In certain cases, users will receive a 500 Origin Not Reachable error, which means that KeyCDN was not able to get a valid response from the origin server. If this occurs and everything seems to be working fine on your origin, check with your hosting provider to see whether they have rate limiting applied; if so, they may need to increase the limit for your particular use case.

Summary

Rate limiting can be a great method to help fight against infrastructure attacks as well as block other types of suspicious activity. As mentioned, there are various ways to implement rate limiting, whether at the server level, user level, etc. If you are experiencing a 429 Too Many Requests error for a particular API you're working with, be sure to reach out to the developers or check their docs to find the currently configured rate limit, and adjust your usage to fit within those limits.

Supercharge your content delivery 🚀

Try KeyCDN with a free 14 day trial, no credit card required.
