HTTP Cache Headers - A Complete Guide
This article highlights important information on HTTP caching headers and associated CDN behavior. In case you are looking for in-depth information on the role of HTTP cache headers in the modern web, here’s everything you need to know.
HTTP Cache Headers Explained
Caches work with content mainly through freshness and validation. A fresh representation is available instantly from a cache while a validated representation rarely sends the entire representation again if it hasn’t changed. In cases where there is no validator present (i.e. an ETag/Last-modified header), and a lack of explicit freshness info, it will usually (but not always) be considered uncacheable. Let’s shift our focus to the kind of headers you should be concerned about.
Every resource can define its own caching policy via the Cache-Control HTTP header. Cache-Control directives control who caches the response, under what conditions and for how long.
Requests that don’t need server communication are considered the best requests: local copies of the responses allow the elimination of network latency as well as data charges resulting from data transfers. The HTTP specification enables the server to send several different Cache-Control directives which control how and for how long individual responses are cached by browsers among other intermediate caches such as a CDN.
Cache-control: private, max-age=0, no-cache
These settings are referred to as response directives. They are as follows:
Public vs. Private
A response that is marked “public” can be cached even in cases where it is associated with an HTTP authentication or the HTTP response status code is not cacheable normally. In most cases, a response marked “public” isn’t necessary, since explicit caching information (i.e. “max-age”) shows that a response is cacheable anyway.
On the contrary, a response marked “private” can be cached (by the browser) but such responses are typically intended for single users hence they aren’t cacheable by intermediate caches (e.g. HTML pages with private user info can be cached by a user’s browser but not by a CDN).
No-cache and No-store
“No-cache” shows that returned responses can’t be used for subsequent requests to the same URL before checking if server responses have changed. If a proper ETag (validation token) is present as a result, no-cache incurs a roundtrip in an effort to validate cached responses. Caches can however eliminate downloads if the resources haven’t changed. In other words, web browsers might cache the assets but they have to check on every request if the assets have changed (304 response if nothing has changed).
On the contrary, “no-store” is simpler. This is the case because it disallows browsers and all intermediate caches from storing any versions of returned responses i.e. responses containing private/personal information or banking data. Every time users request this asset, requests are sent to the server. The assets are downloaded every time.
The max-age directive states the maximum amount of time in seconds that fetched responses are allowed to be used again (from the time when a request is made). For instance, “max-age=90” indicates that an asset can be reused (remains in the browser cache) for the next 90 seconds.
The “s-” stands for shared as in “shared cache”. This directive is explicitly for CDNs among other intermediary caches. This directive overrides the max-age directive and expires header field when present. KeyCDN also obeys this directive.
The must-revalidate directive is used to tell a cache that it must first revalidate an asset with the origin after it becomes stale. The asset must not be delivered to the client without doing an end-to-end revalidation. In short, stale assets must first be verified and expired assets should not be used.
The proxy-revalidate directive is the same as the must-revalidate directive, however, it only applies to shared caches such as proxies. It is useful in the event that a proxy services many users that need to be authenticated one-by-one. A response to an authenticated request can be stored in the user’s cache without needing to revalidate it each time as they are known and have already been authenticated. However, proxy-revalidate allows proxies to still revalidate for new users that have not been authenticated yet.
The no-transform directive tells any intermediary such as a proxy or cache server to not make any modifications whatsoever to the original asset. The Content-Encoding, Content-Range, and Content-Type headers must remain unchanged. This can occur if a non-transparent proxy decides to make modifications to assets in order to save space. However, this can cause serious problems in the event that the asset must remain identical to the original entity-body although must also pass through the proxy.
According to Google, the Cache-Control header is all that’s needed in terms specifying caching policies. Other methods are available which we’ll go over in this article, however, are not required for optimal performance.
The Cache-Control header is defined as part of HTTP/1.1 specifications and supersedes previous headers (i.e. Expires) used to specify response caching policies. Cache-Control is supported by all modern browsers so that’s all we need.
The old “pragma” header accomplishes many things most of them characterized by newer implementations. We are however most concerned with the
pragma: no-cache directive which is interpreted by newer implementations as
cache-control: no-cache. You don’t need to be concerned about this directive because it’s a request header which will be ignored by KeyCDN’s edge servers. It is however important to be aware of the directive for the overall understanding. Going forward, there won’t be new HTTP directives defined for pragma.
A couple of years back, this was the main way of specifying when assets expire. Expires is simply a basic date-time stamp. It’s fairly useful for old user agents which still roam unchartered territories. It is, however, important to note that cache-control headers, max-age and s-maxage still take precedence on most modern systems. It’s however good practice to set matching values here for the sake of compatibility. It’s also important to ensure you format the date properly or it might be considered as expired.
Expires: Sun, 03 May 2015 23:02:37 GMT
To avoid breaking the specification, avoid setting the date value to more than a year.
This type of validation token (the standard in HTTP/1.1):
- Is communicated via the ETag HTTP header (by the server).
- Enables efficient resource updates i.e. no data transfer takes place if the resource doesn’t change.
The following example will illustrate this. 90 seconds after the initial fetch of an asset, initiates the browser a new request (the exact same asset). The browser looks up the local cache and finds the previously cached response but cannot use it because it’s expired. This is the point where the browser requests the full content from the server. The problem with it this is that if the resource hasn’t changed, there is absolutely no reason for downloading the same asset that is already in the CDN cache.
Validation tokens are solving this problem. The edge server creates and returns arbitrary tokens, that are stored in the ETag header field, which are typically a hash or other fingerprints of content of existing files. Clients don’t need to know how the tokens are generated but need to send them to the server on subsequent requests. If the tokens are the same then resources haven’t changed thus downloads can be skipped.
The web browser provides the ETag token automatically within the “If-None-Match” HTTP request header. The server then checks tokens against current assets in the cache. A 304 Not Modified response will tell the browser if an asset in the cache hasn’t been changed and therefore allowing a renewal for another 90 seconds. It’s important to note that these assets don’t need to be downloaded again which saves bandwidth and time.
How do web developers benefit from efficient revalidation?
Browsers do most (if not) all the work for web developers. For instance, they automatically detect if validation tokens have been previously specified and appending them to outgoing requests and updating cache timestamps as required based on responses from servers. Web developers are therefore left with one job only which is ensuring servers provide the required ETag tokens. KeyCDN’s edge servers fully support ETags.
The Last-Modified header indicates the time a document last changed which is the most common validator. It can be seen as a legacy validator from the time of HTTP/1.0. When a cache stores an asset including a Last-Modified header, it can utilize it to query the server if that representation has changed over time (since it was last seen). This can be done using an If-Modified-Since request header field.
An HTTP/1.1 origin server should send both, the ETag and the Last-Modified value. More details can be found in section 13.3.4 in the RFC2616.
KeyCDN Example Response Header:
HTTP/1.1 200 OK Server: keycdn-engine Date: Mon, 27 Apr 2015 18:54:37 GMT Content-Type: text/css Content-Length: 44660 Connection: keep-alive Vary: Accept-Encoding **Last-Modified: Mon, 08 Dec 2014 19:23:51 GMT** **ETag: "5485fac7-ae74"** **Cache-Control: max-age=533280** **Expires: Sun, 03 May 2015 23:02:37 GMT** X-Cache: HIT X-Edge-Location: defr Access-Control-Allow-Origin: * Accept-Ranges: bytes
You can check your HTTP Cache Headers using KeyCDN’s HTTP Header Checker tool.
5. Extension Cache-Control Directives
Apart from the well-known Cache-Control directives outlined in the first section of this article, there also exists other directives which can be used as extensions to Cache-Control resulting in a better user experience for your visitors.
No conditional revalidation will be triggered even if the user explicitly refreshes a page. The immutable directive tells the client that the response body will not change over time, therefore, there is no need to check for updates as long as it is unexpired.
The stale-while-revalidate directive allows for a stale asset to be served while it is revalidated in the background.
A stale-while-revalidate value is defined to tell the cache that it has X amount of time to validate the asset in the background while continuing to deliver the stale one. An example of this would look like:
Cache-Control: max-age=2592000, stale-while-revalidate=86400
Learn more about the stale-while-revalidate directive in our stale-while-revalidate and stale-if-error guide.
The stale-if-error directive is very similar to the stale-while-revalidate directive in that it serves stale content when the max-age expires. However, the stale-if-error only returns stale content if the origin server returns an error code (e.g. 500, 502, 503, or 504) when the cache attempts to revalidate the asset.
Therefore, instead of showing visitors an error page, stale content is delivered to them for a predefined period of time. During this time it is the goal that the error has been resolved and that the asset can be revalidated.
Learn more about the stale-if-error directive in our stale-while-revalidate and stale-if-error guide.
KeyCDN and HTTP Cache Headers
KeyCDN allows you define your HTTP cache headers as you see fit. The ability to set the Expire and Max Expire values directly within the dashboard makes it very easy to configure things on the CDN side.
Furthermore, if you rather have even more control over your HTTP cache headers you can disable the Ignore Cache-Control feature in your Zone’s advanced features and have KeyCDN honor all of your cache headers from the origin. This is very useful in the event that you need to exclude a certain asset or group of assets from the CDN.
The Cache-Control (in particular), along with the ETag header field are modern mechanisms to control freshness and validity of your assets. The other values are “only” used for backward compatibility.
Do you have any thoughts on using HTTP cache headers? If so we would love to hear them below in the comments.