This article highlights important information on HTTP caching headers and associated CDN behavior. In case you are looking for in-depth information on the role of HTTP cache headers in the modern web, here’s everything you need to know.
HTTP Cache Headers Explained
Caches work with content mainly through freshness and validation. A fresh representation is available instantly from a cache, while a validated representation avoids sending the entire representation again if it hasn’t changed. If a response has no validator (i.e. an ETag or Last-Modified header) and no explicit freshness information, it will usually (but not always) be considered uncacheable. Let’s shift our focus to the kind of headers you should be concerned about.
- Every resource can define its own caching policy via the Cache-Control HTTP header.
- Cache-Control directives control who caches the response, under what conditions and for how long.
Requests that don’t need server communication are considered the best requests: local copies of the responses eliminate network latency as well as the data charges that come with data transfers. The HTTP specification enables the server to send several different Cache-Control directives which control how, and for how long, individual responses are cached by browsers and by intermediate caches such as a CDN.
Cache-control: private, max-age=0, no-cache
These settings are referred to as response directives. They are as follows:
Public vs. Private
A response that is marked “public” can be cached even in cases where it is associated with HTTP authentication or where the HTTP response status code is not normally cacheable. In most cases, marking a response “public” isn’t necessary, since explicit caching information (e.g. “max-age”) already shows that a response is cacheable.
In contrast, a response marked “private” can be cached by the browser, but such responses are typically intended for a single user and are therefore not cacheable by intermediate caches (e.g. HTML pages with private user info can be cached by a user’s browser but not by a CDN).
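To make the public/private distinction concrete, here is a minimal Python sketch of how an intermediate cache might decide whether it is allowed to store a response. The function names `parse_cache_control` and `shared_cache_may_store` are made up for this illustration, and the parser is deliberately simplified (it does not handle quoted directive arguments that contain commas).

```python
def parse_cache_control(header_value: str) -> dict:
    """Split a Cache-Control value into a {directive: argument} mapping.

    Directives without an argument (e.g. "private") map to None.
    Simplification: quoted arguments containing commas are not supported.
    """
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def shared_cache_may_store(header_value: str) -> bool:
    """Return True if an intermediate cache such as a CDN may store the response."""
    directives = parse_cache_control(header_value)
    # "private" restricts the response to a single user's browser cache;
    # "no-store" forbids storing the response anywhere.
    return "private" not in directives and "no-store" not in directives


print(shared_cache_may_store("public, max-age=3600"))
print(shared_cache_may_store("private, max-age=0, no-cache"))
```

The first call reports that a CDN may store the response, while the second one, which uses the header shown earlier, is reserved for the user’s browser.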
No-cache and No-store
“No-cache” indicates that a returned response can’t be used for subsequent requests to the same URL without first checking with the server whether the response has changed. As a result, no-cache incurs a roundtrip to validate the cached response, but if a proper ETag (validation token) is present, the cache can skip the download when the resource hasn’t changed. In other words, web browsers may cache the assets, but they have to check on every request whether the assets have changed (the server answers with a 304 response if nothing has changed).
In contrast, “no-store” is simpler: it disallows browsers and all intermediate caches from storing any version of the returned response, e.g. responses containing private/personal information or banking data. Every time a user requests the asset, the request is sent to the server and the asset is downloaded in full.
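The difference between the two directives can be summed up in a few lines of Python. This is only a sketch of the decision described above, not how any particular browser or CDN implements it; the function name `cache_policy` and the returned wording are invented for the example.

```python
def cache_policy(directives: set) -> str:
    """Describe what a cache may do with a response, given its directive names."""
    if "no-store" in directives:
        # Nothing may be kept: every request goes to the server and
        # the full asset is downloaded again.
        return "do not store"
    if "no-cache" in directives:
        # A copy may be kept, but it must be validated before each reuse
        # (a 304 response means the stored copy is still good).
        return "store, revalidate before every reuse"
    return "store, reuse while fresh"


print(cache_policy({"no-store"}))
print(cache_policy({"private", "no-cache"}))
print(cache_policy({"max-age"}))
```

Note that no-store wins if both directives are present, since a response that is never stored cannot be revalidated.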
The max-age directive states the maximum amount of time in seconds that fetched responses are allowed to be used again (from the time when a request is made). For instance, “max-age=90” indicates that an asset can be reused (remains in the browser cache) for the next 90 seconds.
The “s-” in s-maxage stands for shared, as in “shared cache”. This directive is aimed explicitly at CDNs and other intermediary caches. When present, it overrides the max-age directive and the Expires header field. KeyCDN also obeys this directive.
The Cache-Control header is defined as part of the HTTP/1.1 specification and supersedes previous headers (e.g. Expires) used to specify response caching policies. Cache-Control is supported by all modern browsers, so that’s all we need.
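The precedence between s-maxage, max-age and Expires can be sketched as follows. This is an illustrative Python helper (the name `freshness_lifetime` is an assumption of this example), and it ignores heuristic freshness and other details that real caches take into account.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def freshness_lifetime(headers: dict, shared_cache: bool) -> Optional[int]:
    """Return the freshness lifetime in seconds, or None if none is stated."""
    directives = {}
    for part in headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value

    # s-maxage only applies to shared caches (CDNs, proxies) and wins there.
    if shared_cache and "s-maxage" in directives:
        return int(directives["s-maxage"])
    # max-age takes precedence over the Expires header.
    if "max-age" in directives:
        return int(directives["max-age"])
    # Fall back to Expires, measured relative to the Date header.
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None


headers = {"Cache-Control": "max-age=90, s-maxage=3600"}
print(freshness_lifetime(headers, shared_cache=True))   # CDN view
print(freshness_lifetime(headers, shared_cache=False))  # browser view
```

With this header, a CDN would keep the asset for an hour while a browser would only reuse it for 90 seconds.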
The old “pragma” header covers a number of things, most of which are now handled by newer implementations. We are however most concerned with the pragma: no-cache directive, which newer implementations interpret as cache-control: no-cache. You don’t need to be concerned about this directive because it’s a request header which will be ignored by KeyCDN’s edge servers. It is however important to be aware of the directive for the overall understanding. Going forward, there won’t be new HTTP directives defined for pragma.
A couple of years back, the Expires header was the main way of specifying when an asset expires. Expires is simply a basic date-time stamp. It’s fairly useful for old user agents which still roam uncharted territories. Note, however, that the Cache-Control directives max-age and s-maxage still take precedence on most modern systems. It’s nevertheless good practice to set matching values here for the sake of compatibility. It’s also important to format the date properly, or the response might be considered expired.
Expires: Sun, 03 May 2015 23:02:37 GMT
To avoid breaking the specification, avoid setting the date value to more than one year in the future.
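A properly formatted Expires value that matches a given max-age can be produced with Python’s standard library. The sketch below is one possible way to do it; the helper name `expires_header` is invented for this example, and the timestamp and max-age are taken from the example response shown later in this article.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def expires_header(max_age_seconds: int, now: datetime) -> str:
    """Build an Expires value that matches the given max-age.

    `now` must be a timezone-aware datetime in UTC.
    """
    # Cap the lifetime so the date is never more than a year ahead.
    lifetime = min(max_age_seconds, ONE_YEAR_SECONDS)
    # usegmt=True yields the "GMT" suffix that HTTP dates require.
    return format_datetime(now + timedelta(seconds=lifetime), usegmt=True)


response_date = datetime(2015, 4, 27, 18, 54, 37, tzinfo=timezone.utc)
print("Expires:", expires_header(533280, response_date))
```

Deriving both values from the same lifetime is a simple way to keep Cache-Control and Expires from drifting apart.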
The ETag is a type of validation token (the standard in HTTP/1.1) that:
- Is communicated via the ETag HTTP header (by the server).
- Enables efficient resource updates i.e. no data transfer takes place if the resource doesn’t change.
The following example illustrates this. 90 seconds after the initial fetch of an asset, the browser initiates a new request for the exact same asset. The browser looks up the local cache and finds the previously cached response but cannot use it because it has expired. At this point the browser would request the full content from the server. The problem is that if the resource hasn’t changed, there is absolutely no reason to download the same asset that is already in the cache.
Validation tokens solve this problem. The edge server creates and returns an arbitrary token, stored in the ETag header field, which is typically a hash or some other fingerprint of the file’s content. Clients don’t need to know how the token is generated, but they need to send it to the server on subsequent requests. If the tokens are the same, the resource hasn’t changed and the download can be skipped.
The web browser provides the ETag token automatically within the “If-None-Match” HTTP request header. The server then checks the token against the current asset in its cache. A 304 Not Modified response tells the browser that the asset in its cache hasn’t changed, which allows it to be reused for another 90 seconds. It’s important to note that the asset doesn’t need to be downloaded again, which saves bandwidth and time.
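The server side of this exchange can be sketched in a few lines of Python. This is an assumption-laden illustration rather than KeyCDN’s implementation: the helpers `make_etag` and `respond` are hypothetical, and the token is simply a truncated SHA-256 hash of the body.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive a validation token from the content (here: a truncated hash)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, payload) for a GET request."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=90"}
    if if_none_match is not None:
        tokens = []
        for token in if_none_match.split(","):
            token = token.strip()
            # If-None-Match uses weak comparison, so drop a "W/" prefix.
            if token.startswith("W/"):
                token = token[2:]
            tokens.append(token)
        if "*" in tokens or etag in tokens:
            # The client's copy is still current: send headers only.
            return 304, headers, b""
    return 200, headers, body


status, headers, _ = respond(b"body { color: red }")
print(status, headers["ETag"])

# 90 seconds later the browser revalidates with the token it stored.
status, _, payload = respond(b"body { color: red }", if_none_match=headers["ETag"])
print(status, len(payload))
```

The second request comes back as a 304 with an empty payload, so only the headers travel over the network.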
How do web developers benefit from efficient revalidation?
Browsers do most (if not all) of the work for web developers. For instance, they automatically detect whether validation tokens have been previously specified, append them to outgoing requests and update cache timestamps as required based on the responses from servers. Web developers are therefore left with only one job: ensuring that servers provide the required ETag tokens. KeyCDN’s edge servers fully support ETags.
The Last-Modified header indicates the time a document last changed, and it is the most common validator. It can be seen as a legacy validator from the time of HTTP/1.0. When a cache stores an asset that includes a Last-Modified header, it can use it to ask the server whether that representation has changed since it was last seen. This is done using the If-Modified-Since request header field.
An HTTP/1.1 origin server should send both the ETag and the Last-Modified value. More details can be found in section 13.3.4 of RFC 2616.
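The If-Modified-Since check boils down to a date comparison. Here is a minimal Python sketch of it; the function name `not_modified_since` is an assumption of this example, and the Last-Modified value is the one from the example response in this article.

```python
from email.utils import parsedate_to_datetime


def not_modified_since(last_modified: str, if_modified_since: str) -> bool:
    """Return True if the server can answer with 304 Not Modified."""
    try:
        changed_at = parsedate_to_datetime(last_modified)
        seen_at = parsedate_to_datetime(if_modified_since)
        return changed_at <= seen_at
    except (TypeError, ValueError):
        # An unparsable date is ignored and the full response is sent.
        return False


last_modified = "Mon, 08 Dec 2014 19:23:51 GMT"
print(not_modified_since(last_modified, "Mon, 08 Dec 2014 19:23:51 GMT"))
print(not_modified_since(last_modified, "Sun, 07 Dec 2014 00:00:00 GMT"))
```

Because HTTP dates only have a resolution of one second, this validator is coarser than an ETag, which is one reason to send both.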
KeyCDN Example Response Header:
HTTP/1.1 200 OK
Server: keycdn-engine
Date: Mon, 27 Apr 2015 18:54:37 GMT
Content-Type: text/css
Content-Length: 44660
Connection: keep-alive
Vary: Accept-Encoding
Last-Modified: Mon, 08 Dec 2014 19:23:51 GMT
ETag: "5485fac7-ae74"
Cache-Control: max-age=533280
Expires: Sun, 03 May 2015 23:02:37 GMT
X-Cache: HIT
X-Edge-Location: defr
Access-Control-Allow-Origin: *
Accept-Ranges: bytes
You can check your HTTP Cache Headers using KeyCDN’s HTTP header check tool.
The Cache-Control and ETag header fields are the modern mechanisms to control the freshness and validity of your assets. The other values are “only” used for backwards compatibility.