Search engine crawlers (also known as bots or spiders) scan your website whether you like it or not. They scan virtually everything that is available, which is normally a good thing. Why does SEO matter when you use a CDN? Once you start using a CDN, your content can be served from additional domains. There is nothing wrong with that as long as search engines know which copy is the original. If the canonical source is not clearly declared, they may penalize you for duplicate content.
We offer two options to achieve the best SEO results. Both prevent duplicate-content issues and keep your site aligned with search engines. Let's take a closer look at both solutions.
Option 1: Canonical URLs
An extra HTTP header added to your zone lets crawlers know that the content served from the CDN is only a copy. Once we add a Link header with rel="canonical" pointing back to the original URL, we're on the safe side: crawlers treat the CDN URL as a duplicate of the original.
The rel="canonical" header will be applied to the whole zone. If you already send a canonical header from your origin server, there's no need to enable it in the dashboard.
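As a sketch, a canonical response header sent by the CDN could look like the following (cdn.example.com and www.example.com are placeholder domains, not your actual zone):

```
HTTP/1.1 200 OK
Content-Type: text/html
Link: <https://www.example.com/products/shoes.html>; rel="canonical"
```

A crawler fetching this page from the CDN domain sees the Link header and attributes the content to the origin URL instead.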
Option 2: Robots.txt
Search engines check for a robots.txt file at the root of a site. If the file is present, they follow its instructions; if no file is present, they crawl everything. We do not enable the file by default, meaning that everything will be indexed.
Here’s the robots.txt file that we will add to your site if you enable the Robots feature in the dashboard:
User-agent: *
Disallow: /
- The first line defines which crawler the rule applies to. In the example above, the robots.txt applies to every crawler. "User-agent: Googlebot" would apply only to Google's crawler.
- The next line defines which paths may be crawled. "Disallow: /" tells search engines not to crawl anything.
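To sanity-check what a given robots.txt actually allows, you can parse the rules with Python's standard-library robotparser (the domain below is only a placeholder):

```python
from urllib import robotparser

# Parse the two-line robots.txt shown above.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# "Disallow: /" blocks every crawler from every path.
print(rp.can_fetch("Googlebot", "https://cdn.example.com/page.html"))  # False
```

The same check works against any robots.txt you write yourself, which makes it easy to verify a rule set before deploying it.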
You can still create your own robots.txt with more detailed instructions. Several instructions can be added in the same file.
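As an illustration, a more detailed robots.txt might combine several rule groups (the paths and the sitemap URL below are hypothetical):

```
User-agent: *
Disallow: /admin/
Disallow: /tmp/

User-agent: Googlebot
Allow: /

Sitemap: https://www.example.com/sitemap.xml
```

Here most crawlers are kept out of two directories, Google's crawler is allowed everywhere, and a sitemap is advertised to help discovery.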