To crawl delay, or not to crawl delay?

Taming bots so they don’t consume excessive amounts of hosting resources has been an ongoing task for us. After the successful launch of our new Anti-bot AI system, which has already blocked more than 1 billion hits from malicious bots alone, we’d like to shed some more light on another measure in that area - the crawl-delay setting. Read on to find out what it is, why you might want to consider it, and why we no longer apply a default crawl-delay setting on our servers.

What are crawl-rate and crawl-delay?

By definition, the crawl-rate is the time frame between the separate requests a bot makes to your website - in other words, how fast a bot crawls your site. A crawl-delay setting in your robots.txt file tells bots that choose to honor it (such as Yahoo!, Bing and Yandex) to wait a certain amount of time between consecutive requests.
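For example, a robots.txt snippet like the following (a minimal sketch - the bot name and delay value are purely illustrative) asks Bing’s crawler to pause 2 seconds between requests:

User-agent: Bingbot
Crawl-delay: 2

Bots that don’t support the directive, and bad bots that ignore robots.txt altogether, will simply disregard it.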

Why use a crawl-delay setting?

If your site has a lot of pages and many of them are linked from your index, a bot that starts crawling it may generate a large number of requests in a very short period of time. Such a traffic peak can deplete the hosting resources that are monitored on an hourly basis. If you run into this kind of problem, it’s a good idea to set the crawl-delay to 1-2 seconds, so bots crawl your website at a more moderate pace without causing load peaks.
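As a rough sketch (the value is illustrative), a site-wide rule like this applies the delay to every bot that honors the directive:

User-agent: *
Crawl-delay: 2

With a 2-second delay, a compliant bot is limited to about 30 requests per minute, or roughly 1,800 per hour, instead of requesting pages as fast as it can.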

It’s important to note that Googlebot does not take the crawl-delay setting into consideration. That is why you shouldn’t worry that the directive could influence your Google standings, and you can safely use it in case there are other, more aggressive bots you want to slow down. It is highly unlikely that you will experience issues due to Googlebot crawling, but if you want to lower its crawl-rate, you can do so only from Google Search Console (formerly Google Webmaster Tools).

No default crawl-delay on SiteGround servers

Until recently, a default crawl-delay setting was applied universally on the SiteGround shared servers. Each user could override it by setting a custom value in their robots.txt file. We used this directive to prevent our customers from losing their server resources to bots. However, modern search engine bots are sophisticated enough to crawl without causing issues, and bad bots are blocked by our AI system, so there was simply no point in keeping that setting. So we removed it.

Product Development - Technical

Enthusiastic about all Open Source applications you can think of, but mostly about WordPress. Add a pinch of love for web design, new technologies, search engine optimisation and you are pretty much there!

10 Comments

  1. June 14, 2017 / 08:21 Sharron

    If you have any worries about bots getting jiggy with crawling your site, the Wordfence plugin can be set to impose a crawl delay - it will temporarily block a bot that makes too many crawl requests in a set period. (This applies to both the free and paid versions of Wordfence.) I'm not affiliated with Wordfence, but I thought I'd mention it.

    • June 15, 2017 / 00:26 Hristo Pandjarov, SiteGround Team

      That can potentially be dangerous for Google crawling your site and may result in error reports if set too high. Just a word of caution: when you limit bots, make sure you don't overdo it.

  2. June 15, 2017 / 07:47 Lisa

    I think it's safe to say that the AI Bot you recently launched doesn't kill all malicious crawlers, judging by the attack that hit my site yesterday. I had to go through and block the malicious IP addresses. How do we set a crawl delay if we want to, to catch the malicious bots that get through your system?

    • June 16, 2017 / 01:14 Hristo Pandjarov, SiteGround Team

      Unfortunately, no system can catch 100% of malicious traffic. However, we've already blocked billions of hits towards our servers with it, and we constantly improve its performance. Setting a crawl-delay will not help you against bad bots, because they will simply ignore it. I would recommend opening a ticket with information about the traffic you think is malicious, so we can check it out and make sure our system starts detecting it better, or giving a firewall plugin a try. The CloudFlare firewall in the Plus version is also a great option to check out.

  3. June 19, 2017 / 07:31 Craig Daniels

    Always glad you're improving things against bad actors. But yesterday my site got bashed and then shut down for too much CPU usage. All the traffic came from 2 IP addresses that mostly hit 3 pages, yet they were not stopped and I had to watch as my sites were shut down for hours.

    I blocked the IP addresses, but it was too late. Can't you create a setting where we limit the number of hits from any single IP address to, say, 200 a day as a way of stopping thousands of hits from the same IP?

    • June 20, 2017 / 01:02 Hristo Pandjarov, SiteGround Team

      Unfortunately, that wouldn't be a good idea on a larger scale, because it would affect a lot of customers in a negative way. However, I will look into that case and see if a possible update of our Anti-Bot system rules could be applied.

  4. June 25, 2017 / 12:54 stu rohrer

    Thanks for this, but could you explain where to go to implement the crawl delay on my domain/server?

  5. July 5, 2017 / 15:09 Brian Prows

    To what extent do Pingdom and WordPress' Jetpack slow down server response times? I have both set to monitor for downtime. (I'm using the free CloudFlare service.)

    Pingdom reports an average server response time of around 680 ms. Google keeps reminding me that server response time shouldn't exceed 200 ms.

    • July 6, 2017 / 00:14 Hristo Pandjarov, SiteGround Team

      Jetpack has tons of functionality, and there isn't a straightforward answer to that question. In most cases it does not actually slow your site down 🙂 I would recommend that you post a ticket in your Help Desk so we can take a look and tell you more.
