Keeping bots from consuming excessive amounts of hosting resources has been an ongoing task for us. After the successful launch of our new Anti-bot AI system, which has already blocked more than 1 billion hits from malicious bots alone, we’d like to shed some more light on another measure in that area - the crawl-delay setting. Read on to find out what it is, why you might want to consider it, and why we no longer apply a default crawl-delay setting on our servers.
What are crawl-rate and crawl-delay?
By definition, the crawl-rate is the time frame between the separate requests a bot makes to your website. Basically, it defines how fast a bot will crawl your site. A crawl-delay setting tells bots that choose to follow it (like Yahoo!, Bing, Yandex, etc.) to wait a certain amount of time between consecutive requests.
Why use crawl-delay setting?
If you have a lot of pages and many of them are linked from your index, a bot that starts crawling your site may generate a large number of requests in a very short period of time. This traffic peak can deplete the hosting resources that are monitored on an hourly basis. So, if you encounter such problems, it’s a good idea to set the crawl-delay to 1-2 seconds, so that bots crawl your website at a more moderate pace without causing load peaks.
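Here is a minimal example of what that looks like in the robots.txt file in your site’s root folder (the 2-second value is just an illustration; adjust it to your needs):

User-agent: *
Crawl-delay: 2

This asks every bot that honors the directive to wait at least 2 seconds between consecutive requests to your site.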
It’s important to note that the Google bot does not take the crawl-delay setting into consideration. That is why you should not worry that such a directive could influence your Google standings, and you can safely use it in case there are other aggressive bots you want to slow down. It is highly unlikely that you will experience issues due to Google bot crawling, but if you want to lower its crawl-rate, you can do so only from the Google Search Console (formerly Google Webmaster Tools).
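Since Googlebot ignores the directive, you can also scope it to a specific crawler that does honor it, leaving all other bots unaffected. For example (the bot name and value here are just illustrations):

User-agent: Yandex
Crawl-delay: 2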
No default crawl-delay on SiteGround servers
Until recently, there was a default crawl-delay setting applied universally on the SiteGround shared servers. It could be overridden by each user by setting a different custom value in their robots.txt file. We used this directive to prevent our customers from losing their server resources to bots. However, modern search engine bots are sophisticated enough to crawl without causing issues, and bad bots are blocked by our AI system, so there was simply no point in keeping that setting. So we removed it.