How our new backup system saved 24+ hours of downtime

Remember when we announced our new infrastructure in October last year? One part of it that we were particularly proud of was our in-house backup and restore system. A few days ago this system faced its first critical real-life test, and the results were impressive. We restored 3 times more data, 7 times faster, than in the previous such event, when we were still using the old backup solution. Here is how we did it.

How often do we need massive backup restores?

The short answer is: very rarely. A highly redundant infrastructure with multiple SSDs in RAID almost eliminates the need for such restores. Normally, when an SSD fails, it is seamlessly replaced with new hardware without any noteworthy downtime or data loss. And disk failures are very common: for a provider of our size, it is normal to see such events on an almost daily basis. However, every now and then, an unfortunate coincidence of several hardware and software failures at once can make the standard hardware replacement impossible. These are the times when we need to restore all the accounts that were on the damaged instance from our backup copies.

Previously, before our new backup system.

The previous time we needed to perform a full backup restore of a whole shared hosting server was more than a year ago. Back then we were using R1Soft backup, which is among the most popular in our industry. Hosting providers like us use this software for two main reasons. First, it is quite reliable: we have almost never had any serious issues with missing or corrupt backups. Second, it is very lightweight and does not create significant load on the production servers while creating the backups (a resource-intensive process that takes place every day). With these two features R1Soft works perfectly 99% of the time: when it creates the backups and when individual backup copies are needed.

However, on the rare occasions when a full restore of multiple accounts is necessary, R1Soft has one serious drawback: the recovery process is painfully slow and the affected sites can experience a prolonged outage. In the event in question, all our affected accounts were down for 28 hours. It took this long for two reasons. First, R1Soft does not allow simultaneous restores from and to multiple locations. All the data needs to be recovered through a single network interface, and this is slow. Second, the recovery cannot be incremental, so the server instance is down during the whole restore process. All affected sites can only come back online at the same time, after all the data has been transferred from the backup server to the production machine. Therefore, even the smallest website could not be brought back up until we had restored the full server.

Most shared hosting providers would hardly consider this story a serious problem that requires further action once the restore is over. After all, only a single machine was affected and all customers got their websites back without data loss. The downtime of the sites was also almost negligible on a yearly basis: 28 hours is just 0.3% of the year. However, at SiteGround, we were quite unhappy with the duration of the issue and were determined to prevent it from happening again.

And now, after our new backup system.

That’s how we set our minds on creating our own backup system to guarantee a faster restore process, and our talented DevOps department started working on it. We launched the new solution in October 2015, but it wasn’t until just a few days ago that we had to use it in an event similar to the one described above. Unlike R1Soft, the solution we used back then, our own system makes distributed backups and allows simultaneous restores from multiple backup instances to multiple production servers. Thus, this time we were able to recover 4TB of data (nearly three times more than the previous time) in just 4 hours, compared to the 28 hours from the story above. Moreover, our system allows incremental recovery: the first accounts were up just a few minutes after the issue was identified, and the longest downtime (about 4 hours) affected only a few individual sites. This brought the average downtime for all affected accounts down to less than 2 hours, compared to 28 hours before. Quite an impressive improvement, isn’t it? But…
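
To see why incremental recovery pulls the average downtime well below the longest one, here is a minimal sketch in Python. It is an illustrative model only, not the code of the backup system described in this post: the function names, the account sizes, the restore rate and the number of parallel streams are all hypothetical values picked for the example.

```python
import heapq


def all_at_once_downtime(sizes_gb, rate_gb_per_hour):
    """All-or-nothing restore: every account waits for the whole server."""
    total_hours = sum(sizes_gb) / rate_gb_per_hour
    return [total_hours] * len(sizes_gb)


def incremental_downtime(sizes_gb, rate_gb_per_hour, streams):
    """Incremental restore over several parallel streams (hypothetical model).

    Each account goes back online as soon as its own data is restored.
    Accounts are handed, in order, to whichever stream becomes free first.
    """
    free_at = [0.0] * streams  # min-heap: time at which each stream is free
    heapq.heapify(free_at)
    downtimes = []
    for size in sizes_gb:
        start = heapq.heappop(free_at)
        done = start + size / rate_gb_per_hour
        downtimes.append(done)
        heapq.heappush(free_at, done)
    return downtimes


# Hypothetical workload: 8 accounts of 100 GB, 100 GB/hour per stream.
sizes = [100] * 8
old = all_at_once_downtime(sizes, rate_gb_per_hour=100)
new = incremental_downtime(sizes, rate_gb_per_hour=100, streams=2)

print("all at once :", max(old), "h longest,", sum(old) / len(old), "h average")
print("incremental :", max(new), "h longest,", sum(new) / len(new), "h average")
```

With these made-up numbers, the all-or-nothing restore keeps every account down for 8 hours. The incremental restore over two streams has a longest downtime of 4 hours and an average of 2.5 hours, because accounts that finish early do not wait for the rest.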

Can it get even faster?

Yes, it can! In our latest massive restore case, we were not actually able to use the InfiniBand network connectivity between our backup servers and the production ones, as planned for such cases. Thus the data was transferred over the standard 1 Gbit/s network instead of the 10 Gbit/s InfiniBand connection. This, we found, was due to a dormant hardware issue that we could only discover during an actual restore. However, we have already made sure that this will not be an issue next time, which will make the restore even faster.
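
For a rough feel of what the faster link could mean, here is a back-of-the-envelope calculation in Python. It is only a lower bound under stated assumptions: decimal units (1 TB = 1000 GB), no compression, no protocol overhead, and disks that keep up with the network. The helper name `transfer_hours` is made up for this example, and the figure of three parallel links is an assumption based on the three instances mentioned in this post, not a published detail.

```python
def transfer_hours(data_tb, link_gbit_per_s, links=1):
    """Lower bound on transfer time in hours, ignoring overhead and disk speed."""
    data_gbit = data_tb * 1000 * 8  # 1 TB = 1000 GB, 1 byte = 8 bits
    seconds = data_gbit / (link_gbit_per_s * links)
    return seconds / 3600


# 4 TB over a single 1 Gbit/s link
print(round(transfer_hours(4, 1), 2))            # 8.89
# 4 TB spread over three 1 Gbit/s links (assumed number of links)
print(round(transfer_hours(4, 1, links=3), 2))   # 2.96
# Same data over three 10 Gbit/s links (assumed number of links)
print(round(transfer_hours(4, 10, links=3), 2))  # 0.3
```

Under these assumptions a single 1 Gbit/s link would need almost 9 hours for 4 TB, which suggests the 4-hour restore already relied on several transfers running in parallel. On 10 Gbit/s links the network would stop being the bottleneck, and other factors such as disk throughput would probably set the pace.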

Another thing is that with the new system we can theoretically restore to an unlimited number of production instances simultaneously. In practice we are limited, not by the backup system itself, but by the way our DNS system works at the moment. We had three instances affected by the issue, and each of them had its own DNS. Thus we had to restore to only three new instances using the old IPs, so that domain names that are not registered with us could continue to work as before and would not experience additional downtime due to DNS propagation. To avoid this limitation in the future, we plan to work on a brand new central DNS and/or proxy system.

Our backup system story is just another example of how we approach problems. We are never satisfied to just fix the immediate issue and forget about it until the next time. We take each problem as a challenge that needs a unique solution. And if such a solution does not exist at that time, we never shy away from inventing it ourselves.

Lilyana Yakimova

Product Owners Team Lead

I have been with SiteGround since it was born and it has always amazed me to watch this company grow and develop its unique personality.

Comments ( 65 )

Silverblade

Jul 12, 2016

Good job, I'm proud to be your customer

kenny

Jul 12, 2016

NICE

Alain

Jul 12, 2016

Thanks. Happy to be your customer.

Gerald Marsh

Jul 12, 2016

Well Done! I was quite grateful for the daily backups a couple of weeks ago where a configuration problem with my Drupal installation meant that upgrading a few modules completely clobbered 4 websites. I decided that my knowledge is not up to diagnosing exactly what happened so I chose to restore from the previous backup. The man-machine interface is intuitive and the process completed very quickly. I still have to work out how to overcome the config issue but at least the sites work again. Thank you very much.

Anthony Boyd

Jul 13, 2016

Best hosting ever

Carlos Amado

Jul 14, 2016

Congratulations!, Happy to be your reseller!

Randy Carson

Jul 14, 2016

Great story, and a great company. I recommend your company so often, people think I must work there. Well done.

Paul Westeneng

Jul 14, 2016

Great to hear this. I hope my sites will never need this, but if they do, I know I'm in good hands.

Rolf Kenmo - HumanGuide

Jul 14, 2016

Well, I don't understand all the technical stuff, but I am very pleased as a customer! Moreover, what I appreciate very much is that you tell us this. Such stories give credibility;-) Thanks!

Camille P

Jul 14, 2016

This is great communication! Many companies would shy away from admitting that real life applies to them too. But telling us the real story only builds confidence with us as your customers. Much appreciated and thank you for sharing!

Gail Warnaar

Jul 14, 2016

I could not be happier with the service and the confidence in your ability to recover me if I ever need it. I am with you because my former company left me really hanging when THEIR server crashed. When they finally did get me back up, the site had no resemblance to what they had built less than a year before, and no one seemed to have any idea what my site had been 10 months previous! I really should have sued them, but was, and still am, too busy trying to restore my business! I still look for an expert in VirtueMart who can help me!

Shane Poteete

Jul 14, 2016

We have been with you for several years, resell your services, and routinely recommend you to our customers. Your customer service and tech support has always been exceptional. It is great to know that you are also developing new systems and solutions to help improve that service even more! Thank you for making our work day less stressful, and a little easier!!!

Philip Wade

Jul 14, 2016

Excellent work!

Cari Adamek

Jul 14, 2016

You are the best web host I've ever used and my previous one was very good so that's saying a lot. You aren't the best because you don't make mistakes — you're the best because of how you handle your mistakes and customer problems. You could have done what every other web host does and said, "That's how it works. There's nothing we can do about it. Your site will be up in 28 hours. Take a chill pill." Instead, you felt your customers' pain and said, "How can we do it better?" And you did it! Awesome.

Beatrice Johnston

Jul 14, 2016

super! ... happy to be your customer!

Brian Mitchell

Jul 14, 2016

I love the approach. I love even more the transparency that goes with sharing your process with the world. Most companies pretend problems never happen so as not to undermine customer confidence. Any sensible person knows this for the fallacy that it is. I love companies that show the world the challenges that they face and how hard they work to resolve them and avoid them in the future. I will echo the sentiment from above... I am a proud and happy client.

Paul

Jul 14, 2016

I just love this company - such a great pro-active and creative attitude - a fan for life! Keep up the great work sitegrounders!

Frank Smith

Jul 14, 2016

Very nicely done! I am in the middle of completing phase 1 of a municipal fiber optic network. I have been preaching the "3 R's" = Redundancy, Restoration, and Resiliency through our first phase. It is great to hear about SiteGround's commitment to keeping things up and running with the ability to bounce back when the proverbial stuff hits the fan.

Craig Bass

Jul 14, 2016

THIS is just one reason I love SiteGround! The hosting company I previously used went down for long periods of time (several days in some instances) and all they could say was they had "every man on deck" working on the problem. PLEASE, SiteGround, NEVER SELL OUT TO ENDURANCE INTERNATIONAL!!!

Craig

Jul 14, 2016

Outstanding! I appreciate your transparency on this.

Bruce Wilson

Jul 14, 2016

Stories like this are why I'm such an enthusiastic customer and promote your service whenever we bid on a project. You are a big selling point for us when we write our proposals. I sleep better at night knowing my two VPS's with you will be working when I wake up. :-)

Paolo

Jul 14, 2016

Great Work!!! Keep it going...

Jansen Wendlandt

Jul 14, 2016

Thanks for sharing! How you respond to situations like this is a testament not just to how you respond to rare events, but to daily events as well. Thank you.

Stefan Warum

Jul 14, 2016

This really shows how dedicated you are in providing the best service for your customers. I've tried a couple of hosts, but none comes even near to the amazing service that SiteGround offers. You are the only one I can recommend wholeheartedly! I'm grateful that I've found you!

David Fraser

Jul 14, 2016

SiteGround is not a standstill company. They seem to be ahead of the curve so to speak. I'm really glad to be one of your customers. Keep up the outstanding work.

Kenneth Shea

Jul 14, 2016

Good Job SiteGround and good for us to hear! Have had numerous web host since my web presence started in 2000, SiteGround has been by far the very best, no comparison.

Ron Wilder

Jul 15, 2016

Waaaaay better than Lunarpages... My last hosting company. Was down for over a week and a half with multiple websites. You guys and gals are great! Thanks!

Ed Troxell Creative

Jul 15, 2016

I can't stop talking about you guys! You are awesome! Keep up the great work!

Trish

Jul 15, 2016

Congratulations well done

Kevin Hinchman

Jul 15, 2016

Way to be proactive! Good job.

Andrea Gallucci

Jul 15, 2016

Since we moved our customers sites with you, finally we found a competent and reliable hosting provider, either for running time, speed, and competency of your tech support. Go on like this. Congratulations. Andrea Gallucci, MD DIGITHAI Software Group.

Howard Kelley

Jul 15, 2016

SG never disappoints, whether it's performance or customer service. Every day I am thankful that I have moved some of my domains to SG from one of the largest and supposedly top U.S. hosting firms.

David Hunt

Jul 15, 2016

To be static is to be going backwards. Glad to hear that your eyes are on the future and improving your customer's experience.

Keith

Jul 15, 2016

WOW. You guys take hosting to a level no one else can touch. Thank you for putting in the time and effort to make this possible.

Yogendra Rawat

Jul 15, 2016

Nice to hear that my site is in expert hands....I can focus on selling as always....awesome work guys...

Andrio Suroyo

Jul 15, 2016

Great Job! I have yet to have any problems with hosting on Siteground and my website loads perfectly every time I access / work on it. Thank you for your constant effort to get even better.

Karl Steinmann

Jul 15, 2016

Brilliant. Keep up the great work, and I will continue to spread the Siteground gospel. ;-)

Riccardo

Jul 15, 2016

Really impressive guys. Congratulations for that and thanks for supplying to us the best hosting solution out there. Riccardo R99photography.com

Gerson de Barros

Jul 15, 2016

People have already described how happy and secure they feel with your solution. What else can I say! I'm really proud to be one of your new customers... keep it up. Keep your eyes on the future and on visionary solutions that can help us all and protect against downtime and losses.

Ruslan

Jul 15, 2016

Sounds cool!

GJ

Jul 15, 2016

You guys rock..these pro active measurements keep us all happy.

Brian Wall

Jul 15, 2016

Excellent and the report just increases my confidence in you.

Jarold Villanueva

Jul 15, 2016

I'm a new customer of siteground and I'm very happy of what they are doing for their customers.... Keep it up Guy's.... :-)

Steve Squeo

Jul 15, 2016

Love it!

Ulf Tölle

Jul 15, 2016

absolutely great achievement, very inspiring!

Howard

Jul 15, 2016

Outstanding work. I switched to SiteGround two months ago and could not be happier. Thank you!!!

Scott Peterson

Jul 15, 2016

I have to say, you really can't appreciate a good web host until you've had a bad one. My previous web host provided little support and my Magento installation ran painfully slow. I was apprehensive about switching to Siteground initially because of my bad experience. Since I made the switch to Siteground support has been excellent, Magento runs at least 5 - 10 times faster than before, and I have experienced virtually no down time. My previous host was down several times (sometimes for days) during the year and a half or so that I had them. In other words, Keep up the good work!

FABIO A ESPARZA Z

Jul 15, 2016

Some days ago, (by mistake) I deleted important information from a website and I almost went crazy!!! Some hours later, I had the idea to ask Siteground for help.... In a few minutes, the problem was fixed and I had the info again... It was really quick... and now it is even quicker???? GUAUUUUUUU For this I love Siteground!!!!... Simply the best!

Bradd Graves

Jul 15, 2016

I may need to use this feature soon. I'll let you know how it goes!

Lawrence Lim

Jul 17, 2016

I have personally tried to restore a few files recently and it is really seamless and fast. Impressive and proud to be your valued customer.

WordSuccor

Jul 19, 2016

This is great.. It is going to help us a lot.

Umesh

Jul 20, 2016

Excellent work. I have recommended SiteGround to quite a few clients and stories like this help reiterate their, and my confidence in SiteGround.

Al

Jul 20, 2016

I've been recommending SG since I moved all my sites to them and my recommendations are not because I am a customer, but also because I believe every business owner deserves a good home for their business website with a hosting company that not only provides the best hosting plans, hardware, and support, but also that thinks ahead by looking at various ways to improve their service, and making our life easier. Did I mention that I've been with previous hosting companies and SG is the only hosting company that goes above and beyond the call of duty to make sure everything is running smoothly, and provide assistance even when the issue is not their doing? Because of our review, comments and recommendation of SG, all over the web, we are constantly contacted by people asking us why we strongly recommend SG, and whether we are getting some huge kickbacks from SG, and we tell them that our reviews are honest, unbiased, and they can try them for themselves and find out how good SG (the whole team) is. Am I happy with SG? Oh YES!!!

Peter (Santa) Rodaughan

Jul 20, 2016

Your story does concern me. 4TB!!! As word gets out you will have to start thinking about 1PB or even 1EB. God help us if it even gets to 1ZB or even 1YB. Keep up the great work and you will get so much bigger. Well done everyone.

Dan

Jul 24, 2016

This sounds great. As customers, how do we access the backups to restore?

Daniel Kanchev Siteground Team

Jul 25, 2016

Hi, Dan and thanks for the great question :) On our shared hosting servers our clients may use the backup restore tool in cPanel to manage their backups and restore data. For more details check this tutorial: https://www.siteground.com/tutorials/cpanel/backup_restore.htm On our cloud servers, however, only our support team can restore a backup for you. We are in the process of implementing the same backup/restore tool for our cloud customers and once it is ready it will be installed on all cloud servers.

Stephen Goodenough

Jul 26, 2016

I was one of the customers who had a website on the affected equipment, and I think must have been near the end of the back-up process, but well done. I'm glad I wasn't at the end of a 28 hour process! I'm always impressed by SG.

Jaswinder Kaur

Jul 27, 2016

Glad to know the progress SG is doing and I am quite happy that I moved to SiteGround in January, 2016 to host my Blog Ease Bedding. Now I want my blog to be more secure with daily backups, which I don't have in StartUp Plan. The second problem I am facing is- CPU increase. Please let me know how I can solve this problem, where I can get daily backups and no worries about CPU increase. Thanks.

Marina Yordanova Siteground Team

Jul 28, 2016

Hello Jaswinder, thank you for the good words! While on the StartUp plan you could take advantage of daily backups by signing up for our Backup subscription service: https://ua.siteground.com/daily_backup.htm Alternatively you could upgrade to a GrowBig or GoGeek plan. Both include the Backup subscription by default, allow more CPU executions, and give you many additional features.

Mortiz

Aug 01, 2016

Have you ever thought of selling this solution to other hosting companies?

seo learning

Aug 02, 2016

Hi Lilyana Congratulations!, Happy to be your reseller!

AJ

Sep 08, 2016

I love that you guys share the nerdy stuff!

MS

Jul 01, 2017

Good Job ! Legendary support

Woon Fei Lai

Oct 14, 2017

Impressed with the backup system, but not sure why I would receive the email today about this article a year after it was published

Hristo Pandjarov Siteground Team

Oct 16, 2017

The email was about the latest upgrade of the system that we have just applied on all our servers. Check it out, the new interface is way better :)
