How our new backup system saved 24+ hours of downtime

Remember when we announced our new infrastructure in October last year? One part of that innovation we were particularly proud of was our in-house backup/restore system. A few days ago this system was put to its first critical real-life test, and the results were impressive. We were able to restore 3 times more data, 7 times faster, compared to the previous such event, when we were still using the old backup solution. Here is how we did it.

How often do we need massive backup restores?

The short answer is: very rarely. Having a highly redundant infrastructure with multiple SSDs in RAID almost eliminates the need for such restores. Normally, when an SSD fails, it is seamlessly replaced with a new piece of hardware without any noteworthy downtime or data loss. And disk failures are very common: for a provider of our size, it is normal to see such events on an almost daily basis. However, every now and then, an unfortunate coincidence of several hardware and software failures at once can make the standard hardware replacement impossible. These are the times when we need to restore all the accounts that were on the damaged instance from our backup copies.

Previously, before our new backup system.

The previous time we needed to perform a full backup restore of a whole shared hosting server was more than a year ago. Back then we were using R1Soft backup, which is among the most popular in our industry. Hosting providers like us use this software for two main reasons. First, it is quite reliable. We've almost never had any serious issues with missing or corrupt backups. Second, it is very lightweight and does not create significant load on the production servers while creating the backups (a resource-intensive process that takes place every day). With these two features R1Soft works perfectly 99% of the time: when it creates the backups and when individual backup copies are needed.

However, on the rare occasions when a full restore of multiple accounts is necessary, R1Soft has one serious drawback: the recovery process is painfully slow, and the affected sites can experience a prolonged outage. In the event in question, all our affected accounts were down for 28 hours. It took this long for two reasons. First, R1Soft does not allow simultaneous restores from and to multiple locations. All the data needs to be recovered through one single network interface, and this is slow. Second, the recovery cannot be incremental, so the server instance is down during the whole restore process. All affected sites can only come back online at the same time, after all the information has been transferred from the backup server to the production machine. Therefore, even the smallest website could not be brought back up until we had restored the full server.

Most shared hosting providers would hardly consider this story a serious problem that requires further action once the restore is over. After all, only a single machine was affected and all customers got their websites back without data loss. The downtime of the sites was also almost negligible on a yearly basis: 28 hours is just 0.3% of the year. However, at SiteGround, we were quite unhappy with the duration of the issue and were determined to prevent it from happening again.
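For readers who want to check the yearly figure, here is a minimal sketch of the arithmetic. It assumes a 365-day year, and the function name is ours, purely for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_percent(hours_down: float) -> float:
    """Return the share of a year spent offline, as a percentage."""
    return hours_down / HOURS_PER_YEAR * 100


outage = downtime_percent(28)
print(f"Downtime: {outage:.2f}% of the year")
print(f"Yearly availability: {100 - outage:.2f}%")
```

Under that assumption a 28-hour outage works out to roughly 0.3% of the year, which matches the figure quoted above.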

And now, after our new backup system.

That’s how we set our minds on creating our own backup system to guarantee a faster restore process, and our talented DevOps department started working on it. We launched the new solution in October 2015, but it wasn’t until just a few days ago that we had to use it in an event similar to the one described above. Unlike R1Soft, the solution we used back then, our own system makes distributed backups and allows simultaneous restores from multiple backup instances to multiple production servers. Thus, we were able to recover 4TB of data (nearly three times more than the previous time) in just 4 hours, compared to the 28 hours from the story above. Moreover, our system allows incremental recovery: the first accounts were up just a few minutes after the issue was identified, and the longest downtime (about 4 hours) affected only a few individual sites. This brought the average downtime for all affected accounts down to less than 2 hours, compared to 28 hours before. Quite an impressive improvement, isn’t it? But...
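To illustrate why incremental, parallel recovery lowers the average downtime even when the longest restore still takes hours, here is a toy model. It is only a sketch under our own assumptions, not how the actual backup system is implemented: the account sizes, the transfer rate and the number of workers are made-up values, and each worker is assumed to have its own link.

```python
import heapq


def all_or_nothing(sizes, rate):
    """Single pipe, no incremental recovery: every account stays down
    until the whole data set has been transferred."""
    total_time = sum(sizes) / rate
    return [total_time] * len(sizes)


def incremental_parallel(sizes, rate, workers):
    """Several workers restore one account at a time; an account is back
    online as soon as its own restore finishes."""
    free_at = [0.0] * workers  # time at which each worker becomes free
    heapq.heapify(free_at)
    downtimes = []
    for size in sizes:
        start = heapq.heappop(free_at)  # earliest available worker
        done = start + size / rate
        downtimes.append(done)
        heapq.heappush(free_at, done)
    return downtimes


# Hypothetical account sizes (GB) and per-worker rate (GB per hour).
sizes = [1, 1, 2, 4]
old = all_or_nothing(sizes, rate=1)
new = incremental_parallel(sizes, rate=1, workers=2)
print("old average:", sum(old) / len(old), "max:", max(old))
print("new average:", sum(new) / len(new), "max:", max(new))
```

In this toy setup every account waits 8 hours under the all-or-nothing model, while with two workers and incremental recovery the longest wait is 5 hours and the average is 2.5 hours. Small accounts come back first because they are queued first; a real scheduler may order things differently.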

Can it get even faster?

Yes, it can! In our latest massive restore case, we were actually not able to use the InfiniBand network connectivity between our backup servers and the production ones, as planned for such cases. Thus the data was transferred over the standard 1 Gbit/s network instead of over the 10 Gbit/s InfiniBand connection. This, we found, was due to a dormant hardware issue that we were able to discover only during an actual restore. However, we have already made sure that this will not be an issue next time, which will make the restore even faster.
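A back-of-the-envelope calculation gives a feel for what the faster link could mean. This is a rough sketch only: it assumes decimal units (1 TB = 10^12 bytes), an ideal line rate with no protocol or disk overhead, and a hypothetical `links` parameter for the number of links used in parallel.

```python
def transfer_hours(data_tb: float, link_gbit_s: float, links: int = 1) -> float:
    """Idealized time to move data_tb terabytes over `links` parallel
    links of link_gbit_s gigabits per second each."""
    bits = data_tb * 1e12 * 8                      # terabytes to bits
    seconds = bits / (link_gbit_s * 1e9 * links)   # ideal line rate
    return seconds / 3600


print(f"4 TB over one 1 Gbit/s link:  {transfer_hours(4, 1):.1f} h")
print(f"4 TB over one 10 Gbit/s link: {transfer_hours(4, 10):.1f} h")
```

Under these assumptions a single 1 Gbit/s link would need almost 9 hours for 4 TB, more than the 4 hours the restore took, which is consistent with the data flowing between several backup and production instances at once. The same volume over one 10 Gbit/s link would ideally take under an hour, although real-world throughput is usually lower than line rate.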

Another thing is that with the new system we can theoretically restore to an unlimited number of production instances simultaneously. In practice we are limited, not by the backup system itself, but by the way our DNS system works at the moment. We had three instances affected by the issue, and each of them had its own DNS. Thus we needed to restore to only three new instances using the old IPs, so that domain names not registered with us could continue to work as before without additional downtime due to DNS propagation. To avoid this limitation in the future, we plan to work on a brand new central DNS and/or proxy system.

Our backup system story is just another example of how we approach problems. We are never satisfied to just fix the immediate issue and forget about it until the next time. We take each problem as a challenge that needs a unique solution. And if such a solution does not exist at that time, we never shy away from inventing it ourselves.

Marketing Director

I have been with SiteGround since it was born and it has always amazed me to watch this company grow and develop its unique personality. My rewarding and challenging job is to help SiteGround communicate its strengths in the best way possible, learn from its mistakes and become a better person, oops, I meant a better brand!

62 Comments

  1. July 12, 2016 / 03:20 Silverblade

    Good job, I'm proud to be your customer

  2. July 12, 2016 / 06:12 kenny

    NICE

  3. July 12, 2016 / 08:48 Alain

    Thanks. Happy to be your customer.

  4. July 12, 2016 / 14:46 Gerald Marsh

    Well Done!

    I was quite grateful for the daily backups a couple of weeks ago, when a configuration problem with my Drupal installation meant that upgrading a few modules completely clobbered 4 websites. I decided that my knowledge is not up to diagnosing exactly what happened, so I chose to restore from the previous backup. The man-machine interface is intuitive and the process completed very quickly.

    I still have to work out how to overcome the config issue but at least the sites work again.

    Thank you very much.

  5. July 12, 2016 / 22:27 Anthony Boyd

    Best hosting ever

  6. July 14, 2016 / 06:40 Carlos Amado

    Congratulations!, Happy to be your reseller!

  7. July 14, 2016 / 07:36 Randy Carson

    Great story, and a great company. I recommend your company so often, people think I must work there. Well done.

  8. July 14, 2016 / 07:55 Paul Westeneng

    Great to hear this.
    I hope my sites will never need this, but if they do, I know I'm in good hands.

  9. July 14, 2016 / 09:22 Rolf Kenmo – HumanGuide

    Well, I don't understand all the technical stuff, but I am very pleased as a customer!

    Moreover, what I appreciate very much is that you tell us this. Such stories give credibility;-)

    Thanks!

  10. July 14, 2016 / 10:04 Camille P

    This is great communication! Many companies would shy away from admitting that real life applies to them too. But telling us the real story only builds confidence with us as your customers. Much appreciated and thank you for sharing!

  11. July 14, 2016 / 10:40 Gail Warnaar

    I could not be happier with the service and the confidence in your ability to recover me if I ever need it. I am with you because my former company left me really hanging when THEIR server crashed. When they finally did get me back up, the site had no resemblance to what they had built less than a year before, and no one seemed to have any idea what my site had been 10 months previous! I really should have sued them, but was, and still am, too busy trying to restore my business! I still look for an expert in VirtueMart who can help me!

  12. July 14, 2016 / 11:19 Shane Poteete

    We have been with you for several years, resell your services, and routinely recommend you to our customers. Your customer service and tech support has always been exceptional. It is great to know that you are also developing new systems and solutions to help improve that service even more! Thank you for making our work day less stressful, and a little easier!!!

  13. July 14, 2016 / 11:51 Philip Wade

    Excellent work!

  14. July 14, 2016 / 12:15 Cari Adamek

    You are the best web host I've ever used and my previous one was very good so that's saying a lot. You aren't the best because you don't make mistakes — you're the best because of how you handle your mistakes and customer problems. You could have done what every other web host does and said, "That's how it works. There's nothing we can do about it. Your site will be up in 28 hours. Take a chill pill." Instead, you felt your customers' pain and said, "How can we do it better?" And you did it! Awesome.

  15. July 14, 2016 / 12:44 Beatrice Johnston

    super! ... happy to be your customer!

  16. July 14, 2016 / 13:19 Brian Mitchell

    I love the approach. I love even more the transparency that goes with sharing your process with the world. Most companies pretend problems never happen so as not to undermine customer confidence. Any sensible person knows this for the fallacy that it is.

    I love companies that show the world the challenges that they face and how hard they work to resolve them and avoid them in the future.

    I will echo the sentiment from above... I am a proud and happy client.

  17. July 14, 2016 / 13:28 Paul

    I just love this company - such a great pro-active and creative attitude - a fan for life!

    Keep up the great work sitegrounders!

  18. July 14, 2016 / 13:39 Frank Smith

    Very nicely done! I am in the middle of completing phase 1 of a municipal fiber optic network.
    I have been preaching the "3 R's" = Redundancy, Restoration, and Resiliency through our first phase. It is great to hear about SiteGround's commitment to keeping things up and running with the ability to bounce back when the proverbial stuff hits the fan.

  19. July 14, 2016 / 13:40 Craig Bass

    THIS is just one reason I love SiteGround! The hosting company I previously used went down for long periods of time (several days in some instances) and all they could say was they had "every man on deck" working on the problem. PLEASE, SiteGround, NEVER SELL OUT TO ENDURANCE INTERNATIONAL!!!

  20. July 14, 2016 / 13:49 Craig

    Outstanding! I appreciate your transparency on this.

  21. July 14, 2016 / 14:02 Bruce Wilson

    Stories like this are why I'm such an enthusiastic customer and promote your service whenever we bid on a project. You are a big selling point for us when we write our proposals. I sleep better at night knowing my two VPS's with you will be working when I wake up. 🙂

  22. July 14, 2016 / 14:22 Paolo

    Great Work!!! Keep it going...

  23. July 14, 2016 / 14:44 Jansen Wendlandt

    Thanks for sharing!
    How you respond to situations like this is a testament not just to how you respond to rare events, but to daily events as well.
    Thank you.

  24. July 14, 2016 / 15:17 Stefan Warum

    This really shows how dedicated you are in providing the best service for your customers.
    I've tried a couple of hosts, but none comes even near to the amazing service that SiteGround offers. You are the only one I can recommend wholeheartedly! I'm grateful that I've found you!

  25. July 14, 2016 / 15:19 David Fraser

    SiteGround is not a standstill company. They seem to be ahead of the curve, so to speak. I'm really glad to be one of your customers. Keep up the outstanding work.

  26. July 14, 2016 / 17:07 Kenneth Shea

    Good Job SiteGround and good for us to hear!
    Have had numerous web host since my web presence started in 2000, SiteGround has been by far the very best, no comparison.

  27. July 14, 2016 / 18:04 Ron Wilder

    Waaaaay better than Lunarpages... My last hosting company. Was down for over a week and a half with multiple websites. You guys and gals are great! Thanks!

  28. July 14, 2016 / 18:10 Ed Troxell Creative

    I can't stop talking about you guys! You are awesome! Keep up the great work!

  29. July 14, 2016 / 19:13 Trish

    Congratulations
    well done

  30. July 14, 2016 / 19:49 Kevin Hinchman

    Way to be proactive! Good job.

  31. July 14, 2016 / 19:54 Andrea Gallucci

    Since we moved our customers sites with you, finally we found a competent and reliable hosting provider, either for running time, speed, and competency of your tech support. Go on like this. Congratulations.
    Andrea Gallucci, MD DIGITHAI Software Group.

  32. July 14, 2016 / 20:08 Howard Kelley

    SG never disappoints, whether it's performance or customer service. Every day I am thankful that I have moved some of my domains to SG from one of the largest and supposedly top U.S. hosting firms.

  33. July 14, 2016 / 20:18 David Hunt

    To be static is to be going backwards. Glad to hear that your eyes are on the future and improving your customer's experience.

  34. July 14, 2016 / 20:56 Keith

    WOW. You guys take hosting to a level no one else can touch. Thank you for putting in the time and effort to make this possible.

  35. July 14, 2016 / 22:50 Yogendra Rawat

    Nice to hear that my site is in expert hands....I can focus on selling as always....awesome work guys...

  36. July 14, 2016 / 23:17 Andrio Suroyo

    Great Job! I have yet to have any problems with hosting on Siteground and my website loads perfectly every time I access / work on it. Thank you for your constant effort to get even better.

  37. July 14, 2016 / 23:43 Karl Steinmann

    Brilliant. Keep up the great work, and I will continue to spread the Siteground gospel. 😉

  38. July 15, 2016 / 00:49 Riccardo

    Really impressive guys. Congratulations for that and thanks for supplying to us the best hosting solution out there.

    Riccardo
    R99photography.com

  39. July 15, 2016 / 00:59 Gerson de Barros

    The people already described how happy and secured they fill with your solution. What else I can say!
    Im really proud to be one of your new customer... keep it up. Keep your eyes in the future and with a visionary solutions that can help us all and protect against downtime and losses.

  40. July 15, 2016 / 03:06 Ruslan

    Sounds cool!

  41. July 15, 2016 / 04:14 GJ

    You guys rock..these pro active measurements keep us all happy.

  42. July 15, 2016 / 04:14 Brian Wall

    Excellent and the report just increases my confidence in you.

  43. July 15, 2016 / 06:26 Jarold Villanueva

    I'm a new customer of siteground and I'm very happy of what they are doing for their customers.... Keep it up Guy's.... 🙂

  44. July 15, 2016 / 06:42 Steve Squeo

    Love it!

  45. July 15, 2016 / 07:10 Ulf Tölle

    absolutely great achievement, very inspiring!

  46. July 15, 2016 / 07:50 Howard

    Outstanding work. I switched to SiteGround two months ago and could not be happier. Thank you!!!

  47. July 15, 2016 / 12:54 Scott Peterson

    I have to say, you really can't appreciate a good web host until you've had a bad one. My previous web host provided little support and my Magento installation ran painfully slow. I was apprehensive about switching to Siteground initially because of my bad experience. Since I made the switch to Siteground support has been excellent, Magento runs at least 5 - 10 times faster than before, and I have experienced virtually no down time. My previous host was down several times (sometimes for days) during the year and a half or so that I had them.

    In other words, Keep up the good work!

  48. July 15, 2016 / 13:13 FABIO A ESPARZA Z

    Some days ago, (by mistake) I delete a important information of a web and I almost got crazy!!!

    Some hours later, I have the idea to ask Siteground for help.... In a few minutes, This problem was fixed and I have the info again...

    It it was really quickly... now is more quickly???? GUAUUUUUUU

    For this I love Siteground!!!!... Simple the best!

  49. July 15, 2016 / 13:27 Bradd Graves

    I may need to use this feature soon. I'll let you know how it goes!

  50. July 16, 2016 / 23:09 Lawrence Lim

    I have personally tried to restore a few files recently and it is really seamless and fast. Impressive, and proud to be your valued customer.

  51. July 19, 2016 / 05:13 WordSuccor

    This is great.. It is going to help us a lot.

  52. July 20, 2016 / 05:34 Umesh

    Excellent work.
    I have recommended SiteGround to quite a few clients and stories like this help reiterate their, and my confidence in SiteGround.

  53. July 20, 2016 / 06:12 Al

    I've been recommending SG since I moved all my sites to them and my recommendations are not because I am a customer, but also because I believe every business owner deserves a good home for their business website with a hosting company that not only provides the best hosting plans, hardware, and support, but also that thinks ahead by looking at various ways to improve their service, and making our life easier. Did I mention that I've been with previous hosting companies and SG is the only hosting company that goes above and beyond the call of duty to make sure everything is running smoothly, and provide assistance even when the issue is not their doing?

    Because of our review, comments and recommendation of SG, all over the web, we are constantly contacted by people asking us why we strongly recommend SG, and whether we are getting some huge kickbacks from SG, and we tell them that our reviews are honest, unbiased, and they can try them for themselves and find out how good SG (the whole team) is.

    Am I happy with SG? Oh YES!!!

  54. July 20, 2016 / 16:43 Peter (Santa) Rodaughan

    Your story does concern me. 4TB!!! As word gets out you will have to start thinking about 1PB or even 1EB. God help us if it even gets to 1ZB or even 1YB. Keep up the great work and you will get so much bigger. Well done everyone.

  55. July 24, 2016 / 02:01 Dan

    This sounds great. As customers, how do we access the backups to restore?

    • July 25, 2016 / 05:17 Daniel Kanchev, SiteGround Team

      Hi, Dan and thanks for the great question 🙂 On our shared hosting servers our clients may use the backup restore tool in cPanel to manage their backups and restore data. For more details check this tutorial:

      https://www.siteground.com/tutorials/cpanel/backup_restore.htm

      On our cloud servers, however, only our support team can restore a backup for you. We are in the process of implementing the same backup/restore tool for our cloud customers and once it is ready it will be installed on all cloud servers.

  56. July 26, 2016 / 14:21 Stephen Goodenough

    I was one of the customers who had a website on the affected equipment, and I think must have been near the end of the back-up process, but well done. I'm glad I wasn't at the end of a 28 hour process! I'm always impressed by SG.

  57. July 27, 2016 / 16:31 Jaswinder Kaur

    Glad to know the progress SG is doing and I am quite happy that I moved to SiteGround in January, 2016 to host my Blog Ease Bedding.

    Now I want my blog to be more secure with daily backups, which I don't have in StartUp Plan. The second problem I am facing is- CPU increase. Please let me know how I can solve this problem, where I can get daily backups and no worries about CPU increase.

    Thanks.

    • July 28, 2016 / 05:04 Marina Yordanova, SiteGround Team

      Hello Jaswinder, thank you for the good words! While on the StartUp plan you could take advantage of daily backups by signing up for our Backup subscription service: https://ua.siteground.com/daily_backup.htm
      Alternatively, you could upgrade to a GrowBig or GoGeek plan. Both include the Backup subscription by default, allow higher CPU executions, and give you many additional features.

  58. August 1, 2016 / 13:33 Mortiz

    Have you ever thought of selling this solution to other hosting companies?

  59. August 1, 2016 / 22:36 seo learning

    Hi Lilyana
    Congratulations!, Happy to be your reseller!

  60. September 8, 2016 / 17:10 AJ

    I love that you guys share the nerdy stuff!
