Google sets the pace in achieving 99.9%+ SLA performance for cloud-based application suites

by Tony Redmond
Oct 03, 2011

Despite IBM’s claim that LotusLive is a true contender, in the eyes of most companies that are considering cloud-based applications, Google Apps and Office 365 are the only games in town. Given Microsoft’s difficulties in achieving its SLA, it’s worth looking at the SLA record of its major competitor. And some interesting facts come into focus if you look at the information presented in the Google Enterprise blog.

First, in 2011 Google decided that they would no longer exclude planned maintenance from the calculation of downtime. Scheduled maintenance has always been the get-out clause for service providers but now Google says that all downtime counts when they calculate SLA. Google further claims that they are the first major cloud provider to eliminate maintenance windows from their SLA. By comparison, the latest version of the Service Level Agreement for Microsoft Online Services (dated June 1, 2011 and available here) says that “Downtime means the total minutes in a month during which the aspects of a service specified … are unavailable, multiplied by the number of affected users, excluding (1) Scheduled Downtime…”
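To see how much the treatment of scheduled downtime can move the headline number, here is a minimal Python sketch. It assumes that a monthly uptime percentage is computed as (user minutes - downtime) / user minutes, with downtime measured in minutes multiplied by affected users as in the definition quoted above. The function name, the 30-day month, and the two incidents are all hypothetical figures chosen for illustration; they are not taken from either provider's records.

```python
def monthly_uptime_pct(users, minutes_in_month, incidents, count_scheduled):
    """Return an uptime percentage based on user-minutes.

    incidents is a list of (minutes, affected_users, is_scheduled) tuples.
    When count_scheduled is False, scheduled incidents are left out of the
    downtime total (the traditional maintenance-window exclusion).
    """
    user_minutes = users * minutes_in_month
    downtime = sum(
        minutes * affected
        for minutes, affected, is_scheduled in incidents
        if count_scheduled or not is_scheduled
    )
    return 100 * (user_minutes - downtime) / user_minutes


# Hypothetical tenant: 1,000 users over a 30-day month.
MINUTES_IN_MONTH = 30 * 24 * 60
incidents = [
    (60, 1000, True),   # one hour of planned maintenance hitting everyone
    (30, 200, False),   # a 30-minute unplanned outage hitting 200 users
]

excluding = monthly_uptime_pct(1000, MINUTES_IN_MONTH, incidents, count_scheduled=False)
including = monthly_uptime_pct(1000, MINUTES_IN_MONTH, incidents, count_scheduled=True)
print(round(excluding, 3))  # 99.986
print(round(including, 3))  # 99.847
```

With these made-up figures, the same month clears a 99.9% target when maintenance is excluded and misses it when all downtime counts.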


It seems that Google has created a competitive advantage in how it measures its SLA. However, counting scheduled downtime or not in the measurement of an SLA really doesn’t matter if real outages occur. In this respect, Google goes on to state that Gmail achieved an SLA of 99.984% in 2010 for both consumer and business users. In human terms this means that Gmail was unavailable for approximately seven minutes per month. I doubt that real-life people noticed the seven minutes as some of this time will have accumulated through glitches that occur for a few seconds at a time (the Internet, as we all know, is prone to glitches) and some of the time will probably have happened when users were asleep.
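The conversion from an availability percentage to minutes per month is simple arithmetic. The sketch below assumes a 30-day month (43,200 minutes); the function name is my own and is not part of any provider's tooling.

```python
def downtime_minutes_per_month(availability_pct, days_in_month=30):
    """Minutes of downtime implied by an availability percentage.

    Assumes a 30-day month unless told otherwise.
    """
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100 - availability_pct) / 100


print(round(downtime_minutes_per_month(99.984), 1))  # 6.9
```

The result of about 6.9 minutes is where the "approximately seven minutes per month" figure comes from.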

In fact, outages really only become bothersome when they last longer than the length of time it takes an average user to go and get a cup of coffee, or whatever is your beverage of choice. The logic here is that users will recognize that a problem is happening and put it down to their PC, local network, local IT, or something else and then go and get a drink. If the problem persists after they return refreshed and ready to work then they get annoyed. On the other hand, if service has been resumed then they are happy. The two outages suffered by Office 365 in August and September 2011, which resulted in a total of circa 330 minutes of downtime (if you were one of the users affected by both outages), demonstrate the unwanted attention and stress that flow from extended outages that fail my "is it time for a coffee" test. For those not close to a calculator, 330 minutes is equivalent to roughly 47 months of outage at the 99.984% level.
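For anyone who does want to check the comparison, here is a small Python sketch. It assumes a 30-day month and uses a hypothetical helper name; the 330-minute and 99.984% figures are the ones quoted above.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month


def months_of_allowance(outage_minutes, availability_pct):
    """How many months of downtime at a given availability level
    add up to the stated number of outage minutes."""
    monthly_allowance = MINUTES_PER_MONTH * (100 - availability_pct) / 100
    return outage_minutes / monthly_allowance


# Using the unrounded monthly figure (about 6.9 minutes):
print(round(months_of_allowance(330, 99.984), 1))  # 47.7

# Using the rounded seven minutes per month:
print(round(330 / 7, 1))  # 47.1
```

Either way the answer is just under four years' worth of downtime at the 99.984% level.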

Microsoft hopes that Office 365 will deliver better reliability than BPOS and that it won't have to refund users with the kind of 25% credits for monthly subscriptions that it's been forced to pay for the two incidents to date. Even though it's been off to a bad start, I think that Office 365 will prove to be much more reliable over the long term. And if it does, then Google won’t be able to point out salient points such as:

Comparable data for Microsoft BPOS® is unavailable, though their service notifications show 113 incidents in 2010: 74 unplanned outages, and 33 days with planned downtime.

I have not been through the BPOS data to validate Google’s claim but I imagine that they wouldn’t make such an assertion without covering themselves with chapter and verse.

In terms of 2011 performance, in a September 27 article about their new status dashboard for Google Apps, Google state that they achieved an SLA of 99.99% for Gmail in the first six months of the year, or about five minutes downtime per month. I assume that this is measured against their new SLA calculation including scheduled downtime so this is really a terrific performance.

Gmail isn’t perfect and it has its ups and downs too. The disappearing inbox syndrome suffered by some 20,000 users on 28 February 2011 (a software update was later blamed) is an example of where Google has run into choppy waters. However, you can’t dispute that Google has set the pace for SLA delivery for cloud email and application suite services and that Office 365 has work to do to get close to Google’s record.

Cloud email still only occupies a small but growing portion of the overall market. Customer faith and trust that cloud-based solutions really work will increase over time as performance demonstrates the worth and reliability of the solutions allied to some of the attractive financial aspects that salespeople invariably focus upon. Office 365 has started poorly and Microsoft now has to focus on achieving its 99.9% SLA target for the last quarter of 2011, then the first six months of 2012, then for all of 2012, and on into the future. And once it has a truly unimpeachable record of delivery, the other strengths of Office 365 such as class-leading clients, user familiarity with the Office applications, and the potential of SharePoint will make it an even fiercer competitor than it is today.

Discuss this Blog Entry 2

on Oct 3, 2011
I noticed only two Google Apps for Business outages - calendaring for about 30 minutes and docs list for approximately an hour. Calendar outage I noticed when my iPhone reminded me of a meeting but my Google Calendar window didn't pop up a notice simultaneously. When trying to access the calendar online, it was not working. No big deal... had to go to the meeting anyway, and by the time the meeting was over, so was the outage. The docs list outage was more frustrating. It happened shortly after I returned from lunch. I wanted to search for a particular string and match it to documents, but the docs list would not open. I ended up having to use the Google Site I created to run through the documents one by one to search for the keyword. By the time I became very frustrated with this and had moved on to other tasks, the docs list popped back up and I could again perform a quick search. These two instances aside, I have had rock-solid performance with Google Apps. Some workers feel more comfortable being able to blame the flakiness of local file servers and email servers on IT employees... but after years of dealing with multi-day outages on local servers, 30 minutes to an hour being the new big deal outage is quite welcome.
on Oct 4, 2011
Tony, I am confused. You give kudos to Google for achieving 99.984% and 99.99% availability. You also mention the two major Office365 incidents. You then go on to say "...I think that Office 365 will prove to be much more reliable over the long term." On what do you base this opinion? You did not mention the duration of the two major Office365 outages, nor did you mention that for one of the outages Microsoft did not acknowledge a problem for over 90 minutes. And while alluding to the ongoing performance and reliability of BPOS, you provide no indication that Microsoft has learned lessons and improved operations for Office365. In fact, the early troubles for Office365 indicate that Microsoft still has difficulty managing a multi-tenant environment. Office365 is a package of "2010" generation servers running as virtualized systems on shared hardware. Exchange 2010 and its security architecture are fundamentally single-tenant in design. Until Microsoft can figure out how to run a multi-tenant cloud using a single-tenant architecture (think square peg and round hole), you should not expect the reliability of Office365 to improve. Looking at the reasons behind outages provides insight into future performance (
