How do you measure Cloud uptime?

So, Amazon had four major outages of their cloud services last year, the longest and most notable being the one that took Netflix streaming down for 23 hours during the busy holiday period.  This article over on TechTarget discusses how Cloud uptime is often many times higher than enterprise data center uptime.  That got me thinking about how Cloud uptime should be measured compared to data center uptime.  If you have any experience managing SLAs, you have discovered that how uptime is defined (if at all) determines your vendor's incentive, and therefore level of effort, in restoring your application.

Applications are the name of the game, especially in Cloud-based services.  Normally when you consider a Cloud solution, you are mainly looking at hosting a specific application or set of applications.  This is where the legal jargon of what a vendor considers availability versus what you may consider availability comes into play.  In the traditional data center we measure the availability of the major subsystems, from HVAC to SAN and network.  But how does this all translate to a service provider?  If your virtual machine (VM) is up and running in the Cloud but the private connection to your database hosted in your DC is unavailable, how is uptime calculated?  Who has responsibility for ensuring the connection stays available?  This is just one of the cross-organizational support issues that exist in a Cloud environment.

It doesn’t mean much to have a VM physically up and running if you don’t have access to the application.  Some vendors will still consider the above state operational and not count it against uptime.  I do sympathize with Cloud vendors, as this is a slippery slope.  How do you create a demarcation between the Cloud provider’s assets and the customer’s?  You can use application monitoring tools that create synthetic transactions to measure uptime, but what if the application failure is on your end and not the Cloud provider’s?
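The synthetic-transaction idea can be sketched in a few lines.  This is a minimal illustration of the technique, not any vendor's actual tooling; the health-check URL is a hypothetical placeholder:

```python
import urllib.request

def probe(url, timeout=5):
    """One synthetic transaction: fetch the application's health endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False  # timeout, DNS failure, connection refused, HTTP error, etc.

def measured_uptime(results):
    """Uptime as the percentage of successful probes over a window."""
    return 100 * sum(results) / len(results)

# Run probe("https://myapp.example.com/health") on a schedule and feed the
# booleans in: measured_uptime([True, True, False]) is roughly 66.7.
```

Note that this measures availability from wherever the probe runs, which is exactly the demarcation problem: a failed probe can't by itself tell you whose side of the line the fault is on.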

This is one of the areas where I think providers like Rackspace have an opportunity to gain market share with their OpenStack Cloud.  Their “Fanatical” support approach aims to help customers keep their applications up and running no matter the source of the issue.  They will even, upon request, log into your VM and help figure out the issue.  I’ve dinged Rackspace in the past for playing word games with their SLAs, but bottom line, they do have some of the best and most transparent support in the hosting business.

Even with this different take on Cloud support, measuring and evaluating uptime metrics is an interesting challenge for Cloud customers.  It introduces a new skill: vendor SLA management.  If you are looking to migrate your first application to the Cloud, I’d take an especially hard look at the Cloud provider’s wording for service availability, which parts of the infrastructure and service they guarantee to be available, and the level of support offered.
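When reading that SLA wording, it helps to translate availability percentages into concrete downtime budgets.  A quick back-of-the-envelope calculation (my own illustration, not taken from any particular SLA):

```python
def allowed_downtime_minutes(availability_pct, period_hours=365 * 24):
    """Minutes of downtime a given availability percentage permits per period."""
    return period_hours * 60 * (1 - availability_pct / 100)

# "Three nines" (99.9%) still allows about 526 minutes (~8.8 hours) per year;
# "four nines" (99.99%) allows about 53 minutes.
```

For scale, a single 23-hour outage like the one that hit Netflix, on its own, caps annual availability at roughly 99.74%.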

As an enterprise customer looking at Cloud providers, how do you evaluate (and value) Cloud providers on availability?

Published by Keith Townsend

Now I'm @CTOAdvisor

11 thoughts on “How do you measure Cloud uptime?”

      1. The more I think about it, the more I believe there is only one solution if you want your applications to stay online: replicate your data to a second cloud provider and have everything prepared so you can deploy the systems needed to run your applications when the first one goes down. When it’s really bad, start it all up, change the DNS, and you are done.
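A rough sketch of the decision logic behind the approach in the comment above, assuming each provider exposes a health endpoint.  The URLs are placeholders, and the actual DNS change would go through your DNS provider's own API (with low TTLs so the switch propagates quickly):

```python
import urllib.request

def is_up(url, timeout=5):
    """Probe a provider's health endpoint (the URLs used are hypothetical)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False

def pick_provider(primary_url, standby_url):
    """Decide where DNS should point: stay on the primary while it answers."""
    if is_up(primary_url):
        return "primary"
    if is_up(standby_url):
        return "standby"   # start the replicated systems, then repoint DNS
    return "none"          # both down: alert a human rather than flap DNS
```

In practice you would require several consecutive failed probes before failing over, to avoid flipping DNS on a transient blip.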

      2. I believe a lot of people would like to take that approach, and that’s where solutions from Cloud Brokers can come into play. But it’s basically a DR exercise and can be extremely difficult to execute across different providers. This is especially true with large data sets and high-volume transactions. The other challenge is keeping images cloud-vendor neutral. OpenStack promises to help solve some of the complexity, but not all.

      3. Yes, it is DR. At some point, when dealing with very high volumes, you’ll need to make a choice as a business: accept no new transactions, or lose the last few transactions.

        But it is not the job of OpenStack to solve this; it is at the wrong layer and cannot help you. It is the job of configuration management (like Puppet or Chef) or service orchestration (like Juju) to deploy the same services on a different cloud (provider).

      4. There is a difference between goal and implementation. OpenStack can be used to build a cloud. OpenStack can be used by a provider or for building a private cloud.

        You can use the same management software to talk to an OpenStack private cloud as to an OpenStack public cloud. If you choose the same hypervisor as your public cloud provider, you can move your images without making any changes.

        Their intention is to improve on this of course, but they aren’t there yet.

        They currently have basically what Eucalyptus offers for EC2: a private cloud with full EC2 compatibility.

  1. I think Lennie has a good point about cold DR on another provider, but it’s a pricey proposition. Maybe if Netflix had actually managed to get that “Qwikster” hoodwink over on us, they would’ve duplicated at Rackspace!

    I must say, though, I’m wary of seeing individual vendors promote the “open / interoperable” facet of OpenStack, because common sense dictates that it’s thin ice over a deep black pool of commercial interest.

    Our cloud brokerage / IaaS marketplace start-up hopes to help people sort out some of the madness – we are just starting to offer historical uptime data courtesy of Cedexis, as well as latency numbers. In the end, don’t trust a vendor’s word – trust their customers!

    1. How pricey it is depends entirely on how much data you have. It isn’t that pricey: you only need to duplicate the data (and thus the machines and traffic needed to replicate it). Applications and so on don’t need to run at the DR provider.

      I’m slightly on the fence when it comes to OpenStack myself. They have a lot of developers from different companies. If they can keep the (core) code vendor neutral, then they can do the same thing the Linux kernel developers did.

      When someone comes up with support for a new type of device for the Linux kernel, they might not be asked to come up with a very generic interface yet.

      But when other vendors come along with similar devices, a generic interface has to be created (it is hard to come up with a generic interface if you only know one device; you are pretty much guaranteed to get it wrong). The first vendor then has to be moved to the new interface and the old one deprecated.

      I’ve seen this work in OpenStack a couple of times, so I think this is a very good sign. It is their policy to do so.

      Since shippable code (a working product) really is the only thing that counts, if OpenStack keeps sticking to the policies mentioned above then they really will be the Linux kernel of cloud computing.

      They might even become a good competitor to giants like Amazon. Linux now dominates all markets except the (Intel-based) desktop. Will there be an ARM-based desktop to speak of? I don’t know; it might depend on whether people want something like the Ubuntu smartphone. Linux can already deliver the same experience on ARM-based desktops as on Intel-based desktops, right now, including pretty much all of the applications available on that desktop. Mac OS X and Windows can’t say the same.

      OpenStack tries to be as flexible as the Linux kernel and incorporate what any vendor wants to contribute, but on their terms: those policies around generic code.

      It looks promising.
