When will IT infrastructure become invisible?

The recent Amazon and Rackspace Xen security patching shows how far infrastructure has to go before it becomes the invisible plumbing of the Internet, let alone enterprise IT. Netflix wrote an interesting blog post that shows the lengths they went to in order to keep their services unaffected. We should be getting to the point where the application layer doesn’t have to care about the chaos of the underlying infrastructure. Sure, there will always be infrastructure-aware applications for extreme cases, but these should be the exception and not the norm.

I wrote a post over on TechRepublic that discusses a couple of cool technologies, such as Docker and CoreOS, that aim to make OS-level patching better. However, there is still a need for all parts of the infrastructure, including the hypervisor, to be self-healing. Amazon commented that live-migration isn’t a silver bullet for maintenance issues, and I’ve given that comment a couple of days of thought in the back of my mind. Outside of lower-level issues such as networking or storage, I can’t think of any hypervisor-level maintenance where live-migration wouldn’t prevent downtime. Then again, I’m not responsible for an infrastructure that runs millions of VMs, so there’s that.

Ultimately, infrastructure is just like residential plumbing: no one cares about the details until something goes wrong. It’s infrastructure engineers and architects who should be responsible for making that layer disappear.

Published by Keith Townsend

Now I'm @CTOAdvisor

6 thoughts on “When will IT infrastructure become invisible?”

  1. You probably linked to the wrong TechRepublic article.

    The article you linked to is: “Practice what you preach: Adopt cloud tech in your home virtualization lab”

    The article I assume you wanted to link to is:
    “CoreOS eliminates downtime from server OS patching”


    The funny thing is, the solution you talk about in the second article can also solve the problem you stated in the first article. Because in the second article you are talking about Docker. And one of the reasons Docker became popular with developers is because you can easily run many, many more containers on a developer laptop/desktop than you can run VMs.

    More and more developers are moving to building microservices and at scale in the cloud you would run one microservice per VM. So the cloud can auto-scale VMs. With Docker containers the ‘unit of scale’ is smaller.

    Thus containers make it possible again to run the same software on the developer laptop/desktop as on the servers.

    1. Thanks. I made the update. The one problem Docker doesn’t solve from the 1st post is capacity. It may help you do more with less, but ultimately most of us can’t afford to scale our home labs up large enough to support really complex installations.

      1. Just saying it might be that the trend is also improving because of Docker and microservices.

        With microservices it is easier to run just a part of the services you need.

        The biggest risk to developers being able to develop on their laptop is using APIs provided by the cloud provider, like using AWS DynamoDB, because you can’t get access to the software.

        Personally, I’m a big advocate of trying to stay away from vendor lock-in.

  2. Based on what I’ve read, AWS can live-migrate with Xen. But in the case of AWS, my guess would be that this patch needed to be deployed very quickly. AWS can’t easily live-migrate all the VMs in a short time because most of the VMs also use local storage on the host. If they had to move all that data over the network in a short time, the network would have been swamped.

    1. Having to do both memory and storage migration would be troublesome given the severity of the patch. I didn’t consider that local storage may be leveraged. You would think they’d have storage mirrored from host to host, similar to VSAN.

      1. When you get a VM in AWS you (can?) get scratch space, which is backed by local storage with no replication. That can make it higher performance because it is local (depending on the other users on the same host). But that storage will be lost when you shut down the VM/machine. It’s ephemeral. A lot of people use it by handling the replication at the application layer.
