The recent Xen security patching at Amazon and Rackspace shows how far infrastructure needs to go before it becomes the invisible plumbing of the Internet, let alone enterprise IT. Netflix wrote an interesting blog post that shows the lengths they went to in order to keep their services unaffected. We should be getting to the point where the application layer doesn’t need to care about the chaos of the underlying infrastructure. Sure, there will always be infrastructure-aware applications for extreme cases, but these should be the exception and not the norm.
I wrote a post over on TechRepublic that discusses a couple of cool technologies, such as Docker and CoreOS, that aim to make OS-level patching better. However, every part of the infrastructure still needs to be self-healing, including the hypervisor. I noted Amazon’s comment that live migration isn’t a silver bullet for maintenance issues, and I’ve given it a couple of days of thought in the back of my mind. Outside of lower-level issues such as networking or storage, I can’t think of any hypervisor-level maintenance where live migration wouldn’t prevent downtime. Then again, I’m not responsible for an infrastructure that runs millions of VMs, so there’s that.
Ultimately, infrastructure is just like residential plumbing: no one cares about the details until something goes wrong. It’s infrastructure engineers and architects who should be responsible for making that layer disappear.