I’m grateful for Netflix’s transparency. It helps the broader cloud community understand the complexity of building redundant architectures on public cloud resources.
Last week’s Amazon Web Services outage might have outsmarted Netflix’s Chaos Monkeys, but the content-distribution giant isn’t about to turn its back on cloud computing. According to a Friday blog post from the Netflix cloud team, the outage (which started with a generator failure and resulted in a cascading bug that took down AWS’s Elastic Load Balancer feature) exposed some flaws in Netflix’s operations both within and beyond its control, but it was a relatively small blip in what has been better overall availability since the company made the move entirely to the cloud.
That the AWS outage resulted in a control plane backlog that prohibited customers from failing over into Availability Zones not affected by the generator failure was Amazon’s fault. However, Netflix’s Greg Orzell and Ariel Tseitlin write, the outage also highlighted some problems with its own load-balancing architecture that ended up compounding the problem by “essentially…