Joe Masters Emison | InformationWeek
Several months ago, I said that Netflix’s push to drive the adoption of its open-source cloud-management toolkit, NetflixOSS, had the potential to “ruin cloud computing.”
Part of my argument revolved around the Netflix Aminator tool, which facilitates the creation of Amazon Machine Images (AMIs). While Aminator may be a good choice for Netflix, I argued that it encourages exactly the wrong habits for the majority of companies trying to deploy applications in the cloud. After many long discussions about that initial article, I believe there is still significant confusion about how one should best use AMIs (or the equivalent from other public cloud providers), and so I am dedicating this column to what I call “the baking debate.”
Let’s start with some background: Anyone who wants to use a public infrastructure-as-a-service (IaaS) provider like Amazon Web Services (AWS) needs to use machine images. These machine images are what they sound like — an image of a virtual server that you can launch, and once it is active (within minutes), it’s your server to use as you wish. The machine image has, at a minimum, an operating system on it, but it can have as much other stuff as you want to cram in. Most IaaS providers make it easy to launch one machine image, make changes to the image (like installing and configuring software), and then “save” the resulting machine as a new image. The “baking debate” revolves around how you should use machine images. Essentially, wrong choices will bite you down the road.
There are three positions in this debate: the “bootstrappers,” who focus on being able to orchestrate the creation and management of servers throughout their lifecycles; the “bakers,” who focus on building machine images for speed and consistency; and the “babes in the woods,” usually developers who’ve found that building machine images is a quick-and-dirty way to construct backups of servers that can then be cloned and replicated.
Bootstrappers like to use a small number of machine images — core, base operating system images that are fairly easy to establish across multiple IaaS providers. Then, after launching the base image, they run configuration management/orchestration software (like Puppet or Chef) to instantiate the full server environment — installing and configuring software, setting up connections to other servers, even restoring data from a backup and setting up replication between database servers.
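To make the bootstrapper workflow concrete, here is a minimal sketch under stated assumptions. No real cloud or Puppet/Chef API is called. `Server`, `install`, `configure`, `bootstrap` and the `WEB_ROLE` list are hypothetical stand-ins. The point it shows is that the image stays generic, and an ordered list of idempotent configuration steps turns it into a specific server after launch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a freshly launched instance; no cloud API is called.
@dataclass
class Server:
    image: str
    packages: List[str] = field(default_factory=list)
    config: Dict[str, str] = field(default_factory=dict)

Step = Callable[[Server], None]

def install(pkg: str) -> Step:
    """Return an idempotent step: running it twice installs the package once."""
    def step(server: Server) -> None:
        if pkg not in server.packages:
            server.packages.append(pkg)
    return step

def configure(key: str, value: str) -> Step:
    """Return a step that sets one configuration value (also idempotent)."""
    def step(server: Server) -> None:
        server.config[key] = value
    return step

def bootstrap(base_image: str, steps: List[Step]) -> Server:
    """Launch a generic base image, then converge it by applying steps in order."""
    server = Server(image=base_image)
    for step in steps:
        step(server)
    return server

# Hypothetical "web" role; a real Puppet manifest or Chef recipe would play this part.
WEB_ROLE: List[Step] = [
    install("nginx"),
    install("app-code"),
    configure("db_host", "db.internal"),
]
```

Because each step is idempotent, rerunning the same role against an already-running server is safe. That property is what lets the same tooling manage a server for its whole life, not just at boot.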
Then, the same orchestration software can be used to modify already-running instances; for example, by rolling out software updates (take a server out of the load balancer, update code/software/configuration, run the test suite and, if it passes, add it back to the load balancer).
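The rolling-update sequence just described can be sketched as follows. This is an illustration, not any specific tool's behavior. `LoadBalancer`, `rolling_update` and the `run_tests` callback are hypothetical, and the choice to halt the rollout at the first failing server is an assumption made so that a bad update does not spread across the whole tier.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set

@dataclass
class LoadBalancer:
    """Hypothetical load balancer: just the set of server names receiving traffic."""
    members: Set[str] = field(default_factory=set)

    def remove(self, name: str) -> None:
        self.members.discard(name)

    def add(self, name: str) -> None:
        self.members.add(name)

def rolling_update(
    lb: LoadBalancer,
    versions: Dict[str, str],
    new_version: str,
    run_tests: Callable[[str], bool],
) -> Optional[str]:
    """Update servers one at a time, in place.

    Returns None if every server passed, or the name of the first server
    whose tests failed (the rollout stops there).
    """
    for name in sorted(versions):
        lb.remove(name)                # take the server out of the load balancer
        versions[name] = new_version   # update code/software/configuration in place
        if not run_tests(name):
            return name                # leave the failing server out of rotation
        lb.add(name)                   # tests passed: add it back
    return None
```

The servers are modified where they stand. Nothing is rebuilt or relaunched, and at most one server is out of rotation at any moment.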
In the bootstrapper's view, every instance is flexible and dynamic, and the base machine images often come straight from the experts (read: the IaaS provider).
Bakers like to make their own machine images so they don’t have to deal with installs or configuration after launch. Configuration management is part of the image-baking process. Running instances are perfect copies of machine images; whenever bakers need to make changes to those instances, they make new machine images, launch new instances and kill off the instances running the old image.
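For contrast, a minimal sketch of the baker's cycle, again with hypothetical names (`Image`, `Instance`, `bake`, `launch`, `replace_fleet`) and no real provider API. Configuration happens once, inside `bake`. Instances are frozen copies of an image, and a change means baking a new image, launching fresh instances from it and retiring the old ones.

```python
import itertools
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical ID generators standing in for the cloud provider's.
_image_ids = itertools.count(1)
_instance_ids = itertools.count(1)

@dataclass(frozen=True)
class Image:
    image_id: str
    contents: Tuple[str, ...]  # everything baked in: OS, packages, configuration

@dataclass(frozen=True)
class Instance:
    instance_id: str
    image_id: str  # an instance is an exact copy of its image; it is never modified

def bake(contents: List[str]) -> Image:
    """Run configuration once, at build time, and freeze the result into an image."""
    return Image(f"img-{next(_image_ids)}", tuple(contents))

def launch(image: Image) -> Instance:
    """Launching needs no further setup: the instance is ready as soon as it boots."""
    return Instance(f"i-{next(_instance_ids)}", image.image_id)

def replace_fleet(fleet: List[Instance], new_image: Image) -> Tuple[List[Instance], List[str]]:
    """Launch one replacement per old instance, then report the old ones to terminate."""
    new_fleet = [launch(new_image) for _ in fleet]
    # A real rollout would health-check new_fleet before terminating anything.
    to_terminate = [inst.instance_id for inst in fleet]
    return new_fleet, to_terminate
```

Note that `Instance` is frozen: in this model there is no "update in place" operation at all, which is the baker's trade-off in miniature.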
Babes in the woods are on a different planet entirely. They don’t read much documentation, preferring to jump headfirst into the cloud without understanding exactly what configuration management is or how it works. They pick a machine image at random, start manually installing and configuring software and loading data onto it, and discover that baking images seems like a good way to back up their servers. This works well — until they need to install a brand new version of a core piece of software or the underlying operating system, or relaunch an entire deployment with data spread across multiple resources … you get the idea. At that point, the poor babes find themselves having to reinvent the wheel, usually without the luxury of having saved any documentation. They may find some solace in thinking themselves bakers. They’re wrong.
Bakers have two main advantages over bootstrappers: launch speed and consistency. First, if you bake your fully configured servers into machine images, your server is operational as soon as the image launches. If, instead, you run configuration management on a basic/core operating system image, installing and configuring software and loading data on every launch, each server launch takes longer.
Second, when your configuration is baked, a server that launches successfully is exactly the server you want. If you're using configuration management on boot, each individual server has to execute a number of steps to reach an operational state, and those steps often involve downloading files from remote servers that may not be responding. So baking images makes sense when you're launching a ton of the same type of instance; in other words, exactly what Netflix does.
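The consistency argument can be made concrete with a toy probability model. Everything in it is an illustrative assumption, not a measurement: boot steps that depend on remote servers are taken to fail independently, each with the same per-attempt success rate, and the function name is hypothetical.

```python
def boot_success_probability(remote_steps: int, p_step: float, attempts: int = 1) -> float:
    """Probability that every remote-dependent boot step succeeds.

    Assumes (for illustration only) that steps fail independently, each with
    success probability p_step per attempt, and that each step is retried up
    to `attempts` times.
    """
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be between 0 and 1")
    if remote_steps < 0 or attempts < 1:
        raise ValueError("remote_steps must be >= 0 and attempts >= 1")
    p_with_retries = 1.0 - (1.0 - p_step) ** attempts
    return p_with_retries ** remote_steps

# A baked image has no remote-dependent steps at boot.
baked = boot_success_probability(remote_steps=0, p_step=0.99)
# A bootstrapped server with ten remote fetches, each 99% reliable.
bootstrapped = boot_success_probability(remote_steps=10, p_step=0.99)
# The same server, retrying each fetch up to three times.
bootstrapped_retry = boot_success_probability(remote_steps=10, p_step=0.99, attempts=3)
```

Under these assumptions the baked launch always succeeds, while ten 99%-reliable fetches succeed together only about 90% of the time. Retries narrow the gap considerably, which is one reason well-written bootstrap scripts retry their downloads.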
The Case For Bootstrappers
Bootstrappers have three main advantages over bakers: significantly less complexity, an easier path to multicloud deployments and better cloud architectures, and more "orchestrate-able" servers.
First, managing images is not a trivial task. NetflixOSS tools make it easier, but adding a layer of infrastructure that has to be maintained and validated means adding complexity. What happens if your images don't get baked properly? What happens if they get deleted? Every time you need to run your test suite, you have to bake images, then launch and test instances for every image you bake. And if you're image-reliant, multicloud is going to be an even bigger pain than it already is: you now need to bake images for every cloud (and what happens if you can't get an image to bake for a particular cloud?).
Perhaps the biggest advantage bootstrappers have over bakers is that bootstrappers don’t have to treat instances as disposable. The baker philosophy is that it’s easier to have an army of replicas than to worry about the fact that instances could be different, but that makes sense only when one actually has an army of replicas.
In my experience, most cloud deployments have a variety of different types of instances as opposed to many replicas of one type — and the most common operation that is done on these servers is not bursting to many additional instances, but rather updating code/software/configuration on a number of different servers. And in these cases, the bootstrappers have the upper hand, because making updates to servers in a properly bootstrapped deployment means running an update scenario across servers (which is often just a single click) as opposed to having to bake new images for each server type, kill off all of your old servers, and launch new ones.
One of the reasons I see NetflixOSS as inappropriate for most cloud users is that it is fundamentally a toolkit for bakers, not bootstrappers. NetflixOSS has the machine image at its core, and ultimately requires bootstrappers to become bakers if they want to use it properly.
For Netflix, it makes zero sense to bootstrap; the advantages of baking are critical to Netflix’s success on the cloud. And the babes too often see themselves as being bakers, when they’re really just doing things wrong. I think we’ll see better cloud architectures if we default to bootstrapping — and avoid NetflixOSS unless we have a compelling use case for it.