Brandon Butler | Networkworld
Netflix’s Open Source Software strategy started on June 23, 2011. One of the company’s senior software engineers had an idea: “At some point, I think it would be valuable to open source the Zookeeper library I’ve written,” Jordan Zimmerman wrote to his bosses, talking about a piece of customized code he helped develop.
“Does Netflix have a policy on that?” The response he got: “Go for it. Our policy is no policies ;-)”
During the past two years Netflix has pulled back the curtain to provide a behind-the-scenes glimpse into how it runs one of the most popular video streaming websites on the Internet, almost entirely in the public cloud. The company has open sourced dozens of tools it developed internally. In doing so, some argue, Netflix is turning into one of the most important cloud computing companies in the industry, not only by proving that a company making $3.7 billion annually can run some of its most critical workloads in the public cloud, but also by showing developers how it’s done and giving others a path to follow.
Netflix Open Source (OSS) is a collection of Apache-licensed code bundles that the company has created and open sourced. Because Netflix is one of the biggest users of Amazon Web Services’ public cloud, many of the tools are plugins for using AWS resources; others are add-ons for other open source projects such as Apache Hadoop, Cassandra and Pig. Most focus on deploying public cloud computing resources, automating and managing tasks, ensuring high availability and analyzing use.
Perhaps the most notable of Netflix’s OSS tools is the Simian Army, a series of tools that test the fault tolerance of a cloud deployment by randomly shutting down certain systems. Chaos Monkey automatically selects individual virtual machines to terminate, while Chaos Gorilla does the same thing on a larger scale by simulating the outage of an entire Availability Zone in AWS’s cloud. Another project, Asgard, provides a cloud management dashboard, while ICE tracks cloud spending by usage. Revealing the inner secrets of how it manages the tens of thousands of instances it uses in Amazon’s cloud at any given time isn’t all altruistic for Netflix, though.
“There’s this massive realization in the industry that if you’re benefitting from these projects, then why not pay it forward and get the benefit of community input,” says Michael Skok, an industry watcher and venture capitalist at North Bridge Venture Partners who advises early stage cloud startups. “Ultimately you’re reducing your costs and increasing your value when you’re contributing to a movement. Everybody wins.”
The chief architect behind Netflix’s cloud and OSS strategy is Adrian Cockcroft, a former distinguished engineer at eBay and Sun, who says Netflix has several goals in developing OSS. For one, it’s working to establish Netflix’s process as a best practice for operating in the public cloud. Doing so allows the company to benefit from the knowledge of the broader open source community, whose members recommend improvements. Furthermore, it helps Netflix hire and retain top engineering talent, all while building up the company’s technology brand.
Now other companies beyond Netflix are starting to benefit from the company’s work. Eucalyptus makes software to run private clouds that are modeled on, and compatible with, Amazon Web Services’ public cloud, and it has worked to ensure Netflix OSS tools run on top of its private cloud platform. Andy Knosp, vice president of products at Eucalyptus, says that while not every customer will use tools such as the Simian Army or ICE, Netflix is providing a reference architecture for how best to use the public cloud.
There are other impacts Netflix OSS has on the industry. For one, Netflix is cultivating developers both internally and outside the organization who have become experts in executing large-scale AWS cloud deployments, says Matt Asay, vice president of corporate strategy at NoSQL database company 10Gen, who recently wrote an article on ReadWriteWeb titled “Netflix positioned to lead the next wave of cloud adoption.”
Those experts will be recruited by other companies and take what they’ve learned from Netflix and apply it to other organizations, expanding the scope of organizations using the cloud in the same manner as Netflix.
That’s exactly what happened with Carl Quinn, who used to manage the infrastructure engineering team at Netflix before being recruited to Riot Games, maker of League of Legends, one of the Internet’s most popular online games. The game has about as many players as Netflix has streaming video customers, roughly 35 million each, so the two operate at comparable scale. Riot scooped up Quinn because of his skills in managing Netflix’s AWS use, which Quinn hopes to expand at Riot.
“Having these (Netflix OSS) tools at our disposal is huge,” he says. Netflix has already figured out ways of running massive-scale cloud platforms. At the least, Riot can use that work as a reference architecture for its own deployments; at most, it can adopt specific projects from Netflix OSS.
Not everyone is buying this blueprint, however. Some question how well Netflix has actually architected its cloud. The company famously fell silent on Christmas Eve in 2012 when Amazon’s cloud had an outage, bringing down the video streaming service on the holiday night. In a post titled “How Netflix is ruining cloud computing,” Joe Masters Emison argues that the best practices Netflix is encouraging aren’t really the best. For example, Netflix runs entirely in Amazon’s cloud, and users who follow that path lock themselves into a single cloud provider, he says. Many believe the future of cloud computing will involve many public cloud options, not having all your eggs in one basket.
Cockcroft, Netflix’s cloud architect, says the company is exploring the use of other clouds, but none yet matches Amazon’s features and scale closely enough for Netflix to use it in a significant way. His goal is to run Netflix 100% from the cloud, though, so he’s exploring all options. As for the outages, Cockcroft says they are a fact of life in large-scale distributed systems like the public cloud.
Even Amazon’s CTO Werner Vogels warns customers that failures will happen. How good a job Netflix has done of staying resilient to those failures is in the eye of the beholder: Some AWS outages have not affected Netflix; others have.
Carl Brooks, a cloud watcher at the 451 Research Group, says that realistically the Netflix OSS tools are good for companies that use a lot of AWS resources, and especially for video streaming companies, because that’s what Netflix specializes in. Are the tools applicable to typical AWS customers? Maybe, but at least right now they’re not the next major cloud platform that will be adopted in the enterprise.
What Netflix can continue to do, however, is prove that the cloud can be used to run important applications. For enterprises and organizations worried about using the cloud for their mission-critical services, Netflix is proving that it can be done. And it’s even sharing how to do it.