Yesterday we celebrated Independence Day here in the United States. It’s a day when we embrace the freedoms we enjoy as a country, look back on how far we have come, and celebrate the promise of the future. Yesterday was also a different kind of Independence Day for my teams at AOL: a Data Center Independence Day, if you will.
You may or may not have been following the work we have been doing here at AOL over the last 14 or so months, but the pace of change has been simply breathtaking. One of the first things I did when I joined the company was to conduct a deep review of every aspect of Operations, from Data Centers to Network Engineering to the engineering teams supporting the products and services, and everything in between. The net of the exercise was that AOL was probably similar to most companies out there in terms of technology mix, ranging from the CRUFT that I mentioned in a previous post to the latest technologies. There were some incredible technologies built over the last three decades, some outdated processes and procedures, and, if I am honest, traces of a culture where the past had more meaning than the present or future.
In a very short period of time, all of that changed. We aggressively made changes to the organization, realigned priorities, and, perhaps most of all, created and defined a powerful collection of changes and evolutions we would need to bring about on very aggressive timelines. These changes were part of a defined Technology Roadmap that broke the work we needed to accomplish into three categories. The first category focused on the internal technical challenges and tools we needed to tackle to improve our own internal efficiency. The second focused on the technical challenges and aggressive things we could do to enhance our products and services and make them more scalable. This would include things like additional services and technology suites for our internally developed cloud infrastructure, and other items that would allow for more rapid delivery of our products and services. The last category was for the incredibly aggressive “wish list” types of changes: items that could be so disruptive, so incredibly game-changing for us, that they could redefine our work as a whole. In fact, we named this group of work “Nibiru” after a mythical planet that is said to cross into our solar system, wreak havoc, and bring about great change.
On July 4, 2012, one of our Nibiru items arrived, and I am ecstatic to announce that we achieved our “Data Center Independence Day”. Our primary “Nibiru” goal was to develop and deliver a data center environment without the need for a physical building. The environment needed to require as little physical “touch” as possible and give us the ultimate flexibility in how we delivered capacity for our products and services. We called this effort the Micro Data Center. If you think about how many things need to change to evolve to this type of strategy, it’s a bit mind-boggling.
Here are just a few of the things we had to examine, change, and automate to even make this kind of achievement possible:
- Developing an entirely new Technology Suite and the ability to deliver that capacity anywhere in the world with minimal to no staffing.
- Delivering extremely dense compute capacity (think the latest technology) to give us the longest possible use of these assets once deployed into the field.
- The ability to deliver a “Micro Data Center” anywhere on the planet, regardless of temperature and humidity conditions.
- The ability to support, maintain, and administer it remotely.
- The ability to fit into the power envelope of a normal office building.
- Participation in our cloud environment and capabilities.
- The processes by which these facilities are maintained and serviced.
- And much, much more…
In my mind, it’s one thing to claim a technical achievement; it’s quite another to operationalize that achievement and make the process of supporting it repeatable. That’s my measure of when you can REALLY declare victory. Science experiments don’t count. It has to just plain work. To that end, our first “beta” site for the technology was the AOL campus in Dulles, Virginia. Out on a lonely slab of concrete behind one of the buildings, our future has taken shape.
Thanks in part to a lot of the work going on in the data center containerization space, we were able to jump-start much of the work relatively quickly. In fact, the pace set by the Data Center and Technology Operations teams to deliver this achievement is more than a bit astounding. Most, if not all, of the existing AOL Data Centers would fall somewhere around a traditional Uptime Institute Tier III / Tier II definition. The teams pushed way outside their comfort zones to deliver some incredible evolutions in a very short period of time. Of course, there were steps along the way to get here, but those steps now seem to be happening in double time.

A few months back we announced the launch of ATC, our first completely automated facility. The work that went into ATC was foundational to our achievement yesterday. It allowed us to start working on the hard stuff first, that is to say the ‘Operationalization’ of these kinds of environments, and it set the stage for how we could evolve to this next tier. Below is a summary of some of the achievements of our ATC launch, but if you are curious about the specifics of our work there, feel free to click through to the ‘Breaking the Chrysalis’ post I wrote at that time. You can see how the work we have been driving in our own internal cloud environments, the changes in operational procedure, and the change in thinking are additive and fundamental to our latest achievement. It’s especially interesting to note that, amid all of the blips and hiccups in the ‘cloud industry’, like the leap second and the terrible storms on the East Coast this week, which affected many data centers, ATC, our completely unmanned facility, just kept humming along with no issues (to be fair, our traditional facilities had no issues either). That is despite the fact that much of the initial negative feedback we received centered solely on the reliability of such moves. It goes to show how important engineering FOR Operations is.
At AOL, we have built this in from the start.
What does this actually buy AOL?
OK, we stuck some computers in a box and made sure they require very little care and feeding. What does this buy us? Quite a bit, actually. Jay Moran, the Distinguished Engineer in charge of driving this effort, is always quick to point out that the problem space here is not just about the technology. It has to be a marriage with the business side as well. Obviously, the inherent flexibility of the design gives us a far greater number of places around the planet where we can deploy capacity, and that in and of itself is pretty revolutionary. We are no longer tied to traditional data center facilities or colocation markets. That doesn’t mean we won’t use them; it means we now have a choice. Of course, this is only possible because of our internally developed cloud infrastructure, but we have freed ourselves from having to be bolted onto or into existing big infrastructure. It allows us to have an incredible amount of geo-distributed capacity at a very low cost point in terms of both upfront capital and ongoing operational expense. This is a huge game changer, so much so that I’d like to walk through a bit of ‘back of the napkin math’ with you. Let’s call the global capacity in terms of compute, storage, etc. that we have today in our traditional environments the Total Compute Capability, or TCC. It’s essentially the bandwidth for the work that we can get done. Within the cost of TCC you have operating costs such as power, lease costs, data center facility maintenance costs, support staff, etc. You additionally have the depreciation of the facilities themselves (or of the specific buildouts, if colocating), the depreciation of servers and other equipment, and the rest. Let’s call that baseline cost X. The Micro Data Center strategy, built out with our latest and most dense server standards and infrastructure, would allow us to have five times the total TCC for less than 10% of X and of today’s physical footprint.
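To make the napkin math concrete, here is a minimal Python sketch. It assumes that “less than 10% of the cost” is measured against the same all-in baseline X (operating costs plus depreciation) and treats TCC as a single scalar; the function `relative_unit_cost` and the normalized baseline constants are hypothetical names used only for illustration. Because the figures above are bounds (5x the capacity for *less than* 10% of the cost), the result is a ceiling on the cost per unit of TCC.

```python
from fractions import Fraction

# Hypothetical normalization: today's TCC and its all-in cost X are both set to 1,
# so every result below is expressed relative to today's traditional environment.
BASELINE_TCC = Fraction(1)
BASELINE_COST = Fraction(1)  # stands in for the baseline cost "X"


def relative_unit_cost(tcc_multiple: Fraction, cost_fraction: Fraction) -> Fraction:
    """Cost per unit of TCC in the new model, relative to today's cost per unit.

    tcc_multiple: new capacity as a multiple of today's TCC (5 for "5x").
    cost_fraction: new total cost as a fraction of X (1/10 for "10% of the cost").
    """
    new_tcc = BASELINE_TCC * tcc_multiple
    new_cost = BASELINE_COST * cost_fraction
    baseline_unit_cost = BASELINE_COST / BASELINE_TCC
    return (new_cost / new_tcc) / baseline_unit_cost


if __name__ == "__main__":
    # Bounds from the post: five times the TCC for at most 10% of X.
    ratio = relative_unit_cost(Fraction(5), Fraction(1, 10))
    print(f"cost per unit of TCC: at most {ratio} of today's")
    print(f"improvement factor: at least {1 / ratio}x")
```

Exact fractions are used instead of floats so the ratio comes out as a clean 1/50 rather than a rounding artifact. Read the other way round, the same bounds imply at least a 50-fold improvement in cost per unit of capacity.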
If you think about how this will allow us to aggregate and grow over time, it ultimately drives us to a VERY LOW operational cost structure for delivering our products and services. Additionally, it positions us for the future in very significant ways:
- It redefines software architecture for greater resiliency.
- It gives us an incredibly flexible platform for addressing privacy laws, regulatory oversight, and other such concerns, allowing us to respond rapidly.
- It further reduces energy consumption and carbon emissions (important as taxation evolves around the world, as well as for ongoing operational costs).
- It gives us the ability to drive Edge Computing delivery to potentially bypass CDNs for certain content.
- It gives us the capability to drive ‘Community-in-a-box’, whereby we can quickly launch new products in markets, quickly expand existing footprints like Patch on a low-cost but still hyper-local platform, and give the Huffington Post a platform to rapidly partner and enter new markets with minimal turn-up costs.
- The fact that the technology mix in our SKUs comprises compute, storage, and network capacity maximizes the range of products and services we can deploy to it.
As Always, It’s Really About the People
I cannot let a post about this huge win go by without mentioning the teams involved in delivering this capability. This is not just a win for AOL or, to a lesser degree, for the industry at large, as another proof point that it can evolve if it puts its mind to changing; it is a win for the Technology Teams at AOL. When I was first approached about joining AOL, my slightly sarcastic and comedic response was probably much like yours: ‘Are they still around?’ But the fact of the matter is that AOL has a vision of where it wants to go and what it wants to be. That was compelling for me personally, compelling enough for me to make the move. What has truly amazed me, however, is the dedication and tenacity of its employees. These achievements would not be possible without the outright aggressiveness with which the organization has moved the company forward. It’s always hard to assess from the outside just how hard an effort is to achieve internally. In the case of our Micro Data Center strategy, the teams faced just about every kind of barrier to delivering this capacity, and every kind of excuse not to make it, or even not to try. They put all of those things aside and just plain executed. If you’ll allow me a small moment of bravado: not only did my teams simply kick ass, they did it in a way that moved the needle for the company and, in my mind, once again catapulted them to the forefront of operations and technology at scale. We still have a bunch of Nibiru projects to deliver, so my guess is we haven’t heard the last of these big wins.