Breaking the Chrysalis

What has come before

When I first took my position at AOL I knew I was going to be in for some very significant challenges. This position, perhaps more so than any other in my career, was going to push the bounds of my abilities as a technologist, as an operations professional, as a leader, and as someone who would hold measurable accountability for the operational success of an expansive suite of products and services. As many of you may know, AOL has been engaged in what used to be called internally a “Start-Around”: essentially an effort to fundamentally change the company from its historic roots into the premium content provider for the Internet.

We no longer use this term internally, as it is no longer about forming or defining that vision. It has shifted to something more visceral. More tangible. It’s a challenge that most companies should be familiar with. It’s called Execution. Execution is a very simple word, but as any good operations professional knows, the devil is in the details, and those details have layers and layers of nuance. It’s where the proverbial rubber meets the road. For my responsibilities within the company, execution revolves 100% around delivering the technologies and services to ensure our products and content remain available to the world. It is also about fundamentally transforming the infrastructural technologies and platform systems our products and content are based upon, and providing the most agility and mobility we can to our business lines.

One fact that is often forgotten in the fast-paced world of Internet Darlings, is that AOL had achieved a huge scale of infrastructure and technology investment long before many of these companies were gleams in the eyes of their founders.   While it may be fun and “new” to look at the tens of thousands of machines at Facebook, Google, or Microsoft – it is often overlooked that AOL had tens of thousands of machines (and still does!) and solved many of the same problems years ago.  To be honest it was a personal revelation for me when I joined.  There are few companies who have had to grow and operate at this kind of scale and every approach is a bit unique and different.  It was an interesting lesson, even for one who had a ton of experience doing something similar in “Internet Darling” infrastructures.

AOL has been around for over 27 years. In technology circles, that’s like going back almost ten generations. Almost 3 decades of “stuff”. The stuff was not only gear and equipment from the natural growth of the business, but also the expansion of features and functionality of long-standing services, increased systems interdependencies, and operational, technological, and programmatic “cruft” as new systems, processes, and technologies were built upon or bolted onto older systems.

This “cruft” adds significant complexity to your operating environment and can truly limit your organization’s agility. As someone tasked with making all this better, it struck me that we actually had at least two problems to solve: the platform and foundation for the future, and a method and/or strategy for addressing the older products, systems, and environments to increase our overall agility as a company.

These are hard problems. People have asked why I haven’t blogged externally in a while. This is the kind of challenge, with multiple layers of challenges underneath, that can keep one up at night. From a strategy perspective, do you target the new first? Do you target the legacy environments to reduce the operational drag? Or do you try to define a unified strategy to address both? It’s a lot harder and generally more complex, but the potential payoff is huge. Luckily I have a world class team at AOL, and together we built and entered our own cocoon and busily went to work. We have gone down the path of changing out technology platforms, operational processes, outdated ways of thinking about data centers, infrastructure, and our overall approach, fighting forward every inch on this idea of unified infrastructure.

It was during this process that I came to realize that our particular legacy challenge, while at “Internet” scale, was more closely related to the challenges of most corporate or government environments than the biggest Internet players.  Sure we had big scale, we had hundreds of products and services, but the underlying “how to get there from here” problems were more universally like IT challenges than scaling out similar applications across commoditized infrastructure.   It ties into all the marketing promises, technological snake oil, and other baloney about the “cloud”.  The difference being that we had to quickly deliver something that worked and would not impact the business.  Whether we wanted to or not, we would be walking down some similar roads facing most IT organizations today.

As I look at the challenges facing modern IT departments across the world, their ability to “go to the cloud” or make use of new approaches is also securely anchored by the “cruft” of their past. Sometimes that cruft is so thick that the organization cannot move forward. We were there; we were in the same boat. We aren’t out of it yet, but we have made some developments that I think are pretty significant, and I intend to share those learnings where appropriate.

 

ATC IS BORN

Last week we launched a brand new data center facility we call ATC. This facility is fundamentally built upon the work that we have been doing around our own internal cloud technologies and shifts in operational process and methodology, and it targets our ability to be extremely agile in our new business model. It represents a model for how to migrate the old, prepare for the new, and provide a platform upon which to build our future.

Most people ignore the soft costs when looking at adoption of different cloud offerings; operational impacts are typically considered as afterthoughts. What if you built those requirements in from day one? How would that change your design? Your implementation? Your overall strategy? I believe that ATC represents that kind of shift of thinking. At least for us internally.

One of the key foundations for our ATC facility is our cloud platform and automation layer. I like to think about this layer as a little bit country and a little bit rock and roll. There is tremendous value in the learnings that have come before, and nowhere is this more self-evident than at AOL. As I mentioned, the great minds of the past (as well as those in the present) had invested in many great systems that made this company a giant in the industry. There are many such systems here, but one of the key ones in my mind is the Configuration Management System. All organizations invest significantly in this type of platform. If done correctly, its uses can extend well beyond a rudimentary asset management system to include cost allocation systems, dependency mapping, and detailed configuration and environmental data, and in some cases, like ours, it can provide the base foundation for leading us into the cloud.

Many companies I speak with abandon this work altogether or live in a strange split/hybrid model where they treat “Cloud” as different. In our space, new government regulations, new safe harbor laws, etc. are continuing to drive the relevance of a universal system to act as a central authority. The fact that this technology actually sped our development efforts in this automation cannot be ignored.

We went from provisioning servers in days to getting base virtual machines up and running in under 8 seconds. Want Service and Application images (for established products)? Add another 8 seconds or so. Want to roll it into production globally (changing global DNS/Load balancing/Security changes)? Let’s call that another minute to roll out. We used Open Source products and added our own development glue into our own systems to make all this happen. I am incredibly proud of my Cloud teams here at AOL, because what they have been able to do in such a relatively short period of time is to roll out a world class cloud and service provisioning system that can be applied to new efforts and platforms or to our older products. Better yet, the provisioning systems were built to be universal so that, if required, we can do the same thing with stand-alone physical boxes or virtual machines. No difference. Same system. This technology platform was recently recognized by the Uptime Institute at its last Symposium in California.
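To put those numbers side by side, here is a rough back-of-the-envelope sketch. The stage durations are the approximate figures quoted above; the stage labels and the two-day baseline for "days" are my own illustrative assumptions, not measurements from our systems.

```python
from datetime import timedelta

# Approximate stage durations quoted in the post.
# The labels are illustrative only, not names from any real system.
STAGES = [
    ("base virtual machine", timedelta(seconds=8)),
    ("service and application image", timedelta(seconds=8)),
    ("global rollout (DNS, load balancing, security)", timedelta(minutes=1)),
]


def total_time(stages):
    """Sum the durations of all provisioning stages."""
    return sum((duration for _, duration in stages), timedelta())


def speedup(old, new):
    """How many times faster the new process is than the old one."""
    return old / new


automated = total_time(STAGES)
manual = timedelta(days=2)  # assumption: "days" taken as two days for illustration

print(f"automated end to end: {automated.total_seconds():.0f} seconds")
print(f"rough speedup vs. a two-day manual process: {speedup(manual, automated):.0f}x")
```

Even with a conservative baseline, the gap is three orders of magnitude, which is why the provisioning step stops being the bottleneck.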

This technology was put to the test recently when the earthquake hit the East Coast of the United States. While thankfully the damage was minimal, the tremor of Internet traffic was incredible. The AOL homepage, along with our news sites, started to get hammered with traffic and requests. In the past this would have required a massive people effort to provision more capacity for our users. With the new technology in place we were able to start adding additional machines to take the load extremely quickly, with very minimal impact to our users. In this particular case these machines were provisioned from our systems in existing data centers (not ATC), but the technology is the same.

This kind of technology and agility has some interesting side effects too. It allows your organization to move much more quickly and aggressively than ever before. I have seen Jevons paradox manifest itself over and over again in the Technology world. For those of you who need a refresher, Jevons paradox is the proposition that technological progress that increases the efficiency with which a resource is used tends to increase (rather than decrease) the rate of consumption of that resource.

It’s like when car manufacturers started putting the Miles per Gallon (MPG) efficiency on autos: the direct result was not a reduction of driving, but rather an overall increase of travel.
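For those who like to see the mechanics, here is a toy sketch of the paradox. It assumes a simple constant-elasticity demand curve, which is my own simplification and not a model of any real market; the function name, the scale factor, and the elasticity values are all hypothetical. The point it illustrates is that when demand is elastic enough (elasticity above 1), making the resource go further increases total consumption of it.

```python
def resource_use(efficiency, elasticity, resource_price=1.0, scale=100.0):
    """Total resource consumed under a toy constant-elasticity demand model.

    efficiency: units of service delivered per unit of resource (e.g. miles per gallon)
    elasticity: how strongly demand for the service responds to its cost
    """
    cost_per_service_unit = resource_price / efficiency
    demand = scale * cost_per_service_unit ** (-elasticity)
    return demand / efficiency


# Doubling efficiency with elastic demand: consumption goes UP (Jevons paradox).
print(resource_use(1.0, elasticity=1.5), resource_use(2.0, elasticity=1.5))

# Doubling efficiency with inelastic demand: consumption goes down, as intuition expects.
print(resource_use(1.0, elasticity=0.5), resource_use(2.0, elasticity=0.5))
```

In this toy model, provisioning that drops from days to seconds is an efficiency gain of exactly that kind, so a surge in demand for capacity is the expected outcome rather than a surprise.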

For ATC, which officially launched on October 1, 2011, it took all of an hour to have almost 100 virtual machines deployed to it as soon as it was “turned on”. It has since long passed that mark, and in fact this technology usage is happening faster than we can coordinate executive schedules to attend our executive ribbon cutting ceremony this week.

While the Cloud development and technology efforts are cornerstones of the facility, it is not this work alone that is providing for something unique. After all, however slick our virtualization and provisioning systems are, however deeply integrated they are into our internal tools and configuration management systems, those characteristics in and of themselves do not reflect the true evolution that ATC represents.

ATC is a 100% lights out facility. There are absolutely no employees stationed at the facility, whether full time, contract, or otherwise. The entire premise is that we have moved from a reactive support model to a proactive or planned work support model. If you compare this with other facilities (including some I built myself in the past), there are always personnel on site, even if only contractors. This has fundamentally led to significant changes in how we operate our data centers, in how, what, and when we do our work, and has impacted (downward) the overall costs to operate our environments. These changes range from efficiencies and approaches I have used before (100% pre-racked/vendor integrated gear and systems integration) to fundamentally brand new approaches. They have not been easy, and a ton of credit goes to our operations and engineering staff in the Data Centers and across the Technology Operations world here at AOL. It’s always culturally tough to be open to fundamentally changing business as usual.

Another key aspect of this facility and infrastructure is that, from a network perspective, it’s nearly 100% non-blocking. My network engineers, being network engineers, pointed out that it’s not completely non-blocking for a few reasons, but I can honestly say that the network topology is the closest I have seen to “completely” non-blocking deployed in a real network environment, especially compared to the industry standard of 2:1.
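For readers less familiar with the terminology, here is a minimal sketch of how such a ratio is usually worked out, reading the 2:1 figure as an oversubscription ratio (server-facing bandwidth divided by uplink bandwidth on a switch). The port counts and speeds below are hypothetical examples chosen to make the arithmetic clean; they are not the ATC topology.

```python
from fractions import Fraction


def oversubscription(downlink_ports, downlink_gbps, uplink_ports, uplink_gbps):
    """Ratio of server-facing bandwidth to uplink bandwidth on one switch.

    A result of 1 means non-blocking; 2 means the familiar 2:1.
    """
    downlink_capacity = downlink_ports * downlink_gbps
    uplink_capacity = uplink_ports * uplink_gbps
    return Fraction(downlink_capacity, uplink_capacity)


# Hypothetical switch: 48 x 10G to servers, 6 x 40G uplinks -> 2:1
print(oversubscription(48, 10, 6, 40))

# Same switch with 12 x 40G uplinks -> 1:1, i.e. non-blocking
print(oversubscription(48, 10, 12, 40))

# One uplink short of that -> 12/11, "nearly" non-blocking
print(oversubscription(48, 10, 11, 40))
```

The closer the ratio gets to 1, the less likely it is that traffic between any two servers is throttled by the fabric rather than by the servers themselves.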

Another incredible aspect of this new data center facility and the technology deployed is our ability to Quick Launch Compute Capacity. The total time it took to go from idea inception (no data center) to delivering active capacity to our internal users was 90 days. In my mind this is made even more incredible by the fact that this was the first time that all these work-streams came together, including the unified operations deployment model and all of the physical aspects of just getting iron to the floor. This time frame was made possible by a standardized, modular way to build out our compute capacity in logical segments based upon the infrastructure cloud type being deployed (low tier, mid-tier, etc.). This approach has given us a predictability of deployment speed and cost which in my opinion is unparalleled.

The culmination of all of this work is the result of some incredible teams devoted to the desire to effect change, a little dash of renegade engineering, a heaping helping of some new perspective, blood, sweat, tears and vision. I am extremely proud of the teams here at AOL for delivering this ground-breaking achievement. But then again, I am more than a bit biased. I have seen the passion of these teams manifested in some incredible technology.

As with all things like this, it’s been a journey and there is still a bunch of work to do.  Still more to optimize.  Deeper analysis and ease of aggregation for stubborn legacy environments.   We have already set our sights on the next generation of cloud development.  But for today, we have successfully built a new foundation upon which even more will be built.  For those of you who were not able to attend the Uptime Symposium this year I will be putting up some videos that give you some flavor of our work with driving a low cost cloud compute and provisioning system from Open Source components.

 

\Mm

Monster Help for the Military

AOL'ers packing up care packages for our troops

Today I had the honor and pleasure of working side by side with 50 other volunteers in support of Monster Help Day.  While there were lots of incredible efforts going on throughout the company for worthwhile causes, as the executive sponsor of the military support group here at AOL, I felt it right to participate in our large scale effort to support our men and women in the armed services.

For those of you unfamiliar with Monster Help Day, it is a companywide day of service for all employees worldwide. It is an entirely employee-inspired initiative and represents AOL’s commitment to helping people and continuing to hold a meaningful place in their lives. This year, more than 3,000 AOL employees will donate over 20,000 hours of service to more than 40 charities. By providing support to charitable organizations and local communities, AOL reinforces its core values and empowers its employees with rewarding experiences. And to top it off, it’s not just for employees. AOL will also be giving consumers the opportunity to show support for causes close to their hearts. For every visitor to monsterhelp.aol.com who posts an update from their Twitter account using the tweet button provided, AOL will donate 26 cents, in honor of AOL’s 26th year, to the Monster Help Day charity of their choice. So hurry up, you still have plenty of hours left to support your favorite cause.

In my group, there were tables for writing cards and for filling care packages with lots of goodies, ranging from snacks to the common things our soldiers generally miss from home. Stations were building boxes, wrapping batteries for shipment, filling out customs forms, and folding t-shirts, and if I can get away with a pun, it all seemed to run with military efficiency. Perhaps one of the most powerful moments for me came when Sydney Murphy, our lead recruiter and primary organizer of the event, asked the volunteers to introduce themselves and share a little of why they chose this event over the many other worthwhile events.

The room listened as everyone in the room told their stories.  Those stories ranged from people who came just to help out and support our troops with no connection to the military, to those who have served, to those who have had family serve, to military brats -all grown up-, to quite a few who had children and family currently serving abroad.

There was an almost electric energy in the room as people went about their tasks, and the entire area was abuzz with activity. One of the items packed in every box was a special moisture-wicking t-shirt that we thought might be of use in places like Iraq and Afghanistan, and we hope they get some good use.

Today was one of those days where you get to see some of those traits that have made AOL the company it is today. The dedication and effort by all was definitely inspiring.

To all our men and women in the armed forces, I would like to profoundly thank you for your service and wish you the best and ultimately a safe landing here at home.

\Mm

I’ve Got Mail….A new Aol.

You may have seen the announcement today about my recent decision and move to join the new leadership team at Aol. To some of my friends in the Technorati, and most specifically the Valley, this move probably seems very contrarian. Given that I have built some of the largest cloud infrastructures in the world and re-aligned operational processes at massive scale, Aol at first stroke may seem an odd choice. I have worked in some of the largest multi-national companies in the world, I have successfully (and unsuccessfully) launched start-ups, and I have both been a cost center and carried a P&L. I think I have a pretty good understanding of the range and complexity of challenges (especially from a technology perspective) from small business to large. Across the spectrum of these types and sizes of companies you get a different feel. Different cultures. Different attitudes. Different Vibes.

Aol is aggressively moving to redefine itself in the industry, to significantly transform and morph itself into a world that Aol itself helped create and define over 25 years ago.   There is no arguing that the first true scale challenges in dealing with the Internet at large were experienced by those first AOL’ers as they had to deal with numbers of users never before seen in our industry.  They pushed the boundaries of technology, they pushed the boundary of operations, they created whole new paradigms.  To reinvent itself in a market with such competition, such diversity is a huge challenge.

One of the most surprising things to me is that Vibe-thing I talked about a few moments ago. When walking around the company you cannot help but notice that it definitely has more of a technology start-up feel to it. It’s palpable. One of the folks I ran into called it a start-around. A combination of a Start-up and a Turn-around. Perhaps that’s the best description I have for that vibe. Sure things have been tough, sure there is a lot of legacy to work through, but the level of commitment of those folks that are here is incredible. More so than that. It’s a culture of believers. It’s all the self-sacrifice and personal investment you find in a startup, but with a team of seasoned veterans. It’s quite unique in my experience.

As I mentioned, Aol has long held a place of respect in terms of Operational best practices at scale, and a culture that recognized the importance of technology in the delivery of its mission. Tim Armstrong, the CEO and Google veteran, has built an incredible team of passionate technology veterans from places like Google, Microsoft, and others. The mission is focused. The mission is deliberate. The mission is clear. The mission is hard. It’s a huge challenge. It’s the kind of challenge I love. If you think it’s impossible you are only encouraging my energy more. I could have taken a safe bet. But where is the excitement? Where is the challenge? As the saying goes, “A ship is safe in the harbour, but that’s not what ships are for!” This ship is setting sail and my commitment is that not only will we find a new world, we will define it!

In the coming days/weeks/months, I hope to share many of the exciting things we will be endeavoring to accomplish and give you a real taste of some of the big changes I will be attempting. As always, technology and operational processes will be key to the success of the mission the company is on, and I have some very definite ideas on how we can leapfrog current thinking in this space and ensure that our technology and operational approach is not only a strategic value to the business, but also industry leading in execution.

\Mm
