Through an idea and force of will, he created an industry…

This week the Data Center Industry got the terrible news it knew might be coming for some time: Ken Brill, founder of the Uptime Institute, had passed away.  Many of us knew that Ken had been ill for some time and, although it may sound silly, were hoping he could somehow pull through it.  Even as ill as he was, Ken was still sending and receiving emails and staying in touch with the industry that, quite frankly, he helped give birth to.

I was recently asked about Ken and his legacy for a Computerworld article, and it really caused me to stop and re-think his overall legacy and gift to the rest of us in the industry.  Ken Brill was a pioneering, courageous, tenacious visionary who, through his own force of will, saw the inefficiencies in a nascent industry and helped craft it into what it is today.

Throughout his early career Ken was able to see the absolute siloing of information, best practices, and approaches that different enterprises were developing around managing their mission critical IT spaces.  While certainly not alone in the effort, he became the strongest voice and champion to break down those walls, help others through the process, and build a network of people who would share these ideas amongst each other.  Before long an industry was born, sewn together through his sometimes delicate, sometimes not-so-delicate cajoling and, through it all, his absolute passion for the Data Center industry at large.

One of the last times Ken and I got to speak in person.

In that effort he also created the language that the industry now uses as commonplace.  Seeing a huge gap in how people communicated and compared mission critical capabilities, he became the klaxon of the Tiering system, which essentially normalized those conversations across the Data Center Industry.  While some (including myself) have come to think it is time to redefine how we classify our mission critical spaces, we all have to pay homage to the fact that Ken’s insistence and drive for the Tiering system created a place and a platform to even have such conversations.

One of Ken’s greatest strengths was his adaptability.  For example, Ken and I did not always agree.  I remember an Uptime Fellows meeting back in 2005 or 2006 in Arizona.  In that meeting I started talking about the benefits of modularization and reduced infrastructure requirements augmented by better software.  Ken was incredulous, and we had significant conversations around the feasibility of such an approach.  At another meeting we discussed the relative importance (or non-importance) of a new organization called ‘The Green Grid’ and whether Uptime should closely align itself with those efforts.  Through it all Ken was ultimately adaptable.  Whether it was giving those ideas light for conversation amongst the rest of the Uptime community via audio blogs or other means, Ken was there to have a conversation.

In an industry where complacency has become commonplace, where people rarely question established norms, it was always comforting to know that Ken was there acting the firebrand, causing the conversation to happen.   This week we lost one of the ‘Great Ones’ and I for one will truly miss him.  To his family my deepest sympathies, to our industry I ask, “Who will take his place?”

 

\Mm

Uptime, Cowgirls, and Success in California

This week my teams have descended upon the Uptime Institute Symposium in Santa Clara.  The moment is extremely bittersweet for me, as this is the first Symposium in quite some time I have been unable to attend.  With my responsibilities expanding at AOL beginning this week, there was simply too much going on for me to make the trip out.  It’s a downright shame, too.  Why?

We (AOL) will be featured in two key parts of Symposium this time around for some incredibly groundbreaking work happening at the company.  The first is a recap of the incredible work going on in the development of our own cloud platforms.  Last year you may recall that we were asked to talk about some of the wins and achievements we were able to accomplish with the development of our cloud platform.  The session was extremely well received, and we were asked to come back, one year on, to discuss how that work has progressed even more.  Aaron Lake, the primary developer of our cloud platforms on my Infrastructure Development Team, will be talking on the continued success, features, and functionality, and the launch of our ATC Cloud-Only Data Center.  It’s been an incredible breakneck pace for Aaron and his team, and they have delivered world-class capabilities for us internally.

Much of Aaron’s work has also enabled us to win the Uptime Institute’s First Annual Server Roundup Award.  I am especially proud of this particular honor, as it is the result of an amazing amount of hard work within the organization on a problem faced by companies all over the planet.  Essentially this is operations hygiene at huge scale: getting rid of old servers, driving consolidation, moving platforms to our cloud environments, and more.  This talk will be led by Julie Edwards, our Director of Business Operations, and Christy Abramson, our Director of Service Management.  Together these two teams led the effort behind our “Operational Absurdities” and “Power Hog” programs.  We have sent along Lee Ann Macerelli and Rachel Paiva, the primary project managers instrumental in making this initiative such a huge success.  These “Cowgirls” drove an insane amount of work across the company, resulting in over 5 million dollars of unforecasted operational savings and proving that there is always room for good operational practices.  They even starred in a funny internal video to celebrate their win, which can be found here using the AOL Studio Now service.

If you happen to be attending Symposium this year feel free to stop by and say hello to these amazing individuals.   I am incredibly proud of the work that they have driven within the company.

 

\Mm

AOL Power Hog Award

I have received a ton of emails after my post about our Uptime Institute Server Roundup Award asking me about our “Power Hog” Award.  In case you missed it, part of our internal analysis was going through and identifying inefficient servers and systems, and we motivated the owners of those systems to migrate their installations to the cloud infrastructure that we built out.  You definitely knew you were in trouble when a Power Hog Award arrived on your desk.  I guess we were not above shame as a tactic.  So for those of you who were interested in seeing our illustrious(?) award, I thought I would share a photo of one.

pig

 

\Mm

Attacking the Cruft

Today the Uptime Institute announced that AOL won the Server Roundup Award.  The achievement has gotten some press already (at Computerworld, PCWorld, and related sites), and I cannot begin to tell you how proud I am of my teams.  One of the more personal transitions and journeys I have made since my experience scaling the Microsoft environments from tens of thousands of servers to hundreds of thousands of servers has been truly understanding the complexity of a problem most larger, established IT departments have been dealing with for years.  In some respects, scaling infrastructure, while incredibly challenging and hard, is in large part a uni-directional problem space.  You are faced with growth and more growth, followed by even more growth.  All sorts of interesting things break when you get to big scale.  Processes, methodologies, and technologies all quickly fall by the wayside as you climb ever up the ladder of scale.

At AOL I faced a multi-directional problem space in that, as a company and as a technology platform, we were still growing.  Added to that, there were 27 years of what I call “Cruft”.  I define “Cruft” as years of build-up of technology, processes, politics, fiscal orphaning, and poor operational hygiene.  This cruft can act as a huge boat anchor and a barrier to an organization driving agility in its online and IT operations.  On top of this cruft, a layer of what can best be described as lethargy, or perhaps apathy, can sometimes develop and add even more difficulty to the problem space.

One of the first things I encountered at AOL was the cruft.  In any organization, everyone always wants to work on the new, cool, interesting things, mainly because they are new and interesting…out of the norm.  Essentially the fun stuff!  But the ability for the organization to really drive the adoption of new technologies and methods was always slowed, gated, or in some cases altogether prevented by years of interconnected systems, lost owners, servers of unknown purpose lost in distant historical memory, and the like.  This I found in healthy populations at AOL.

We initially set about building a plan to attack this cruft: to earnestly remove as much of the cruft as possible and drive the organization towards agility.  Initially we called this list of properties, servers, equipment, and the like the Operations $/-\!+ list.  As this name was not very user-friendly, it migrated into a series of initiatives grouped under the name of Ops-Surdities.  These programs attacked different types of cruft and were, at a high level, grouped into three main categories:

The Absurdity List – A list of projects/properties/applications that had questionable value, lacked an owner or direction, or the like, but was still drawing load and resources from our data centers.  The plan here was to develop action plans for each of the items that appeared on this list.

Power Hog – An effort to audit our data center facilities, equipment, and the like, looking for inefficient servers, installations, and/or technology and migrating them to newer, more efficient platforms or our AOL Cloud infrastructure.  You knew you were in trouble when a trophy of a bronze pig appeared on your desk or in your office: you were marked.

Ops Hygiene – The sometimes tedious task of tracking down older machines and systems that may have been decommissioned in the past, marked for removal, or fully depreciated, but never truly removed.  Pure vampiric load.  You may or may not be surprised how much of this exists in modern data centers.  It’s a common issue I have discussed with most data center management professionals in the industry.

So here we are, in a timeline measured in under a year, and after being told all along the way by “crufty old-timers” that we would never make any progress, my teams have decommissioned almost 10,000 servers from our environments.  (Actually this number is greater now, but the submission deadline for the award was earlier in the year.)  What an amazing accomplishment.  What an amazing team!

So how did we do it?

As we will be presenting this in a lot more detail at the Uptime Symposium, I am not going to give away all of our secrets in a blog post; instead, I'll give you a good reason to head to the Uptime event and hear the primary leaders of this effort explain in person how they did it.  It may be a good use of that travel budget your company has been sitting on this year.

What I will share is some guidelines on approach and some things to be wary of if you are facing similar challenges in your organization.

FOCUS AND ATTENTION

I cannot tell you how many people I have spoken with who have tried to go after ‘cruft’ like this time and time again and failed.  One of the key drivers for success, in my mind, is ensuring that there is focus and attention on this kind of project at all levels, across all organizations, and most importantly from the TOP.  Too often executives give out blind directives with little to no follow-through and assume this kind of thing gets done.  They are generally unaware of the natural resistance to this kind of work in most IT organizations.  Having motivated, engaged, and focused leadership on these types of efforts goes an extraordinarily long way to making headway here.

BEWARE of ORGANIZATIONAL APATHY

The human factors that stack up against a project like this are impressive.  While people may not be openly in revolt over such projects, there is a natural resistance to getting things done.  This work is not sexy.  This work is hard.  This work is tedious.  It likely means going back and touching equipment and kit that has not been messed with for a long time.  You may have competing organizational priorities which place this kind of work at the bottom of the workload priority list.  In addition to having executive buy-in and focus, make sure you have some really driven people running these programs.  You are looking for CAN DO people, not MAKE DO people.

TECHNOLOGY CAN HELP, BUT IT’S NOT YOUR HEAVY LIFTER

Probably a bit strange for a technology blog to say, but it’s true.  We have an incredible CMDB and asset system at AOL.  This was hugely helpful to the effort in really getting to the bottom of the list.  However, no amount of technology will be able to perform the myriad of tasks required to actually make material movement on this kind of work.  Some of it requires negotiation, some of it requires strength of will, and some of it takes pure persistence in running these issues down…working with the people.  Understanding what is still required and what can be moved takes people.  We had great technologies in place from the perspective of knowing where our stuff was, what it did, and what it was connected to.  We had great technologies, like our Cloud, to ultimately move some of these platforms to.  However, you need to make sure you don’t go too far down the people trap.  I have a saying in my organization: there is a perfect number of project managers and security people in any organization, where the work output and value delivered is highest.  What is that number?  It depends – but you definitely know when you have one too many of each.

MAKE IT FUN IF YOU CAN

From the brass pigs to the minor celebrations each month as we worked through the process, we ensured that the attention given the effort was not negative.  Sure, it can be tough work, but at the end of the day you are substantially investing in the overall agility of your organization.  It’s something to be celebrated.  In fact, at the completion of our aggressive goals, the primary project leads involved did a great video (which you can see here) to highlight and celebrate the win.  Everyone had a great laugh and a ton of fun doing what was ultimately a tough grind of work.  If you are headed to Symposium I strongly encourage you to reach out to my incredible project leads.  You will be able to recognize them from the video…without the mustaches, of course!

\Mm

Chaos Monkeys, Donkeys and the Innovation of Action

Last week I once again had the pleasure of speaking at the Uptime Institute’s Symposium.  As one of the premier events in the Data Center industry, it is definitely one of those conferences that is a must-attend to get a view into what’s new, what’s changing, and where we are going as an industry.  Having attended the event numerous times in the past, this year I set out on my adventure with a slightly different agenda.

Oh sure, I would definitely attend the various sessions on technology, process, and approach.  But this time I was also going with the intent to listen equally to the presenters and to the scuttlebutt, side conversations, and hushed whispers of the attendees.  Think of it as a cultural experiment in being a professional busybody.  As I wove my way from session to session, I grew increasingly anxious that while the topics were of great quality and discussed much-needed areas of improvement in our technology sector, most of them were issues we have covered, talked about, and dealt with as an industry for many years.  In fact, I was hard-pressed to find anything of real significance in the “new” category.  These thoughts were mirrored in those side conversations and hushed whispers I heard around the various rooms as well.

One of the new features of Symposium is that the 451 Group has opted to expand the scope of the event to be more far-reaching, covering all aspects of the issues facing our industry.  It has brought in speakers from Tier 1 Research and other groups that have added incredible depth to the conference.  With that depth came some really good data.  In many respects the data reflected (in my interpretation) that while technology and processes are improving in small pockets, our industry ranges from stagnant to largely slow to act.  Despite mountains of data showing energy efficiency benefits, resulting cost benefits, and the like, we just are not moving the proverbial ball down the field.

In a purely unscientific poll, I was astounded to find that some of the most popular sessions were directly related to those folks who have actually done something.  Those who took the new technologies (or old technologies) and put them into practice were roundly more interesting than more generic technology conversations, giving very specific attention to detail on how they accomplished the tasks at hand, what they learned, and what they would do differently.  Most of these “favorites” were not in those topics of “bleeding edge” thought leadership but specifically the implementation of technologies and approaches we have talked about at the event for many years.

If I am honest, one of the sessions that surprised me the most was our own.  AOL had the honor of winning an IT Innovation Award from Uptime, and as a result the teams responsible for driving our cloud and virtualization platforms were able to give a talk about what we did, what the impact was, and how it all worked out.  I was surprised because I was not sure how many people would come to this side session or find the presentation relevant.  Of course I thought it was relevant (we were, after all, going to get a nifty plaque for the achievement), but to my surprise the room was packed full, ran out of chairs, and had numerous people standing for the presentation.  During the talk we had a good interaction of questions from the audience, and afterwards we were inundated with people coming up to dig into more details.  We had many comments on the usefulness of the talk because we were giving real-life experiences in making the kinds of changes that we as an industry have been talking about for years.  Our talk and adoption of technology even got a little conversation in some of the industry press, such as Data Center Dynamics.

Another session that got incredible reviews was the presentation by Andrew Stokes of Deutsche Bank, who guided the audience through their adoption of a 100% free-air-cooled data center in the middle of New York City.  Again, the technology here was not new (I had built large-scale facilities using this in 2007) – but it was the fact that Andrew and the folks at Deutsche Bank actually went out and did something.  Not someone building large-scale cloud facilities, not some new experimental type of server infrastructure.  Someone who used this technology servicing IT equipment that everyone uses, in a fairly standard facility, actually went ahead and did something innovative.  They put into practice something that others have not.  Backed by facts, data, and real-life experiences, the presentation went off incredibly well and was roundly applauded by those I spoke with as one of the most eye-opening presentations of the event.

By listening to the audiences, the hallway conversations, and the multitude of networking opportunities throughout the event, a pattern started to emerge, a pattern that reinforced the belief I was already coming to in my mind.  Despite a myriad of talk on very cool technology, applications, and evolving thought-leadership innovations, the most popular and most impactful sessions seemed to center on those folks who actually did something: not with the new bleeding-edge technologies, but utilizing those recurring themes that have carried from Symposium to Symposium over the years.  Air-side economization?  Not new.  Someone (outside Google, Microsoft, Yahoo, etc.) doing it?  Very new, very exciting.  It was what I am calling the Innovation of ACTION: actually doing those things we have talked about for so long.

While this Innovation of Action had really gotten many people buzzing at the conference, there was still a healthy population of people downplaying those technologies, downplaying their own ability to do those things, and restating the perennial dogmatic chant that these types of things (essentially any new idea post-2001, in my mind) would never work for their companies.

This got me thinking (and a little upset) about our industry.  If you listen to those general complaints and combine them with the data showing we have been mostly stagnant in adopting these new technologies, we really only have ourselves to blame.  There is a pervasive defeatist attitude amongst a large population of our industry who view anything new with suspicion, or surround it with the fear that it will ultimately take their jobs away, even when the technologies or “new things” aren’t very new any more.  This phenomenon is clearly visible in any conversation around ‘The Cloud’ and its impact on our industry.  The data center professional should be front and center in any conversation on this topic but more often than not self-selects out of the conversation because they view it as more of an application thing, or more an IT than a data center thing.  Which is of course complete bunk.  Listening to those in attendance complain that the ‘Cloud’ is going to take their jobs away, or that only big companies like Google, Amazon, Rackspace, or Microsoft would ever need them in the future, was driving me mad.  As my keynote at Uptime was to be centered around a Cloud survival guide, I had to change my presentation to account for what I was hearing at the conference.

In my talk I tried to focus on the emerging camps I saw at the conference.  For the first, I placed a slide prominently featuring Eeyore (of Winnie the Pooh fame) and captured many of the quotes I had heard at the conference referring to how the Cloud and new technologies were something to be mistrusted rather than an opportunity to help drive the conversation.  I then stated that we as an industry were an industry of donkeys.  That assessment seems to be backed up by the data.  I have to admit, I was a bit nervous calling a room full of perhaps the most dedicated professionals in our industry a bunch of donkeys, but I always call it like I see it.

I contrasted this with those willing to evolve their thinking and embrace that Innovation of Action by highlighting the Cloud example of Netflix.  When Netflix moved heavily into the cloud, they clearly wanted to evolve past the normal IT environment and build real resiliency into their product.  They did so by creating a rogue process (on purpose) called the Chaos Monkey, which randomly shut down processes and wreaked havoc in their environment.  At first the Chaos Monkey was painful, but as they architected around those impacts, their environments got stronger.  This was no ordinary IT environment.  This was something similar, but new.  The Chaos Monkey creates action, results in action, and on the whole moves the ball forward.
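For readers who have never seen the pattern, the core mechanism is simple enough to sketch in a few lines. To be clear, this is a hypothetical toy in Python, not Netflix's actual tool (which terminates real cloud instances); the `chaos_monkey` function, the fleet structure, and the kill fraction are all invented here for illustration:

```python
import random

def chaos_monkey(instances, kill_fraction=0.1, rng=None):
    """Randomly pick a fraction of running instances and terminate them.

    The point of the exercise: failure is injected deliberately and
    routinely, so the surrounding architecture must be built to survive it.
    """
    rng = rng or random.Random()
    candidates = [i for i in instances if i["state"] == "running"]
    if not candidates:
        return []
    n_kills = max(1, int(len(candidates) * kill_fraction))
    for victim in rng.sample(candidates, n_kills):
        victim["state"] = "terminated"  # in real life: an API call to kill it
    return [i["name"] for i in instances if i["state"] == "terminated"]

# Example: a small ten-instance fleet; one instance gets terminated per run.
fleet = [{"name": f"web-{n}", "state": "running"} for n in range(10)]
killed = chaos_monkey(fleet, kill_fraction=0.1, rng=random.Random(42))
```

The code itself is trivial; the discipline is not. Once a process like this runs continuously, any service that cannot tolerate losing a node gets flushed out immediately, which is exactly the forcing function Netflix was after.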

Interestingly, after my talk I literally had dozens of people come up, admit they had been donkeys, and offer to reconnect next year to demonstrate what they had done to evolve their operations.

My challenge to the audience at Uptime, and ultimately my challenge to you, the industry, is to stop being donkeys.  Let’s embrace the Innovation of Action and evolve into our own versions of Chaos Monkeys.  Let’s do more to put the technologies and approaches we have talked about for so long into action.  Next year at Uptime (and across a host of other conferences), let’s highlight those things that we are doing.  Let’s put our Chaos Monkeys on display.

As you contemplate your own job as an IT or Data Center professional, ask yourself: are you a Donkey or a Chaos Monkey?

\Mm

Preparing for the Cloud: A Data Center and Operations Survival Guide


This May, I once again have the distinct honor of presenting at the Uptime Institute’s Symposium, which will be held in Santa Clara, CA, from May 9 through the 12th.  My primary topic is entitled ‘Preparing for the Cloud: A Data Center Survival Guide,’ and I am really looking forward to this presentation on two fronts.

First, it will allow me to share some of the challenges, observations, and opportunities I have seen over the last few years and package them up for data center operators and IT professionals in a way that’s truly relevant to how to start preparing for the impact on their production environments.  The whole ‘cloud’ industry is now rife with competing definitions, confusing marketing, and a broad spectrum of products and services meant to cure all ills.  To your organization’s business leaders, the cloud means lower costs, quicker time to market, and an opportunity to streamline IT operations and reduce or eliminate the need for home-run data center environments.  But what is the true impact on the operational environments?  What plans do you need to have in place to ensure this kind of move can be successful?  Is your organization even ready to make this kind of move?  Is the nature of your applications and environments ‘Cloud-Ready’?  There are some very significant things to keep in mind when looking into this approach, and many companies have not thought them all through.  My hope is that this talk will help prepare professionals with the necessary background and questions to ensure they are armed with the correct information to be an asset to the conversation within their organizations.

The second front is to really dig into the types of services available in the market and how to build an internal scorecard to ensure that your organization is approaching the analysis in a true apples-to-apples kind of comparison.  So often I have heard horror stories of companies caught up in the buzz of the Cloud and pursuing devastating cloud strategies that end up far more expensive than what they had to begin with.  The cloud can be a powerful tool and approach to serve the business, but you definitely need to go in with both eyes wide open.

I will try to post some material in the weeks ahead of the event to set the stage for the talk.  As always, if you are planning on attending Symposium this year, feel free to reach out to me if you see me walking the halls.

\Mm

CO2K Doubter? Watch the Presidential address today

Are you a Data Center professional who doubts that carbon legislation is going to happen, or who thinks this initiative will never get off the ground?  This afternoon President Obama plans to outline his intention to assess a cost for carbon consumption at a conference highlighting his economic accomplishments to date.  The backdrop of this, of course, is the massive oil rig disaster in the Gulf.

As my talk at the Uptime Institute Symposium highlighted, this type of legislation will have a big impact on data center and mission critical professionals.  Whether you know it or not, you will be front and center in assisting with the response, collection, and reporting required to react to this kind of potential legislation.  When I questioned the audience during my talk, it was quite clear that most of those in the room were vastly ill-prepared and ill-equipped for this kind of effort.

If passed, this type of legislation is going to cause a severe reaction inside organizations to ensure that they are in compliance, and it will likely lead to a huge increase in spending on collecting and reporting energy information.  For many organizations the cost will be significant.

The US House of Representatives has already passed a version of this known as the Waxman-Markey bill.  You can bet that there will be a huge amount of pressure to get a Senate version passed and out the door in the coming weeks and months.

This should be a clarion call for data center managers to step up, raise awareness within their organizations about this pending legislation, and take a proactive role in establishing a plan for a corporate response.  Take an inventory of your infrastructure and assess what you will need to begin collecting this information.  It might even be wise to get a few quotes for a ballpark cost of what it might take to bring your organization up to the task.  It’s probably better to start doing this now than to be told by the business to get it done.

\Mm

Reflections on Uptime Symposium 2010 in New York

This week I had the honor of being a keynote speaker at the Uptime Institute’s Symposium event in New York City.  I also participated in some industry panels, which is always tons of fun.  Having been a keynote at the first Symposium a few years back, it was an interesting experience to come back and see how it has changed and evolved over the intervening years.  This year my talk was about the coming energy regulation and its impact on data centers, and more specifically what data center managers and mission critical facilities professionals could and should be doing to get their companies ready for what I call CO2K.  I know I will get a lot of pushback on the CO2K title, but I think my analogy makes sense.  First, companies are generally not aware of the impact that their data centers and energy consumption have.  Second, most companies are dramatically unprepared and do not have the appropriate tools in place to collect the information, which will of course lead to the third item: lots of reactionary spending to get that technology and software in place.  While Y2K was generally a flop and a lot of noise, if legislation is passed (and let’s be clear about the very direct statements the Obama administration has made on this topic), this work will lead to a significant change in reporting and management responsibilities for our industry.

Think we are ready for this legislation?

That brings me back to my first reflection on Symposium this year.  I was joking with Pitt Turner just before I went on stage that I was NOT going to ask the standard three questions I ask of every data center audience.  Let’s face it, I thought, that “shtick” had gotten old, and I have been asking those same three questions for at least the last three years at every conference I have spoken at (which is a lot).  However, as I got on stage talking about the topic of regulation, I had to ask; it was like a hidden burning desire I could not quench.  So there I went: “How many people are measuring for energy consumption and efficiency today?”  “Raise your hand if, in your organization, the CIO sees the power bill.”  And then finally, “How many people in here today have the appropriate tooling in place to collect and report energy usage in their data centers?”  It had to come out.  I saw Pitt shaking his head.  What was more surprising was the number of people who had raised their hands on those questions.  Why?  About 10% of the audience had raised their hands.  Don’t get me wrong, 10% is about the highest I have seen that number at any event.  But for those of you who are uninitiated into the UI Symposium lore, you need to understand something important: Symposium represents the hardest of the hard-core data center people.  This is where all of us propeller-heads geek it out in mechanical and electrical splendor; we dance and raise the “floor” (data center humor).  Yet this amazing collection of the best of the best had only a 10% penetration on monitoring in their environments.  When this regulation comes, it’s going to hurt.  I think I will do a post at a later time on my talk at Symposium and what you as a professional can do to start raising awareness.  But for now, that was my first big startle point.

My second key observation this year was the number of people.  Symposium is truly an international event: there were over 900 attendees for the talks and, if memory serves, about 1,300 for the exhibition hall.  I had heard that 20 of the world’s 30 time zones had representatives at the conference.  It was especially good for one of the key recurring benefits of this event: networking.   The networking opportunities were first rate, and by the looks of the impromptu meetings and hallway conversations this continued to be a key driver of the event’s success.  As fun as making new friends is, it was also refreshing to spend some time on quick catch-ups with old friends like Dan Costello and Sean Farney from Microsoft, Andrew Fanara, Dr. Bob Sullivan, and a host of others.

My third observation, and perhaps the one I was most pleased with, was the diversity of thought in the presentations.  It’s fair to say that I have been critical of Uptime for some time for a seemingly dogmatic, recurring set of themes and a particular bent of thinking.   While those topics were covered, so too were a myriad of what I will call counter-culture topics.  Sure, there were still a couple of the salesy presentations you find at all of these kinds of events, but the diversity of thought and approach this time around was striking.   Many of the talks addressed larger business issues: the impact of, myths around, and approaches to cloud computing, virtualization, and other decidedly non-facilities-related material affecting our worlds.   This might have something to do with the purchase by the 451 Group and its related data center think tank Tier 1, but it was amazingly refreshing and they knocked the ball out of the park.

My fourth observation was that the amount of time allotted to the presentations was too short.   While I have been known to completely abuse my own timeslots due to my desire to hear myself talk, I found that many presentations had to end due to time just as things were getting interesting.  Many of the hallway conversations were continuations of those presentations, and it would have been better to keep the groups in the presentation halls.

 

My fifth observation revolved around the quantity, penetration, and maturation of container and containment products, presentations, and services.   When we first went public with the approach while I was at Microsoft, the topic was so avant-garde and against the grain of common practice that it got quite a reception (mostly negative).  This was followed by quite a few posts (like Stirring Anthills) which got lots of press attention, with the resulting industry experts stating that containers and containment were never going to work for most people.   If the presentations, products, and services represented at Uptime were any indication of industry adoption and embrace, I guess I would have to make a childish gesture with thumb to my nose, wiggle my fingers and say…. Nah Nah .  :)

 

I have to say the event this year was great and I enjoyed my time thoroughly.  A great time and a great job by all. 

\Mm

Open Source Data Center Initiative

There are many in the data center industry who have repeatedly called for change in this community of ours: change in technology, change in priorities, change for the future.  Over the years we have seen those changes come very slowly, and while they are starting to move a little faster now (primarily due to the economic conditions and scrutiny over budgets more so than a desire to evolve our space), our industry still faces challenges and resistance to forward progress.   There are lots of great ideas and lots of forward thinking, but moving this work to execution, and educating business leaders as well as data center professionals to break away from those old accepted norms, has not gone well.

That is why I am extremely happy to announce my involvement with the University of Missouri in the launch of a not-for-profit, data-center-specific organization.   You might have read the formal announcement by Dave Ohara, who launched the news via his industry website, GreenM3.   Dave is another of those industry insiders who has long been perplexed by the lack of movement and initiative we have had despite some great ideas and standouts doing great work.  More importantly, it doesn’t stop there.  We have been able to put together quite a team of industry heavyweights to get involved in this effort.  Those announcements are forthcoming, and when they arrive, I think you will get a sense of the type of sea change this effort could potentially bring.

One of the largest challenges we have with regard to data centers is education.   Those of you who follow my blog know that I believe some engineering and construction firms are incented not to change or implement new approaches.  The cover of complexity allows customers to remain in the dark while innovation is stifled. The forces who desire to maintain an aura of black-box complexity around this space, and who repeatedly speak of the arcane arts of building out data center facilities, have been at this a long time.  To them, the interplay of systems requiring one-off monumental temples to technology on every single build is the norm.  It’s how you maximize profit and keep yourself in a profitable position.

When I discussed this idea briefly with a close industry friend, his first question naturally revolved around how this work would compete with that of the Green Grid, the Uptime Institute, Data Center Pulse, or the other industry groups.  Essentially, was this going to be yet another competing thought-leadership organization?  The very specific answer to this is no, absolutely not.

These groups have been out espousing best practices for years.  They have embraced different technologies, they have tried to educate the industry, and they have been pushing for change (for the most part).  They do a great job of highlighting the challenges we face, but for the most part they have waited around for universal good will and monetary pressures to make change happen.  It dawned on us that there was another way.   You need to ensure that you build something that gains mindshare, gets the attention of business leadership, and causes a paradigm shift.   As we put the pieces together we realized that the solution had to be credible, technical, and above all have a business case around it.   It seemed to us the parallels to the Open Source movement and the applicability of the approach were a perfect match.

To be clear, this Open Source Data Center Initiative is focused on execution.   It’s focused on putting together an open and free engineering framework upon which data center designs, technologies, and the like can be quickly assembled, and moreover on standardizing how both end-users and engineering firms approach the data center industry.

Imagine, if you will, a base framework upon which engineering firms, or even individual engineers, can propose technologies and designs, and specific solution vendors can pitch technologies for inclusion and highlight their effectiveness.  More than all of that, it will remove much of the mystery behind the work that happens in designing facilities and normalize those conversations.

If you think of the Linux movement and all of those who actively participate in submitting enhancements and features, even pulling together specific build packages for distribution, one could see similar things emerging in the data center engineering realm.   In fact, with the myriad of emerging technologies assisting in greater energy efficiency, greater densities, differences in approach to economization (air or water), and the use or non-use of containers, it’s easy to see the potential for this component-based design.

One might think that we are effectively trying to put formal engineering firms out of business with this kind of work.  I would argue that this is definitely not the case.  While it may have the effect of removing some of the extra profit that results from the current ‘complexity’ factor, this initiative should specifically drive common requirements, lead to better-educated customers, drive specific standards, and result in real-world testing and data from the manufacturing community.  Plus, as anyone who has ever actually built a data center knows, the devil is in the localization and the details.  And as this is an open-source initiative, we will not be formally signing the drawings from a professional engineering perspective.

Manufacturers could submit their technologies and sample applications of their solutions, and have those designs plugged into a ‘package’, or ‘RPM’ if I could steal a term from the Red Hat Linux nomenclature.  Moreover, we will be able to start driving true visibility of costs, both upfront and operating, and associate those costs with the set designs, with differences and trending from regions around the world.  If it’s successful, it could be a very good thing.
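To make the RPM analogy concrete, a design "package" is really just structured metadata plus a cost model. The sketch below is purely hypothetical: every field name and number is an illustrative assumption of mine, not a published schema or pricing from the initiative.

```python
# A hypothetical 'design package' record -- the data center analogue of an
# RPM spec file. All field names and figures are illustrative assumptions,
# not a real schema from the Open Source Data Center Initiative.

from dataclasses import dataclass, field

@dataclass
class DesignPackage:
    name: str
    version: str
    cooling: str                      # e.g. "air-economization", "water"
    tier_target: int                  # resiliency class the design aims at
    capex_per_kw_usd: float           # upfront cost per kW of IT load
    opex_per_kw_year_usd: float      # operating cost per kW per year
    depends_on: list = field(default_factory=list)

def total_cost(pkg: DesignPackage, it_load_kw: float, years: int) -> float:
    """Lifetime cost of deploying this package at a given IT load."""
    return it_load_kw * (pkg.capex_per_kw_usd
                         + pkg.opex_per_kw_year_usd * years)

modular_air = DesignPackage(
    name="modular-air-econ", version="1.0", cooling="air-economization",
    tier_target=2, capex_per_kw_usd=8000, opex_per_kw_year_usd=900,
    depends_on=["containerized-it-module"])

# Compare a 500 kW deployment over 10 years across candidate packages.
print(total_cost(modular_air, 500, 10))
```

Once packages carry their own cost and dependency data like this, comparing designs across vendors and regions becomes a query rather than a consulting engagement, which is the visibility argument being made above.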

We are not naive about this, however.  We certainly expect there to be some resistance to this approach, and in fact some outright negativity from those firms that make the most of the black-box complexity.

We will have more information on the approach and what it is we are trying to accomplish very soon.  

 

\Mm

Modular Evolution, Uptime Resolution, and Software Revolution

It’s a little-known fact, but software developers are costing enterprises millions of dollars, and I don’t think in many cases either party realizes it.   I am not referring to the actual purchase cost of the programs and applications, or even the resulting support costs.   Those are easily calculated and can be hard-bounded by budgets.   But what of the resulting costs of the facility in which the software resides?

The Tier System introduced by the Uptime Institute was an important step for our industry in that it gave us a common language, or nomenclature, in which to actually begin having a dialog about the characteristics of the facilities being built. It created formal definitions and classifications from a technical perspective that grouped redundancy and resiliency targets, and ultimately defined a hierarchy in which to talk about the facilities designed to those targets.   For its time it was revolutionary, and to a large degree the body of work is still relevant even today.

There is a lot of criticism that its relevancy is fading fast due to the model’s greatest weakness: its lack of significant treatment of the application.    The basic premise of the Tier System is essentially to take your most restrictive and constrained application requirements (i.e., those of the application that is least robust) and augment that resiliency with infrastructure and what I call big iron.   If only 5% of your applications are this restrictive, the other 95% of your applications, which might be able to live with less resiliency, will still reside in the castle built for the minority of needs.  But before you call out an indictment of the Uptime Institute or this “most restrictive” design approach, you must first look at your own organization.   The Uptime Institute was coming at this from a purely facilities perspective.  The mysterious workload and wizardry of the application is a world mostly foreign to them.   Ask yourself this question: ‘In my organization, how often do IT and facilities talk to one another about end-to-end requirements?’  My guess, based on asking this question hundreds of times of customers and colleagues, is that the answer ranges between not often and not at all.  But the winds of change are starting to blow.

In fact, I think the general assault on the Tier System really represents a maturing of the industry, a willingness to look at our problem space with more combined wisdom.   I often laughed at the fact that human nature (or at least management human nature) used to hold the belief that a Tier 4 data center was better than a Tier 2 data center, effectively because the number was higher and it was built with more redundancy.   More redundancy essentially equaled a better facility.    A company might not have had the need for that level of physical systems redundancy (if one were to look at it from an application perspective), but Tier 4 was better than Tier 3, therefore we should build the best.   It’s not better, just different.

By the way, that’s not a myth that the design firms and construction firms were all that interested in dispelling either.   Besides Tier 4 having the higher number and more redundancy, it also cost more to build, required significantly more engineering, and took longer to work out the kinks.   So the myth of Tier 4 being the best has propagated for quite a long time.  I’ll say it again: it’s not better, it’s just different.

One of the benefits of the recent economic downturn (there are not many, I know) is that the definition of ‘better’ is starting to change.  With capital budgets frozen or shrinking, the willingness of enterprises to redefine ‘better’ is also changing significantly.   Better today means a smarter, more economical approach.   This has given rise to the boom in the modular data center approach, and it’s not surprising that this approach begins with what I call an application-level inventory.

This application-level inventory looks specifically at the makeup and resiliency of the software and applications within the data center environment.  Does this application need the level of physical fault tolerance that my enterprise CRM needs?  Do servers that support testing or internal labs need the same level of redundancy?  This is the right behavior, and the one I would argue should have been used from the beginning.  The data center doesn’t drive the software; it’s the software that drives the data center.
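An application-level inventory can be as simple as tagging each application with the resiliency it actually needs and grouping from there. A minimal sketch, with invented application names and tier assignments purely for illustration:

```python
# Sketch of an application-level inventory: instead of sizing the whole
# facility to the most restrictive application, group applications by the
# resiliency they actually need. Names and tiers below are invented examples.

from collections import defaultdict

applications = [
    ("enterprise-crm",  4),   # needs full physical fault tolerance
    ("email",           3),
    ("test-lab",        1),   # can tolerate outages
    ("internal-wiki",   2),
    ("batch-reporting", 2),
]

def inventory_by_tier(apps):
    """Group application names by their required resiliency tier."""
    groups = defaultdict(list)
    for name, tier in apps:
        groups[tier].append(name)
    return dict(groups)

groups = inventory_by_tier(applications)
# Under a 'most restrictive' design, all five apps would live in Tier 4
# space; the inventory shows only one of them actually requires it.
for tier in sorted(groups, reverse=True):
    print(f"Tier {tier}: {', '.join(groups[tier])}")
```

The output makes the modular argument for you: only the small Tier 4 bucket needs the expensive big iron, and everything else can be placed in cheaper, right-sized capacity.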

One interesting and good side effect of this is that enterprise firms are now pushing harder on the software development firms.    They are beginning to ask some very interesting questions that the software providers have never been asked before.    For example, I sat in one meeting where an end customer asked their financial systems application provider a series of questions on the inter-server latency requirements and transaction timeout lengths for database access in their solution suite.  The reason behind this line of questioning was a setup for the next series of questions.   Once the numbers were provided, it became abundantly clear that this application would only truly work from one location, from one data center, and could not be made redundant across multiple facilities.  This led to questions about the provider’s intentions to build more geo-diverse, extra-facility capabilities into their product.   I am now even seeing these questions in official Requests for Information (RFIs) and Requests for Proposal (RFPs).   The market is maturing and is starting to ask an important question: why should your sub-million dollar (or euro) software application drive tens of millions in capital investment by me?  Why aren’t you architecting your software to solve this issue?  The power of software can be brought to bear to easily solve it, and my money is on this being a real battlefield in software development in the coming years.
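That line of questioning boils down to a simple feasibility check: given the application's latency assumptions and transaction timeouts, can it span two facilities at all? A rough sketch, with all the numbers and the round-trips-per-transaction figure invented for illustration:

```python
# Rough geo-redundancy feasibility check. All numeric inputs here are
# invented examples; real values would come from the software vendor's
# answers and measured inter-site network latency.

def can_span_sites(max_server_latency_ms: float,
                   txn_timeout_ms: float,
                   inter_site_rtt_ms: float,
                   round_trips_per_txn: int = 10) -> bool:
    """Can a transaction survive being stretched across two facilities?"""
    if inter_site_rtt_ms > max_server_latency_ms:
        return False                  # app assumes LAN-class latency
    # The WAN time a chatty transaction accumulates must fit the timeout.
    return round_trips_per_txn * inter_site_rtt_ms < txn_timeout_ms

# An app built for 2 ms server-to-server latency cannot be stretched
# across facilities 40 ms apart, no matter how generous the timeout.
print(can_span_sites(max_server_latency_ms=2, txn_timeout_ms=5000,
                     inter_site_rtt_ms=40))
```

Two vendor answers and one traceroute are enough to run this check, which is why the question is starting to show up in RFIs: it exposes in minutes whether the software forces a single-site, big-iron design.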

Blending software expertise with operational and facility knowledge will be at the center of a whole new train of software development, in my opinion, one that really doesn’t exist today; given the dollar amounts involved, I believe it will be a very impactful and fruitful line of development as well.    But it has a long way to go.    Most programmers coming out of universities today rarely question the impact of their code outside of the functions they are providing, and the number of colleges and universities that teach a holistic approach can be counted on less than one hand’s worth of fingers worldwide.   But that’s up a finger or two from last year, so I am hopeful.

Regardless, while work will continue on data center technologies at the physical layer, there is a looming body of work yet to be tackled facing the development community.  Companies like Oracle, Microsoft, SAP, and a host of others will be thrust into the fray to solve these issues as well.   If they fail to adapt to the changing face and economics of the data center, they may just find themselves an interesting footnote in the data center texts of the future.

 

\Mm