Speaking on Container Solutions and Applications at Interop New York

I have been invited to speak and chair a panel at Interop New York (November 16-20, 2009), giving a talk that explores the hype and reality surrounding data center-based containers and Green IT in general.


The goal of the panel discussion is to help data center managers evaluate and approach containers by understanding their economics, key considerations, and real-life customer examples.  It's going to be a great conversation.  If you are attending Interop this year, I would love to see you there!

 

\Mm

Live Chiller Side Chat Redux

I wanted to take a moment to thank Rich Miller of Data Center Knowledge, and all of those folks who called in and submitted questions today in the Live Chiller Side Chat.   It was incredible fun for me to get a chance to answer questions directly from everyone.   My only regret is that we did not have enough time!

When you have a couple of hundred people logged in, it's unrealistic and all but impossible to answer all of the questions.  However, I think Rich did a great job bouncing around to the key themes that he saw emerging from the questions.    One thing is for sure: we will try to do another one of these, given the number of unanswered questions.  I have already been receiving some great ideas on how to structure these moving forward.  Hopefully everyone got some value or insight out of the exercise.  As I warned before the meeting, you may not get the right answer, but you will definitely get my answer.

One of the topics that we touched on briefly during the call, and which went a bit under-discussed, was regulation associated with data centers, or more correctly, regulation and legislation that will affect our industry.    For those of you who are interested, I recently completed an executive primer video on the subject of data center regulation.  The link can be found here:


Data Center Regulation Video.

Thanks again for spending your valuable time with me today, and I hope we can do it again!

\Mm

The Cloud Politic – How Regulation, Taxes, and National Borders are shaping the infrastructure of the cloud

Most people think of 'the cloud' as a technical place defined by technology, the innovation of software leveraged across a scale of immense proportions, and ultimately a belief that its decisions are guided by some kind of altruistic technical meritocracy.  At some levels that is true; at others, one needs to remember that the 'cloud' is ultimately a business.  Whether you are talking about the Google cloud, the Microsoft cloud, the Amazon cloud, or Tom and Harry's Cloud Emporium, each is a business that ultimately wants to make money.   It never ceases to amaze me that in a perfectly solid technical or business conversation around the cloud, people will begin to wax romantic and lose sight of common sense.  These are very smart, technically or business-savvy people, but for some reason the concept of the cloud has been romanticized into something almost philosophical, a belief system, something that takes on the wispy characteristics the term conjures up.

When you try to bring them down to the reality that the cloud is essentially large industrial buildings full of computers, running applications that have achieved regional or even global geo-diversity and redundancy, you place yourself in a tricky position that at best labels you a kill-joy and at worst a Blasphemer.

I have been reminded of late of a topic that I have been meaning to write about. As my introduction above suggests, some may find it profane; others will choose to ignore it, as it will cause them to come crashing to the ground.   I am talking about the unseemly and terribly disjointed intersection of government regulation, taxes, and the cloud.   This also loops in "the privacy debate," which is almost a separate conversation all to itself.   I hope to touch on privacy, but only where it intersects these other aspects.

As many of you know, my roles past and present have focused on the actual technical delivery and execution of the cloud.    The place where pure software developers fear to tread.  The world of large-scale design, construction, and operations specifically targeted at a global infrastructure deployment and its continued existence into perpetuity.   Perpetuity, you say?  That sounds a bit too grandiose, doesn't it?  My take is that once you have this kind of infrastructure deployed, it will become an integral part of how we as a species continue to evolve in our communications and our technological advances.  Something this cool is powerful.  Something this cool is a game changer.  Something this cool will never escape the watchful eyes of the world's governments, and in fact it hasn't.

There was a recent article at Data Center Knowledge regarding Microsoft's decision to remove its Azure cloud platform from the State of Washington and relocate it (whether virtually or physically) to the state of Texas.  Other articles have highlighted similar conversations involving Yahoo and the state of Washington, or Google and the state of North Carolina.   These decisions all have to do with state-level taxes and their potential impact on the upfront capital costs or long-term operating costs of the cloud.   You are essentially seeing the beginning of a cat and mouse game that will play out for some time on a global basis.  States and governments are currently using their blunt, imprecise instruments of rule (regulations and taxes) to try to regulate something they do not yet understand but know they need to be a part of.   It's no secret that technology is advancing faster than our society can gauge its overall impact or its potential effects, and the cloud is no different.

In my career I have been responsible for the creation of at least three different site selection programs.  Upon these programs were based the criteria and decisions of where cloud and data center infrastructure would reside.  Through example and practice, I have been able to deconstruct other competitors' criteria and their relative weightings, at least in comparison to my own, and a couple of things jump out very quickly at anyone truly studying this space.   While most people can guess the need for adequate power and communications infrastructure, many are surprised that tax and regulation play such a significant role in even the initial siting of a facility.   The reason is pure economics over the total lifetime of an installation.

I cannot tell you how often I have economic development councils or business development firms come to me to tell me about the next 'great' data center location.  Rich in both power infrastructure and telecommunications, close to institutions of higher learning, etc.   Indeed there are some really great places that would seem ideal for data centers if one allowed them to dwell in the "romanticized cloud".   What they fail to note, or understand, is that there may be legislation or regulation already on the books, or perhaps legislation currently winding its way through the system, that could make it an inhospitable place, or at least slightly less welcoming, for a data center.    As someone responsible for tens of millions, or hundreds of millions, or even billions of dollars worth of investment, you find yourself in a role where you are reading and researching legislation often.  Many have noted my commentary on the Carbon Reduction Commitment in the UK, or my most recent talks about the current progress and data center impacts of the Waxman-Markey bill in the US House of Representatives.  You pay attention because you have to pay attention.   Your initial site selection is supremely important because you not only need to look for the "easy stuff" like power and fiber, but you also need to look longer term, at the overall commitment of a region or an area to support this kind of infrastructure.   Very large business decisions are being made against these "bets", so you had better get them right.

To be fair, the management teams in many of these cloud companies are learning as they go as well.   Most of these firms are software companies who have now been presented with the dilemma of managing large-scale capital assets.  It's no longer about intellectual property, it's about physical property, and there are some significant learning curves associated with that.   Add to the mix that this whole cloud thing is something entirely new.

One must also keep in mind that even with the best site selection program and the most robust up-front due diligence, people change, governments change, rules change, and when that happens it can and will have very large impacts on the cloud.   This is not something cloud providers are ignoring either.  Whether it's through their software, their infrastructure, or a more modular approach, they are trying to solve for the eventuality that things will change.   Think about the potential impacts from a business perspective.

Let's pretend you own a cloud and have just sunk $100M into a facility to house part of your cloud infrastructure.   You spent lots of money in your site selection and up-front due diligence to find the very best place to put a data center.   Everything is going great: after 5 years you have a healthy population of servers in that facility and you have found a model to monetize your service.  But then the locale where your data center lives changes the game a bit.   They pass a law that states that servers engaged in the delivery of a service are a taxable entity.  Suddenly that place becomes very inhospitable to your business model.   You now have to worry about what that does to your business.   It could be quite disastrous.   Additionally, if you determine that such a law would instantly impact your business negatively, you have the small matter of a $100M asset sitting in a region where you cannot use it.  Again, a very bad situation.  So how do you architect around this?  It's a challenge that many people are trying to solve.   Whether you want to face it or not, the 'Cloud' will ultimately need to be mobile in its design.  Just like its vapory cousins in the sky, the cloud will need to be on the move, even if it's a slow move.  Because just as there are forces looking to regulate and control the cloud, there are also forces in play where locales are interested in attracting and cultivating it.  It will be a cycle that repeats itself over and over again.

So far we have looked at this mostly from a taxation perspective.   But there are other regulatory forces in play.    I will use the example of Canada, our friendly, frosty neighbor in the great white north.  It's safe to say that Canada and the US have historically had wonderful relations with one another.   However, when one looks through the 'Cloud'-colored looking glass, there are some things that jump to the fore.

In response to the Patriot Act legislation after 9/11, the Canadian government became concerned with the rights given to the US government with regard to the seizure of online information.  They in turn passed a series of Safe-Harbor-like laws stating that no personally identifiable information of Canadian citizens could be housed outside of Canadian borders.    Other countries have passed, or are in the process of passing, similar laws.   This means that at least some aspects of the cloud will need to be anchored regionally or within specific countries.    A boat can drift even if it's anchored, and so can components of the cloud; its infrastructure and design will need to accommodate this.  This touches on the privacy issue I talked about before.   I don't want to get into the more esoteric conversations about information and where it is allowed to live and not live; I try to stay grounded in the fact that, whether my romantic friends like it or not, this type of thing is going to happen and the cloud will need to adapt.

It's important to note that none of this legislation focuses on 'the cloud' or 'data centers' just yet.   But just as the Waxman-Markey bill or the CRC in the UK doesn't specifically call out data centers, those laws will still have significant impacts on the infrastructure and shape of the cloud itself.

There is an interesting chess board developing between technology and regulation.   They are inexorably intertwined with one another, and each will shape the form of the other in many ways.   A giant cat and mouse game on a global level.   Almost certainly, this evolution won't produce the most "technically superior" solution.  In fact, these complications will make the cloud a confusing place at times.   If you desired to build your own application using only cloud technology, would you subscribe to a service to allow the cloud providers to handle these complications?  Would you and your application be liable for regulatory failures in the storage of data on Azerbaijani nationals?  It's going to be an interesting time for the cloud moving forward.

One can easily imagine personally identifiable information housed in countries of origin, but the technology evolving so that users' actions on the web are held elsewhere, perhaps even regionally where the actions take place.  You would see new legislation emerging to potentially combat even that strategy, and so the cycle will continue.  Likewise, you might see certain types of compute load or transaction work moving around the planet to align with more technically savvy or advantageous locales.  Just as the concept of Follow the Moon has emerged as a potential energy savings strategy, moving load around based on the lowest-cost energy, it might someday be followed by a program that similarly moves information or work to more "friendly" locales.     The modularity movement in data center design will likely grow as well, trying to reduce the overall exposure the cloud firms have in any given market or region.

On this last note, I am reminded of one of my previous posts. I am firm in my belief that Data Centers will ultimately become the Sub-Stations of the information utility.  In that evolution they will become more industrial, more commoditized, with more intelligence at the software layer to account for all these complexities.  As my own thoughts and views evolve around this I have come to my own strange epiphany.  

Ultimately the large cloud providers should care less and less about the data centers they live in.  These will be software-layer attributes to program against.  Business-level modifiers on code distribution.   Data centers should be immaterial components for the cloud providers.  Nothing more than containers or folders in which to drop their operational code.  Today they are burning through tremendous amounts of capital believing that these facilities will give them strategic advantage.   Those advantages will be fleeting and short-lived.  They will soon find themselves in a place where the facilities themselves become a drag on their balance sheets or force them to keep investing in aging assets.

Please don't get me wrong, the cloud providers have been instrumental in pushing this lethargic industry into thinking differently and evolving.   For that you need to give them appropriate accolades.  At some point, however, this is bound to turn into a losing proposition for them.

How’s that for Blasphemy?

\Mm

At the Intersection of Marketing and Metrics, the traffic lights don’t work.

First let me start out with the fact that the need for datacenter measurement is paramount if this industry is to be able to manage itself effectively.   When I give a talk I usually begin by asking the crowd three basic questions:

1) How many in attendance are monitoring and tracking electrical usage?

2) How many in attendance measure datacenter efficiency?

3) How many work for organizations in which the CIO looks at the power bills?

The response to these questions has been abysmally low for years, but I have been delighted by the fact that slowly but surely, the numbers have been rising.  Not by great numbers, mind you, but incrementally.  We are approaching a critical time in the development of the data center industry and where it (and the technologies involved) will go.

To that end, there is no doubt that the PUE metric has been instrumental in driving awareness and visibility in the space.  The Green Grid really did a great job in pulling this metric together and evangelizing it to the industry.  Despite a host of other potential metrics out there, PUE has captured the industry given its relatively straightforward approach.   But PUE is poised to be a victim of its own success, in my opinion, unless the industry takes steps to standardize its use in marketing material and how it is talked about.

Don't get me wrong, I am rabidly committed to PUE as a metric and as a guiding tool in our industry.   In fact I have publicly defended this metric against its detractors for years.  So this post is a small plea for sanity.

These days, I view each and every public statement of PUE with a full heaping shovel-full of skepticism, regardless of company or perceived leadership position.   In my mind, measurement of your company's environment and energy efficiency is a pretty personal experience.   I don't care which metric you use (even if it's not PUE) as long as you take a base measurement and consistently measure over time, making changes to achieve greater and greater efficiency.   There is no magic pill, no technology, no approach that gives you efficiency nirvana. It is a process that involves technology (both high tech and low tech), process, procedure, and old-fashioned roll-up-your-sleeves operational best practices over time that gets you there.

With mounting efforts around regulation, internal scrutiny around capital spending, a lack of general market inventory, and a host of other reasons, the push for efficiency has never been greater, and the spotlight on efficiency as a function of "data center product" is in full swing.  Increasingly, PUE is moving from the data center professionals and facilities groups to the marketing department.  I view this as bad.

Enter the Marketing Department

In my new role I get visibility into all sorts of interesting things I never got to see in my role managing the infrastructure for a globally ubiquitous cloud roll-out.  One of the more interesting items was an RFP issued by a local regional government for a data center requirement.   The RFP had all the normal things you would expect to find in that kind of document, but there was a caveat that this facility must have a PUE of 1.2.  When questioned about this PUE target, the person in charge stated that if Google and Microsoft are achieving this level, they wanted the same thing, and that this was becoming the industry standard.   Of course, differences in application make-up, legacy systems, the fact that it would also have to house massive tape libraries (read: low power density), and a host of other factors made it impossible for them to really achieve this.   It was then that I started to get an inkling that PUE was starting to get away from its original intention.

You don't have to look far to read about the latest company that has broken the new PUE barrier of 1.5 or 1.4 or 1.3 or 1.2 or even 1.1.  It's like the space race.   Except that the claims of achieving those milestones are never really backed up with real data to prove or disprove them.  It's all a bunch of nonsensical bunk.  And it's with this nonsensical bunk that we will damn ourselves with those who have absolutely no clue about how this stuff actually works.  Marketing wants the quick bullet points and a vehicle to show some kind of technological superiority or green badges of honor.  When someone walks up to me at a tradeshow or emails me braggadocious claims of PUE, they are unconsciously picking a fight with me, and I am always up for the task.

WHICH PUE DO YOU USE?

Let's have an honest, open, and frank conversation around this topic, shall we?  When someone tells me of the latest, greatest PUE they have achieved, or have heard about, my first question is 'Oh yeah?  Which PUE are they/you using?'.  I love the response I typically get when I ask the question. Eyebrows twist up and a perplexed look takes over their face.   Which PUE?

If you think about it, it's a valid question.  Are they looking at Annual Average PUE?  Are they looking at AVERAGE PEAK PUE?  Are they looking at design-point PUE?  Are they looking at Annual Average Calculated PUE? Are they looking at commissioning-state PUE?  What is the interval at which they are measuring? Is this the PUE rating they achieved one time at 1:30AM on the coldest night in January?

I sound like an engineer here, but there is a vast territory of values between these numbers, and all of them or none of them may have anything to do with reality.   If you will allow me a bit of role-playing, let's walk through a scenario where we (you and I, dear reader) are about to build and commission our first facility.

We are building out a 1MW facility with a targeted PUE of 1.5.   After a successful build-out with no problems or hiccups (we are role-playing, remember) we begin the commissioning with load banks to simulate load.   During the process of commissioning we measure a PUE of 1.40, beating our target.  Congratulations, we have beaten our design goal, right? We have crossed the 1.5 barrier! Well, maybe not.  Let's ask some questions… How long did we run the Level 5 commissioning for?  Some vendors burn it in over the course of 12 hours.  Some a full day.  Does that 1.40 represent the average of the values collected?  Does it represent the averaged peak?  Was it the lowest value?  What month are we in?  Will it be significantly different in July? January?  May?  Where is the facility located?   The scores over time will vary significantly from the scores at commissioning.
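To make the point concrete, here is a minimal sketch (in Python, using entirely hypothetical interval readings, not data from any real commissioning run) showing how much the headline number depends on which statistic you pull out of the same burn-in:

```python
# Hypothetical readings taken at intervals during a Level 5 commissioning burn-in.
# Each sample is (total facility kW, IT load kW on the load banks).
samples = [
    (1480, 1000), (1455, 1000), (1430, 1000), (1400, 1000),  # evening
    (1380, 1000), (1360, 1000), (1350, 1000), (1365, 1000),  # overnight
    (1410, 1000), (1450, 1000), (1490, 1000), (1520, 1000),  # midday
]

pues = [facility_kw / it_kw for facility_kw, it_kw in samples]

print(f"lowest PUE observed : {min(pues):.2f}")              # the number marketing loves
print(f"average PUE         : {sum(pues) / len(pues):.2f}")  # a different story
print(f"peak PUE            : {max(pues):.2f}")              # what operations lives with
```

Same facility, same burn-in, three defensible "PUEs".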

A few years back when I was at Microsoft, we publicly released the data below for a mature facility at capacity that had been operating and collecting information for four years.  We had been tracking PUE, or at least the variables used in PUE, for that long.  You can see in the chart the variation in PUE.  Keep in mind this chart reflects a very concentrated effort to drive efficiency over time.   Even in a mature facility where the load remains mostly constant, the PUE has variation and fluctuation.   Add to that the deltas between average, peak, and average peak.  Which numbers are you using?

[Chart: four years of PUE measurements from a mature facility at capacity]

(source: GreenM3 blog)

Ok, let's say we settle on using just the average (it's always the lowest PUE number, with the exception of a one-time measurement).  We want to look good to management, right?  If you are a colo company or data center wholesaler, you may even give marketing a look-see to see if there is any value in that regard.    We are very proud of ourselves.  There is much back-slapping and glad-handing as we send our production model out the door.

Just like an automobile, our data center depreciates quickly as soon as the wheels hit the street.  Except that with data centers it's the PUE that is negatively affected.

THE IMPORTANCE OF USE

Our brand new facility is now empty.  The load banks have been removed, and we have pristine white floor space ready to go.   With little to no IT load in our facility, we currently have a PUE somewhere between 7 and 500.  It's just math (refer back to how PUE is actually calculated).  So now our PUE will be a function of how quickly we consume the capacity.  But wait, how can our PUE be so high?  We have proof from commissioning that we have created an extremely efficient facility.   It's all in the math.  It's math marketing people don't like.  It screws with the message.   Small revelation here – data centers become more "efficient" the more energy they consume!  Regulations that take PUE into account will need to worry about this troublesome side effect.
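The arithmetic is worth spelling out.  Here is a minimal sketch, assuming the simple definition of PUE (total facility power divided by IT equipment power) and a made-up fixed overhead for fans, pumps, UPS losses, and lighting that runs whether or not any servers are installed; the 150 kW figure is purely illustrative:

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """PUE = total facility power / IT equipment power."""
    return total_facility_kw / it_load_kw

# Assumed fixed overhead (cooling plant, UPS losses, lighting) that draws power
# even when the floor is nearly empty.
fixed_overhead_kw = 150.0

for it_load_kw in (1, 10, 50, 200, 1000):
    total_kw = it_load_kw + fixed_overhead_kw
    print(f"IT load {it_load_kw:>5} kW -> PUE {pue(total_kw, it_load_kw):.2f}")

# IT load     1 kW -> PUE 151.00
# IT load    10 kW -> PUE 16.00
# IT load    50 kW -> PUE 4.00
# IT load   200 kW -> PUE 1.75
# IT load  1000 kW -> PUE 1.15
```

The overhead in a real building is not perfectly fixed, but the shape of the curve is the point: the same physical plant looks wildly "inefficient" until it is loaded up.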

There are lots of interesting things you can do to minimize this extremely high PUE at launch, like shutting down CRAH units or removing perf tiles and replacing them with solid tiles, but ultimately your PUE is going to be much, much higher regardless.

Now let's look at how IT deployment actually ramps in new data centers.  In many cases enterprises build data centers to last them over a long period of time. This means there is little likelihood that your facility will look anything close to your commissioning numbers (with load banks installed).  Add to that the fact that traditional data center construction has you building out all of the capacity from the start.  This essentially means that your PUE is not going to have a great story for quite a bit of time.   It's also why I am high on the modularized approach.  Smaller, more modular units allow you to grow your facility out more efficiently (from a cost as well as an energy efficiency perspective).

So if we go back to our marketing friends, our PUE looks nothing like the announcement any more.  Future external audits might highlight this, and we may fall under scrutiny for falsely advertising our numbers.  So let's pretend we are trying to do everything correctly and have projected that we will completely fill our facility in 5 years.

The first year we successfully fill 200kW of load in our facility.  We are right on track.   Except that the 200kW was likely not deployed all at once.  It was deployed over the course of the year.   Which means my end-of-year PUE number may be something like 3.5, but it was much higher earlier in the year.  If I take my annual average, it certainly won't be 3.5.  It will be much higher.   In fact, if I equally distribute the 200kW that first year over 12 months, my PUE looks like this:

[Chart: monthly PUE during the first year of deployment]
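For the curious, here is roughly the arithmetic behind a chart like that.  It is a sketch, not the actual data behind the figure: I am assuming a facility whose full mechanical and electrical plant (about 500 kW of overhead in this toy example) was built out on day one, and an IT load that lands in equal monthly increments:

```python
# First-year ramp: 200 kW of IT load deployed in equal monthly increments in a
# facility whose full mechanical/electrical capacity was built out up front.
fixed_overhead_kw = 500.0                                 # assumed constant overhead
monthly_it_kw = [200.0 * m / 12 for m in range(1, 13)]    # ~16.7 kW added each month

monthly_pue = [(it_kw + fixed_overhead_kw) / it_kw for it_kw in monthly_it_kw]

for month, p in enumerate(monthly_pue, start=1):
    print(f"month {month:>2}: PUE {p:.2f}")               # month 1 = 31.00, month 12 = 3.50

annual_average = sum(monthly_pue) / len(monthly_pue)
print(f"annual average PUE: {annual_average:.2f}")        # about 8.76, nowhere near 1.40
```

Even with generous assumptions, the first-year annual average lands several times higher than the commissioned number.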

That looks nothing like the PUE we advertised, does it? Additionally, I am not even counting the variability introduced by time; these are just end-of-month numbers.   So measurement frequency will have an impact on this number as well.  In the second year of operation our numbers are still quite poor when compared to our initial numbers.

[Chart: monthly PUE during the second year of deployment]

Again, if I take the annual average PUE for the second year of operation, I am not at my design target, nor am I at our commissioned PUE rating.  So how can firms unequivocally state such wonder PUEs?  They can't.  Even this extremely simplistic example doesn't take into account that load in the data center moves around based upon utilization, and that it almost never achieves the power draw you think it will.  There are lots of variables here.

Let's be clear – this is how PUE is supposed to work!  There is nothing wrong with these calcs.  There is nothing wrong with the high values.   It is what it is.  The goal is to drive efficiency in your usage.   Falsely focusing on extremely low numbers that are the result of highly optimized integration between software and infrastructure, and making them the desirable targets, will do nothing more than place barriers and obstacles in our way later on.  Outsiders looking in want to find simplicity.  They want to find the quick and dirty numbers by which to manage the industry.   As engineers, you know this is a bit more complex.   Marketing efforts and focusing on low PUEs will only damn us later on.

Additionally, if you allow me to put my manager/businessman hat on, there is a law of diminishing returns in focusing on lower and lower PUE.  The cost of continued integration and optimization starts losing its overall business value, and gains in efficiency are offset by the costs to achieve those gains.  I speak as someone who drove numbers down into that range.   The larger industry would be better served by focusing more on application architecture, machine utilization, virtualization, and similar technologies before pushing closer to 1.0.

So what to do?

I fundamentally believe that this would be an easy thing to correct.   But it's completely dependent upon how strong a role the Green Grid wants to play in this.   I feel that the Green Grid has the authority and responsibility to establish guidelines on the formal usage of PUE ratings.  I would posit the following ratings, with apologies in advance as I am not a marketing guy who could come up with more clever names:

Design Target PUE (DTP) – This is the PUE rating that theoretically the design should be able to achieve.   I see too many designs that have never manifested physically.  This would be the least “trustworthy” rating until the facility or approach has been built.

Commissioned Witnessed PUE (CWP) – This is the actual PUE witnessed at the time of commissioning of the facility.  There is a certainty about this rating as it has actually been achieved and witnessed.  This would be the rating that most colo providers and wholesalers would need to use, as they have little impact on or visibility into customer usage.

Annual Average PUE (AAP) – This is what it says it is.  However, I think that the Green Grid needs to come up with a minimum standard of measurement frequency (my recommendation is data collection at least 3 times a day) to establish this rating.  You also couldn't publish this number without a full year's worth of data.

Annual Average Peak PUE (APP) – My preference would be to use this, as it's a value that actually matters to the ongoing operation of the facility.  When you combine this with the operations challenge of managing power within a facility, you need to account for peaks more carefully, especially as you approach the end of the capacity of the space you are deploying.    Again, hard frequencies need to be established, along with a full year's worth of data, here as well.  (A small sketch showing how these last two ratings differ from the same set of readings follows below.)
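Here is a minimal sketch of how AAP and APP could be computed from the same log of readings; the log format, the three-samples-per-day floor, and the sample values are my own illustrative assumptions, not a Green Grid specification:

```python
from collections import defaultdict
from datetime import datetime

def annual_pue_ratings(samples):
    """samples: (timestamp, total facility kW, IT load kW) tuples covering a full
    year, logged at least three times per day (the minimum frequency proposed above)."""
    daily = defaultdict(list)
    for ts, facility_kw, it_kw in samples:
        daily[ts.date()].append(facility_kw / it_kw)

    all_readings = [p for day in daily.values() for p in day]
    aap = sum(all_readings) / len(all_readings)                 # Annual Average PUE
    app = sum(max(day) for day in daily.values()) / len(daily)  # Annual Average Peak PUE
    return aap, app

# Two hypothetical days of readings, just to show the mechanics.
samples = [
    (datetime(2009, 1, 1, 2),  1400, 1000), (datetime(2009, 1, 1, 10), 1520, 1000),
    (datetime(2009, 1, 1, 18), 1480, 1000), (datetime(2009, 1, 2, 2),  1390, 1000),
    (datetime(2009, 1, 2, 10), 1560, 1000), (datetime(2009, 1, 2, 18), 1500, 1000),
]
aap, app = annual_pue_ratings(samples)
print(f"AAP {aap:.2f}  APP {app:.2f}")   # APP will always be >= AAP
```

The gap between the two numbers is exactly the kind of detail that gets lost when a single figure goes into a press release.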

I think this would greatly cut back on ridiculous claims, or at least get us closer to a "truth in advertising" position.  It would also allow outside agencies to come in and audit those claims over time.  You could easily see extensions to the ISO14K and ISO27K and other audit certifications to test for it.   Additionally, it gives outsiders a peek at the complexity of the space and allows for smarter mechanisms that drive for greater efficiency (how about a 10% APP reduction target per year instead?).

As the Green Grid is a consortium of different companies (some of whom are likely to want to keep the fuzziness around PUE for their own gains), it will be interesting to see if they step in to better control the monster we have unleashed.

Let's re-claim PUE and metrics from the marketing people.

\Mm

Schneider / Digital New York Speaking Engagement


Just in case anyone wanted to connect – I wanted to highlight that I will be co-presenting the keynote at the Schneider Symposium at the Millenium Broadway hotel in New York City with Chris Crosby of Digital Realty Trust.  I will also be giving a talk on practical applications of energy efficiency and sitting on an energy efficiency panel led by Dan Golding of Tier One Research.   The program kicks off at 8am on Wednesday.   Feel free to stop by and say hi!

/Mm

Upcoming Webinar on Modular Data Centers

On June 22, 2009 I will be hosting a free webinar for Digital Realty Trust on modular data center approaches. It's a topic I have some expertise in and quite a bit of passion around, so I hope not to embarrass myself too thoroughly. If you have an interest in where data center construction, application, support, and maintenance is going, this might be something worth attending.   We will also touch on emerging technologies such as modular IT configuration and IT containers and where they might be of benefit.

As I mentioned, it's a free event, and you can register for it here.

Digital actually has some very good taped webinars on a variety of data center topics you might find interesting.  Their video library can be found at this link.

The official blurb on the talk follows:

Digital Realty Trust would like to invite you to join us for another in our series of informational webinars. The Industrialization of the Datacenter℠ has given birth to a variety of modular development methods. From PODs to containers, end users are overwhelmed by a number of options and are often unsure of the best method to use for their particular application.

Monday, June 22, 2009

12:00 p.m. – 1:00 p.m. Central

In this webinar Digital Realty Trust will present the various modular alternatives that are available to today’s datacenter customers and the strengths and weaknesses of each. This presentation will also help provide attendees with a clear understanding of the potential uses for each modular approach and which datacenter requirements each is best designed to address.

Space is limited! Click here to reserve your space.

This webinar will be presented by Michael Manos, Senior Vice President of Technical Services at Digital Realty Trust. Mr. Manos is a 16-year veteran in the technology industry and most recently was responsible for the global design, construction, and operations of all of Microsoft’s datacenter facilities.

Hope to see you there!

/Mm

Stirring Anthills – My Response to the Recent Computer World Article


 

** THIS IS A RE-POST from my former blog site, saved here for continuity and posterity **

When one inserts the stick of challenge and change into the anthill of conventional and dogmatic thinking, they are bound to stir up a commotion.

That is exactly what I thought when I read the recent Computerworld article by Eric Lai on containers as a data center technology.  The article, found here, outlines six reasons why containers won't work and asks if Microsoft is listening.   Personally, I found it an intensely humorous article, albeit not a really unexpected one.  My first response was "only six"?  You only found six reasons why it won't work?  Internally we thought of a whole lot more than that when the concept first appeared on our drawing boards.

My Research and Engineering team is challenged with vetting technologies for applicability, efficiency, flexibility, longevity, and, perhaps most importantly, fiscal viability.   You see, as a business, we are not into investing in solutions that are going to have a net effect of adding cost for cost's sake.    Every idea is painstakingly researched, prototyped, and piloted.  I can tell you one thing: the internal push-backs on the idea numbered far more than six, and the biggest opponent (my team will tell you) was me!

The true value of any engineering organization is to give different ideas a chance to mature and materialize.  The Research and Engineering teams were tasked with making sure this solution had solid legs, saved money, gave us the scale, and ultimately was something we felt would add significant value to our program.  I can assure you the amount of math, modeling, and research that went into this effort was pretty significant.  The article contends we are bringing a programmer's approach to a mechanical engineer's problem.  I am fairly certain that my team of professional and certified engineers took some offense to that, as would Christian Belady, who has conducted extensive research and developed metrics for the data center industry. Regardless, I think through the various keynote addresses we've participated in over the last few months we have tried to make the point that containers are not for everyone.   They address a very specific requirement for properties that can afford a different operating environment.  We are using them for rapid and standard deployment at a level the average IT shop does not need or have the tools to address.

Those who know me best know that I enjoy a good tussle, and it probably has to do with growing up on the south side of Chicago.  My team calls me ornery; I prefer "critical thought combatant."   So I decided I would try to take on the "experts" and the points in the article myself, with a small rebuttal posted here:

Challenge 1: Russian-doll-like nesting of servers in racks in containers leads to more complexity.

Huh?  This challenge has to do with the perceived difficulties on the infrastructure side of the house, and the complexity of managing such infrastructure in this configuration.   The primary technical challenge in this part is harmonics.   Harmonics can be solved in a multitude of ways and, as accurately quoted, is solvable.  Many manufacturers have solutions to fix harmonics issues, and I can assure you this got a pretty heavy degree of technical review.   Most of these solutions are not very expensive and in some cases are included at no cost.   We have several large facilities, and I would like to think we have built up quite a stable of understanding and knowledge in running these types of facilities.  From an ROI perspective, we have that covered as well.   The cost and usage savings of containers (depending upon application, size, etc.) can be as high as 20% over conventional data centers.   These same metrics and savings have been discovered by others in the industry.  The larger question is whether containers are the right fit for you.  Some can answer yes, others no. After intensive research and investigation, the answer was yes for Microsoft.

Challenge 2: Containers are not as Plug and Play as they may seem.

The first real challenge in this section is about the shipment of gear, and the claim that it would be a monumental task for us to determine or provide verification of functionality.   We deploy tens of thousands of servers per month. As I have publicly talked about, we moved from individual servers as a base unit, to entire racks as a scale unit, to a container of racks.   The process of determining functionality is incredibly simple.  You can ask any network, Unix, or Microsoft professional just how easy this is, but let's just say it's a very small step in our "container commissioning and startup" process.

The next challenge in this section is truly off base.   The expert is quoted as saying that the "plug and play" aspect of containers is itself put in jeopardy due to the single connection to the wall for power, network, etc.  One can envision a container with a long electrical extension cord.  I won't disclose some of our "secret sauce" here, but a standard 110V extension cord just won't cut it.  You would need a mighty big shoe size to trip over and unplug one of these containers. The bottom line is that connections this large require electricians for installation or removal. I am confident we are in no danger of falling prey to this hazard.

However, I can say that regardless of the infrastructure technology, the point made about thousands of machines going dark at one time could happen.  Although our facilities have been designed around the "Fail Small Design" created by my Research and Engineering group, outages can always happen.  As a result, and being a software company, we have been able to build our applications in such a way that the loss of server/compute capacity never takes the application completely offline.  It's called application geo-diversity.  Our applications live in and across our data center footprint. By putting redundancy in the applications, physical redundancy is not needed.  This is an important point, and one that scares many "experts."   Today, there is a huge need for experts who understand the interplay of electrical and mechanical systems.  Folks who make a good living by driving Business Continuity and Disaster Recovery efforts at the infrastructure level.   If your applications could survive whole-facility outages, would you invest in that kind of redundancy?  If your applications were naturally geo-diversified, would you need a specific DR/BCP plan?   Now, not all of our properties are there yet, but you can rest assured we have achieved that across a majority of our footprint.  This kind of thing is bound to make some people nervous.   But fear not, IT and DC warriors, these challenges are being tested and worked out in the cloud computing space, and it still has some time before it makes its way into the applications present in a traditional enterprise data center.

As a result we don't need to put many of our applications and infrastructure on generator backup.  To quote the article:

"Few data centers dare to make that choice," said Jeff Biggs, senior vice president of operations and engineering for data center operator Peak 10 Inc., despite the average North American power uptime of 99.98%. "That works out to be about 17 seconds a day," said Biggs, who oversees 12 data centers in southeastern states. "The problem is that you don't get to pick those 17 seconds."

He is exactly right. I guess the two points I would highlight here are: the industry has some interesting technologies called battery and rotary UPS systems that can easily ride through 17 seconds if required, and, the larger point, we truly do not care.   Look, many industries, like the financial sector and others, have some very specific guidelines around redundancy and reliability.   This drives tens of millions to hundreds of millions of dollars of extra cost per facility.   The cloud approach eliminates this requirement and moves it up to the application.

Challenge 3: Containers leave you less, not more, agile.

I have to be honest; this argument is one that threw me for a loop at first.   My initial thought upon reading the challenge was, "Sure, building out large raised-floor areas to a very specific power density is ultimately more flexible than dropping a container in a building, where density and server performance could be interchanged at a total power consumption level."   NOT!  I can't tell you how many data centers I have walked through with eight-foot, 12-foot, or greater aisles between rack rows because the power densities per rack were consuming more floor space.   The fact is, at the end of the day your total power consumption level is what matters.   But as I read on, the actual hurdles listed had nothing to do with this aspect of the facility.

The hurdles revolved around people, opportunity cost around lost servers, and some strange notion about server refresh being tied to the price of diesel. A couple of key facts:

· We have invested in huge amounts of automation in how we run and operate.   The fact is that even at 35 people across seven days a week, I believe we are still fat and we could drive this down even more.   This is not running thin, it's running smart.

· With the proper maintenance program in place, with professionals running your facility, with a host of tools to automate many of the tasks in the facility itself, and with complete ownership of both the IT and the facilities space, you can do wonders.  This is not some recent magic that we cooked up in our witches' brew; this is how we have been running for almost four years!

In my first internal address at Microsoft I put forth my own challenge to the team.   In effect, I outlined how data centers were the factories of the 21st century and that, like it or not, we were all modern-day equivalents of those who experienced the industrial revolution.  Much like factories (bit factories, I called them), our goal was to automate everything we do… in effect, to continue the analogy, bring in the robots.  If the assembled team felt their value was in wrench turning, they would have limited career growth within the group; if they up-leveled themselves and put an eye towards automating those tasks, their value would be compounded.  In that time some people have left for precisely that reason.   Deploying tens of thousands of machines per month is not sustainable to do with humans in the traditional way, both in the front of the house (servers, network gear, etc.) and the back of the house (facilities).   It's a tough message but one I won't shy away from.  I have one of the finest teams on the planet running our facilities.   It's a fact: automation is key.

Around the opportunity cost of failed machines in a container from a power perspective, there are ultimately two scenarios here.   One is that the server has failed hard and is dead in the container.  In that scenario, the server is not drawing power anyway, and while the container itself may be drawing less power than it could, there is not necessarily an "efficiency" hit.   The other scenario is that the machine dies in some half-state or loses a drive or similar component.   In this scenario you may be drawing energy that is not producing "work".  That's a far more serious problem as we think about overall work efficiency in our data centers.  We have ways through our tools to mitigate this, by either killing the machine remotely or ensuring that we prune that server's power by killing it at an infrastructure level.   I won't go into the details here, but we believe efficiency is the high-order bit.   Do we potentially strand power in this scenario?  Perhaps. But as mentioned in the article, if the failure rate is too high, or the economics of the stranding begin to impact the overall performance of the facility, we can always swap the container out with a new one and instantly regain that power.   We can do this significantly more easily than a traditional data center could because I don't have to move servers or racks of equipment around in the data center (i.e., it is more flexible).   One thing to keep in mind is that all of our data center professionals are measured by the overall uptime of their facility, the overall utilization of the facility (as measured by power), and the overall efficiency of their facility (again, as measured by power).  There is no data center manager in my organization who wants to be viewed as lacking in these areas, and they give these areas intense scrutiny.  Why?  When your annual commitments are tied to these metrics, you tend to pay attention to them.

The last hurdle here revolves around the life expectancy of a server, technology refresh rates, and, somehow, the price of diesel and green-ness.

"Intel is trying to get more and more power efficient with their chips," Biggs said. "And we’ll be switching to solid-state drives for servers in a couple of years. That’s going to change the power paradigm altogether." But replacing a container after a year or two when a fraction of the servers are actually broken "doesn’t seem to be a real green approach, when diesel costs $3.70 a gallon," Svenkeson said.

Clear as mud to me.  I am pretty sure the "price of diesel" in getting the containers to me is included in the price of the containers.  I don't see a separate diesel charge.  In fact, I would argue that shipping around 2,000 servers individually would ultimately be less green, or (at least in travel costs alone) a push.   In fact, if we dwell a moment longer on the "green-ness" factor, there is something to be said for the container in that the box it arrives in is the box I connect to my infrastructure.   What happens to all the foam product and cardboard with 2,000 individual servers?  Regardless, we recycle all of our servers.  We don't just "throw them away".  On the technology refresh side of the hurdle, I will put on my business hat for a second.  Frankly, I don't know too many people who depreciate server equipment over less than three years.  Those who do typically depreciate over one year.  But having given talks at Uptime and AFCOM in the last month, the common lament across the industry was that people were keeping servers (albeit power-inefficient servers) well past their useful life because they were "free".   Technology refresh IS a real factor for us, and if anything this approach allows us to adopt new technologies faster.   I get to upgrade a whole container's worth of equipment to the best performance and highest efficiency when I do refresh, and best of all there is minimal "labor" to accomplish it.  I would also like to point out that containers are not the only technology direction we have.  We solve problems with the best solution.  Containers are just one tool in our tool belt.   In my personal experience, the data center industry often falls prey to the old "if your only tool is a hammer, then every problem is a nail" syndrome.

Challenge 4: Containers are temporary, not a long term solution.

Well, I still won't talk about who is in the running for our container builds, but I will talk to the challenges put forth here.   Please keep in mind that Microsoft is not a traditional "hoster".  We are an end user.  We control all aspects of construction, server deployments, and the applications that go into our facilities.  Hosting companies do not.   This section challenges that while we are in a growth mode now, it won't last forever, therefore making containers temporary. The main point that everyone seems to overlook is that the container is a scale unit for us.  It is not a technology solution for incremental capacity, or for providing capacity in remote regions.   If I deploy 10 containers in a data center, and each container holds 2,000 servers, that's 20,000 servers.  When those servers are end of life, I remove 10 containers and replace them with 10 more.   Maybe those new models have 3,000 servers per container due to continuing energy efficiency gains.   What's the alternative?  How people-intensive do you think un-racking 20,000 servers, followed by racking 20,000 more, would be?   The bottom line here is that containers are our scale unit, not an end technology solution.   It's a very important distinction that seems lost in multiple conversations.  Hosting companies don't own the gear inside their facilities, their users do. It's unlikely they will ever experience this kind of challenge or need.  The rest of my points are accurately reflected in the article.

Challenge 5: Containers don’t make a data center Greener

This section has nothing to do with containers.   This has to do with facility design.  While containers may be able to take advantage of the various cooling mechanisms available in the facility, the statement is effectively correct that "containers" don't make a data center greener.   There are some minor aspects of "greener" that I mentioned previously around shipping materials, etc., but the real "green data center" is in the overall energy use efficiency of the building.

I was frankly shocked at some of the statements in this section:

An airside economizer, explained Svenkeson, is a fancy term for "cutting a hole in the wall and putting in a big fan to suck in the cold air." Ninety percent more efficient than air conditioning, airside economizers sound like a miracle of Mother Nature, right?  Except that they aren’t. For one, they don’t work — or work well, anyway — during the winter, when air temperature is below freezing. Letting that cold, dry air simply blow in would immediately lead to a huge buildup of static electricity, which is lethal to servers, Svenkeson said.

Say what?  Airside economization is a bit more than that.  I am fairly certain that they do work and that there are working examples across the planet.   Do you need to have a facility-level understanding of when to use and when not to use them?  Sure.   Regardless, all the challenges listed here can be easily overcome.   Site selection also plays a big role. Our site selection and localization of design decide which packages we deploy.   To some degree, I feel this whole argument falls into another one of the religious wars ongoing in the data center industry.   AC vs. DC, liquid-cooled vs. air-cooled, etc.  Is water-side economization effective? Yes.  Is it energy efficient? No.  Not, at least, when compared to air-side economization in a location tailor-made for it.  If you can get away with cooling from the outside and you don't have to chill any water (which takes energy), then it is inherently more efficient in its use of energy.  Look, the fact of the matter is we have both horses in the race.  It's about being pragmatic and intelligent about when and where to use which technology.

Some other interesting bits for me to comment on:

Even with cutting-edge cooling systems, it still takes a watt of electricity to cool a server for every watt spent to power it, estimated Svenkeson. "It's quite astonishing the amount of energy you need," Svenkeson said. Or as Emcor's Baker put it, "With every 19-inch rack, you're running something like 40,000 watts. How hot is that? Go and turn your oven on."

I would strongly suggest a quick look into the data that the Green Grid and Uptime have on this subject.   Worldwide PUE metrics (or DCiE, if you like efficiency numbers better) show significant variation from that one-for-one claim.   Some facilities reach a PUE of 1.2 (roughly 83% efficient) at certain times of the year or in certain locations.   Additionally, the comment that every 19-inch rack draws 40kW is outright wrong.  Worldwide averages show that racks are somewhere between 4kW and 6kW.  In special circumstances, densities approach this number, but as an average it is fantastically high.

But with Microsoft building three electrical substations on-site sucking down a total of 198 megawatts, or enough to power almost 200,000 homes, green becomes a relative term, others say. "People talk about making data centers green. There’s nothing green about them. They drink electricity and belch heat," Biggs said. "Doing this in pods is not going to turn this into a miracle."

I won't publicly comment on the specific size of the substations, but I would kindly point anyone interested in the subject to substation design best practices and sizing.  How you design and accommodate a substation for things like maintenance, configuration, and much more is an interesting topic in itself.  I won't argue that the facility isn't large by any standard; I'm just saying there is complexity one needs to look into there.   Yes, data centers consume energy; being "green" means you are doing everything you can to ensure every last watt is being used for some useful product of work.  That's our mission.

Challenge 6: Containers are a programmer's approach to a mechanical engineer's problem.

As I mentioned before, a host of professional engineers that work for me just sat up and coughed. I especially liked:

"I think IT guys look at how much faster we can move data and think this can also happen in the real world of electromechanics," Baker said. Another is that techies, unfamiliar with and perhaps even a little afraid of electricity and cooling issues, want something that will make those factors easier to control, or if possible a nonproblem. Containers seem to offer that. "These guys understand computing, of course, as well as communications," Svenkeson said. "But they just don’t seem to be able to maintain a staff that is competent in electrical and mechanical infrastructure. They don’t know how that stuff works."

I can assure you that outside of my metrics and reporting tool developers, I have absolutely no software developers working for me.   I own IT and facilities operations.   We understand the problems, we understand the physics, we understand quite a bit. Our staff has expertise with backgrounds ranging from running facilities on nuclear submarines to facilities systems for spacecraft.  We have more than a bit of expertise here. With regard to the comment that we are unable to maintain a staff that is competent, the folks responsible for managing the facility have had a zero percent attrition rate over the last four years.  I would easily put my team up against anyone in the industry.

I get quite touchy when people start talking negatively about my team and their skill sets, especially when they make blind assumptions.  The fact of the matter is that, due to the increasing visibility around data centers, the IT and facilities sides of the house had better start working together to solve the larger challenges in this space.  I see it and hear it at every industry event: the us-vs.-them between IT and facilities, with neither realizing that this approach spells doom for them both.  It's about time somebody challenged something in this industry.  We have already seen that, left to its own devices, technological advancement in data centers has by and large stood still for the last two decades.  As Einstein said, "We can't solve problems by using the same kind of thinking we used when we created them."

Ultimately, containers are but the first step in a journey with which we intend to shake up the industry.  If the thought process around containers scares you, then the innovations, technology advances, and challenges currently in various states of thought, pilot, and implementation will be downright terrifying.  I guess, in short, you should prepare for a vigorous stirring of the anthill.