Open Source Data Center Initiative

There are many in the data center industry who have repeatedly called for change in this community of ours: change in technology, change in priorities, change for the future.  Over the years we have seen those changes come very slowly, and while they are starting to move a little faster now (primarily due to economic conditions and scrutiny over budgets more so than a desire to evolve our space), our industry still faces challenges and resistance to forward progress.  There are lots of great ideas and lots of forward thinking, but moving this work to execution, and educating business leaders as well as data center professionals to break away from the old standby accepted norms, has not gone well.

That is why I am extremely happy to announce my involvement with the University of Missouri in the launch of a not-for-profit, data center-specific organization.  You might have read the formal announcement by Dave Ohara, who launched the news via his industry website, GreenM3.  Dave is another of those industry insiders who has long been perplexed by the lack of movement and initiative we have had on some great ideas and on the standouts doing great work.  More importantly, it doesn’t stop there.  We have been able to put together quite a team of industry heavyweights to get involved in this effort.  Those announcements are forthcoming, and when they arrive, I think you will get a sense of the type of sea change this effort could potentially bring.

One of the largest challenges we have with regard to data centers is education.  Those of you who follow my blog know that I believe some engineering and construction firms are incented ‘not to change’ or implement new approaches.  The cover of complexity allows customers to remain in the dark while innovation is stifled.  Those forces who desire to maintain an aura of black box complexity around this space, and who repeatedly speak to the arcane arts of building out data center facilities, have been at this a long time.  To them, the interplay of systems requiring one-off monumental temples to technology on every single build is the norm.  It’s how you maximize profit and keep yourself in a profitable position.

When I discussed this idea briefly with a close industry friend, his first question naturally revolved around how this work would compete with that of the Green Grid, the Uptime Institute, Data Center Pulse, or the other competing industry groups.  Essentially, was this going to be yet another competing thought-leadership organization?  The very specific answer to this is no, absolutely not.

These groups have been out espousing best practices for years.  They have embraced different technologies, and they have tried to educate the industry.  They have been pushing for change (for the most part).  They do a great job of highlighting the challenges we face, but for the most part they have waited around for universal goodwill and monetary pressures to make those changes happen.  It dawned on us that there was another way.  You need to ensure that you build something that gains mindshare, that gets business leadership’s attention, that causes a paradigm shift.  As we put the pieces together we realized that the solution had to be credible, technical, and above all have a business case around it.  It seemed to us that the parallels to the Open Source movement and the applicability of the approach were a perfect match.

To be clear, this Open Source Data Center Initiative is focused on execution.  It’s focused on putting together an open and free engineering framework upon which data center designs, technologies, and the like can be quickly put together, and moreover on standardizing the way both end users and engineering firms approach the data center industry.

Imagine, if you will, a base framework upon which engineering firms, or even individual engineers, can propose technologies and designs, and upon which specific solution vendors can pitch technologies for inclusion and highlight their effectiveness.  More than all of that, it will remove much of the mystery behind the work that happens in designing facilities and normalize conversations.

If you think of the Linux movement, and all of those who actively participate in submitting enhancements and features, and even in pulling together specific build packages for distribution, one could see such things emerging in the data center engineering realm.  In fact, with the myriad of emerging technologies assisting in greater energy efficiency, greater densities, different approaches to economization (air or water), and the use or non-use of containers, it’s easy to see the potential for this component-based design.

One might think that we are effectively trying to put formal engineering firms out of business with this kind of work.  I would argue that this is definitely not the case.  While it may have the effect of removing some of the extra profit that results from the current ‘complexity’ factor, this initiative should drive common requirements and specific standards, lead to better educated customers, and result in real-world testing and data from the manufacturing community.  Plus, as anyone who has ever actually built a data center knows, the devil is in the localization and details.  And as this is an open-source initiative, we will not be formally signing the drawings from a professional engineering perspective.

Manufacturers could submit their technologies and sample applications of their solutions, and have those designs plugged into a ‘package’ or ‘RPM’, if I could steal a term from the Red Hat Linux nomenclature.  Moreover, we will be able to start driving true visibility of costs, both upfront and operating, and associate those costs with the set designs, including differences and trending across regions around the world.  If it’s successful, it could be a very good thing.

We are not naive about this, however.  We certainly expect some resistance to this approach, and in fact some outright negativity, from those firms that make the most of the black box complexity components.

We will have more information on the approach and what it is we are trying to accomplish very soon.  



Data Center Junk Science: Thermal Shock \ Cooling Shock

I recently performed an interesting exercise where I reviewed typical co-location/hosting/data center contracts from a variety of firms around the world.  If you ever have a few long plane rides to take and would like an incredible amount of boring legalese to review, I still wouldn’t recommend it.  :)

I did learn quite a bit from going through the exercise, but there was one condition that I came across more than a few times.  It is one of those things that I put into my personal category of Data Center Junk Science.  I have a bunch of these things filed away in my brain, but this one not only raises my stupidity meter from a technological perspective, it also makes me wonder if those who require it have masochistic tendencies.

I am of course referring to a clause for Data Center Thermal Shock and, as I discovered, its evil, lesser-known counterpart, “Cooling” Shock.  For those of you who have not encountered this before, it’s a provision between hosting customer and hosting provider (most often required by the customer) that usually looks something like this:

If the ambient temperature in the data center rises 3 degrees over the course of 10 (sometimes 12, sometimes 15) minutes, the hosting provider will need to remunerate (reimburse) the customer for thermal shock damages experienced by the computer and electronics equipment.  The damages range from flat-fee penalties to graduated penalties based on the value of the equipment.

This clause may be rooted in the fundamental belief (and one I subscribe to, given many personally witnessed tests and trials) that it’s not high temperatures that servers dislike, but rather changes in temperature.  In my mind this is a basic tenet of where the industry is evolving to, with higher operating temperatures in data center environments.  My problem with this clause is directed more at the actual levels, metrics, and durations typically found in this requirement.  It smacks of a technical guy gone wild trying to prove to everyone how smart he or she is, all the while giving some insight into how myopic their viewpoint may be.

First, let’s take a look at the 3 degree temperature change.  This number ranges anywhere between 3 and 5 degrees in most of the contracts I reviewed that had the clause.  The problem here is that even with strict adherence to the most conservative approach to running and managing data centers today, a 3 to 5 degree delta easily keeps you within even the old ASHRAE recommendations.  If we look at the airflow and temperatures at a micro scale within the server itself, the inlet air temperatures are likely to show variations within that same range depending upon the level of utilization the box might be at.  This ultimately means that a customer who has periods of high compute might themselves be violating this very clause, if only for a few minutes.

Which brings up the next component, which is duration.  Whether you are speaking of 10-minute or 15-minute intervals, these are nice, long, leisurely periods of time which could hardly cause a “Shock” to equipment.  Also keep in mind the previous point, which is that the environment has not even violated the ASHRAE temperature range.  In addition, I would encourage people to actually read the allowed and tested temperatures that the manufacturers recommend for server operation.  A 3-5 degree swing in temperature would rarely push a server outside the operating temperature range it has been rated to work in or, worse, void the warranty.
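To make the two components concrete, here is a minimal sketch of how a clause like this might be evaluated against a log of temperature readings.  The function name, the reading format, and the sample numbers are all my own assumptions for illustration; no contract I reviewed specified how the measurement should actually be taken.

```python
from typing import Sequence, Tuple

# Hypothetical reading format: (minutes since start, ambient temperature in degrees).
Reading = Tuple[float, float]


def thermal_shock_triggered(readings: Sequence[Reading],
                            max_rise: float = 3.0,
                            window_minutes: float = 10.0) -> bool:
    """Return True if the temperature rises by at least max_rise degrees
    between any two readings taken no more than window_minutes apart.

    Assumes readings are sorted by time. The defaults mirror the
    "3 degrees over 10 minutes" wording of the clause.
    """
    for i, (t_start, temp_start) in enumerate(readings):
        for t_end, temp_end in readings[i + 1:]:
            if t_end - t_start > window_minutes:
                break  # later readings fall outside the contract window
            if temp_end - temp_start >= max_rise:
                return True
    return False


# Hypothetical logs: a slow drift and a brief excursion.
slow_drift = [(0, 22.0), (5, 23.0), (10, 24.0), (15, 25.5)]
brief_excursion = [(0, 22.0), (4, 23.5), (8, 25.0), (12, 22.5)]

print(thermal_shock_triggered(slow_drift))
print(thermal_shock_triggered(brief_excursion))
```

With these made-up logs, the slow drift never trips the clause even though it ends 3.5 degrees warmer, while the brief excursion does trip it, despite the temperature recovering a few minutes later.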

The topic of “Chilled Shock” or “Cooling Shock”, which is the same thing but having to do with cooling intervals and temperatures, is just as ludicrous.  Perhaps even more so!

I got to thinking that maybe my experiential knowledge was flawed.  So I went in search of white papers, studies, and technical dissertations on the potential impact and failures associated with these characteristics.  I went looking, and looking, and looking, and... guess what?  Nothing.  There is no scientific data anywhere that I could find to corroborate this ridiculous clause.  Sure, there are some papers regarding running consistently hot and the related failures, but the failure rates in those studies can easily be balanced against a server’s depreciation cycle.

So why would people really require that this clause get added to the contract?  Are they really that draconian about it?  I asked a bunch of account managers I know (both from my firm and outside) about the customers who typically ask for it.  The answer I got was surprising: there was a consistent (albeit small) percentage of customers out there who required this in their contracts and pushed aggressively for it.  Even more surprising to me was that these were typically folks on the technical side of the house more than the lawyers or business people.  I mean, these are the folks who should be more in tune with logic than, say, business or legal people, who can get bogged down in the letter of the law or dogmatic adherence to how things have been done.  Right?  I guess not.

But this brings up another important point.  Many facilities might experience a chiller failure, a CRAH failure, or some other event which might temporarily have this effect within the facility.  Let’s say that twice in one year you trigger this event for the whole or a portion of your facility (you’re probably not doing preventative maintenance.  Bad you!).  So the contract language around Thermal Shock now claims monetary damages.  Based on what?  How are these sums defined?  The contracts I read through had some wild oscillations on damages, with different means of calculation, and a whole lot more.  So what is the basis of this damage assessment?  Again, there are no studies that say each event takes .005 minutes off a server’s overall life, or anything like that.  So the cost calculations are completely arbitrary and negotiated between provider and customer.

This is where the true foolishness comes in.  The providers know that these events, while rare, might happen occasionally.  While the event may be within all other service level agreements, they still might have to award damages.  So what might they do in response?  They increase the costs, of course, to cover their potential risk.  It might be in the form of cost per kW, or cost per square foot, and it might even be pretty small or minimal compared to your overall costs.  But in the end, the customer ends up paying more for something that might not happen, and if it does happen, there is no concrete proof it has any real impact on the life of the server or equipment.  It really only salves the whim of someone who failed to do their homework.  If it never happens, the hosting provider is happy to take the additional money.
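As a rough illustration of how that risk might get priced back into the contract, here is a back-of-the-envelope sketch.  Every number in it (event frequency, penalty amount, contracted load) is hypothetical and chosen only to show the arithmetic; it is not drawn from any contract I reviewed, and real providers may well price this differently.

```python
def expected_annual_penalty(events_per_year: float,
                            penalty_per_event: float) -> float:
    """Expected yearly payout for thermal shock claims under one contract."""
    return events_per_year * penalty_per_event


def premium_per_kw_month(events_per_year: float,
                         penalty_per_event: float,
                         contracted_kw: float) -> float:
    """Spread the expected yearly payout across the contracted load,
    expressed as an added cost per kW per month."""
    yearly = expected_annual_penalty(events_per_year, penalty_per_event)
    return yearly / (contracted_kw * 12)


# Hypothetical inputs: one qualifying event every two years,
# a flat 5,000 penalty per event, and a 100 kW contract.
premium = premium_per_kw_month(events_per_year=0.5,
                               penalty_per_event=5000.0,
                               contracted_kw=100.0)
print(f"Added cost per kW per month: {premium:.2f}")
```

The point is not the particular figure but that the customer pays the premium every month whether or not an event ever occurs.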