Reflections on Uptime Symposium 2010 in New York

This week I had the honor of being a keynote speaker at the Uptime Institute’s Symposium event in New York City.  I also participated in some industry panels, which is always tons of fun. Having keynoted the first Symposium a few years back, it was an interesting experience to come back and see how the event has changed and evolved over the intervening years.  This year my talk was about the coming energy regulation and its impact on data centers, and more specifically what data center managers and mission critical facilities professionals could and should be doing to get their companies ready for what I call CO2K.  I know I will get a lot of pushback on the CO2K title, but I think the analogy makes sense.  First, companies are generally not aware of the impact that their data centers and energy consumption have. Second, most companies are dramatically unprepared and do not have the appropriate tools in place to collect the information, which will of course lead to the third item: lots of reactionary spending to get that technology and software in place.  While Y2K was generally a flop and a lot of noise, if legislation is passed (and let’s be clear about the very direct statements the Obama administration has made on this topic), this work will lead to a significant change in reporting and management responsibilities for our industry.

Think we are ready for this legislation?

That brings me back to my first reflection on Symposium this year.  I was joking with Pitt Turner just before I went on stage that I was NOT going to ask the standard three questions I ask before every data center audience.  Let’s face it, I thought, that “shtick” had gotten old; I have been asking those same three questions for at least the last three years at every conference I have spoken at (which is a lot).  However, as I got on stage talking about the topic of regulation, I had to ask. It was like a hidden burning desire I could not quench.  So there I went: “How many people are measuring for energy consumption and efficiency today?”  “Raise your hand if, in your organization, the CIO sees the power bill.”  And then finally, “How many people in here today have the appropriate tooling in place to collect and report energy usage in their data centers?”  It had to come out.  I saw Pitt shaking his head.  What was more surprising was the number of people who raised their hands on those questions: about 10% of the audience.  Don’t get me wrong, 10% is about the highest I have seen that number at any event.  But for those of you uninitiated in UI Symposium lore, you need to understand something important: Symposium represents the hardest of the hard-core data center people.  This is where all of us propeller heads geek it out in mechanical and electrical splendor; we dance and raise the “floor” (data center humor).  This amazing collection of the best of the best had only about 10% penetration of monitoring in their environments.  When this regulation comes, it’s going to hurt.  I think I will do a post at a later time on my talk at Symposium and what you as a professional can do to start raising awareness.  But for now, that was my first big startle point.

My second key observation this year was the number of people.  Symposium is truly an international event: there were over 900 attendees for the talks and, if memory serves, about 1,300 for the exhibition hall.  I had heard that 20 of the world’s 30 time zones had representatives at the conference.  It was especially good for one of the key recurring benefits of this event: networking.  The networking opportunities were first rate, and by the looks of the impromptu meetings and hallway conversations, this continued to be a key driver of the event’s success.  As fun as making new friends is, it was also refreshing to spend some time catching up with old friends like Dan Costello and Sean Farney from Microsoft, Andrew Fanara, Dr. Bob Sullivan, and a host of others.

My third observation, and perhaps the one I was most pleased with, was the diversity of thought in the presentations.  It’s fair to say that I have been critical of Uptime for some time for a seemingly dogmatic, recurring set of themes and a particular bent of thinking.  While those topics were covered, so too were a myriad of what I will call counter-culture topics.  Sure, there were still a couple of the salesy presentations you find at all of these kinds of events, but the diversity of thought and approach this time around was striking.  Many of the presentations addressed larger business issues: the impact of, myths around, and approaches to cloud computing, virtualization, and decidedly non-facilities-related material affecting our world.  This might have something to do with the purchase of the organization by the 451 Group and its related data center research arm, Tier1 Research, but it was amazingly refreshing and they knocked the ball out of the park.

My fourth observation was that the time allotted to the presentations was too short.  While I have been known to completely abuse any allotted timeslot in my own talks due to my desire to hear myself talk, I found that many presentations had to end due to time just as things were getting interesting.  Many of the hallway conversations were continuations of those presentations, and it would have been better to keep the groups in the presentation halls.

 

My fifth observation revolved around the quantity, penetration, and maturation of container and containment products, presentations, and services.  When we first went public with the approach while I was at Microsoft, the topic was so avant-garde and against the grain of common practice that it got quite a reception (mostly negative).  This was followed by quite a few posts (like Stirring Anthills) which got lots of press attention and resulted in industry experts stating that containers and containment were never going to work for most people.  If the presentations, products, and services represented at Uptime were any indication of industry adoption and embrace, I guess I would have to make a childish gesture with my thumb to my nose, wiggle my fingers and say…. Nah Nah.  :)

 

I have to say the event this year was great and I enjoyed my time thoroughly.  A wonderful time and a great job by all.

\Mm

Open Source Data Center Initiative

There are many in the data center industry who have repeatedly called for change in this community of ours.  Change in technology, change in priorities, change for the future.  Over the years we have seen those changes come very slowly, and while they are starting to move a little faster now (primarily due to the economic conditions and scrutiny over budgets more so than a desire to evolve our space), our industry still faces challenges and resistance to forward progress.  There are lots of great ideas and lots of forward thinking, but moving this work to execution, and educating business leaders as well as data center professionals to break away from those old stand-by accepted norms, has not gone well.

That is why I am extremely happy to announce my involvement with the University of Missouri in the launch of a not-for-profit, data-center-specific organization.  You might have read the formal announcement by Dave Ohara, who launched the news via his industry website, GreenM3.  Dave is another of those industry insiders who has long been perplexed by the lack of movement and initiative on some great ideas, despite standouts doing great work.  More importantly, it doesn’t stop there.  We have been able to put together quite a team of industry heavyweights to get involved in this effort.  Those announcements are forthcoming, and when they arrive, I think you will get a sense of the kind of sea change this effort could bring.

One of the largest challenges we have with regard to data centers is education.  Those of you who follow my blog know that I believe some engineering and construction firms are incented not to change or implement new approaches.  The cover of complexity allows customers to remain in the dark while innovation is stifled. The forces who desire to maintain an aura of black-box complexity around this space, and who repeatedly speak to the arcane arts of building out data center facilities, have been at this a long time.  To them, the interplay of systems requiring one-off monumental temples to technology on every single build is the norm.  It’s how you maximize profit and keep yourself in a profitable position.

When I discussed this idea briefly with a close industry friend, his first question naturally revolved around how this work would compete with that of the Green Grid, the Uptime Institute, Data Center Pulse, or the other industry groups.  Essentially, was this going to be yet another competing thought-leadership organization?  The very specific answer to this is no, absolutely not.

These groups have been out espousing best practices for years.  They have embraced different technologies, they have tried to educate the industry, and they have (for the most part) been pushing for change.  They do a great job of highlighting the challenges we face, but largely they have waited around for universal goodwill and monetary pressures to make those changes happen.  It dawned on us that there was another way.  You need to build something that gains mindshare, that gets business leadership’s attention, that causes a paradigm shift.  As we put the pieces together, we realized that the solution had to be credible, technical, and above all have a business case around it.  It seemed to us the parallels to the open source movement and the applicability of that approach were a perfect match.

To be clear, this Open Source Data Center Initiative is focused on execution.  It’s focused on putting together an open and free engineering framework upon which data center designs, technologies, and the like can be quickly assembled, and, moreover, on standardizing the way both end users and engineering firms approach the data center industry.

Imagine, if you will, a base framework upon which engineering firms, or even individual engineers, can propose technologies and designs, and specific solution vendors can pitch technologies for inclusion and highlight their effectiveness.  More than all of that, it will remove much of the mystery behind the work that happens in designing facilities and normalize the conversation.

If you think of the Linux movement, and all of those who actively participate in submitting enhancements and features, even pulling together specific build packages for distribution, one could see the same things emerging in the data center engineering realm.  In fact, with the myriad of emerging technologies enabling greater energy efficiency, greater densities, different approaches to economization (air or water), and the use or non-use of containers, it’s easy to see the potential for this component-based design.

One might think that we are effectively trying to put formal engineering firms out of business with this kind of work.  I would argue that this is definitely not the case.  While it may have the effect of removing some of the extra profit that results from the current ‘complexity’ factor, this initiative should specifically drive common requirements, lead to better-educated customers, drive specific standards, and result in real-world testing and data from the manufacturing community.  Plus, as anyone who has ever actually built a data center knows, the devil is in the localization and details.  And as this is an open-source initiative, we will not be formally signing the drawings from a professional engineering perspective.

Manufacturers could submit their technologies and sample applications of their solutions, and have those designs plugged into a ‘package’ or ‘RPM’, if I can steal a term from Red Hat Linux nomenclature.  Moreover, we will be able to start driving true visibility into both upfront and operating costs, and associate those costs with the set designs, with differences and trends from regions around the world.  If it’s successful, it could be a very good thing.
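
To make the RPM analogy a little more concrete, here is a rough sketch of the kind of information a community-contributed design ‘package’ might carry.  To be clear, this is purely illustrative: the field names and figures below are hypothetical and do not reflect any actual framework we have defined.

```python
# Purely illustrative: a sketch of what a community-contributed design
# "package" record might capture. Field names and values are hypothetical.

mechanical_package = {
    "name": "airside-economizer-module",
    "contributor": "example-engineering-firm",   # hypothetical contributor
    "design_assumptions": {
        "it_capacity_kw": 500,
        "climate_zone": "temperate",
        "density_kw_per_rack": 10,
    },
    "claimed_performance": {
        "annualized_pue": 1.25,   # would be backed by submitted test data
    },
    "costs": {
        "capex_usd_per_kw": 8_000,      # placeholder figures for trending
        "opex_usd_per_kw_year": 250,    # and regional comparison
    },
    "localization_notes": "Permitting, seismic, and utility details remain site-specific.",
}
```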

We are not naive about this, however.  We certainly expect some resistance to this approach, and in fact some outright negativity from those firms that profit the most from the black-box complexity.

We will have more information on the approach and what it is we are trying to accomplish very soon.  

 

\Mm

Modular Evolution, Uptime Resolution, and Software Revolution

It’s a little-known fact, but software developers are costing enterprises millions of dollars, and I don’t think either party realizes it in many cases.  I am not referring to the actual purchase cost of the programs and applications, or even the resulting support costs.  Those are easily calculated and can be hard-bounded by budgets.  But what of the resulting costs of the facility in which the software resides?

The Tier System introduced by the Uptime Institute was an important step for our industry in that it gave us a common language, or nomenclature, with which to actually begin having a dialog about the characteristics of the facilities being built. It created formal definitions and classifications from a technical perspective that grouped redundancy and resiliency targets, and ultimately defined a hierarchy in which to talk about the facilities designed to those targets.  For its time it was revolutionary, and to a large degree the body of work is still relevant today.

There is a lot of criticism that its relevancy is fading fast due to the model’s greatest weakness, which is its lack of significant treatment of the application.  The basic premise of the Tier System is essentially to take your most restrictive and constrained application requirements (i.e., the application that is least robust) and augment that resiliency with infrastructure and what I call big iron.  If only 5% of your applications are this restrictive, the other 95% of your applications, which might be able to live with less resiliency, will still reside in the castle built for the minority of needs.  But before you call out an indictment of the Uptime Institute or this “most restrictive” design approach, you must first look at your own organization.  The Uptime Institute was coming at this from a purely facilities perspective.  The mysterious workload and wizardry of the application is a world mostly foreign to them.  Ask yourself this question: ‘In my organization, how often do IT and facilities talk to one another about end-to-end requirements?’  My guess, based on asking this question hundreds of times of customers and colleagues, is that the answer ranges from not often to not at all.  But the winds of change are starting to blow.

In fact, I think the general assault on the Tier System really represents a maturing of the industry, a willingness to look at our problem space with more combined wisdom.  I often laughed at the fact that human nature (or at least management human nature) used to hold the belief that a Tier 4 data center was better than a Tier 2 data center, effectively because the number was higher and it was built with more redundancy.  More redundancy essentially equaled a better facility.  A company might not have had the need for that level of physical systems redundancy (if one were to look at it from an application perspective), but Tier 4 was better than Tier 3, therefore we should build the best.  It’s not better, just different.

By the way, that’s not a myth the design firms and construction firms were all that interested in dispelling, either.  Besides Tier 4 having the higher number and more redundancy, it also cost more to build, required significantly more engineering, and took longer to work out the kinks.  So the myth of Tier 4 being the best has propagated for quite a long time.  I’ll say it again: it’s not better, it’s just different.

One of the benefits of the recent economic downturn (there are not many, I know) is that the definition of ‘better’ is starting to change.  With capital budgets frozen or shrinking, the willingness of enterprises to redefine ‘better’ is also changing significantly.  Better today means a smarter, more economical approach.  This has given rise to the boom in the modular data center approach, and it’s not surprising that this approach begins with what I call an application-level inventory.

This application-level inventory first looks specifically at the makeup and resiliency of the software and applications within the data center environments.  Does this application need the level of physical fault tolerance that my enterprise CRM needs?  Do servers that support testing or internal labs need the same level of redundancy?  This is the right behavior, and the one I would argue should have been used since the beginning.  The data center doesn’t drive the software; it’s the software that drives the data center.
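
To illustrate the idea, here is a minimal sketch of what such an application-level inventory might look like in code.  The application names, fields, and classification rules are hypothetical and purely for illustration; every organization will draw these lines differently.

```python
# A minimal sketch of an application-level inventory. The fields, labels,
# and thresholds are hypothetical -- the point is simply to classify each
# application by what it actually needs, rather than defaulting everything
# to the most redundant (and most expensive) space.

from dataclasses import dataclass

@dataclass
class Application:
    name: str
    revenue_critical: bool    # does downtime directly stop the business?
    tolerates_failover: bool  # can it ride through a site failure in software?

def required_resiliency(app: Application) -> str:
    """Map an application's needs to a physical resiliency bucket."""
    if app.revenue_critical and not app.tolerates_failover:
        return "high-redundancy space (the 'enterprise CRM' treatment)"
    if app.revenue_critical:
        return "standard space; resiliency handled by the application"
    return "low-redundancy space (labs, test, batch)"

inventory = [
    Application("enterprise-crm", True, False),
    Application("internal-lab", False, True),
    Application("web-frontend", True, True),
]

for app in inventory:
    print(f"{app.name:15s} -> {required_resiliency(app)}")
```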

One interesting and good side effect of this is that enterprise firms are now pushing harder on the software development firms.  They are beginning to ask some very interesting questions that the software providers have never been asked before.  For example, I sat in one meeting where an end customer asked their financial systems application provider a series of questions on the inter-server latency requirements and transaction timeout lengths for database access in their solution suite.  The reason behind this line of questioning was a setup for the next series of questions.  Once the numbers were provided, it became abundantly clear that this application would only truly work from one location, from one data center, and could not be redundant across multiple facilities.  This led to questions about the provider’s intentions to build more geo-diverse, extra-facility capabilities into their product.  I am now even seeing these questions in official Requests for Information (RFIs) and Requests for Proposal (RFPs).  The market is maturing and is starting to ask an important question: why should your sub-million-dollar (or euro) software application drive tens of millions in capital investment by me?  Why aren’t you architecting your software to solve this issue?  The power of software can be brought to bear to solve it, and my money is on this becoming a real battlefield in software development in the coming years.

Blending software expertise with operational and facility knowledge will be at the center of a whole new train of software development, in my opinion, one that really doesn’t exist today.  Given the dollar amounts involved, I believe it will be a very impactful and fruitful line of development as well.  But it has a long way to go.  Most programmers coming out of universities today rarely question the impact of their code outside of the functions they are providing, and the number of colleges and universities that teach a holistic approach can be counted on less than one hand’s worth of fingers worldwide.  But that’s up a finger or two from last year, so I am hopeful.

Regardless, while there will continue to be work on data center technologies at the physical layer, there is a looming body of work facing the development community.  Companies like Oracle, Microsoft, SAP, and a host of others will be thrust into the fray to solve these issues as well.  If they fail to adapt to the changing face and economics of the data center, they may just find themselves as an interesting footnote in the data center texts of the future.

 

\Mm

In disappointment, there is opportunity. . .

I was personally greatly disappointed with the news coming out of last week that the Uptime Institute had branded Microsoft and Google as the enemy of traditional data center operators.  To be truthful, I did not give the reports much credit, especially given our long and successful relationship with that organization.  However, when our representatives to the event returned and corroborated the story, I have to admit that I felt more than a bit let down.

As reported elsewhere, there are some discrepancies in how our mission was portrayed versus the reality of our position.  One of the primary messages of our cloud initiatives is that there is a certain amount of work/information that you will want accessed via the cloud, and there is some work/information that you will want to keep private.  It’s why we call it SOFTWARE + SERVICES.  There are quite a few things people just would not feel comfortable running in the cloud.  We are doing this (data center construction and operation) because the market, competitive forces, and our own research are driving us there.  I did want to address some of the misconceptions coming out of that meeting, however:

On PUE, Measurement, and our threat to the IT industry

The comments that Microsoft and Google are the biggest threat to the IT industry, and that Microsoft is “making the industry look bad by putting our facilities in areas that would bring the PUE numbers down,” are very interesting.  First, as mentioned before, please revisit our Software + Services strategy; it’s kind of hard to be a threat if we are openly acknowledging the need for corporate data centers in our expressed strategy.  I can assure you that we have no intention of making anyone look “bad,” nor do we in any way market our PUE values.  We are not a data center real estate firm, and we do not lease out our space, where this might even remotely be a factor.

While Microsoft believes in economization (both water-side and air-side), not all of our facilities employ this technology.  In fact, if a criticism does exist, it’s that we believe it is imperative to widen your environmental envelopes as far as you can.  Simply stated: run your facilities hotter!

The fact of the matter is that Microsoft has invested in both technology and software to allow us to run our environments more aggressively than a traditional data center environment.  We understand that certain industries have very specific requirements around the operation and storage of information, which drive and dictate certain physical reliability and redundancy needs.  I have been very vocal about getting the best PUE for your facility.  Our targets are admittedly unrealistic for the industry at large, but the goal of driving the most efficiency you can out of your facilities is something everyone should be focused on.

It was also mentioned that we do not measure our facilities over time, which is patently untrue.  We have years’ worth of measured information for our facilities, with multiple measurements per day.  We have been fairly public about this and have presented specific numbers (including at the Uptime Symposium last year), which makes the claim somewhat perplexing.

On Bullying the Industry

If the big cloud players are trying to bully the industry with  money and resources, I guess I have to ask – To what end?  Does this focus on energy efficiency equate to something bad?  Aside from the obvious corporate responsibility of using resources wisely and lowering operating costs, the visibility we are bringing to this space is not inherently bad.  Given the energy constraints we are seeing across the planet, a focus on energy efficiency is a good thing. 

Let’s not Overreact, There is yet Hope

While many people (external and internal) approached me about pulling out of the Uptime organization entirely, or even suggested that we create a true not-for-profit end-user forum motivated by technology and operations issues alone, I think it’s more important to stay the course.  As an industry we have so much yet to accomplish.  We are at the beginning of some pretty radical changes in technology, operations, and software that will define our industry in the coming decades.  Now is not the time to splinter, but instead to redouble our efforts to work together in the best interests of all involved.

Instead of picking apart the work done by the Green Grid and attacking the PUE metric wholesale, I would love to see Uptime and the Green Grid working together to give some real guidance.  Instead of calling out that PUEs of 1.2 are unrealistic for traditional data center operators, would it not be more useful for Uptime and the Green Grid to produce PUE targets and ranges associated with each Uptime Tier?  In my mind that would go a long way toward standardizing reporting and reducing ridiculous PUE marketing claims.

This industry is blessed with two organizations full of smart people attacking the same problem set.  We will continue our efforts through the Microsoft Data Center Experience (MDX) events, conferences, and white papers to share what we are doing in the most transparent way possible.

/Mm

Green Grid Data Center Indicator + CADE = Something useful!

There are times when two concepts merge and the result is greater than the sum of its parts.  It is not unlike the old television commercial where two people collide into each other, one eating a chocolate bar, the other carrying a vat of peanut butter.  The resulting lines are television gold:

“Your chocolate is in my peanut butter!”

“Your peanut butter is on my chocolate!”

“HEY!” (in unison with smiles)

I have been anxiously awaiting the Green Grid’s publication of their work on the Data Center Indicator tool.  My good friend Christian Belady and the incredible folks in the technical workgroups came up with something that made me smile and gave CADE a way to be a viable metric.

The Data Center Indicator tool gives you a visual representation across all the factors important to operating and measuring a data center.  It’s no secret that I get quite passionate about the need to measure your data center.  The lack of strict, rigorous, uniform measurement across the data center industry is one of the biggest tragedies we are waiting to inflict upon ourselves.

This tool is not necessarily for beginners as it assumes you have a good set of data and active measurement already in place.   However, in terms of quickly identifying trends and understanding your environment, I find it quite unique and interesting.   In fact, many of the same factors represented are rolled up into the CADE metric.  

[Figure: the Green Grid Data Center Indicator chart]

The white paper, which has been published on the Green Grid site (White Paper #15), is a great way to get a holistic view of your environment over time, and it is even suitable for executives not familiar with the intricacies of data center or mission-critical facilities.  If one takes the rolled-up percentage in CADE and combines it with this type of graph, you have a great KPI and a mechanism that makes the information actionable.  That, dear readers, is something any facilities manager can use.

-Mm

Struggling with CADE, McKinsey / Uptime Metric (RE-POST)


This is a re-post of my original blog post from May 5th regarding my tortured thoughts on the CADE data center metric put forward by McKinsey. It has relevance to my next post, so I am placing it here for your convenience.

 

I guess I should start out this post with the pre-emptive statement that, as a key performance indicator, I support the use of CADE or other metrics that tie facilities and IT together into a single measure.  In fact, we have used a similar metric internally at Microsoft.  But at the end of the day I believe that any such metric must be useful and actionable.  Maybe it’s because I have to worry about operations as well.  Maybe it’s because I don’t think you can roll up the total complexity of running a facility into one metric.  In short, I don’t think dictating yet another metric, especially one that doesn’t lend itself to action, is helpful.

As some of you know, I recently gave keynote speeches at both DataCenter World and the 2008 Uptime Symposium.  Part of those speeches included a simple query of the audience: how many people are measuring energy efficiency in their facilities?  Now please keep in mind that the combined audience of both engagements numbered between 2,000 and 2,400 data center professionals.  Arguably these are the 2,400 who really view data centers as a serious business within their organizations.  These are folks whose full-time jobs are running and supporting data center environments for some of the most important companies around the world.  At each conference, fewer than 10% of them raised their hands.  The fact that many in the industry, including Ken Brill at the Uptime Institute, the Green Grid, and others, have been preaching about measurement for at least the last three years, and less than 10% of the industry has adopted this best practice, is troublesome.

Whether you believe in measuring PUE or DCiE, you need to be measuring *something* in order to get even one variable of the CADE metric.  This lack of instrumentation and/or process within the firms most motivated to measure speaks in large part to the lack of success this metric is going to have over time.  It therefore follows that if they are not measuring efficiency, they likely don’t understand their total facility utilization (electrically speaking) either.  The IT side may have an easier time getting variables for system utilization, but how many firms have host-level performance agents in place?
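
For context, here is a minimal sketch of how those variables roll up.  The sample numbers are invented, and the decomposition is my paraphrase of the McKinsey/Uptime proposal (facility energy efficiency times facility utilization times IT utilization, with IT energy efficiency left as a placeholder), so treat it as illustrative rather than definitive.

```python
# Illustrative sketch only -- the sample numbers are invented, and the CADE
# decomposition below is a paraphrase of the McKinsey/Uptime proposal:
#   CADE = facility energy efficiency x facility utilization x IT utilization
# where facility energy efficiency is DCiE (the inverse of PUE).

total_facility_kw = 1_500.0      # hypothetical: everything the utility meter sees
it_load_kw = 1_000.0             # hypothetical: power delivered to IT equipment
facility_capacity_kw = 2_000.0   # hypothetical: design IT capacity of the site
avg_server_utilization = 0.10    # hypothetical: average CPU utilization

pue = total_facility_kw / it_load_kw                       # 1.5
dcie = it_load_kw / total_facility_kw                      # ~0.67
facility_utilization = it_load_kw / facility_capacity_kw   # 0.5

# IT energy efficiency had no agreed-upon measure at the time, so it is
# treated here as 1.0, leaving IT utilization to carry the asset term.
cade = dcie * facility_utilization * avg_server_utilization * 1.0

print(f"PUE={pue:.2f}  DCiE={dcie:.0%}  CADE={cade:.1%}")  # CADE ~= 3.3%
```

The point is simply that every one of those inputs has to come from somewhere, and most organizations are not yet collecting any of them.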

I want to point out that I am speaking to the industry in general.  Companies like ours, which are investing hundreds of millions of dollars in this space, get the challenges and requirements.  It’s not a nice-to-have; it’s a requirement.  But when you extend this to the rest of the industry, there is a massive gap.

Here are some interesting scenarios that, when extended to the industry, may break or complicate the CADE metric:

  • As you cull dead servers out of your environment, the components of CADE move against one another and the metric can remain essentially unchanged; the components are not independent.  Remove dead servers and average server utilization goes up, but data center utilization drops roughly proportionally, and if anything PUE gets worse, which means CADE may actually get worse for what is clearly the right operational move (see the worked sketch after this list).  Keep in mind that these results only make sense when kept in context of one another.
  • Hosting providers like Savvis, Equinix, DuPont Fabros, Digital Realty Trust, and an army of others will be effectively exempt from participating.  They will need to report back-of-house numbers to their customers (effectively PUE), but they do not have access to their customers’ server information.  It seems to me that CADE reporting in hosted environments will be difficult if not impossible.  Since the design of their facilities plays a large part in the calculation, effective tracking becomes difficult.  Additionally, at what level would overall utilization be measured?
  • If hosters are exempted, then CADE has a very limited application and shelf life.  You have to own the whole problem for it to be effective.
  • As I mentioned, I think CADE has strong possibilities for those firms that own their entire stack.  But most of the data centers in the world would probably not fall into the “all-in” scenario bucket.
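
To put some rough numbers on the dead-server point above, here is a small worked sketch.  The figures are invented for illustration; the point is only that the offsetting components can leave CADE flat, or even slightly worse, after an action that is clearly the right thing to do.

```python
# Rough worked example (invented numbers) for the "dead server" bullet above:
# culling dead servers is the right operational move, yet CADE barely moves,
# and can even get worse, because its components offset one another.

def cade(dcie, facility_utilization, avg_server_utilization):
    """CADE as decomposed earlier: DCiE x facility utilization x IT utilization."""
    return dcie * facility_utilization * avg_server_utilization

# Before: 1,000 servers, 200 of them dead (0% utilized), live ones at 25%,
# so the fleet averages 20% utilization.
before = cade(dcie=0.67, facility_utilization=0.50, avg_server_utilization=0.20)

# After culling: IT load drops ~20%, so facility utilization falls to 0.40 and
# PUE worsens slightly (fixed overhead spread over less IT load), while the
# average utilization of the remaining servers rises to 25%.
after = cade(dcie=0.64, facility_utilization=0.40, avg_server_utilization=0.25)

print(f"CADE before: {before:.1%}, after: {after:.1%}")  # ~6.7% vs ~6.4%
```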

I can’t help but think we are putting the cart before the horse in this industry.  CADE may be a great way to characterize data center utilization, but it’s completely useless if the industry isn’t even measuring the basics.  I have come to the realization that this industry does a wonderful job of telling its members WHAT to do, but fails to follow up with the HOW.  CADE is meant for higher-level consumption, specifically those execs who lack the technical skill sets to make heads or tails of efficiencies and how they relate to overall operations.  For them, this metric is perfect. But we have a considerable way to go before the industry at large gets there.

Regardless, I strongly suggest each and every one of you adopt the takeaway from Symposium: measure, measure, measure.