Coming Soon to a Data Center Near You: Regulation

As an industry, we have been talking about it for some time. Some claimed it would never come and that it was just a bunch of fear mongering. Others, like me, said it was the inevitable outcome of the intensifying focus on energy consumption. Whether you view this as a good thing or a bad thing, it's something that you and your company are going to have to start planning for very shortly. This is no longer a drill.

CRC – it's not just a cyclic redundancy check

I have been tracking the energy efficiency work being done in the United Kingdom for quite some time, along with developments in the Carbon Reduction Commitment (CRC). My recent trip to London afforded me the opportunity to dig significantly deeper into the draft and discuss it with a user community (at the Digital Realty Round Table event) that will likely be the first impacted by such legislation. For those of you unfamiliar with the initiative, let me give a quick overview of the CRC and how it will work.

The CRC is a mandatory carbon reduction and energy efficiency scheme aimed at changing energy use behaviors and further incenting the adoption of efficient technology and infrastructure. While it is not specifically aimed at data centers (it's aimed at everyone), you can see from its definition that data centers will be significantly affected. It was introduced as part of the Climate Change Act 2008.

In effect it is an auction-based carbon emissions trading scheme designed to operate under a cap-and-trade mechanism. While the base claim is that it will be revenue-neutral to the government (except, of course, for penalties resulting from non-compliance), it provides a very handy vehicle for future taxation and revenue. This is important, because as data center managers you are now placed in a position where you have primary regulatory reporting responsibilities for your company. No more hiding under the radar; your roles will now be front and center.

All organizations, including governmental agencies, that consumed more than 6,000 MWh in 2008 are required to participate. The mechanism is expected to go live in April 2010. Please keep in mind that this consumption requirement is called out in MWh and not megawatts. What's the difference? It's energy use over time for your whole company. If you as a data center manager run a 500 kilowatt facility, you account for almost 11% of the total energy consumption. You can bet you will be front and center on that issue, especially when the proposed introductory price is £12/tCO2 (or $19.48/tCO2). It's real money. Again, while not specifically focused on data centers, you can see that they will be an active contributor and participant in the process. For firms with larger facilities, let's say 5 MW of data center space – don't forget to add in your annual average PUE – the data centers will qualify all by themselves.
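
To make the MWh-versus-MW distinction concrete, here is a back-of-the-envelope sketch in Python. The 6,000 MWh threshold and £12/tCO2 price are from the CRC discussion above; everything else (the grid emissions factor, the PUE, the assumption of continuous operation) is an illustrative assumption of mine, not an official CRC figure.

```python
# Back-of-the-envelope CRC sizing sketch. The threshold and allowance price
# come from the post above; the emissions factor and example loads are
# illustrative assumptions only.

HOURS_PER_YEAR = 8760
CRC_THRESHOLD_MWH = 6000           # qualification threshold (2008 consumption)
ALLOWANCE_PRICE_GBP_PER_TCO2 = 12  # proposed introductory price
GRID_EMISSIONS_T_PER_MWH = 0.5     # assumed grid carbon intensity, tCO2/MWh

def annual_energy_mwh(it_load_kw: float, pue: float) -> float:
    """Annual facility energy in MWh for an average IT load running year-round."""
    return it_load_kw * pue * HOURS_PER_YEAR / 1000.0

def crc_exposure(it_load_kw: float, pue: float) -> None:
    mwh = annual_energy_mwh(it_load_kw, pue)
    tco2 = mwh * GRID_EMISSIONS_T_PER_MWH
    cost = tco2 * ALLOWANCE_PRICE_GBP_PER_TCO2
    print(f"{it_load_kw:>6.0f} kW IT load @ PUE {pue}: {mwh:,.0f} MWh/yr "
          f"({mwh / CRC_THRESHOLD_MWH:.0%} of the 6,000 MWh threshold), "
          f"~{tco2:,.0f} tCO2, ~£{cost:,.0f}/yr in allowances")

if __name__ == "__main__":
    crc_exposure(it_load_kw=500, pue=2.0)    # a mid-sized facility
    crc_exposure(it_load_kw=5000, pue=2.0)   # the 5 MW example above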


 

For more information on the CRC you can check out the links below:

While many of you may be reading this, feeling bad for your brothers and sisters in Great Britain while sighing in relief that it's not you, keep in mind that there are already other mechanisms being put in place. The EU has the ETS, and the Obama Administration has been very public about a similar cap-and-trade program here in the United States. You can bet that the US and other countries will be closely watching the success and performance of the CRC initiative in the UK. They are likely to model their own versions after the CRC (why reinvent the wheel when you can just localize it to your country or region?). So it might be a good idea to read through it and start preparing how you and your organization will respond and/or collect.

I would bet that you as a data center manager have not been thinking about this, that your CIO has not thought about this, and that the head of your facilities group has not thought about this. First, you need to start driving awareness of this issue. Next, we should heed a call to arms.

One of the items that came out during the Round Table discussions was how generally disconnected government regulators are from the complexities of the data center. They want to view data centers as big, bad, energy-using boxes that are all the same, when in fact the differences in what is achievable between small data centers and mega-scale facilities are great. PUEs of 1.2x might be achievable for large-scale Internet firms who control the entire stack from physical cabling to application development, while banks and financial institutions are mandated to meet redundancy requirements which force them to maintain scores of 2.0.

Someone once argued to me that data centers are actually extremely efficient: they have to integrate themselves into the grid, they generally purchase and procure the most energy-efficient technologies, and they are incented from an operating budget perspective to keep costs low. Why would the government go after them before going after the end users, who typically do not have the most energy-efficient servers, or perhaps the OEMs that manufacture them? The simple answer is that data centers are easy, high-energy-concentration targets. Politically, going after users is a dicey affair, and as such data centers will bear the initial brunt.

As an industry we need to start involving ourselves in educating the government and regulatory agencies about our space and representing our industry to them. While the Green Grid charter specifically forbids this kind of activity, having a data center industry lobby group to ensure dumb things won't happen is, in my opinion, a must.

Would love to get your thoughts on that.

/Mm

Dinner and fireworks

Last night I attended my first Digital Realty Round Table Discussion in London, and it was a fantastic treat and topper for my trip to the UK. For those of you not familiar with these events, it's an opportunity to discuss the challenges and issues facing the industry in an informal setting. The events are hosted by Bernard Geoghegan, General Manager for the European Region, who does a great job of MC'ing the dinner and ensuring that conversation flows. As the dinner begins, attendees introduce themselves but are not required to mention the firms they work for. The purpose of this meeting is real, unfiltered conversation. Selling and product positioning are strictly not allowed, most especially from Digital attendees.

As I sat around the largest round table in London (literally!) and scanned across the 25 or so attendees, I really did not know what to expect. I was pretty confident that if no one bothered to offer any conversation points up, Jim Smith (who also attended) and I could probably find some aspect of technology to argue and debate about. But it didn't take long for the fireworks to come out. In fact, the first person to introduce himself also listed out some of the things concerning him, and that process flowed on to each participant. By the time we got around the table of introductions we had a healthy list of issues, challenges, and topics to talk about. So much so that there was absolutely no hope of getting to all of them.

After introductions I kicked off the conversation by diving into data center management measurements. Currency per kilowatt. It was a great conversation between those that agreed this was a good metric and those that did not. I am not going to go into detail on the topics we discussed in this post. They ranged from data center metrics, data center industry challenges, PUE, data center tiering, cloud services, and managed services to a host of others. There is way too much to cover, and each will likely end up being its own post. Let's just say there was no lack of opinion or fervor behind most topics. Most interesting to me was the variation and representation of the firms around the table. While many did not identify their specific firms, they did mention that they worked for a bank, a hosting provider, a large retail chain, etc. It really highlighted to me how diverse our industry is and the range of technology applications we need to solve for. The pervading thought as I left was that the current regulatory attempts to govern this space are going to be downright disastrous or ineffectual unless those agencies begin reaching out to our industry specifically. I have a whole post in mind on this, but fair warning – IT IS COMING (it's already here), IT WILL AFFECT YOU – and YOU CANNOT IGNORE IT ANY MORE.

More on that to come. I would strongly suggest that if you haven't attended one of these events you think about doing so. Quite a few of the attendees shared that they learned a great deal through this kind of group therapy. It was a blast.

 

/Mm

Schneider / Digital New York Speaking Engagement


Just in case anyone wanted to connect – I wanted to highlight that I will be co-presenting the keynote at the Schneider Symposium at the Millennium Broadway Hotel in New York City with Chris Crosby of Digital Realty Trust. I will also be giving a talk on practical applications of energy efficiency and sitting on an energy efficiency panel led by Dan Golding from Tier One Research. The program kicks off at 8am on Wednesday. Feel free to stop by and say hi!

/Mm

Upcoming Webinar on Modular Data Centers

On June 22, 2009 I will be hosting a free webinar for Digital Realty Trust on modular data center approaches. It's a topic I have some expertise on and quite a bit of passion around, so I hope not to embarrass myself too thoroughly. If you have an interest in where data center construction, application, support, and maintenance are going, this might be something worth attending. We will also touch on emerging technologies such as modular IT configuration and IT containers and where they might be of benefit.

As I mentioned, it's a free event and you can register for it here.

Digital actually has some very good taped webinars on a variety of data center topics you might find interesting. Their video library can be found at this link.

The official blurb on the talk follows:

Digital Realty Trust would like to invite you to join us for another in our series of informational webinars. The Industrialization of the Datacenter℠ has given birth to a variety of modular development methods. From PODs to containers, end users are overwhelmed by the number of options and are often unsure of the best method to use for their particular application.

Monday, June 22, 2009

12:00 p.m. – 1:00 p.m. Central

In this webinar Digital Realty Trust will present the various modular alternatives that are available to today’s datacenter customers and the strengths and weaknesses of each. This presentation will also help provide attendees with a clear understanding of the potential uses for each modular approach and which datacenter requirements each is best designed to address.

Space is limited! Click here to reserve your space.

This webinar will be presented by Michael Manos, Senior Vice President of Technical Services at Digital Realty Trust. Mr. Manos is a 16-year veteran in the technology industry and most recently was responsible for the global design, construction, and operations of all of Microsoft’s datacenter facilities.

Hope to see you there!

/Mm

Chiller-Side Chats: Is Power Capping Ready for Prime Time?

I was very pleased at the great many responses to my data center capacity planning chat. They came in both public and private notes, with a healthy share of them centered on my comments on power capping and disagreement with why I don't think the technology/applications/functionality is 100% there yet. So I decided to throw together an impromptu, ad-hoc follow-on chat on power capping. How's that for service?

What’s your perspective?

In a nutshell, my resistance can be summed up in the exploration of two phrases. The first is 'prime time' and how I define it, given where I come at the problem from. The second is the term 'data center' and in what context I am using it as it relates to power capping.

I think to adequately address my position I will answer it from the perspective of the three groups these Chiller-Side Chats are aimed at: namely, the facilities side, the IT side, and ultimately the business side of the problem.

Let's start with the latter phrase, 'data center', first. To the facility manager this term refers to the actual building, room, and infrastructure that IT gear sits in. His definition of data center includes things like remote power panels, power whips, power distribution units, computer room air handlers (CRAHs), generators, and cooling towers. It all revolves around the distribution and management of power.

From an IT perspective the term is usually represented or thought of in terms of servers, applications, or network capabilities. It sometimes blends in to include some aspects of the facility definition, but only as it relates to servers and equipment. I have even heard it applied to "information", which is even more ethereal. Its base units could be servers, storage capacity, network capacity, and the like.

From a business perspective the term 'data center' is usually lumped together to include both IT and facilities, but at a very high level. Where the currency for our previous two groups is technical in nature (power, servers, storage, etc.), the currency for the business side is cold hard cash. It involves things like OPEX costs, CAPEX costs, and return on investment.

So from the very start, one has to ask: which data center are you referring to? Power capping is a technical issue and can be implemented from either of the two technical perspectives. It will also have an impact on the business aspect, which can in turn be a barrier to adoption.

We believe these truths to be self-evident

Here are some of the things that I believe to be inalienable truths about data centers today – and, if history is any indication, probably forever.

  1. Data Centers are heterogeneous in the makeup of their facilities equipment, with different brands of equipment across the functions.
  2. Data Centers are largely heterogeneous in the makeup of their server population, network population, etc.
  3. Data Centers house non-server equipment like routers, switches, tape storage devices, and the like.
  4. Data Centers generally have differing designs, redundancy, floor layouts, and PDU distribution configurations.
  5. Today most racks are unintelligent; those that are not are vendor-specific and/or proprietary – and also expensive compared to bent steel.
  6. Except in a very few cases, there is NO integration between the asset management, change management, incident management, and problem management systems of IT *AND* facilities.

These will be important in a second, so mark this spot on the page, as it ties into my thoughts on the definition of prime time. You see, to me, in this context, prime time means that when a solution is deployed it will actually solve problems and reduce the number of things a data center manager has to do or worry about. This is important because, you'll notice, I did not say anything about making something easier. Sometimes easier doesn't solve the problem.

There is some really incredible work going on at some of the server manufacturers in the area of power capping. After all, they know their products better than anyone. For gratuitous purposes, because he posts and comments here, I refer you to the Eye on Blades blog at HP by Tony Harvey. In his post responding to the previous Chiller-Side Chat, he talked up the amazing work HP is doing, which is already available on some G5 boxes and all G6 boxes, along with additional functionality available in the blade enclosures.

Most of the manufacturers are doing a great job here. The dynamic load stuff is incredibly cool as well. However, the business side of my brain requires that I state that this level of super-cool wizardry usually comes at additional cost. Let's compare that with Howard, the everyday data center manager who does it today and who, from a business perspective, is a sunk cost. It's essentially free. Additionally, simple things like performing an SNMP poll for power draw on a box (which used to be available in some server products for free) have been removed or can only be accessed through additional operating licenses. Read: more cost. So the average business is faced with getting this capability for servers at an additional cost, or making Howard the data center manager do it for free and knowing that his general fear of losing his job if things blow up is a good incentive for doing it right.

Aside from that, it still has challenges with Truth #2. Extremely rare is the data center that uses only one server manufacturer. While that's the dream of most server manufacturers, it's more common to find Dell servers alongside HP servers alongside Rackable. Add to that the fact that even within the same family you are likely to see multiple generations of gear. Does the business have to buy into the proprietary solutions of each to get the functionality they need for power capping? Is there an industry standard in power capping that ensures we can all live in peace and harmony? No. Again that pesky business part of my mind says: cost, cost, cost. Hey Howard – go do your normal manual thing.

Now let's tackle Truth #3 from a power capping perspective. Solving the problem from the server side is only solving part of the problem. How many network gear manufacturers have power capping features? You could count them on one hand. In a related thought, one of the standard connectivity trends in the industry is top-of-rack switching. Essentially, for purposes of distribution, a network switch is placed at the top of the rack to handle server connectivity to the network. Does our proprietary power capping software catch the power draw of that switch? Any network gear, for that matter? Doubtful. So while I may have super cool power capping on my servers, I am still screwed at the rack layer – which is where data center managers manage from, as one of their base units. Howard may have some level of surety that his proprietary server power capping stuff is humming along swimmingly, but he still has to do the work manually. It's definitely simpler for Howard, and he can get that task done potentially quicker, but we have not actually reduced steps in the process. Howard is still manually walking the floor.

Which brings up a good point: Howard the data center manager manages by his base unit, the rack. In most data centers, racks can hold different server manufacturers, different equipment types (servers, routers, switches, etc.), and can even be of different sizes. While some manufacturers have built state-of-the-art racks specific to their equipment, it doesn't solve the problem. We have now stumbled upon Truth #5.

Since we have been exploring how current power capping technologies meet at the intersection of IT and facilities, it brings up the last point I will touch on: tools. I will get there by asking some basic questions about the operations of a typical data center. In terms of operations, does your IT asset management system provide for racks as an item of configuration? Does your data center manager use the same system? Does your system provide for multiple power variables? Does it track power at all? Does the rack have a power configuration associated with it? Or does your version of Howard use spreadsheets? I know where my bet is on your answers. Tooling has a long way to go in this space. Facilities vendors are trying to approach it from their perspective, IT tools providers are doing the same, along with tools and mechanisms from equipment manufacturers as well. There are a few tools that have been custom developed to do this kind of thing, but they have been built for use in very specific environments. We have finally arrived at power capping and Truth #6.

Please don't get me wrong, I think that ultimately power capping will fulfill its great promise and do tremendous wonders. It's one of those rare areas which will have a very big impact on this industry. If you have the ability to deploy the vendor-specific solutions (which are indeed very good), you should. It will make things a bit easier, even if it doesn't remove steps. However, I think that ultimately, in order to have real effect, it's going to have to compete with the cost of free. Today this work is done by data center managers with no apparent additional cost from a business perspective. If I had some kind of authority I would call for a standard to be put in place around power capping. Even if it's quite minimal it would have a huge impact. It could be as simple as providing three things. First, provide free and unfiltered access to an SNMP MIB that exposes the current power usage of any IT-related device. Second, provide a MIB which, through the use of a SET command, could place a hard upper limit on power usage. This setting could be read by the box and/or the operating system and used to slow things down or starve resources on the box for a time. Lastly, provide the ability to read that same MIB back. This would allow the poor, cheap Howards of the world to at least simplify their environments tremendously. It would still allow software and hardware manufacturers to build and charge for the additional and dynamic features they would require.
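
To show how small that standard could be, here is a rough sketch of the three operations described above as a vendor-neutral interface. This is plain Python and purely hypothetical – no such standard MIB exists today, and every name in it is mine, not a real vendor's or IETF's.

```python
# Hypothetical sketch of the minimal, vendor-neutral power interface proposed
# above: read actual draw, set a hard cap, read the cap back. There is no such
# standard MIB today; all names and values here are made up for illustration.

from abc import ABC, abstractmethod

class PowerManageable(ABC):
    """What any IT device (server, switch, storage array) would expose."""

    @abstractmethod
    def get_power_draw_watts(self) -> float:
        """Current instantaneous power draw (the 'free SNMP GET')."""

    @abstractmethod
    def set_power_cap_watts(self, cap: float) -> None:
        """Hard upper limit; the device throttles itself to stay under it."""

    @abstractmethod
    def get_power_cap_watts(self) -> float:
        """Read back the currently configured cap."""

def rack_headroom(devices: list[PowerManageable], rack_budget_watts: float) -> float:
    """How much of the rack's power budget is left, counting every device in
    the rack (servers, top-of-rack switch, etc.), which is exactly what
    today's server-only capping tools miss."""
    return rack_budget_watts - sum(d.get_power_draw_watts() for d in devices)
```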

/Mm

DC Curmudgeon and Greenbean: The State of the Industry

Ever since I was a little kid I wanted to be a comic strip artist.   So I thought I would take a crack at it and apply it to our Industry.      I have a few more in the works.  Hope you enjoy!

/Mm

 

[Comic: DC Curmudgeon & Greenbean (DCC&G1)]

Chiller-Side Chats: The Capacity Problem


I recently had a conversation with the CIO of a well-respected company who informed me that his "data center people" had completely mismanaged his data center space, which was now causing them to look at leasing additional capacity or more aggressively pursuing virtualization to solve the problem. Furthermore, he was powerless to drive and address change because the data center facilities people worked for a different organization. To top it off, it frustrated him to no end that, in his mind, they simply did not understand the IT equipment or technologies being deployed. Unfortunately it's a common refrain that I hear over and over again. It speaks to the heart of the problems with understanding data center issues in the industry.

How Data Center Capacity Planning Really Works!

Data center managers are by their very nature extremely conservative people. At the root of this conservatism is the understanding that if and when a facility goes down, it is their bottoms on the line. As such, risk takers are very few and far between in this industry. I don't think I would get much argument from most business-side managers, who would readily agree to that in a heartbeat. But before we hang the albatross around the neck of our facilities management brethren, let's take a look at some of the challenges they actually face.

Data center capacity planning is a swirling vortex of science, art, best guesswork, and bad information, with a sprinkling of cult of personality for taste. One would think that it should be a straight numbers-and-math game, but it's not. First and foremost, the currency of data center capacity management and planning is power. Simple, right? Well, we shall see.

Let's start at a blissful moment in time, when the facility is shiny and new. The floor shines from its first cleaning, the VESDA (Very Early Smoke Detection Apparatus) equipment has not yet begun to throw off false positives, and all is right with the world. The equipment has been fully commissioned and is now ready to address the needs of the business.

Our data center manager is full of hope and optimism. He or she is confident that this time it will be much different from the legacy problems they had to deal with before. They now have the perfect mix of power and cooling to handle any challenge thrown at them. They are then approached by their good friends in Information Services with their first requirement. The business has decided to adopt a new application platform which will, of course, solve all the evils of previous installations.

It's a brand new day, a new beginning. The data center manager asks the IT personnel how many servers are associated with this new deployment. They also ask how much power those servers will draw so that the room can be optimized for this wonderful new solution. The IT personnel may be using consultants, or maybe they are providing the server specifications themselves. In advanced cases they may even have standardized the types and classes of servers they use. How much power? Well, the nameplate on the server says that each of these bit-crunching wonders will draw 300 watts apiece. As this application is bound to be a huge draw on resources, they inform the facilities team that approximately 20 machines at 300 watts are going to be deployed.

The facilities team knows that no machine ever draws its nameplate rating once "in the wild" of the data center, and therefore for capacity planning purposes they "manually" calculate a 30% reduction into the server deployment numbers. You see, it's not that they don't trust the IT folks, it's just that they generally know better. So that nets out to a 90 watt reduction per server, bringing the "budgeted power allocation" down to 210 watts per server. This is an important number to keep in mind. You now have two ratings that you have to deal with: nameplate and budgeted. More advanced data center users may use even more scientific methods of testing to derive their budgeted amount. For example, they may run test software on the server designed to drive the machine to 100% CPU utilization, 100% disk utilization, and the like. Interestingly, even after these rigorous tests, the machine never gets close to nameplate. Makes you wonder what that rating is even good for, doesn't it? Our data center manager doesn't have that level of sophistication, so he is using a 30% reduction. Keep these numbers in mind as we move forward.
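
Here is that nameplate-versus-budgeted math in a few lines of Python, using the example numbers above:

```python
# Nameplate vs. budgeted power, using the example figures from the story.
NAMEPLATE_WATTS = 300   # what the label on the server claims
DERATE_FACTOR = 0.30    # the facility team's rule-of-thumb reduction
SERVERS = 20

budgeted_watts = NAMEPLATE_WATTS * (1 - DERATE_FACTOR)   # 300 - 90 = 210 W

print(f"Budgeted per server: {budgeted_watts:.0f} W")
print(f"Budgeted for the whole deployment: {budgeted_watts * SERVERS / 1000:.1f} kW")
```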

The next question is typically: are these servers dual or single corded? Essentially, will these machines have redundancy built into the power supplies so that in the event of a power loss they might still operate through another source of electricity? Well, as every good business manager, IT professional, and data center manager knows – this is a good thing. Sure, let's make them dual corded.

The data center manager, now well armed with information, begins building out the racks and pulls power whips from diverse PDUs (power distribution units) to the location of those racks to ensure that the wonders of dual cording can come to full effect.

The servers arrive, they are installed, and in a matter of days the new application is humming along just fine, running into all kinds of user adoption issues, unexpected hiccups, budget overruns, etc. Okay, maybe I am being a bit sarcastic and jaded there, but I think it works for many installations. All in all a successful project, right? I say sure. But do all parties agree today? Tomorrow? Three years from now?

Let's break this down a bit more on the data center side. The data center manager has to allocate power for the deployment. He has already de-rated the server draw, but there is a certain minimum amount of infrastructure he has to deploy regardless. The power being pulled from those PDUs is taking up valuable slots inside that equipment. Think of your stereo equipment at home: there are only so many speakers you can connect to your base unit, no matter how loud you want it to get. The data center manager has to make certain decisions based upon the rack configuration. If we believe that they can squeeze 10 of these new servers into a rack, the data center manager has pulled enough capacity to address 2.1 kilowatts per rack (210 watts × 10 servers). With twenty total servers, that means he has two racks of 2.1 kilowatts of base load. Sounds easy, right? It's just math. And Mike – you said it was harder than regular math. You lied. Did I? Well, it turns out that physics is physics, and as Scotty from the Enterprise taught us, "You cannot change the laws of physics, Jim!" It's likely that the power capacity being allocated to the rack might actually be a bit over the 2.1 kilowatts due to the sizes of circuits that might be required. For example, he or she may have only needed 32 amps of power, but because of those pesky connections he had to pull two 20 amp circuits. Let's say for the sake of argument that in this case he has to reserve 2.5 kilowatts as a function of the physical infrastructure requirements. You start to see a little waste, right? It's a little more than one server's expected draw, so you might think it's not terribly bad. As a business manager, you're frustrated with that waste, but you might be OK with it, especially since it's a new facility and you have plenty of capacity.
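
The same rack-level arithmetic, sketched out with the story's numbers (the 2.5 kW allocation is the hypothetical circuit-driven figure from the example, not a universal rule):

```python
# Rack-level budgeting vs. what physically has to be allocated.
budgeted_watts_per_server = 210
servers_per_rack = 10

budgeted_per_rack_kw = budgeted_watts_per_server * servers_per_rack / 1000  # 2.1 kW

# The whips pulled to the rack come in fixed circuit sizes, so the allocation
# gets rounded up to whatever the circuits actually provide. In this story
# that works out to 2.5 kW of reserved infrastructure per rack.
allocated_per_rack_kw = 2.5

stranded_per_rack_kw = allocated_per_rack_kw - budgeted_per_rack_kw          # 0.4 kW
print(f"Budgeted: {budgeted_per_rack_kw} kW, allocated: {allocated_per_rack_kw} kW, "
      f"stranded: {stranded_per_rack_kw:.1f} kW per rack")
```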

But wait! Remember that dual cording thing? Now you have to double the power you are setting aside. You have to ensure that you have enough power to keep the servers running. Usually this comes from another PDU so that you can survive a single PDU failure. Additionally, you need to reserve enough power on each side (each cord) to handle a failover. In some cases the total load of the server is divided between the two power supplies; in other cases power is drawn from the primary with a small trickle of draw from the redundant connection. If the load is divided between both power supplies, you are effectively drawing HALF of the total reserved power on each leg. If it's the situation where they draw full load off one and have a trickle draw off the second power supply, you are actually drawing the correct amount on one leg and dramatically less than HALF on the second. Either way, the power is allocated and reserved, and I bet it's more than you thought when we started out this crazy story. Well, hold tight, because it's going to get even more complicated in a bit.

Right now, any data center manager in the world is reading this and screaming at the remedial nature of this post. This is Facilities Management 101, right after the seminar entitled "The EPO button is rarely your friend". In fact, I am probably insulting their intelligence, because there are even more subtleties than what I have outlined here. But my dear facilities comrade, it's not you I am writing this section for. It's for the business management and IT folks. With physics being its pesky self, combined with some business decisions, you are effectively taking down more power than you initially thought. Additionally, you now have a tax on future capacity, as that load associated with physics and redundancy is forever in reserve, not to be touched without great effort, if at all.

Dual cording is not bad. Redundancy is not bad. It's a business risk, and that's something you can understand; in fact, as a business manager, it's something I would be willing to bet you do every day in your job. You are weighing the business impact of an outage against actual cost. One can even easily calculate the cost of such a decision by taking proportional allocations of your capital cost from an infrastructure perspective and weighing it against the economic impact of not having certain tools and applications available. Even when this is done and it's well understood, there is a strange phenomenon of amnesia that sets in, and in a few months or years the same business manager may look at the facility and give the facilities person a hard time for not utilizing all the power. To data center managers – having sat as a data center professional for many years, I'm sad to say you can expect to have this "reserved power" conversation with your manager over and over again, especially when things get tight in terms of capacity left. To business managers: bookmark this post and read it about every 6 months or so.

Taking it up a notch for a moment…

That last section introduced the concept of Reserved Power.  Reserved Power is a concept that sits at the Facility level of Capacity Planning.   When a data center hall is first built out there are three terms and concepts you need to know.  The first is Critical load (sometimes called IT load).  This is the power available to IT and computer equipment in your facility.  The second is called Non-Critical load, which has to do with the amount of power allocated to things like lighting, mechanical systems and your electrical plant, generators, etc.  What I commonly call ‘Back of the house’ or ‘Big Iron’.  The last term is Total load.  Total load is the total amount of power available to the facility and can usually be calculated by adding Critical and Non-Critical loads. 

A facility is born with all three facets. You generally cannot have one without the others. I plan on having a future post called "Data Center Metrics for Dummies" which will explore the interconnection between these. For now let's keep it really simple.

The wonderful facility we have built will hold a certain amount of IT gear. Essentially, every server we deploy into the facility will subtract from the total amount of Critical Load available for new deployments. As we deduct the power from the facility, we are allocating that capacity out. In our previous example we deployed two racks at 2.5 kilowatts (and essentially reserved capacity for two more for redundancy). With those two racks we have allocated enough power for 5 kilowatts of real draw and have reserved 10 kilowatts in total.
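
A minimal sketch of that ledger, counting the redundant legs as reserved the way the example does. The 1 MW critical load figure is borrowed from the example facility later in this post; it is illustrative only.

```python
# Simple critical-load ledger for the example hall.
# Total load = Critical (IT) load + Non-Critical load; only Critical load
# is being allocated to IT gear here.

critical_load_kw = 1000.0          # the 1 MW example hall used later in the post

racks = 2
allocated_per_rack_kw = 2.5        # primary leg, per rack
redundant_per_rack_kw = 2.5        # second cord, reserved on another PDU

allocated_kw = racks * allocated_per_rack_kw                            # 5 kW of "real" draw
reserved_kw = racks * (allocated_per_rack_kw + redundant_per_rack_kw)   # 10 kW in total

remaining_critical_kw = critical_load_kw - reserved_kw
print(f"Allocated: {allocated_kw} kW, reserved: {reserved_kw} kW, "
      f"critical load remaining: {remaining_critical_kw} kW")
```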

Before people get all mad at me, I just want to point out that some people don’t count the dual cording extra because they essentially de-rate the room with the understanding that everything will be dual corded.    I’m keeping it simple for people to understand what’s actually happening.

OK, back to the show. As I mentioned, those racks would really only draw 2.1 kW each at full load (we have essentially stranded 400 watts per rack of potential capacity, and combined that's almost 800 watts across the two racks). As a business we already knew this, but we still have to calculate it out and apply our "budgeted power" at the room level. So, across our two racks we have an allocated power of 5 kilowatts, with a budgeted amount of 4.2 kilowatts.

Now here is where our IT friends come into play and make things a bit more difficult. That wonderful application that was going to solve world hunger for us, and was going to be such a beefy application from a resources perspective, is not living up to its reputation. Instead of driving 100% utilization, it's sitting down around 8 percent per box. In fact, the estimated worldwide server utilization number sits between 5 and 14%. Most server manufacturers have built their boxes in a way where they draw less power at lower utilizations. Therefore our 210 watts per server might be closer to 180 watts per server of "actual load". That's another 30 watts per server. So while we have allocated 2.5 kilowatts and budgeted 2.1 kilowatts, we are only drawing 1.8 kilowatts of power. We have two racks, so double it. So now we are in a situation where we are not using 700 watts per rack, or 1.4 kilowatts across our two racks. Ouch – that's potentially 28% of the power wasted!
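
Putting the whole chain together (budgeted, allocated, actual) with the example's figures, the stranded power falls out directly:

```python
# Budgeted -> allocated -> actual, for the two example racks.
racks = 2
servers_per_rack = 10

budgeted_w = 210      # per server, after the 30% derate from nameplate
actual_w = 180        # per server, measured draw at ~8% utilization

allocated_per_rack_kw = 2.5
budgeted_per_rack_kw = budgeted_w * servers_per_rack / 1000   # 2.1 kW
actual_per_rack_kw = actual_w * servers_per_rack / 1000       # 1.8 kW

unused_per_rack_kw = allocated_per_rack_kw - actual_per_rack_kw   # 0.7 kW
unused_total_kw = unused_per_rack_kw * racks                      # 1.4 kW
waste_fraction = unused_per_rack_kw / allocated_per_rack_kw       # 28%

print(f"Unused: {unused_per_rack_kw:.1f} kW per rack, {unused_total_kw:.1f} kW total "
      f"({waste_fraction:.0%} of the allocated power)")
```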


The higher IT and applications drive the utilization rate, the less waste you will have. Virtualization can help here, but it's not without its own challenges, as we will see.

Big Rocks, Little Rocks…

Now, luckily, as a breed, data center managers are a smart bunch and they have various ways to try and reduce this waste. It goes back to our concept of budgeted or reserved power combined with our "stereo jacks" in the PDU. As long as we have some extra jacks, the data center manager can return to our two racks and artificially set a lower power budget per rack. This time, after metering the power for some time, he makes the call to artificially limit the racks' allocation to 2 kilowatts – he could go to 1.8 kilowatts, but remember he is conservative and wants to still give himself some cushion. He can then deploy new racks or cabinets and apply the extra 200 watts to the new racks. He can continue this until he runs out of power or out of slots on the PDU. This is a manual process that is physically managed by the facilities manager. There is an emerging technology called power capping which will allow you to do this in software on a server-by-server basis and which will be hugely impactful in our industry; it's just not ready for prime time yet.
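
A sketch of that manual re-budgeting step, with the cushion standing in for the manager's conservative judgment call (the helper and its cushion value are hypothetical, chosen so the numbers line up with the example above):

```python
# Manual reclamation of stranded power budget, as described above.
# The cushion is the data center manager's conservative judgment call.

def rebudget_rack(metered_peak_kw: float, old_budget_kw: float,
                  cushion_kw: float = 0.2) -> tuple[float, float]:
    """Lower a rack's power budget to metered peak plus a cushion and
    return (new_budget_kw, reclaimed_kw)."""
    new_budget = metered_peak_kw + cushion_kw
    return new_budget, max(old_budget_kw - new_budget, 0.0)

# The example racks: metered at ~1.8 kW, previously budgeted at 2.1 kW.
new_budget, reclaimed = rebudget_rack(metered_peak_kw=1.8, old_budget_kw=2.1)
print(f"New budget: {new_budget} kW per rack, "
      f"reclaimed: {reclaimed * 1000:.0f} W per rack")
```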

This inefficiency in allocation creates strange gaps and holes in data centers. It's a phenomenon I call Big Rocks, Little Rocks, and like everything in this post it is somehow tied to physics.

In this case it was my freshman-year physics class in college. The professor was at the front of the class with a bucket full of good-sized rocks. He politely asked if the bucket was full. The class of course responded in the affirmative. He then took out a smaller bucket of pebbles and poured, rigorously sifted, and shook the heck out of the larger bucket until every last pebble was emptied into the bucket with the big rocks. He asked again, "Now is it full?" The class responded once more in the affirmative, and he pulled out a bucket of sand. He proceeded to re-perform the sifting and shaking and emptied the sand into the bucket. "It's finally full now, right?" The class nodded one last time in the affirmative, and he produced a small bucket of water and poured it into the bucket as well.

That "is the bucket full" exercise is a lot like the capacity planning that every data center manager eventually has to get very good at. Those holes in capacity at a rack level or PDU level that I spoke of are the spaces for servers and equipment to ultimately fit into. At first it's easy to fit in big rocks; then it gets harder and harder. You are ultimately left trying to manage to those small spaces of capacity, trying to utilize every last bit of energy in the facility.

This can be extremely frustrating to business managers and IT personnel. Let's say you do a great job of informing the company how much capacity you actually have in your facility; if there is no knowledge of our "rocks" problem, you can easily get yourself into trouble.

Let's go back for a second to our example facility. Time has passed and our facility is now nearly full. Out of the 1 MW of total capacity, we have been very aggressive in managing our holes and still have 100 kilowatts of capacity. The IT personnel have a new application that is database intensive and will draw 80 kilowatts, and because the facility manager has done a good job of managing his facility, there is every expectation that it will be just fine. Until, of course, they mention that these servers have to be contiguous and close together for performance or even functionality purposes. The problem, of course, is that you now have a large rock that you need to try and squeeze into small-rock places. It won't work. It may even force you to either move other infrastructure around in your facility, impacting other applications and services, or cause you to get more data center space.
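
A toy sketch of the problem: the free capacity below sums to 100 kW, but no single hole can take the 80 kW contiguous deployment. The per-PDU numbers are made up, and treating "contiguous" as "behind one PDU" is a simplification for illustration only.

```python
# Fragmented free capacity: plenty in total, nowhere to put a big rock.
# The per-PDU free numbers are made up for illustration.

free_kw_by_pdu = {"PDU-A": 30, "PDU-B": 25, "PDU-C": 20, "PDU-D": 15, "PDU-E": 10}
deployment_kw = 80   # must land contiguously in one place

total_free = sum(free_kw_by_pdu.values())     # 100 kW
largest_hole = max(free_kw_by_pdu.values())   # 30 kW

print(f"Total free: {total_free} kW, largest contiguous hole: {largest_hole} kW")
print("Fits!" if largest_hole >= deployment_kw else
      "Does not fit - time to shuffle other gear or lease more space.")
```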

You see, the "is it full" exercise does not work in reverse. You cannot fill a bucket with water, then sand, then pebbles, then rocks. Again, lack of understanding can lead to ill will or the perception that the data center manager is not doing a good job of managing his facility when in fact they are being very aggressive in that management. It's something the business side and IT side should understand.

Virtualization Promise and Pitfalls…

Virtualization is a topic unto itself that is pretty vast and interesting, but I did want to point out some key things to think about. As you hopefully saw, server utilization has a huge impact on power draw. The higher the utilization, the better the performance from a power perspective. Additionally, many server manufacturers have certain power ramps built into their equipment where you might see an incrementally large jump in power consumption from, for example, 11 percent to 12 percent utilization. It has to do with the throttling of power consumption I mentioned above. This is a topic that most facility managers have no experience or knowledge of, as it has more to do with server design and performance. If your facility manager is aggressively managing your facility, as in the example above, and virtualization is introduced, you might find yourself tripping circuits as you drive the utilization higher and it crosses these internal utilization thresholds. HP has a good paper talking about how this works. If you pay particular attention to page 14: the lower line is the throttled processor as a function of utilization, the upper line is full speed as a function of utilization, and their dynamic power regulation feature is the one that jumps up to full speed at 60% utilization. This gives the box full performance only at high utilizations. It's a feature that is turned on by default in HP servers. Other manufacturers have similar technologies built into their products as well. Typically your facilities people would not be reading such things. Therefore it's imperative that, when considering virtualization and its impacts, the IT folks and data center managers work on it jointly.
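
A purely illustrative sketch of why this bites: model server power as a step function of utilization (the threshold and wattages below are made up, not HP's or anyone else's actual figures), then watch what happens to an aggressively re-budgeted rack when virtualization pushes utilization past the step.

```python
# Illustrative only: a made-up step model of server power vs. utilization,
# showing how virtualization can push a tightly budgeted rack over its limit.
# The threshold and wattages are NOT any vendor's actual figures.

def server_power_watts(utilization: float) -> float:
    """Throttled draw at low utilization, full-speed draw past a threshold."""
    return 180.0 if utilization < 0.60 else 260.0

servers_per_rack = 10
rack_budget_kw = 2.0    # the aggressively re-budgeted rack from earlier

for util in (0.08, 0.40, 0.65):
    rack_kw = server_power_watts(util) * servers_per_rack / 1000
    status = "OK" if rack_kw <= rack_budget_kw else "OVER BUDGET - breaker risk"
    print(f"Utilization {util:.0%}: rack draws {rack_kw:.1f} kW -> {status}")
```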

I hope this was at least partially valuable and explained some things that may have seemed like black-box or arcane data center challenges. Keep in mind that with this series I am trying to educate all sides on the challenges we are facing together.

/Mm