I was very pleased by the great many responses to my data center capacity planning chat. They came in both public and private notes, with more than a healthy share centered on my comments on power capping and disagreement with why I don't think the technology/applications/functionality is 100% there yet. So I decided to throw up an impromptu follow-on chat on Power Capping. How's that for service?
What’s your perspective?
In a nutshell, my resistance can be summed up in the exploration of two phrases. The first is 'prime time' and how I define it given where I come at the problem from. The second is the term 'data center' and the context in which I am using it as it relates to power capping.
I think the best way to address my position is to answer it from the perspective of the three groups these Chiller-Side Chats are aimed at: the facility side, the IT side, and ultimately the business side of the problem.
Let's start with the second phrase, 'data center.' To the facility manager this term refers to the actual building, room, and infrastructure that IT gear sits in. His definition of data center includes things like remote power panels, power whips, power distribution units, Computer Room Air Handlers (CRAHs), generators, and cooling towers. It all revolves around the distribution and management of power.
From an IT perspective the term is usually thought of in terms of servers, applications, or network capabilities. It sometimes blends in some aspects of the facility definition, but only as they relate to servers and equipment. I have even heard it applied to "information," which is even more ethereal. Its base units could be servers, storage capacity, network capacity, and the like.
From a business perspective the term 'data center' usually lumps together both IT and facilities, but at a very high level. Where the currency for our previous two groups is technical in nature (power, servers, storage, etc.), the currency for the business side is cold hard cash. It involves things like OPEX costs, CAPEX costs, and return on investment.
So from the very start, one has to ask: which data center are you referring to? Power capping is a technical issue and can be implemented at either of the two technical perspectives. It will have an impact on the business side as well, but that impact can also be a barrier to adoption.
We believe these truths to be self-evident
Here are some of the things that I believe to be inalienable truths about data centers today, and some of them probably forever, if history is any indication.
- Data Centers are heterogeneous in the makeup of their facilities equipment, with different brands of equipment across the functions.
- Data Centers are largely heterogeneous in the makeup of their server population, network population, etc.
- Data Centers house non-server equipment like routers, switches, tape storage devices and the like.
- Data Centers generally have differing designs, redundancy, floor layout, PDU distribution configurations.
- Today most racks are unintelligent; those that are not are vendor-specific and/or proprietary, and expensive compared to plain bent steel.
- Except in a very few cases, there is NO integration of asset management, change management, incident management, and problem management systems between IT *AND* facilities.
These will be important in a second, so mark this spot on the page, as it ties into my thoughts on the definition of prime time. You see, to me in this context, prime time means that when a solution is deployed it will actually solve problems and reduce the number of things a data center manager has to do or worry about. Notice I did not say anything about making something easier. Sometimes easier doesn't solve the problem.
There is some really incredible work going on at some of the server manufacturers in the area of power capping. After all, they know their products better than anyone. Because he posts and comments here, I refer you to the Eye on Blades blog at HP by Tony Harvey. In his post responding to the previous Chiller-Side Chat, he talked up the amazing work HP is doing, already available on some G5 boxes and all G6 boxes, along with additional functionality available in the blade enclosures.
Most of the manufacturers are doing a great job here. The dynamic load stuff is incredibly cool as well. However, the business side of my brain requires that I state that this level of super-cool wizardry usually comes at additional cost. Let's compare that with Howard, the everyday data center manager who does it today, and who from a business perspective is a sunk cost. He's essentially free. Additionally, simple things like performing an SNMP poll for power draw on a box (which used to be available in some server products for free) have been removed or can only be accessed through additional operating licenses. Read: more cost. So the average business is faced with paying extra for this capability on servers, or having Howard the data center manager do it for free, knowing that his general fear of losing his job if things blow up is a good incentive for doing it right.
Aside from that, it still runs up against Truth #2. Extremely rare is the data center that uses only one server manufacturer. While it's the dream of most server manufacturers, it's more common to find Dell servers alongside HP servers, alongside Rackable. Add to that the fact that even within the same family you are likely to see multiple generations of gear. Does the business have to buy into the proprietary solutions of each to get the functionality it needs for power capping? Is there an industry standard in power capping that ensures we can all live in peace and harmony? No. Again that pesky business part of my mind says: cost, cost, cost. Hey Howard, go do your normal manual thing.
Now let's tackle Truth #3 from a power capping perspective. Solving the problem from the server side is only solving part of the problem. How many network gear manufacturers have power capping features? You could count them on one hand with fingers to spare. In a related thought, one of the standard connectivity trends in the industry is top-of-rack switching. Essentially, for purposes of distribution, a network switch is placed at the top of the rack to handle server connectivity to the network. Does our proprietary power capping software catch the power draw of that switch? Any network gear, for that matter? Doubtful. So while I may have super-cool power capping on my servers, I am still screwed at the rack layer, which is where data center managers manage from as one of their base units. Howard may have some level of surety that his proprietary server power capping stuff is humming along swimmingly, but he still has to do the work manually. It's definitely simpler for Howard, and he may get the task done quicker, but we have not actually reduced steps in the process. Howard is still manually walking the floor.
Which brings up a good point: Howard the data center manager manages by his base unit, the rack. In most data centers, racks can hold different server manufacturers and different equipment types (servers, routers, switches, etc.), and can even be of different sizes. While some manufacturers have built state-of-the-art racks specific to their equipment, that doesn't solve the problem. We have now stumbled upon Truth #5.
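Howard's manual walk of the floor boils down to a simple roll-up: sum the draw of every device in a rack, regardless of vendor or type, and compare it against the rack's budget. A minimal sketch of that tally, with hypothetical device names, wattages, and cap:

```python
# Hypothetical sketch: roll up power draw to Howard's base unit, the rack.
# Device names, draws, and the rack cap are illustrative, not real data.

def rack_power_report(devices, rack_cap_watts):
    """Sum per-device draw for one rack and flag it if it exceeds the cap."""
    total = sum(draw for _, draw in devices)
    return {
        "total_watts": total,
        "cap_watts": rack_cap_watts,
        "over_cap": total > rack_cap_watts,
    }

# A mixed rack: servers from different vendors plus a top-of-rack switch,
# which server-side power capping tools typically do not see.
rack_a12 = [
    ("dell-server-01", 420),
    ("hp-server-02", 385),
    ("rackable-node-03", 310),
    ("tor-switch-01", 150),   # network gear: no capping feature
]

print(rack_power_report(rack_a12, rack_cap_watts=1200))
```

Note that dropping the switch from the tally, as server-only capping tools effectively do, would report this rack as comfortably under budget when it is actually over it.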
Since we have been exploring how current power capping technologies meet at the intersection of IT and facilities, it brings up the last point I will touch on: tools. I will get there by asking some basic questions about the operations of a typical data center. Does your IT asset management system provide for racks as an item of configuration? Does your data center manager use the same system? Does your system provide for multiple power variables? Does it track power at all? Does the rack have a power configuration associated with it? Or does your version of Howard use spreadsheets? I know where my bet is on your answers. Tooling has a long way to go in this space. Facilities vendors are trying to approach it from their perspective, IT tools providers are doing the same, and equipment manufacturers are adding tools and mechanisms of their own as well. There are a few tools that have been custom-developed to do this kind of thing, but they were built for very specific environments. We have finally arrived at power capping and Truth #6.
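The questions above imply a data model that most asset systems simply don't have. Here is one hypothetical shape such a record could take, with the rack as a first-class configuration item carrying its own power variables (all field names are illustrative, not any vendor's schema):

```python
from dataclasses import dataclass, field

# Hypothetical asset-record shape answering the questions above:
# the rack is a configuration item in its own right, and power
# variables live on both the rack and the device.

@dataclass
class RackCI:
    rack_id: str
    location: str
    pdu_circuits: list            # which PDU feeds serve this rack
    budgeted_watts: int           # power configuration tied to the rack
    measured_watts: int = 0       # tracked draw, if tooling supports it

@dataclass
class AssetRecord:
    asset_tag: str
    device_type: str              # server, switch, router, tape, ...
    rack: RackCI                  # a real CI reference, not free text
    nameplate_watts: int

rack = RackCI(rack_id="A12", location="Row 4",
              pdu_circuits=["PDU-1A", "PDU-2B"], budgeted_watts=5000)
server = AssetRecord(asset_tag="IT-0042", device_type="server",
                     rack=rack, nameplate_watts=450)
```

The point is not this particular schema; it is that until IT and facilities tools agree on something like it, Howard's spreadsheet remains the only place where both halves of the picture exist.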
Please don't get me wrong: I think that ultimately power capping will fulfill its great promise and do tremendous wonders. It's one of those rare areas which will have a very big impact on this industry. If you have the ability to deploy the vendor-specific solutions (which are indeed very good), you should. They will make things a bit easier, even if they don't remove steps. However, I think that in order to have real effect, power capping is ultimately going to have to compete with the cost of free. Today this work is done by data center managers with no apparent additional cost from a business perspective.

If I had some kind of authority, I would call for a standard to be put in place around power capping. Even if it's quite minimal, it would have a huge impact. It could be as simple as providing three things. First, provide free and unfiltered access to an SNMP MIB that exposes the current power usage of any IT-related device. Second, provide a MIB object which, through the use of a SET command, could place a hard upper limit on power usage. This setting could be read by the box and/or the operating system, which would start to slow things down or starve resources on the box for a time. Lastly, provide the ability to read that same limit back. This would allow the poor, cheap Howards to at least simplify their environments tremendously. It would still leave room for software and hardware manufacturers to build and charge for the additional dynamic features they would require.
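To make the proposed standard concrete, here is a minimal sketch of the three operations modeled as plain Python; in reality these would be SNMP GET/SET operations against standardized MIB objects, and the names here are my own invention, not any existing MIB:

```python
# Sketch of the three operations the proposed standard would expose per
# device. This models the semantics in plain Python; real access would
# go over SNMP (GET for reads, SET for the cap). Names are hypothetical.

class PowerMib:
    def __init__(self, current_draw_watts):
        self._draw = current_draw_watts
        self._cap = None              # no cap configured by default

    def get_power_usage(self):
        """Op 1: free, unfiltered read of the current power draw."""
        return self._draw

    def set_power_cap(self, watts):
        """Op 2: SET a hard upper limit the box must honor."""
        self._cap = watts
        if self._draw > watts:
            # The box or OS slows down / starves resources to comply.
            self._draw = watts

    def get_power_cap(self):
        """Op 3: read back the configured limit."""
        return self._cap

device = PowerMib(current_draw_watts=400)
device.set_power_cap(350)
```

Anything beyond these three primitives, such as dynamic load shifting across racks, stays on the table as the value-add the vendors can build and charge for.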