Pointy Elbows, Bags of Beans, and a Little Anthill Excavation… A Response to the New York Times Data Center Articles

I have been following with some interest the series of articles in the New York Times by Jim Glanz.  The series premiered on Sunday with an article entitled Power, Pollution and the Internet, which was followed up today with a deeper dive into some specific examples.  Today’s examples (Data Barns in a Farm Town, Gobbling Power and Flexing Muscle) focused on the Microsoft program, a program with which I have more than a little familiarity since I ran it for many years.   After just two articles, reading the feedback in the comments, and seeing some of the reaction in the blogosphere, it is very clear that there is a significant amount of misunderstanding and over-simplification, and that detail I think is important is missing.   In responding I want to be very clear that I am not representing AOL, Microsoft, or any other organization; these are my own personal observations and opinions.

As mentioned in both of the articles, I was one of hundreds of people interviewed by the New York Times for this series.  In those conversations with Jim Glanz a few things became very apparent.  First – he has been on this story for a very long time, at least a year.   As far as journalists go, he was incredibly deeply engaged and armed with tons of facts.  In fact, he had a trove of internal emails, meeting minutes, and a mountain of data from government filings that must have taken him months to collect.  Second, he had the very hard job of turning this very complex space into a format where the uneducated masses can begin to understand it.  Therein lies much of the problem: this is an incredibly complex space to try to communicate to those not tackling it day to day, or who do not even understand the technological and regulatory forces involved.  This is not an area or topic that can be boiled down to a sound bite.   If this were easy, there really wouldn’t be a story, would there?

At issue for me is that the complexity of the forces involved gets scant attention, with the articles aiming instead for the “Data Centers are big bad energy vampires hurting the environment” story.   That is clearly evident reading through the comments on both of the articles so far, which claim that the sources and causes range from poor web page design to government and multi-national conspiracies to corner the market on energy.

So I thought I would take a crack, article by article, at shedding some light (the kind that doesn’t burn energy) on some of the topics and just call out where I disagree completely.     In full transparency, the “Data Barns” article doesn’t necessarily paint me as a “nice guy”.  Sometimes I am.  Sometimes I am not.  I am not an apologist, nor do I intend to apologize in this post.  I am paid to get stuff done.  To execute. To deliver.  Quite frankly the PUD missed deadlines (the event that prompted my email quoted in the piece), and sometimes people (even utility companies) have to live in the real world of consequences.   I think my industry reputation, work, and fundamental stances around driving energy efficiency and environmental conservation in this industry can stand on their own, both publicly and for those who have worked for me.

There is an inherent irony here: these articles were published both in print and electronically to maximize their audience and readership.  To do that, they made “multiple trips” through a data center, and ultimately reside in one (or more).  The articles seem to suggest that keeping things online is bad, which cuts against the availability and reach of the articles themselves.  Doesn’t the New York Times expect to make these articles available online for people to read?  They are posted online already.  Perhaps they expect their microfiche experts will be able to serve the demand for these articles in the future?  I do not think so.

This is a complex ecosystem of users, suppliers, technology, software, platforms, content creators, data (both BIG and small), regulatory forces, utilities, governments, financials, energy consumption, people, personalities, politics, company operating tenets, and community outreach, to name just a few.  On top of managing through all of these variables, operators also have to keep things running with no downtime.

\Mm

Kickin’ Dirt

[Photo: the initial site selection trip to Quincy, Washington]

I recently got an interesting note from Joel Stone, the Global Operations Chief at Global Switch.  As some of you might know, Joel used to run North American Operations for me at Microsoft.  I guess he was digging through some old pictures and found this old photo of our initial site selection trip to Quincy, Washington.

As you can see, the open expanse of farmland behind me ultimately became Microsoft’s showcase facilities in the Northwest.  In fact you can even see some farm equipment just behind me.   It got me reminiscing about that time and how exciting and horrifying that experience was.

At the time, Quincy, Washington was not much more than a small agricultural town whose leaders did some very good things (infrastructurally speaking) and which benefited from the presence of large amounts of hydro power.  When we went there, there were no other active data centers for hundreds of miles, there were no other technology firms present, and discussions about locating a giant industrial-technology complex there seemed as foreign as landing on the moon might have sounded during World War Two.

Yet fast forward to today and companies like Microsoft, Yahoo, Sabey, Intuit, and others have all located technology parks in this one-time agricultural hub.   Data Center Knowledge recently did an article on the impacts to Quincy.

Many people I speak to at conferences think that the site selection process is largely academic: find the right intersection of a few key criteria and locate areas on a map that seem to fit those requirements.   In fact, the site selection strategy that we employed took many different factors into consideration, each with its own weight, leading ultimately to a ‘heat map’ of possible locations to investigate.
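
Purely as an illustration, a minimal sketch of that kind of weighted scoring might look like the following. The criteria, weights, and scores here are invented for the example and are not the actual model we used, which had far more factors:

```python
# Hypothetical weighted site-selection scoring sketch.
# Criteria, weights, and per-site scores are invented for illustration.

CRITERIA_WEIGHTS = {
    "power_cost": 0.30,              # cost and availability of power
    "network_connectivity": 0.20,
    "water_availability": 0.15,
    "tax_climate": 0.15,
    "natural_disaster_risk": 0.10,
    "workforce": 0.10,
}

# Each candidate site gets a 0-10 score per criterion (higher is better).
candidate_sites = {
    "Site A": {"power_cost": 9, "network_connectivity": 6, "water_availability": 8,
               "tax_climate": 7, "natural_disaster_risk": 8, "workforce": 5},
    "Site B": {"power_cost": 5, "network_connectivity": 9, "water_availability": 6,
               "tax_climate": 6, "natural_disaster_risk": 7, "workforce": 8},
}

def weighted_score(scores: dict) -> float:
    """Combine per-criterion scores into one weighted value for the 'heat map'."""
    return sum(CRITERIA_WEIGHTS[c] * scores[c] for c in CRITERIA_WEIGHTS)

# Rank the candidates; the top of this list is where you go kick the dirt.
for site, scores in sorted(candidate_sites.items(),
                           key=lambda kv: weighted_score(kv[1]), reverse=True):
    print(f"{site}: {weighted_score(scores):.2f}")
```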

Even with some of the brightest minds and substantial research being done, it’s interesting to me that ultimately the process breaks down into something I call ‘Kickin’ Dirt’.   Those ivory tower exercises help you narrow your decision down to a few locations, but the true value of the process comes when you get out to the location itself and ‘kick the dirt around’.   You get a feel for the infrastructure, the local culture, and those hard-to-quantify factors that no modeling software can tell you about.

Once you have gone out and kicked the dirt, it’s decision time.  The decision you make, backed by all the data and process in the world and by personal experience of the locations in question, ultimately nets out to someone making a call.   My experience is that this rarely works well if left up to a committee.  At some point someone needs the courage and conviction, and in some cases the outright insanity, to make the call.

If you are someone with this responsibility in your job today – Do your homework, Kick the Dirt, and make the best call you can.  

To my friends in Quincy – You have come a long way, baby!  Merry Christmas!

 

\Mm

Live Chiller Side Chat Redux

I wanted to take a moment to thank Rich Miller of Data Center Knowledge, and all of the folks who called in and submitted questions today in the Live Chiller Side Chat.   It was incredible fun for me to get a chance to answer questions directly from everyone.   My only regret is that we did not have enough time!

When you have a couple of hundred people logged in, it’s unrealistic and all but impossible to answer all of the questions.  However, I think Rich did a great job bouncing around to key in on the themes he saw emerging from the questions.    One thing is for sure: we will try to do another one of these, given the number of unanswered questions.  I have already been receiving some great ideas on how to structure these moving forward.  Hopefully everyone got some value or insight out of the exercise.  As I warned before the meeting, you may not get the right answer, but you will definitely get my answer.

One of the topics that we touched on briefly during the call, and that went a bit under-discussed, was regulation associated with data centers or, more correctly, regulation and legislation that will affect our industry.    For those of you who are interested, I recently completed an executive primer video on the subject of data center regulation.  The link can be found here:


Data Center Regulation Video.

Thanks again for spending your valuable time with me today, and I hope we can do it again!

\Mm

“We Can’t Afford to Measure PUE”

One of the more interesting phenomena I experience as I travel and talk with customers and industry peers is that a significant number of folks out there believe they cannot measure PUE because they cannot afford, or lack the funding for, the type of equipment and systems needed to properly meter their infrastructure.  As a rule I believe this to be complete hogwash, as there are ways to measure PUE without any additional equipment (I call it SneakerNet, or Manual SCADA).   One can simply look at the power draw off the UPS and compare that to the information in the utility bills.  It’s not perfect, but it gives you a measure that you can use to improve your efficiency.  As long as you are consistent in your measurement rigor (regular intervals, same time of day, etc.) you can definitely achieve better and better efficiency within your facility.
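
For illustration, a minimal sketch of that back-of-the-envelope math, assuming you can pull a total-facility kWh number off the utility bill and an IT-load kWh number off the UPS output metering for the same period (the figures below are invented):

```python
# Back-of-the-envelope PUE from two manual readings over the same period.
# The numbers here are invented for illustration.

total_facility_kwh = 120_000.0   # from the utility bill for the month
it_load_kwh = 75_000.0           # from the UPS output metering, same month

pue = total_facility_kwh / it_load_kwh
print(f"PUE ~ {pue:.2f}")        # ~1.60 with these made-up numbers

# Consistency matters more than precision: take the readings on the same
# schedule (same billing window, same time of day) and trend the ratio
# over time to see whether efficiency is actually improving.
```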

Many people have pushed back on me, saying that measurement closer to the PDU or rack is more important, and that for that you need a full-blown branch circuit monitoring solution.   While to me increased efficiency is more about vigilance in understanding your environment, I have struggled to find an affordable option for folks who want that better granularity.

Now that I have been in management for the better part of my career, I have had to confine my inner geek to home and weekend efforts.   Most of my friends laugh when they find out I essentially have a home data center comprised of a hodgepodge of equipment I have collected over the years.  This includes racks of different sizes (it has been at least a decade since I have seen a half-height rack in a facility, but I have two!), my personal Cisco certification lab, Linux servers, Microsoft servers, and a host of other odds and ends.  It’s a personal playground for me to try to remain technical despite my management responsibilities.

It’s also a great place for me to try out new gear from time to time, and I have to say I found something that might fit the bill for those folks who want a deeper understanding of power consumption in their facilities.   I rarely give product endorsements, but this is something the budget-minded facilities folks might really want to take a look at.

I received a CL-AMP IT package from the Noble Vision Group to review and give them some feedback on their kit.   The first thing that struck me was that this is essentially a power-metering-for-dummies kit.    There were a couple of really neat characteristics out of the box that took many of the arguments I usually hear right off the table.


First, the “clamp” itself is a non-intrusive, non-invasive way to get accurate power metering and results.   This means that, contrary to other solutions, I did not have to unplug existing servers and gear to get readings, or try to install the device inline.  I simply clamped the power coming into the rack (or a server) and POOF! I had power information. It was amazingly simple.

Next up: I had heard that clamp-like devices were not especially accurate, so I did some initial tests using an older IP-addressable power strip that allowed me to get power readings for my gear.   I then used the CL-AMP device to compare, and the two were consistently within +/- 2% of each other.  As far as accuracy goes, I am calling it a draw because, to be honest, this is a garage-based data center and I am not really sure how accurate my old power strips are.   Regardless, the CL-AMPs gave me a very easy way to get my power readings without disrupting the network.  Additionally, the kit is mobile, so I can move it around if I want to.  This is important for those who might be budget challenged, as the price point for this kit is far cheaper than a full-blown branch circuit solution.
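
For anyone who wants to run the same kind of cross-check, a trivial sketch of the comparison, with invented readings:

```python
# Compare two power readings taken on the same load and flag whether they
# agree within a tolerance. The readings here are invented for illustration.

def percent_difference(a_watts: float, b_watts: float) -> float:
    """Percent difference relative to the average of the two readings."""
    return abs(a_watts - b_watts) / ((a_watts + b_watts) / 2) * 100

clamp_reading_w = 412.0        # clamp-style meter reading for the rack
power_strip_reading_w = 405.0  # older IP-addressable power strip, same rack

diff = percent_difference(clamp_reading_w, power_strip_reading_w)
print(f"Difference: {diff:.1f}%  (within 2%? {'yes' if diff <= 2.0 else 'no'})")
```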

While my experiment was far from scientific, and I am the last person to call myself a full-blown facilities engineer, one thing was clear: this solution can easily fill a role as a mobile, quick-hit way to measure power usage (and PUE) in your facility that doesn’t break the bank, disrupt operations, or force you to install devices inline.

\Mm

Chiller-Side Chats: Is Power Capping Ready for Prime Time?

I was very pleased at the great many responses to my data center capacity planning chat.  They came in both public and private notes, with more than a healthy population centered around my comments on power capping and disagreement with why I don’t think the technology/applications/functionality is 100% there yet. So I decided to throw up an impromptu, ad-hoc follow-on chat on power capping.  How’s that for service?

What’s your perspective?

In a nutshell, my resistance can be summed up in the exploration of two phrases.  The first is ‘prime time’ and how I define it from where I come at the problem.  The second is the term ‘data center’ and the context in which I am using it as it relates to power capping.

To adequately address my position, I will answer it from the perspective of the three groups that these Chiller-Side Chats are aimed at: the facility side, the IT side, and ultimately the business side of the problem.

Let’s start with the latter phrase, ‘data center’, first.  To the facility manager, this term refers to the actual building, room, and infrastructure that IT gear sits in.   His definition of data center includes things like remote power panels, power whips, power distribution units, Computer Room Air Handlers (CRAHs), generators, and cooling towers.   It all revolves around the distribution and management of power.

From an IT perspective, the term is usually thought of in terms of servers, applications, or network capabilities.  It sometimes blends in some aspects of the facility definition, but only as they relate to servers and equipment.   I have even heard it applied to “information”, which is even more ethereal.  Its base units could be servers, storage capacity, network capacity, and the like.

From a business perspective, the term ‘data center’ usually lumps together both IT and facilities, but at a very high level.  Where the currency for our previous two groups is technical in nature (power, servers, storage, etc.), the currency for the business side is cold hard cash.   It involves things like OPEX costs, CAPEX costs, and return on investment.

So from the very start one has to ask: which data center are you referring to?  Power capping is a technical issue and can be implemented from either of the two technical perspectives.   It also has an impact on the business aspect, and that impact can be a barrier to adoption.

We believe these truths to be self-evident

Here are some of the things that I believe to be inalienable truths about data centers today, and in some cases probably forever, if history is any indication.

  1. Data Centers are heterogeneous in the makeup of their facilities equipment, with different brands of equipment across the functions.
  2. Data Centers are largely heterogeneous in the makeup of their server population, network population, etc.
  3. Data Centers house non-server equipment like routers, switches, tape storage devices, and the like.
  4. Data Centers generally have differing designs, redundancy levels, floor layouts, and PDU distribution configurations.
  5. Today most racks are unintelligent; those that are not are vendor-specific and/or proprietary, and also expensive compared to bent steel.
  6. Except in a very few cases, there is NO integration between asset management, change management, incident management, and problem management systems across IT *AND* facilities.

These will be important in a second, so mark this spot on the page, as it ties into my thoughts on the definition of prime time.  You see, to me, in this context, prime time means that when a solution is deployed it actually solves problems and reduces the number of things a data center manager has to do or worry about.   This is important because, notice, I did not say anything about making something easier.  Sometimes easier doesn’t solve the problem.

There is some really incredible work going on at some of the server manufacturers in the area of power capping.   After all, they know their products better than anyone.  For gratuitous purposes, because he posts and comments here, I refer you to the Eye on Blades blog at HP by Tony Harvey.  In his post responding to the previous Chiller-Side Chat, he talked up the work HP is doing that is already available on some G5 boxes and all G6 boxes, along with additional functionality available in the blade enclosures.

Most of the manufacturers are doing a great job here, and the dynamic load stuff is incredibly cool as well.    However, the business side of my brain requires that I state that this level of super-cool wizardry usually comes at additional cost.   Let’s compare that with Howard, the everyday data center manager who does this today, and who from a business perspective is a sunk cost.   He is essentially free.   Additionally, simple things like performing an SNMP poll for power draw on a box (which used to be available in some server products for free) have been removed or can only be accessed through additional operating licenses.  Read: more cost.    So the average business is faced with paying extra to get this capability for its servers, or making Howard the data center manager do it for free and trusting that his general fear of losing his job if things blow up is a good incentive for doing it right.

Aside from that, it still runs into Truth #2.  Extremely rare is the data center that uses only one server manufacturer.  While that is the dream of most server manufacturers, it is more common to find Dell servers alongside HP servers alongside Rackable.  Add to that the fact that even within the same family you are likely to see multiple generations of gear.  Does the business have to buy into the proprietary solutions of each vendor to get the functionality it needs for power capping?  Is there an industry standard in power capping that ensures we can all live in peace and harmony?  No.  Again, that pesky business part of my mind says cost-cost-cost.  Hey Howard – go do your normal manual thing.

Now let’s tackle Truth #3 from a power capping perspective.   Solving the problem from the server side solves only part of the problem.   How many network gear manufacturers have power capping features? You would be hard pressed to fill one hand counting them.   In a related thought, one of the standard connectivity trends in the industry is top-of-rack switching: essentially, for purposes of distribution, a network switch is placed at the top of the rack to handle server connectivity to the network.     Does our proprietary power capping software catch the power draw of that switch?  Of any network gear, for that matter?  Doubtful.  So while I may have super-cool power capping on my servers, I am still stuck at the rack layer – which is where data center managers manage from as one of their base units.   Howard may have some level of surety that his proprietary server power capping is humming along swimmingly, but he still has to do the work manually.  It is definitely simpler for Howard, and he may get the task done quicker, but we have not actually reduced steps in the process.   Howard is still manually walking the floor.

Which brings up a good point: Howard the data center manager manages with the rack as his base unit.  In most data centers, racks can contain different server manufacturers and different equipment types (servers, routers, switches, etc.), and can even be of different sizes.    While some manufacturers have built state-of-the-art racks specific to their equipment, that doesn’t solve the problem.  We have now stumbled upon Truth #5.

Since we have been exploring how current power capping technologies meet at the intersection of IT and facilities, it brings up the last point I will touch on: tools. I will get there by asking some basic questions about the operations of a typical data center.  Does your IT asset management system provide for racks as an item of configuration?  Does your data center manager use the same system?  Does your system provide for multiple power variables?  Does it track power at all?  Does the rack have a power configuration associated with it?  Or does your version of Howard use spreadsheets?  I know where my bet is on your answers.  Tooling has a long way to go in this space.   Facilities vendors are trying to approach it from their perspective, IT tools providers are doing the same, and equipment manufacturers have their own tools and mechanisms as well. There are a few tools that have been custom developed to do this kind of thing, but they were built for very specific environments.  We have finally arrived at power capping and Truth #6.
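
To make that tooling gap concrete, here is a purely hypothetical sketch of what a rack-level configuration item with power attributes might look like. The field names are invented and no particular asset management product is implied:

```python
# Hypothetical rack-level configuration item with power attributes.
# Field names are invented; no real asset/change management product is implied.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DeviceCI:
    asset_tag: str
    device_type: str          # "server", "switch", "tape library", ...
    vendor: str
    rated_watts: float        # nameplate or measured draw

@dataclass
class RackCI:
    rack_id: str
    room: str
    pdu_circuits: List[str]            # which PDU circuits feed this rack
    provisioned_watts: float           # power budget allocated to the rack
    devices: List[DeviceCI] = field(default_factory=list)

    def allocated_watts(self) -> float:
        return sum(d.rated_watts for d in self.devices)

    def headroom_watts(self) -> float:
        return self.provisioned_watts - self.allocated_watts()

rack = RackCI("R-042", "ColoRoom-1", ["PDU-A-12", "PDU-B-12"], provisioned_watts=5000)
rack.devices.append(DeviceCI("SRV-0001", "server", "HP", 350))
rack.devices.append(DeviceCI("NET-0042", "switch", "Cisco", 150))
print(rack.headroom_watts())   # prints 4500.0 with these made-up entries
```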

Please don’t get me wrong: I think that ultimately power capping will fulfill its great promise and do tremendous wonders.  It is one of those rare areas that will have a very big impact on this industry.   If you have the ability to deploy the vendor-specific solutions (which are indeed very good), you should. It will make things a bit easier, even if it doesn’t remove steps.   However, I think that ultimately, in order to have real effect, it is going to have to compete with the cost of free.   Today this work is done by data center managers at no apparent additional cost from a business perspective.   If I had some kind of authority I would call for a standard to be put in place around power capping.  Even if it were quite minimal, it would have a huge impact.   It could be as simple as providing three things.  First, provide free and unfiltered access to an SNMP MIB that exposes the current power usage of any IT-related device.  Second, provide a MIB object which, through the use of a SET command, could place a hard upper limit on power usage; this setting could be read by the box and/or the operating system and used to slow things down or starve resources on the box for a time.  Lastly, provide the ability to read that same MIB object back.    This would allow the poor, cheap Howards to simplify their environments tremendously, while still letting software and hardware manufacturers build and charge for the additional, dynamic features that go beyond it.
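
To make the ask concrete, here is a minimal sketch of what those three operations could look like against such a standard, assuming the synchronous pysnmp "hlapi" interface and an entirely hypothetical OID layout. No such standard MIB exists today, which is exactly the point:

```python
# Sketch of the three operations proposed above against a HYPOTHETICAL MIB:
# 1) read current draw, 2) SET a hard cap, 3) read the cap back.
# The OIDs below are invented placeholders; no standard power-capping MIB exists.
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity,
                          Integer, getCmd, setCmd)

HOST = "10.0.0.50"                               # hypothetical device address
CURRENT_WATTS_OID = "1.3.6.1.4.1.99999.1.1.0"    # hypothetical: current draw (W)
POWER_CAP_OID = "1.3.6.1.4.1.99999.1.2.0"        # hypothetical: hard cap (W)

def snmp_get(oid):
    error_ind, error_stat, _, var_binds = next(getCmd(
        SnmpEngine(), CommunityData("public"),
        UdpTransportTarget((HOST, 161)), ContextData(),
        ObjectType(ObjectIdentity(oid))))
    if error_ind or error_stat:
        raise RuntimeError(error_ind or error_stat.prettyPrint())
    return var_binds[0][1]

def snmp_set(oid, value):
    error_ind, error_stat, _, _ = next(setCmd(
        SnmpEngine(), CommunityData("private"),
        UdpTransportTarget((HOST, 161)), ContextData(),
        ObjectType(ObjectIdentity(oid), Integer(value))))
    if error_ind or error_stat:
        raise RuntimeError(error_ind or error_stat.prettyPrint())

print("Current draw:", snmp_get(CURRENT_WATTS_OID), "W")  # free, unfiltered read
snmp_set(POWER_CAP_OID, 400)                              # SET a hard upper limit
print("Cap now set to:", snmp_get(POWER_CAP_OID), "W")    # read the cap back
```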

\Mm