The folks who were recording the “Live” Chiller Side Chat have sent me a link to the recording. If you were not able to make the event live but are still interested in hearing how it went, feel free to have a listen at the following link:
I wanted to take a moment to thank Rich Miller of Data Center Knowledge, and all of those folks that called in and asked and submitted questions today in the Live Chiller Side Chat. It was incredible fun for me to get a chance to answer questions directly from everyone. My only regret is that we did not have enough time!
When you have a couple of hundred people logged in, it’s unrealistic and all but impossible to answer all of the questions. However, I think Rich did a great job bouncing around to clue into key themes that he saw emerging from the questions. One thing is for sure: we will try to do another one of these, given the number of unanswered questions. I have already been receiving some great ideas on how to possibly structure these moving forward. Hopefully everyone got some value or insight out of the exercise. As I warned before the meeting, you may not get the right answer, but you will definitely get my answer.
One of the topics that we touched on briefly during the call, and that went a bit under-discussed, was regulation associated with data centers or, more correctly, regulation and legislation that will affect our industry. For those of you who are interested, I recently completed an executive primer video on the subject of data center regulation. The link can be found here:
Thanks again for spending your valuable time with me today and hope we can do it again!
I am extremely excited to be participating in a live (webcast) Chiller-Side Chat hosted by none other than Rich Miller of Data Center Knowledge. The event is scheduled for Monday, September 14th from noon to 1pm Central Standard Time. You can register for the online event at this link.
I think perhaps the most interesting aspect of this to me is that this will be a live event and focused on answering questions that come in from the audience. As you know I usually use my ‘Chiller Side Chat’ posts to discuss some topic or other that interests or frustrates me. Sometimes, even others think they may be interesting or relevant too. I am planning on meeting up with Rich and doing the webcast from Las Vegas, where I am speaking at the Tier One Hosting Transformation Summit.
I am incredibly excited about this event and hope that if you have time you will join us. While I will endeavor to give you the right answers – one thing you can be sure of is that you will get MY answers.
See you then!
If you happen to be following the news around Digital Realty Trust you may have seen the recent announcement of our Pod Architecture Services (PAS) offering. Although the response has been deafening, there seem to be a lot of questions and confusion around what it is, what it is not, and what this ultimately means for Digital Realty Trust and our place in the industry.
First, a simple observation: the Data Center Industry as it stands today is in actuality an industry of cottage industries. It’s an industry dominated by boutique firms in specialized niches, all in support of the building out of these large, technically complex facilities. For the initiated it’s a world full of religious arguments like battery versus rotary, air-side economization versus water-side economization, raised floor versus no raised floor. To the uninitiated it’s an industry characterized by mysterious wizards of calculus and fluid dynamics and magical electrical energies. It’s an illusion the wizards of the collective cottage industries are well paid and incented to keep up. They ply their trade in ensuring that each facility’s creation is a one-off event, and likewise, so is the next one. It’s a world of competing General Contractors, architecture firms, competing electrical and mechanical firms, of specialists in all sizes, shapes and colors. Ultimately, in my mind, there is absolutely nothing wrong with this. Everyone has the right to earn a buck no matter how inefficient the process.
After all, there is a real science to most of the technologies and application of design involved in data centers, and the magical mysteries they control are real. They are all highly trained professionals educated in some very technical areas. But if we are honest, each generally looks at the data center world from their own corner of the eco-system, and while they solve their own issues and challenges quite handily, they stumble when having to get out of their comfort zone. When they need to cooperate with other partners in the eco-system and solve more far-reaching issues, it almost universally results in those solutions being applied in a one-off or per-job perspective. I can tell you that leveraging consistency across a large construction program is difficult at best even with multiple projects underway, let alone a single project.
The challenge of course is that in reality the end-user/owner/purchaser does not view the data center as an assembly of different components but rather as a single data center facility. The complexities in design and construction are must-have headaches for the business manager who ultimately just wants to sate his need to have capacity for some application or business solution.
Now into that background enter our Pod Architecture Services offering. In a nutshell, it allows those customers who do not necessarily want to lease a facility (or cannot, due to regulatory or statutory reasons) to use their own capital in building a facility without all the complexity associated with these buildings.
Our PAS offering is ultimately a way for a company to purchase a data center product. Leveraging Digital’s Data Center product approach, a company can simply select the size and configuration of their facility using the same “SKUs” we use internally in building out our own facilities. In effect we license the use of our design to these customers so that they enjoy the same benefits as the customers of our turn-key data center facilities.
This means that customers of our PAS product can leverage our specific designs, which are optimized across the four most crucial aspects of the data center lifecycle. Our facility design is optimized around cost, mission, long term simplicity in operability, and efficiency. This is anchored around a belief that the cost of a design is comprised of both upfront first-time costs and the lifetime costs of the facility. This is an “owner’s” perspective, which is the only perspective we have. As the world’s largest data center REIT and wholesaler, we need to take the full picture into account. We will own these facilities for a very long time.
Many designs like to optimize around the technology, or around the upfront facility costs, or drive significant complexity in design to ensure that every possible corner case is covered in the facility. But the fact is, if you cut corners up front, you potentially pay for it over the life of the asset. If you look for the most technologically advanced gear, or provide for lots of levers, knobs, and buttons for the ultimate flexibility, you open yourself up to more human error or drive more costs in the ongoing operation of the facility. The owner’s perspective is incredibly important. Many times companies allow these decisions to be made by their partner firms (the cottage industries) and this view gets clouded. Given the complexity of these buildings and the fact that they are not built often by the customers in the first place, it’s hard to maintain that owner’s perspective without dedicated and vigilant attention. PAS changes all that, as the designs have already been optimized and companies can simply purchase the product they most desire with the guarantee that what they receive on the other end of the process is what they expected.
Moreover, PAS includes the watchful eye, experience and oversight of our veteran data center construction management. These are highly qualified program managers who have built tens of data centers and act as highly experienced owner’s representatives. The additional benefit is that they have built this product multiple times and have become very good at the delivery of these types of facilities based upon our standardized product. In addition, our customers can leverage our significant supply chain of construction partners, parts and equipment, which allows for incredible benefits in the speed of delivery of these buildings along with reductions in upfront costs due to our volume purchasing.
This does not mean that Digital is going to become an architectural or engineering firm and stamp drawings. This does not mean we will become the general contractor. This simply means that we will leverage our supply chain to deliver our designs and facilities based upon our best practices on behalf of the customers, in a process that is completely managed and delivered by experienced professionals. We will still have general contractors, and A&E firms, and the like that have experience in building our standardized product. We are driving towards standardization. If you believe there is value in having each facility as a one-off design, more power to you. We have a service there too; it’s called Build to Suit. PAS is just another step in the formal definition of a standard data center product. It’s a key element in modularization of capacity. It is through standardization that we can begin to have a larger impact on efficiency and other key “Green” initiatives.
I have often referred to my belief that data centers are simply the sub-stations of the Information Utility. This service allows for commercial companies to start benefitting from the same standardization and efficiency gains that we are making in the wholesale space and enjoy the same cost factors.
Hope that makes things a bit clearer!
Most people think of ‘the cloud’ as a technical place defined by technology, the innovation of software leveraged across a scale of immense proportions, and ultimately a belief that its decisions are guided by some kind of altruistic technical meritocracy. At some levels that is true; on others, one needs to remember that the ‘cloud’ is ultimately a business. Whether you are talking about the Google cloud, the Microsoft cloud, Amazon Cloud, or Tom and Harry’s Cloud Emporium, each is a business that ultimately wants to make money. It never ceases to amaze me that in a perfectly solid technical or business conversation around the cloud people will begin to wax romantic and lose sight of common sense. These are very smart technical or business savvy people, but for some reason the concept of the cloud has been romanticized into something almost philosophical, a belief system, something that actually takes on the wispy characteristics that the term conjures up.
When you try to bring them down to the reality that the cloud is essentially large industrial buildings full of computers, running applications that have achieved regional or even global geo-diversity and redundancy, you place yourself in a tricky spot that at best labels you a kill-joy and at worst a Blasphemer.
I have been reminded of late of a topic that I have been meaning to write about. As defined by my introduction above, some may find it profane, others will choose to ignore it as it will cause them to come crashing to the ground. I am talking about the unseemly and terribly disjointed intersection of Government regulation, Taxes, and the Cloud. This also loops in “the privacy debate” which is a separate conversation almost all to itself. I hope to touch on privacy but only as it touches these other aspects.
As many of you know, my roles past and present have focused around the actual technical delivery and execution of the cloud. The place where pure software developers fear to tread. The world of large scale design, construction and operations specifically targeted at a global infrastructure deployment and its continued existence into perpetuity. Perpetuity, you say? That sounds a bit too grandiose, doesn’t it? My take is that once you have this kind of infrastructure deployed it will become an integral part of how we as a species will continue to evolve in our communications and our technological advances. Something this cool is powerful. Something this cool is a game changer. Something this cool will never escape the watchful eyes of the world governments, and in fact it hasn’t.
There was a recent article at Data Center Knowledge regarding Microsoft’s decision to move its Azure Cloud platform out of the State of Washington and relocate it (whether virtually or physically) to be run in the state of Texas. Other articles have highlighted similar conversations with Yahoo and the state of Washington, or Google and the state of North Carolina. These decisions all have to do with state level taxes and their potential impact on the upfront capital costs or long term operating costs of the cloud. You are essentially seeing the beginning of a cat and mouse game that will last for some time on a global basis. States and governments are currently using their blunt, imprecise instruments of rule (regulations and taxes) to try and regulate something they do not yet understand but know they need to play a part in. It’s no secret that technology is advancing faster than our society can gauge its overall impact or its potential effects, and the cloud is no different.
In my career I have been responsible for the creation of at least 3 different site selection programs. These programs established the criteria and decisions for where cloud and data center infrastructure would reside. Through example and practice, I have been able to deconstruct competitors’ criteria and their relative weightings, at least in comparison to my own, and a couple of things jump out very quickly at anyone truly studying this space. While most people can guess the need for adequate power and communications infrastructure, many are surprised that tax and regulation play such a significant role in even the initial siting of a facility. The reason is pure economics over the total lifetime of an installation.
I cannot tell you how often I have economic development councils or business development firms come to me to tell me about the next ‘great’ data center location. It is rich in both power infrastructure and telecommunications, close to institutions of higher learning, and so on. Indeed there are some really great places that would seem ideal for data centers if one allowed them to dwell in the “romanticized cloud”. What they fail to note or understand is that there may be legislation or regulation already on the books, or perhaps legislation currently winding its way through the system, that could make it an inhospitable place or at least slightly less welcoming to a data center. As someone responsible for tens of millions, hundreds of millions, or even billions of dollars’ worth of investment, you find yourself in a role where you are reading and researching legislation often. Many have noted my commentary on the Carbon Reduction Commitment in the UK, or my most recent talks about the current progress and data center impacts of the Waxman-Markey bill in the US House of Representatives. You pay attention because you have to pay attention. Your initial site selection is supremely important because you not only need to look for the “easy stuff” like power and fiber, but you need to look longer term; you need to look at the overall commitment of a region or an area to support this kind of infrastructure. Very large business decisions are being made against these “bets” so you had better get them right.
To be fair, the management infrastructure in many of these cloud companies is learning as it goes as well. Most of these firms are software companies who have now been presented with the dilemma of managing large scale capital assets. It’s no longer about Intellectual Property, it’s about physical property, and there are some significant learning curves associated with that. Add to the mix that this whole cloud thing is something entirely new.
One must also keep in mind that even with the best site selection program and the most robust up front due diligence, people change, governments change, rules change, and when that happens it can and will have very large impacts on the cloud. This is not something cloud providers are ignoring either. Whether it’s through their software, through infrastructure, or through a more modular approach, they are trying to solve for the eventuality that things will change. Think about the potential impacts from a business perspective.
Let’s pretend you own a cloud and have just sunk 100M dollars into a facility to house part of your cloud infrastructure. You spent lots of money in your site selection and up front due diligence to find the very best place to put a data center. After 5 years you have a healthy population of servers in that facility and you have found a model to monetize your service, so things are going great. But then the locale where your data center lives changes the game a bit. They pass a law that states that servers engaged in the delivery of a service are a taxable entity. Suddenly that place becomes very inhospitable to your business model. You now have to worry about what that does to your business. It could be quite disastrous. Additionally, if you determine that such a law would instantly impact your business negatively, you have the small matter of a 100M asset sitting in a region where you cannot use it. Again, a very bad situation. So how do you architect around this? It’s a challenge that many people are trying to solve. Whether you want to face it or not, the ‘Cloud’ will ultimately need to be mobile in its design. Just like its vapory cousins in the sky, the cloud will need to be on the move, even if it’s a slow move. Because just as there are forces looking to regulate and control the cloud, there are also forces in play where locales are interested in attracting and cultivating the cloud. It will be a cycle that repeats itself over and over again.
So far we have looked at this mostly from a taxation perspective. But there are other regulatory forces in play. I will use the example of Canada, the friendly frosty neighbors in the great white north of the United States. It’s safe to say that Canada and the US have had historically wonderful relations with one another. However, when one looks through the ‘Cloud’ colored looking glass, there are some things that jump out to the fore.
In response to the Patriot Act legislation after 9-11, the Canadian government became concerned with the rights given to the US government with regards to the seizure of online information. They in turn passed a series of Safe-Harbor-like laws that stated that no personally identifiable information of Canadian citizens could be housed outside of the Canadian borders. Other countries have done, or are in process with, similar laws. This means that at least some aspects of the cloud will need to be anchored regionally or within specific countries. A boat can drift even if it’s anchored, and so must components of the cloud; its infrastructure and design will need to accommodate this. This touches on the privacy issue I talked about before. I don’t want to get into the more esoteric conversations of Information and where it’s allowed to live and not live. I try to stay grounded in the fact that whether my romantic friends like it or not, this type of thing is going to happen and the cloud will need to adapt.
It’s important to note that none of the legislation focuses on ‘the cloud’ or ‘data centers’ just yet. Just as the Waxman-Markey bill or CRC in the UK doesn’t specifically call out data centers, those laws will have significant impacts on the infrastructure and shape of the cloud itself.
There is an interesting chess board developing between technology and regulation. They are inextricably intertwined with one another and each will shape the form of the other in many ways. A giant cat and mouse game on a global level. Almost certainly, this evolution won’t be the most “technically superior” solution. In fact, these complications will make the cloud a confusing place at times. If you desired to build your own application using only cloud technology, would you subscribe to a service to allow the cloud providers to handle these complications? Would you and your application be liable for regulatory failures in the storage of data on Azerbaijani nationals? It’s going to be an interesting time for the cloud moving forward.
One can easily imagine personally identifiable information housed in countries of origin, but the technology evolving so that users’ actions on the web are held elsewhere, perhaps even regionally where the actions take place. You would see new legislation emerging to potentially combat even that strategy, and so the cycle will continue. Likewise you might see certain types of load, compute or transaction work moving around the planet to align with more technically savvy or advantageous locales. Just as the concept of Follow the Moon has emerged as a potential energy savings strategy to move load around based on the lowest cost energy, it might someday be followed by a program to similarly move information or work to more “friendly” locales. The modularity movement of data center design will likely grow as well, trying to reduce the overall exposure the cloud firms have in any given market or region.
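To make the idea of placing work by both energy cost and regulatory “friendliness” a little more concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration: the site names, country codes, prices, and the `Site` and `pick_site` names are assumptions, not a description of how any cloud provider actually places work.

```python
from dataclasses import dataclass
from typing import Optional, Set, List


@dataclass
class Site:
    """Hypothetical data center site: where it is and what energy costs there."""
    name: str
    country: str            # country code of the site (illustrative)
    energy_cost: float      # assumed price in dollars per kWh


def pick_site(sites: List[Site], must_stay_in: Optional[Set[str]]) -> str:
    """Pick the cheapest site that the workload is allowed to run in.

    must_stay_in is the set of countries the data may legally reside in,
    or None when the workload carries no residency restriction.
    """
    eligible = [
        s for s in sites
        if must_stay_in is None or s.country in must_stay_in
    ]
    if not eligible:
        raise LookupError("no site satisfies the residency restriction")
    # "Follow the Moon": among the permitted sites, chase the lowest energy cost.
    return min(eligible, key=lambda s: s.energy_cost).name


# Illustrative sites only; the numbers are made up.
sites = [
    Site("site-us", "US", 0.06),
    Site("site-ca", "CA", 0.05),
    Site("site-ie", "IE", 0.09),
]
unrestricted = pick_site(sites, None)
us_only = pick_site(sites, {"US"})
```

The point of the sketch is that the regulatory filter runs first and the economic choice second: a residency rule can force work onto a more expensive site no matter how attractive the cheapest one is.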
On this last note, I am reminded of one of my previous posts. I am firm in my belief that Data Centers will ultimately become the Sub-Stations of the information utility. In that evolution they will become more industrial, more commoditized, with more intelligence at the software layer to account for all these complexities. As my own thoughts and views evolve around this I have come to my own strange epiphany.
Ultimately the large cloud providers should care less and less about the data centers they live in. These will be software layer attributes to program against. Business level modifiers on code distribution. Data Centers should be immaterial components for the Cloud providers. Nothing more than containers or folders in which to drop their operational code. Today they are burning through tremendous amounts of capital believing that these facilities will ultimately give them strategic advantage. Ultimately these advantages will be fleeting and short-lived. They will soon find themselves in a place where these facilities themselves will become a drag on their balance sheets or cause them to invest more in these aging assets.
Please don’t get me wrong, the cloud providers have been instrumental in pushing this lethargic industry into thinking differently and evolving. For that you need to give them appropriate accolades. At some point however, this is bound to turn into a losing proposition for them.
How’s that for Blasphemy?
I was very pleased at the great many responses to my data center capacity planning chat. They came in both public and private notes with more than a healthy population of those centered around my comments on power capping and their potential disagreement on why I don’t think the technology/applications/functionality is 100% there yet. So I decided to throw up an impromptu ad-hoc follow-on chat on Power Capping. How’s that for service?
What’s your perspective?
In a nutshell, my resistance can be summed up and defined in the exploration of two phrases. The first is ‘prime time’ and how I define it from where I come at the problem. The second is the definition of the term ‘data center’ and in what context I am using it as it relates to Power Capping.
I think to adequately address my position I will answer it from the perspective of the three groups that these Chiller Side Chats are aimed at, namely the Facility side, the IT side, and ultimately the business side of the problem.
Let’s start with the latter phrase, ‘data center’, first. To the facility manager this term refers to the actual building, room, and infrastructure that IT gear sits in. His definition of Data Center includes things like remote power panels, power whips, power distribution units, Computer Room Air Handlers (CRAHs), generators, and cooling towers. It all revolves around the distribution and management of power.
From an IT perspective the term is usually represented or thought of in terms of servers, applications, or network capabilities. It sometimes blends in to include some aspects of the facility definition, but only as it relates to servers and equipment. I have even heard it applied to “information”, which is even more ethereal. Its base units could be servers, storage capacity, network capacity and the like.
From a business perspective the term ‘data center’ is usually lumped together to include both IT and facilities, but at a very high level. Where the currency for our previous two groups is technical in nature (power, servers, storage, etc), the currency for the business side is cold hard cash. It involves things like OPEX costs, CAPEX costs, and Return on Investment.
So from the very start, one has to ask, which data center are you referring to? Power Capping is a technical issue, and can be implemented from either of the two technical perspectives. It will also have an impact on the business side, but the business side can also be a barrier to adoption.
We believe these truths to be self-evident
Here are some of the things that I believe to be inalienable truths about data centers today and, for some of these, probably forever if history is any indication.
These will be important in a second so mark this spot on the page as it ties into my thoughts on the definition of prime time. You see, to me in this context, Prime Time means that when a solution is deployed it will actually solve problems and reduce the number of things a Data Center Manager has to do or worry about. This is important because notice I did not say anything about making something easier. Sometimes, easier doesn’t solve the problem.
There is some really incredible work going on at some of the server manufacturers in the area of power capping. After all, they know their products better than anyone. For gratuitous purposes, because he posts and comments here, I refer you to the Eye on Blades blog at HP by Tony Harvey. In his post responding to the previous Chiller-side chat, he talked up the amazing work that HP is doing, which is already available on some G5 boxes and all G6 boxes, along with additional functionality available in the blade enclosures.
Most of the manufacturers are doing a great job here. The dynamic load stuff is incredibly cool as well. However, the business side of my brain requires that I state that this level of super-cool wizardry usually comes at additional cost. Let’s compare that with Howard, the everyday data center manager who does it today and who, from a business perspective, is a sunk cost. It’s essentially free. Additionally, simple things like performing an SNMP poll for power draw on a box (which used to be available in some server products for free) have been removed or can only be accessed through additional operating licenses. Read as more cost. So the average business is faced with getting this capability for servers at an additional cost, or making Howard the Data Center manager do it for free, knowing that his general fear of losing his job if things blow up is a good incentive for doing it right.
Aside from that, it still has challenges in Truth #2. Extremely rare is the data center that uses only one server manufacturer. While it’s the dream of most server manufacturers, it’s more common to find DELL Servers alongside HP Servers, alongside Rackable. Add to that the fact that even in the same family you are likely to see multiple generations of gear. Does the business have to buy into the proprietary solutions of each to get the functionality they need for power capping? Is there an industry standard in Power Capping that ensures we can all live in peace and harmony? No. Again that pesky business part of my mind says, cost-cost-cost. Hey Howard, go do your normal manual thing.
Now let’s tackle Truth #3 from a power capping perspective. Solving the problem from the server side is only solving part of the problem. How many network gear manufacturers have power capping features? You would be hard pressed to find enough to fill one hand. In a related thought, one of the standard connectivity trends in the industry is top of rack switching. Essentially, for purposes of distribution, a network switch is placed at the top of the rack to handle server connectivity to the network. Does our proprietary power capping software catch the power draw of that switch? Any network gear for that matter? Doubtful. So while I may have super cool power capping on my servers, I am still screwed at the rack layer, which data center managers manage from as one of their base units. While Howard may be able to have some level of surety that his proprietary server power capping stuff is humming along swimmingly, he still has to do the work manually. It’s definitely simpler for Howard to get that task done, and potentially quicker, but we have not actually reduced steps in the process. Howard is still manually walking the floor.
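The rack-level gap can be shown with a little arithmetic. The following Python sketch is only an illustration: the device names, nameplate wattages, cap values, and function names are all assumptions, not figures from any vendor. It computes the worst-case draw of a rack in which the servers are capped but the top of rack switch is not.

```python
from typing import List, Optional, Tuple

# Each device is (name, nameplate_watts, cap_watts or None if it cannot be capped).
Device = Tuple[str, float, Optional[float]]


def worst_case_rack_watts(devices: List[Device]) -> float:
    """Worst-case rack draw: capped gear counts at its cap, everything else at nameplate."""
    total = 0.0
    for _name, nameplate, cap in devices:
        total += nameplate if cap is None else min(nameplate, cap)
    return total


def uncapped_devices(devices: List[Device]) -> List[str]:
    """Names of the devices the capping software cannot see or control."""
    return [name for name, _nameplate, cap in devices if cap is None]


# Hypothetical rack: two capped servers plus an uncapped top of rack switch.
rack: List[Device] = [
    ("server-a", 450.0, 300.0),
    ("server-b", 450.0, 300.0),
    ("tor-switch", 200.0, None),
]
rack_watts = worst_case_rack_watts(rack)   # 300 + 300 + 200
blind_spots = uncapped_devices(rack)
```

In this made-up rack the server caps account for 600 of the 800 watts; the remaining 200 belong to gear the capping tool never touches, which is the part Howard still has to track by hand.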
Which brings up a good point, Howard the Data Center manager manages by his base unit of rack. In most data centers, racks can have different server manufacturers, different equipment types (servers, routers, switches, etc), and can even be of different sizes. While some manufacturers have built state of the art racks specific for their equipment it doesn’t solve the problem. We have now stumbled upon Truth #5.
Since we have been exploring how current power capping technologies meet at the intersection of IT and facilities it brings up the last point I will touch on regarding tools. I will get there by asking some basic questions as to the operations of a typical data center. In terms of Operations does your IT asset management system provide for racks as an item of configuration? Does your data center manager use the same system? Does your system provide for multiple power variables? does it track power at all? Does the rack have power configuration associated with it? Or does your version of Howard use spreadsheets? I know where my bet is on your answers. Tooling has a long way to go in this space. Facilities vendors are trying to approach it from their perspective, IT tools providers are doing the same, along with tools and mechanisms from equipment manufacturers as well. There are a few tools that have been custom developed to do this kind of thing, but they have been done for use in very specific environments. We have finally arrived at Power Capping and Truth #6.
Please don’t get me wrong, I think that ultimately power capping will finally fulfill its great promise and do tremendous wonders. Its one of those rare areas which will have a very big impact in this industry. If you have the ability to deploy the vendor specific solutions (which are indeed very good), you should. It will make things a bit easier, even if it doesn’t remove steps. However I think ultimately in order to have real effect its going to have to compete with the cost of free. Today this work is done by the data center managers with no apparent additional cost from a business perspective. If I had some kind of authority I would call for there to be a Standard to be put in place around Power Capping. Even if its quite minimal it would have a huge impact. It could be as simple as providing three things. First provide for free and unfiltered access to an SNMP Mib that allows access to the current power usage information of any IT related device. Second, provide a Mib, which through the use of a SET command could place a hard upper limit of power usage. This setting could be read by the box and/or the operating system and start to slow things down or starve resources on the box for a time. Lastly, the ability to read that same Mib. This would allow for the poor cheap Howard’s to take advantage of at least simplifying their environments. tremendously. It would still provide software and hardware manufacturers to build and charge for the additional and dynamic features they would require.
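The three operations proposed above (read current power, set a hard cap, read the cap back) can be sketched as a tiny interface. This is a minimal in-memory Python model under stated assumptions, not a real SNMP implementation: no such standard MIB is claimed to exist, and the `CappableDevice` class, the `cap_rack` helper, and all wattages are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CappableDevice:
    """Hypothetical stand-in for any IT device exposing the three proposed operations."""
    name: str
    demand_watts: float                 # what the device would draw if unconstrained
    cap_watts: Optional[float] = None   # hard upper limit; None means uncapped

    def get_power(self) -> float:
        """Operation 1: read current power usage (an SNMP GET in the proposal)."""
        if self.cap_watts is None:
            return self.demand_watts
        return min(self.demand_watts, self.cap_watts)

    def set_cap(self, watts: float) -> None:
        """Operation 2: place a hard upper limit (an SNMP SET in the proposal)."""
        if watts <= 0:
            raise ValueError("cap must be positive")
        self.cap_watts = watts

    def get_cap(self) -> Optional[float]:
        """Operation 3: read the limit back (an SNMP GET in the proposal)."""
        return self.cap_watts


def cap_rack(devices: List[CappableDevice], rack_budget_watts: float) -> None:
    """Split a rack budget across devices in proportion to their demand."""
    total = sum(d.demand_watts for d in devices)
    if total <= rack_budget_watts:
        return  # already within budget; leave devices uncapped
    for d in devices:
        d.set_cap(rack_budget_watts * d.demand_watts / total)


# Mixed-vendor rack, including the switch, all speaking the same three operations.
rack = [
    CappableDevice("vendor-a-server", 400.0),
    CappableDevice("vendor-b-server", 300.0),
    CappableDevice("tor-switch", 100.0),
]
cap_rack(rack, 600.0)
rack_draw = sum(d.get_power() for d in rack)
```

Because every device answers the same three calls regardless of manufacturer, a rack-level budget becomes a few lines of script rather than a walk around the floor. The proportional split is just one simple policy chosen for the example; smarter dynamic policies are where vendors could still add value.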
I recently had a conversation with the CIO of a well-respected company who informed me that his “data center people” had completely mismanaged his data center space, which was now causing them to look at leasing additional capacity or more aggressively pursuing virtualization to solve the problem. Furthermore, he was powerless to drive and address change, as the data center facilities people worked for a different organization. To top it off, it frustrated him to no end that in his mind they simply did not understand the IT equipment or technologies being deployed. Unfortunately, it’s a common refrain that I hear over and over again. It speaks to the heart of the problems with understanding data center issues in the industry.
How Data Center Capacity Planning Really Works!
Data Center managers are by their very nature extremely conservative people. At the root of this conservatism is the understanding that if and when a facility goes down, it is their bottoms on the bottom line. As such, risk takers are very few and far between in this industry. I don’t think I would get much argument from most business-side managers, who would readily agree to that in a heartbeat. But before we hang the albatross around the neck of our facilities management brethren, let’s take a look at some of the challenges they actually have.
Data Center Capacity Planning is a swirling vortex of science, art, best guess-work, and bad information with a sprinkling of cult of personality for taste. One would think that it should be a straight numbers and math game, but it’s not. First and foremost, the currency of Data Center Capacity Management and Planning is power. Simple, right? Well, we shall see.
Let’s start at a blissful moment in time, when the facility is shiny and new. The floor shines from its first cleaning, the VESDA (Very Early Smoke Detection Apparatus) equipment has not yet begun to throw off false positives, and all is right with the world. The equipment has been fully commissioned and is now ready to address the needs of the business.
Our Data Center Manager is full of hope and optimism. He or she is confident that this time it will be much different than the legacy problems they had to deal with before. They now have the perfect mix of power and cooling to handle any challenge to be thrown at them. They are then approached by their good friends in Information Services with their first requirement. The business has decided to adopt a new application platform which will of course solve all the evils of previous installations.
It’s a brand new day, a new beginning. The Data Center Manager asks the IT personnel how many servers are associated with this new deployment. They also ask how much power those servers will draw so that the room can be optimized for this wonderful new solution. The IT personnel may be using consultants, or maybe they are providing the server specifications themselves. In advanced cases they may even have standardized the types and classes of servers they use. How much power? Well, the nameplate on the server says that each of these bit-crunching wonders will draw 300 watts apiece. As this application is bound to be a huge draw on resources, they inform the facilities team that approximately 20 machines at 300 watts each are going to be deployed.
The facilities team knows that no machine ever draws its nameplate rating once ‘in the wilds’ of the data center, and therefore for capacity planning purposes they ‘manually’ calculate a 30% reduction into the server deployment numbers. You see, it’s not that they don’t trust the IT folks, it’s just that they generally know better. That nets out to a 90 watt reduction per server, bringing the “budgeted power allocation” down to 210 watts per server. This is an important number to keep in mind. You now have two ratings that you have to deal with: Nameplate and Budgeted. Advanced data center users may use even more scientific methods of testing to derive their budgeted amount. For example, they may run test software on the server designed to drive the machine to 100% CPU utilization, 100% disk utilization, and the like. Interestingly, even after these rigorous tests, the machine never gets close to nameplate. Makes you wonder what that rating is even good for, doesn’t it? Our data center manager doesn’t have that level of sophistication, so he is using a 30% reduction. Keep these numbers in mind as we move forward.
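For anyone who wants to follow the arithmetic, here is a small Python sketch of the nameplate-to-budgeted calculation. The function and variable names are mine; the 300 watts, 30%, and 20 servers come straight from the story above.

```python
def budgeted_watts(nameplate_watts: int, derate_percent: int) -> int:
    """Apply a flat percentage reduction to a nameplate rating (whole watts)."""
    return nameplate_watts * (100 - derate_percent) // 100


NAMEPLATE_WATTS = 300   # what the label on the server says
DERATE_PERCENT = 30     # the facilities team's rule of thumb
SERVER_COUNT = 20

per_server = budgeted_watts(NAMEPLATE_WATTS, DERATE_PERCENT)
nameplate_total = NAMEPLATE_WATTS * SERVER_COUNT
budgeted_total = per_server * SERVER_COUNT

print(per_server)        # 210 watts budgeted per server
print(nameplate_total)   # 6000 watts if you believed the nameplate
print(budgeted_total)    # 4200 watts actually budgeted
```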
The next question is typically: are these servers dual or single corded? Essentially, will these machines have redundancy built into the power supplies so that in the event of a power loss they might still operate through another source of electricity? Well, as every good business manager, IT professional, and data center manager knows, this is a good thing. Sure, let’s make them dual corded.
The data center manager, now well armed with information, begins building out the racks and pulls the power whips from diverse PDUs (power distribution units) to the location of those racks to ensure that the wonders of dual cording can come to full effect.
The servers arrive, they are installed, and in a matter of days the new application is humming along just fine, running into all kinds of user adoption issues, unexpected hiccups, budget overruns, etc. Okay, maybe I am being a bit sarcastic and jaded there, but I think it works for many installations. All in all a successful project, right? I say sure. But do all parties agree today? Tomorrow? Three years from now?
Let’s break this down a bit more on the data center side. The data center manager has to allocate power out for the deployment. He has already de-rated the server draw, but there is a certain minimum amount of infrastructure he has to deploy regardless. The power being pulled from those PDUs is taking up valuable slots inside that equipment. Think of your stereo equipment at home: there are only so many speakers you can connect to your base unit no matter how loud you want it to get. The data center manager had to make certain decisions based upon the rack configuration. If we believe that they can squeeze 10 of these new servers into a rack, the data center manager has pulled enough capacity to address 2.1 kilowatts per rack (210 watts × 10 servers). With twenty total servers that means he has two racks of 2.1 kilowatts of base load each. Sounds easy, right? It’s just math. And Mike, you said it was harder than regular math. You lied. Did I? Well, it turns out that physics is physics and, as Scotty from the Enterprise taught us, “You cannot change the laws of Physics, Jim!” It’s likely that the power capacity being allocated to the rack might actually be a bit over the 2.1 kilowatts due to the size of the circuits that might be required. For example, he or she may have only needed 32 amps of power, but because of those pesky connections had to pull two 20 amp circuits. Let’s say for the sake of argument that in this case he has to reserve 2.5 kilowatts as a function of the physical infrastructure requirements. You start to see a little waste, right? It’s a little more than one server’s expected draw, so you might think it’s not terribly bad. As a business manager, you’re frustrated with that waste but you might be OK with it, especially since it’s a new facility and you have plenty of capacity.
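Here is the same rack math as a short Python sketch. The names are mine, and the 2.5 kilowatt allocation is simply taken from the “for the sake of argument” figure above rather than derived from circuit sizes, since that depends on voltages and breaker ratings the story does not give.

```python
import math

SERVER_COUNT = 20
SERVERS_PER_RACK = 10
BUDGETED_WATTS_PER_SERVER = 210
ALLOCATED_WATTS_PER_RACK = 2500   # what the physical circuits force us to set aside

racks_needed = math.ceil(SERVER_COUNT / SERVERS_PER_RACK)
budgeted_per_rack = BUDGETED_WATTS_PER_SERVER * SERVERS_PER_RACK
stranded_per_rack = ALLOCATED_WATTS_PER_RACK - budgeted_per_rack

print(racks_needed)        # 2 racks
print(budgeted_per_rack)   # 2100 watts of budgeted load per rack
print(stranded_per_rack)   # 400 watts per rack we pay for but never plan to use
```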
But wait! Remember that dual cording thing? Now you have to double the power you are setting aside. You have to ensure that you have enough power to maintain the servers. Usually this comes from another PDU so that you can survive a single PDU failure. Additionally, you need to make sure that each side (each cord) has enough power reserved to handle a failover. In some cases the total load of the server is divided between the two power supplies; in others, power is drawn from the primary with a small trickle of draw from the redundant connection. If the load is divided between both power supplies, you are effectively drawing HALF of the total reserved power. If it’s the situation where they draw full load off one and have a trickle draw off the second power supply, you are actually drawing the correct amount on one leg and dramatically less than HALF on the second. Either way the power is allocated and reserved, and I bet it’s more than you thought when we started out this crazy story. Well, hold tight, because it’s going to get even more complicated in a bit.
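A rough Python sketch of what dual cording does to the numbers follows. The 2.5 kilowatts per cord and the 2.1 kilowatt rack load come from the story; the 50 watt trickle figure and all of the names are assumptions of mine, purely for illustration.

```python
from typing import Tuple

ALLOCATED_PER_CORD_WATTS = 2500   # each cord needs the full rack allocation
CORDS_PER_RACK = 2
RACKS = 2
RACK_LOAD_WATTS = 2100            # budgeted load of one rack
TRICKLE_WATTS = 50                # assumed idle draw on a standby supply

reserved_per_rack = ALLOCATED_PER_CORD_WATTS * CORDS_PER_RACK
reserved_total = reserved_per_rack * RACKS


def cord_draw(load_watts: int, shared: bool) -> Tuple[int, int]:
    """Return (primary, secondary) draw for one rack.

    shared=True  : the load is split evenly across both power supplies.
    shared=False : the primary carries the load, the secondary only trickles.
    """
    if shared:
        half = load_watts // 2
        return (load_watts - half, half)
    return (load_watts, TRICKLE_WATTS)


print(reserved_per_rack)                 # 5000 watts set aside per rack
print(reserved_total)                    # 10000 watts set aside for two racks
print(cord_draw(RACK_LOAD_WATTS, True))  # (1050, 1050)
print(cord_draw(RACK_LOAD_WATTS, False)) # (2100, 50)
```

In both cases the reservation is identical; only the way the real draw lands on each cord changes.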
Right now, any data center manager in the world is reading this and screaming at the remedial nature of this post. This is Facilities Management 101, right after the seminar entitled ‘The EPO button is rarely your friend’. In fact, I am probably insulting their intelligence because there are even more subtleties than what I have outlined here. But my dear facilities comrade, it’s not you I am writing this section for. It’s the business management and IT folks. With physics being its pesky self, combined with some business decisions, you are effectively taking down more power than you initially thought. Additionally, you now have a tax on future capacity, as that load associated with physics and redundancy is forever in reserve, not to be touched without great effort, if at all.
Dual Cording is not bad. Redundancy is not bad. It’s a business risk, and that’s something you can understand. In fact, as a business manager it’s something I would be willing to bet you deal with every day in your job. You are weighing the impact of an outage on the business against actual cost. One can even easily calculate the cost of such a decision by taking a proportional allocation of your capital cost from an infrastructure perspective and weighing it against the economic impact of not having certain tools and applications available. Even when this is done and well understood, there is a strange phenomenon of amnesia that sets in, and in a few months or years the same business manager may look at the facility and give the facilities person a hard time for not utilizing all the power. To Data Center Managers: having sat as a Data Center Professional for many years, I’m sad to say you can expect to have this “Reserved” power conversation with your manager over and over again, especially when things get tight in terms of capacity left. To business managers: bookmark this post and read it about every 6 months or so.
Taking it up a notch for a moment…
That last section introduced the concept of Reserved Power. Reserved Power is a concept that sits at the Facility level of Capacity Planning. When a data center hall is first built out there are three terms and concepts you need to know. The first is Critical load (sometimes called IT load). This is the power available to IT and computer equipment in your facility. The second is called Non-Critical load, which has to do with the amount of power allocated to things like lighting, mechanical systems, your electrical plant, generators, etc. This is what I commonly call ‘Back of the house’ or ‘Big Iron’. The last term is Total load. Total load is the total amount of power available to the facility and can usually be calculated by adding Critical and Non-Critical loads.
A Facility is born with all three facets. You generally cannot have one without the others. I plan on having a future post called ‘Data Center Metrics for Dummies’ which will explore the interconnection between these. For now let’s keep it really simple.
The wonderful facility we have built has a certain amount of IT gear that it will hold. Essentially every server we deploy into the facility will subtract from the total amount of Critical Load available for new deployments. As we deduct the power from the facility we are allocating that capacity out. In our previous example we deployed two racks at 2.5 kilowatts (and essentially reserved capacity for two more for redundancy). With those two racks we have allocated enough power for 5 kilowatts of real draw and have reserved 10 kilowatts in total.
Before people get all mad at me, I just want to point out that some people don’t count the dual cording extra because they essentially de-rate the room with the understanding that everything will be dual corded. I’m keeping it simple for people to understand what’s actually happening.
Ok, back to the show. As I mentioned, those racks would really only draw 2.1 kW each at full load (we have essentially stranded 400 watts per rack of potential capacity, and combined it’s almost 800 watts per rack). As a business we already knew this, but we still have to calculate it out and apply our “Budgeted Power” to the room level. So, across our two racks we have an allocated power of 5 kilowatts, with a budgeted amount of 4.2 kilowatts.
Now here is where our IT friends come into play and make things a bit more difficult. That wonderful application that was going to solve world hunger for us, and was going to be such a beefy application from a resources perspective, is not living up to its reputation. Instead of driving 100% utilization, it’s sitting down around 8 percent per box. In fact, the estimated worldwide utilization number for servers sits between 5 and 14 percent. Most server manufacturers have built their boxes in a way where they draw less power at lower utilizations. Therefore our 210 watts per server might be closer to 180 watts per server of “Actual Load”. That’s another 30 watts per server. So while we have allocated 2.5 kilowatts and budgeted 2.1 kilowatts per rack, we are only drawing 1.8 kilowatts of power. We have two racks, so we double it. Now we are in a situation where we are not using 700 watts per rack, or 1.4 kilowatts across our two racks. Ouch, that’s potentially 28% of power wasted!
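To keep the three numbers straight, here is the per-rack picture as a small Python sketch. All figures come from the story so far (2.5 kilowatts allocated, 210 watts budgeted and 180 watts actually drawn per server, 10 servers per rack, two racks); only the variable names are mine.

```python
SERVERS_PER_RACK = 10
RACKS = 2

ALLOCATED_PER_RACK_WATTS = 2500                   # what the circuits set aside
BUDGETED_PER_RACK_WATTS = 210 * SERVERS_PER_RACK  # after the 30% derate
ACTUAL_PER_RACK_WATTS = 180 * SERVERS_PER_RACK    # what the meters really show

unused_per_rack = ALLOCATED_PER_RACK_WATTS - ACTUAL_PER_RACK_WATTS
unused_total = unused_per_rack * RACKS
waste_percent = 100 * unused_per_rack // ALLOCATED_PER_RACK_WATTS

print(BUDGETED_PER_RACK_WATTS)  # 2100
print(ACTUAL_PER_RACK_WATTS)    # 1800
print(unused_per_rack)          # 700 watts idle per rack
print(unused_total)             # 1400 watts idle across both racks
print(waste_percent)            # 28 percent of the allocation
```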
The higher IT and applications drive the utilization rate, the less waste you will have. Virtualization can help here, but it’s not without its own challenges, as we will see.
Big Rocks Little Rocks…
Now luckily, as a breed, data center managers are a smart bunch and they have a variety of ways to try and reduce this waste. It goes back to our concept of budgeted or reserved power combined with our “stereo jacks” in the PDU. As long as we have some extra jacks, the Data Center Manager can return to our two racks and artificially set a lower power budget per rack. This time, after metering the power for some time, he makes the call to artificially limit each rack’s allocation to 2 kilowatts. He could go to 1.8 kilowatts, but remember he is conservative and still wants to give himself some cushion. He can then deploy new racks or cabinets and pretend that the extra 200 watts goes to the new racks. He can continue this until he runs out of power or out of slots on the PDU. This is a manual process that is physically managed by the facilities manager. There is an emerging technology called power capping which will allow you to do this in software on a server-to-server basis and which will be hugely impactful in our industry; it’s just not ready for prime time yet.
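The “until he runs out of power or out of slots” limit can be sketched in a few lines of Python. The PDU capacity, the slot count, and the two slots per rack are made-up numbers for illustration only; the per-rack caps of 2.5, 2.0, and 1.8 kilowatts are the ones from the story.

```python
def racks_supported(pdu_capacity_watts: int, pdu_slots: int,
                    rack_cap_watts: int, slots_per_rack: int) -> int:
    """Racks one PDU can feed: whichever runs out first, power or slots."""
    by_power = pdu_capacity_watts // rack_cap_watts
    by_slots = pdu_slots // slots_per_rack
    return min(by_power, by_slots)


PDU_CAPACITY_WATTS = 50_000   # assumed PDU capacity
PDU_SLOTS = 42                # assumed number of breaker positions ("stereo jacks")
SLOTS_PER_RACK = 2            # assumed: two circuits pulled to each rack

at_2500 = racks_supported(PDU_CAPACITY_WATTS, PDU_SLOTS, 2500, SLOTS_PER_RACK)
at_2000 = racks_supported(PDU_CAPACITY_WATTS, PDU_SLOTS, 2000, SLOTS_PER_RACK)
at_1800 = racks_supported(PDU_CAPACITY_WATTS, PDU_SLOTS, 1800, SLOTS_PER_RACK)

print(at_2500)  # 20 racks, limited by power
print(at_2000)  # 21 racks, now limited by slots
print(at_1800)  # 21 racks, squeezing harder gains nothing once slots run out
```

With these assumed numbers, lowering the cap to 2 kilowatts buys one more rack, and after that the jacks, not the watts, become the constraint.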
This inefficiency in allocation creates strange gaps and holes in data centers. It’s a phenomenon I call Big Rocks, Little Rocks, and like everything in this post it is somehow tied to physics.
In this case it was my freshman year physics class in college. The professor was at the front of the class with a bucket full of good-sized rocks. He politely asked if the bucket was full. The class of course responded in the affirmative. He then took out a smaller bucket of pebbles and poured, vigorously sifted, and shook the heck out of the larger bucket until every last pebble was emptied into that bucket with the big rocks. He asked again, “Now is it full?” The class responded once more in the affirmative and he pulled out a bucket of sand. He proceeded to repeat the sifting and shaking and emptied the sand into the bucket. “It’s finally full now, right?” The class nodded one last time in the affirmative, and he produced a small bucket of water and poured it into the bucket as well.
That ‘Is the bucket full’ exercise is a lot like the capacity planning that every data center manager eventually has to get very good at. Those holes in capacity at a rack level or PDU level I spoke about are the spaces for servers and equipment to ultimately fit into. At first it’s easy to fit in big rocks, then it gets harder and harder. You are ultimately left trying to manage to those small spaces of capacity, trying to utilize every last bit of energy in the facility.
This can be extremely frustrating to business managers and IT personnel. Let’s say you do a great job of informing the company how much capacity you actually have in your facility. If there is no knowledge of our “rocks” problem, you can easily get yourself into trouble.
Let’s go back for a second to our example facility. Time has now passed and our facility is nearly full. Out of the 1 MW of total capacity we have been very aggressive in managing our holes and still have 100 kilowatts of capacity. The IT personnel have a new, database-intensive application that will draw 80 kilowatts, and because the facility manager has done a good job of managing his facility, there is every expectation that it will be just fine. Until, of course, they mention that these servers have to be contiguous and close together for performance or even functionality purposes. The problem, of course, is that you now have a large rock that you need to try and squeeze into small rock places. It won’t work. It may even force you to either move other infrastructure around in your facility, impacting other applications and services, or get more data center space.
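A tiny Python sketch shows why 100 kilowatts of free capacity does not guarantee room for an 80 kilowatt deployment. The way the free capacity is split into holes below is invented for illustration; only the 100 and 80 kilowatt totals come from the example.

```python
from typing import List


def fits_in_total(free_holes_kw: List[int], demand_kw: int) -> bool:
    """The spreadsheet view: is there enough free power overall?"""
    return demand_kw <= sum(free_holes_kw)


def fits_contiguously(free_holes_kw: List[int], demand_kw: int) -> bool:
    """The floor view: is there a single hole big enough for the whole rock?"""
    return any(hole >= demand_kw for hole in free_holes_kw)


# Assumed fragmentation: free capacity scattered across five areas of the hall.
free_holes_kw = [30, 25, 20, 15, 10]   # adds up to the 100 kW left in the facility
new_application_kw = 80

print(sum(free_holes_kw))                                    # 100
print(fits_in_total(free_holes_kw, new_application_kw))      # True
print(fits_contiguously(free_holes_kw, new_application_kw))  # False
```

The first answer is the one the business hears; the second is the one the data center manager has to live with.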
You see, the ‘Is it full’ exercise does not work in reverse. You cannot fill a bucket with water, then sand, then pebbles, then rocks. Again, lack of understanding can lead to ill will or the perception that the data center manager is not doing a good job of managing the facility when in fact they are being very aggressive in that management. It’s something the business side and IT side should understand.
Virtualization Promise and Pitfalls…
Virtualization is a topic unto itself that is pretty vast and interesting, but I did want to point out some key things to think about. As you hopefully saw, server utilization has a huge impact on power draw. The higher the utilization, the better the performance from a power perspective. Additionally, many server manufacturers have certain power ramps built into their equipment, where you might see a disproportionately large jump in power consumption going from 11 percent to 12 percent utilization, for example. It has to do with the throttling of power consumption I mentioned above. This is a topic that most facility managers have no experience with or knowledge of, as it has more to do with server design and performance. If your facility manager is aggressively managing your facility as in the example above and virtualization is introduced, you might find yourself tripping circuits as you drive the utilization higher and it crosses these internal utilization thresholds. HP has a good paper talking about how this works. If you pay particular attention to page 14, the lower line is the throttled processor as a function of utilization. The upper line is full speed as a function of utilization, and their dynamic power regulation feature is the one that jumps up to full speed at 60% utilization. This gives the box performance only at high utilizations. It’s a feature that is turned on by default in HP servers. Other manufacturers have similar technologies built into their products as well. Typically your Facilities people would not be reading such things. Therefore it’s imperative that virtualization and its impacts be something that the IT folks and Data Center managers work on jointly.
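Here is a deliberately oversimplified Python sketch of how crossing such a threshold can blow through an aggressively managed rack. It assumes a flat two-level model, where a server draws 180 watts below the 60% threshold and 210 watts at or above it; real servers ramp more gradually, and these wattages are borrowed from the earlier example rather than from any vendor data. The 2 kilowatt rack cap is the one our data center manager set.

```python
THRESHOLD_PERCENT = 60    # utilization at which the box jumps to full speed
THROTTLED_WATTS = 180     # assumed draw while throttled
FULL_SPEED_WATTS = 210    # assumed draw at full speed
SERVERS_PER_RACK = 10
RACK_CAP_WATTS = 2000     # the manually lowered rack allocation


def server_draw_watts(utilization_percent: int) -> int:
    """Two-level step model of server power versus utilization."""
    if not 0 <= utilization_percent <= 100:
        raise ValueError("utilization must be between 0 and 100")
    if utilization_percent >= THRESHOLD_PERCENT:
        return FULL_SPEED_WATTS
    return THROTTLED_WATTS


def rack_over_cap(utilization_percent: int) -> bool:
    """True if a rack of identical servers would exceed its cap."""
    rack_draw = SERVERS_PER_RACK * server_draw_watts(utilization_percent)
    return rack_draw > RACK_CAP_WATTS


before_virtualization = rack_over_cap(8)    # 1800 watts, under the cap
after_virtualization = rack_over_cap(65)    # 2100 watts, over the cap

print(before_virtualization)  # False
print(after_virtualization)   # True
```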
I hope this was at least partially valuable and explained some things that may have seemed like black-box or arcane data center challenges. Keep in mind that with this series I am trying to educate all sides on the challenges we are facing together.