IPv6 to IPv4 Translation Made Business Beautiful. Think an easier, less painful transition for your business and your Data Center.


I am a lover of simple, efficient, and beautiful things.  Ivan Pepelnjak of ipSpace gets The Loosebolt's Oscar Award for Elegance and Simplicity in a Complex Network Application.  There may not be a little statue holding up a giant router or anything, but his solution to IPv4 to IPv6 translation on the Internet is pretty compelling and allows the application developers and IT folks to "outsource" all concerns about this issue to the network.

At some point your Data Centers and network are going to have to tackle the interface between the commercial IPv4 Internet and the IPv6 Internet.  If you are pretty aggressive on the IPv6 conversion in your data center, that pesky IPv4 Internet is going to prove to be a problem.  Some think this can be handled by straight Network Address Translation, or by dual-homing the servers in your data center on both networks.  But this approach creates cascading challenges for your organization.  Essentially it creates work for your System Admins, your developers, your Web admins, and so on.  In short, you may have to figure out solutions at every level of the stack.   I think Ivan's approach makes it pretty simple and compelling, if a bit unorthodox.  His use of Stateless IP/ICMP Translation (SIIT), which was originally intended as a part of NAT64 and not used on its own, solves an interesting problem and allows businesses to begin the conversion one layer at a time while still allowing those non-adopting IPv4 folks access to all the goodness within your data center.
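To make "stateless" concrete: the translator maps addresses algorithmically rather than keeping per-connection state. Below is a minimal illustrative sketch in Python of that address mapping, assuming the RFC 6052 well-known prefix 64:ff9b::/96 (an operator could equally use a network-specific prefix; this is not Ivan's specific configuration, just the general idea):

```python
import ipaddress

# RFC 6052 well-known prefix used by SIIT/NAT64 to embed IPv4 addresses in IPv6.
WKP = ipaddress.IPv6Network("64:ff9b::/96")

def ipv4_to_ipv6(v4: str) -> ipaddress.IPv6Address:
    """Map an IPv4 address into the translation prefix (stateless and reversible)."""
    return ipaddress.IPv6Address(int(WKP.network_address) | int(ipaddress.IPv4Address(v4)))

def ipv6_to_ipv4(v6: str) -> ipaddress.IPv4Address:
    """Recover the original IPv4 address from its embedded form."""
    return ipaddress.IPv4Address(int(ipaddress.IPv6Address(v6)) & 0xFFFFFFFF)

print(ipv4_to_ipv6("192.0.2.1"))          # 64:ff9b::c000:201
print(ipv6_to_ipv4("64:ff9b::c000:201"))  # 192.0.2.1
```

Because the mapping is purely arithmetic, the translator needs no session table, which is what lets the network absorb the whole IPv4/IPv6 problem on behalf of the application teams.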

His webcast on his approach can be found here.

\Mm

Insider Redux: Data Barn in a Farm Town

I thought I would start my first post by addressing the second New York Times article first. Why? Because it specifically mentions activities and messages sourced from me at the time when I was responsible for running the Microsoft Data Center program. I will try to track the timeline mentioned in the article with my specific recollections of the events. As Paul Harvey used to say, so you can know "the REST of the STORY".

I remember my first visit to Quincy, Washington. It was a bit of a road trip for me and a few other key members of the Microsoft site selection team. We had visited a few of the local communities and power utility districts, doing our due diligence on the area at large. Our 'heat map' process had led us to Eastern Washington state. Not very far (just a few hours) from the 'mothership' of Redmond, Washington. It was a bit of a crow-eating exercise for me, as just a few weeks earlier I had proudly exclaimed that our next facility would not be located on the West Coast of the United States. We were developing an interesting site selection model that would categorize and weight areas around the world. It would take in FEMA disaster data, fault zones, airport and logistics information, location of fiber optic and carrier presence, workforce distributions, regulatory and tax data, water sources, and power. This was going to be the first real construction effort undertaken by Microsoft. The cost of power was definitely a factor, as the article calls out. But just as important was the generation mix of the power in the area, in this case a predominance of hydroelectric. Low to no carbon footprint (rivers, it turns out, actually give off carbon emissions, as I came to find out). Regardless, the generation mix was and would continue to be a hallmark of site selection for the program while I was there. The crow-eating exercise began when we realized that the 'greenest' area per our methodology was actually located in Eastern Washington along the Columbia River.

We had a series of meetings with Real Estate folks, the local Grant County PUD, and the Economic Development folks of the area. Back in those days the secrecy around who we were was paramount, so we kept our identities and that of our company secret. Like geeky secret agents on an information gathering mission, we would not answer questions about where we were from, who we were, or even our names. We 'hid' behind third party agents who took everyone's contact information and acted as brokers of information. Those were early days…the cloak and dagger would soon come out of the process as it became more advantageous to be known in tax negotiations with local and state governments.

During that trip we found the perfect parcel of land: 75 acres with great proximity to local substations, just down line from the dams on the nearby Columbia River. It was November 2005. As we left that day and headed back, it was clear that we felt we had found site selection gold. As we started to prepare a purchase offer we got wind that Yahoo! was planning on taking a trip out to the area as well. As the local folks seemingly thought that we were a bank or large financial institution, they wanted to let us know that someone on the Internet was interested in the area as well. This acted like a lightning rod and we raced back to the area and locked up the land before Yahoo had a chance to leave the Bay Area. In these early days the competition was fierce. I have tons of interesting tales of cloak and dagger intrigue between Google, Microsoft, and Yahoo. While it was work, there was definitely an air of something big on the horizon. That we were all at the beginning of something. In many ways the technology professionals involved, regardless of company, forged some deep relationships and rivalries with each other.

Manos on the bean field, December 2005

The article talks about how the 'Gee-Whiz moment faded pretty fast'. While I am sure that it faded in time (as all things do), I also seem to recall the huge increase in local business as thousands of construction workers descended upon this wonderful little town, the tours we would give local folks and city council dignitaries, and a spirit of truly working together. Then of course there was the ultimate reduction in property taxes resulting from even our first building, and an increase in home values to boot at the time. It's an oft-missed benefit that I am sure the town of Quincy and Grant County have continued to enjoy as the Data Center Cluster added Yahoo, Sabey, IAC, and others. I warmly remember the opening day ceremonies and ribbon cutting and a sense of pride that we did something good. Corny? Probably – but that was the feeling. There was no talk of generators. There were no picket signs; in fact the EPA of Washington state had no idea how to deal with a facility of this size and I remember openly working in partnership with them. That of course eventually wore off to the realities of life. We had a business to run, the city moved on, and concerns eventually arose.

The article calls out a showdown between Microsoft and the Power Utility District (PUD) over a fine for missing a capacity forecasting target. As this happened well after I left the company I cannot really comment on that specific matter. But I can see how that forecast could miss. Projecting power usage months ahead is more than a bit of science mixed with art. It gets into the complexity of understanding capacity planning in your data centers. How big will certain projects grow? Will they meet expectations or fall short? New product launches can be duds or massive successes. All of these things go into a model to try and forecast the growth. If you think this is easy, I would submit that NO ONE in the industry has been able to master the crystal ball. I would also submit that most small companies haven't been able to figure it out either. At least at companies like Microsoft, Google, and others you can start using the law of large numbers to get close. But you will always miss. Either too high, or too low. Guess too low and you impact internal budgeting figures and run rates. Not good. Guess too high and you could fall victim to missing minimum contracts with utility companies and be subject to fines.

In the case mentioned in the article, the approach taken, if true, would not be the smartest method, especially given the monthly electric bill for these facilities. It's a cost of doing business and largely not consequential at the level of consumption these buildings draw. Again, if true, it was a PR nightmare waiting to happen.

At this point the article breaks out and talks about how the Microsoft experience would feel more like dealing with old-school manufacturing rather than ‘modern magic’ and diverts to a situation at a Microsoft facility in Santa Clara, California.

The article references that this situation is still being dealt with inside California so I will not go into any detailed specifics, but I can tell you something does not smell right in the state of Denmark, and I don't mean the diesel fumes. Microsoft purchased that facility from another company. As the usage of the facility ramped up to the levels it was certified to operate at, operators noticed a pretty serious issue developing. While the building was rated to run at a certain load, it was clear that the underground feeders were undersized and the by-product could have polluted the soil and gotten into the water system. This was an inherited problem and Microsoft did the right thing and took the high road to remedy it. It is my recollection that all sides were clearly in the know about the risks, and agreed to the generator usage whenever needed while the larger issue was fixed. If this has come up as an 'air quality issue' I personally would guess that there are politics at play. I'm not trying to be an apologist, but if true, it goes to show that no good deed goes unpunished.

At this point the article cuts back to Quincy. It's a great town, with great people. To some degree it was the winner of the Internet Jackpot lottery because of the natural tech resources it is situated on. I thought the figures quoted around taxes were an interesting component missed in much of the reporting I read.

“Quincy’s revenue from property taxes, which data centers do pay, has risen from $815,250 in 2005 to a projected $3.6 million this year, paying for a library and repaved streets, among other benefits, according to Tim Snead, the city administrator.”

As I mentioned in yesterday's post, my job is ultimately to get things done and deliver results. When you are in charge of a capital program as large as Microsoft's was at the time, your mission is clear – deliver the capacity and start generating value to the company. As I was presented the last crop of beans harvested from the field at the ceremony, we still had some ways to go before all construction and capacity was ready to go. One of the key missing components was the delivery and installation of a transformer for one of the substations required to bring the facility up to full service. The article notes that I was upset that the PUD was slow to deliver the capacity. Capacity, I would add, that was promised along a certain set of timelines; commitments were made and money was exchanged based upon those commitments. As you can see from the article, the money exchanged was not insignificant. If Mr. Culbertson felt that I was a bit arrogant in demanding follow-through on promises and commitments after monies and investments were made in a spirit of true partnership, my response would be 'Welcome to the real world'. As far as being cooperative, by April the construction had already progressed 15 months since its start. Hardly a surprise, and if it was, perhaps the 11-acre building and large construction machinery driving around town could have been a clue to the sincerity of the investment and timelines. Harsh? Maybe. Have you ever built a house? If so, then you know you need to make sure that the process is tightly managed and controlled to ensure you make the delivery date.

The article then goes on to talk about the permitting for the diesel generators. By the Department of Ecology's own admission, "At the time, we were in scramble mode to permit our first one of these data centers." The article additionally states that:

Although emissions containing diesel particulates are an environmental threat, they were not yet classified as toxic pollutants in Washington. The original permit did not impose stringent limits, allowing Microsoft to operate its generators for a combined total of more than 6,000 hours a year for "emergency backup electrical power" or unspecified "maintenance purposes."

At the time all this stuff was so new, everyone was learning together. I simply don't buy that this was some kind of Big Corporation versus Little Farmer thing. I cannot comment on the events of 2010 where Microsoft asked for itself to be disconnected from the grid. Honestly that makes no sense to me even if the PUD was working on the substation, and I would agree with the article's 'experts'.

Well, that's my take on my recollection of events during those early days of the Quincy build-out as it relates to the articles. Maybe someday I will write a book, as the process and adventures of those early days of the birth of Big Infrastructure were certainly exciting. The bottom line is that the data center industry is amazingly complex and the forces in play are as varied as technology to politics to people and everything in between. There is always a deeper story. More than meets the eye. More variables. Decisions are never black and white and are always weighed against a dizzying array of forces.

\Mm

Budget Challenged States, Data Center Site Selection, and the Dangers of Pay to Play

Site Selection can be a tricky thing.  You spend a ton of time upfront looking for that perfect location: the confluence of dozens of criteria, digging through fiber maps, looking at real estate, income and other state taxes.   Even the best-laid plans and most thoughtful approaches can be waylaid by changes in government, the emergence of new laws, and other regulatory changes which can put your selection at risk.  I was recently made aware of yet another cautionary artifact you might want to pay attention to: Pay to Play laws and budget-challenged states.

As many of my frequent readers know, I am from Chicago.  In Chicago, and in Illinois at large, "Pay to Play" has much different connotations than the topic I am about to bring up right now.  In fact the Chicago version broke out into an all-out national and international scandal.  There is a great book about it if you are interested, aptly entitled Pay to Play.

The Pay to Play that I am referring to is an emerging set of regulations and litigation techniques that require companies to pay tax bills upfront (without any kind of recourse or mediation), which then forces companies to litigate to try and recover those taxes if they are unfair.   Increasingly I am seeing this in states where the budgets are challenged and governments are looking for additional funds and are targeting Internet-based products and services.   In fact, I was surprised to learn that AOL has been going through this very challenge.  While I will not comment on the specifics of our case (it's not specifically related to Data Centers anyway) it may highlight potential pitfalls and longer term items to take into account when performing Data Center Site Selection.    You can learn more about the AOL case here, if you are interested.

For me it highlights that a lack of understanding of Internet services by federal and local governments, combined with a lack of inhibition in aggressively pursuing revenue despite that lack of understanding, can be dangerous and impactful to companies in this space.   These can pose real dangers, especially with regard to where one site selects for a facility.    These types of challenges can come into play whether you are building your own facility, selecting a colocation facility and hosting partner, or, if stretched, eventually where your cloud provider may have located their facility.

It does beg the question as to whether or not you have checked into the financial health of the States you may be hosting your data and services in.   Have you looked at the risk that this may pose to your business?  It may be something to take a look at!

 

\Mm

Chaos Monkeys, Donkeys and the Innovation of Action

Last week I once again had the pleasure of speaking at the Uptime Institute's Symposium.  As one of the premier events in the Data Center industry it is definitely one of those conferences that is a must-attend to get a view into what's new, what's changing, and where we are going as an industry.  Having attended the event numerous times in the past, this year I set out on my adventure with a slightly different agenda.

Oh sure, I would definitely attend the various sessions on technology, process, and approach.  But this time I was also going with the intent to listen equally to the presenters and to the scuttlebutt, side conversations, and hushed whispers of the attendees.   Think of it as a cultural experiment in being a professional busybody.  As I wove my way from session to session I grew increasingly anxious that while the topics were of great quality, and discussed much-needed areas of improvement in our technology sector, most of them were issues we have covered, talked about, and have been dealing with as an industry for many years.   In fact I was hard-pressed to find anything of real significance in the 'new' category.   These thoughts were mirrored in the side conversations and hushed whispers I heard around the various rooms as well.

One of the new features of Symposium is that the 451 Group has opted to expand the scope of the event to be more far-reaching, covering all aspects of the issues facing our industry.   It has brought in speakers from Tier 1 Research and other groups that have added an incredible depth to the conference.    With that depth came some really good data.   In many respects the data reflected (in my interpretation) that while technology and processes are improving in small pockets, our industry ranges from stagnant to largely slow to act.  Despite mountains of data showing energy efficiency benefits, resulting cost benefits, and the like, we just are not moving the proverbial ball down the field.

In a purely unscientific poll I was astounded to find out that some of the most popular sessions were directly related to those folks who have actually done something.  Those that took the new technologies (or old technologies) and put them into practice were roundly more interesting than the more generic technology conversations, giving very specific attention to detail on how they accomplished the tasks at hand, what they learned, and what they would do differently.   Most of these "favorites" were not necessarily in those topics of "bleeding edge" thought leadership but specifically the implementation of technologies and approaches we have talked about at the event for many years.   If I am honest, one of the sessions that surprised me the most was our own.   AOL had the honor of winning an IT Innovation Award from Uptime, and as a result the teams responsible for driving our cloud and virtualization platforms were allowed to give a talk about what we did, what the impact was, and how it all worked out.   I was surprised because I was not sure how many people would come to this side session or find the presentation relevant.  Of course I thought it was relevant (we were after all going to get a nifty plaque for the achievement) but to my surprise the room was packed full, ran out of chairs, and had numerous people standing for the presentation.   During the talk we had a good interaction of questions from the audience and after the talk we were inundated with people coming up to dig into more details.  We had many comments about the usefulness of the talk because we were giving real-life experiences in making the kinds of changes that we as an industry have been talking about for years.  Our talk and adoption of technology even got a little conversation in some of the industry press such as Data Center Dynamics.

Another session that got incredible reviews was the presentation by Andrew Stokes of Deutsche Bank, who guided the audience through their adoption of a 100% free-air-cooled data center in the middle of New York City.  Again, the technology here was not new (I had built large scale facilities using this in 2007) – but it was the fact that Andrew and the folks at Deutsche Bank actually went out and did something.   Not someone building large-scale cloud facilities, not some new experimental type of server infrastructure.  Someone who used this technology servicing IT equipment that everyone uses, in a fairly standard facility, who actually went ahead and did something innovative.  They put into practice something that others have not. Backed by facts, data, and real-life experiences, the presentation went off incredibly well and was roundly applauded by those I spoke with as one of the most eye-opening presentations of the event.

By listening to the audiences, the hallway conversations, and the multitude of networking opportunities throughout the event, a pattern started to emerge, a pattern that reinforced the belief I was already coming to in my mind.   Despite a myriad of talks on very cool technology, applications, and evolving thought leadership innovations – the most popular and most impactful sessions seemed to center on those folks who actually did something, not with the new bleeding-edge technologies, but utilizing those recurring themes that have carried from Symposium to Symposium over the years.   Air-side economization?  Not new.   Someone (outside Google, Microsoft, Yahoo, etc.) doing it?  Very new, very exciting.  It was what I am calling the Innovation of ACTION.  Actually doing those things we have talked about for so long.

While this Innovation of Action had really gotten many people buzzing at the conference there was still a healthy population of people who were downplaying those technologies.  Downplaying their own ability to do those things.    Re-stating the perennial dogmatic chant that these types of things (essentially any new ideas post 2001 in my mind) would never work for their companies.

This got me thinking (and a little upset) about our industry.  If you listen to those general complaints, and combine them with the data showing that we have been mostly stagnant in adopting these new technologies – we really only have ourselves to blame.   There is a pervasive defeatist attitude amongst a large population of our industry who view anything new with suspicion, or surround it with the fear that it will ultimately take their jobs away.  Even when the technologies or "new things" aren't very new any more.  This phenomenon is clearly visible in any conversation around 'The Cloud' and its impact on our industry.    The data center professional should be front and center in any conversation on this topic but more often than not self-selects out of the conversation because they view it more as an application thing, or more of an IT thing than a data center thing.   Which is of course complete bunk.   Listening to those in attendance complain that the 'Cloud' is going to take their jobs away, or that only big companies like Google, Amazon, Rackspace, or Microsoft would ever need them in the future, was driving me mad.   As my keynote at Uptime was to be centered around a Cloud survival guide – I had to change my presentation to account for what I was hearing at the conference.

In my talk I tried to focus on what I felt to be the emerging camps at the conference.    For the first, I placed a slide prominently featuring Eeyore (of Winnie the Pooh fame) and captured many of the quotes I had heard at the conference referring to how the Cloud and new technologies were something to be mistrusted rather than an opportunity to help drive the conversation.     I then stated that we as an industry were an industry of donkeys.  That fact seems to be backed up by data.   I have to admit, I was a bit nervous calling a room full of perhaps the most dedicated professionals in our industry a bunch of donkeys – but I always call it like I see it.

I contrasted this with those willing to evolve their thinking forward and embrace that Innovation of Action, by highlighting the Cloud example of Netflix.   When Netflix moved heavily into the cloud they clearly wanted to evolve past the normal IT environment and build real resiliency into their product.   They did so by creating a rogue process (on purpose) called the Chaos Monkey, which randomly shut down processes and wreaked havoc in their environment.   At first the Chaos Monkey was painful, but as they architected around those impacts their environments got stronger.   This was no ordinary IT environment.  This was something similar, but new.  The Chaos Monkey creates Action, results in Action, and on the whole moves the ball forward.
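To make the idea concrete, here is a deliberately simplified sketch of what a Chaos Monkey-style process does (this is illustrative only, not Netflix's actual tool): pick a random member of a fleet and terminate it, forcing the architecture to tolerate the loss.

```python
import random
import time

# Hypothetical fleet of service instances; in real life these would be
# cloud instances or running processes, not strings.
fleet = [f"web-{i}" for i in range(10)]

def chaos_monkey(instances, kill_probability=0.5):
    """Randomly terminate one instance per pass to exercise failure handling."""
    if instances and random.random() < kill_probability:
        victim = random.choice(instances)
        instances.remove(victim)
        print(f"Chaos Monkey terminated {victim}; {len(instances)} instances remain")

for _ in range(5):      # one simulated 'pass' per loop iteration
    chaos_monkey(fleet)
    time.sleep(0.1)
```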

Interestingly, after my talk I literally had dozens of people come up and admit they had been donkeys and offer to reconnect next year to demonstrate what they had done to evolve their operations.

My challenge to the audience at Uptime, and ultimately my challenge to you, the industry, is to stop being donkeys.   Let's embrace the Innovation of Action and evolve into our own versions of Chaos Monkeys.    Let's do more to put the technologies and approaches we have talked about for so long into action.    Next year at Uptime (and across a host of other conferences) let's highlight those things that we are doing.  Let's put our Chaos Monkeys on display.

As you contemplate your own job – whether IT or Data Center professional….Are you a Donkey or Chaos Monkey?

\Mm

Olivier Sanche, My Dear Friend, Adieu

The data center industry has suffered a huge loss this holiday weekend with the passing of Olivier Sanche, head of Apple's Data Center program. He was an incredibly thoughtful man, a great father and husband, and very sincerely a great friend. As I got off the phone with his brother and wife in France, who gave me this devastating news, I could not help but remember my first encounter with Olivier.  At the time he worked for eBay and we were both invited to speak and debate at an industry event in Las Vegas.  As we sat in a room full of 'experts' to discuss the future of our industry, the conversation quickly turned controversial.  Passions were raised and I found myself standing side by side with this enigmatic French giant on numerous topics.  His passion for the space coupled with his cool logic were items that endeared me greatly to the man.  We were comrades in ideas, and soon became fast friends.

Olivier was the type of person who could light up a room with his mere presence.   It was as if he embraced the entire room in one giant hug, even if they were strangers.  He could sit quietly mulling a topic, pensively going through his calculations, and then explode into the conversation and rigorously debate everyone.  That passion never belied his ability to learn, to adapt, to incorporate new thinking into his persona either.  Through the years we knew each other I saw him forge his ideas through debate, always evolving.   Many people know the public Olivier, the Olivier they saw at press conferences, or speaking engagements, and the like. Some of us got to know Olivier much better.  The data center industry is small indeed, and those of us who have had the pleasure and terror of working in the world's largest infrastructures share a special kind of bond.   We routinely meet off-hours and have dinner and drinks.   It's a small cadre of names you probably know, or have heard about, joined in the fact that we have all dealt with or are dealing with challenges most data center environments will never see.  In these less formal affairs, company positions melted away, technological challenges came to the fore, and most importantly the real people behind these companies emerged.   In these forums, you could always count on Olivier to be a warm and calming force.   He was incredibly intelligent, and although he might disagree, you could count on him to champion the free discussion of ideas.

It was in those types of forums where I truly met Olivier.   The man who was so dedicated to his family, and to the light of his life, little Emilie.  His honesty and direct-to-the-point style made it easy to understand where you stood, and where he was coming from.

More information about memorial services and the like will be coming out shortly and they are trying to get the word out to all of his friends.

The world has lost a great mind, Apple has lost a visionary, His family has lost their world, and I have lost a good friend.

Adieu, Dear Olivier, You and your family will be in my thoughts and prayers.

Your friend,

Mike Manos

\Mm

Data Center Regulation Awareness Increasing, Prepare for CO2K

This week I had the pleasure of presenting at the Gartner Data Center Conference in Las Vegas, NV.  This was my first time presenting at the Gartner event and it represented an interesting departure from my usual conference experience in a few ways, and I came away with some new observations and thoughts.   As always, the greatest benefit I personally get from these events is the networking opportunities with some of the smartest people across the industry.  I was surprised by both the number of attendees (especially given the economic drag and the almost universal slow-down on corporate travel) and the quality of questions I heard in almost every session.

My talk centered around the coming Carbon Cap and Trade Regulation and its specific impact on IT organizations and the data center industry.  I started my talk with a joke about how excited I was to be addressing a room of tomorrow's eco-terrorists.  The joke fell flat and the audience definitely had a fairly serious demeanor.   This was reinforced when I asked how many people in the audience thought that regulation was a real and coming concern for IT organizations.  Their response startled me.

I was surprised because nearly 85% of the audience raised their hands.  If I contrast that to the response to the exact same question asked three months earlier at the Tier One Research Data Center Conference, where only about 5% of the audience raised their hands, it's clear that this is a message that is beginning to resonate, especially in the larger organizations.

In my talk, I went through the Carbon Reduction Commitment legislation passed in the UK and the negative effects it is having upon the data center and IT industry there, as well as the negative impact on site selection activity, which is by and large causing firms to skip investing data center capital in the UK.   I also went through the specifics of the Waxman-Markey bill in the US House of Representatives and the most recent thinking on the various Senate-based initiatives on this topic.   I have talked here about these topics before, so I will not rehash those issues for this post.  Most specifically I talked about the potential cost impacts to IT organizations and Data Center Operations and the complexity of managing carbon reporting along with the direct and indirect costs resulting from these efforts.

While I was pleasantly surprised by the increased awareness of senior IT, business managers, and Data Center Operators around the coming regulation impacts, I was not surprised by the responses I received with regard to their level of preparedness to react to these initiatives.   Less than 10% of the room had the technology in place to even begin to collect the needed base information for such reporting, and roughly 5% had begun a search for software or initiated development efforts to aggregate and report this information.

With this broad lack of infrastructural systems in place, let alone software for reporting, I predict we are going to see a phenomenon similar to the Y2K craziness in the next 2-3 years.  As the regulatory efforts here in the United States and across the EU begin to crystallize, organizations will need to scramble to get the proper systems and infrastructure in place to ensure compliance.   I call this coming phenomenon CO2K.  Regardless of what you call it, I suspect the coming years will be good for those firms with power management infrastructure and reporting capabilities.

\Mm

Speaking on Container Solutions and Applications at Interop New York

I have been invited to speak and chair a panel at Interop, New York (November 16-20, 2009) to give a talk exploring the hype and reality surrounding Data Center based containers and Green IT in general.  


The goal of the panel discussion is to help data center managers evaluate and approach containers by understanding their economics, key considerations, and real-life customer examples.  It's going to be a great conversation.  If you are attending Interop this year I would love to see you there!

 

\Mm

Changing An Industry of Cottage Industries

If you happen to be following the news around Digital Realty Trust you may have seen the recent announcement of our Pod Architecture Services (PAS) offering.   Although the response has been deafening, there seem to be a lot of questions and confusion around what it is, what it is not, and what this ultimately means for Digital Realty Trust and our place in the industry.

First a simple observation – the Data Center Industry as it stands today is in actuality an industry of cottage industries.   It's an industry dominated by boutique firms in specialized niches, all in support of the building out of these large, technically complex facilities.  For the initiated it's a world full of religious arguments like battery versus rotary, air-side economization versus water-side economization, raised floor versus no raised floor.  To the uninitiated it's an industry characterized by mysterious wizards of calculus and fluid dynamics and magical electrical energies.  It's an illusion the wizards of the collective cottage industries are well paid and incented to keep up.   They ply their trade in ensuring that each facility's creation is a one-off event, and likewise, so is the next one.  It's a world of competing general contractors, architecture firms, electrical and mechanical firms, and specialists in all sizes, shapes, and colors.   Ultimately – in my mind there is absolutely nothing wrong with this.  Everyone has the right to earn a buck, no matter how inefficient the process.

After all, there is a real science to most of the technologies and application of design involved in data centers, and the magical mysteries they control are real.   They are all highly trained professionals educated in some very technical areas.   But if we are honest, each generally looks at the data center world from their own corner of the ecosystem, and while they solve their own issues and challenges quite handily, they stumble when having to get out of their comfort zone.  When they need to cooperate with other partners in the ecosystem and solve more far-reaching issues, it almost universally results in those solutions being applied from a one-off or per-job perspective.  I can tell you that leveraging consistency across a large construction program is difficult at best even with multiple projects underway, let alone a single project.

The challenge of course is that in reality the end-user/owner/purchaser does not view the data center as an assembly of different components but rather as a single data center facility.   The complexity in design and construction is a must-have headache for the business manager who ultimately just wants to sate his need for capacity for some application or business solution.

Now into that background enters our POD Architecture Services offering.  In a nutshell, it allows those customers who do not necessarily want to lease a facility (or cannot, due to regulatory or statutory reasons) to use their own capital in building one without all the complexity associated with these buildings.

Our PAS offering is ultimately a way for a company to purchase a data center product.   Leveraging Digital's data center product approach, a company can simply select the size and configuration of their facility using the same "SKUs" we use internally in building out our own facilities.   In effect we license the use of our design to these customers so that they enjoy the same benefits as the customers of our turn-key data center facilities.

This means that customers of our PAS product can leverage our specific designs, which are optimized across the four most crucial aspects of the data center lifecycle.   Our facility design is optimized around cost, mission, long-term simplicity in operability, and efficiency.  This is anchored in a belief that a design comprises both upfront first-time costs and the lifetime costs of the facility.    This is an "owner's" perspective, which is the only perspective we have.  As the world's largest data center REIT and wholesaler, we need to take the full picture into account.  We will own these facilities for a very long time.

Many designs like to optimize around the technology, or around the upfront facility costs, or drive significant complexity in design to ensure that every possible corner case is covered in the facility.  But the fact is, if you cut corners up front, you potentially pay for it over the life of the asset; if you look for the most technologically advanced gear or provide lots of levers, knobs, and buttons for the ultimate flexibility, you open yourself up to more human error or drive more costs into the ongoing operation of the facility.   The owner's perspective is incredibly important.    Many times companies allow these decisions to be made by their partner firms (the cottage industries) and this view gets clouded.  Given the complexity of these buildings, and the fact that customers do not build them often, it is hard to maintain that owner's perspective without dedicated and vigilant attention.  PAS changes all that, as the designs have already been optimized and companies can simply purchase the product they most desire with the guarantee that what they receive on the other end of the process is what they expected.

Moreover, PAS includes the watchful eye, experience, and oversight of our veteran data center construction management.   These are highly qualified program managers who have built tens of data centers and act as highly experienced owner's representatives.  The additional benefit is that they have built this product multiple times and have become very good at the delivery of these types of facilities based upon our standardized product.   In addition, our customers can leverage our significant supply chain of construction partners, parts, and equipment, which allows for incredible benefits in the speed of delivery of these buildings along with reductions in upfront costs due to our volume purchasing.

This does not mean that Digital is going to become an architectural or engineering firm and stamp drawings.  This does not mean we will become the general contractor.  This simply means that we will leverage our supply chain to deliver our designs and facilities based upon our best practices on behalf of the customers, in a process that is completely managed and delivered by experienced professionals.  We will still have general contractors, and A&E firms, and the like that have experience in building our standardized product.  We are driving towards standardization.   If you believe there is value in having each facility be a one-off design, more power to you. We have a service there too; it's called Build to Suit.  PAS is just another step in the formal definition of a standard data center product.  It's a key element in the modularization of capacity.    It is through standardization that we can begin to have a larger impact on efficiency and other key "Green" initiatives.

I have often referred to my belief that data centers are simply the substations of the Information Utility.   This service allows commercial companies to start benefitting from the same standardization and efficiency gains that we are making in the wholesale space and to enjoy the same cost factors.

Hope that makes things a bit clearer!

 

\Mm

Schneider / Digital New York Speaking Engagement


Just in case anyone wants to connect – I wanted to highlight that I will be co-presenting the keynote at the Schneider Symposium at the Millennium Broadway hotel in New York City with Chris Crosby of Digital Realty Trust.  I will also be giving a talk on practical applications of energy efficiency and sitting on an Energy Efficiency panel led by Dan Golding from Tier One Research.   The program kicks off at 8am on Wednesday.   Feel free to stop by and say hi!

/Mm

Chiller-Side Chats: The Capacity Problem


I recently had a conversation with the CIO of a well-respected company who informed me that his "data center people" had completely mismanaged his data center space, which was now causing them to look at having to lease additional capacity or more aggressively pursue virtualization to solve the problem.   Furthermore, he was powerless to drive and address change, as the data center facilities people worked for a different organization.  To top it off, it frustrated him to no end that in his mind they simply did not understand the IT equipment or technologies being deployed.   Unfortunately it's a common refrain that I hear over and over again.  It speaks to the heart of the problems with understanding data center issues in the industry.

How Data Center Capacity Planning Really Works!

Data Center managers are by their very nature extremely conservative people.   At the root of this conservatism is the understanding that if and when a facility goes down, it is their bottoms on the line.   As such, risk takers are very few and far between in this industry.   I don't think I would get much argument from most business-side managers, who would readily agree with that in a heartbeat.   But before we hang the albatross around the neck of our facilities management brethren, let's take a look at some of the challenges they actually face.

Data Center Capacity Planning is a swirling vortex of science, art, best guess-work, and bad information, with a sprinkling of cult of personality for taste.  One would think that it should be a straight numbers and math game, but it's not.   First and foremost, the currency of Data Center Capacity Management and Planning is power.  Simple, right?  Well, we shall see.

Let's start at a blissful moment in time, when the facility is shiny and new. The floor shines from its first cleaning, the VESDA (Very Early Smoke Detection Apparatus) equipment has not yet begun to throw off false positives, and all is right with the world.   The equipment has been fully commissioned and is now ready to address the needs of the business.

Our Data Center Manager is full of hope and optimism. He or she is confident that this time it will be much different from the legacy problems they had to deal with before.  They now have the perfect mix of power and cooling to handle any challenge thrown at them.   They are then approached by their good friends in Information Services with their first requirement.   The business has decided to adopt a new application platform which will of course solve all the evils of previous installations.

It's a brand new day, a new beginning.   The Data Center Manager asks the IT personnel how many servers are associated with this new deployment.   They also ask how much power those servers will draw so that the room can be optimized for this wonderful new solution.   The IT personnel may be using consultants, or maybe they are providing the server specifications themselves.  In advanced cases they may even have standardized the types and classes of servers they use.  How much power?   Well, the nameplate on the server says that each of these bit-crunching wonders will draw 300 watts apiece.   As this application is bound to be a huge draw on resources, they inform the facilities team that approximately 20 machines at 300 watts are going to be deployed.

The facilities team knows that no machine ever draws its nameplate rating once 'in the wilds' of the data center, and therefore for capacity planning purposes they 'manually' calculate a 30% reduction into the server deployment numbers.  You see, it's not that they don't trust the IT folks, it's just that they generally know better.   So that nets out to a 90-watt reduction per server, bringing the "budgeted power allocation" down to 210 watts per server.  This is an important number to keep in mind.  You now have two ratings that you have to deal with: Nameplate and Budgeted.  Advanced data center users may use even more scientific methods of testing to derive their budgeted amount.   For example they may run test software on the server designed to drive the machine to 100% CPU utilization, 100% disk utilization, and the like.   Interestingly, even after these rigorous tests, the machine never gets close to nameplate.  Makes you wonder what that rating is even good for, doesn't it?  Our data center manager doesn't have that level of sophistication, so he is using a 30% reduction.  Keep these numbers in mind as we move forward.
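A quick back-of-the-envelope version of that derating step, using the numbers from this example (the 30% reduction is this team's rule of thumb, not a universal constant):

```python
NAMEPLATE_W = 300      # what the server label claims
DERATE = 0.30          # the facility team's rule-of-thumb reduction
SERVERS = 20

budgeted_per_server_w = NAMEPLATE_W * (1 - DERATE)       # 210 W per server
print(budgeted_per_server_w)                             # 210.0
print(SERVERS * NAMEPLATE_W / 1000)                      # 6.0 kW if you trusted the nameplate
print(SERVERS * budgeted_per_server_w / 1000)            # 4.2 kW budgeted for planning
```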

The next question is typically: are these servers dual or single corded?  Essentially, will these machines have redundancy built into the power supplies so that in the event of a power loss they might still operate through another source of electricity?  Well, as every good business manager, IT professional, and data center manager knows – this is a good thing.  Sure, let's make them dual corded.

The data center manager, now well armed with information, begins building out the racks and pulls the power whips from diverse PDUs (power distribution units) to the location of those racks to ensure that the wonders of dual cording can come to full effect.

The servers arrive, they are installed, and in a matter of days the new application is humming along just fine, running into all kinds of user adoption issues, unexpected hiccups, budget over-runs, etc.  Okay, maybe I am being a bit sarcastic and jaded there, but I think it works for many installations.   All in all a successful project, right?  I say sure.  But do all parties agree today? Tomorrow? Three years from now?

Let's break this down a bit more on the data center side.   The data center manager has to allocate power for the deployment.   He has already de-rated the server draw, but there is a certain minimum amount of infrastructure he has to deploy regardless.   The power being pulled from those PDUs is taking up valuable slots inside that equipment.   Think of your stereo equipment at home: there are only so many speakers you can connect to your base unit, no matter how loud you want it to get.  The data center manager had to make certain decisions based upon the rack configuration.   If we believe that they can squeeze 10 of these new servers into a rack, the data center manager has pulled enough capacity to address 2.1 kilowatts per rack (210 watts × 10 servers).  With twenty total servers that means he has two racks of 2.1 kilowatts of base load.  Sounds easy, right? It's just math.  And Mike – you said it was harder than regular math.  You lied.  Did I? Well, it turns out that physics is physics, and as Scotty from the Enterprise taught us, "You cannot change the laws of physics, Jim!"  It's likely that the power capacity being allocated to the rack might actually be a bit over the 2.1 kilowatts due to the sizes of the circuits required. For example, he or she may have only needed 32 amps of power, but because of those pesky connections he had to pull two 20-amp circuits.  Let's say for the sake of argument that in this case he has to reserve 2.5 kilowatts as a function of the physical infrastructure requirements.  You start to see a little waste, right?  It's a little more than one server's expected draw, so you might think it's not terribly bad.  As a business manager, you're frustrated with that waste but you might be OK with it.  Especially since it's a new facility and you have plenty of capacity.

But wait!  Remember that dual cording thing?   Now you have to double the power you are setting aside.  You have to ensure that you have enough power to keep the servers running.  Usually this comes from another PDU so that you can survive a single PDU failure.  Additionally you need to ensure that each side (each cord) has enough power reserved to fail over.  In some cases the total load of the server is divided between the two power supplies; in some cases, power is drawn from the primary with a small trickle of draw from the redundant connection.  If the load is divided between both power supplies you are effectively drawing HALF of the total reserved power on each cord.   If they draw full load off one and have a trickle draw off the second power supply, you are actually drawing the correct amount on one leg, and dramatically less than HALF on the second.   Either way the power is allocated and reserved, and I bet it's more than you thought when we started out this crazy story.  Well, hold tight, because it's going to get even more complicated in a bit.
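Putting the circuit sizing and the dual cording together, the reservation math for this deployment works out roughly like this (a sketch built on the example's figures; real circuit derating rules vary by code and are not modeled here):

```python
BUDGET_PER_SERVER_W = 210
SERVERS_PER_RACK = 10
RACKS = 2

budgeted_per_rack_kw = BUDGET_PER_SERVER_W * SERVERS_PER_RACK / 1000   # 2.1 kW
allocated_per_rack_kw = 2.5          # rounded up by the physical circuits actually pulled
reserved_per_rack_kw = allocated_per_rack_kw * 2   # dual cording: A-side + B-side feeds

print(budgeted_per_rack_kw)              # 2.1
print(allocated_per_rack_kw * RACKS)     # 5.0 kW of "real draw" capacity allocated
print(reserved_per_rack_kw * RACKS)      # 10.0 kW reserved against the room in total
```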

Right now, any data center manager in the world is reading this and screaming at the remedial nature of this post.  This is Facilities Management 101, right after the seminar entitled 'The EPO button is rarely your friend'.   In fact, I am probably insulting their intelligence because there are even more subtleties than what I have outlined here.    But my dear facilities comrade, it's not to you that I am writing this section.  It's for the business management and IT folks.   With physics being its pesky self, combined with some business decisions, you are effectively taking down more power than you initially thought.   Additionally, you now have a tax on future capacity, as that load associated with physics and redundancy is forever in reserve.  Not to be touched without great effort, if at all.

Dual cording is not bad. Redundancy is not bad.  It's a business risk, and that's something you can understand; in fact, as a business manager it's something I would be willing to bet you weigh every day in your job.  You are weighing the impact of an outage to the business against actual cost.  One can even easily calculate the cost of such a decision by taking proportional allocations of your capital cost from an infrastructure perspective and weighing it against the economic impact of not having certain tools and applications available.   Even when this is done and it's well understood, a strange phenomenon of amnesia sets in, and in a few months or years the same business manager may look at the facility and give the facilities person a hard time for not utilizing all the power.    To Data Center Managers – having sat as a Data Center Professional for many years – I'm sad to say, you can expect to have this "Reserved Power" conversation with your manager over and over again, especially when things get tight in terms of capacity left.  To business managers, bookmark this post and read it about every 6 months or so.

Taking it up a notch for a moment…

That last section introduced the concept of Reserved Power.  Reserved Power is a concept that sits at the Facility level of Capacity Planning.   When a data center hall is first built out there are three terms and concepts you need to know.  The first is Critical load (sometimes called IT load).  This is the power available to IT and computer equipment in your facility.  The second is called Non-Critical load, which has to do with the amount of power allocated to things like lighting, mechanical systems and your electrical plant, generators, etc.  What I commonly call ‘Back of the house’ or ‘Big Iron’.  The last term is Total load.  Total load is the total amount of power available to the facility and can usually be calculated by adding Critical and Non-Critical loads. 
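In code form the relationship is simply additive; the split between the two buckets below is purely illustrative:

```python
critical_load_kw = 1000        # power available to IT and computer equipment
non_critical_load_kw = 600     # illustrative figure: lighting, mechanical, electrical plant

total_load_kw = critical_load_kw + non_critical_load_kw
print(total_load_kw)           # 1600 kW of total facility load
```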

A facility is born with all three facets.  You generally cannot have one without the others.  I plan on having a future post called 'Data Center Metrics for Dummies' which will explore the interconnection between these.  For now let's keep it really simple.

The wonderful facility we have built will hold a certain amount of IT gear.   Essentially, every server we deploy into the facility will subtract from the total amount of Critical Load available for new deployments.  As we deduct the power from the facility we are allocating that capacity out.    In our previous example we deployed two racks at 2.5 kilowatts (and essentially reserved capacity for two more for redundancy).   With those two racks we have allocated enough power for 5 kilowatts of real draw and have reserved 10 kilowatts in total.

Before people get all mad at me, I just want to point out that some people don’t count the dual cording extra because they essentially de-rate the room with the understanding that everything will be dual corded.    I’m keeping it simple for people to understand what’s actually happening.

OK, back to the show – as I mentioned, those racks would really only draw 2.1 kW each at full load (we have essentially stranded 400 watts per rack of potential capacity, almost 800 watts combined across the two racks).  As a business we already knew this, but we still have to calculate it out and apply our "Budgeted Power" at the room level.   So, across our two racks we have an allocated power of 5 kilowatts, with a budgeted amount of 4.2 kilowatts.

Now here is where our IT friends come into play and make things a bit more difficult.   That wonderful application that was going to solve world hunger for us, and was going to be such a beefy application from a resources perspective, is not living up to its reputation.    Instead of driving 100% utilization, it's sitting down around 8 percent per box.   In fact the estimated worldwide server utilization number sits between 5-14%.  Most server manufacturers have built their boxes in a way where they draw less power at lower utilizations.  Therefore our 210 watts per server might be closer to 180 watts per server of "Actual Load."  That's another 30 watts per server.   So while we have allocated 2.5 kilowatts and budgeted 2.1 kilowatts, we are only drawing 1.8 kilowatts of power per rack.   We have two racks, so double it.  We are now in a situation where we are not using 700 watts per rack, or 1.4 kilowatts across our two racks.  Ouch – that's potentially 28% of the power wasted!
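Following those numbers through end to end, here is the worked arithmetic behind that 28% figure:

```python
allocated_per_rack_kw = 2.5    # what the circuits reserve per rack
budgeted_per_rack_kw = 2.1     # the derated planning figure
actual_per_rack_kw = 1.8       # what the meters show at ~8% utilization
RACKS = 2

unused_per_rack_kw = allocated_per_rack_kw - actual_per_rack_kw      # 0.7 kW
unused_total_kw = unused_per_rack_kw * RACKS                         # 1.4 kW
waste_pct = unused_per_rack_kw / allocated_per_rack_kw * 100         # 28%

print(round(unused_total_kw, 1), f"{waste_pct:.0f}%")   # 1.4 kW idle, 28% of the allocation
```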


The higher IT and applications drive the utilization rate, the less waste you will have.   Virtualization can help here, but it's not without its own challenges, as we will see.

Big Rocks Little Rocks…

Now luckily, as a breed, data center managers are a smart bunch and they have a variety of ways to try and reduce this waste.   It goes back to our concept of budgeted or reserved power, combined with our "stereo jacks" in the PDU.   As long as we have some extra jacks, the Data Center Manager can return to our two racks and artificially set a lower power budget per rack.  This time, after metering the power for some time, he makes the call to artificially limit each rack's allocation to 2 kilowatts – he could go to 1.8 kilowatts, but remember he is conservative and wants to still give himself some cushion.  He can then deploy new racks or cabinets and pretend that the extra 200 watts belongs to the new racks.  He can continue this until he runs out of power or out of slots on the PDU.   This is a manual process that is physically managed by the facilities manager.   There is an emerging technology called power capping which will allow you to do this in software on a server-to-server basis, which will be hugely impactful in our industry; it's just not ready for prime time yet.
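The manual version of that re-budgeting is essentially bookkeeping, something like the sketch below (the 2.5 kW allocations, 1.8 kW metered draw, and 200-watt cushion are the figures from this example; a real facility would also track PDU slots and breaker limits):

```python
racks = {"rack-1": 2.5, "rack-2": 2.5}   # original per-rack allocations in kW

def rebudget(racks, rack, metered_kw, cushion_kw=0.2):
    """Lower a rack's budget to its metered draw plus a safety cushion, and
    return how much budget that frees up for new deployments."""
    freed = racks[rack] - (metered_kw + cushion_kw)
    racks[rack] = metered_kw + cushion_kw
    return freed

freed_total = sum(rebudget(racks, r, metered_kw=1.8) for r in list(racks))
print(racks)        # {'rack-1': 2.0, 'rack-2': 2.0}
print(freed_total)  # 1.0 kW of budget reclaimed to hand to new racks
```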

This inefficiency in allocation creates strange gaps and holes in data centers.  It's a phenomenon I call Big Rocks, Little Rocks, and like everything in this post it is somehow tied to physics.

In this case it was my freshman year physics class in college.  The professor was at the front of the class with a bucket full of good-sized rocks.   He politely asked if the bucket was full.   The class of course responded in the affirmative.  He then took out a smaller bucket of pebbles, poured them in, and rigorously sifted and shook the heck out of the larger bucket until every last pebble was emptied into the bucket with the big rocks.  He asked again, "Now is it full?"  The class responded once more in the affirmative and he pulled out a bucket of sand.  He proceeded to re-perform the sifting and shaking and emptied the sand into the bucket.   "It's finally full now, right?"   The class nodded one last time in the affirmative and he produced a small bucket of water and poured it into the bucket as well.

That 'is the bucket full' exercise is a lot like the capacity planning that every data center manager eventually has to get very good at.   Those holes in capacity at a rack level or PDU level I spoke of are the spaces for servers and equipment to ultimately fit into.  At first it's easy to fit in big rocks, then it gets harder and harder.    You are ultimately left trying to manage to those small spaces of capacity, trying to utilize every last bit of energy in the facility.

This can be extremely frustrating to business managers and IT personnel.   Let's say you do a great job of informing the company how much capacity you actually have in your facility; if there is no knowledge of our "rocks" problem you can easily get yourself into trouble.

Let's go back for a second to our example facility.   Time has now passed and our facility is now nearly full.  Out of the 1 MW of total capacity we have been very aggressive in managing our holes and still have 100 kilowatts of capacity.  The IT personnel have a new application that is database intensive and will draw 80 kilowatts, and because the facility manager has done a good job of managing his facility, there is every expectation that it will be just fine.  Until, of course, they mention that these servers have to be contiguous and close together for performance or even functionality purposes.   The problem of course is that you now have a large rock that you need to try and squeeze into small-rock places.  It won't work.  It may actually even force you to either move other infrastructure around in your facility, impacting other applications and services, or cause you to get more data center space.
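The 'big rocks' problem is easy to see in a sketch: the free capacity adds up on paper, but no single contiguous pocket is large enough (the pocket sizes below are illustrative, loosely following the 100-kilowatt example):

```python
# Remaining free capacity, by contiguous pocket (e.g., per PDU or row), in kW.
free_pockets_kw = [30, 25, 20, 15, 10]    # 100 kW in total

requirement_kw = 80                       # the new database deployment
total_free = sum(free_pockets_kw)
largest_pocket = max(free_pockets_kw)

print(total_free >= requirement_kw)       # True  -- on paper the room "has" the capacity
print(largest_pocket >= requirement_kw)   # False -- but no contiguous pocket can host it
```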

You see, the 'is it full' exercise does not work in reverse.  You cannot fill a bucket with water, then sand, then pebbles, then rocks.   Again, lack of understanding can lead to ill-will or the perception that the data center manager is not doing a good job of managing his facility when in fact they are being very aggressive in that management.    It's something the business side and IT side should understand.

Virtualization Promise and Pitfalls…

Virtualization is a topic unto itself that is pretty vast and interesting, but I did want to point out some key things to think about.  As you hopefully saw, server utilization has a huge impact on power draw.  The higher the utilization, the better the performance from a power perspective.   Additionally, many server manufacturers have certain power ramps built into their equipment where you might see an incrementally large jump in power consumption from 11 percent to 12 percent utilization, for example.  It has to do with the throttling of power consumption I mentioned above. This is a topic most facility managers have no experience or knowledge of, as it has more to do with server design and performance.   If your facility manager is aggressively managing your facility as in the example above and virtualization is introduced, you might find yourself tripping circuits as you drive the utilization higher and it crosses these internal utilization thresholds.  HP has a good paper talking about how this works.  If you pay particular attention to page 14, the lower line is the throttled processor's power as a function of utilization.  The upper line is full speed as a function of utilization, and their dynamic power regulation feature is the one that jumps up to full speed at 60% utilization.  This gives the box performance only at high utilizations.  It's a feature that is turned on by default in HP servers.  Other manufacturers have similar technologies built into their products as well.  Typically your facilities people would not be reading such things. Therefore it's imperative that when considering virtualization and its impacts, the IT folks and Data Center managers work on it jointly.
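As a toy illustration of why that matters (the wattages below are made up and are not HP's actual power curve): if per-server draw steps up once the power regulation feature releases the processor to full speed, consolidating load through virtualization can push an aggressively budgeted rack past its limit.

```python
def server_draw_w(utilization):
    """Toy model: throttled draw below 60% utilization, full-speed draw above it."""
    return 180 if utilization < 0.60 else 260   # illustrative wattages only

rack_budget_w = 2000         # the aggressively managed budget from earlier
servers = 10

for util in (0.10, 0.70):    # before and after virtualization consolidates load
    draw = servers * server_draw_w(util)
    print(util, draw, "OK" if draw <= rack_budget_w else "over budget / breaker risk")
```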

I hope this was at least partially valuable and hopefully explained some things that may have seemed like black-box or arcane data center challenges in your mind.   Keep in mind that with this series I am trying to educate all sides on the challenges we are facing together.

/Mm