Our Vision for Generation 4 Modular Data Centers – One way of Getting it just right . . .

 


Data centers are a hot topic these days. No matter where you look, this once obscure aspect of infrastructure is getting a lot of attention. For years, cost pressures have weighed on IT operations, and with the need for new capacity greater than ever, those pressures have thrust data centers into the spotlight. Server and rack density continues to rise, placing data center professionals and businesses in ever tighter and tougher situations as they struggle to manage their IT environments. And now hyper-scale cloud infrastructure is taking traditional technologies to limits never explored before and focusing the imagination of the IT industry on new possibilities.

At Microsoft, we have focused a lot of thought and research on how best to operate and maintain our global infrastructure, and we want to share those learnings. While obviously there are some aspects that we keep to ourselves, we have shared how we operate our facilities day to day, our technologies and methodologies, and, most importantly, how we monitor and manage our facilities. Whether it’s speaking at industry events, inviting customers to the “Microsoft data center conferences” held in our data centers, or through other media like blogging and white papers, we believe sharing best practices is paramount and will drive the industry forward.  So in that vein, we have some interesting news to share.

Today we are sharing our Generation 4 Modular Data Center plan. This is our vision and will be the foundation of our cloud data center infrastructure over the next five years. We believe it is one of the most revolutionary changes to happen to data centers in the last 30 years. Joining me in writing this blog are Daniel Costello, my director of Data Center Research and Engineering, and Christian Belady, principal power and cooling architect. I feel their voices will add significant value to driving understanding of the many benefits of this new design paradigm.

Our “Gen 4” modular data centers will take the flexibility of containerized servers—like those in our Chicago data center—and apply it across the entire facility. So what do we mean by modular? Think of it like “building blocks”, where the data center will be composed of modular units of prefabricated mechanical, electrical, security components, etc., in addition to containerized servers.

Was there a key driver for the Generation 4 Data Center?

If we were to summarize the promise of our Gen 4 design into a single sentence it would be something like this: “A highly modular, scalable, efficient, just-in-time data center capacity program that can be delivered anywhere in the world very quickly and cheaply, while allowing for continued growth as required.”  Sounds too good to be true, doesn’t it?  Well, keep in mind that these concepts have been in initial development and prototyping for over a year and are based on cumulative knowledge of previous facility generations and the advances we have made since we began our investments in earnest on this new design.

One of the biggest challenges we’ve had at Microsoft is something Mike likes to call the ‘Goldilocks Problem’.  In a nutshell, the problem can be stated as:

The worst thing we can do in delivering facilities for the business is not have enough capacity online, thus limiting the growth of our products and services.

The second worst thing we can do in delivering facilities for the business is to have too much capacity online.

This has led to a focus on smart, intelligent growth for the business, refining our overall demand picture. It can’t be too hot. It can’t be too cold. It has to be ‘Just Right!’ The capital dollars of investment are too large to commit without long-term planning. As we struggled to master these interesting challenges, we had to ensure that our technological plan also included solutions for the business and operational challenges we faced.

So let’s take a high-level look at our Generation 4 design

Are you ready for some great visuals? Check out this video at Soapbox. Click here for the Microsoft 4th Gen Video.  It’s a concept video that came out of my Data Center Research and Engineering team, under Daniel Costello, that will give you a view into what we think is the future.


From a configuration, constructability, and time-to-market perspective, our primary goal is to modularize the whole data center: not just the server side (like the Chicago facility), but the mechanical and electrical space as well. This means using the same kinds of parts in pre-manufactured modules, having the ability to use containers, skids, or rack-based deployments, and being able to tailor the redundancy and reliability requirements to the application at a very specific level.


Our goals from a cost perspective were simple in concept but tough to deliver. First and foremost, we had to reduce the capital cost per critical megawatt by class of use.  Some applications can run with N-level redundancy in the infrastructure, while others require a little more infrastructure for support. These different classes of infrastructure requirements meant that optimizing across all cost classes was paramount.  At Microsoft, we are not a one-trick pony; we have many Online products and services (240+) that require different levels of operational support. We understand that and addressed it in our design, which will allow us to reduce capital costs by 20%-40% or more depending on class.
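To make the class idea concrete, here is a minimal, purely illustrative sketch of how an application's availability class might map to an infrastructure build and a relative capital cost per critical megawatt. The class names, component lists, and cost factors below are assumptions for illustration only, not Microsoft figures or the actual Gen 4 spec.

```python
# Hypothetical mapping of application class to infrastructure build and cost.
# All names and numbers are illustrative assumptions, not Microsoft data.
INFRA_BY_CLASS = {
    "geo-redundant": {"generators": False, "chillers": False, "ups": False,
                      "relative_capex_per_mw": 0.6},   # ~40% below baseline
    "standard":      {"generators": True,  "chillers": False, "ups": True,
                      "relative_capex_per_mw": 0.8},
    "critical":      {"generators": True,  "chillers": True,  "ups": True,
                      "relative_capex_per_mw": 1.0},   # fully redundant baseline
}

def capex_estimate(dc_class: str, critical_mw: float,
                   baseline_cost_per_mw_musd: float = 15.0) -> float:
    """Rough capital estimate ($M): baseline cost scaled by the class factor."""
    factor = INFRA_BY_CLASS[dc_class]["relative_capex_per_mw"]
    return critical_mw * baseline_cost_per_mw_musd * factor

print(capex_estimate("geo-redundant", 10))  # stripped-down build: 90.0
print(capex_estimate("critical", 10))       # full redundancy:    150.0
```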

For example, non-critical or geo-redundant applications have low hardware reliability requirements on a per-location basis. As a result, Gen 4 can be configured to provide stripped-down, low-cost infrastructure with little or no redundancy and/or temperature control.  Let’s say an Online service team decides that, because of the dramatically lower cost, they will simply use uncontrolled outside air with temperatures ranging from 10-35 °C and 20-80% relative humidity. The reality is that we are already spec’ing this for all of our servers today and working with server vendors to broaden that range even further as Gen 4 becomes a reality.  For this class of infrastructure, we can eliminate generators, chillers, and UPSs, and substantially lower costs relative to traditional infrastructure.
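As a small illustration of what “uncontrolled outside air” means in practice, here is a sketch that checks whether ambient conditions fall inside the 10-35 °C / 20-80% RH envelope quoted above. The function name and interface are hypothetical; the thresholds are simply the ranges from the paragraph.

```python
# Sketch only: can uncontrolled outside air be used for this server class?
# Thresholds come from the envelope quoted above; everything else is assumed.

def within_free_air_envelope(temp_c: float, rh_pct: float,
                             temp_range=(10.0, 35.0),
                             rh_range=(20.0, 80.0)) -> bool:
    """True if outside air is usable without chillers or humidity control."""
    return (temp_range[0] <= temp_c <= temp_range[1]
            and rh_range[0] <= rh_pct <= rh_range[1])

print(within_free_air_envelope(21.5, 55.0))  # True  -> free cooling, no chiller hours
print(within_free_air_envelope(38.0, 30.0))  # False -> outside the envelope
```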

Applications that demand a higher level of redundancy or tighter temperature control will use configurations of Gen 4 that meet those needs; they will also cost more (but still less than traditional data centers). We see this cost difference driving a change in engineering behavior: we predict more applications will move toward geo-redundancy to lower costs.

Another cool thing about Gen 4 is that it allows us to deploy capacity when our demand dictates it.  Once finalized, we will no longer need to make large upfront investments. Imagine driving capital costs more closely in line with actual demand, greatly reducing time-to-market while bringing capacity online as the design intends.  Also reduced is the amount of construction labor required to put these “building blocks” together. Since the entire platform requires pre-manufacture of its core components, on-site construction costs are lowered. This allows us to maximize our return on invested capital.
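To illustrate the capital-deferral point, here is a toy sketch comparing a single upfront build against modular blocks deployed only when demand requires them. Every number (demand curve, module size, costs) is invented for illustration; none of it reflects Microsoft's actual economics.

```python
# Toy comparison: one big upfront build vs. modular blocks added with demand.
# All figures below are invented for illustration only.

demand_mw_by_year = [2, 4, 7, 10, 12]   # hypothetical critical-load demand (MW)
module_size_mw = 2                       # capacity added per pre-manufactured block
cost_per_module_musd = 10.0              # assumed $M per block
monolithic_build_musd = 60.0             # assumed $M spent entirely in year 1

deployed_mw = 0
modular_spend = []
for demand in demand_mw_by_year:
    spend = 0.0
    while deployed_mw < demand:          # add blocks only when demand dictates
        deployed_mw += module_size_mw
        spend += cost_per_module_musd
    modular_spend.append(spend)

print("Modular spend by year ($M):   ", modular_spend)           # [10.0, 10.0, 20.0, 10.0, 10.0]
print("Monolithic spend by year ($M):", [monolithic_build_musd, 0, 0, 0, 0])
```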

 


In our design process, we questioned everything. You may notice there is no roof, and some might be uncomfortable with this. We explored the need for one, and throughout our research we got some surprising (and positive) results showing that one wasn’t needed.

In short, we are striving to bring Henry Ford’s Model T factory to the data center. http://en.wikipedia.org/wiki/Henry_Ford#Model_T.  Gen 4 will move data centers from a custom design and build model to a commoditized manufacturing approach. We intend to have our components built in factories and then assemble them in one location (the data center site) very quickly. Think about how a computer, car or plane is built today. Components are manufactured by different companies all over the world to a predefined spec and then integrated in one location based on demands and feature requirements.  And just like Henry Ford’s assembly line drove the cost of building and the time-to-market down dramatically for the automobile industry, we expect Gen 4 to do the same for data centers. Everything will be pre-manufactured and assembled on the pad.


And did we mention that this platform will be, overall, incredibly energy efficient? From a total energy perspective not only will we have remarkable PUE values, but the total cost of energy going into the facility will be greatly reduced as well.  How much energy goes into making concrete?  Will we need as much of it?  How much energy goes into the fuel of the construction vehicles?  This will also be greatly reduced! A key driver is our goal to achieve an average PUE at or below 1.125 by 2012 across our data centers.  More than that, we are on a mission to reduce the overall amount of copper and water used in these facilities. We believe these will be the next areas of industry attention when and if the energy problem is solved. So we are asking today…“how can we build a data center with less building”?
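For readers newer to the metric, a quick sketch of the arithmetic behind that 1.125 target: PUE is total facility energy divided by IT equipment energy, so 1.125 means only 12.5% of the energy entering the site is overhead. The sample numbers are illustrative, not measurements.

```python
# PUE = total facility energy / IT equipment energy (dimensionless, >= 1.0).
# Sample values are illustrative only.

def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    return total_facility_kwh / it_equipment_kwh

print(pue(9_000_000, 8_000_000))    # 1.125 -> the fleet-average goal mentioned above
print(pue(20_000_000, 10_000_000))  # 2.0   -> often cited for traditional facilities
```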


We have talked openly and publicly about building chiller-less data centers and running our facilities using aggressive outside economization. Our sincerest hope is that Gen 4 will completely eliminate the use of water. Today’s data centers use massive amounts of water and we see water as the next scarce resource and have decided to take a proactive stance on making water conservation part of our plan. 

By sharing this with the industry, we believe everyone can benefit from our methodology.  While this concept and approach may be intimidating (or downright frightening) to some in the industry, disclosure ultimately is better for all of us. 

The Gen 4 design, even more than containers alone, could reduce the ‘religious’ debates in our industry. With the central spine infrastructure in place, containers or pre-manufactured server halls can be either AC or DC, air-side economized or water-side economized, or not economized at all (though the sanity of that might be questioned).  Gen 4 will allow us to decommission, repair, and upgrade quickly because everything is modular. No longer will we be governed by the initial decisions made when constructing the facility. We will have almost unlimited use and re-use of the facility and site. We will also be able to use power in an ultra-fluid fashion, moving load from critical to non-critical as use and capacity requirements dictate.

Finally, we believe this is a big game changer. Gen 4 will provide a standard platform that our industry can innovate around. For example, all modules in our Gen 4 will have common interfaces clearly defined by our specs, and any vendor that meets these specifications will be able to plug into our infrastructure.  Whether you are a computer vendor, UPS vendor, generator vendor, etc., you will be able to plug and play into our infrastructure. This means we can also source from anyone, anywhere on the globe to minimize costs and maximize performance.  We want to help motivate the industry to further innovate, with innovations from which everyone can reap the benefits.
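As a thought experiment of what “common interfaces” could look like expressed in code rather than in a mechanical/electrical spec sheet, here is a hypothetical sketch of a module descriptor and a compliance check. The field names and limits are assumptions, not Microsoft's actual Gen 4 specifications.

```python
# Hypothetical sketch of the plug-and-play idea: any vendor module that
# satisfies the same small contract can connect to the spine. The fields and
# limits are invented for illustration; they are not the real Gen 4 spec.

from dataclasses import dataclass

@dataclass
class ModuleInterface:
    vendor: str
    module_type: str        # "container", "UPS", "generator", ...
    power_feed_kw: float    # rated electrical connection to the spine
    cooling_type: str       # "air-side", "water-side", "none"
    network_uplinks: int    # fiber hand-offs to the facility network

def meets_spec(m: ModuleInterface, max_feed_kw: float = 500.0,
               min_uplinks: int = 2) -> bool:
    """A module 'plugs in' if it stays within the spine's assumed per-bay limits."""
    return m.power_feed_kw <= max_feed_kw and m.network_uplinks >= min_uplinks

print(meets_spec(ModuleInterface("AnyVendor", "container", 480.0, "air-side", 4)))  # True
```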

To summarize, the key characteristics of our Generation 4 data centers are:

  • Scalable
  • Plug-and-play spine infrastructure
  • Factory pre-assembled: Pre-Assembled Containers (PACs) & Pre-Manufactured Buildings (PMBs)
  • Rapid deployment
  • De-mountable
  • Reduced time-to-market (TTM)
  • Reduced on-site construction
  • Sustainability measures
  • Applications mapped to DC class


We hope you join us on this incredible journey of change and innovation!

Long hours of research and engineering time have been invested in this process. There are still some long days and nights ahead, but the vision is clear. Rest assured, however, that as we refine Generation 4, the team will soon be looking toward Generation 5 (even if it is a bit farther out).  There is always room to get better.

So if you happen to come across Goldilocks in the forest and you are curious as to why she is smiling, you will know that she feels very good about getting very close to ‘JUST RIGHT’.

Generations of Evolution – some background on our data center designs

We thought you might be interested in understanding what happened in the first three generations of our data center designs. When Ray Ozzie wrote his Software plus Services memo, it posed a very interesting challenge to us. The winds of change were at ‘tornado’ proportions.   That “plus Services” tag had some significant (and unstated) challenges inherent to it.  The first was that Microsoft was going to evolve even further into an operations company.  While we had been running large-scale Internet services since 1995, this development led us to an entirely new level.  Additionally, these “services” would span both our Internet and Enterprise businesses. To those of you who have to operate “stuff”, you know that these are two very different worlds in operational models and challenges. It also meant that, to achieve the required level of reliability and performance, our infrastructure was going to have to scale globally, and in a significant way.

It was in that intense atmosphere of change that we first started re-evaluating data center technology and processes in general, and our ideas began to reach farther than what was accepted by the industry at large. This was the era of Generation 1.  As we look at where most of the world’s data centers are today (and where our facilities were), Generation 1 represented all the known learning and design requirements that had been in place since IBM built the first purpose-built computer room. These facilities focused primarily on uptime, reliability, and redundancy; big infrastructure was held accountable for solving all potential environmental shortfalls. This is where the majority of infrastructure in the industry still is today.

We soon realized that traditional data centers were quickly becoming outdated. They were not keeping up with the demands of what was happening technologically and environmentally.  That’s when we kicked off our Generation 2 design. Gen 2 facilities started taking into account sustainability, energy efficiency, and really looking at the total cost of energy and operations. No longer did we view data centers just for the upfront capital costs, but we took a hard look at the facility over the course of its life.  Our Quincy, Washington and San Antonio, Texas facilities are examples of our Gen 2 data centers where we explored and implemented new ways to lessen the impact on the environment. These facilities are considered two leading industry examples, based on their energy efficiency and ability to run and operate at new levels of scale and performance by leveraging clean hydro power (Quincy) and recycled waste water (San Antonio) to cool the facility during peak cooling months.

As we were delivering our Gen 2 facilities into steel and concrete, our Generation 3 facilities were rapidly driving the evolution of the program. The key concepts for our Gen 3 design are increased modularity and a greater concentration on energy efficiency and scale.  The Gen 3 facility will be best represented by the Chicago, Illinois facility currently under construction.  This facility will seem very foreign compared to the traditional data center concepts most of the industry is comfortable with. In fact, if you ever sit in our container hangar in Chicago, it will look incredibly different from a traditional raised-floor data center. We anticipate this modularization will drive huge efficiencies in terms of cost and operations for our business. We will also introduce significant changes in the environmental systems used to run our facilities.  These concepts and processes (where applicable) will help us gain even greater efficiencies in our existing footprint, allowing us to further maximize infrastructure investments.

This is definitely a journey, not a destination. In fact, our Generation 4 design has been under heavy engineering for viability and cost for over a year.  While the demands of our commercial growth required us to make investments as we grew, we treated each step of the learning as a process for further innovation in data centers.  The design for our future Gen 4 facilities enabled us to make visionary advances that address the challenges of building, running, and operating facilities all in one concerted effort.

/Mm/Dc/Cb


In disappointment, there is opportunity. . .

I was personally greatly disappointed with the news coming out last week that the Uptime Institute had branded Microsoft and Google as the enemy of traditional data center operators.  To be truthful, I did not give the reports much credit, especially given our long and successful relationship with that organization.  However, when our representatives to the event returned and corroborated the story, I have to admit that I felt more than a bit let down.

As reported elsewhere, there are some discrepancies in how our mission was portrayed versus the reality of our position.   One of the primary messages of our cloud initiatives is that there is a certain amount of work/information that you will want to access via the cloud, and there is some work/information that you want to keep private.  It’s why we call it SOFTWARE + SERVICES.  There are quite a few things people just would not feel comfortable running in the cloud.   We are doing this (data center construction and operation) because the market, competitive forces, and our own research are driving us there.   I did want to address some of the misconceptions coming out of that meeting, however:

On PUE, Measurement, and our threat to the IT industry

The comments that Microsoft and Google are the biggest threat to the IT industry and that Microsoft is “making the industry look bad by putting our facilities in areas that would bring the PUE numbers down” are very interesting.  First, as mentioned before, please revisit our Software + Services strategy; it’s kind of hard to be a threat when we openly acknowledge the need for corporate data centers in our stated strategy.   I can assure you that we have no intention of making anyone look “bad”, nor do we in any way market our PUE values.  We are not a data center real estate firm, and we do not lease out our space, where this might even remotely be a factor.

While Microsoft believes in economization (both water- and air-side), not all of our facilities employ this technology.  In fact, if a criticism does exist, it’s that we believe it’s imperative to open up your environmental envelopes as wide as you can.  Simply stated: run your facilities hotter!

The fact of the matter is that Microsoft has invested in both technology and software to allow us to run our environments more aggressively than a traditional data center environment.   We understand that certain industries have very specific requirements around the operation and storage of information which drive and dictate certain physical reliability and redundancy needs.   I have been very vocal about getting the best PUE for your facility.  Our targets are definitely unrealistic for the industry at large, but the goal of driving the most efficiency you can out of your facilities is something everyone should be focused on.

It was also mentioned that we do not measure our facilities over time, which is patently untrue.   We have years and years’ worth of measured information for our facilities, with multiple measurements per day.  We have been fairly public about this and have presented specific numbers (including at the Uptime Symposium last year), which makes this claim somewhat perplexing.

On Bullying the Industry

If the big cloud players are trying to bully the industry with money and resources, I guess I have to ask: to what end?  Does this focus on energy efficiency equate to something bad?  Aside from the obvious corporate responsibility of using resources wisely and lowering operating costs, the visibility we are bringing to this space is not inherently bad.  Given the energy constraints we are seeing across the planet, a focus on energy efficiency is a good thing.

Let’s not overreact; there is yet hope

While many people (external and internal) approached me about pulling out of the Uptime organization entirely, or even suggested that we create a true not-for-profit end-user forum motivated by technology and operations issues alone, I think it’s more important to stay the course.   As an industry, we have so much yet to accomplish.  We are at the beginning of some pretty radical changes in technology, operations, and software that will define our industry in the coming decades.   Now is not the time to splinter, but instead to redouble our efforts to work together in the best interests of all involved.

Instead of picking apart the work done by the Green Grid and attacking the PUE metric by and large, I would love to see Uptime and the Green Grid working together to give some real guidance.  Instead of calling out that PUEs of 1.2 are unrealistic for traditional data center operators, would it not be more useful for Uptime and the Green Grid to produce PUE targets and ranges associated with each Uptime Tier?   In my mind, that would go a long way toward driving the standardization of reporting and reducing ridiculous marketing claims about PUE.

This industry is blessed with two organizations full of smart people attacking the same problem set.  We will continue our efforts through the Microsoft Data Center Experience (MDX) events, conferences, and white papers to share what we are doing in the most transparent way possible.

/Mm

Out of the Box Paradox – Manifested (aka Chicago Area Data Center begins its journey)

Comment on Microsoft’s official launch of the Chicago facility and the announcement of another Microsoft Data Center Experience Conference in the Chicago facility.


With modern conventional thinking and untold management consultants coaching people to think outside the box, I find it humorous that we have actually physically manifested an “Out of the Box Paradox” in Chicago.  

What is an Out of the Box Paradox you ask?  Well I will refer to Wikipedia on this one for a great example:

“The encouragement of thinking outside the box, however, has possibly become so popular that thinking inside the box is starting to become more unconventional.  This kind of “going against the grain means going with the grain” mentality causes a paradox in that there may be no such thing as conventionality when unconventionality becomes convention.”

The funny part here is that we are actually doing this with…you guessed it…boxes. Today we finished the first phase of construction and are rolling into the testing of container-based deployments.  Our facility in Chicago is our first purpose-built data center to accommodate containers on a large scale.  It has been an incredibly interesting journey.  The challenges of solving things that have never been done before are many.  We even had to create our own container specification, one designed specifically with the end-user in mind to ensure we maximized the possible cost and efficiency gains, not to mention standard blocking-and-tackling issues like standardizing power, water, network, and other interfaces.  All sorts of interesting things have been discovered, corrected, and perfected, from electrical harmonics issues to streamlining materials movement to whole new operational procedures.

Chicago Container Spaces with load banks

The facility is already simply amazing, and it’s a wonder to behold. Construction kicked off only one year ago, and when completed it will have the capacity to scale to hundreds of thousands of servers, which can be deployed (and decommissioned as needed) very quickly.  The joke we use internally is that this is not your mother’s data center.  You get that impression from the first moment you step into the “hangar bay” on the first floor. The “hangar’s” first floor will house the container deployments, and I can assure you it is like no data center you have ever seen.  It’s one more step toward the industrialization of the IT world, or at least the cloud-scale operations space.  To be fair, and it’s important to note, only one half of the total facility is ready at this point, but even half of this facility is significant in terms of total capacity.

That “Industrialization of IT” is one of the core tenets of my mission at Microsoft. Throwing smart bodies at dumb problems is not really smart at all. The real quest is to drive innovation and automation into everything that you do, reducing the amount of work that needs to be performed by humans.  Dedicate your smart people to solving hard problems.  It’s more than a mission; it’s a philosophy deeply rooted in our organization.  Besides, industry numbers tell us that humans are the leading cause of outages in data center facilities. 🙂 Our Chicago facility is a huge step in driving that industrialization forward.  It truly represents an evolution and demonstrates what can happen when you blend the power of software with breakthrough innovative design and engineering. Even for buildings!

 Chicago Container Spines being constructed

I have watched with much interest the back and forth on containers in the media, in the industry, and the interesting uses being proposed by the industry. The fact of the matter is that Containers are a great “Out of the Box Paradox” that really should not be terribly shocking to the industry at large. 

The idea of “containment” is almost as old as mechanical engineering and thermodynamics itself. Containment gives you the ability to manage the heat, or lack thereof, more effectively in individual ecosystems. Forward-looking designers have been doing “containment” for a long time. So going back to the “out of the box is in the box” paradox, the concept is not terribly new.  It’s the application at our scale, and specifically to the data center world, that is most interesting.

Containers allow us to step out of the traditional decision points common to the data center industry: certain infrastructure decisions actually reside in the container itself, which allows for a much quicker refresh cycle of key components and the ability to swap in the next greatest technology rapidly.  By default, this lets us deploy our capital infrastructure costs much more closely aligned with actual need, versus the large step functions one normally sees in data center construction (build a large, expensive facility and fill it up over time, versus building capacity out as you need it).   This allows you to better manage costs, better manage your business, and gives you the best possible ramp for technology refresh.  You don’t particularly care if it’s AC or DC, or if it’s water cooled or air cooled.  Our metrics are simple: give us the best-performing, most efficient, lowest-TCO technology to meet our needs. If today that’s AC, great.  Tomorrow DC?  Fantastic.  Do I want to be able to do a bake-off between the two?  Sure. I don’t have to reinvest huge funds in my facilities to make those changes.

For those of you who have real lives and have not been following the whole container debate, here is a quick recap:

  1. Microsoft is using standard 40-foot shipping containers for the deployment of servers in support of the Software + Services strategy and of our cloud services infrastructure initiatives.
  2. The containers can house as many as 2,500 servers, achieving roughly 10 times the compute density of the equivalent space in a traditional data center.
  3. We believe containers offer huge advantages at scale in terms of both initial capital and ongoing operating costs.
  4. This idea has met some resistance in the industry, as highlighted by my interesting back-and-forth with Eric Lai from Computerworld magazine. The original article can be found here, with my “Anthills” response found here.
  5. Chicago represents one of the first purpose-built container facilities ever.

To be clear, as I have said in the past, containers are not for everyone, but they are great for us.

The other thing that is important is the energy efficiency of the containers. Now, I want to be careful here, as reporting efficiency numbers can be a dangerous exercise in the blogosphere. But our testing shows that our containers in Chicago can deliver an average PUE of 1.22 with an AVERAGE ANNUAL PEAK PUE of 1.36. I break these two numbers out separately because there is still some debate (at least in the circles I travel in) about which of these metrics is more meaningful.  Regardless of your position on which is more meaningful, you have to admit those numbers are pretty darn compelling.


For the purists and math-heads out there, Microsoft includes house lighting and office loads in our PUE calculation. They are required to run the facility, so we count them as overhead.
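For anyone curious how the two metrics differ, here is a small sketch that computes an overall average PUE and an average of daily peaks from a series of readings, with lighting and office loads folded into the “total” side as described above. The readings are invented; they are not the Chicago measurements.

```python
# Sketch: overall average PUE vs. average of daily peak PUE from sample readings.
# "Total" includes house lighting and office loads as overhead, per the post.
# All readings below are invented and do not reflect actual Chicago data.

def pue(total_kw: float, it_kw: float) -> float:
    return total_kw / it_kw

# hypothetical (total facility kW, IT kW) samples, several per day
daily_samples = [
    [(1150, 1000), (1200, 1000), (1360, 1000)],   # day 1
    [(1160, 1000), (1190, 1000), (1350, 1000)],   # day 2
]

all_readings = [pue(t, i) for day in daily_samples for t, i in day]
daily_peaks = [max(pue(t, i) for t, i in day) for day in daily_samples]

print(f"average PUE        : {sum(all_readings) / len(all_readings):.2f}")
print(f"average daily peak : {sum(daily_peaks) / len(daily_peaks):.2f}")
```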

On the “sustainability” side of containers, it’s also interesting to note that shipping 2,500 servers in one big container meaningfully reduces the CO2 associated with transportation, not to mention the amount of packaging material eliminated.

So in my mind, containers are driving huge cost and efficiency gains (read: cost benefits in addition to “green” benefits) for the business.  This is an extremely important point: as Microsoft expands its data center infrastructure, it is supremely important that we follow an established smart-growth methodology for our facilities, one designed to prevent overbuilding and thus avoid the associated costs to the environment and to our shareholders.  We are a business, after all.  We must do all of this while also meeting the rapidly growing demand for Microsoft’s Online and Live services.

Containers, and this new approach generally, are definitely a change in how facilities have traditionally been developed, and as a result many people in our industry are intimidated by them.  But they shouldn’t be. Data centers have not changed in fundamental design for decades.  Sometimes change is good. Any new idea is initially met with resistance, but with a little education things change over time.

In that vein, we are looking at holding our second Microsoft Data Center Experience (MDX) event in Chicago in spring/summer 2009.  Our first event, held in San Antonio, was basically an opportunity for a couple hundred Microsoft enterprise customers to tour our facilities, ask all the questions they wanted, interact with our data center experts (mechanical, electrical, operations, facilities management, etc.), and generally get a feel for our approach. It’s not that ours is the right way or the wrong way…just our way.  Think of it as an operations event for operations people, by operations people.

It’s not glamorous, there are no product pitches, no slick brochures, no hardware hunks or booth babes, but hopefully it’s interesting.  That first event was hugely successful, with incredible feedback from our customers. As a result, we decided to do the same thing in Chicago with the very first container data center, which of course makes things a bit tricky.  While the facility will be going through a rigorous testing phase from effectively now onward, we thought it better to ensure that any and all construction activity is formally complete before we move large groups of people through the facility, to ensure safety.  Plus, I don’t think I have enough hard hats and safety gear for you all.

So if you attended MDX-San Antonio and really want to drill deeper into containers, in a facility custom-built for them, or would like to attend just to ask questions, look for details from your Microsoft account management team or your local Microsoft sales office next spring. (Although it’s not a sales event, you are more likely to reach someone there faster than calling into Global Foundation Services directly; after all, we have a global infrastructure to run.)

/Mm