Industry Impact: Brothers from Different Mothers and Beyond…


My reading material and video watching habits these past two weeks have brought me some incredible joy and happiness. Why? Because Najam Ahmad of Facebook is finally getting some credit for the amazing work that he has done, and is still doing, in the world of Software Defined Networking.  In my opinion Najam is a force of nature in the networking world. He is passionate. He is focused. He just gets things done. Najam and I worked very closely at Microsoft as we built out and managed the company’s global infrastructure. So closely, in fact, that we were frequently referred to as brothers from different mothers. Wherever Najam was, I was not far behind, and vice versa. We laughed. We cried. We fought. We had a lot of fun while delivering some pretty serious stuff. To find out that he is behind the incredible Open Compute Project advances in networking is not surprising at all. Always a forward-thinking guy, he has never been satisfied with the status quo.
If you have missed any of that coverage, I strongly encourage you to have a read at the links below.



This got me to thinking about the legacy of the Microsoft program on the Cloud and Infrastructure Industry at large. Data Center Knowledge had an article covering the impact of some of the Yahoo alumni a few years ago. Many of those folks are friends of mine and deserve great credit. In fact, Tom Furlong now works side by side with Najam at Facebook. The purpose of my thoughts is not to take away from their achievements and impacts on the industry, but rather to highlight the impact of some of the amazing people and alumni from the Microsoft program. It’s a long overdue acknowledgement of the legacy of that program and how it has been a real driving force in large scale infrastructure. The list of folks below is by no means comprehensive, and it doesn’t cover the talented people Microsoft maintains in its deep stable who continue to push the innovative boundaries of our industry.

Christian Belady of Microsoft – Here we go, first person mentioned and I already blow my own rule. I know Christian is still there at Microsoft, but it’s hard not to mention him as he is the public face of the program today. He was an innovative thinker before he joined the program at Microsoft and was a driving thought leader and thought provoker while I was there. While his industry-level engagements have been greatly sidelined as he steers the program into the future, he continues to be someone willing to throw everything we know and accept today to the wind to explore new directions.
Najam Ahmad of Facebook – You thought I was done talking about this incredible guy? Not in the least; few people have solved network infrastructure problems at scale like Najam has. With his recent work on the OCP front finally coming to the fore, he continues to push the boundaries of what is possible. I remember long meetings with network vendors where Najam tried to influence capabilities and features with the box manufacturers within the paradigm of the time, and his work at Facebook is likely to land him in a position where he is both loved and reviled by the industry at large. If that doesn’t say you’re an industry heavyweight…nothing does.
James Hamilton of Amazon – There is no question that James continues to drive deep thinking in our industry. I remain an avid reader of his blog and follower of his talks. Back in my Microsoft days we would sit and argue philosophical issues around the approach to our growth, towards compute, towards just about everything. Those conversations either changed or strengthened my positions as the program evolved. His work in the industry, while at Microsoft and beyond, has continued to shape thinking around data centers, power, compute, networking and more.
Dan Costello of Google – Dan Costello now works at Google, but his impacts on the Generation 3 and Generation 4 data center approaches, and the modular DC industry direction overall, will be felt for a very long time to come whether Google goes that route or not. Incredibly well balanced in his approach between technology and business, his ideas and talks continue to shape infrastructure at scale. I will spare people the story of how I hired him away from his previous employer, but if you ever catch me at a conference, it’s a pretty funny story. Not to mention the fact that he is the second best break dancer in the Data Center Industry.
Nic Bustamonte of Google – Nic is another guy who has had some serious impact on the industry as it relates to innovating the running and operating of large scale facilities. His focus on the various aspects of the operating environments of large scale data centers, monitoring, and internal technology has shifted the industry and really set the infancy of DCIM in motion. Yes, BMS systems have been around forever, and DCIM is the next iteration and blending of that data, but his early work here has continued to influence thinking around the industry.
Arne Josefsberg of ServiceNow – Today Arne is the CTO of ServiceNow, focusing on infrastructure and management for everyone from enterprises to the big players alike, and if their overall success is any measure, he continues to impact the industry through results. He is *THE* guy who had the foresight to build an organization able to adapt to this growing change of building and operating at scale. He is the architect of an amazing team that would eventually change the industry.
Joel Stone of Savvis/CenturyLink – Previously the guy who ran global operations for Microsoft, he has continued to drive excellence in operations at Global Switch and now at Savvis. An early adopter and implementer of blending facilities and IT organizations, he mastered issues a decade ago that most companies are still struggling with today.
Sean Farney of Ubiquity – Truly the first data center professional who ever had to productize and operationalize data center containers at scale. Sean has recently taken on the challenge of diversifying data center site selection and placement at Ubiquity, repurposing old neighborhood retail spaces (Sears, etc.) in the industry. Given the general challenges of finding places with a confluence of large scale power and network, this approach may prove to be quite interesting as markets continue to drive demand.
Chris Brown of Opscode – One of the chief automation architects during my time at Microsoft, he has moved on to become the CTO of Opscode. Everyone on the planet who is adopting and embracing DevOps has heard of, and is probably using, Chef. In fact, if you are doing any kind of automation at large scale you are likely using his code.
None of these people would be comfortable with the attention, but I do feel credit should be given to these amazing individuals who are changing our industry every day. I am so very proud to have worked in the trenches with these people. Life is always better when you are surrounded by those who challenge and support you, and in my opinion these folks have taken it to the next level.
\Mm

Google Purchase of Deep Earth Mining Equipment in Support of ‘Project Rabbit Ears’ and Worldwide WIFI availability…

(10/31/2013 – Mountain View, California) – Close examination of Google’s data center construction related purchases has revealed the procurement of large scale deep earth mining equipment.   While the actual need for the deep mining gear is unclear, many speculate that it has to do with a secretive internal project that has come to light known only as Project: Rabbit Ears. 

According to sources not at all familiar with Google technology infrastructure strategy, Project Rabbit Ears is the natural outgrowth of Google’s desire to provide ubiquitous infrastructure worldwide. On the surface, these efforts seem consistent with other incorrectly speculated projects such as Project Loon, Google’s attempt to provide Internet services to residents in the upper atmosphere through the use of high altitude balloons, and a project that has only recently become visible and the source of much public debate – known as ‘Project Floating Herring’ – in which a significantly sized floating barge with modular container-based data centers has been spied sitting in the San Francisco Bay.

“You will notice there is no power or network infrastructure going to any of those data center shipping containers,” said John Knownothing, chief Engineer at Dubious Lee Technical Engineering Credibility Corp.  “That’s because they have mastered wireless electrical transfer at the large multi-megawatt scale.” 

Real Estate rates in the Bay Area have increased almost exponentially over the last ten years, making the construction of large scale data center facilities an expensive endeavor.  During the same period, the Port of San Francisco has unfortunately seen a steady decline in its import/export trade.  After a deep analysis it was discovered that docking fees in the Port of San Francisco are considerably undervalued and will provide Google with an incredibly cheap real estate option in one of the most expensive markets in the world.

It will also allow them to expand their use of renewable energy through the use of tidal power generation built directly into the barge’s hull.   “They may be able to collect as much as 30 kilowatts of power sitting on the top of the water like that”, continues Knownothing, “and while none of that technology is actually visible, possible, or exists, we are certain that Google has it.”

While the technical intricacies of the project fascinate many, the initiative does have its critics like Compass Data Center CEO, Chris Crosby, who laments the potential social aspects of this approach, “Life at sea can be lonely, and no one wants to think about what might happen when a bunch of drunken data center engineers hit port.”  Additionally, Crosby mentions the potential for a backslide of human rights violations, “I think we can all agree that the prospect of being flogged or keel hauled really narrows down the possibility for those outage causing human errors. Of course, this sterner level of discipline does open up the possibility of mutiny.”

However, the public launch of Project Floating Herring will certainly need to await the delivery of the more shrouded Project Rabbit Ears for various reasons.  Most specifically, the primary reason for the development of this technology is so that Google can ultimately drive the floating facility out past twelve miles into international waters, where it can then dodge all national, regional, and local taxation, as well as the safe harbor and privacy legislation of any country or national entity on the planet that would use its services.   In order to realize that vision, in the current network paradigm, Google would need exceedingly long network cables to attach to Network Access Points and Carrier Connection points as the facilities drift through international waters.

This is where Project Rabbit Ears becomes critical to the Google strategy.   Making use of the deep earth mining equipment, Google will be able to drill deep into the Earth’s crust, into the mantle, and ultimately build a large Network Access Point near the Earth’s core.  This Planetary WIFI solution will be centrally located to cover the entire earth without the use of regional WIFI repeaters.  Google’s floating facilities could then gain access to unlimited bandwidth and provide yet another consumer based monetization strategy for the company.

Knownothing also speculates that such a move would allow Google to make use of enormous amounts of free geothermal power and almost singlehandedly become the greenest power user on the planet.   Speculation also abounds that Google could then sell that power through its as-yet-un-invented large scale multi-megawatt wireless power transfer technology, as unseen on its floating data centers.

Much of the discussion around this kind of technology innovation driven by Google has been given a credible veneer of veracity and discussed by many seemingly intelligent technology news outlets and industry organizations who should intellectually know better, but prefer not to acknowledge the inconvenient lack of evidence.

 

\Mm

Editor’s Note: I have many close friends in the Google Infrastructure organization and firmly believe that they are doing some amazing, incredible work in moving the industry along, especially in solving problems at scale.   What I find simply amazing is how often, in the search for innovation, our industry creates things that may or may not be there and convinces itself so firmly that they exist.

2014: The Year Cloud Computing and Internet Services Will Be Taxed. A.K.A. Je déteste dire ça. Je vous l’avais dit. (I hate to say it. I told you so.)

 


It’s one of those times I really hate to be right.  As many of you know, I have been talking about the various grassroots efforts afoot across many of the EU member countries to start driving a more significant tax regime on Internet-based companies.  My predictions for the last few years have been more cautionary tales, based on what I saw happening from a regulatory perspective on a much smaller scale, country to country.

Today’s Wall Street Journal has an article discussing France’s move to begin taxing Internet-related companies that derive revenue from users and companies across the entirety of the EU, while holding those companies responsible to the tax base in each country.   This likely means that such legislation will become quite fractured and tough for Internet companies to navigate.  The French proposition is asking the European Commission to draw up proposals by the spring of 2014.

This is likely to have a very interesting impact (read: cost increases) across just about every aspect of Internet and Cloud Computing resources.  From a business perspective this is going to increase costs, which will likely be passed on to consumers in small but interesting ways.  Internet advertising will need to be differentiated on a country by country basis, and advertisers will end up having different cost structures. Cloud computing companies will DEFINITELY need to understand where customer instances are running, and whether or not they are making money.  Potentially more impactful, customers of cloud computing may be held accountable for tax obligations they did not know they had!  Things like data center site selection are likely going to become even more complicated from a tax analysis perspective, as countries with higher populations may become no-go zones (perhaps) or require the passage of even more restrictive laws.
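To make the “know where your instances are” point a bit more concrete, here is a minimal sketch of the kind of per-country roll-up a cloud provider (or a cloud customer) might suddenly need to produce. The record format, field names, and figures are all invented for illustration and are not drawn from any real billing system.

```python
from collections import defaultdict

# Hypothetical usage records: (instance_id, country, revenue_eur). Purely
# illustrative -- real numbers would come from a provider's metering and
# billing systems, not a hard-coded list.
usage_records = [
    ("i-001", "FR", 1200.0),
    ("i-002", "DE", 830.0),
    ("i-003", "FR", 410.0),
    ("i-004", "IE", 2250.0),
]

def revenue_by_country(records):
    """Roll up revenue per country so a per-country tax exposure can even
    be estimated."""
    totals = defaultdict(float)
    for _instance_id, country, revenue in records:
        totals[country] += revenue
    return dict(totals)

if __name__ == "__main__":
    for country, total in sorted(revenue_by_country(usage_records).items()):
        print(f"{country}: {total:,.2f} EUR")
```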

It’s not like the seeds of this haven’t been around since 2005; I think most people just preferred to turn a blind eye as the seed sprouted into a full-fledged tree.   Going back to my Cat and Mouse Papers from a few years ago…  the cat has caught the mouse; it’s now the mouse’s move.

\Mm

 

Author’s Note: If you don’t have a subscription to the WSJ, All Things Digital did a quick synopsis of the article here.

Through an idea and force of will, he created an industry…

This week the Data Center Industry got the terrible news it knew might be coming for some time: Ken Brill, founder of the Uptime Institute, had passed away.  Many of us knew that Ken had been ill for some time and, although it may sound silly, we were hoping he could somehow pull through it.   Even as ill as he was, Ken was still sending and receiving emails and staying in touch with this industry that, quite frankly, he helped give birth to.

I was recently asked about Ken and his legacy for a Computerworld article, and it really caused me to stop and re-think his overall legacy and gift to the rest of us in the industry.  Ken Brill was a pioneering, courageous, tenacious visionary who, through his own force of will, saw the inefficiencies in a nascent industry and helped craft it into what it is today.

Throughout his early career Ken saw the absolute siloing of information, best practices, and approaches that different enterprises were developing around managing their mission critical IT spaces.    While certainly not alone in the effort, he became the strongest voice and champion to break down those walls, help others through the process, and build a network of people who would share these ideas amongst each other.  Before long an industry was born, sewn together through his sometimes delicate, sometimes not-so-delicate cajoling and, through it all, his absolute passion for the Data Center industry at large.

One of the last times Ken and I got to speak in person.

In that effort he also created and permeated the language that the industry uses as commonplace.   Seeing a huge gap in terms of how people communicated and compared mission critical capabilities, he became the klaxon of the Tiering system, which essentially normalized those conversations across the Data Center Industry.   While some (including myself) have come to think it’s time to re-define how we classify our mission critical spaces, we all have to pay homage to the fact that Ken’s insistence and drive for the Tiering system created a place and a platform to even have such conversations.

One of Ken’s greatest strengths was his adaptability.   For example, Ken and I did not always agree.   I remember an Uptime Fellows meeting back in 2005 or 2006 or so in Arizona.  In this meeting I started talking about the benefits of modularization and reduced infrastructure requirements augmented by better software.   Ken was incredulous, and we had significant conversations around the feasibility of such an approach.   At another meeting we discussed the relative importance or non-importance of a new organization called ‘The Green Grid’ (Smile) and whether Uptime should closely align itself with those efforts.   Through it all Ken was ultimately adaptable. Whether it was giving those ideas light for conversation amongst the rest of the Uptime community via audio blogs or other means, Ken was there to have a conversation.

In an industry where complacency has become commonplace, where people rarely question established norms, it was always comforting to know that Ken was there acting the firebrand, causing the conversation to happen.   This week we lost one of the ‘Great Ones’ and I for one will truly miss him.  To his family my deepest sympathies, to our industry I ask, “Who will take his place?”

 

\Mm

Lots of interest in the MicroDC, but do you know what I am getting the most questions about?

 Scott Killian of AOL talks about the MicroDC

Last week I put up a post about how AOL.com has 25% of all traffic now running through our MicroDC infrastructure.   There was a great follow-up post by James LaPlaine, our VP of Operations, on his blog Mental Effort, which goes into even greater detail.   While many of the email inquiries I get have been based around the technology itself, surprisingly a large majority of the notes have been questions around how to make your software, applications, and development efforts ready for such an infrastructure, and what the timelines for realistically doing so would be.

The general response, of course, is that it depends.  If you are a web-based platform or property focused solely on Internet-based consumers, or a firm that needs a diversified presence in different regions without the hefty price tag of renting and taking down additional space, this may be an option.  However, many enterprise applications have been written in a way that is highly dependent upon localized infrastructure and short application-level latency, and they lack adequate scaling.  So for more corporate data center applications this may not be a great fit.  It will take some time for those big traditional application firms to be able to truly build out their infrastructure to work in an environment like this (they may never do so).   I suspect most will take an easier approach and try to ‘cloudify’ their own applications and run them within their own infrastructure or data centers under their control.   This essentially allows them to control the access portion of users’ needs, but continue to rely on the same kinds of infrastructure you might have in your own data center to support it.   It’s much easier to build a web-based application which then connects to a traditional IT environment than to truly build out infrastructure capable of accommodating scale.   I am happy to continue answering questions as they come up, but as I had an overwhelming response of questions about this I thought I would throw something quick up here that will hopefully help.

 

\Mm

On Micro Datacenters, Sandy, Supercomputing 2012, and Coding for Containerized Data Centers….


As everyone is painfully aware, last week the United States saw the devastation caused by Superstorm Sandy.   My original intention was to talk about yet another milestone with our Micro Data Center approach.  As the storm slammed into the East Coast, I felt it was probably a bad time to talk about achieving something significant, especially as people were suffering through the storm’s aftermath.  In fact, after the storm AOL kicked off an incredible supplies drive and sent truckloads of goods up to the worst of the affected areas.

So, here we are a week after the storm, and while people are still in need and suffering, it is clear that the worst is over and the clean up and healing has begun.   It turns out that Superstorm Sandy also allowed us to test another interesting case in the journey of the Micro Data Center, which I will touch on.

25% of ALL AOL.COM Traffic runs through Micro Data Centers

I have talked about the potential value of our use of Micro Data Centers and the pure agility and economics the platform will provide for us.   Up until this point we had used this technology in pockets.  Think of our explorations as focusing on beta and demo environments.  But that all changed in October when we officially flipped the switch and began taking production traffic for AOL.com with the Micro Data Center.  We are currently (and have been since flipping the switch) running about 25% of all traffic coming to our main web site.   This is an interesting achievement in many ways.  First, from a performance perspective, we are manually limiting the platform (it could do more!) to ~65,000 requests per minute and a traffic volume of about 280 Mbits per second.   To date I haven’t seen many people post performance statistics about applications in modular use, so hopefully this is relevant and interesting to folks in terms of the volume of load an approach such as this could handle.   We celebrated this at a recent All-Hands with an internal version of our MDC being plugged into the conference room.  To prove our point we added it to the global pool of capacity for AOL.com and started taking production traffic right there at the conference facility.   This proves in large part the value, agility and mobility a platform like this could bring to bear.
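For those who like to translate such figures, here is a quick back-of-the-envelope calculation derived from the numbers quoted above; the per-response size it prints is my own arithmetic, not a published AOL statistic.

```python
# Back-of-the-envelope math from the figures quoted above (~65,000 requests
# per minute, ~280 Mbit/s). The derived per-response size is illustrative only.
requests_per_minute = 65_000
traffic_mbits_per_sec = 280

requests_per_sec = requests_per_minute / 60              # ~1,083 req/s
bytes_per_sec = traffic_mbits_per_sec * 1_000_000 / 8    # ~35 MB/s
avg_response_kb = bytes_per_sec / requests_per_sec / 1024

print(f"~{requests_per_sec:,.0f} req/s at ~{avg_response_kb:.0f} KB per response on average")
```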

Scott Killian, AOL's Data Center guru talks about the deployment of AOLs Micro Data Center. An internal version went 'live' during the talk.

 

As I mentioned before, Superstorm Sandy threw us another curveball as the hurricane crashed into the Mid-Atlantic.   While Virginia was not hit anywhere near as hard as New York and New Jersey, there were incredible sustained winds, tumultuous rains, and storm-related damage everywhere.  Through it all, our outdoor version of the MDC weathered the storm just fine and continued serving traffic for AOL.com without fail.

 

This kind of Capability is not EASY or Turn-Key

That’s not to say there isn’t a ton of work to do to get an application to work in an environment like this.   If you take the problem space at its different levels – DNS, load balancing, network redundancy, configuration management, underlying application-level timeouts, systems dependencies like databases and other information stores, and the like – the non-infrastructure work and coding is not insignificant.   There is a huge amount of complexity in running a site like AOL.com: lots of interdependencies, sophistication, advertising-related collection and distribution, and the like.   It’s safe to say that this is not as simple as throwing up an Apache/Tomcat instance in a VM.
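As a small illustration of the “application-level timeouts” point, here is a minimal sketch of a client that treats any single serving location as expendable. The endpoint URLs and timeout value are invented for the example; a real deployment would get its serving list from DNS/GSLB or a service registry rather than a hard-coded list.

```python
import socket
import urllib.error
import urllib.request

# Hypothetical serving locations -- in practice this list would come from
# DNS/GSLB or a service registry, not hard-coded URLs.
MDC_ENDPOINTS = [
    "http://mdc-east.example.com/health",
    "http://mdc-west.example.com/health",
    "http://colo-fallback.example.com/health",
]

def fetch_with_failover(endpoints, timeout_secs=0.5):
    """Try each serving location in turn with a tight application-level
    timeout instead of assuming any single site is always reachable."""
    last_error = None
    for url in endpoints:
        try:
            with urllib.request.urlopen(url, timeout=timeout_secs) as resp:
                return resp.read()
        except (urllib.error.URLError, socket.timeout) as err:
            last_error = err  # note the failure and move on to the next site
    raise RuntimeError(f"all serving locations failed, last error: {last_error}")

if __name__ == "__main__":
    try:
        print(fetch_with_failover(MDC_ENDPOINTS)[:100])
    except RuntimeError as err:
        print(err)
```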

I have talked for quite a while about what Netflix engineers originally coined as Chaos Monkeys: the ability, development paradigm, or even rogue processes that let your applications survive significant infrastructure and application-level outages.  It’s essentially taking the redundancy out of the infrastructure and putting it into the code. While extremely painful at the start, the long-term savings are proving hugely beneficial.    For most companies, this is still something futuristic, very far out there.  They may be beholden to software manufacturers and developers to start thinking this way, which may take a very, very long time.  Infrastructure is the easy way to solve it.   It may be easy, but it’s not cheap.  Nor, if you care about the environmental angle on it, is it very ‘sustainable’ or green.   Limit the infrastructure. Limit the waste.   While we haven’t really thought about it in terms of rolling it up into our environmental positions, perhaps we should.
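To show what “putting the redundancy into the code” can look like in miniature, here is a hedged sketch of a chaos-style failure injector and a caller that absorbs the failure by serving its last known value. Everything here (the decorator, the cache, the failure rate) is invented for illustration; it is not the Netflix Chaos Monkey tooling or any AOL code.

```python
import random

_PROFILE_CACHE = {}

def chaotic(failure_rate=0.10):
    """Decorator that randomly raises, standing in for a lost dependency.
    Meant for test or staging traffic, never production."""
    def decorate(func):
        def wrapper(*args, **kwargs):
            if random.random() < failure_rate:
                raise ConnectionError("injected failure (chaos test)")
            return func(*args, **kwargs)
        return wrapper
    return decorate

@chaotic(failure_rate=0.25)
def fetch_profile(user_id):
    # Stand-in for a real backend call.
    return {"id": user_id, "name": "example user"}

def get_profile(user_id):
    """The caller absorbs the failure in code: refresh when possible,
    serve the last known value when the dependency is 'down'."""
    try:
        _PROFILE_CACHE[user_id] = fetch_profile(user_id)
    except ConnectionError:
        pass  # fall back to whatever we served last time
    return _PROFILE_CACHE.get(user_id)

if __name__ == "__main__":
    for _ in range(5):
        print(get_profile(42))
```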

The point is that getting to this level of redundancy is going to take work, and to that end it will continue to act as a regulator or anchor slowing down greater adoption of more modular approaches.  But at least in my mind the future is set; directionally it will be hard to ignore the economics of this type of approach for long.   Of course, as an industry we need to start training or re-training developers to think in this kind of model – to build code in such a way that it takes into account the Chaos Monkey potential out there.

 

Want to see One Live?


We have been asked to provide an AOL Micro Data Center for the Supercomputing 12 conference next week in Salt Lake City, Utah with our partner Penguin Computing.  If you want to see one of our internal versions live and up close, feel free to stop by and take a look.  Jay Moran (my Distinguished Engineer here at AOL) and Scott Killian (the leader of our data center operations teams) will be onsite to discuss the technologies and our use cases.

 

\Mm

Insider Redux: Data Barn in a Farm Town

I thought I would start my first post by addressing the second New York Times article first. Why? Because it specifically mentions activities and messages sourced from me at the time when I was responsible for running the Microsoft data center program. I will try to track the timeline mentioned in the article with my specific recollections of the events so that, as Paul Harvey used to say, you can know the ‘REST of the STORY’.

I remember my first visit to Quincy, Washington. It was a bit of a road trip for me and a few other key members of the Microsoft site selection team. We had visited a few of the local communities and public utility districts doing our due diligence on the area at large. Our ‘heat map’ process had led us to Eastern Washington state. Not very far (just a few hours) from the ‘mothership’ of Redmond, Washington. It was a bit of a crow-eating exercise for me, as just a few weeks earlier I had proudly exclaimed that our next facility would not be located on the West Coast of the United States. We were developing an interesting site selection model that would categorize and weight areas around the world. It would take in FEMA disaster data, fault zones, airport and logistics information, location of fiber optic and carrier presence, workforce distributions, regulatory and tax data, water sources, and power. This was going to be the first real construction effort undertaken by Microsoft. The cost of power was definitely a factor, as the article calls out. But just as important was the generation mix of the power in the area – in this case a predominance of hydroelectric. Low to no carbon footprint (rivers, it turns out, actually give off carbon emissions, as I came to find out). Regardless, the generation mix was and would continue to be a hallmark of the program’s site selection while I was there. The crow-eating began when we realized that the ‘greenest’ area per our methodology was actually located in Eastern Washington along the Columbia River.

We had a series of meetings with real estate folks, the local Grant County PUD, and the economic development folks of the area. Back in those days the secrecy around who we were was paramount, so we kept our identities and that of our company secret. Like geeky secret agents on an information gathering mission, we would not answer questions about where we were from, who we were, or even our names. We ‘hid’ behind third party agents who took everyone’s contact information and acted as brokers of information. Those were early days…the cloak and dagger would soon fall away, as being known became a more advantageous tool in tax negotiations with local and state governments.

During that trip we found the perfect parcel of land: 75 acres with great proximity to local substations, just down the line from the dams on the nearby Columbia River. It was November 2005. As we left that day and headed back, it was clear that we felt we had found site selection gold. As we started to prepare a purchase offer, we got wind that Yahoo! was planning a trip out to the area as well. As the local folks seemingly thought that we were a bank or large financial institution, they wanted to let us know that someone on the Internet was interested in the area too. This acted like a lightning rod and we raced back to the area and locked up the land before Yahoo had a chance to leave the Bay Area. In these early days the competition was fierce. I have tons of interesting tales of cloak and dagger intrigue between Google, Microsoft, and Yahoo. While it was work, there was definitely an air of something big on the horizon – that we were all at the beginning of something. In many ways the technology professionals involved, regardless of company, forged deep relationships and rivalries with each other.

Manos on the bean field, December 2005

The article talks about how the ‘Gee-Whiz moment faded pretty fast’. While I am sure that it faded in time (as all things do), I also seem to recall the huge increase in local business as thousands of construction workers descended upon this wonderful little town, the tours we would give local folks and city council dignitaries, and a spirit of truly working together. Then of course there was the ultimate reduction in property taxes resulting from even our first building, and an increase in home values to boot at the time. It’s an oft-missed benefit that I am sure the town of Quincy and Grant County have continued to enjoy as the data center cluster added Yahoo, Sabey, IAC, and others. I warmly remember the opening day ceremonies and ribbon cutting and a sense of pride that we did something good. Corny? Probably – but that was the feeling. There was no talk of generators. There were no picket signs; in fact, the EPA of Washington state had no idea how to deal with a facility of this size, and I remember openly working in partnership with them. That of course eventually wore off to the realities of life. We had a business to run, the city moved on, and concerns eventually arose.

The article calls out a showdown between Microsoft and the Public Utility District (PUD) over a fine for missing a capacity forecasting target. As this happened well after I left the company, I cannot really comment on that specific matter. But I can see how that forecast could miss. Projecting power usage months ahead is more than a bit of science mixed with art. It gets into the complexity of understanding capacity planning in your data centers. How big will certain projects grow? Will they meet expectations or fall short? New product launches can be duds or massive successes. All of these things go into a model to try and forecast the growth. If you think this is easy, I would submit that NO ONE in the industry has been able to master the crystal ball. I would also submit that most small companies haven’t been able to figure it out either. At least at companies like Microsoft, Google, and others you can start using the law of large numbers to get close. But you will always miss – either too high or too low. Guess too low and you impact internal budgeting figures and run rates. Not good. Guess too high and you could fall victim to missing minimum contract commitments with utility companies and be subject to fines.
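To illustrate how easily such a forecast can miss, here is a toy projection; the growth rates and starting draw are invented numbers, and a real capacity model would blend product forecasts, utilization data, and seasonality rather than a single compounding rate.

```python
# A deliberately naive capacity projection. Real models blend product launch
# forecasts, utilization data, and seasonality; the numbers here are invented.
def project_power_kw(current_kw, monthly_growth_rate, months):
    """Compound today's draw forward; small errors in the assumed growth
    rate compound into large misses over a year."""
    return current_kw * (1 + monthly_growth_rate) ** months

current_draw_kw = 8_000
for assumed_growth in (0.02, 0.04, 0.06):  # 2%, 4%, 6% per month
    forecast = project_power_kw(current_draw_kw, assumed_growth, months=12)
    print(f"assume {assumed_growth:.0%}/month growth -> ~{forecast:,.0f} kW in a year")
```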

In the case mentioned in the article, the approach taken, if true, would not be the smartest method, especially given the monthly electric bill for these facilities. It’s a cost of doing business and largely inconsequential at the level of consumption these buildings draw. Again, if true, it was a PR nightmare waiting to happen.

At this point the article breaks out and talks about how the Microsoft experience would feel more like dealing with old-school manufacturing rather than ‘modern magic’ and diverts to a situation at a Microsoft facility in Santa Clara, California.

The article references that this situation is still being dealt with inside California, so I will not go into any detailed specifics, but I can tell you something does not smell right in the state of Denmark, and I don’t mean the diesel fumes. Microsoft purchased that facility from another company. As the usage of the facility ramped up to the levels it was certified to operate at, operators noticed a pretty serious issue developing. While the building was rated to run at a certain load size, it was clear that the underground feeders were undersized, and the by-product could have polluted the soil and gotten into the water system. This was an inherited problem and Microsoft did the right thing and took the high road to remedy it. It is my recollection that all sides were clearly aware of the risks and agreed to the generator usage whenever needed while the larger issue was fixed. If this has come up as an ‘air quality issue’, I would personally guess that there is politics at play. I’m not trying to be an apologist, but if true, it goes to show that no good deed goes unpunished.

At this point the article cuts back to Quincy. It’s a great town, with great people. To some degree it was the winner of the Internet jackpot lottery because of the natural tech resources it is situated on. I thought the figures quoted around taxes were an interesting component missed in much of the reporting I read:

“Quincy’s revenue from property taxes, which data centers do pay, has risen from $815,250 in 2005 to a projected $3.6 million this year, paying for a library and repaved streets, among other benefits, according to Tim Snead, the city administrator.”

As I mentioned in yesterday’s post, my job is ultimately to get things done and deliver results. When you are in charge of a capital program as large as Microsoft’s was at the time, your mission is clear: deliver the capacity and start generating value to the company. As I was presented the last crop of beans harvested from the field at the ceremony, we still had some ways to go before all construction and capacity was ready to go. One of the key missing components was the delivery and installation of a transformer for one of the substations required to bring the facility up to full service. The article notes that I was upset that the PUD was slow to deliver the capacity – capacity, I would add, that was promised along a certain set of timelines; commitments were made and money was exchanged based upon those commitments. As you can see from the article, the money exchanged was not insignificant. If Mr. Culbertson felt that I was a bit arrogant in demanding follow-through on promises and commitments after monies and investments were made in a spirit of true partnership, my response would be ‘Welcome to the real world’. As far as being cooperative, by April the construction had already progressed 15 months since its start. Hardly a surprise, and if it was, perhaps the 11-acre building and large construction machinery driving around town could have been a clue to the sincerity of the investment and timelines. Harsh? Maybe. Have you ever built a house? If so, then you know you need to make sure that the process is tightly managed and controlled to ensure you make the delivery date.

The article then goes on to talk about the permitting for the diesel generators. By the Department of Ecology’s own admission, “At the time, we were in scramble mode to permit our first one of these data centers.” Additionally it also states that:

Although emissions containing diesel particulates are an environmental threat, they were not yet classified as toxic pollutants in Washington. The original permit did not impose stringent limits, allowing Microsoft to operate its generators for a combined total of more than 6,000 hours a year for “emergency backup electrical power” or unspecified “maintenance purposes.”

At the time all this stuff was so new that everyone was learning together. I simply don’t buy that this was some kind of Big Corporation versus Little Farmer thing. I cannot comment on the events of 2010 where Microsoft asked for itself to be disconnected from the grid. Honestly, that makes no sense to me even if the PUD was working on the substation, and I would agree with the article’s ‘experts’.

Well, that’s my take on my recollection of events during those early days of the Quincy build-out as it relates to the articles. Maybe someday I will write a book, as the process and adventures of those early days of the birth of Big Infrastructure were certainly exciting. The bottom line is that the data center industry is amazingly complex, and the forces in play are as varied as technology, politics, people, and everything in between. There is always a deeper story. More than meets the eye. More variables. Decisions are never black and white and are always weighed against a dizzying array of forces.

\Mm

Pointy Elbows, Bags of Beans, and a little anthill excavation…A response to the New York Times Data Center Articles

I have been following with some interest the series of articles in the New York Times by Jim Glanz.  The series premiered on Sunday with an article entitled Power, Pollution and the Internet, which was followed up today with a deeper dive into some specific examples.  Today’s examples (Data Barns in a Farm Town, Gobbling Power and Flexing Muscle) focused on the Microsoft program, a program with which I have more than some familiarity since I ran it for many years.   After just two articles, reading the feedback in comments, and seeing some of the reaction in the blogosphere, it is very clear that there is a significant amount of misunderstanding, over-simplification, and a lack of detail I think is probably important.   Before I go on, I want to be very clear that I am not representing AOL, Microsoft, or any other organization; these are my own personal observations and opinions.

As mentioned in both of the articles, I was one of hundreds of people interviewed by the New York Times for this series.  In those conversations with Jim Glanz a few things became very apparent.  First – he has been on this story for a very long time, at least a year.   As far as journalists go, he was incredibly deeply engaged and armed with tons of facts.  In fact, he had a trove of internal emails, meeting minutes, and a mountain of data from government filings that must have taken him months to collect.  Secondly, he had the very hard job of turning this very complex space into a format where the uneducated masses can begin to understand it.  Therein lies much of the problem – this is an incredibly complex space to try to communicate to those not tackling it day to day or who don’t understand the technological and regulatory forces involved.  This is not an area or topic that can be boiled down to a sound bite.   If this were easy, there really wouldn’t be a story, would there?

At issue for me is that the complexity of the powers involved gets scant attention, with the articles aiming instead for the “Data Centers are big bad energy vampires hurting the environment” story.   It’s clearly evident reading through the comments on both of the articles so far, which claim the sources and causes are everything from poor web page design to government or multinational conspiracies to corner the market on energy.

So I thought I would take a crack, article by article, to shed some light (the kind that doesn’t burn energy) on some of the topics and just call out where I disagree completely.     In full transparency, the “Data Barns” article doesn’t necessarily paint me as a “nice guy”.  Sometimes I am.  Sometimes I am not.  I am not an apologist, nor do I intend to be one in this post.  I am paid to get stuff done.  To execute. To deliver.  Quite frankly the PUD missed deadlines (the progenitor event to my email quoted in the piece), and sometimes people (even utility companies) have to live in the real world of consequences.   I think my industry reputation, work, and fundamental stances around driving energy efficiency and environmental conservancy in this industry can stand on their own, both publicly and for those that have worked for me.

There is an inherent irony here: these articles were published both in print and electronically to maximize the audience and readership.  To do that, these articles made “multiple trips” through a data center and ultimately reside in one (or more).  They seem to denote that keeping things online is bad, which seems to go against the availability and need of the articles themselves.  Doesn’t the New York Times expect to make these articles available online for people to read?  They are posted online already.  Perhaps they expect that their microfiche experts will be able to serve the demand for these articles in the future?  I do not think so.

This is a complex ecosystem of users, suppliers, technology, software, platforms, content creators, data (both BIG and small), regulatory forces, utilities, governments, financials, energy consumption, people, personalities, politics, company operating tenets, and community outreach, to name just a few.  On top of managing all of these variables, they also have to keep things running with no downtime.

\Mm

The AOL Micro-DC adds new capability

Back in July, I announced AOL’s Data Center Independence Day with the release of our new ‘Micro Data Center’ approach.   In that post we highlighted the terrific work the teams put in to revolutionize our data center approach and align it completely not only to technology goals but to business goals as well.   It was an incredible amount of engineering and work to get to that point, and it would be foolish to think that the work represented a ‘One and Done’ type of effort.

So today I am happy to announce the roll out of a new capability for our Micro-DC – An indoor version of the Micro-DC.

The AOL Micro-DC, indoor version

While the first instantiations of our new capability were focused on outdoor environments, we were also hard at work at an indoor version with the same set of goals.   Why work on an indoor version as well?   Well you might recall in the original post I stated:

We are no longer tied to traditional data center facilities or colocation markets.   That doesn’t mean we won’t use them; it means we now have a choice.  Of course this is only possible because of the internally developed cloud infrastructure, but we have freed ourselves from having to be bolted onto or into existing big infrastructure.   It allows us to have an incredible amount of geo-distributed capacity at a very low cost point in terms of upfront capital and ongoing operational expense.

We need to maintain a portfolio of options for our products and services.  In this case, that means having an indoor version of our capabilities to ensure that our solution can live absolutely anywhere.   This will allow our footprint, automation and all, to live inside any data center colocation environment or the interior of any office building anywhere around the planet, and retain the extremely low maintenance profile that we were targeting from an operational cost perspective.  In a sense you can think of it as “productizing” our infrastructure.  Could we have just deployed racks of servers, network kit, etc. like we have always done?  Sure.   But by continuing to productize our infrastructure we continue to drive down our short-term and long-term infrastructure costs.  In my mind, productizing your infrastructure is actually the next evolution in standardization.   You can have infrastructure standards in place – server model, RAM, HD space, access switches, core switches, and the like.  But until you get to that next phase of standardizing, automating, and ‘productizing’ it into a discrete set of capabilities, you only get a partial win.
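To make “productizing” a bit more concrete, here is a minimal sketch of what a discrete, orderable infrastructure SKU might look like as a data structure. The field names and numbers are entirely invented for illustration; they are not AOL’s actual SKU definitions.

```python
from dataclasses import dataclass

# Illustrative only: the fields and values below are invented, not real SKUs.
@dataclass(frozen=True)
class MicroDCSku:
    """A 'productized' unit of infrastructure: one orderable, repeatable
    bundle of compute, storage, and network rather than a parts list."""
    name: str
    compute_nodes: int
    cores_per_node: int
    ram_gb_per_node: int
    storage_tb: int
    uplinks_10g: int
    max_power_kw: float

GENERAL_PURPOSE = MicroDCSku(
    name="mdc-gp-v1",
    compute_nodes=96,
    cores_per_node=16,
    ram_gb_per_node=128,
    storage_tb=240,
    uplinks_10g=4,
    max_power_kw=150.0,
)

if __name__ == "__main__":
    total_cores = GENERAL_PURPOSE.compute_nodes * GENERAL_PURPOSE.cores_per_node
    print(f"{GENERAL_PURPOSE.name}: {total_cores} cores, {GENERAL_PURPOSE.max_power_kw} kW max")
```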

Some people have asked me, “Why didn’t you begin with the interior version to start with? It seems like it would be the easier one to accomplish.”  Indeed I cannot argue with them; it would probably have been easier, as there were far fewer challenges to solve.  You can make basic assumptions around the environment this kind of indoor solution would live in, and reduce much of the complexity.   I guess it all nets out to a philosophy of solving the harder problems first.   Once you prove the more complicated use case, the easier ones come much faster.   This is definitely the situation here.

While this new capability continues the success we are seeing in re-defining the cost and operations of our particular engineering environments, the real challenge here (as with all sorts of infrastructure and cloud automation) is whether or not we can map similar success in getting our applications and services to work correctly in that space.   On that note, I should have more to post soon. Stay tuned!

 

\Mm

AOL’s Data Center Independence Day

Yesterday we celebrated Independence Day here in the United States.   It’s a day when we embrace the freedoms we enjoy as a country, look back on how far we have come, and celebrate the promise of the future.   Yesterday was also a different kind of Independence Day for my teams at AOL.  A Data Center Independence Day, if you will.

You may or may not have been following the progress of the work that we have been doing here at AOL over the last 14 or so months, but the pace of change has been simply breathtaking.  One of the first things I did when I entered the company was deeply review all aspects of Operations – from data centers to network engineering, to the engineering teams supporting the products and services, and everything in between.   The net of the exercise was that AOL was probably similar to most companies out there in terms of technology mix, from the CRUFT that I mentioned in a previous post to the latest technologies.  There were some incredible technologies built over the last three decades, some outdated processes and procedures, and, if I am honest, traces of a culture where the past had more meaning than the present or future.

In a very short period of time all of that changed.  We aggressively made changes to the organization, re-aligned priorities, and perhaps most of all we created and defined a powerful collection of changes and evolutions we would need to bring about on very aggressive timelines.    These changes were part of a defined Technology Roadmap that broke the work we needed to accomplish into three categories.   The first category focused on the internal technical challenges and tools we needed to build to enhance our own internal efficiencies.  The second focused on the technical challenges and aggressive things we could do to enhance and bring greater scalability to our products and services.   This would include things like additional services and technology suites for our internally developed cloud infrastructure, and other items that would allow for more rapid delivery of our products and services.   The last category of work was for the incredibly aggressive “wish list” types of changes – items that could be so disruptive, so incredibly game-changing for us, that they could redefine our work on the whole.  In fact we named this group of work “Nibiru”, after a mythical planet that is said to cross into our solar system, wreak havoc, and bring about great change.

On July 4, 2012, one of our Nibiru items arrived, and I am extremely ecstatic to state that we achieved our “Data Center Independence Day”.  Our primary “Nibiru” goal was to develop and deliver a data center environment without the need of a physical building.  The environment needed to require as little physical “touch” as possible and allow us the ultimate flexibility in terms of how we delivered capacity for our products and services. We called this effort the Micro Data Center.   If you think about the number of things that need to change to evolve to this type of strategy, it’s a bit mind-boggling.


Here are just a few of the things we had to look at, change, and automate to even make this kind of achievement possible:

  • Developing an entirely new Technology Suite and the ability to deliver that capacity anywhere in the world with minimal to no staffing.
  • Delivering extremely dense compute capacity (think the latest technology) to give us the longest possible use of these assets once deployed into the field.
  • The ability to deliver a “Micro Data Center” anywhere on the planet regardless of temperature and humidity conditions
  • The ability to support/maintain/and administer remotely.
  • The ability to fit into the power envelope of a normal office building
  • Participation in our cloud environment and capabilities
  • The processes by which these facilities are maintained and serviced
  • and much much more…

In my mind, it’s one thing to claim a technical achievement; it’s quite another to operationalize that achievement and make the process of supporting it repeatable. That’s my measure of when you can REALLY declare victory.  Science experiments don’t count.   It has to just plain work.    To that end, our first “beta” site for the technology was the AOL campus in Dulles, Virginia.  Out on a lonely slab of concrete in the back of one of the buildings, our future has taken shape.

Thanks in part to a lot of the work going on in the data center containerization space, we were able to jump-start much of the work in a relatively quick fashion.  In fact, the pace set by the Data Center and Technology Operations teams to deliver this achievement is more than a bit astounding.   Most, if not all, of the existing AOL data centers would fall somewhere around a traditional Tier III / Tier II Uptime Institute definition.   The teams really pushed way outside their comfort zones to deliver some incredible evolutions in a very short period of time.   Of course there were steps along the way to get here, but those steps now seem to be in double time.  A few months back we announced the launch of ATC, our first completely automated facility.   The work that went into ATC was foundational to our achievement yesterday.   It allowed us to really start working on the hard stuff first – that is to say, the ‘operationalization’ of these kinds of environments.   It set the stage for how we could evolve to this next tier of evolution.  Below is a summary of some of the achievements of our ATC launch, but if you are curious about the specifics of our work there, feel free to click through to the ‘Breaking the Chrysalis’ post I did at that time.  You can see how the work that we have been driving in our own internal cloud environments, the changes in operational procedure, and the change in thought are additive and fundamental to our latest achievement.   It’s especially interesting to note that with all of the blips and hiccups occurring in the ‘cloud industry’ – like the leap second and the terrible storms on the East Coast this week which affected many data centers – ATC, our completely unmanned facility, just kept humming along with no issues (to be fair, so did our traditional facilities), despite the fact that much of the initial negative feedback we had received was based solely on concerns around the reliability of such moves.   It goes to show how important engineering FOR Operation is.  At AOL we have built this in from the start.

What does this actually buy AOL?

Ok, we stuck some computers in a box and we made sure it requires very little care and feeding – what does this buy us?  Quite a bit, actually.  Jay Moran, the Distinguished Engineer who was in charge of driving this effort, is always quick to point out that the problem space here is not just about the technology; it has to be a marriage with the business side as well.  Obviously the inherent flexibility of the design allows us a greater number of places around the planet we can deploy capacity to, and that in and of itself is pretty revolutionary.   We are no longer tied to traditional data center facilities or colocation markets.   That doesn’t mean we won’t use them; it means we now have a choice.  Of course this is only possible because of the internally developed cloud infrastructure, but we have freed ourselves from having to be bolted onto or into existing big infrastructure.   It allows us to have an incredible amount of geo-distributed capacity at a very low cost point in terms of upfront capital and ongoing operational expense.   This is a huge game changer.  So much so that I’ll do a bit of ‘back of the napkin’ math with you.   Let’s call the global capacity in terms of compute, storage, etc. that we have today in our traditional environments the Total Compute Capability, or TCC. It’s essentially the bandwidth for the work that we can get done.   Inside the cost for TCC you have operating costs such as power, lease costs, data center facility maintenance costs, support staff, etc.  You additionally have the depreciation for the facilities themselves (or the specific build-outs, if colocating), the server and other equipment depreciation, and the rest.   Let’s call that baseline X.   The Micro Data Center strategy, built out with our latest, most dense server standards and infrastructure, would allow us to have 5X the total TCC in less than 10% of the cost and physical footprint.   If you think about how this will allow us to aggregate and grow over time, it ultimately drives us to a VERY LOW operational cost structure for delivering our products and services (a quick sketch of that math follows the list below).   Additionally it positions us for the future in very significant ways:

  • It redefines software architecture for greater resiliency
  • It allows us an incredibly flexible platform for driving and addressing privacy laws, regulatory oversight, and other such concerns allowing us to respond rapidly.
  • It further reduces energy consumption and carbon footprint emissions (important as taxation evolves around the world, as well as ongoing operational costs)
  • Gives us the ability to drive Edge Computing delivery to potentially bypass CDNs for certain content.
  • Gives us the capability to drive ‘Community-in-a-box’, whereby we can quickly launch new products in new markets, expand existing footprints like Patch on a low-cost but still hyper-local platform, and give the Huffington Post a platform to rapidly partner and enter new markets with minimal-cost turn-ups.
  • The fact that the technology mix in our SKUs comprises compute, storage, and network capacity maximizes the number of products and services we can deploy to it.
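To put the back-of-the-napkin math above in concrete terms, here is the same ratio worked through with normalized numbers; the roughly 50x capability-per-cost figure is my own derivation from the 5X-capacity and sub-10%-cost figures quoted above, not a separately published number.

```python
# Normalizing today's footprint to 1.0 unit of Total Compute Capability (TCC)
# at a cost of X = 1.0, and restating the 5X-capacity / sub-10%-cost figures
# from the post. The ~50x capability-per-cost figure is derived, not quoted.
baseline_tcc = 1.0
baseline_cost = 1.0              # "X", normalized

mdc_tcc = 5 * baseline_tcc       # 5X the total compute capability
mdc_cost = 0.10 * baseline_cost  # in less than 10% of the cost and footprint

print(f"capability per unit cost: baseline {baseline_tcc / baseline_cost:.0f}x, "
      f"Micro-DC {mdc_tcc / mdc_cost:.0f}x (~50x better)")
```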

As Always, it’s really about the People

I cannot let a post about this huge win for us go by without mentioning the teams involved in delivering this capability.  This is not just a win for AOL, or to a lesser degree for the industry at large as another proof point that it can evolve if it puts its mind to changing, but rather a win for the Technology Teams at AOL.  When I was first approached about joining AOL, my slightly sarcastic and comedic response was probably much like yours – ‘Are they still around?’ But the fact of the matter is that AOL has a vision of where they want to go, and what they want to be.   That was compelling for me personally, compelling enough for me to make the move.   What has truly amazed me, however, is the dedication and tenacity of its employees.  These achievements would not be possible without the outright aggressiveness the organization has taken in moving the company forward.  It’s always hard to assess from the outside just how hard an effort is to achieve internally.  In the case of our Micro Data Center strategy, the teams had just about every kind of barrier to delivering this capacity, every kind of excuse to not make it, or even not to try.   They put all of those things aside and just plain executed.  If you allow me a small moment of bravado – not only did my teams simply kick ass, they did it in a way that moved the needle for the company, and in my mind once again catapulted themselves to the forefront of operations and technology at scale.   We still have a bunch of Nibiru projects to deliver, so my guess is we haven’t heard the last of these big wins.

\Mm