Whiffs of Wisdom #18 – Project Managers and Security People

I am not sure why, but for some reason this topic has come up for me like 8 times this week.  Rather than continue to talk about it, I figured I would just post one of my “Whiffs of Wisdom” (some people call them “Manosisms”).  Apparently I am the worst person in the world at coming up with anecdotes, but people get my drift, so in my book that means success.

Whiffs of Wisdom #18

On Project Managers and Security People

Every Technology organization needs Project Managers and Security-focused Engineers.  There ACTUALLY IS a magic number of these individuals to have in your organization.  I don’t know what that number is, but I know when I have one too many of either.  These folks bring order to chaos (Engineers are notoriously terrible at project management), but the moment it starts becoming more about the process than about the END RESULTS, I know we have gotten off track.  There is nothing more effective than a great project manager and nothing more destructive than an overbearing, rule-obsessed one.  You need to watch it closely, because left to their own well-meaning devices these groups tend to create Bureaus of Business Prevention.

\Mm

IPv6 to IPv4 Translation Made Business Beautiful: An Easier, Less Painful Data Center Transition for Your Business


I am a lover of simple, efficient, and beautiful things.  Ivan Pepelnjak of ipSpace gets The Loosebolt’s Oscar Award for Elegance and Simplicity in a Complex Network Application.  There may not be a little statue holding up a giant router or anything, but his solution to IPv4-to-IPv6 translation on the Internet is pretty compelling and allows application developers and IT folks to “outsource” all concerns about this issue to the network.

At some point your Data Centers and network are going to have to tackle the interface between the commercial IPv4 Internet and the IPv6 Internet.  If you are pretty aggressive about the IPv6 conversion in your data center, that pesky IPv4 Internet is going to prove to be a problem.  Some think this can be handled by straight Network Address Translation, or by dual-homing the servers in your data center on both networks.  But this approach creates cascading challenges for your organization.  Essentially it generates work for your System Admins, your developers, your Web admins, and so on.  In short, you may have to figure out solutions at every level of the stack.  I think Ivan’s approach makes it pretty simple and compelling, if a bit unorthodox.  His use of Stateless IP/ICMP Translation (SIIT), which was originally intended as a part of NAT64 rather than a standalone mechanism, solves an interesting problem: it allows businesses to begin the conversion one layer at a time while still giving those non-adopting IPv4 folks access to all the goodness within your data center.
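To make that a little more concrete: stateless translators like SIIT work by algorithmically embedding an IPv4 address into an IPv6 prefix (RFC 6052 defines the well-known 64:ff9b::/96 prefix for exactly this purpose).  Here is a minimal Python sketch of that address mapping; it is purely an illustration of the concept, not Ivan’s implementation:

```python
import ipaddress

# Well-known translation prefix from RFC 6052, used by SIIT/NAT64 translators.
PREFIX = ipaddress.IPv6Network("64:ff9b::/96")

def embed_ipv4(v4: str) -> ipaddress.IPv6Address:
    """Algorithmically map an IPv4 address into the translator's IPv6 prefix."""
    return ipaddress.IPv6Address(
        int(PREFIX.network_address) | int(ipaddress.IPv4Address(v4))
    )

# The translator represents IPv4 host 198.51.100.7 inside the IPv6 network
# as the synthesized address below, and maps it back statelessly,
# packet by packet, with no per-connection state to maintain.
print(embed_ipv4("198.51.100.7"))  # -> 64:ff9b::c633:6407
```

Because the mapping is purely algorithmic, the translator keeps no session table, which is a big part of what makes the approach so simple to operate.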

His webcast on the approach can be found here.

\Mm

Uptime, Cowgirls, and Success in California

This week my teams have descended upon the Uptime Institute Symposium in Santa Clara.  The moment is extremely bittersweet for me, as this is the first Symposium in quite some time I have been unable to attend.  With my responsibilities expanding at AOL beginning this week, there was simply too much going on for me to make the trip out.  It’s a downright shame too.  Why?

We (AOL) will be featured in two key sessions at Symposium this time around, for some incredibly groundbreaking work happening at the company.  The first is a recap of the incredible work going on in the development of our own cloud platforms.  Last year, you may recall, we were asked to talk about some of the wins and achievements we accomplished with the development of our cloud platform.  The session was extremely well received.  We were asked to come back, one year on, to discuss how that work has progressed even further.  Aaron Lake, the primary developer of our cloud platforms on my Infrastructure Development Team, will be talking about the continued success, features, and functionality, and the launch of our ATC Cloud-Only Data Center.  It’s been an incredible breakneck pace for Aaron and his team, and they have delivered world-class capabilities for us internally.

Much of Aaron’s work has also enabled us to win the Uptime Institute’s First Annual Server Roundup Award.  I am especially proud of this particular honor, as it is the result of an amazing amount of hard work within the organization on a problem faced by companies all over the planet.  Essentially this is Operations Hygiene at a huge scale: getting rid of old servers, driving consolidation, moving platforms to our cloud environments, and more.  This talk will be led by Julie Edwards, our Director of Business Operations, and Christy Abramson, our Director of Service Management.  Together these two teams led the effort to drive out “Operational Absurdities” and ran our “Power Hog” programs.  We have sent along Lee Ann Macerelli and Rachel Paiva, the primary project managers instrumental in making this initiative such a huge success.  These “Cowgirls” drove an insane amount of work across the company, resulting in over 5 million dollars of unforecasted operational savings and proving that there is always room for good operational practices.  They even starred in a funny internal video celebrating their win, which can be found here using the AOL Studio Now service.

If you happen to be attending Symposium this year, feel free to stop by and say hello to these amazing individuals.  I am incredibly proud of the work they have driven within the company.

\Mm

The Cloud Cat and Mouse Papers–Site Selection Roulette and the Insurance Policies of Mobile Infrastructure


It’s always hard to pick exactly where to start in a conversation like this, especially since this entire process really represents a changing life-cycle.  It’s more of a circular spiral that moves out (or evolves) as new data is introduced than a traditional life-cycle, because new data can fundamentally shift the technology or approach.  That being said, I thought I would start our conversation at a logical starting point: where does one place one’s infrastructure?  Even in its embryonic “idea phase,” the intersection of government and technology begins its delicate dance to a significant degree.  These decisions will ultimately have an impact on more than just where a company’s capital investments are located.  They affect the products and services the company offers and, as I propose, ultimately the customers who use the services at those locations.

As I think back to the early days of building out a global infrastructure, the Site Selection phase started at a very interesting place.  In some ways we approached it with a level of sophistication that has yet to be matched today, and in other ways we were children playing a game whose rules had not yet been defined.

I remember sitting across numerous tables from government officials, talking about making an investment (largely just land purchase decisions) in their local community.  Our Site Selection methodology had brought us to these areas, a methodology which continued to evolve as we got smarter and started to truly understand the dynamics of the system we were being introduced to.  In these meetings we always sat stealthily behind a third-party real estate partner.  We never divulged who we were, nor were the officials allowed to ask us that directly.  We would pepper them with questions, and they in turn would return the favor.  It was all cloak and dagger, with the Real Estate entity taking all action items to follow up with both parties.

Invariably during these early days, these locales would always walk away with the firm belief that we were a bank or financial institution.  When they delved into our financial viability (for things like power loads, commitment to capital build-out, etc.), we always stated that any capital commitments and longer-term operational cost commitments were not a problem.  In large part the cloak and dagger was meant to keep land costs down (as we matured, we discovered this was quite literally the last thing we needed to worry about), as we feared that once our name became attached to a deal our costs would go up.  These were the early days of seeding global infrastructure, and it was not just us.  I still laugh at the fact that one of our competitors bound a locality up so much in secrecy that the community referred to the data center as Voldemort, He Who Shall Not Be Named, in deference to the Harry Potter book series.

This of course was not the only criterion we used.  We had over 56 criteria by the time I left that particular effort, each with various levels of importance and weighting.  Some Internet companies today use fewer, some about the same, and some don’t use any at all; they ride on the backs of others who have trail-blazed a certain market or locale.  I have long called this effect Data Center Clustering.  The rewards for being first mover are big; the rewards for following are smaller, but ultimately still positive.

If you think about most of the criteria used to find a location, they almost always focus on current conditions, with only some of the criteria acknowledging the forward view.  This is true, for example, when looking at power costs.  Power costs today are important to siting a data center, but so is understanding the generation mix behind that power and the corresponding price volatility, and modeling those forward to predict (as best as possible) longer-term power costs.
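Purely as an illustration of what that kind of weighted, forward-looking scoring can look like (the criteria names and weights below are hypothetical stand-ins, not the actual 56-plus criteria we used), here is a minimal Python sketch:

```python
# Hypothetical, simplified site-selection scoring model.
# Real models weigh dozens of criteria; these five are illustrative only.
SITE_CRITERIA_WEIGHTS = {
    "power_cost_today": 0.25,
    "power_cost_projected_10yr": 0.25,  # the forward-looking piece
    "network_connectivity": 0.20,
    "tax_incentives": 0.15,
    "regulatory_stability": 0.15,
}

def site_score(scores):
    """Each criterion is scored 0-10; returns the weighted total."""
    return sum(weight * scores.get(criterion, 0.0)
               for criterion, weight in SITE_CRITERIA_WEIGHTS.items())

candidate_site = {
    "power_cost_today": 8.0,
    "power_cost_projected_10yr": 6.0,  # a volatile generation mix drags this down
    "network_connectivity": 7.0,
    "tax_incentives": 9.0,
    "regulatory_stability": 5.0,
}
print(round(site_score(candidate_site), 2))  # 7.0
```

The point of the exercise is that a site with cheap power today but a volatile generation mix can score worse than a site with slightly more expensive, stable power, once the forward view is weighted in.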

What many miss is the more subtle political layer that appears once a data center has been placed or a cluster has developed.  Specifically, the political and regulatory landscape can change very quickly relative to the life of a data center facility, which is typically measured in 20-, 30-, or 40-year lifetimes.  It’s a risk that potentially places a large amount of capital assets in play, vulnerable to these kinds of changes.  It’s something that is very hard to plan or model against.  That being said, there are indicators and clues one can use to at least weigh the risk factors, or, as some are doing, to ensure that the technology they deploy limits their exposure.  In cloud environments the question remains open: how much are companies using cloud infrastructure in these facilities at risk?  We will explore this a little later.

That’s not to say that this process is all downside, either.  As we matured in our approach, we came to realize that governments (local or otherwise) were strongly incented to work with us on getting a great deal, and in fact competed over this kind of business.  Soon you started to see the offers changing materially.  It became less about the land or location and quickly evolved into what types of tax incentives, power deals, and other mechanisms could be put in play.  You saw (and continue to see) deals structured around sales tax breaks, real estate and real estate tax deals, economic incentives around breaks in power rates, specialized rate structures for Internet and Cloud companies, and the like.  The goal, of course, was to create the public equivalent of “golden handcuffs” for the Tech companies and try to marry them to a particular region, state, or country.  In many cases, all three.  The benefits here are self-apparent.  But can they (or more specifically, will they) be passed on in some way to small companies who make use of cloud infrastructure in these facilities?  While definitely not part of the package deals done today, I could easily see site selection negotiations evolving to incent local adoption of cloud technology in these facilities, or provisions being put in place tying adoption and hosting to tax breaks and other deal structures, in the mid to longer term for hosting and cloud companies.

There is still a learning curve out there, as most governments mistakenly try to tie these investments to job creation.  Data Centers, Operations, and the like represent the cost of goods sold (COGS) to the cloud business.  Therefore there is a constant drive toward efficiency and the reduction of the highest-cost components of delivering those products and services.  Generally speaking, people are the primary targets in these environments.  Driving automation in these environments is job one for any global infrastructure player.  One of the big drivers for us investing in and developing a 100% lights-out data center at AOL was eliminating exactly those kinds of costs.  Governments that highlight job creation targets over other types typically don’t win the site selection.  Having commissioned an economic study after a few of my previous big data center builds, I can tell you that the value to a region or a state does not come from the up-front jobs the data center employs.  After a local radio station called into question the value of having such a facility in its backyard, we used an internationally recognized university to perform a third-party “neutral” assessment of the economic benefits (sans direct employment), and the numbers were telling.  We turned over all construction costs and other related material to them, and over the course of a year, through regional interviews and the like, they investigated the direct impact of a data center on the local community and the overall impact of its addition.  The results of that study are owned by a previous employer, but I can tell you with certainty: these facilities can be beneficial to local regions.

No one likes constraints, and as such you are beginning to see Technology companies use their primary weapon, technology, to mitigate their risks even in these scenarios.  Container-based data centers, for example, not only offer some interesting benefits in terms of energy and cost efficiencies; they also bring a certain mobility to infrastructure that has never been available before.  Historically, data centers have been viewed as large capital anchors to a location.  Once in place, hundreds of millions to billions (depending on the size of the company) of dollars of capital investment are tied to that region for the facility’s lifespan.  It’s as close to permanent as the Tech Industry gets, much as building a factory was during the industrial revolution.

In some ways the Modularization of the data center industry can, and will, have the same effect as the shipping container did in manufacturing.  All puns intended.  If you are unaware of how the shipping container revolutionized the world, I would highly recommend the book “The Box” by Marc Levinson; it’s a quick read and very interesting if you read it through the lens of IT infrastructure and the parallels of modularization in the Data Center Industry at large.

It gives the infrastructure companies more exit options and mobility in the future than they would have had in the past under large capital build-outs.  It’s an insurance policy, if you will, against potential changes in legislation or regulation that might negatively impact the Technology companies over time.  Just another move in the cat and mouse games that we will see evolving here over the next decade or so in terms of the interactions between governments and global infrastructure.

So what about the consumers of cloud services?  How much of a concern should this represent for them?  You don’t have to be a big infrastructure player to understand that there are potential risks in where your products and services live.  Whether you are building a data center or hosting inside a real estate or co-location provider, these are issues that will affect you.  Even in cases where you only use the cloud provisioning capabilities of your chosen provider, you will typically be given options for which region or area you would like your gear hosted in.  Typically this is done for performance reasons (reaching your customers), but perhaps this information might cause you to think about the larger ramifications to your business.  It might even drive requirements into the infrastructure providers to make this more transparent in the future.

These evolutions in the relationship between governments and Technology, and the technology options available, will continue to shape site selection policy for years to come.  How they will ultimately affect those that use this infrastructure, whether directly or indirectly, remains to be seen.  In the next paper we will explore this interaction more deeply as it relates to the customers of cloud services and the risks and challenges specifically for them in this environment.

\Mm

DataCentres2012–Nice, France


Next month I will be one of the keynote speakers at the DataCentres2012 conference in Nice, France.  This event, produced and put on by the BroadGroup, is far and away the pre-eminent conference for the Data Center Industry in Europe.  As an alumnus of other BroadGroup events, I can assure you that the presentations and training available are of the highest quality.  I am also looking forward to re-connecting with some great friends such as Christian Belady of Microsoft, Tom Furlong of Facebook, and others.  If you are planning on attending, please feel free to reach out and say hello.  It’s a great opportunity to network, build friendships, and discuss the issues pressing our industry today.  You can find out more by visiting the event website below.

http://www.datacentres2012.com/

\Mm

This is just lost on so many companies / organizations…


Having experienced nearly all of the pain and desire one could have in trying to scale out applications, operations, and infrastructure, I have become a huge proponent of blending Development and Operations efforts.  Additionally, I think the blend should include lower-level areas like facilities as well.  The entire online paradigm fundamentally changes how the problem space should be viewed.

With concepts like NoOps, DevOps, and the like becoming fashionable in the Development community, it’s probably no surprise that these issues are being addressed from people’s own comfort spaces.  To a development engineer, those Ops folks are crusty and cranky.  To an Operations engineer, those darn developers don’t really code for long-term operations.  It’s always the ‘throw the code over the wall and the Ops folks will make it work’ mentality.  In reality, both sides are right.

The simple truth is (in my opinion) that the University System is to a large degree failing the industry, especially when it comes to developing for future platforms.  Graduates are coming out by the thousands versed in the development of Java, Ruby on Rails, and insert-your-favorite-flavor-of-high-level-web-platform-here.  Graduates who understand the underlying systems, and more basically how things work, are becoming rarer by the year.  Add to this mix an understanding of developing code to RUN, along with the infrastructural and operational requirements associated with it, and you are dealing with a very rare skill set.  Many of the big companies who do build for the RUN of software (read: SaaS, large-scale online services, etc.) actually go through a bit of “re-education” with new hires, to either teach them these skill sets for the first time or “re-program” the bad stuff out.

I had a series of related things come through my inbox, along with a video shared with me from last year’s Velocity conference.  I think they are powerful thought-provokers to read and watch.

The first is the video from Velocity by Theo Schlossnagle, a founder and principal at OmniTI.  It is somewhat skunk-worked under the heading of Career Development.  Theo takes this from the perspective of the individual, but it is easily applied to organizations at large.  It’s 13 minutes, but well worth the time.

http://velocityconf.com/velocity2011/public/schedule/detail/20406

The second is a post by Adrian Cockcroft of Netflix talking about the development and evolution of DevOps/NoOps in the culture there.  The approach is right on, although I think to a large degree some of the real “ops” stuff has been outsourced to the cloud provider.  That being said, the mindset shown here, of broader “Development Responsibilities,” is definitely the right way to think about the problem space.  I have often talked about the Netflix Chaos Monkey approach and just how powerful that paradigm is:

http://perfcap.blogspot.com/2012/03/ops-devops-and-noops-at-netflix.html
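As an aside, the core of the Chaos Monkey paradigm is almost embarrassingly simple to sketch.  Here is a minimal, hypothetical Python illustration of the idea (not Netflix’s actual tool): randomly kill instances so that teams are forced to build services that survive failure.

```python
import random

# Hypothetical instance names; the real Chaos Monkey discovers instances
# through the cloud provider's APIs rather than a hard-coded list.
INSTANCES = ["web-01", "web-02", "api-01", "api-02", "cache-01"]

def chaos_monkey(instances, kill_probability=0.2):
    """Randomly 'terminate' instances to prove the service tolerates failure."""
    for instance in instances:
        if random.random() < kill_probability:
            print(f"Chaos Monkey terminating {instance}")
            # terminate_instance(instance)  # hypothetical cloud API call

chaos_monkey(INSTANCES)
```

The power is in the culture it forces: if any instance can die at any moment, resilience stops being an afterthought and becomes a design requirement.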

The last is actually a response to the Cockcroft post by John Allspaw, who used to run Flickr’s Operations and is now at Etsy.  While arguing the benefit of a stronger Ops presence and involvement, he also highlights the benefits of having Development and Engineering more aware of their surroundings.

Happy Reading!

\Mm