Over the past few weeks I have been involved in long-term planning discussions with senior IT management from multiple clients. While I can’t go into the details of these meetings, a few common trends emerged in their long-term virtualization strategies.
First, all of them were roughly at the benchmark of having 30% of their compute workloads virtualized and were looking at how to get well beyond that (see my prior post on Breaking the 30% Barrier). Part of the growth strategy included defining a specific set of applications that are set aside from the next wave of virtualization, typically somewhere between 10% and 30% of the overall compute workloads. General reasons for this are:
- Organizing the next logical set of workloads primed for easy virtualization
- Setting aside workloads where resistance to virtualization was being felt, to give the business users more time to warm up to it
- Workloads that are “too big” to virtualize (typically because of CPU, storage, or I/O requirements; some of these are really misconceptions given the current VM scalability limits of vSphere 4)
- Workloads where the ISV specifically doesn’t support the software running in a VM (this is becoming less and less common as more ISVs actively embrace virtualization or enterprise customers flatly tell their ISVs “we’re running it in a VM, get on board”)
Second, they are all planning on building a specific internal cloud within part of their infrastructure. This alone isn’t that surprising. There are specific use cases where a self-service internal cloud solves a lot of problems for the business users, the most glaring being dev/test scenarios, where lots of dynamic, short-lived workloads are involved, and web hosting, where the business units (typically marketing) need to be able to react faster to market opportunities and to popularity spikes from new products and viral marketing activity.
What is surprising is that when IT started to talk about the ideal end-state view of their “non-cloud” virtualized environment…it was essentially a cloud. As Jian Zhen described recently in The Thousand Faces of Cloud Computing Part 4: Architecture Characteristics, there is a set of architectural characteristics that describe cloud computing:
- Infrastructure Abstraction
- Resource Pooling
- Ubiquitous Access
- On-Demand Self-Service
- Elasticity
(Note: Jian Zhen changed his list of characteristics in the above post from his initial The Thousand Faces of Cloud Computing post.)
The enterprises’ long-term vision for their virtualized computing environment includes all of these characteristics with the exception of On-Demand Self-Service and, in some ways, Ubiquitous Access. On-Demand Self-Service is typically not in their plans because the enterprises lack a key piece of it: an internal finance model that allows for chargeback of resources used (though most seem to be thinking about that). On-Demand also isn’t as needed in this part of the enterprise environment because the workloads are the known-sized, planned enterprise applications of classic “Enterprise IT”. Ubiquitous Access likewise isn’t something IT is thinking about for this part of their environment, primarily because access to these workloads is already pre-defined by the workloads themselves: web servers are accessed by web browsers, email from an email client (whether static or mobile), etc.
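To make the chargeback gap concrete, here is a minimal sketch of the kind of usage-based chargeback calculation such an internal finance model would need. The resource names and rates are purely illustrative assumptions on my part, not figures from any of these clients.

```python
# Hypothetical usage-based chargeback sketch.
# Rates and resource names are illustrative assumptions, not real pricing.

MONTHLY_RATES = {
    "vcpu": 25.00,       # per vCPU allocated per month
    "ram_gb": 10.00,     # per GB of RAM allocated per month
    "storage_gb": 0.50,  # per GB of storage consumed per month
}

def monthly_chargeback(usage: dict) -> float:
    """Return the monthly charge for one business unit's resource usage."""
    return sum(MONTHLY_RATES[resource] * amount for resource, amount in usage.items())

# Example: a (hypothetical) marketing dev/test environment.
marketing_usage = {"vcpu": 16, "ram_gb": 64, "storage_gb": 500}
print(f"Marketing chargeback: ${monthly_chargeback(marketing_usage):,.2f}")
```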
And yet, all the other things that enterprise IT strategists are thinking about fall squarely in the realm of “cloud computing”: get the business users to think in terms of capacity and SLAs, abstract all other aspects of the infrastructure from them, and then drive up utilization on that infrastructure to maximize ROI. Some are still only comfortable driving per-physical-server utilization up to the 50%-60% range, while others are damning the torpedoes and want to get as close to 100% as possible. Across the entire infrastructure, though, you can never reach 100% utilization because workloads aren’t that consistent; this is where resource pooling and elasticity come into play.
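A rough way to see why pooling helps: individual workloads spike at different times, so the aggregate peak of a shared pool is well below the sum of the individual peaks. The sketch below is a hypothetical illustration of that effect under made-up demand numbers, not data from any of these environments.

```python
# Hypothetical illustration: a shared pool needs less capacity than
# provisioning each workload for its own peak.
import random

random.seed(42)

HOURS = 24 * 30          # one month of hourly samples
NUM_WORKLOADS = 20       # workloads sharing the pool

def hourly_demand():
    """One workload's demand for one hour: a low baseline with occasional spikes."""
    base = random.uniform(0.10, 0.30)
    return base + (random.uniform(0.4, 0.7) if random.random() < 0.05 else 0.0)

samples = [[hourly_demand() for _ in range(HOURS)] for _ in range(NUM_WORKLOADS)]

# Silo model: size each workload for its own peak.
sum_of_peaks = sum(max(w) for w in samples)
# Pool model: size the shared pool for its aggregate peak.
pooled_peak = max(sum(w[h] for w in samples) for h in range(HOURS))

print(f"Capacity if each workload gets its own peak: {sum_of_peaks:.1f} units")
print(f"Capacity if workloads share a pool:          {pooled_peak:.1f} units")
```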
Though I do have to argue that Resource Pooling is not the best term for what this characteristic actually means. Creating and managing pools of resources is part of it, but I think a more accurate term is Resource SLAs. The end users of the environment are buying a specific amount of resources, either as a “guaranteed maximum” or as an “on-average maximum”. The architecture of the cloud needs to ensure that spikes in resource usage by one user are serviced up to their agreed-upon limit, while also allowing IT to “over subscribe” the environment during non-spike times. Mixing guaranteed and on-average workloads then allows performance spikes of the guaranteed workloads to be serviced at the cost of the on-average workloads should no extra capacity be available.
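As a rough sketch of that over-subscription logic (the workload names, SLA labels, and numbers are mine, not from any product or client): guaranteed workloads are serviced up to their agreed limit first, and the on-average workloads split whatever capacity is left when there is no spare headroom.

```python
# Hypothetical sketch of servicing demand under Resource SLAs.
# Guaranteed workloads get their agreed limit first; on-average workloads
# share whatever capacity remains. All numbers are illustrative.

def allocate(capacity: float, workloads: list[dict]) -> dict:
    """Return per-workload allocations for one scheduling interval."""
    allocations = {}

    # 1. Service guaranteed workloads up to their agreed limit.
    for w in (w for w in workloads if w["sla"] == "guaranteed"):
        grant = min(w["demand"], w["limit"])
        allocations[w["name"]] = grant
        capacity -= grant

    # 2. On-average workloads split the remainder, proportionally to demand.
    on_avg = [w for w in workloads if w["sla"] == "on-average"]
    total_demand = sum(w["demand"] for w in on_avg)
    for w in on_avg:
        share = capacity * w["demand"] / total_demand if total_demand else 0.0
        allocations[w["name"]] = min(w["demand"], max(share, 0.0))

    return allocations

workloads = [
    {"name": "order-processing", "sla": "guaranteed", "limit": 40, "demand": 38},
    {"name": "reporting",        "sla": "on-average", "limit": 30, "demand": 25},
    {"name": "dev-test",         "sla": "on-average", "limit": 30, "demand": 20},
]

# A pool of 70 units is over-subscribed (limits total 100): the guaranteed
# spike is serviced in full and the on-average workloads absorb the shortfall.
print(allocate(70, workloads))
```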
It becomes a game of how tight of a ship you want IT to run…