I had an interesting discussion today with an engineering manager at one of my customers. We were discussing the capabilities of ESX and Dynamic Resource Scheduling (DRS). During the discussion I needed to explain how virtualization helps to build clouds, but not grids. At least not typically. And not Supercomputers. At least not yet.
Grid Computing generally refers to breaking up a large compute-intensive workload into smaller blocks, then scheduling those blocks to run across a set of smaller computer systems (typically small enough so that the workload will utilize nearly all, if not all, of the computing capacity of each system). Widely known examples of this are the SETI@Home project or the Search for Cancer projects that ran as screen savers on your PC when you were not using it. Typically in a high performance grid environment you don't want all the overhead of a traditional operating system, so each system runs a special operating system that just performs the compute job on its block of work and then returns the answer to a centralized scheduler before getting the next block of work.
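The scatter/gather pattern described above can be sketched in a few lines. This is just an illustration of the idea, not any real grid framework; all the names here are mine, and a thread pool stands in for the fleet of worker machines.

```python
# Minimal sketch of the grid pattern: a central scheduler splits a big
# job into blocks, hands each block to a worker, and gathers results.
from concurrent.futures import ThreadPoolExecutor

def compute_block(block):
    # Stand-in for the real compute-intensive kernel; here we just
    # sum the numbers in the block.
    return sum(block)

def run_grid_job(data, block_size, workers):
    # The "scheduler": carve the workload into fixed-size blocks...
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # ...fan them out to workers, then gather the partial answers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(compute_block, blocks))
    return sum(partials)

total = run_grid_job(list(range(1000)), block_size=100, workers=4)
print(total)  # same answer as computing it on one machine: 499500
```

The key property is that each block fits entirely on one worker; the scheduler never needs two machines to cooperate on a single block.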
Cloud Computing, on the other hand, is about turning your compute capacity into an on-demand utility: self-service infrastructure that allows any compute job contained inside of just about any container (Operating System). The computer systems of a cloud environment can be small or big; lately the trend has been toward big. Users can request access, load and run their compute jobs, and be billed just for what they use. Need more, add more and pay more. Need less, remove capacity and pay less. Some Cloud Systems are more restrictive, like grids, in that their infrastructure is designed for running certain types of applications (LAMP stacks). Others are less restrictive and let you run any application.
Virtualization is the foundation of Cloud Computing. Hypervisors like ESX provide the infrastructure, and management tools layer on top of that to add the self-service, automation, and control. Virtualization can also be the infrastructure on which Grid Computing runs. The reasons you would do this are the flexibility of managing the underlying hardware, or that the underlying hardware has more compute capacity than the Grid can use for each compute job. While this is usually technically possible, I haven't seen it very often.
Once you connect a large storage array, fast and large network pipes (10GigE is getting more popular for this, and faster unified fabrics are not far away), and large computing capacity (CPU and memory), you get something that looks awfully similar to a supercomputer. But one thing is always the case for both Grid Computing and Cloud Computing: the compute workload (in whole, or each grid block) is always running on just one physical computer within the environment.
If you have 3 VMs running on an ESX host and that host only has one CPU/core and 1 Gig of RAM capacity left, and the next VM to run needs 2 cores or 2 Gigs of RAM, that square peg won't fit into the round hole that is available. With VMware's DRS, the system is smart enough to either place the square peg into a square hole of the right size, or shuffle around the VMs to turn the round hole into a right-sized square hole. But virtualization can't split that computing workload up and run it across the compute capacity of two physical systems. Nor can it take the CPU from one physical machine and the RAM from another and give them both to the same VM.
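That placement constraint can be made concrete with a small sketch. This is not VMware's DRS algorithm (the real scheduler weighs many more factors); it is just an illustration, with made-up names, of the rule that a VM must fit on a single host, no matter how much spare capacity the cluster has in total.

```python
# Illustrative sketch of the single-host placement constraint: a VM
# fits only if ONE host has enough free cores AND free RAM. Spare
# capacity on two hosts cannot be combined for a single VM.

def fits(host_free, vm_need):
    # host_free / vm_need are (cores, ram_gb) tuples
    return host_free[0] >= vm_need[0] and host_free[1] >= vm_need[1]

def place_vm(hosts_free, vm_need):
    """Return the index of the first host that can take the VM, else None."""
    for i, free in enumerate(hosts_free):
        if fits(free, vm_need):
            return i
    return None  # no single host fits; total cluster capacity is irrelevant

# Two hosts each with 1 core / 1 GB free: 2 cores free cluster-wide,
# but a 2-core / 2 GB VM still has nowhere to run.
print(place_vm([(1, 1), (1, 1)], (2, 2)))  # None
print(place_vm([(2, 4), (1, 1)], (2, 2)))  # 0
```

DRS's "shuffling" amounts to migrating VMs between hosts until `place_vm` finds a fit; what it can never do is make the first call above succeed without moving something.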
That is the line in the sand that separates Cloud Computing and Grid Computing from Supercomputing. At least for today. I wonder how much longer that line will exist? In the near future you will see virtualization provide more capabilities that used to be the realm of specialized systems. It won't be that much longer until you see it step across that supercomputer line.