Cloud for Virtualization Engineers: Breaking Free from the Non-Volatile Ball & Chain

The concepts discussed in this article can apply to VMware vCloud Director, CloudStack, or other IaaS platforms. This article is meant to provide design considerations for the planning stage of rolling out an IaaS solution. Garbage in, garbage out, so let's think smarter.

In the first article of this series, “CloudStack for Virtualization Engineers - A New Series from ThinKVIRT,” I discussed some of the key technologies, both open source and proprietary.  I also outlined a few technologies that will be covered in this series, including:

  • OpenStack Swift as an object-storage layer
  • Citrix CloudStack as an IaaS layer managing a vSphere, et al. environment
  • VMware Cloud Foundry as a PaaS layer on top of the IaaS

The first article touched on “abstracting the Operating System” due to an “increasing importance of the apps that an organization uses and the decreasing importance of the operating system.” This is an important concept to understand, as it is a cornerstone of designing a cloud solution. This article discusses, in greater detail, why abstracting the underlying operating system instance from the application or service it provides is of paramount importance.

The concept of volatility, as discussed in this article, is not specific to a CloudStack or VMware vCloud Director-based solution. It is a consideration that should be made when designing your internal IaaS solution, as well as when deciding how external IaaS providers will be leveraged.

Werner Vogels' presentation at AWS Gov Summit 2011 discussed the advantages of instance volatility, such as a significantly lower price per instance, and the importance of time-to-market. Casey Coleman, the keynote speaker, also discussed how her agency is realizing very significant cost savings by adopting cloud computing in various forms (IaaS, PaaS, SaaS).

Are You Ready for Cloud?

There you are at your desk, the king of virtualization at your organization, and your boss, or your boss’ boss, just read the latest article on cloud and is now demanding that you put your cloud strategy together. You’re the “VMware ninja,” the “virtualization rockstar,” a regular VMware tweeter, and in some cases a recognized “VMware vExpert.” Surely the future of the organization’s cloud strategy rests upon your shoulders, but are you prepared to change the way you think of virtual machines and the services they provide?

Let’s start by calling them instances. One technique I use when I talk cloud, especially with seasoned virtualization (typically VMware-focused) professionals, is to use Amazon-esque lingo as much as possible. Why? The first step to thinking differently is to change the conversation. Cloud is not server virtualization.

From an IaaS perspective, it’s possible you aren’t ready for cloud and instead you’re ready for a smarter way of doing server virtualization.  If you’re looking to simply automate virtual machine lifecycle and not change the way you design solutions, cloud is not for you (yet).

Let’s pretend we know nothing of VMware, we don’t design DRS clusters, and we aren’t in VMware vCenter 10 hours a day.  Instead, let’s pretend we’ve been asked to build a new environment completely in AWS.

Why Do We Keep Talking About Amazon?

Many online reports have Amazon well out in front of the public cloud service provider market, with Rackspace likely in P2. There are also dozens of success stories from companies using Amazon for application hosting (like 99designs), content delivery (like IMDb), high-performance computing (like AeroDynamic Solutions), and other uses (see Etsy, Harvard Medical School, NASA JPL, and The Washington Post). Someone could spend a day reading through all of the AWS case studies. As Randy Bias states, “probably the number one reason for Amazon’s success isn’t what they let you do, but what they don’t let you do.”

To physical or server virt engineers using AWS for the first time, there’s normally a rude awakening when they realize that Amazon EC2 instances backed by Amazon S3 are volatile. I do not want to get into a long discussion on S3-backed versus EBS-backed EC2 instances, but for a phenom write-up on the topic, please see this article by Gerardo Viedma. Nevertheless, AWS can teach us all a little something about how to build our own on-prem clouds more efficiently; as such, I'm a big believer in looking at techniques used in AWS and how we can apply them (where it makes sense) to our on-prem solutions.

Volatility – The Reaction to Abstraction

Often, someone who brings up the importance of volatility in a meeting with a room full of server engineers will be chased out of the building with pitchforks and declared a witch. “Blasphemer!” they will shout. This is the reaction to abstraction, but stay the course!

Volatile instances:

  • Force engineers to use an external, non-volatile repository for non-volatile data
    • This can be in AWS (e.g. Simple Storage Service (S3), Elastic Block Store (EBS)), in another cloud provider, or on-prem storage (e.g. a corporate-owned file server). This lowers the cost of the compute (as the instances are now more generic) and lowers the cost of the storage (because we are only keeping what we truly need).
  • Focus the solution’s design above the operating system with the assumption that the instances will fail
    • This inherently makes engineers design for failure (versus relying on hypervisor features such as VMware Fault Tolerance) which ultimately creates a more robust solution.
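
To make the two points above concrete, here is a minimal sketch in Python. Nothing in it is a real AWS, CloudStack, or vCloud Director API: ExternalStore is a hypothetical in-memory stand-in for whatever non-volatile repository you choose (S3, EBS, a file server), and VolatileInstance is a hypothetical generic compute instance. The only point is that data written to the external store outlives the instance that wrote it.

```python
class ExternalStore:
    """Hypothetical stand-in for a non-volatile repository (S3, EBS, a file server)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        self._objects[key] = value

    def get(self, key, default=None):
        return self._objects.get(key, default)


class VolatileInstance:
    """Hypothetical generic instance; local_scratch is lost when the instance dies."""

    def __init__(self, store):
        self.store = store
        self.local_scratch = {}

    def record_order(self, order_id, payload):
        # Scratch data stays local; anything worth keeping goes to the external store.
        self.local_scratch["last_order"] = order_id
        self.store.put(f"orders/{order_id}", payload)


store = ExternalStore()

first = VolatileInstance(store)
first.record_order("1001", {"item": "widget", "qty": 3})
del first  # the instance fails or is terminated

# A fresh, generic instance picks up where the old one left off.
replacement = VolatileInstance(store)
recovered = replacement.store.get("orders/1001")
print(recovered)
```

The replacement starts with empty local scratch space, yet the order is still there. The instance itself holds nothing that needs protecting.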

Volatility should not carry a negative connotation, as it is one of the components of keeping costs down in cloud (and isn’t that one of the primary drivers in the first place?). Designing for volatility can also allow an organization to capitalize on the lowest prices possible, as is the concept behind EC2 Spot Instances. EC2 Spot Instances “allow customers to bid on unused Amazon EC2 capacity and run those instances for as long as their bid exceeds the current Spot Price,” which is based on supply and demand. Amazon EC2 Spot Instances are one of the more popular examples of extremely volatile and affordable instances, with other avenues emerging on the market, such as SpotCloud. EC2 Spot Instances may be the extreme of volatility, but being able to use a Spot Instance is not a terrible goal.
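
As a rough illustration of the bidding rule quoted above, the following Python sketch simulates a job that only makes progress during hours in which the bid exceeds the Spot Price. It is a simplified model, not Amazon's actual pricing or billing logic: the function run_on_spot, the hourly granularity, and the prices are all made up for the example, and it assumes progress is checkpointed to external storage every hour so an interruption loses nothing.

```python
def run_on_spot(bid, hourly_spot_prices, work_units, checkpoint=0):
    """Simulate a checkpointed job on a spot-style instance.

    One unit of work is completed in each hour where bid > spot price.
    Returns (units_done, interruptions).
    """
    done = checkpoint
    interruptions = 0
    running = False
    for price in hourly_spot_prices:
        if done >= work_units:
            break
        if bid > price:
            running = True
            done += 1  # assumed to be checkpointed to external storage
        else:
            if running:
                interruptions += 1  # the instance was taken away mid-job
            running = False
    return done, interruptions


# Hypothetical hourly spot prices in dollars; the bid is $0.05 per hour.
prices = [0.03, 0.04, 0.06, 0.07, 0.03, 0.02]
units_done, interruptions = run_on_spot(0.05, prices, work_units=4)
print(units_done, interruptions)  # prints: 4 1
```

The job is interrupted once when the price spikes above the bid, then finishes when the price drops again. A design that assumes volatility treats that interruption as a delay rather than an outage.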

This is not to say that non-volatile instances do not have their place (a Domain Controller, perhaps), but they should not be the status quo.

In a typical ServerVirt solution, the operating system, the VM’s identity (hostname, IP address, etc.), the application, and the application’s settings all reside within the same instance. Therefore, it becomes critical to protect that instance, because the loss of that instance means the loss of an application or service. Protection techniques include:

  • VMware HA (restarts the instance on another host in the cluster)
  • VMware Fault Tolerance (a hot-warm instance scenario)
  • Backup software
  • Snapshots

Regardless of the technique, the recovery process still looks the same: recover the OS, the identity, the application/service, configuration settings, registry entries, et cetera. The solution is tied to an individual virtual machine.

In a mature Cloud solution, the instance has an application/service installed on an operating system. In addition, the instance has data that is deemed non-volatile (such as configuration variables and the instance’s identity), which is kept in an external, non-volatile repository. Did an instance just die? No problem: spin up another instance, script/deploy the identification of the instance, and voilà! This type of design allows a given solution to not only mitigate individual instance failures, but also mitigate the outage of an entire region (if done correctly) with relative ease.
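
The "spin up another instance and script the identity" step can be sketched as a small reconciliation loop. This is a toy Python model under stated assumptions: IDENTITY_STORE, launch_instance, and reconcile are hypothetical names, the hostnames and addresses are invented, and a real deployment would call your IaaS platform's provisioning API where launch_instance is used here.

```python
import itertools

_instance_ids = itertools.count(1)

# Hypothetical externally stored identities; they survive any instance failure.
IDENTITY_STORE = {
    "web-01": {"hostname": "web-01", "ip": "10.0.0.11", "role": "web"},
    "web-02": {"hostname": "web-02", "ip": "10.0.0.12", "role": "web"},
}


def launch_instance():
    """Stand-in for an IaaS provisioning call; returns a generic, anonymous instance."""
    return {"instance_id": f"i-{next(_instance_ids):04d}", "alive": True}


def reconcile(fleet, identity_store):
    """Ensure every stored identity is held by a live instance.

    Missing or dead instances are replaced by a fresh one that is then
    given the stored identity. Returns the names that were (re)deployed.
    """
    replaced = []
    for name, identity in identity_store.items():
        current = fleet.get(name)
        if current is None or not current["alive"]:
            instance = launch_instance()
            instance.update(identity)  # script/deploy the identity onto it
            fleet[name] = instance
            replaced.append(name)
    return replaced


fleet = {}
first_pass = reconcile(fleet, IDENTITY_STORE)   # initial deployment
fleet["web-01"]["alive"] = False                # an instance just died
second_pass = reconcile(fleet, IDENTITY_STORE)  # replace it, same identity
print(first_pass, second_pass, fleet["web-01"]["instance_id"])
```

After the second pass, web-01 is a different instance with the same hostname, IP address, and role. Nothing was "recovered" in the ServerVirt sense; the identity was simply redeployed onto new compute.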

Do I have to use volatile instances to do cloud or to use VMware vCloud Director/CloudStack?

No.  However, if you are looking at Cloud, it’s a design concept that should be weighed heavily.

The reason I’ve spent the first article and a half in this series discussing, emphasizing, and re-emphasizing concepts (some outside of the VMware mainstream) is that it’s important not to design your cloud solution as a run-of-the-mill ServerVirt environment. If you do, it will likely become a stop-gap solution until people start using their own Stealth Clouds (as discussed by @werner, Amazon CTO).

Stealth Clouds are cloud resources that business units spin up outside of their own IT’s purview.  Stealth Clouds are most often used to circumvent the latency of internal IT departments (think of it in terms of time-to-market).

Design a cloud using the old mentality of ServerVirt and watch as the cloud you build becomes bypassed in months/quarters/years for a more flexible solution (AWS, perhaps).  Stay ahead of the curve and bring real value.

"Change before you have to." - Jack Welch.

Conclusion

Now that (hopefully) the concepts of volatile instances and non-volatile storage are sinking in, the next step is to take these concepts and plug them into an IaaS solution. The next article will focus on:

  • Defining your offering
  • A quick way to get CloudStack up and running
  • The Service Catalog

Again, thanks to Andy Murphy and others for their feedback on the article.