A Market Void: Multi-Tenant VDI Solutions
As someone who has been focused almost exclusively on VDI for quite some time, it has become apparent to me that there is an under-served VDI market: service providers (SPs). Service Providers and organizations needing to adopt an SP-style VDI solution have unique requirements that aren't found in a typical VDI solution. I am now on almost weekly calls with a customer, OEM, or SI looking to provide this capability. It's important to note that SP VDI (or desktops-as-a-service, DaaS) is not traditional VDI.
The unique requirements of multi-tenant VDI include:

- Multi-tenant management
- Multi-tenant network segmentation
- Multi-tenant storage
- Multi-tenant provisioning
Contrary to what your favorite vendor sales rep may tell you, the two major VDI players, Citrix and VMware, do not have an out-of-the-box multi-tenant VDI solution. Lots of people throw the word "cloud" around for various reasons, and I often hear it applied to VDI conversations. Citrix even has a whitepaper that covers multi-tenant VDI, "Citrix Reference Architecture for Multi-Tenant Desktop as a Service," in which they cover a solution built on Citrix XenApp and Microsoft Remote Desktop Services.
A typical sales call may go something like the following:
SP Customer: “We need to build a desktop-as-a-service offering for multiple tenants on our hardware platform of choice.”
Citrix/Microsoft/VMware Rep: “No problem. We do cloud. We do VDI. We do CLOUD VDI!”
SP Customer: “Great, can you show us?”
To understand why the main players don’t fit the desktop-as-a-service (DaaS) model, it’s important to understand the four multi-tenant bullets listed above.
Since Citrix has published, "Citrix Reference Architecture for Multi-Tenant Desktop as a Service," which in theory, should cover the contents of this article, I want to clarify the important difference between VDI and session-based desktops:
The Citrix reference architecture talks about session-based desktops, which is not within the scope of this article. This article is about multi-tenant VDI.
Multi-tenant management is the ability for a cloud tenant to have full control over the instances, data, and networks in their cloud-hosted solution. In terms of an SP VDI solution, this means the vDesktops, the master images, the application distribution mechanism (if applicable), patching, user data, vDesktop networks, access policies, pool size, et cetera. Essentially, the tenant's management portal needs the ability to perform the primary tasks performed in the VMware View Admin Console or the XenDesktop Desktop Delivery Controller console.
In addition, the multi-tenant management solution needs to have the ability to securely provide this level of access to multiple tenants. Unfortunately, this is the first hurdle the major players, Citrix XenDesktop, VMware View, and Microsoft VDI trip over. These solutions have one primary console that’s used to manage the entire environment.
The above diagram attempts to illustrate one Service Provider management console (MGMT0) and two tenant consoles (MGMT1 and MGMT2, respectively). The MGMT0 has the ability to manage both MGMT1 and MGMT2 to perform appropriate tasks from a Service Provider perspective. However, MGMT1 and MGMT2 do not have access to one another or to the Service Provider environment.
While Microsoft, Citrix, and VMware all offer role-based access controls, meaning that granular administrative privileges can be granted in their respective management consoles, they do not offer a self-contained management console per tenant. Microsoft, Citrix, and VMware each offer one entry point into their one management console. For multi-tenant administration to work, the solution requires multiple entry points and multiple administrative domains (e.g. one for the Service Provider, one for Tenant1, one for Tenant2, etc.). The aforementioned Citrix whitepaper leveraging Microsoft RDS discusses using an Active Directory forest with each tenant having its own OU, which is by no means a proper multi-tenant solution in today's world.
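To make the distinction concrete, here is a minimal Python sketch of per-tenant administrative domains. All class and method names are hypothetical, not any vendor's API: each tenant gets its own console object whose operations are scoped to that tenant alone, while the provider console (MGMT0) spans them all.

```python
# Hypothetical sketch: per-tenant administrative domains vs. one shared
# console. All names here are illustrative, not any vendor's actual API.

class TenantConsole:
    """A self-contained entry point scoped to exactly one tenant."""
    def __init__(self, tenant):
        self.tenant = tenant
        self.pools = {}

    def create_pool(self, name, size):
        # A tenant admin can only ever touch resources in their own scope.
        self.pools[name] = {"size": size, "tenant": self.tenant}
        return self.pools[name]

class ProviderConsole:
    """MGMT0: the Service Provider console that spans all tenant consoles."""
    def __init__(self):
        self.consoles = {}

    def onboard_tenant(self, tenant):
        # Each tenant receives its own console (entry point + admin domain).
        self.consoles[tenant] = TenantConsole(tenant)
        return self.consoles[tenant]

    def pools_for(self, tenant):
        # The provider can inspect any tenant; tenants cannot do the reverse.
        return self.consoles[tenant].pools

mgmt0 = ProviderConsole()
mgmt1 = mgmt0.onboard_tenant("Tenant1")
mgmt2 = mgmt0.onboard_tenant("Tenant2")
mgmt1.create_pool("finance-desktops", size=50)

# Tenant2's console has no path to Tenant1's pools...
print(mgmt2.pools)                     # {}
# ...but the provider console sees everything.
print(mgmt0.pools_for("Tenant1"))
```

The key design point is that the tenant boundary is structural (each console object simply has no reference to other tenants), not a permission check bolted onto one shared console.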
A multi-tenant VDI solution also needs flexible authentication: the ability to use a local/native authentication mechanism (e.g. local accounts), the Service Provider's authentication infrastructure (e.g. a dedicated OU or child domain within the Service Provider forest), or an external LDAP mechanism (e.g. Google Apps Directory Sync). The external LDAP option is likely the most important, as more organizations use SaaS-delivered business applications, and a single cloud-based authentication authority for an organization may prove useful (e.g. TriCipher).
Microsoft Remote Desktop Virtualization Host = FAIL.
Citrix XenDesktop = FAIL.
VMware View = FAIL.
Multi-Tenant Network Segmentation
Multi-tenant network segmentation is the ability for each tenant to have an independent network topology, irrespective of the other tenants in the desktop cloud, that includes:
The above diagram attempts to depict a physical network being virtualized by VLANs and/or hypervisor and/or network hypervisor, supporting two tenants, Red and Green, respectively. For a multi-tenant VDI solution at a Service Provider scale to be feasible, it must use Layer 3 techniques as its method of scalability as opposed to the typical Layer 2 techniques used by most non-Service Provider datacenters.
In a mature multi-tenant VDI solution, a tenant will want control over the access the vDesktops have in getting out to the internet, in getting to other cloud-based resources, and in getting to each other. In addition, the back end, provided by the Service Provider, must be able to scale to a mega level. Service Providers used to serving up dozens or hundreds of server OS virtual machines may experience design bottlenecks when serving up hundreds, thousands, or tens of thousands of desktop OS virtual machines.
Software-defined networking solutions, covered in more detail below, use a Controller to orchestrate the manipulation of forwarding tables and other configurations of network gear (whether virtual or physical). This allows significantly greater flexibility in where virtual machines (for example) live in an overall cloud or multi-site environment. The technology also allows exponentially greater scalability, along with capabilities such as firewalls, routers, and intelligent switching. For further information, please refer to the OpenFlow wiki.
Standard Switches, Distributed Switches, Network Isolation
For those familiar with the major hypervisor solutions on the market, you are likely familiar with standard vSwitch technology, which provides basic Layer 2 functionality in a virtualized environment. Standard vSwitches are configured on a per host basis, although scripting techniques can be used to automate the configuration of the standard vSwitches.
Distributed vSwitches offer greater functionality than their standard counterparts, including network vMotion support, virtual PortChannels, LACP, more advanced load balancing, inbound/outbound rate management, Private VLANs, ACLs, NetFlow, et cetera. Distributed vSwitches are configured at the cluster level and enable centralized provisioning and management, a must-have for any service provider looking to streamline operational tasks, contain operational costs, and increase reliability.
Network isolation can be obtained through VLANs, whether public or private. Public VLANs require the upstream network infrastructure connected to the hypervisor hosts to be configured, either manually or automatically, as new tenants are added to the environment. Private VLANs require manual or programmatic configuration of the distributed switches whenever a new tenant is added. In any service provider solution I've ever dealt with, any manual process that manipulates the infrastructure must be automated; if it can't be automated, it's considered the wrong solution or the wrong framework. This emphasizes the importance of being able to manipulate an environment via an API or series of APIs. Private VLANs certainly provide network isolation, but they don't dynamically build routing tables or allow for central processing of the dataflow within a multi-tenant cloud.
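As a hedged illustration of why API-driven onboarding matters, the sketch below (all names hypothetical) assigns each new tenant a VLAN ID from a finite pool automatically; the point where a real system would reconfigure the distributed switch through its published API is marked in a comment.

```python
# Hypothetical sketch of API-driven tenant network onboarding: each new
# tenant gets a free VLAN ID from a pool, with no manual switch work.

class VlanAllocator:
    def __init__(self, first=100, last=199):
        self.free = list(range(first, last + 1))
        self.assigned = {}          # tenant -> VLAN ID

    def onboard(self, tenant):
        if tenant in self.assigned:
            return self.assigned[tenant]      # idempotent re-onboarding
        if not self.free:
            raise RuntimeError("VLAN pool exhausted; scale via L3 instead")
        vlan = self.free.pop(0)
        self.assigned[tenant] = vlan
        # In a real environment, this is where the distributed switch and
        # upstream gear would be reconfigured through their published APIs.
        return vlan

alloc = VlanAllocator()
print(alloc.onboard("TenantA"))  # 100
print(alloc.onboard("TenantB"))  # 101
```

Note how the pool-exhaustion branch mirrors the article's point: VLANs are a finite Layer 2 resource, which is why Service Provider scale ultimately pushes toward Layer 3 techniques.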
Why Network Isolation is Important
For a multi-tenant VDI solution, it is possible, perhaps even likely, that a tenant will want to connect their cloud-based VDI to their own on-premise resources (or even other cloud-based resources). Without network isolation, TenantA and TenantB would not be able to have the same on-premise IP scheme and connect their on-premise resources to their own respective cloud-based VDI. More importantly, they likely would not have any control over the IP scheme of their cloud-based VDI.
Software Defined Networking (SDN)
Software-defined networking (SDN) is a technology that separates the control plane from the data plane: the forwarding (data plane) is still handled by the hardware switch or virtual switch, but the intelligence (control plane) that oversees the entire network topology and makes decisions on packet flow is centralized in a Controller.
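The control-plane/data-plane split can be sketched as follows. This is an illustrative toy model with hypothetical names, not the actual OpenFlow wire protocol: switches do nothing but look up a local table, while the Controller computes and pushes every rule from its global view.

```python
# Illustrative-only model of SDN's split: switches forward using a local
# table (data plane); a central Controller computes and pushes the rules
# (control plane). Names are hypothetical, not OpenFlow's actual API.

class Switch:
    def __init__(self, name):
        self.name = name
        self.flow_table = {}        # dst -> out_port; dumb forwarding only

    def forward(self, dst):
        # Data plane: a plain table lookup, no topology intelligence here.
        return self.flow_table.get(dst, "drop")

class Controller:
    """Control plane: sees the whole topology, programs every switch."""
    def __init__(self, switches):
        self.switches = switches

    def push_flow(self, dst, routes):
        # routes: switch name -> out_port, decided from the global view.
        for sw in self.switches:
            if sw.name in routes:
                sw.flow_table[dst] = routes[sw.name]

s1, s2 = Switch("edge1"), Switch("core1")
ctl = Controller([s1, s2])
ctl.push_flow("10.1.0.5", {"edge1": "port2", "core1": "port7"})
print(s1.forward("10.1.0.5"))   # port2
print(s2.forward("10.9.9.9"))   # drop (no rule pushed for this dst)
```

In a multi-tenant desktop cloud, the Controller's global view is what lets per-tenant topologies span hosts and sites without per-device manual configuration.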
Doesn’t VMware vShield handle this from a central point for me? After all there’s vShield App (read: hypervisor-level virtual firewall appliance) and vShield Edge (read: DHCP, NAT, virtual load balancer, virtual VPN appliance, virtual-machine level virtual firewall appliance).
Kind of, but more importantly, only for the virtualized network components running within a VMware environment. Allwyn Sequeira, VP/CTO of Cloud Infrastructure Security & Networking at VMware, wrote a blog post worth reading for VMware cloud aficionados, "Let's get logical - the case for network virtualization," in which he discusses the need to "un-tether VMs from the underlying physical network, much as we un-tethered OSes from the server hardware." I couldn't agree more. Unfortunately, a complete solution should also be able to manage hardware devices and potentially virtual switches on other hypervisors, which the vShield stack is not capable of at this moment.
OpenFlow / Nicira
Enter OpenFlow, Nicira, or any SDN solution that can live on both virtual switches and physical switches. In this scenario, the hardware (virtual/physical) handles the datapath, the SDN software (e.g. OpenFlow) controls the control path, and the (OpenFlow) Controller manages the control path on all devices via the secured OpenFlow protocol. In addition, the full list of network features does not have to be baked into the solution, as is the case with a hardware appliance. In today's typical hardware network appliance, all of the features are cooked into the firmware/OS and are enabled via license keys, which makes for a very bloated operating environment.
By contrast, in an SDN scenario, the network OS has hooks for various features. Need a VPN capability? Plug in the VPN module. Need load balancing? Plug in the load-balancing module. This plug-in happens via the exposed API of the network OS.
The result is central management of virtual and physical devices in a much more streamlined fashion, a requirement for mega-scale solutions supporting multi-tenant infrastructures.
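As a rough sketch of that plug-in model (all names hypothetical), a network OS exposes a registration hook, and a feature exists only once its module has been plugged in:

```python
# Illustrative plug-in model for a network OS: features aren't baked into
# firmware; they register against an exposed API. All names hypothetical.

class NetworkOS:
    def __init__(self):
        self.modules = {}

    def register(self, name, handler):
        # The "exposed API" hook point: plugging in a feature module.
        self.modules[name] = handler

    def invoke(self, name, *args):
        if name not in self.modules:
            raise LookupError(f"feature '{name}' not plugged in")
        return self.modules[name](*args)

nos = NetworkOS()
nos.register("vpn", lambda peer: f"tunnel-up:{peer}")
print(nos.invoke("vpn", "203.0.113.9"))   # tunnel-up:203.0.113.9
# Load balancing simply isn't there until its module is plugged in:
# nos.invoke("lb") would raise LookupError.
```

The contrast with a license-key model is that unused features impose no footprint at all; they are absent rather than dormant.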
Out of the box, no solution has a proper multi-tenant network solution, whether SDN or not, integrated into a VDI solution. There is no next-next-finish approach to a multi-tenant VMware vShield environment with VMware View. The same goes for Citrix XenDesktop on XenServer. Today it is possible to cobble this solution together, and over the next few months it will become easier to integrate a multi-tenant network solution into a multi-tenant VDI design. However, I believe we are a year or so away from seeing these two components sold as a truly integrated solution by a major vendor.
Multi-tenant storage has less to do with the VDI solution from the broker perspective and more to do with the design considerations needed to provide a multi-tenant storage solution for the broker environment to use. A virtual desktop is typically a virtual machine configuration file (e.g. a vmx file), virtual disk(s) (e.g. a vhd file), and other virtual-machine-specific files (e.g. BIOS, swap file, …).
As, in my opinion, there are no easy, out-of-the-box multi-tenant storage solutions on the market for hypervisor hosts (I'm not talking content management here), I will only cover the concepts that need to be understood rather than the custom solutions I've been involved with, leaving readers to find their own way.
The two main concepts that need to be understood are:
To understand the underlying storage solution, it’s important to first identify the varying levels of storage isolation. The levels of isolation can be unique to each tenant, unique to each classification of tenant, or unique to each cloud.
In a shared model, the vDesktops of TenantA may reside on a datastore with the vDesktops of TenantB or TenantC. There is no separation at a datastore level, although there very likely would be separation at a resource-pool level (to ensure that the CPU/memory usage of TenantA does not impact the CPU/memory availability of TenantB). This may be an offering at the Bronze level for a multi-tiered, multi-tenant VDI solution.
In a partial isolation model, the vDesktops of TenantA reside on a datastore dedicated to TenantA, while the vDesktops of TenantB reside on a datastore dedicated to TenantB. However, both tenants may have read-only access (for example) to an ISO datastore used for patches, software distribution, operating system installation media, gold vDesktop images, et cetera. There is separation at a datastore level on a per-tenant basis, but there is also a community datastore used by some or all tenants of the cloud. This may be an offering at the Silver level for a multi-tiered, multi-tenant VDI solution.
In a full isolation model, the vDesktops of TenantA reside on a datastore dedicated to TenantA while the vDesktops of TenantB reside on a datastore dedicated to TenantB. Both tenants will have their own respective ISO datastore used for patches, software distribution, operating system installation media, gold vDesktop images, et cetera. No data of any two separate tenants reside on the same datastore. This may be an offering at the Gold level for a multi-tiered, multi-tenant VDI solution.
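The three isolation models above can be summarized in a short placement sketch; the datastore naming and tier labels are purely illustrative.

```python
# Hedged sketch mapping the shared / partial / full isolation models to
# datastore placement. Datastore names and tier labels are illustrative.

def place_vdesktop(tenant, tier):
    """Return (vdesktop_datastore, iso_datastore) for a tenant's tier."""
    if tier == "Bronze":      # shared: tenants co-mingle on one datastore
        return ("shared-ds01", "shared-iso")
    if tier == "Silver":      # partial: dedicated vDesktop datastore,
        return (f"{tenant}-ds01", "shared-iso")   # community ISO store
    if tier == "Gold":        # full: nothing shared between tenants
        return (f"{tenant}-ds01", f"{tenant}-iso")
    raise ValueError(f"unknown tier: {tier}")

print(place_vdesktop("TenantA", "Bronze"))  # ('shared-ds01', 'shared-iso')
print(place_vdesktop("TenantB", "Gold"))    # ('TenantB-ds01', 'TenantB-iso')
```

In practice, the tier decision also drives resource-pool placement and backup policy, but the datastore mapping is the core of the isolation story.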
Multi-tenant storage is not automated tiering, where often-accessed data is moved to faster disk groups and seldom-accessed data is moved to slower disk groups. What if TenantB is using cloud-based VDI for emergency responders as opposed to daily business activity? TenantB's seldom-accessed virtual disks may be moved to slower disk groups, and when a disaster occurs and they need to access their vDesktops, there is latency in:
Is Unified Storage multi-tenant? No. Unified Storage is about flexibility and reliability, not multi-tenancy.
The above diagram illustrates three tenants, red, green, and blue, respectively. Each tenant’s vDesktops reside on their own unique datastore (color-coded) and each datastore has a unique LUN ID (e.g. 3).
Having worked first-hand in an extremely large Service Provider infrastructure once upon a time, I've seen block-level storage go wrong. This environment had hundreds of tenants, many of them with very recognizable names, all residing in a VMware-powered infrastructure. The VMware portion of the environment was rock solid, and custom scripts were used to automate many of the maintenance tasks in an environment with a couple thousand ESX hosts. This environment also leveraged block-level storage from one of the usual suspects. The problem is that in this environment, each tenant was required to live on their own storage.
Storage in a VMware environment is known as a datastore.
A logical volume presented from a block-level storage system is known as a LUN.
Therefore, every tenant needed their own LUN. Actually, in the case of this environment, every tenant needed their own 10+ LUNs. Add RDMs for the various needs of various tenants into the mix, and the IT administrators quickly found themselves coming up against the 256-LUN maximum per ESX host as defined in the vSphere Configuration Maximums guide. If every tenant in the multi-tenant VDI environment required only one datastore to serve all of their vDesktops, then each ESX host could theoretically support up to 256 tenants.
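As a quick back-of-the-envelope sketch, the 256-LUN ceiling translates into a hard cap on fully isolated tenants per host:

```python
# Back-of-the-envelope math for the 256-LUN-per-host ceiling discussed
# above (the limit per the vSphere Configuration Maximums of that era).

MAX_LUNS_PER_HOST = 256

def max_tenants_per_host(luns_per_tenant, reserved_luns=0):
    """How many fully isolated tenants fit under the LUN ceiling."""
    usable = MAX_LUNS_PER_HOST - reserved_luns
    return usable // luns_per_tenant

print(max_tenants_per_host(1))        # 256: the theoretical best case
print(max_tenants_per_host(10))       # 25: the "10+ LUNs per tenant" reality
print(max_tenants_per_host(10, 16))   # 24: after reserving LUNs for RDMs etc.
```

The drop from 256 to 25 the moment each tenant needs 10 LUNs is exactly why "256 LUNs goes real quick" in a large environment.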
In all likelihood, a service provider would never have this many different tenants on a single host, or they would categorize tenants into classes (e.g. Bronze, Silver, Gold), with the lesser classification using a shared datastore(s), and only the higher classifications on dedicated datastores. Nevertheless, I can assure you that in large service provider environments, 256 LUNs goes real quick.
The above diagram illustrates three tenants, red, green, and blue, respectively. Each tenant’s vDesktops reside on their own unique NFS mount (color-coded).
Again, according to the vSphere Configuration Maximums guide, the maximum number of NFS datastores per ESX host is 64; however, many organizations find managing permissions to mount points easier than dealing with the zoning, LUN masking, and VSAN/VLAN configuration found in the block-level world. Another advantage, at least with slightly older arrays, is that NFS locks at the file level, whereas block-level storage, such as iSCSI, locks at the LUN level. Managing SCSI reservations and file-lock thrashing is therefore a real concern in environments with multiple tenants per LUN or too many VMs per LUN. Fortunately, VAAI-enabled arrays use techniques such as hardware-assisted locking to lock at the block level instead of at the LUN level.
I had the pleasure of meeting Hu Yoshida, CTO of Hitachi Data Systems, several months ago, where I learned that the Hitachi Virtual Storage Platform has the ability to virtualize older storage arrays into VAAI-enabled virtual storage arrays, one technique for leveraging the power of VAAI on older equipment.
I personally think it’s easier for most organizations to grasp how multi-tenancy works with NFS as opposed to iSCSI or FC; I also think it’s easier for most organizations to manage an NFS environment.
With the management solution addressed, the networking sorted, and a storage direction chosen, it's time to understand how the vDesktops will actually be provisioned. I've seen many VDI solutions that call themselves multi-tenant. Below are the primary types of provisioning, as well as my thoughts on each.
Full Virtual Machines
Full clone virtual machines are virtual machines that do not employ any snapshotting solution from the hypervisor’s perspective and are simply 1:1 copies of a template or pre-existing virtual machine. For those familiar with VMware View, this will be the full clone deployment mechanism. The benefits of full clones are:
Since the virtual machines have no dependency on an underlying parent virtual machine or snapshot chain, they can live on any datastore, join any domain, and are completely independent virtual machines.
In any successful, true multi-tenant VDI solution I've seen to date, full clones were employed. This is not to say that storage array optimizations (such as cloning from the array's perspective) haven't been utilized, but for true virtual machine independence, full virtual machines are the easiest way to go.
Linked clones, made popular with the release of VMware View Composer a few years ago, rely on a snapshot of a given virtual machine. The benefits are:
I have never seen a snapshotting technology used in a multi-tenant VDI solution that crossed tenant Active Directory domains, due to the difficulties in managing service accounts and custom pre/post scripts across domains. That's not to say it isn't possible, but for any multi-tenant VDI environment that requires Full Isolation, this is likely a deal breaker. Shared and even Partially Isolated multi-tenant VDI environments may be able to overcome this design limitation.
The limitations of View Composer and vCenter are outside the scope of this article, but a write-up is in progress and will be published shortly, per a Twitter conversation I had with Alessandro Perilli.
Snapshotting from the storage array can be a way to get the storage saving benefits of Linked Clones without the potential headaches of managing Active Directory permissions. For example, it’s possible to use NetApp's Rapid Cloning Utility to clone vDesktops at the storage array and automate those cloned vDesktops being added into the broker infrastructure (e.g. VMware View Connection Server).
Despite this advantage, intelligent and automated snapshotting requires an investment in customization scripts, array integration via an available API, and (likely) integration into the management console for the multi-tenant VDI.
PXE booting, a technique that can be used whether or not a vDesktop has a virtual disk, is often found in Citrix-based solutions (Citrix Provisioning Services). In this model, the vDesktop boots, receives an IP address, pulls down an operating system image, loads portions of it into RAM as necessary, and writes delta data (think back to Linked Clones) to RAM, a virtual disk local to the vDesktop, a Citrix PVS file share, or some other location.
In this solution, a Citrix PVS Store acts as a repository for .vhd files used by the end devices during boot. Multi-tenancy can be employed in this model as each tenant could be placed on a separate network within the multi-tenant VDI. Each tenant could potentially have their own IP scheme (think back to network isolation) and their own PXE environment. The Citrix PVS Store could be made available to a tenant’s network to serve the virtual disk file(s). Now a tenant’s entire environment could be configured to pull down a specific image (or have the ability to choose from a list) and the only thing required to change the base image is a quick re-configuration and reboot (so that an alternative base OS image is loaded).
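The per-tenant PXE model described above can be sketched as follows; the store layout and image names are hypothetical, not Citrix PVS's actual object model. Re-pointing a tenant at a new base image is just a configuration change plus a reboot.

```python
# Hypothetical sketch of the per-tenant PXE model: each tenant network
# maps to a base image in the provisioning store, and "changing the base
# image" is just re-pointing the mapping and rebooting the vDesktops.

class ProvisioningStore:
    def __init__(self):
        self.images = {"win7-base-v1.vhd", "win7-base-v2.vhd"}
        self.tenant_image = {}       # tenant network -> assigned base image

    def assign(self, tenant, image):
        if image not in self.images:
            raise ValueError(f"{image} not in store")
        self.tenant_image[tenant] = image

    def boot(self, tenant):
        # What a vDesktop on this tenant's network streams at PXE boot.
        return self.tenant_image[tenant]

store = ProvisioningStore()
store.assign("TenantRed", "win7-base-v1.vhd")
print(store.boot("TenantRed"))                 # win7-base-v1.vhd
store.assign("TenantRed", "win7-base-v2.vhd")  # re-point, then reboot
print(store.boot("TenantRed"))                 # win7-base-v2.vhd
```

Because the mapping keys off the tenant's network, the isolation discussed earlier (per-tenant IP schemes and PXE environments) is what makes this per-tenant image assignment safe.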
PXE booting is one viable option available today to meet the needs of a multi-tenant VDI, although it should be noted that, just like everything else listed in this article, the ability to manipulate the environment programmatically is of paramount concern. A true multi-tenant cloud does not need a small army of button-pushers behind the scenes to make it all work. It must be an automated solution (as much as makes business sense) integrated with available APIs and scripting interfaces.
It should also be noted that other on-boot solutions, such as Unidesk, may also make sense from a technology perspective, giving tenants the ability to add layers as they see fit (or as they pay a premium).
API, API, API…..API!
If I could put dancing unicorns next to this subject line, I would. Anyone who has had a beer or coffee discussion with me about VDI knows that I'm unhappy with the current state of available APIs as they relate to building a multi-tenant solution. I'm not going to go into a deep analysis of the available APIs and how they can or can't be used (that's available as a paid service engagement) to build a proper multi-tenant solution, but suffice it to say, this is an area of the market that needs great improvement.
The available APIs often don't provide all of the capabilities needed to cobble this solution together (or there simply isn't a published API). I've had the API conversation with some of the brightest API folks, like Steve Jin, during one of my many trips out to San Fran. While I've got no doubts that Steve could put this together in his sleep, the rest of us rely on documented and published APIs to put a lot of this together. Also of importance: building a solution by hacking another vendor's product may or may not be viewed favorably, and certainly will not be supported by the original vendor. Therefore, a published and supported API is required if a VMware View or Citrix XenDesktop solution is to be the foundation for your multi-tenant solution.
VMware View, for example, does now offer PowerShell integration, which is a step in the right direction, but it is not a fully featured API by any means. Fortunately, it can be used as a manipulation mechanism by building a custom overlay or two.
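A hedged sketch of such an overlay: a Python wrapper that composes a View PowerShell call per tenant. The cmdlet and parameter names below are assumptions for illustration; consult the View PowerCLI documentation for the actual surface.

```python
# Hedged sketch of an "overlay": a Python wrapper composing View
# PowerShell calls per tenant. The cmdlet and parameter names below are
# assumptions for illustration, not a verified View PowerCLI surface.

def build_pool_command(tenant, pool_name, size):
    """Compose (but do not run) a per-tenant pool provisioning command."""
    # Prefixing tenant names keeps per-tenant objects distinguishable in
    # the single shared console - a workaround, not true multi-tenancy.
    return (
        "Add-AutomaticPool "                      # assumed cmdlet name
        f"-Pool_id {tenant}-{pool_name} "
        f"-MaximumCount {size}"
    )

cmd = build_pool_command("TenantA", "kiosks", 25)
print(cmd)  # Add-AutomaticPool -Pool_id TenantA-kiosks -MaximumCount 25
# A real overlay would hand this to PowerShell, e.g. via
# subprocess.run(["powershell", "-Command", cmd]).
```

This kind of name-prefixing overlay is exactly the "cobbling" the article describes: it simulates tenant scoping on top of a single-console product rather than getting it from the product itself.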
Unfortunately, until the API world around VDI matures, those needing to put this solution together will either need to build something from scratch or be very good at cobbling; I've mostly been part of the latter thus far.
Do I really need a desktop?
When I think multi-tenant, API, and the ability to quickly spin up instances with the only penalty being to my checking account, I think Amazon Web Services. Years ago I was involved in a side venture with Elvedin Trnjanin, Andy Murphy, Sam Brown, and a few other inner-circle colleagues, where we used AWS to provide a multi-tenant SHARED SESSION solution (note: NOT VDI, more akin to terminal services) with a custom broker, advanced scripting, in-depth API integration, and a dose of ground-up unicorn tusk. At the time, we were one of only a couple of companies offering this type of solution and were officially listed among the original wave of Amazon Solution Providers. Unfortunately, that project was a few years too early (in my opinion) and we have all moved on (most to VMware). Now it appears as though this market is finally coming around.
If a full vDesktop is not needed, multi-tenancy using shared sessions (e.g. Terminal Services) is a bit easier to accomplish, in some regards. However, VDI provides a lot of benefits that likely will not be satisfied via a Terminal Services/Remote Desktop Services-based solution, as outlined in this vmetc.com blog post by Rich Brambley, such as:
I imagine that most solutions will want or require a full vDesktop to provide the most flexible, application-rich, and secure workspace for a tenant's end users. Of all the multi-tenant VDI conversations I have been a part of, all but one have required a true VDI solution, as opposed to Microsoft RDS.
Companies to Watch
Throughout my experience with multi-tenant VDI, I've come to the realization that none of the major players are locked into this market segment (primarily Service Providers) quite yet, and perhaps never will be. To me, this area of opportunity could be a great play for a third party. One such company is Sparxent, which is now offering a Tech Preview trial of their VirtualOffice offering. Of note, Sparxent appears to offer the Citrix HDX protocol as an option, per this video.
Another company, and likely viewed as the leader in multi-tenant VDI (at least when it comes to hosted desktops), is Desktone.
I've used Desktone several times in a trial capacity and have always been fairly pleased with the overall service offering. However, I'm more interested in getting my hands on their Desktone Platform for Service Providers. I'm hoping to have a review of this in the next couple of weeks, but I can say from early glimpses that this could be a viable starting platform for many Service Providers. Also of note, Desktone offers support for the Citrix HDX protocol as part of their hosted (and hopefully Platform) solution. Click here for a video overview of the Desktone solution.
Multi-Tenant Protocol Options
It should be noted that, while I'm a big fan of VMware View (the vast majority of the implementations I've done have been with View, and I have a book co-authored with VDI legend André Leibovici in the final editing phases), all of the third-party solutions using a tier-1 vendor's protocol have used the Citrix HDX protocol. Favorable end-user experience, whether terrestrial VDI or cloud VDI, is still a key requirement, and Citrix has (assumedly) done a good job early on in monetizing their protocol for use by selected third parties (e.g. Kaviza). Perhaps we will see some third-party multi-tenant VDI solutions leveraging the PCoIP protocol from Teradici, which I think offers a lot of advantages, especially for those with unique requirements that can be assisted through hardware. Also, with the PCoIP Server Offload card now publicly available, these cards could allow Service Providers to reach a more profitable user-per-server density than ever before.
There are certainly other protocols available, as well as the default Microsoft RDP, but those already familiar with VDI will know that HDX and PCoIP both offer a huge step forward over RDP and some of the lesser known 3rd party protocols (for the most part).
In talking with some of the smart Service Provider folks out there, like Steve Chambers from VCE, some buddies from HP, and a few smart cloud folks, this part of the market seems to be gaining interest as SPs look to offer a cloud desktop offering. While the number of companies in, entering, or looking to enter this space may be far smaller than the number looking to do vanilla VDI, the number of seats Service Providers are looking to roll out can eclipse the number of small VDI deployments I'd implement otherwise in a given year. Service Providers aren't looking for dozens, hundreds, or even a few thousand seats; they are looking at tens of thousands. Granted, the overall solution must be widely scalable, at a mega level, which most current VDI solutions are NOT designed for (think vCenter limitations). Solutions like the Vblock from VCE and the Virtual System (VS) from HP offer a modular hardware platform, so the only thing left is for a software vendor (or vendors) to develop a modular brokering and provisioning solution.