In my last post, I wrote about my history with VDI and some of the complications I dealt with as these projects moved forward. The problems we encountered were amplified dramatically as the systems scaled up, and, as I alluded to previously, they mostly had to do with storage.
Why is that? In many cases, particularly when implementing non-persistent desktops (those destroyed at log-off and regenerated at the next login), we would see a heavy load placed on the storage environment. When many of these desktops launched at once, we'd encounter what became known as a boot storm. To be fair, most of the storage I/O capacity at the time was gained by placing more disks into the disk group, or LUN. Mathematically, for example, a single 15,000 RPM disk produces on the order of 120 IOPS, so aggregating eight disks into one LUN yields a maximum of roughly 960 IOPS from the disk side. Compared to even the slowest solid-state drive available today, that's a mere pittance; I've seen SSDs operate at as many as 80,000 IOPS, or over 550 MB/s. Those drives were once cost-prohibitive for most of the population, but pricing has dropped to the point where even the most casual end user can buy them for standard workstations.
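To make the comparison concrete, here is a minimal back-of-envelope sketch of that arithmetic. The per-device figures are simply the illustrative numbers quoted above, not vendor specifications, so treat the output as a rough order-of-magnitude comparison.

```python
# Back-of-envelope IOPS math: spindle-based LUN versus a single SSD.
# Per-device figures are the illustrative numbers from the text above,
# not measured or vendor-published specifications.

SPINDLE_IOPS = 120       # rough upper bound for one 15K RPM disk
SPINDLES_PER_LUN = 8
SSD_IOPS = 80_000        # a fast SSD, as cited above

lun_iops = SPINDLE_IOPS * SPINDLES_PER_LUN
print(f"8-spindle LUN: {lun_iops} IOPS")             # 960
print(f"Single SSD:    {SSD_IOPS} IOPS")             # 80000
print(f"Ratio:         {SSD_IOPS / lun_iops:.0f}x")  # ~83x
```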
Understand, please, that simply throwing SSD at an issue like this isn't necessarily a panacea, but ample read cache will go a long way toward resolving problems like boot storms, and ample write cache will go a long way toward resolving the others.
Storage environments, even monolithic storage arrays, are essentially servers attached to disks. Mitigating many of these issues also requires adequate connectivity within the storage environment from server to disk. Ample RAM and processing power in those servers (also called heads, controllers, or nodes), or simply more of them, further improves the I/O picture. Because I focus on solutions rather than products here, the best method of solving these problems is something you must establish for your own environment. Should you care to discuss the discrete differences between architectures, please feel free to ask. Note: I will not recommend particular vendors' products in this forum.
There have also been many developments in server-side caching that help with these boot storms. These typically involve placing either PCIe-based flash devices or true solid-state drives in the VDI host servers, onto which the VDI guest images are loaded and from which they are deployed. This alleviates the load on the storage environment itself.
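As a rough illustration of why this helps, the sketch below models how much of a boot storm's read load a host-side cache can absorb before it ever reaches the array. The desktop count, per-desktop boot load, and cache hit ratio are assumed figures chosen only for demonstration, not measurements.

```python
# Illustrative model of a host-side read cache offloading the backend array
# during a boot storm. All input figures below are assumptions for the sake
# of the example.

desktops_per_host = 100
boot_read_iops_per_desktop = 40   # assumed read-heavy boot load per desktop
cache_hit_ratio = 0.90            # assumed: a shared base image caches very well

total_read_iops = desktops_per_host * boot_read_iops_per_desktop
served_from_cache = total_read_iops * cache_hit_ratio
reaching_the_array = total_read_iops - served_from_cache

print(f"Total boot-storm reads per host: {total_read_iops:.0f} IOPS")
print(f"Served from host-side cache:     {served_from_cache:.0f} IOPS")
print(f"Reaching the storage array:      {reaching_the_array:.0f} IOPS")
```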
The key here, from a mitigation perspective, is not just hardware but, more often than not, the management software that allows an administrator to allocate these resources appropriately.
Remember, when architecting this kind of environment, the rule of the day is "Assess or Guess": unless you have a good idea of what kind of I/O will be required, you cannot know what you will need. Optimizing the VMs is key. Think about this: a good-sized environment with 10,000 desktops running, for example, Windows 7 at 50 IOPS per desktop demands 500,000 IOPS, whereas the same desktops optimized down to 30 IOPS demand 300,000, a difference of 200,000 IOPS at running state. Meanwhile, you must architect the storage for peak utilization, not the running state. Numbers like these are very difficult to achieve on spinning disks alone.
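Here is the same steady-state arithmetic written out, with a peak factor added on top. The 3x boot/login peak multiplier is purely an assumed figure for illustration; in a real design you would assess the actual peak rather than guess.

```python
# Steady-state IOPS arithmetic from the paragraph above, plus a peak factor.
# The peak multiplier is an assumption for illustration only.

desktops = 10_000
unoptimized_iops = 50
optimized_iops = 30
peak_multiplier = 3   # assumed ratio of boot/login peak to steady state

for label, per_desktop in [("Unoptimized", unoptimized_iops),
                           ("Optimized", optimized_iops)]:
    steady = desktops * per_desktop
    peak = steady * peak_multiplier
    print(f"{label:12s} steady state: {steady:>9,} IOPS   peak: {peak:>9,} IOPS")

savings = desktops * (unoptimized_iops - optimized_iops)
print(f"Steady-state savings from optimization: {savings:,} IOPS")  # 200,000
```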
I can go deeper into these issues if I find the audience is receptive.