Avoid disastrous bottlenecks, scale storage for VDI with HCI

HCI offers many benefits for VDI deployments, particularly when it comes to scaling resources. There are a few caveats to watch out for when planning storage for VDI, however.

Consistent performance for every user is a crucial part of a successful virtual desktop deployment, and HCI makes it easier to predictably scale storage for VDI.

One of the biggest benefits of using HCI for a VDI deployment is that capacity scales linearly as you add more HCI nodes to accommodate more users. Once you determine the number of users that each node supports, you can simply scale HCI by adding enough nodes for your user count without fear of performance bottlenecks.

Storage bottlenecks spell disaster

Scaling CPU, RAM and network resources for VDI is simple -- just add more hypervisor nodes. Adding storage for VDI can be much more difficult. Storage bottlenecks can cripple performance when IT pros scale up from a small pilot project to a large VDI deployment.

In one customer example, a pilot with a couple hundred users performed well, and IT approved a rollout to production. But when a couple thousand users attempted to access virtual desktops for the first time, logins took hours.

The root cause was that the storage array wasn't correctly specified for this customer's production use and couldn't cope with the IOPS required when more than 500 users logged on. The storage array performance wasn't coupled to the compute capacity for desktops, and there wasn't a practical way to predict how additional users would affect storage performance.
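The failure mode can be sketched with some simple arithmetic: a shared array has a fixed IOPS ceiling, while demand grows with every user who logs on. The figures below (25,000 IOPS for the array, 50 IOPS per user during login) are hypothetical values chosen only so that the ceiling lands at 500 users, as in the anecdote; they are not measurements from that customer.

```python
def max_concurrent_logins(array_max_iops: int, login_iops_per_user: int) -> int:
    """Largest number of simultaneous logins a fixed-size array can absorb."""
    return array_max_iops // login_iops_per_user


def demand_ratio(users: int, login_iops_per_user: int, array_max_iops: int) -> float:
    """Requested IOPS divided by the array ceiling; above 1.0 means saturation."""
    return users * login_iops_per_user / array_max_iops


# Hypothetical figures, for illustration only.
ARRAY_MAX_IOPS = 25_000
LOGIN_IOPS_PER_USER = 50

print(max_concurrent_logins(ARRAY_MAX_IOPS, LOGIN_IOPS_PER_USER))  # 500
print(demand_ratio(200, LOGIN_IOPS_PER_USER, ARRAY_MAX_IOPS))      # pilot: under the ceiling
print(demand_ratio(2_000, LOGIN_IOPS_PER_USER, ARRAY_MAX_IOPS))    # production: far over it
```

Under these assumptions the pilot uses well under half of the array, so nothing in the pilot results hints at the wall that production hits.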

Figure: HCI nodes. Get to know node maximums for different HCI products (updated August 2018).

How HCI helps

HCI includes storage inside the compute nodes, so adding more compute capacity also adds more storage capacity. The result is that IT pros can use a small set of pilot users to identify the capacity of an HCI node, and then scale up by adding more nodes until there is sufficient capacity.

For example, a 300-user pilot that performs well on a three-node HCI cluster indicates that a 3,000-user production deployment will require 30 of the same HCI nodes. The 30 nodes will deliver 10 times the storage capacity, as well as 10 times the CPU, memory and network capacity. Three thousand users on 30 HCI nodes will experience the same performance that 300 users experienced on three of the same HCI nodes.
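The sizing arithmetic in this example is a straight linear extrapolation of the pilot's users-per-node ratio. A minimal sketch follows; the function name `nodes_required` is illustrative, and the calculation assumes that production nodes are identical to the pilot nodes and that production users behave like the pilot users.

```python
def nodes_required(pilot_users: int, pilot_nodes: int, production_users: int) -> int:
    """Extrapolate the pilot's users-per-node ratio to a production user count.

    Uses integer ceiling division so that a partial node rounds up to a whole one.
    """
    if pilot_users <= 0 or pilot_nodes <= 0:
        raise ValueError("pilot_users and pilot_nodes must be positive")
    return -(-production_users * pilot_nodes // pilot_users)


print(nodes_required(300, 3, 3_000))  # 30
```

Rounding up matters in practice: 3,050 users at the same ratio would need 31 nodes, not 30.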

So how exactly does HCI help IT scale storage for VDI? Each HCI node has a fixed amount of storage, usually a mix of high-performance flash and lower-performance flash or hard disk storage. Each node sends every VM disk write operation to its own flash. The node also sends the same write operation to another node, which writes it to its flash as well. As a result, half of each node's disk writes are for its own VMs and half are for VMs on another host, no matter how many nodes there are in the cluster.
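The claim that the own-versus-replica split stays at half and half regardless of cluster size can be checked with a small model. This sketch assumes two-way mirroring, an even spread of VMs across nodes, and a simple ring placement where each node mirrors to its neighbor; real HCI products choose replica targets in their own ways, so the placement rule here is purely hypothetical.

```python
def simulate_write_load(node_count: int, vm_write_iops_per_node: float):
    """Return (own_iops, replica_iops) landing on each node's flash.

    Assumes two-way mirroring and a hypothetical ring placement:
    node n sends the second copy of every write to node n + 1.
    """
    if node_count < 2:
        raise ValueError("mirroring needs at least two nodes")
    own = [vm_write_iops_per_node] * node_count
    replica = [0.0] * node_count
    for n in range(node_count):
        partner = (n + 1) % node_count
        replica[partner] += vm_write_iops_per_node
    return list(zip(own, replica))


for size in (3, 30):
    loads = simulate_write_load(size, 1_000.0)
    # Every node sees the same split, whatever the cluster size.
    print(size, loads[0])
```

With 1,000 write IOPS generated by the VMs on each node, every node absorbs 1,000 IOPS of its own writes plus 1,000 IOPS of replica writes in both the three-node and the 30-node case.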

Disk reads work differently on different HCI platforms. In the best-case scenario, each node serves all the reads for its own VMs. In the worst-case scenario, any node might respond to a VM read, but because node count grows with VM count, read performance also scales.

Naturally, there is a shared resource that could become a bottleneck -- the network. A 1 Gigabit Ethernet cluster network will limit HCI performance. Treat 10 GbE as the minimum cluster network speed; organizations are adopting 25 GbE and 40 GbE to improve HCI performance.
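A rough way to see why 1 GbE falls short is to estimate the replication traffic each node must send to its mirror partner. The sketch below uses hypothetical inputs (100 users per node, 2 MB/s of writes per user at peak) and ignores protocol overhead, read traffic and any other use of the link, so treat the result as a lower bound rather than a sizing rule.

```python
def replication_gbps(users_per_node: int, write_mb_per_sec_per_user: float) -> float:
    """Outbound mirror traffic per node in Gbit/s (1 MB = 8 Mbit, 1,000 Mbit = 1 Gbit)."""
    return users_per_node * write_mb_per_sec_per_user * 8 / 1_000


def link_utilization(users_per_node: int, write_mb_per_sec_per_user: float,
                     link_gbps: float) -> float:
    """Fraction of the link consumed by replication; above 1.0 means saturated."""
    return replication_gbps(users_per_node, write_mb_per_sec_per_user) / link_gbps


# Hypothetical peak load, for illustration only.
USERS_PER_NODE = 100
WRITE_MB_S_PER_USER = 2.0

for link in (1, 10, 25, 40):
    used = link_utilization(USERS_PER_NODE, WRITE_MB_S_PER_USER, link)
    print(f"{link} GbE: {used:.0%} of the link used by replication")
```

Under these assumptions the mirror traffic alone exceeds a 1 GbE link, while a 10 GbE link still has most of its bandwidth free.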

Challenges of HCI scaling for VDI

There is a trap in predicting scale with HCI. A single-node proof of concept (POC) doesn't reflect a larger cluster's storage performance. Usually, HCI provides storage resilience by writing to at least two nodes. A single-node HCI deployment can't mirror VM data between two nodes.

This mirrored writing effectively halves each node's maximum storage performance, so a single-node HCI deployment will have twice the potential storage performance per node of a production cluster. To compensate, a single-node POC could run with only half of the SSDs or hard drives that IT will use in production, which gives roughly half the storage performance. A multi-node pilot deployment then gives a definitive node capacity, with production users doing production tasks on production HCI.
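The adjustment can be expressed as a short calculation. This sketch assumes two-way mirroring in any cluster of two or more nodes, no mirroring on a single node, and raw storage performance that scales in proportion to the number of drives; the 20,000 IOPS figure is a hypothetical placeholder.

```python
def usable_iops_per_node(raw_node_iops: float, node_count: int, copies: int = 2) -> float:
    """Storage performance left for a node's own VMs after mirroring overhead.

    A single node cannot mirror, so it keeps all of its raw performance.
    """
    effective_copies = 1 if node_count == 1 else copies
    return raw_node_iops / effective_copies


RAW_NODE_IOPS = 20_000.0  # hypothetical figure for a fully populated node

cluster = usable_iops_per_node(RAW_NODE_IOPS, node_count=3)
poc_full = usable_iops_per_node(RAW_NODE_IOPS, node_count=1)
poc_half = usable_iops_per_node(RAW_NODE_IOPS / 2, node_count=1)  # half the drives

print(cluster, poc_full, poc_half)
```

The fully populated single node reports twice what a cluster node can deliver to its own VMs, while the half-populated single node matches the cluster figure, which is the point of removing the drives.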

The second trap when you scale HCI for VDI is that POC users aren't the same as production users. IT usually undertakes a POC with a minimal number of real users and a lot of synthetic workloads, such as the Login VSI benchmark. If you set up Login VSI properly, it can indicate a ballpark user count per HCI node. The real test is a full set of users doing their day-to-day jobs, however. Real users behave very differently from synthetic test users, mainly because they behave much more randomly than any benchmark.

To successfully scale storage for VDI, make sure to base your users-per-node number on real users doing their actual jobs.
