AR and VR for business: Considerations for virtualization, and where VMware sits

Rachel Berry looks at how VMware seems best poised to lead the way when it comes to AR/VR streaming, and she also talks with Ben Jones of The GRID Factory.

In Part 1 of our AR and VR for business update, we covered VMware’s progress in supporting the management of AR/VR endpoints (e.g., headsets such as the Pico, Vive, and Oculus) as well as their work on developing some interesting support for applications. However, we still had some questions on whether the VMware graphics stack was ready for the graphical demands of VR.

In this second part, we want to dig deeper, talk to a VMware partner who is implementing customer solutions, and examine the feasibility of remoting and streaming VR applications through an EUC stack.

In terms of virtualizing and remoting, AR/VR applications are amongst the most demanding workloads a graphics stack can face. Remember that back in 2017, the Citrix efforts were focused on protocol support for VR, whilst VMware’s protocol didn’t seem up to the task. What has happened to make VMware solutions feasible in terms of the graphics stack and protocols? There’s a lot of hype in this area and, to be honest, a lot of articles that look like they are designed for marketing keyword searches and SEO. So, to get a deeper insight, I talked to Ben Jones, co-founder and CTO of The GRID Factory.

Challenges of virtualizing and remoting AR/VR applications

Ben is well known in the EUC/GPU/graphical applications communities and specializes in delivering enterprise graphics solutions, often encompassing GPUs and/or virtualization. Ben’s new venture specializes in AR/VR, particularly virtualized and streamed solutions, and (as you can guess from the name of the company) he has a close working relationship with NVIDIA. Most of Ben’s current work is focused around NVIDIA CloudXR technologies, often used in conjunction with VMware technologies.

I was keen to understand the technical challenges of virtualizing and remoting AR/VR applications, as the performance and user-experience demands of these applications are extreme and far beyond even the most challenging 3-D/CAD/graphical use cases in VDI.

Ben talked me through the standard marketing case studies for AR/VR usage: training and simulations (for a variety of industries); electrical engineers or car mechanics wearing AR glasses to view wiring plans overlaid on the pipes they are working on; retail; and design visualization for the AEC (Architecture, Engineering, & Construction) and Automotive sectors, which is where VMware is focused.

We quickly moved on to some technical depth regarding the demands of VR applications and particularly the challenges associated with remoting and virtualizing them.

Although there are some VR headsets that work at just over 70fps (frames per second), VR headsets should typically be expected to require frame rates of at least 90fps, with some supporting 120fps, and higher-end headsets now available capable of 140fps+. Ten years ago, 12fps was considered acceptable for many VDI deployments; today, defaults of 24 to 30fps are common, with intensive GPU-accelerated VDI users going up to 60fps.
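To put those numbers in perspective, a quick back-of-the-envelope calculation (a minimal illustrative sketch of my own, not figures from Ben) shows the per-frame rendering budget each rate implies:

```python
# Illustrative only: convert the target frame rates mentioned above into
# per-frame time budgets (1000 ms divided by frames per second).
frame_rates = {
    "VDI a decade ago": 12,
    "common VDI default today": 30,
    "intensive GPU-accelerated VDI": 60,
    "typical VR headset minimum": 90,
    "higher-end VR headset": 120,
}

for use_case, fps in frame_rates.items():
    budget_ms = 1000.0 / fps
    print(f"{use_case}: {fps} fps -> {budget_ms:.1f} ms to produce each frame")
```

At 90fps, the whole pipeline has roughly 11ms to produce each frame, versus around 33ms at a typical VDI default of 30fps.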

Frame rates are a reasonably well-known challenge in VR; however, Ben also explained that overall system latency between the server and the user is now the main challenge. By latency, he meant the lag between the user’s interaction and the system delivering a response: for example, how long after a user turns their head the environment they are seeing adjusts, which is quite different from simple network latency (ping time). A typical VDI deployment would use the term “click to photon” (referring to the user’s mouse) to measure input latency. VR, however, uses a concept sometimes described as “motion to photon” (the time it takes before a user’s movement produces a change in the environment), which can be challenging to monitor.

In my experience, a typical GPU-enabled VDI system based on a Citrix or VMware stack has an overall user latency of 100 to 200ms, with 150ms considered acceptable for gaming, even with negligible network latency of ~1ms. So, what would VR need? Ben was rather reluctant to define a figure, as what is acceptable can depend heavily on the user and also on the application/use case. When pushed (and pushed again), he mooted that 20ms was often the figure he was looking to achieve with VR apps. That’s a big difference versus a standard VDI system! Essentially, in building these systems, Ben has to go through every element of the stack and ensure each component is tuned, optimized, and best in class, as well as optimizing how the components interact: server, storage, CPU model, GPU, hypervisor, and networking.
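To illustrate why ~20ms is such a demanding target, here is a rough, hypothetical motion-to-photon budget for a remoted VR frame. The stage names and figures are my own assumptions, not numbers from Ben, but they show how quickly the budget is consumed and why every component has to be tuned:

```python
# Hypothetical motion-to-photon budget for a remoted VR frame (all figures assumed).
# Every stage must be optimized, because together they have to fit under ~20 ms,
# versus the 100-200 ms overall latency typical of a GPU-enabled VDI session.
pipeline_ms = {
    "head-tracking sample + transmit": 2.0,
    "server-side render (GPU)": 8.0,
    "encode (GPU video encoder)": 3.0,
    "network transit (on-premises LAN)": 2.0,
    "decode on headset/client": 3.0,
    "display scan-out": 2.0,
}

total = sum(pipeline_ms.values())
for stage, ms in pipeline_ms.items():
    print(f"{stage:35s} {ms:5.1f} ms")
print(f"{'total motion-to-photon':35s} {total:5.1f} ms  (target ~20 ms)")
```

Even with an on-premises network hop of only a couple of milliseconds, the render, encode, and decode stages leave almost no headroom, which is why every layer from GPU to hypervisor to protocol gets scrutinized.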

I also picked up some useful nomenclature about the classification of VR application genres, which can place significantly different demands on infrastructure and remoting, particularly the difference between “static” and “user-driven” content. A “static” environment is essentially a pre-made world the user can move around and view (perhaps generated from a CAD model in the Unreal/Unity gaming engines), whereas user-driven environments are changed and created by the user and usually place far higher demands on the system, as significant parts of the world must be recreated and re-rendered dynamically (e.g., turning lights on or changing the materials of the building around oneself).

Why virtualize VR?

So, this is a hard, technical challenge. Why would you virtualize or remote VR, and is it worth the effort?

Basically, the arguments in favor of doing this are exactly the same as for traditional endpoints: keep the data in the datacenter, consolidate resources, deliver applications without onsite IT, central backup and patching, and so on. Also, AR and VR applications require a lot of powerful hardware resources, and these requirements are going to continue to increase as AR and VR become more mainstream. Centralizing these resources in a datacenter or private cloud is the logical thing to do.

In the early days, there were a few demos floating around mooted as a collaboration “solution,” centered around a big server with a hypervisor installed, running multiple VMs, each with a headset/user attached. It was seen as expensive, and also a use case that made little sense, as you were replacing four bulky workstations in the visualization lab/cave with a bulkier server four times the size. However, Ben has previously collaborated with Cisco Systems in the U.K. and used their Webex video conferencing system to demonstrate VR environments to teams located around the world. This was a frame-rate-limited, conceptual demonstration, but the theory behind it creates some very interesting development opportunities.

For those looking for just simple collaboration with multi-user VR, there are already numerous solutions driven by the software application and HMD vendors—for example, Autodesk BIM 360 as a central hub, or the wireless Oculus Quest. Andy Thomas from AECOM has recently posted some examples of their in-house VR meetings on Twitter, which give an idea of this type of use and experience (they also utilize Autodesk 3ds Max and the Unreal gaming engine). This really is bread-and-butter centralized enterprise IT.

So, what is Ben working on?

Ben is currently working on projects combining NVIDIA’s CloudXR protocol on top of a VMware stack, alongside VMware’s Blast Extreme or other protocols (RDP/Citrix HDX/Teradici PCoIP Ultra/Mechdyne TGX). The key to a viable VR/AR EUC graphics stack has been the availability of this new third-party protocol, CloudXR from NVIDIA, designed and optimized specifically to remote/stream VR/AR. The VDI desktop and other applications can continue to use a standard protocol, whilst CloudXR handles only the AR/VR apps; the AR/VR application essentially gets a separate pipe within the desktop to the AR/VR endpoint.

Gabe prophetically mooted back in 2017 that he was “[….] inclined to think that VMware focusing on management is the proper path to take right now.” With Citrix obtaining frame rates of 90fps with their V2R project at the time, I myself was a bit skeptical, as VMware looked nowhere near having protocol options that could do VR, and Citrix and HDX appeared to have the technological lead.

With hindsight, it hadn’t occurred to me that a third-party parallel protocol would become available for VMware to leverage. Rather than investing in trying to hack a VDI protocol into something that could “sort of” support VR, VMware’s path of avoiding a mini-protocol war now seems extremely smart. The Citrix team that developed V2R was disbanded a few years ago, with the core of the team joining NVIDIA, so as a third party, NVIDIA certainly has some useful in-house expertise regarding AR/VR remoting and EUC technologies.

Why VMware?

Theoretically, the reliance on third-party AR/VR technologies means solutions based on other EUC vendors are possible, but Ben is focusing on VMware-based stacks. I asked him why, and he told me that the maturity of the vSphere hypervisor compared to others, along with features such as USB passthrough support, is currently enabling him to obtain latency optimizations he simply hasn’t been able to achieve with competing hypervisors.

Additionally, the Workspace ONE integration and application work of the Spatial Computing group, with staff dedicated to these technologies, seems to be facilitating and enabling use cases and allowing companies like Ben’s to collaborate and get technology developments onto VMware’s roadmap.

What is also giving VMware increasing credibility in delivering AR/VR is their continued, consistent, long-term investment and focus not just on IoT but also on technologies to roll out and manage IoT infrastructure at scale (we touched on this when looking at their Kubernetes efforts). At the moment, VMware does seem to have an extremely strong lead in the race for cloud AR/VR.

So, can you connect a headset to the cloud today and do remote AR/VR?

No, at the moment this is very much an on-premises use case. Ben did mention one significant advantage of this model, though: repurposing hardware. A traditional headset-to-workstation setup has a lot of challenges, particularly when managing multiple users. A design or VFX shop may well need every developer and artist to use the same rig, and as such, upgrading becomes very difficult. AR/VR applications also vary hugely in their graphical demands. So the use cases Ben is working on often follow a model of virtualizing the hardware and providing a standardized VM with an appropriate number of vGPUs and other resources, often using NVIDIA RTX 6000/8000 hardware.

Ben then raised a particularly interesting point: Cloud providers such as Amazon AWS, Microsoft Azure, and Google GCP, which host streaming services such as Netflix or Apple TV, don’t have control over the final pipe, the network to the end user. In a use case where end-to-end latency is vital, this final connection can be the challenge. Telcos (the telecommunications infrastructure companies), however, do own and monetize that vital piece of the networking stack; they make their money out of providing fiber, cable, broadband, and data usage across it.

With telcos best placed to optimize for AR/VR workloads, this raises the serious possibility that some may choose to move beyond networking infrastructure and into the graphics-delivery business, particularly as there’s a lucrative mass-market gaming opportunity; so for VMware, there may well be a large market beyond traditional enterprise business.

Final thoughts

With the online NVIDIA GTC conference on the horizon, we’ll be watching closely for any developments in this area. If you want to keep an eye on it, both Ben (@MrGRID_) and VMware’s @ProjectVXR team may post some very interesting information on Twitter in the not-too-distant future. Ben’s new company, The GRID Factory, is headquartered in London, UK, and if you do get a chance to grill him, I suggest asking him to explain why, although people describe “streaming” VR, it really isn’t streaming.
