RemoteFX vGPU put out to pasture as Microsoft RDP grows up
RemoteFX vGPU had its run, but Microsoft now looks to be done with it and has revealed plans for its deprecation.
In May 2018, Microsoft published a document on features they were removing or planned to replace beginning in Windows Server, version 1803. That document contained a list called: “Features we’re no longer developing.” The description read: “We are no longer actively developing these features and may remove them from a future update. Some features have been replaced with other features or functionality, while others are now available from different sources.”
Tucked away in this list of features to be deprecated was:
- “RemoteFX vGPU: We're developing new graphics acceleration options for virtualized environments. You can also use Discrete Device Assignment (DDA) as an alternative.”
BrianMadden.com has covered RemoteFX from its birth through its subsequent evolution, and we felt it only right to cover its twilight years, taking the chance to review some history and look at how changes in the market and in technology have rendered this once bright young thing obsolete.
Before we begin, we should specify that Microsoft “vGPU” is very different from NVIDIA “vGPU,” which came along later; Microsoft simply started using the term first. The term “RemoteFX” referred to a collection of technologies and features, with “vGPU” being the feature that let multiple users divert DirectX commands (originally just a few) to GPUs. These commands were mainly ones used by the OS to offload remote protocol compute demands.
On the other hand, NVIDIA uses “vGPU” to mean that apps in multiple VMs can actually use a slice of a GPU, via virtualization, with the VM “seeing” what it thinks is a physical GPU.
Why vGPU?
Let’s set the scene. Back in 2010, when RemoteFX vGPU was released, GPUs weren’t mainstream in server rooms. As Brian Madden observed when Microsoft announced what they were doing with Calista technology, “I mean sure, no one has GPUs in their servers today, but that's because there's never been a reason to.”
Back then, GPUs were the realm of games and beefy pro-vis workstations running CAD/CAE applications. The few virtualized deployments around generally belonged to extremely high-end engineering customers, such as Boeing, running one user per server/blade with GPUs on passthrough and what were then premium, pay-for protocols such as HDX 3D Pro or Teradici. There simply weren’t hypervisor GPU-sharing features available.
At the time, the only GPU option was PCIe passthrough, where a single VM got complete control of a whole GPU, with the GPU drivers installed in the VM. Citrix XenServer used the term “passthrough,” while VMware ESXi called it “vDGA.” Microsoft and Hyper-V, however, didn’t follow suit with their own version; whilst Citrix and VMware had enterprise on-premises customers pushing for virtualized CAD workstations, Microsoft was always more focused on mainstream Windows users and the cloud, where the problems with GPU passthrough usually outweighed the benefits.
Now, GPU passthrough presents some problems:
- For VDI, one VM to one physical GPU makes it an expensive option.
- Certain enterprise hypervisor features can’t be used; for example, VM migration (i.e., XenMotion or vMotion) and live snapshots (and hence backup and hypervisor administrator monitoring).
- Passthrough has security vulnerabilities, as there’s no hypervisor layer to protect the VM.
For Microsoft, I suspect the key blocker to passthrough was security.
Enter vGPU
RemoteFX vGPU was a good idea: with increasing OS and protocol demands on CPU, why not offload some of the work to a shared GPU for mass-market desktops? RemoteFX vGPU wasn’t designed to accelerate heavily graphical 3-D applications or provide workstation VMs; it was about leveraging GPUs securely to increase server scalability and improve protocol performance for regular office worker desktops and applications.
Without GPU virtualization sharing options available, Microsoft took what was then a leading-edge approach to get some benefit from a GPU without the problems of GPU passthrough, even if that benefit was less than the raw hardware potential of the GPU. This essentially involved adding API intercept technologies: when certain DirectX APIs were called (mainly those supporting the OS and protocol graphics demands), instead of going directly to the native GPU driver, a software layer took over and converted them to calls to the GPU.
This meant RemoteFX vGPU VMs did not have a GPU directly visible for applications to use, but outside of a few CAD/3-D apps, most applications didn’t need one. Sure, the intercept layer reduced the benefit of the GPU and didn’t provide significant per-user experience benefits in most configurations, but it was highly scalable, with no restrictions on the number of users.
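For the curious, here’s a minimal sketch in Python of the general API-intercept pattern described above. All names here are invented for illustration; the real RemoteFX component was a native, driver-level layer, not Python:

```python
# Illustrative sketch only: the general shape of an API-intercept layer.
# All names are hypothetical; RemoteFX vGPU was a native driver component.

class HostGpuDriver:
    """Stands in for the real GPU driver on the hypervisor host."""
    def execute(self, command, payload):
        print(f"host GPU executes {command} ({len(payload)} bytes)")

class InterceptLayer:
    """Sits in the VM where an application expects a DirectX driver.

    Instead of touching hardware, it captures each API call, queues it,
    and hands the batch across the VM boundary to be replayed against
    the shared physical GPU on the host.
    """
    def __init__(self, host_driver):
        self.host_driver = host_driver
        self.batch = []  # calls queued before crossing the VM/host boundary

    def draw_call(self, command, payload=b""):
        # The guest OS thinks this is a direct driver entry point.
        self.batch.append((command, payload))

    def flush(self):
        # Crossing the guest/host boundary and translating calls is the
        # overhead that kept vGPU below raw hardware performance.
        for command, payload in self.batch:
            self.host_driver.execute(command, payload)
        self.batch.clear()

vgpu = InterceptLayer(HostGpuDriver())
vgpu.draw_call("DrawIndexedPrimitive", b"\x00" * 64)
vgpu.draw_call("Present")
vgpu.flush()
```

The key point is that the application never sees real hardware, which is what made the approach both secure and scalable, and also what capped its performance.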
VMware took a fairly similar course with its Virtual Shared Graphics Acceleration (vSGA) API intercept-based GPU-sharing technology. However, with NVIDIA vGPU and AMD MxGPU now available, VMware stopped developing vSGA, and it’s rare to see it in new deployments.
The evolution (and end of the road) for RemoteFX vGPU
This slide (from a presentation that I’ll talk about in a minute) gives a nice overview of Microsoft GPU technologies and their evolution:
Microsoft have essentially reached the end of the road with what can be done to improve RemoteFX vGPU. They did a reasonable job patching it up for Windows Server 2016 at a time when they didn’t have a viable alternative, improving performance and adding OpenCL/OpenGL support, so applications calling those libraries got some benefit. With Windows Server 2016, they also finally added PCIe passthrough (as DDA, or Discrete Device Assignment), as an increasing number of VDI users were using applications that demand to see a GPU.
Back when VMware did vSGA and Microsoft did RemoteFX vGPU, they made sense, as there simply wasn’t an alternative. But now, with alternatives available, shoving an extra software layer into a hardware offloading stack doesn’t make sense.
Why has RemoteFX had its day?
- API intercept technologies lag, and it takes a lot of effort even to stay only a step behind; as new GPU drivers and versions of OpenCL/OpenGL/DirectX are released, you need to constantly update and test your translation layer, which invariably picks up bugs and incompatibilities that are expensive to trace and support.
- Using a software layer to leverage hardware causes a significant performance hit.
- There are now many alternatives for GPU sharing, such as NVIDIA GRID vGPU and AMD MxGPU on non-Microsoft hypervisors such as KVM, ESXi, and XenServer. RemoteFX and Hyper-V can’t provide a competitive answer to these, so it’s time for something new.
With Windows Server 2019, although RemoteFX vGPU will be available to existing users who upgrade, it will not be available for fresh installs. So, what are the alternatives?
Big advances in core RDP
What’s next? Well, the deprecation statement from Microsoft itself (which I referenced at the beginning of this article) is a bit odd. DDA (PCIe passthrough) isn’t really an alternative to RemoteFX, as the very existence of RemoteFX came as an attempt to go beyond the limitations and costs associated with passthrough. The rather mysterious “We're developing new graphics acceleration options for virtualized environments” statement points to something far more interesting, if you figure out where to look for some meaty information.
At the end of June 2018, Microsoft held “Windows Server Summit,” an online event around updates to Windows Server. The recordings of the broadcasts and decks are online, and squirreled away under the Hybrid track, there is a recording of a presentation by Ivan Mladenov (program manager, Microsoft) named “Hybrid track: Remote Desktop Services on-premises and Azure.” A large chunk is dedicated to the RDP protocol and RemoteFX deprecation, but this presentation reveals some really significant changes for RDP that could significantly extend Microsoft’s appeal.
Key things to watch out for:
A RemoteFX vGPU replacement
The presentation (watch from about 9:08 to 13:33) mentions a new Windows Server 2019 feature: GPU-P (P for partitioning).
The wording around GPU-P indicates it is likely to be very similar to the Single Root I/O Virtualization (SR-IOV) approach to GPU sharing taken by AMD MxGPU. Whilst such technologies have been slower to market than other GPU-sharing approaches, the appeal of SR-IOV’s standardization and open, well-scrutinized security model makes it popular, particularly for those with large-scale cloud concerns and tenant segregation considerations.
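To give a feel for the SR-IOV model, here’s a loose, hypothetical sketch in Python of hardware-partitioned framebuffer slices. The numbers and names are invented, and real SR-IOV partitioning happens in the PCIe device itself, not in software:

```python
# Hypothetical sketch: fixed, equal GPU partitions in the SR-IOV style
# (as in AMD MxGPU, and what GPU-P appears to resemble). Illustrative only.

class PhysicalGpu:
    def __init__(self, framebuffer_gb, max_vfs):
        self.framebuffer_gb = framebuffer_gb
        self.max_vfs = max_vfs  # hardware-defined limit on virtual functions

    def create_virtual_functions(self, count):
        # Partitions are fixed at creation time and enforced by the device:
        # there is no hypervisor software layer in the data path, which is
        # the basis of the security appeal mentioned above.
        if count > self.max_vfs:
            raise ValueError("device cannot expose that many virtual functions")
        slice_gb = self.framebuffer_gb / count
        return [f"VF{i}: {slice_gb:.1f} GB framebuffer" for i in range(count)]

gpu = PhysicalGpu(framebuffer_gb=16, max_vfs=16)
for vf in gpu.create_virtual_functions(4):
    print(vf)  # each VF is assigned to one VM and looks like a PCIe device
```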
Interestingly, GPU sharing and virtualization technologies have been branded and owned by the GPU vendors (NVIDIA GRID vGPU, AMD MxGPU, Intel GVT-g) whereas GPU-P is Microsoft and Hyper-V taking ownership of the technology nomenclature.
For Windows Server 2019, the presentation reveals Microsoft hopes to offer GPU-P as an alternative to RemoteFX vGPU, though it is still under evaluation for inclusion (remember, clean installs of Server 2019 will not support RemoteFX vGPU). Users wishing to participate in trialling it should contact Microsoft via [email protected].
The presentation goes further, saying that at some point GPU-P will be superseded by GPU-PV (PV for para-virtualised). GPU-PV will “closely match what RemoteFX does today,” but that’s all the detail we have at this point.
Hybrid encoding strategy and bandwidth efficiency
A label long attached to RDP has been “good enough.” Recent improvements meant that many users, with it configured out of the box for image quality, could have a good experience (if they had the bandwidth), but without the bandwidth efficiency of the multiple-codec approaches favored by Citrix HDX or Teradici.
The presentation reveals that RDP is adopting a hybrid encoding strategy (i.e., different regions of the screen encoded by different encoders), which Microsoft calls “region classification improvements.” The presentation slide could just as well have been an HDX or Teradici slide, as it detailed the significant bandwidth improvements that come with a hybrid approach.
With the high-end, mature protocols (notably Citrix HDX and Teradici) following a hybrid strategy, and now RDP, too, I hope we’ll see improved GPU vendor support; at the moment, many GPU vendors’ API encoding solutions are restricted to full-screen H.264/AVC codecs.
Also detailed were some nice bandwidth-efficiency features around adaptive display depending on available bandwidth, as well as 4K downsampling and high-DPI support.
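As an illustration of what region classification means in practice, here’s a hypothetical sketch in Python of routing screen tiles to different encoders. The thresholds and codec choices are invented, not Microsoft’s actual heuristics:

```python
# Hypothetical sketch of hybrid encoding: classify screen tiles and route
# each to the codec that suits its content. Thresholds are invented.

from dataclasses import dataclass

@dataclass
class Tile:
    x: int
    y: int
    unique_colors: int  # crude stand-in for a real content classifier
    changed: bool       # did this tile change since the last frame?

def choose_encoder(tile):
    if not tile.changed:
        return "skip"           # unchanged regions cost no bandwidth at all
    if tile.unique_colors < 16:
        return "lossless-text"  # sharp text/UI: lossless or near-lossless
    return "h264"               # video/photos: a lossy video codec wins

frame = [
    Tile(0, 0, unique_colors=4, changed=True),     # toolbar text
    Tile(1, 0, unique_colors=4000, changed=True),  # embedded video region
    Tile(2, 0, unique_colors=4, changed=False),    # static background
]

for tile in frame:
    print(f"tile ({tile.x},{tile.y}) -> {choose_encoder(tile)}")
```

The bandwidth win of a hybrid approach comes from not paying the full-screen video codec cost on regions (text, static UI) where it is both expensive and visually at its worst.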
RDP support for multiple GPUs (mGPU-E)
RemoteFX supported multiple GPUs, and now RDP in Server 2019 will support multiple GPUs, too (mGPU-E). There are three main cases where users request multiple GPUs per VM:
- For VDI with a few (usually specialist) CAD/3-D/VFX applications that are genuinely designed to use multiple GPUs in parallel. These are growing in number, but historically they are applications where cloud and virtualisation adoption has been slow.
- Where the VDI user simply wants a bigger GPU than is available on the market and wants multiple GPUs to look like a single massive one; this is usually requested by heavy users of big data sets with applications such as high-end CAD, Petrel, or GIS. With the GPUs on datacenter cards getting bigger, this request is getting less frequent. Technically, it would be a terribly inefficient solution, trying to manage framebuffer across multiple cards. The answer is usually to use a workstation card with a really big framebuffer (a few GPU vendors are reluctant to advise this, as it’s generally a cheaper option than their datacenter offerings; however, the OEMs shipping the cards often support it, and you’ll find such cards listed on the hypervisor hardware compatibility lists).
- For high-density RDSH/XenApp use cases, where with 50 to 150 users per VM you simply can’t find a GPU big enough to give all users a significant amount of GPU. Citrix actually hacked around RDSH to implement this as an experimental feature for many years, but it was pulled after a number of problems; it transpired it was being used very successfully, unsupported, in production by a few (very naughty!) customers, especially in bare-metal scenarios.
Outside of VDI, using multiple GPUs benefits a vast number of CAE, rendering, compute, and AI/ML applications.
Enterprise features: printing and camera redirection
Beyond the graphics protocol and GPU features, though, the Windows Server Summit recording also covers printing and camera redirection features (useful for unified communication products like Skype for Business) for Server 2019. These are core value-add features for VMware and Citrix HDX, so this pushes RDP, and hence Hyper-V, into yet more traditional VDI territory. There are also some HTML5 and web client improvements.
Summing up
Collectively, these features put Azure in a very good place to launch a product or service similar to Amazon’s AWS Elastic GPUs. Rather than buying DaaS with a fixed single GPU, users could offload on a pay-as-you-use basis and possibly access “unlimited” GPU capacity.
Existing RDP enhancements mean it’s already good enough and efficient enough for even more Citrix/VMware users to consider alternatives, and with many of Citrix/VMware’s competitors (such as Workspot) using RDP, we may well see the traditional VDI products lose market share on many fronts. If the promised improvements are well implemented, this can only accelerate, and I think we’ll need to stop using the “good-enough” language to describe RDP, as it is now a seriously mature and grown-up protocol.
With Windows Server 2019, it looks like Hyper-V will finally get a decent spectrum of GPU technologies. There are an awful lot of organizations using Hyper-V for the majority of their VMs that added a bit of XenServer or ESXi because those were the only options for NVIDIA vGPU or AMD MxGPU for VDI capable of running CAD/3-D applications. Especially in education, with Microsoft’s generous licensing programs, I do wonder if VMware/Citrix will have a job keeping many of their early GPU customers.
Microsoft’s Azure focus also means they are thinking beyond traditional EUC DaaS; the one-VM-to-one-GPU (or one part of a GPU) model we’ve seen in VDI is starting to feel rather dated. A model where the OS, the protocol, and the applications can all offload and access CPUs and GPUs on demand feels far more logical.
(Footnote: There’s been a lot of muttering around RDSH being “replaced” by a multi-user Windows 10. From the protocol perspective, I imagine the developments above will be under the hood of whatever folks end up using for RDSH-like use cases. The best place to catch up on the state of speculation, rumor, and released facts around that is almost certainly this timeline and collection of references, assembled by Bas van Kaam.)