What would you use as a base for providing virtualized 3D apps?

Hi,

Some devs at work must use an application that requires a GPU to run. I got it to run under Tiger Lake Iris Xe, so it's not really demanding; their documentation lists a Radeon HD 7750 / Nvidia GeForce GTX 550 Ti as the minimum requirement.

To accommodate devs who prefer other hardware, it would be great to offer a service that spawns a virtual instance (container or VM) running the application. We can live with bad performance and even headless use (the application renders graphics; it can run headless, but it still requires a GPU), but being able to stream the UI Moonlight/Sunshine style would be very nice.

We are already using Incus, and I see there’s some talk about GPU virtualization around.

Is anyone doing something similar? What hardware are you using?

Ideally, I’d like to build something on top of consumer-grade hardware or Hetzner-cost-style dedicated servers. (Hetzner auctions GTX 1080-equipped servers, and now they provide “Nvidia RTX™ 4000 SFF Ada Generation” for 184€/month.)

I have fiddled with https://github.com/strongtz/i915-sriov-dkms, but so far I didn’t get it to work under Debian Stable. That’s apparently Intel-provided code, but I’d prefer something less off the beaten path, if possible.

Something that a cheap hosting provider can manage would be great. Buying some consumer-grade dedicated hardware is OK, if we can serve a few devs per box.

(Otherwise, I guess we can issue Steam Decks or other suitable hardware, but having centralized management would be nice.)

Also, is VirGL worth checking out for this kind of use case?

Also, is VirGL worth checking out for this kind of use case?

Incus out of the box uses virtio-gpu-gl-pci (also known as Virtio-GPU GL, or VirGL).

As to the rest of your post, what kind of environment are you looking for? If any of your devs require Windows, then that narrows your options down significantly, to GPU passthrough only. If they are fine with Linux, then you have a choice between Virtio-GPU GL and Venus (OpenGL and Vulkan respectively). I'm not going to repeat myself, so you can check this thread for more details: Incus DE experience

EDIT: If you want Virtio-GPU Venus then you'll need to be running a host that ships a recent kernel (6.14 or later) and has QEMU 9.2 or newer and virglrenderer 1.0.0 or newer.
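As a rough sketch of checking those prerequisites and smoke-testing the default virtio-gpu path (the version-check commands and the `images:debian/12` alias are illustrative; exact package/tool availability varies by distro, and enabling Venus specifically has extra configuration covered in the linked thread):

```shell
# Check host prerequisites for Virtio-GPU Venus (versions per the post above)
uname -r                               # want 6.14 or later
qemu-system-x86_64 --version           # want QEMU 9.2 or newer
pkg-config --modversion virglrenderer  # want 1.0.0 or newer

# Incus VMs get a virtio-gpu display device by default, so no extra
# device config is needed just to see the VirGL path come up
incus launch images:debian/12 gpu-test --vm
incus exec gpu-test -- ls /dev/dri     # card0/renderD128 nodes mean the virtio GPU driver bound
```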

Then, if you are using Linux instances, comes the question of whether you want Wayland support or not. IIRC Gnome has the most complete support for headless Mutter and KDE is some degree behind, but both are suboptimal environments, so this would rule out system container instances. You may want to experiment with this yourself, however, as my knowledge is some months out of date.

If you do want Wayland then you’re pigeonholed into virtual machines.

Whether Moonlight works with Virtio-GPU is something you'll have to explore, as I haven't used it on anything other than PCIe-passthrough GPUs. Otherwise you have SPICE and RDP (as in Gnome's RDP and KDE's krdp offerings).
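For the RDP route, a minimal sketch using Gnome Remote Desktop's grdctl tool inside the guest (the credentials are placeholders, and depending on the version you may also need to supply a TLS cert/key via `grdctl rdp set-tls-cert` / `set-tls-key`):

```shell
# Inside the guest: enable Gnome's built-in RDP server
grdctl rdp enable
grdctl rdp set-credentials someuser somepassword  # hypothetical credentials
grdctl status                                     # confirm RDP shows as enabled
```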

Ideally, I’d like to build something on top of consumer-grade hardware or Hetzner-cost-style dedicated servers.

You will very quickly run out of PCIe lanes on consumer platforms unless you do some form of clustering with one GPU and one NIC (for optimal remote desktop and networked storage performance) per machine; you’d want Threadripper, Epyc, or one of the Xeon platforms.

Maybe the state of the art has changed and if so someone else can chime in.

I think we could use Windows if it made things easier, but Linux is preferable. I’ll check the post.

Interesting; as long as it works, I can use either X11 or Wayland. Good to know that I might be able to use system containers with X11.

I don’t need performance. Also, I don’t need to have a lot of devs sharing one physical box. If I can have say, 4, that already sounds like a win. And for now, if it runs at 10fps, that’s fine.

But what I’m asking… should I be looking at Intel, Nvidia, AMD? I think Nvidia requires reflashing or a datacenter-grade GPU. Intel sounds like it might meet my requirements on the cheap. Unfortunately, I don’t have any AMD hardware to test. It would be lovely if integrated Ryzen graphics were virtualizable, but maybe AMD discrete GPUs are more friendly to virtualize/share than Nvidia?

@SirGiggles posted good links, and I found this one that framed the OpenGL and Vulkan stacks clearly:

My pick is to go down the Vulkan pathway, with Wayland, if your apps work well on XWayland. Gnome is the leader on Wayland, and Fedora seems to be the cutting-edge Wayland distro.

My pick would be AMD GPUs: something recently supported in the drivers and, of course, known stable. Intel also seems to be focused on Linux.

It seems like you are wanting to create a Virtual Desktop Infrastructure (VDI) for staff. Is that correct?

I think remote desktop software has come a long way on Linux since the VNC-only days. RDP is a standard, and it seems like fractional independent scaling works. Maybe Gnome/Wayland is the best choice here too, although I'm not sure.

Electron IDE apps have historically had compatibility issues with Wayland but this is changing.

Red Hat might be a good server platform for a VDI. They have probably the best documentation in Linux. I generally find that Linux documentation omits critical information because of assumptions the writers make; it is a common mistake when people aren't trained in technical writing. Red Hat seems to employ good technical writers.

Best technical writing I’ve come across was in Cisco documentation and training material. Outstanding.

Writing good documentation is actually very difficult, and maintaining it even more so as volume and complexity build.

If you have a server rack at work you could pick a hardware manufacturer like HP or Supermicro and invest in good ex-datacenter servers to build a long term VDI. I have an AMD Radeon Rx 5700 XT in a Supermicro Epyc ROME motherboard (H12SSL-I). It works flawlessly.

I would actually recommend against AMD GPUs (this goes for you too, @nickwalt). The reason is a PCIe-level hardware bug with resetting the device that makes it unrecoverable until a system restart; IIRC this bug can affect anything RDNA 2 and newer. I'll see if I can find a link later on, but you can look up the user gnif2 on Level1Techs on the subject; they were leading a project to try to fix it, but I think they gave up because of AMD stonewalling.

Ironically Nvidia is better ever since they got rid of error 43 in VMs.

I don’t need performance. Also, I don’t need to have a lot of devs sharing one physical box. If I can have say, 4, that already sounds like a win. And for now, if it runs at 10fps, that’s fine.

If you're able to have multiple machines, that alleviates the lack of PCIe lanes on any single machine. This matters on consumer platforms because of how manufacturers lay out the physical slots: you generally get one full-size x16 slot that is both physically and electrically x16, and one full-size x16 physical slot that may be electrically x4 or x8, depending on the features of your platform.

But what I’m asking… should I be looking at Intel, Nvidia, AMD? I think Nvidia requires reflashing or a datacenter-grade GPU. Intel sounds like it might meet my requirements on the cheap. Unfortunately, I don’t have any AMD hardware to test. It would be lovely if integrated Ryzen graphics were virtualizable, but maybe AMD discrete GPUs are more friendly to virtualize/share than Nvidia?

Regarding CPU options, it really depends, I think (and is subject to benchmarking). Intel has lots of cores without SMT, but the downside is the more complex scheduling needed, which could result in inconsistent performance between guests. AMD has fewer cores but makes up for it with SMT, 3D V-Cache (depending on the model), and AVX-512.

I want to say you can’t go wrong with either, but I’d probably edge towards AMD because of the socket/platform longevity.

I'm also not aware of flashing methods for anything past the 20-series cards, so you're on your own in that area.

I think we could use Windows if it made things easier, but Linux is preferable. I’ll check the post.

Windows will kneecap you to PCIe passthrough, so you'd need one passed-through GPU per user. IIRC the virtio-gpu effort for Windows is happening, but at a snail's pace; you can follow the PR here: [viogpu3d] Virtio GPU 3D acceleration for windows by max8rr8 · Pull Request #943 · virtio-win/kvm-guest-drivers-windows · GitHub
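For the passthrough route, a minimal Incus sketch (the instance name and PCI address are placeholders; the `pci=` option pins the device to a specific card when the host has several):

```shell
# Pass a whole physical GPU through to a (Windows or Linux) Incus VM
incus config device add win-vm gpu0 gpu gputype=physical pci=0000:01:00.0
incus start win-vm
```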

An interesting set of developments for AMD and Nvidia. Is it possible that the AMD bug is caused by power saving state changes? I don’t have any power saving on my EPYC. It’s permanently on with no reductions on PCIe power.

Nvidia does seem to be finally paying attention to Linux but they are duplicitous and they are, I think, in a mindset that is against making Linux strong in the consumer and small to medium business on-prem market. They want everything and everyone in the cloud, centralised and dependent. Red Hat and Canonical are somewhat in that mindset also, unfortunately. This is the opposite of Linux, of course.

My pick for CPUs is definitely EPYC. If the OP doesn't want to invest in datacentre-grade hardware (overbuilt, heavy 2-4 RU rackmount servers), they could buy new Supermicro EPYC motherboards and used or new EPYC CPUs, and fit them into quality desktop cases with great cooling and PSUs. Then buy single-slot GPUs (if they are powerful enough) and pack them into a Supermicro motherboard model that has all x16 PCIe slots (or more x8 slots, if Intel has cards with physical x8 connectors). Intel might be worth a look in the single-slot market. My RX 5700 XT is essentially three slots wide and blocks adjacent PCIe slots.

If you settle on Intel GPUs build one box and test. Make sure the GPU is properly recognised and supported in the server BIOS.

Staying with DDR4 might keep costs down — significantly. EPYC Milan would be the go here. Good performance, very mature and an abundance of ex-datacenter hardware to buy. Buy all the same, keep it standard and hold spares for a projected 5-8 years ROI.

If you don't need the raw compute of a DDR5 platform but do need a lot of PCIe lanes (say, using x8 slots), there may be suitable platforms in either DDR4 EPYC or Threadripper HEDT (not Pro). It will come down to cost, and Threadripper HEDT might still be significantly more expensive than ex-datacenter EPYC.

If EPYC Milan is too recent and costly you might find the later ROME platform excellent value.

Then just cluster these in Incus running VDI. What do you think, @SirGiggles?

An interesting set of developments for AMD and Nvidia. Is it possible that the AMD bug is caused by power saving state changes? I don’t have any power saving on my EPYC. It’s permanently on with no reductions on PCIe power.

You can check these threads for more details:

EDIT: to my knowledge, nothing has changed in regards to the RX 9000 series

EDIT2: I might have spoken too soon; there is one report of success: VFIO Pass through working on 9070XT - Virtualization - Level1Techs Forums

The TL;DR seems to be that Windows and Linux upload different firmware blobs to the GPU SoC, which causes it to lock up whenever you unbind it.

As for the rest of your post: yeah, honestly, I can't disagree with anything you said. Getting an Epyc platform (even a Zen 1 or Zen+ Epyc from eBay) would grant you tons of PCIe lanes and RAM capacity; the only major tradeoff would be CPU performance, depending on the model.

I have fiddled with GitHub - strongtz/i915-sriov-dkms: dkms module of Linux i915 driver with SR-IOV support, but so far I didn’t get it to work under Debian Stable. That’s apparently Intel-provided code, but I’d prefer something less off the beaten path, if possible.

Also, as a follow-up to the Intel SR-IOV bit: it doesn't seem like there's any progress for anything before Panther Lake, if the latest comment on this issue is to be believed: SR-IOV: mainlining? · Issue #33 · intel/linux-intel-lts · GitHub
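For reference, if the out-of-tree i915-sriov-dkms module does load, creating virtual functions goes through the standard sysfs SR-IOV interface; the PCI address and VF count below are examples, and the supported VF count depends on the specific iGPU:

```shell
# Enable 4 virtual functions on the iGPU at 0000:00:02.0 (example address)
echo 4 | sudo tee /sys/bus/pci/devices/0000:00:02.0/sriov_numvfs

# The VFs should now show up as extra display controllers
lspci | grep -i vga
```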

Was just reading this, and of course for VDI we want to utilise SR-IOV on those GPUs, which means single-slot or dual-slot x16 cards (depending on motherboard design) would be the go:

But it sounds like the OP doesn’t require many instances so a cluster of two (or max three) boxes might do the trick without SR-IOV. But if the platform is SR-IOV ready, including the cards, there is room to grow.

Yeah, EPYC with at least 3GHz per core might be necessary.

Dunno why Wendell and Level1Techs aren't talking about Incus like it's on fire. Same for ServeTheHome.

I'd say it's more of a single app. Everything else they can do locally; it's just that this one application requires a GPU and x86.

About the hardware… I’m basically looking for availability/ease. If a server provider has a dedicated server for a reasonable price, I’m in. If I can buy a PC that works, I’m in. If I can ask a PC store to assemble some readily-available parts, I’m in.

I can add some complexity on the software side if something requires compiling or fiddling, although if I could just get the hardware, slap Debian and Incus on it, and be up and running, that would be ideal.

This is the store I purchased my Epyc components from:
https://www.ebay.com/str/tugm4470

You could substitute the Supermicro motherboard with a Gigabyte. I found out about tugm4470 on the Serve The Home and Level1Techs forums, and I concur with the feedback that this eBay store is solid and good value. All of their Epyc platform hardware is either ex-datacenter or new; my CPU and RAM were used and the motherboard new.

The great thing about ex-datacenter hardware is that it has (typically) had good cooling, a dust-free and moisture-free environment, and a quality electricity supply, and it is past the threshold of most early failures. tugm4470 also tests every component before sending it out. You could work out a standard hardware configuration and order enough for a minimum-size Incus cluster (three boxes) plus a small stash of spare parts.

Consider used enterprise-class NVMe SSDs instead of new consumer-class drives. tugm4470 should be able to advise on the remaining write endurance for a particular product. If you want to go with consumer drives, I think the Kingston KC3000 PCIe 4.0 drives are solid (they also worked flawlessly with VMware 8).

A bifurcation card that splits an x16 PCIe slot into four x4 is a great way to add four fast NVMe drives, on top of the two full-speed drives that can be installed on most EPYC motherboards.

There may also be highly recommended used ex-datacenter hardware resellers in your country.