PCI/GPU passthrough for VMs - 3.23+

Apologies if this question has already been answered …

I have a use case where I need to manage containers and VMs on a cluster in an integrated fashion. Some of the VMs need access to a raw NVIDIA GPU device (without there being any drivers on the LXD server).

I see that 3.20+ brings the ability to configure a physical-nic device (using QEMU PCI passthrough) for a VM, and was wondering if similar capability is available or forthcoming for a generic (non-NIC) device.

Thanks,
-Vijay

Not currently, but it's on our short-term roadmap.
Until then, it's possible to do it by passing the right arguments directly to QEMU using the raw.qemu config key. We've had a few users do that successfully.


Thanks for the quick reply!
Is there a pointer you can share to documentation on how to use the raw.qemu key? … will also search these forums on my own.

Thanks,
-Vijay

@morphis can you share what you’ve been doing?

Never mind, figured this out …

In case someone else is interested, describing what I did below:

After doing the usual IOMMU and VFIO setup (to isolate the desired PCI device from the host), the device can be passed through to the VM as shown in the commands below:

akriadmin@c4akri01:~$ lxc profile show vm
config:
  raw.qemu: -device vfio-pci,host=41:00.0
description: LXD profile for virtual machines
devices: {}
name: vm
used_by: []
akriadmin@c4akri01:~$ lxc launch --target c4akri01 --vm images:ubuntu/18.04 --profile default --profile vm
Creating the instance
Instance name is: loving-peacock
Starting loving-peacock
akriadmin@c4akri01:~$
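As an aside, the "usual IOMMU and vfio stuff" can be sanity-checked by listing the IOMMU groups from sysfs before starting the VM. This is a generic sketch (the list_iommu_groups helper name is my own, not an LXD or kernel command); on the host above you would look for the group containing 0000:41:00.0, and every device in that group must be bound to vfio-pci.

```shell
# Sketch: print each IOMMU group and the PCI addresses it contains.
# The whole group holding your target device must be handed to vfio-pci.
list_iommu_groups() {
  root="${1:-/sys/kernel/iommu_groups}"   # standard sysfs location
  for g in "$root"/*; do
    [ -d "$g" ] || continue
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
      [ -e "$d" ] || continue
      echo "  ${d##*/}"                   # PCI address, e.g. 0000:41:00.0
    done
  done
}

list_iommu_groups
```

Each printed address can then be fed to lspci -nns ADDRESS to see the vendor/product IDs needed for the vfio-pci.ids= options.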

@Vijay_Karamcheti Yes, PCI passthrough via vfio is what I did as well.

If your card is in use by another driver (for example, in the case of a GPU) you have to unbind it first. For GPUs, add the following to /etc/default/grub (NOTE: the IDs are the <vendor>:<product> combinations you can find with lspci -nn):

GRUB_CMDLINE_LINUX_DEFAULT="video=vesafb:off,efifb:off vfio-pci.ids=1002:67c7,1002:aaf0"

where the PCI IDs are those of your GPU. In this case the GPU also has an audio PCI device in the same IOMMU group, which has to be bound to VFIO as well; otherwise QEMU will fail to forward the device. Also add

vfio-pci 
vfio_iommu_type1

to your /etc/modules and create /etc/modprobe.d/vfio.conf with the following content

options vfio-pci ids=1002:67c7,1002:aaf0
softdep radeon pre: vfio-pci
softdep amdgpu pre: vfio-pci
softdep nouveau pre: vfio-pci
softdep drm pre: vfio-pci

Now run

$ update-grub
$ update-initramfs -u -k all

and finally reboot the system. Once back up, the PCI device can be forwarded via raw.qemu: -device vfio-pci,host=01:00.0.
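To spell that last step out, the key can also be set on an individual VM rather than through a profile. A sketch, assuming a VM hypothetically named myvm and a device at host PCI address 01:00.0 that has already been bound to vfio-pci by the steps above:

```shell
# Hypothetical instance name "myvm"; the device at 01:00.0 must already
# be bound to vfio-pci (see the grub/modprobe configuration above).
lxc config set myvm raw.qemu="-device vfio-pci,host=01:00.0"

# Extra QEMU arguments only take effect on the VM's next boot.
lxc restart myvm
```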

Hope this helps!


Hi, I am well versed in passing a GPU through in KVM with virt-manager, but am now trying to understand how to do this with LXD VMs. Could you explain your statement
'and finally reboot the system. Once back up the PCI device can now be forwarded via a raw.qemu: -device vfio-pci,host=01:00.0 .'
please?

Kind regards.

With more recent LXD we have the pci device type which does this for you.

lxc config device add INSTANCE my-device pci address=01:00.0 will attach host PCI device 01:00.0 to the VM called INSTANCE.

If you're dealing with a GPU or NIC, then it's best to use the physical type of the gpu or nic device types respectively. This performs the same VFIO passthrough but puts the device in a different spot on the PCIe bus, and in the GPU case it will also pass through related devices (built-in USB, sound card, … that may be found on the GPU PCB).


Thanks stgraber, this is great. Could you give me more info, and perhaps an example, of the 'physical type for either the gpu or nic device types' please?

lxc config device add INSTANCE my-gpu gpu gputype=physical pci=01:00.0
lxc config device add INSTANCE my-nic nic nictype=physical pci=01:00.0

So they both work in a very similar way as you can tell.


Thank you, that works great.
One more quick question about vGPU and mdev devices.
I have NVIDIA T4 GPUs, two in this particular host. Looking at this guide (Instances | LXD), it says to use the mdev profile obtained from running 'lxc info --resources'. Given the output of that (seen below), I am using this:

lxc config device add winserver my-gpu gpu gputype=mdev mdev=nvidia-224

But this results in the error ‘Error: Failed to start device “my-gpu”: Invalid mdev profile “nvidia-224”’

What am I doing wrong here please?

GPUs:
  Card 0:
    NUMA node: 0
    Vendor: Matrox Electronics Systems Ltd. (102b)
    Product: Integrated Matrox G200eW3 Graphics Controller (0536)
    PCI address: 0000:03:00.0
    Driver: mgag200 (5.4.0-84-generic)
    DRM:
      ID: 0
      Card: card0 (226:0)
      Control: controlD64 (226:0)
  Card 1:
    NUMA node: 0
    Vendor: NVIDIA Corporation (10de)
    Product: TU104GL [Tesla T4] (1eb8)
    PCI address: 0000:5e:00.0
    Driver: nvidia (460.73.02)
    NVIDIA information:
      Architecture: 7.5
      Brand: Unknown
      Model: Tesla T4
      CUDA Version: 11.2
      NVRM Version: 460.73.02
      UUID: GPU-9746e6f9-2366-4f05-deb8-28a48eb78f9e
    SR-IOV information:
      Current number of VFs: 0
      Maximum number of VFs: 16
    Mdev profiles:
      - nvidia-222 (16 available)
          num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=16
      - nvidia-223 (8 available)
          num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=8
      - nvidia-224 (8 available)
          num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=8
      - nvidia-225 (16 available)
          num_heads=1, frl_config=60, framebuffer=1024M, max_resolution=1280x1024, max_instance=16
      - nvidia-226 (8 available)
          num_heads=1, frl_config=60, framebuffer=2048M, max_resolution=1280x1024, max_instance=8
      - nvidia-227 (4 available)
          num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=1280x1024, max_instance=4
      - nvidia-228 (2 available)
          num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=1280x1024, max_instance=2
      - nvidia-229 (1 available)
          num_heads=1, frl_config=60, framebuffer=16384M, max_resolution=1280x1024, max_instance=1
      - nvidia-230 (16 available)
          num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=16
      - nvidia-231 (8 available)
          num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=8
      - nvidia-232 (4 available)
          num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=7680x4320, max_instance=4
      - nvidia-233 (2 available)
          num_heads=4, frl_config=60, framebuffer=8192M, max_resolution=7680x4320, max_instance=2
      - nvidia-234 (1 available)
          num_heads=4, frl_config=60, framebuffer=16384M, max_resolution=7680x4320, max_instance=1
      - nvidia-252 (16 available)
          num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=16
      - nvidia-319 (4 available)
          num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=4096x2160, max_instance=4
      - nvidia-320 (2 available)
          num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=4096x2160, max_instance=2
      - nvidia-321 (1 available)
          num_heads=1, frl_config=60, framebuffer=16384M, max_resolution=4096x2160, max_instance=1
  Card 2:
    NUMA node: 1
    Vendor: NVIDIA Corporation (10de)
    Product: TU104GL [Tesla T4] (1eb8)
    PCI address: 0000:d8:00.0
    Driver: nvidia (460.73.02)
    NVIDIA information:
      Architecture: 7.5
      Brand: Unknown
      Model: Tesla T4
      CUDA Version: 11.2
      NVRM Version: 460.73.02
      UUID: GPU-e69f515c-a7fb-a7fc-b160-0e913a29c402
    SR-IOV information:
      Current number of VFs: 0
      Maximum number of VFs: 16
    Mdev profiles:
      - nvidia-222 (16 available)
          num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=16
      - nvidia-223 (8 available)
          num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=8
      - nvidia-224 (8 available)
          num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=8
      - nvidia-225 (16 available)
          num_heads=1, frl_config=60, framebuffer=1024M, max_resolution=1280x1024, max_instance=16
      - nvidia-226 (8 available)
          num_heads=1, frl_config=60, framebuffer=2048M, max_resolution=1280x1024, max_instance=8
      - nvidia-227 (4 available)
          num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=1280x1024, max_instance=4
      - nvidia-228 (2 available)
          num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=1280x1024, max_instance=2
      - nvidia-229 (1 available)
          num_heads=1, frl_config=60, framebuffer=16384M, max_resolution=1280x1024, max_instance=1
      - nvidia-230 (16 available)
          num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=16
      - nvidia-231 (8 available)
          num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=8
      - nvidia-232 (4 available)
          num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=7680x4320, max_instance=4
      - nvidia-233 (2 available)
          num_heads=4, frl_config=60, framebuffer=8192M, max_resolution=7680x4320, max_instance=2
      - nvidia-234 (1 available)
          num_heads=4, frl_config=60, framebuffer=16384M, max_resolution=7680x4320, max_instance=1
      - nvidia-252 (16 available)
          num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=16
      - nvidia-319 (4 available)
          num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=4096x2160, max_instance=4
      - nvidia-320 (2 available)
          num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=4096x2160, max_instance=2
      - nvidia-321 (1 available)
          num_heads=1, frl_config=60, framebuffer=16384M, max_resolution=4096x2160, max_instance=1

You’ll need to be specific in this case as you also have a built-in Matrox card which doesn’t have mdev.

lxc config device add winserver my-gpu gpu gputype=mdev mdev=nvidia-224 vendorid=10de productid=1eb8

That should do the trick to select the particular type of card without telling LXD which one of the two to allocate from.


Thanks stgraber, so with the nvidia-224 profile, I have a total of 8 available on this host; this will just use 1, leaving 7 free?

As I have two of the same GPUs installed in this host, I had to specify the PCI address. I found the below worked great:

lxc config device add winserver my-gpu gpu gputype=mdev mdev=nvidia-232 pci=5e:00.0

Thanks again.