LXD: error when adding a GPU device with lxc config device add?

Error: Failed to start device “sss”: Failed to override IOMMU group driver: Device took too long to activate at “/sys/bus/pci/drivers/vfio-pci/0000:01:00.0”
lxc version 5.0.0
type:virtual-machine
Cuda compilation tools, release 11.7

I think this has something to do with the IOMMU group. What do I need to do?

Hi @wwws, I suppose you are trying to add a GPU to the container, right? What command did you run? I don't have an NVIDIA GPU, so I can't try it myself, but have you looked at this video where @stgraber explains it briefly?
https://www.youtube.com/watch?v=1i45zTu42i0
Regards.

I'm running lxc config device add v-9bb2696b33eb4b6ba93b2cd7ed821545 test gpu productid=1cb1 vendorid=10de and then starting the instance, which reports the failure above. I'll watch the video to see if it helps me, thanks!!!

Hello, has this problem been solved yet? I ran into the same problem as you, but I have no idea how to deal with it.
Hoping for your reply.
Kind regards.
LXD version 5.7


Can you show the output of lxc info gpu?

So far this suggests you’re dealing with a virtual machine. For passthrough to work with those, you must:

  • Not be using the GPU at all on the host (the fact that nvidia-smi even lists it suggests it’s in use)
  • Have your GPU alone in its IOMMU group (usually requires firmware configuration to enable VT-d or similar)

We have daily tests validating such passthrough to virtual machines using NVIDIA GPUs, so we’re reasonably sure that it works. But it’s very important that nothing at all uses the GPU, which typically means that you don’t even want to have drivers for the GPU present on the host system.
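To check the second point, here is a minimal Python sketch that lists which PCI devices share an IOMMU group with a given device. It assumes the standard Linux sysfs layout (/sys/kernel/iommu_groups/<N>/devices/<PCI address>); the function name iommu_group_members is hypothetical, and the address 0000:01:00.0 is taken from the error message at the top of the thread.

```python
import os


def iommu_group_members(pci_addr, sysfs_root="/sys"):
    """Return the sorted PCI addresses in the IOMMU group that contains pci_addr.

    Assumes the sysfs layout <sysfs_root>/kernel/iommu_groups/<N>/devices/<addr>.
    Returns None when no group lists the device; this is also what you get
    when the IOMMU is disabled and no groups exist at all.
    """
    groups_dir = os.path.join(sysfs_root, "kernel", "iommu_groups")
    if not os.path.isdir(groups_dir):
        return None
    for group in sorted(os.listdir(groups_dir)):
        devices_dir = os.path.join(groups_dir, group, "devices")
        if not os.path.isdir(devices_dir):
            continue  # skip anything that is not a group directory
        members = os.listdir(devices_dir)
        if pci_addr in members:
            return sorted(members)
    return None


# Usage on the host (address taken from the error message above):
# print(iommu_group_members("0000:01:00.0"))
```

A None result suggests the IOMMU is not enabled or the address is wrong; a list containing unrelated devices means the GPU is not alone in its group.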

Hi stgraber, thanks for your reply. Yes, I am dealing with a virtual machine, and I will check my host again based on your suggestions.
Here is my VM info and my GPU info.
[screenshots: VM info and GPU info]

The problem is not resolved yet: the GPU still isn't passed through to the VM.

Hi @wwws and @stgraber. I checked my host again and found that IOMMU was not enabled on it, which caused this problem. Here I’d like to share my solution. My host runs RHEL.

  1. Make sure your host chipset supports Intel VT-d or AMD-Vi and that it is enabled (check this in the BIOS).
  2. Add the following at the end of GRUB_CMDLINE_LINUX_DEFAULT:
    vim /etc/default/grub
    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on vfio_pci.ids=10de:1eb8 kvm.ignore_msrs=1"
    NOTE: (1) Remember to replace the vfio_pci.ids values with your GPU's vendor:product IDs (lspci -nn shows them).
    (2) The above is a single line, without line breaks.
    (3) If kvm.ignore_msrs=1 causes issues, you can remove it.
    (4) Use intel_iommu on Intel hosts and amd_iommu on AMD hosts.
  3. Regenerate the boot loader configuration:
    For legacy BIOS systems:
    sudo grub2-mkconfig -o /etc/grub2.cfg
    For UEFI systems:
    sudo grub2-mkconfig -o /etc/grub2-efi.cfg
    Then reboot your host.
  4. Log back into your host and run:
    dmesg | grep -E "DMAR|IOMMU"
    [screenshot of dmesg output]
    If the output shows that the IOMMU is enabled, you’ve successfully enabled it.
  5. Check that the GPU is bound to vfio-pci:
    lspci -kn | grep -A 2 af:00.
    [screenshot of lspci output]
    NOTE: replace “af:00.” with your GPU’s PCI bus ID.
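As a complement to steps 2 and 4, a hypothetical Python helper like the one below can confirm after the reboot that the options from step 2 actually reached the running kernel, by parsing the text of /proc/cmdline. It only shows what was passed on the command line, not whether the IOMMU really came up, so the dmesg check in step 4 is still needed. It also assumes plain whitespace-separated options (no quoted values).

```python
def iommu_cmdline_options(cmdline):
    """Extract the IOMMU/VFIO options used in step 2 from a kernel command line.

    Values are returned as-is (None if the option is absent) so you can judge
    them yourself; vfio_pci.ids is split into a list of vendor:product IDs.
    """
    params = {}
    for token in cmdline.split():
        # "key=value" -> ("key", "value"); a bare flag like "quiet" -> ("quiet", "")
        key, _, value = token.partition("=")
        params[key] = value
    return {
        "intel_iommu": params.get("intel_iommu"),
        "amd_iommu": params.get("amd_iommu"),
        "vfio_pci.ids": [i for i in params.get("vfio_pci.ids", "").split(",") if i],
        "kvm.ignore_msrs": params.get("kvm.ignore_msrs"),
    }


# Usage on the host after rebooting:
# with open("/proc/cmdline") as f:
#     print(iommu_cmdline_options(f.read()))
```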

That's all.
Hope this is useful to you. Kind regards.

You can see that the driver now appears at /sys/bus/pci/drivers/vfio-pci/. Before that, this driver was missing, which is why LXD reported this error.
[screenshot]
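The same thing can be checked per device: the driver a PCI device is bound to is exposed as the driver symlink under /sys/bus/pci/devices/<PCI address>/. A minimal Python sketch that reads it (the function name bound_driver is hypothetical; the address is the one from the original error):

```python
import os


def bound_driver(pci_addr, sysfs_root="/sys"):
    """Return the name of the kernel driver bound to a PCI device, or None.

    Reads the <sysfs_root>/bus/pci/devices/<addr>/driver symlink, whose target
    ends in the driver's directory name (e.g. .../drivers/vfio-pci).
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "driver")
    if not os.path.islink(link):
        return None  # no driver bound, or no such device
    return os.path.basename(os.readlink(link))


# Usage on the host; expect "vfio-pci" if the vfio_pci.ids option took effect:
# print(bound_driver("0000:01:00.0"))
```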

Thank you, the problem has been solved. I did enable IOMMU. Also, the NVIDIA driver must be installed on the physical host, and nvidia-utils needs to be installed in the LXC container.