HELP! Launching an Ubuntu VM with nvidia.runtime=true fails

lxc launch images:ubuntu/20.04/cloud u1 -c nvidia.runtime=true --vm
Creating u1
Error: Failed instance creation: Failed creating instance record: Unknown configuration key: nvidia.runtime
~$ lxc --version
~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

The error says it all :)

nvidia.runtime controls the NVIDIA container runtime. You’re launching a VM, so that option does not apply to it.

For VMs, you can pass in an entire GPU, but this requires another GPU for your system’s main video output and hardware that allows PCI passthrough to virtual machines.

Thank you! Now I know nvidia.runtime is a container-only parameter.

I am interested in GPU passthrough to LXD VMs. Do I need to configure IOMMU isolation for that?

Yep, you’ll need IOMMU groups to be set up properly on your system. This usually involves a trip to your BIOS to make sure IOMMU/VT-d/the AMD equivalent are all enabled, then booting your kernel with something like iommu=pt intel_iommu=on amd_iommu=on
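To confirm the kernel actually booted with those flags, something like the sketch below could help. It only looks for intel_iommu=on or amd_iommu=on on the kernel command line; the function name iommu_flags_present is made up for illustration, and it does not prove that the BIOS side is enabled.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an IOMMU "on" flag is on the kernel command line.
# $1: path to a cmdline file (defaults to /proc/cmdline; overridable for testing).
iommu_flags_present() {
    cmdline_file="${1:-/proc/cmdline}"
    # Match intel_iommu=on or amd_iommu=on as a whole space-separated word.
    grep -Eq '(^|[[:space:]])(intel_iommu|amd_iommu)=on([[:space:]]|$)' "$cmdline_file"
}

# Usage after a reboot:
#   iommu_flags_present && echo "IOMMU flags set" || echo "IOMMU flags missing"
```

If the flags are missing, the bootloader configuration is the place to look; where that lives depends on the distro.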

Once that’s done, you can use lspci -vmm to see what IOMMU group your GPU is in and make sure that nothing else is sharing that group. If that all looks good and you’re not using the GPU for the host system itself, you can pass it through to your VM by adding a gpu device with gputype=physical pci=ADDRESS
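As an alternative to reading lspci output, the group membership can also be read from sysfs. This is a rough sketch, not an official LXD tool: iommu_group_members is a made-up helper name, and the PCI address 0000:01:00.0, the instance name u1 and the device name gpu0 are placeholders to replace with your own.

```shell
#!/bin/sh
# Hypothetical helper: list every PCI device sharing an IOMMU group with a device.
# $1: full PCI address, e.g. 0000:01:00.0 (placeholder)
# $2: sysfs root (defaults to /sys; overridable for testing)
iommu_group_members() {
    addr="$1"
    sysfs="${2:-/sys}"
    group_dir="$sysfs/bus/pci/devices/$addr/iommu_group/devices"
    # No iommu_group entry usually means the IOMMU is not enabled.
    [ -d "$group_dir" ] || { echo "no IOMMU group for $addr" >&2; return 1; }
    ls "$group_dir"
}

# Usage (placeholder address and names):
#   iommu_group_members 0000:01:00.0
#   lxc config device add u1 gpu0 gpu gputype=physical pci=0000:01:00.0
```

A GPU's own audio function (for example 0000:01:00.1) commonly shows up in the same group; unrelated devices in the list are the thing to watch for.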


Thank you. It took me a while to figure out, but now it works. The key is to make sure vfio-pci is successfully loaded for the GPU instead of the nvidia driver, and then to assign the new device ID to the LXD VM. It’s not easy at all; it took me months to make it work :)

Yeah, LXD does support unbinding from the current driver, but that requires the existing driver to be happy to let go of the device, and that’s not the case for the nvidia driver.

So you indeed need to jump through some hoops (distro dependent) to make sure the nvidia driver doesn’t bind the device on startup.
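Whatever method you end up using, you can check which driver currently owns the GPU by reading its sysfs driver link. A small sketch follows; bound_driver is a made-up helper name and 0000:01:00.0 is a placeholder address. One common approach (an assumption here, not necessarily what worked in this thread) is a modprobe.d entry that hands the GPU's vendor:device ID to vfio-pci before the nvidia module loads.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel driver bound to a PCI device, or "none".
# $1: full PCI address, e.g. 0000:01:00.0 (placeholder)
# $2: sysfs root (defaults to /sys; overridable for testing)
bound_driver() {
    addr="$1"
    sysfs="${2:-/sys}"
    link="$sysfs/bus/pci/devices/$addr/driver"
    # An unbound device has no "driver" symlink at all.
    [ -L "$link" ] || { echo "none"; return 0; }
    # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
    basename "$(readlink "$link")"
}

# Usage (placeholder address); you want to see "vfio-pci", not "nvidia":
#   bound_driver 0000:01:00.0
```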

Can you please describe how you achieved that and what system you used?
Would something like the Linux kernel documentation page VFIO - “Virtual Function I/O” be useful?