Hello,
I’m just starting out with Incus and GPU passthrough, and I’m having a bit of trouble getting SR-IOV to work. I’ll be as brief as possible.
- I’ve installed Ubuntu Server 24.04.2 on a Dell PowerEdge R730
- An NVIDIA Tesla M60 card has been installed in the server; the card is definitely in Graphics mode, tested with the gpumodeswitch utility
- I’ve enabled Global SR-IOV in the BIOS of the server; I’ve disabled Secure Boot to prevent kernel lockdown
- Incus 6.11 is installed from Zabbly stable APT repo
- The latest NVIDIA-GRID-Ubuntu-KVM host drivers have been installed and nouveau blacklisted so that they can take effect (blacklist config sketched below, after the nvidia-smi output)
- I have not as yet installed any GRID licensing server
- The card is present and recognised as two GPUs:
$ nvidia-smi
Tue Apr 15 08:40:49 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.03 Driver Version: 570.124.03 CUDA Version: N/A |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla M60 On | 00000000:84:00.0 Off | Off |
| N/A 37C P8 24W / 150W | 19MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla M60 On | 00000000:85:00.0 Off | Off |
| N/A 33C P8 24W / 150W | 19MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
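In case the blacklisting step matters, this is roughly the shape of it; the file name blacklist-nouveau.conf is just what I happened to use:

$ cat /etc/modprobe.d/blacklist-nouveau.conf
# stop nouveau binding so the NVIDIA GRID host driver can claim the card
blacklist nouveau
options nouveau modeset=0
$ sudo update-initramfs -u   # rebuild the initramfs so the blacklist takes effect at boot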
Unfortunately, when I attempt to add the card to a VM instance with
incus config device add <instance> gpu-1 gpu gputype=sriov pci=0000:84:00.0
and then start the VM, I get the message:
Failed to start device "gpu-1": Couldn't find a matching GPU with available VFs
Indeed, I read that in the directory /sys/bus/pci/devices/0000:84:00.0 I should expect to see a number of driver special files such as max_vfs, but there’s nothing there to indicate that SR-IOV is an option for this card.
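In case I’m looking in the wrong place, these are roughly the checks I’ve been running (the grep patterns are just my guesses at what to look for):

# any SR-IOV related files the kernel exposes for this function
$ ls /sys/bus/pci/devices/0000:84:00.0/ | grep -i sriov
# an SR-IOV capable function should also advertise it in its extended capabilities
$ sudo lspci -vvv -s 84:00.0 | grep -i "single root"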
Any ideas what additional steps I need to take to get SR-IOV working? If I attach one of the Tesla M60 GPUs to the VM as a physical passthrough, the VM boots, and once the guest NVIDIA drivers are installed the card is usable. I just need to see whether I can get SR-IOV working for this proof-of-concept test.
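For reference, the physical passthrough that does work was added with something along these lines:

# pass the whole PCI function 84:00.0 straight through to the VM
$ incus config device add <instance> gpu-1 gpu gputype=physical pci=0000:84:00.0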
Edit: Have I made the rookie mistake of believing the M60 to be SR-IOV compatible when it isn’t? I’ve just checked capabilities with lspci and nothing about SR-IOV shows up:
$ sudo lspci -v -s 84:00.0
84:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation GM204GL [Tesla M60]
Flags: bus master, fast devsel, latency 0, IRQ 104, NUMA node 1, IOMMU group 9
Memory at c9000000 (32-bit, non-prefetchable) [size=16M]
Memory at 3ffe0000000 (64-bit, prefetchable) [size=256M]
Memory at 3fff0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 9000 [size=128]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_vgpu_vfio, nvidia
I could have sworn I read that this card was SR-IOV capable; I thought it was the whole point of the card, i.e. for vGPU on a host with lots of workstation guests. Am I wrong?
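One thing I notice is that nvidia_vgpu_vfio shows up in the kernel modules above, so maybe vGPU on this generation of card is exposed as mediated devices rather than SR-IOV VFs? If so, I’m guessing the check and the Incus device would look more like the following, though the <profile> is a placeholder as I haven’t confirmed which profiles the M60 offers:

# list the vGPU (mdev) profiles the host driver exposes for this card, if any
$ ls /sys/bus/pci/devices/0000:84:00.0/mdev_supported_types/
# attach one of those profiles to the VM instead of an SR-IOV VF
$ incus config device add <instance> gpu-1 gpu gputype=mdev pci=0000:84:00.0 mdev=<profile>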
Thanks.