Hello,
I’m trying to get a VM running with an InfiniBand HCA presented to it via SR-IOV. I managed to get this working for a container, but not for a VM. I suspect the issue lies in finding the correct switches to disable probing of the VFs on the host.
There’s a thread from 5 years ago, “Trying to use SR-IOV with Mellanox Infiniband: All virtual functions on device "sriov" are already in use”, that seems to be about a related but opposite problem: there, the VFs had to be probed before creating a container.
I guess one problem is that in the 5 years since, there have been some kernel changes; for one thing, there is now something called ‘autoprobing’ (see kernel/git/next/linux-next.git - The linux-next integration testing tree).
This Mellanox OFED doc says that for kernels above 4.12 one should disable autoprobing rather than use the probe_vf module parameter:
Single Root IO Virtualization (SR-IOV) - NVIDIA Docs
So I suspect I need to disable autoprobing, but for some reason it doesn’t work for me:
# echo 0> /sys/class/infiniband/mlx5_1/device/sriov_drivers_autoprobe ; cat /sys/class/infiniband/mlx5_1/device/sriov_drivers_autoprobe
1
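One thing that may be worth ruling out before blaming the driver is shell parsing. In POSIX shells, a digit written immediately before `>` is read as a file descriptor number, so `echo 0> FILE` redirects fd 0 to FILE and runs `echo` with no arguments; nothing is written to the file. The sketch below reproduces this on a scratch file (the temporary file is only a stand-in, not a real sysfs attribute):

```shell
tmp=$(mktemp)

# Digit glued to '>': parsed as "redirect fd 0 to the file".
# echo gets no arguments and prints an empty line to stdout instead.
echo 0> "$tmp"
glued=$(cat "$tmp")      # empty: nothing reached the file

# With a space, 0 is an argument to echo and stdout goes to the file.
echo 0 > "$tmp"
spaced=$(cat "$tmp")     # now contains 0

printf 'glued=[%s] spaced=[%s]\n' "$glued" "$spaced"
rm -f "$tmp"
```

If that is what is happening here, `echo 0 > /sys/class/infiniband/mlx5_1/device/sriov_drivers_autoprobe` (with a space before `>`) would be the form to try. Whether the attribute then accepts the value on this particular driver stack is not something this sketch can show.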
And if I try to do it the old-fashioned way with probe_vf, it seems that after disabling it I’m not able to create any VFs, let alone probe them:
# echo N> /sys/module/mlx5_core/parameters/probe_vf
# cat /sys/module/mlx5_core/parameters/probe_vf
N
# echo 4>/sys/class/infiniband/mlx5_1/device/sriov_numvfs
# cat /sys/class/infiniband/mlx5_1/device/sriov_numvfs
0
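Note that `echo 4>` in the transcript above has a digit glued to `>`, which the shell reads as "redirect fd 4", so no value may have reached `sriov_numvfs` at all; `echo N>` is not affected because `N` is not a digit. Below is a minimal sketch of the sequence the kernel's PCI sysfs interface describes: write 0 to `sriov_drivers_autoprobe` before enabling VFs, then write the VF count, reading each value back. The helper names `write_attr` and `configure_vfs` are hypothetical, and the device directory is passed in so the functions can be dry-run against a scratch directory. Whether this alone is enough for the LXD VM case is an open assumption.

```shell
# write_attr FILE VALUE (hypothetical helper):
# write VALUE to FILE, then read it back and compare.
write_attr() {
    printf '%s\n' "$2" > "$1" || return 1
    [ "$(cat "$1")" = "$2" ]
}

# configure_vfs DEVDIR COUNT (hypothetical helper):
# disable VF driver autoprobe first, then create COUNT VFs.
configure_vfs() {
    write_attr "$1/sriov_drivers_autoprobe" 0 || {
        echo "sriov_drivers_autoprobe did not change" >&2
        return 1
    }
    write_attr "$1/sriov_numvfs" "$2" || {
        echo "sriov_numvfs did not change" >&2
        return 1
    }
}

# Intended use on the host from this post (as root):
#   configure_vfs /sys/class/infiniband/mlx5_1/device 4
```

Using `printf` with a space before `>` avoids the file-descriptor parsing, and the read-back makes a silently ignored write visible right away.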
This is on Ubuntu 22.04 (both host and guests), with lxd installed via snap (currently 5.20 it seems).
Any ideas what would be the magical incantation to make this work?