I’m trying to get a VM running with an Infiniband HCA presented to it via SR-IOV. I managed to get this working for a container, but not for a VM. I suspect the issue lies in finding the correct switches to disable probing of the VFs on the host.
There’s a thread from 5 years ago, “Trying to use SR-IOV with Mellanox Infiniband: All virtual functions on device "sriov" are already in use”, that seems to be about a related but opposite problem, namely that the VFs must be probed before creating a container.
I guess one problem is that in the 5 years since, there have been some kernel changes; for one thing, there’s nowadays something called ‘autoprobing’ (see: kernel/git/next/linux-next.git - The linux-next integration testing tree).
This doc for Mellanox OFED mentions that for kernels above 4.12 one should disable autoprobing rather than use the probe_vf module parameter: Single Root IO Virtualization (SR-IOV) - NVIDIA Docs
So I suspect I need to disable autoprobing, but for some reason it doesn’t work for me:
# echo 0> /sys/class/infiniband/mlx5_1/device/sriov_drivers_autoprobe ; cat /sys/class/infiniband/mlx5_1/device/sriov_drivers_autoprobe
And if I try to do it the old-fashioned way with probe_vf, it seems that after disabling it I’m not able to create any VFs, let alone probe them:
# echo N> /sys/module/mlx5_core/parameters/probe_vf
# cat /sys/module/mlx5_core/parameters/probe_vf
# echo 4>/sys/class/infiniband/mlx5_1/device/sriov_numvfs
# cat /sys/class/infiniband/mlx5_1/device/sriov_numvfs
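One thing worth ruling out in the commands above is the shell rather than the kernel. In a POSIX shell, a digit written directly before `>` is taken as a file descriptor number, so `echo 0> file` and `echo 4>file` redirect fd 0 and fd 4 and pass no argument to `echo`; nothing is written to the file. `echo N> file` is not affected, because `N` is not a digit. The sketch below shows the difference on a scratch file instead of sysfs; `write_value` is a hypothetical helper introduced only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of LXD or any kernel tooling): write a value
# to a file with the value and the redirection kept clearly apart.
write_value() {
    # $1 = value to write, $2 = target path
    printf '%s\n' "$1" > "$2"
}

demo=$(mktemp)            # scratch file standing in for a sysfs attribute

# Parsed as `echo` with fd 0 redirected to the file: echo gets no argument,
# prints an empty line to stdout, and the file stays empty.
echo 0> "$demo"
bad_size=$(wc -c < "$demo" | tr -d ' ')

# With the value separated from the operator, the value reaches the file.
write_value 0 "$demo"
good=$(cat "$demo")

echo "bytes written by 'echo 0> file': $bad_size; file content after helper: '$good'"
rm -f "$demo"
```

If this is what happened here, retrying with a space before the redirection, for example `echo 0 > /sys/class/infiniband/mlx5_1/device/sriov_drivers_autoprobe` and `echo 4 > /sys/class/infiniband/mlx5_1/device/sriov_numvfs`, would at least make sure the values reach sysfs. Whether that alone is enough to get the VF into a VM is not something this sketch verifies.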
This is on Ubuntu 22.04 (both host and guests), with LXD installed via snap (currently version 5.20, it seems).
Any ideas what would be the magical incantation to make this work?