SR-IOV with Infiniband HCA and VM, new kernel options

Hello,

I’m trying to get a VM running and present an Infiniband HCA to it using SR-IOV. I did manage to get it to work for a container, but not for a VM. I suspect the issue lies in finding the correct switches to disable probing of the VFs on the host.

There’s a thread from 5 years ago, “Trying to use SR-IOV with Mellanox Infiniband: All virtual functions on device "sriov" are already in use”, that seems to be about a related but opposite problem, namely that the VFs must be probed before creating a container.

I guess one problem is that in the 5 years since, there have been some kernel changes; for one thing, there is now something called ‘autoprobing’ (see kernel/git/next/linux-next.git - The linux-next integration testing tree).

This doc for Mellanox OFED (Single Root IO Virtualization (SR-IOV) - NVIDIA Docs) mentions that for kernels above 4.12 one should disable autoprobing rather than use probe_vf.

So I suspect I need to disable autoprobing, but for some reason it doesn’t work for me:

# echo 0> /sys/class/infiniband/mlx5_1/device/sriov_drivers_autoprobe ; cat /sys/class/infiniband/mlx5_1/device/sriov_drivers_autoprobe

1

And if I try to do it the old-fashioned way with probe_vf, it seems that after disabling it I’m not able to create any VFs, let alone probe them:

# echo N> /sys/module/mlx5_core/parameters/probe_vf
# cat /sys/module/mlx5_core/parameters/probe_vf
N
# echo 4>/sys/class/infiniband/mlx5_1/device/sriov_numvfs

# cat /sys/class/infiniband/mlx5_1/device/sriov_numvfs
0

This is on Ubuntu 22.04 (both host and guests), with lxd installed via snap (currently 5.20 it seems).

Any ideas what would be the magical incantation to make this work?

I’ve usually been passing the number of VFs to set up and probe directly through the mlx5_core module options. Look at modinfo mlx5_core for a list of flags.

Also remember that you may need to use mlxconfig to enable some specific features in the card’s firmware (which then requires a reboot).

I’m traveling and nowhere near my Infiniband test system so can’t be more specific :wink:

OK, an update.

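The sriov_drivers_autoprobe thing works, and is what you should use. My error was that the shell parsed 0> as a redirection of file descriptor 0 (stdin) instead of as the argument 0 followed by a redirect, so echo ran with no arguments and nothing was written to the file. Doh! That is, put a space in between, like 0 >, and then disabling the autoprobing works. The same applies to the echo 4> I used on sriov_numvfs earlier: a digit directly in front of > is taken as a file descriptor number.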

For kernels that have the sriov_drivers_autoprobe file in sysfs, there should be no need to touch the probe_vf parameter, either via /sys/module/mlx5_core/parameters/probe_vf or by adding the probe_vf=0 option when loading the mlx5_core module.
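For anyone hitting the same thing, here is a minimal sketch of the redirection pitfall, shown on a scratch file so it can be tried without any hardware. The sysfs part at the end uses the mlx5_1 device and the 4 VFs from my posts above; both are specific to my setup, it needs root, and the guard simply skips it when the file isn't there. I'm assuming the usual order of disabling autoprobe before creating the VFs.

```shell
# Demonstrate the parsing difference on a scratch file.
tmp=$(mktemp)

# "0>" is parsed as "redirect fd 0 to the file"; echo gets no arguments,
# prints an empty line to the terminal, and the file stays empty.
echo 0> "$tmp"
wrong_size=$(wc -c < "$tmp" | tr -d ' ')

# With a space, "0" is an argument and stdout goes to the file.
echo 0 > "$tmp"
right_content=$(cat "$tmp")

rm -f "$tmp"

# The same commands against sysfs (device name and VF count are from my
# setup; adjust for yours). Skipped when the file is absent or not writable.
dev=/sys/class/infiniband/mlx5_1/device
if [ -w "$dev/sriov_drivers_autoprobe" ]; then
    echo 0 > "$dev/sriov_drivers_autoprobe"   # keep the host from binding drivers to new VFs
    echo 4 > "$dev/sriov_numvfs"              # then create the VFs
    cat "$dev/sriov_drivers_autoprobe" "$dev/sriov_numvfs"
fi
```

Using printf '0\n' > file, or quoting the value as echo "0" > file with the space kept, avoids the ambiguity altogether.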