I tried starting the containers both privileged and unprivileged. Starting privileged removed the

lxc Gentoo-Nvidia 20211230170057.516 WARN cgfsng - cgroups/cgfsng.c:fchowmodat:1251 - No such file or directory - Failed to fchownat(44, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )

line from the logs, so I figured that was progress, but in that case I'll switch the containers back to unprivileged.
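For reference, this is roughly how I'm switching between the two modes in the container config (a sketch; the id-map range is an assumption and should match the host's /etc/subuid and /etc/subgid entries):

```
# /var/lib/lxc/<container>/config
# Unprivileged: map container root onto an unprivileged host uid/gid range.
lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536
# For the privileged test I simply commented out the lxc.idmap lines.
```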
nvidia-smi works fine with the following output:
Sat Jan 1 03:34:57 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.94 Driver Version: 470.94 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:09:00.0 On | N/A |
| 0% 57C P0 67W / 250W | 258MiB / 11175MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:0C:00.0 N/A | N/A |
| 30% 34C P8 N/A / N/A | 10MiB / 2000MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... Off | 00000000:0D:00.0 N/A | N/A |
| 30% 31C P8 N/A / N/A | 10MiB / 2000MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3048 G /usr/bin/X 28MiB |
| 0 N/A N/A 3160 G X 193MiB |
| 0 N/A N/A 3431 G picom 29MiB |
| 0 N/A N/A 3499 G kitty 3MiB |
+-----------------------------------------------------------------------------+
I'm trying to pass the second 690 into the container. The GTX 690 is one of those old Nvidia cards with two GPUs on a single board, meant for SLI, so it shows up as two separate devices.
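For a plain device passthrough (without the nvidia hook), I understand the config would look something like this; the minor numbers below are taken from the nvidia-container-cli output further down and the major number 195 is the standard Nvidia character-device major, but both should be verified against /dev on the host:

```
# Sketch: pass one of the 690's GPUs (device minor 1, /dev/nvidia1) into the container.
lxc.cgroup2.devices.allow = c 195:1 rwm     # /dev/nvidia1
lxc.cgroup2.devices.allow = c 195:255 rwm   # /dev/nvidiactl
lxc.mount.entry = /dev/nvidia1 dev/nvidia1 none bind,optional,create=file
lxc.mount.entry = /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
```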
nvidia-container-cli also appears to be working fine, although, strangely, it reports permission errors when run with sudo but runs fine as my regular user:
NVRM version:   470.94
CUDA version:   11.4

Device Index:   0
Device Minor:   0
Model:          NVIDIA GeForce GTX 1080 Ti
Brand:          GeForce
GPU UUID:       GPU-697d9489-ad9f-585e-3b5e-6d5a7e864f8a
Bus Location:   00000000:09:00.0
Architecture:   6.1

Device Index:   1
Device Minor:   1
Model:          NVIDIA GeForce GTX 690
Brand:          GeForce
GPU UUID:       GPU-215a709b-ab16-3406-e07f-358df47167ed
Bus Location:   00000000:0c:00.0
Architecture:   3.0

Device Index:   2
Device Minor:   2
Model:          NVIDIA GeForce GTX 690
Brand:          GeForce
GPU UUID:       GPU-c11b18a9-dacd-d424-5418-0d97f0bf3525
Bus Location:   00000000:0d:00.0
Architecture:   3.0
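Given that output, my plan is to select just the second 690 by UUID through the LXC nvidia hook; this is a sketch based on my reading of the hook (the hook path and its use of NVIDIA_VISIBLE_DEVICES are assumptions about my distro's LXC packaging, so please correct me if the hook is invoked differently):

```
# Select only device index 1 (the first GPU on the 690) by its UUID.
lxc.environment = NVIDIA_VISIBLE_DEVICES=GPU-215a709b-ab16-3406-e07f-358df47167ed
lxc.environment = NVIDIA_DRIVER_CAPABILITIES=compute,utility
lxc.hook.mount = /usr/share/lxc/hooks/nvidia
```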