I tried starting the containers both privileged and unprivileged. Starting privileged removed the

lxc Gentoo-Nvidia 20211230170057.516 WARN cgfsng - cgroups/cgfsng.c:fchowmodat:1251 - No such file or directory - Failed to fchownat(44, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )

line from the logs, so I figured that was progress, but in that case I'll switch the containers back to unprivileged.
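For reference, this is roughly how I'm switching between the two modes in the container config (a sketch; the id-map range is an assumption and should match the host's /etc/subuid and /etc/subgid entries):

```
# /var/lib/lxc/<container>/config
# Unprivileged: map container root onto an unprivileged host uid/gid range.
lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536
# For the privileged test I simply commented out the lxc.idmap lines.
```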
nvidia-smi works fine with the following output:
Sat Jan 1 03:34:57 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.94 Driver Version: 470.94 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:09:00.0 On | N/A |
| 0% 57C P0 67W / 250W | 258MiB / 11175MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:0C:00.0 N/A | N/A |
| 30% 34C P8 N/A / N/A | 10MiB / 2000MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... Off | 00000000:0D:00.0 N/A | N/A |
| 30% 31C P8 N/A / N/A | 10MiB / 2000MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3048 G /usr/bin/X 28MiB |
| 0 N/A N/A 3160 G X 193MiB |
| 0 N/A N/A 3431 G picom 29MiB |
| 0 N/A N/A 3499 G kitty 3MiB |
+-----------------------------------------------------------------------------+
I'm trying to pass the second 690 into the container. The GTX 690 is one of those old Nvidia cards with two GPUs on a single board, meant for SLI, so it shows up as two separate devices.
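For a plain device passthrough (without the nvidia hook), I understand the config would look something like this; the minor numbers below are taken from the nvidia-container-cli output further down and the major number 195 is the standard Nvidia character-device major, but both should be verified against /dev on the host:

```
# Sketch: pass one of the 690's GPUs (device minor 1, /dev/nvidia1) into the container.
lxc.cgroup2.devices.allow = c 195:1 rwm     # /dev/nvidia1
lxc.cgroup2.devices.allow = c 195:255 rwm   # /dev/nvidiactl
lxc.mount.entry = /dev/nvidia1 dev/nvidia1 none bind,optional,create=file
lxc.mount.entry = /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
```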
nvidia-container-cli also appears to be working fine, although, strangely, it reports permission errors when run with sudo but runs fine as my regular user:
NVRM version:   470.94
CUDA version:   11.4

Device Index:   0
Device Minor:   0
Model:          NVIDIA GeForce GTX 1080 Ti
Brand:          GeForce
GPU UUID:       GPU-697d9489-ad9f-585e-3b5e-6d5a7e864f8a
Bus Location:   00000000:09:00.0
Architecture:   6.1

Device Index:   1
Device Minor:   1
Model:          NVIDIA GeForce GTX 690
Brand:          GeForce
GPU UUID:       GPU-215a709b-ab16-3406-e07f-358df47167ed
Bus Location:   00000000:0c:00.0
Architecture:   3.0

Device Index:   2
Device Minor:   2
Model:          NVIDIA GeForce GTX 690
Brand:          GeForce
GPU UUID:       GPU-c11b18a9-dacd-d424-5418-0d97f0bf3525
Bus Location:   00000000:0d:00.0
Architecture:   3.0
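Given that output, my plan is to select just the second 690 by UUID through the LXC nvidia hook; this is a sketch based on my reading of the hook (the hook path and its use of NVIDIA_VISIBLE_DEVICES are assumptions about my distro's LXC packaging, so please correct me if the hook is invoked differently):

```
# Select only device index 1 (the first GPU on the 690) by its UUID.
lxc.environment = NVIDIA_VISIBLE_DEVICES=GPU-215a709b-ab16-3406-e07f-358df47167ed
lxc.environment = NVIDIA_DRIVER_CAPABILITIES=compute,utility
lxc.hook.mount = /usr/share/lxc/hooks/nvidia
```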