friki67
(Friki67)
May 22, 2025, 1:06pm
1
Hello, after an OS update I'm running into this:
```
lxc ollama 20250522124246.594 ERROR utils - ../src/lxc/utils.c:run_buffer:571 - Script exited with status 1
lxc ollama 20250522124246.594 ERROR conf - ../src/lxc/conf.c:lxc_setup:3944 - Failed to run mount hooks
lxc ollama 20250522124246.594 ERROR start - ../src/lxc/start.c:do_start:1273 - Failed to setup container "ollama"
lxc ollama 20250522124246.594 ERROR sync - ../src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc ollama 20250522124246.600 WARN network - ../src/lxc/network.c:lxc_delete_network_priv:3674 - Failed to rename interface with index 0 from "eth0" to its initial name "veth3d95dde7"
lxc ollama 20250522124246.600 ERROR lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state "ABORTING" instead of "RUNNING"
lxc ollama 20250522124246.600 ERROR start - ../src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "ollama"
lxc ollama 20250522124246.600 WARN start - ../src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 80014
lxc 20250522124246.679 ERROR af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20250522124246.679 ERROR commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"
```
This is on one of my computers, running Rocky 9.5 with incus 6.12.
`nvidia-smi`:

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20 Driver Version: 570.133.20 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 2060 Off | 00000000:02:00.0 Off | N/A |
| 42% 32C P8 9W / 184W | 14MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 2060 Off | 00000000:03:00.0 Off | N/A |
| 29% 38C P8 9W / 184W | 10MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1793 G /usr/libexec/Xorg 11MiB |
+-----------------------------------------------------------------------------------------+
```
NVIDIA drivers and the container toolkit are installed; everything was working fine before the update.
nvidia-container-runtime is 3.14.0…
The lxc package is 6.0.3.
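For reference, the exact installed versions can be pulled straight from rpm (a generic query; adjust the pattern if your packages are named differently):

```
rpm -qa | grep -Ei 'nvidia-container|incus'
```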
I’ve seen some posts about similar issues but no solution seems to work.
What could be happening?
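In case it helps anyone hitting the same thing: the container log above should be retrievable with incus itself (assuming the default project and a container named ollama):

```
incus info ollama --show-log
```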
osch
May 23, 2025, 1:45am
2
Can you post your container config? (`incus config show <container>`)
You might need to add a few environment variables:

```
environment.NVIDIA_DRIVER_CAPABILITIES: compute,utility
environment.NVIDIA_VISIBLE_DEVICES: all
environment.OLLAMA_HOST: 0.0.0.0:11434
```
These are the ones I have in my ollama container to get it working, alongside `nvidia.runtime: true`.
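If it's easier, the same keys can be set from the host CLI; a sketch, assuming your container is also named ollama:

```
# nvidia.runtime is not applied live; stop the container first if it is running
incus config set ollama nvidia.runtime=true
incus config set ollama environment.NVIDIA_DRIVER_CAPABILITIES=compute,utility
incus config set ollama environment.NVIDIA_VISIBLE_DEVICES=all
incus config set ollama environment.OLLAMA_HOST=0.0.0.0:11434
incus start ollama
```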
friki67
(Friki67)
May 23, 2025, 6:52am
3
Hello! Thank you.
It is not a container config issue, it is an NVIDIA bug! nvidia-container-toolkit 1.17.7 is broken.
See:
GitHub issue, opened 22 May 2025, 09:09 UTC:
**Problem:**
All the containers that utilize the `nvidia-container-toolkit` docker runtime hang in *Created* state. Nothing suspicious in the `nvidia-container-toolkit`, but segfaults in `dmesg` output related to nvidia.
Downgrading to `libnvidia-container-1.17.6-1 nvidia-container-toolkit-1.17.6-1` solved the issue.
**Debugging information:**
archlinux kernel version: `6.14.6-arch1-1`
command used to verify: `docker run --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi`
nvidia-container-toolkit version: `1.17.7-1`
driver version: `Driver Version: 570.144`
CUDA version: `CUDA Version: 12.8`
Contents of `/var/log/nvidia-container-toolkit.log`:
```log
➜ cat nvidia-container-toolkit.log
-- WARNING, the following logs are for debugging purposes only --
I0522 08:51:48.493265 8342 nvc.c:396] initializing library context (version=1.17.7, build=1.17.7)
I0522 08:51:48.493402 8342 nvc.c:367] using root /
I0522 08:51:48.493417 8342 nvc.c:368] using ldcache /etc/ld.so.cache
I0522 08:51:48.493426 8342 nvc.c:369] using unprivileged user 65534:65534
I0522 08:51:48.493463 8342 nvc.c:413] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0522 08:51:48.493575 8342 nvc.c:415] dxcore initialization failed, continuing assuming a non-WSL environment
I0522 08:51:48.495141 8350 nvc.c:278] loading kernel module nvidia
I0522 08:51:48.495720 8350 nvc.c:282] running mknod for /dev/nvidiactl
I0522 08:51:48.495827 8350 nvc.c:286] running mknod for /dev/nvidia0
I0522 08:51:48.495899 8350 nvc.c:290] running mknod for all nvcaps in /dev/nvidia-caps
I0522 08:51:48.502778 8350 nvc.c:218] running mknod for /dev/nvidia-caps/nvidia-cap1 from /proc/driver/nvidia/capabilities/mig/config
I0522 08:51:48.502819 8350 nvc.c:218] running mknod for /dev/nvidia-caps/nvidia-cap2 from /proc/driver/nvidia/capabilities/mig/monitor
I0522 08:51:48.504274 8350 nvc.c:304] loading kernel module nvidia_uvm
I0522 08:51:48.504423 8350 nvc.c:308] running mknod for /dev/nvidia-uvm
I0522 08:51:48.504516 8350 nvc.c:313] loading kernel module nvidia_modeset
I0522 08:51:48.504661 8350 nvc.c:317] running mknod for /dev/nvidia-modeset
I0522 08:51:48.505512 8351 rpc.c:71] starting driver rpc service
I0522 08:51:48.519223 8353 rpc.c:71] starting nvcgo rpc service
I0522 08:51:48.520628 8342 nvc_container.c:244] configuring container with 'cuda-compat-mode=ldconfig compute utility supervised'
I0522 08:51:48.524221 8342 nvc_container.c:266] setting pid to 8336
I0522 08:51:48.524232 8342 nvc_container.c:267] setting rootfs to /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged
I0522 08:51:48.524237 8342 nvc_container.c:268] setting owner to 0:0
I0522 08:51:48.524242 8342 nvc_container.c:269] setting bins directory to /usr/bin
I0522 08:51:48.524247 8342 nvc_container.c:270] setting libs directory to /usr/lib/x86_64-linux-gnu
I0522 08:51:48.524251 8342 nvc_container.c:271] setting libs32 directory to /usr/lib/i386-linux-gnu
I0522 08:51:48.524256 8342 nvc_container.c:272] setting cudart directory to /usr/local/cuda
I0522 08:51:48.524260 8342 nvc_container.c:273] setting ldconfig to @/sbin/ldconfig (host relative)
I0522 08:51:48.524265 8342 nvc_container.c:274] setting mount namespace to /proc/8336/ns/mnt
I0522 08:51:48.524269 8342 nvc_container.c:276] detected cgroupv2
I0522 08:51:48.524274 8342 nvc_container.c:277] setting devices cgroup to /sys/fs/cgroup/system.slice/docker-3e20ad7cbf9644d5ca9de967a569218476b515ef2b6d3368885f7b89bcba44c6.scope
I0522 08:51:48.524285 8342 nvc_info.c:807] requesting driver information with ''
I0522 08:51:48.525028 8342 nvc_info.c:175] selecting /usr/lib/libnvoptix.so.570.144
I0522 08:51:48.525108 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-tls.so.570.144
I0522 08:51:48.525152 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-rtcore.so.570.144
I0522 08:51:48.525192 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-ptxjitcompiler.so.570.144
I0522 08:51:48.525230 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-pkcs11.so.570.144
I0522 08:51:48.525268 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-pkcs11-openssl3.so.570.144
I0522 08:51:48.525307 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-opticalflow.so.570.144
I0522 08:51:48.525348 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-opencl.so.570.144
I0522 08:51:48.525389 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-nvvm.so.570.144
I0522 08:51:48.525429 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-ngx.so.570.144
I0522 08:51:48.525474 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-ml.so.570.144
I0522 08:51:48.525512 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-gpucomp.so.570.144
I0522 08:51:48.525550 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-glvkspirv.so.570.144
I0522 08:51:48.525588 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-glsi.so.570.144
I0522 08:51:48.525642 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-glcore.so.570.144
I0522 08:51:48.525681 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-fbc.so.570.144
I0522 08:51:48.525721 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-encode.so.570.144
I0522 08:51:48.525760 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-eglcore.so.570.144
I0522 08:51:48.525807 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-cfg.so.570.144
I0522 08:51:48.525849 8342 nvc_info.c:175] selecting /usr/lib/libnvidia-allocator.so.570.144
I0522 08:51:48.525890 8342 nvc_info.c:175] selecting /usr/lib/libnvcuvid.so.570.144
I0522 08:51:48.526362 8342 nvc_info.c:175] selecting /usr/lib/libcudadebugger.so.570.144
I0522 08:51:48.526402 8342 nvc_info.c:175] selecting /usr/lib/libcuda.so.570.144
I0522 08:51:48.526949 8342 nvc_info.c:175] selecting /usr/lib/libGLX_nvidia.so.570.144
I0522 08:51:48.526995 8342 nvc_info.c:175] selecting /usr/lib/libGLESv2_nvidia.so.570.144
I0522 08:51:48.527037 8342 nvc_info.c:175] selecting /usr/lib/libGLESv1_CM_nvidia.so.570.144
I0522 08:51:48.527093 8342 nvc_info.c:175] selecting /usr/lib/libEGL_nvidia.so.570.144
W0522 08:51:48.527360 8342 nvc_info.c:411] missing library libnvidia-nscq.so
W0522 08:51:48.527366 8342 nvc_info.c:411] missing library libnvidia-fatbinaryloader.so
W0522 08:51:48.527371 8342 nvc_info.c:411] missing library libnvidia-compiler.so
W0522 08:51:48.527376 8342 nvc_info.c:411] missing library libvdpau_nvidia.so
W0522 08:51:48.527381 8342 nvc_info.c:411] missing library libnvidia-ifr.so
W0522 08:51:48.527385 8342 nvc_info.c:411] missing library libnvidia-cbl.so
W0522 08:51:48.527390 8342 nvc_info.c:415] missing compat32 library libnvidia-ml.so
W0522 08:51:48.527394 8342 nvc_info.c:415] missing compat32 library libnvidia-cfg.so
W0522 08:51:48.527399 8342 nvc_info.c:415] missing compat32 library libnvidia-nscq.so
W0522 08:51:48.527404 8342 nvc_info.c:415] missing compat32 library libcuda.so
W0522 08:51:48.527408 8342 nvc_info.c:415] missing compat32 library libcudadebugger.so
W0522 08:51:48.527413 8342 nvc_info.c:415] missing compat32 library libnvidia-opencl.so
W0522 08:51:48.527417 8342 nvc_info.c:415] missing compat32 library libnvidia-gpucomp.so
W0522 08:51:48.527422 8342 nvc_info.c:415] missing compat32 library libnvidia-ptxjitcompiler.so
W0522 08:51:48.527426 8342 nvc_info.c:415] missing compat32 library libnvidia-fatbinaryloader.so
W0522 08:51:48.527430 8342 nvc_info.c:415] missing compat32 library libnvidia-allocator.so
W0522 08:51:48.527435 8342 nvc_info.c:415] missing compat32 library libnvidia-compiler.so
W0522 08:51:48.527439 8342 nvc_info.c:415] missing compat32 library libnvidia-pkcs11.so
W0522 08:51:48.527444 8342 nvc_info.c:415] missing compat32 library libnvidia-pkcs11-openssl3.so
W0522 08:51:48.527448 8342 nvc_info.c:415] missing compat32 library libnvidia-nvvm.so
W0522 08:51:48.527453 8342 nvc_info.c:415] missing compat32 library libnvidia-ngx.so
W0522 08:51:48.527457 8342 nvc_info.c:415] missing compat32 library libvdpau_nvidia.so
W0522 08:51:48.527462 8342 nvc_info.c:415] missing compat32 library libnvidia-encode.so
W0522 08:51:48.527466 8342 nvc_info.c:415] missing compat32 library libnvidia-opticalflow.so
W0522 08:51:48.527471 8342 nvc_info.c:415] missing compat32 library libnvcuvid.so
W0522 08:51:48.527475 8342 nvc_info.c:415] missing compat32 library libnvidia-eglcore.so
W0522 08:51:48.527480 8342 nvc_info.c:415] missing compat32 library libnvidia-glcore.so
W0522 08:51:48.527484 8342 nvc_info.c:415] missing compat32 library libnvidia-tls.so
W0522 08:51:48.527489 8342 nvc_info.c:415] missing compat32 library libnvidia-glsi.so
W0522 08:51:48.527493 8342 nvc_info.c:415] missing compat32 library libnvidia-fbc.so
W0522 08:51:48.527498 8342 nvc_info.c:415] missing compat32 library libnvidia-ifr.so
W0522 08:51:48.527502 8342 nvc_info.c:415] missing compat32 library libnvidia-rtcore.so
W0522 08:51:48.527507 8342 nvc_info.c:415] missing compat32 library libnvoptix.so
W0522 08:51:48.527511 8342 nvc_info.c:415] missing compat32 library libGLX_nvidia.so
W0522 08:51:48.527521 8342 nvc_info.c:415] missing compat32 library libEGL_nvidia.so
W0522 08:51:48.527526 8342 nvc_info.c:415] missing compat32 library libGLESv2_nvidia.so
W0522 08:51:48.527530 8342 nvc_info.c:415] missing compat32 library libGLESv1_CM_nvidia.so
W0522 08:51:48.527535 8342 nvc_info.c:415] missing compat32 library libnvidia-glvkspirv.so
W0522 08:51:48.527539 8342 nvc_info.c:415] missing compat32 library libnvidia-cbl.so
I0522 08:51:48.527761 8342 nvc_info.c:301] selecting /usr/bin/nvidia-smi
I0522 08:51:48.527783 8342 nvc_info.c:301] selecting /usr/bin/nvidia-debugdump
I0522 08:51:48.527803 8342 nvc_info.c:301] selecting /usr/bin/nvidia-persistenced
I0522 08:51:48.527837 8342 nvc_info.c:301] selecting /usr/bin/nvidia-cuda-mps-control
I0522 08:51:48.527857 8342 nvc_info.c:301] selecting /usr/bin/nvidia-cuda-mps-server
W0522 08:51:48.527907 8342 nvc_info.c:437] missing binary nv-fabricmanager
I0522 08:51:48.527970 8342 nvc_info.c:497] listing firmware path /lib/firmware/nvidia/570.144/gsp_ga10x.bin
I0522 08:51:48.527976 8342 nvc_info.c:497] listing firmware path /lib/firmware/nvidia/570.144/gsp_tu10x.bin
I0522 08:51:48.528005 8342 nvc_info.c:570] listing device /dev/nvidiactl
I0522 08:51:48.528010 8342 nvc_info.c:570] listing device /dev/nvidia-uvm
I0522 08:51:48.528014 8342 nvc_info.c:570] listing device /dev/nvidia-uvm-tools
I0522 08:51:48.528019 8342 nvc_info.c:570] listing device /dev/nvidia-modeset
W0522 08:51:48.528048 8342 nvc_info.c:359] missing ipc path /var/run/nvidia-persistenced/socket
W0522 08:51:48.528077 8342 nvc_info.c:359] missing ipc path /var/run/nvidia-fabricmanager/socket
W0522 08:51:48.528096 8342 nvc_info.c:359] missing ipc path /tmp/nvidia-mps
I0522 08:51:48.528102 8342 nvc_info.c:863] requesting device information with ''
I0522 08:51:48.536803 8342 nvc_info.c:754] listing device /dev/nvidia0 (GPU-22989643-e2e9-8a6f-03cd-d58a1ef72651 at 00000000:01:00.0)
I0522 08:51:48.536869 8342 nvc_mount.c:369] mounting tmpfs at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/proc/driver/nvidia
W0522 08:51:48.537153 8342 utils.c:547] The path /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/bin already exists with the required mode; skipping create
I0522 08:51:48.537431 8342 nvc_mount.c:89] mounting /usr/bin/nvidia-smi at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/bin/nvidia-smi with flags 0x7
I0522 08:51:48.537517 8342 nvc_mount.c:89] mounting /usr/bin/nvidia-debugdump at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/bin/nvidia-debugdump with flags 0x7
I0522 08:51:48.537591 8342 nvc_mount.c:89] mounting /usr/bin/nvidia-persistenced at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/bin/nvidia-persistenced with flags 0x7
I0522 08:51:48.537678 8342 nvc_mount.c:89] mounting /usr/bin/nvidia-cuda-mps-control at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/bin/nvidia-cuda-mps-control with flags 0x7
I0522 08:51:48.537749 8342 nvc_mount.c:89] mounting /usr/bin/nvidia-cuda-mps-server at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/bin/nvidia-cuda-mps-server with flags 0x7
W0522 08:51:48.537794 8342 utils.c:547] The path /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/lib/x86_64-linux-gnu already exists with the required mode; skipping create
I0522 08:51:48.537958 8342 nvc_mount.c:89] mounting /usr/lib/libnvidia-ml.so.570.144 at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.570.144 with flags 0x7
I0522 08:51:48.538036 8342 nvc_mount.c:89] mounting /usr/lib/libnvidia-cfg.so.570.144 at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.570.144 with flags 0x7
I0522 08:51:48.538126 8342 nvc_mount.c:89] mounting /usr/lib/libcuda.so.570.144 at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/lib/x86_64-linux-gnu/libcuda.so.570.144 with flags 0x7
I0522 08:51:48.538205 8342 nvc_mount.c:89] mounting /usr/lib/libcudadebugger.so.570.144 at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/lib/x86_64-linux-gnu/libcudadebugger.so.570.144 with flags 0x7
I0522 08:51:48.538286 8342 nvc_mount.c:89] mounting /usr/lib/libnvidia-opencl.so.570.144 at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.570.144 with flags 0x7
I0522 08:51:48.538364 8342 nvc_mount.c:89] mounting /usr/lib/libnvidia-gpucomp.so.570.144 at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/lib/x86_64-linux-gnu/libnvidia-gpucomp.so.570.144 with flags 0x7
I0522 08:51:48.538440 8342 nvc_mount.c:89] mounting /usr/lib/libnvidia-ptxjitcompiler.so.570.144 at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.570.144 with flags 0x7
I0522 08:51:48.538514 8342 nvc_mount.c:89] mounting /usr/lib/libnvidia-allocator.so.570.144 at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.570.144 with flags 0x7
I0522 08:51:48.538597 8342 nvc_mount.c:89] mounting /usr/lib/libnvidia-pkcs11.so.570.144 at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/lib/x86_64-linux-gnu/libnvidia-pkcs11.so.570.144 with flags 0x7
I0522 08:51:48.538672 8342 nvc_mount.c:89] mounting /usr/lib/libnvidia-pkcs11-openssl3.so.570.144 at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.570.144 with flags 0x7
I0522 08:51:48.538747 8342 nvc_mount.c:89] mounting /usr/lib/libnvidia-nvvm.so.570.144 at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.570.144 with flags 0x7
I0522 08:51:48.538775 8342 nvc_mount.c:530] creating symlink /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.1
I0522 08:51:48.539022 8342 nvc_mount.c:89] mounting /usr/lib/firmware/nvidia/570.144/gsp_ga10x.bin at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/lib/firmware/nvidia/570.144/gsp_ga10x.bin with flags 0x7
I0522 08:51:48.539135 8342 nvc_mount.c:89] mounting /usr/lib/firmware/nvidia/570.144/gsp_tu10x.bin at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/usr/lib/firmware/nvidia/570.144/gsp_tu10x.bin with flags 0x7
I0522 08:51:48.539200 8342 nvc_mount.c:233] mounting /dev/nvidiactl at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/dev/nvidiactl
I0522 08:51:48.540462 8342 nvc_mount.c:233] mounting /dev/nvidia-uvm at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/dev/nvidia-uvm
I0522 08:51:48.541317 8342 nvc_mount.c:233] mounting /dev/nvidia-uvm-tools at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/dev/nvidia-uvm-tools
I0522 08:51:48.542168 8342 nvc_mount.c:233] mounting /dev/nvidia0 at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/dev/nvidia0
I0522 08:51:48.542266 8342 nvc_mount.c:443] mounting /proc/driver/nvidia/gpus/0000:01:00.0 at /var/lib/docker/overlay2/2b152effaf9adbe3a438dc4adf63d33b5ea5f6b2fe3ddce85cdd5cc294e9bdd6/merged/proc/driver/nvidia/gpus/0000:01:00.0
```
Output of `sudo dmesg | grep -i nvidia`:
```
[ 3.558564] nvidia: loading out-of-tree module taints kernel.
[ 3.558569] nvidia: module license 'NVIDIA' taints kernel.
[ 3.558570] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 3.558571] nvidia: module license taints kernel.
[ 3.667113] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[ 3.668274] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 3.714578] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 570.144 Thu Apr 10 20:33:29 UTC 2025
[ 3.754669] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 570.144 Thu Apr 10 20:03:03 UTC 2025
[ 3.755221] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[ 3.758064] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 3.834344] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card2/input18
[ 3.834369] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card2/input19
[ 3.834388] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card2/input20
[ 3.834407] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card2/input21
[ 4.793164] [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 1
[ 4.851085] nvidia 0000:01:00.0: vgaarb: deactivate vga console
[ 4.888339] fbcon: nvidia-drmdrmfb (fb0) is primary device
[ 5.016722] nvidia 0000:01:00.0: [drm] fb0: nvidia-drmdrmfb frame buffer device
[ 182.001204] nvidia-containe[5657]: segfault at 0 ip 000070190de67398 sp 00007ffc29492650 error 4 in libnvidia-container.so.1.17.7[a398,70190de61000+15000] likely on CPU 6 (core 12, socket 0)
[ 361.047016] nvidia-containe[8342]: segfault at 0 ip 000075e350ef2398 sp 00007ffc72d367b0 error 4 in libnvidia-container.so.1.17.7[a398,75e350eec000+15000] likely on CPU 6 (core 12, socket 0)
[ 495.751434] nvidia-containe[9191]: segfault at 0 ip 00007c4daea3a398 sp 00007fff973468e0 error 4 in libnvidia-container.so.1.17.7[a398,7c4daea34000+15000] likely on CPU 6 (core 12, socket 0)
[ 598.239672] nvidia-containe[10665]: segfault at 0 ip 00007238e95be398 sp 00007ffe85c1f660 error 4 in libnvidia-container.so.1.17.7[a398,7238e95b8000+15000] likely on CPU 28 (core 44, socket 0)
[ 680.944921] nvidia-containe[11729]: segfault at 0 ip 00007c6caf51f398 sp 00007ffea1dee790 error 4 in libnvidia-container.so.1.17.7[a398,7c6caf519000+15000] likely on CPU 10 (core 20, socket 0)
```
GitHub issue, opened 19 May 2025, 11:41 UTC:
As reported in this enroot issue https://github.com/NVIDIA/enroot/issues/232, the 1.17.7-1 version of the container toolkit breaks the use of enroot containers with slurm. Downgrading to a previous toolkit version (1.17.6-1) resolves the issue.
So you have to revert to 1.17.6, or wait for the next release.
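On Rocky/RHEL the downgrade can be done with dnf; a sketch, assuming the packages come from NVIDIA's libnvidia-container repository (package names and the exact 1.17.6 release tag may differ on your system):

```
# Roll the toolkit and its library back to the last known-good release
sudo dnf downgrade nvidia-container-toolkit-1.17.6-1 \
                   nvidia-container-toolkit-base-1.17.6-1 \
                   libnvidia-container1-1.17.6-1 \
                   libnvidia-container-tools-1.17.6-1

# Optionally pin them so a routine 'dnf update' doesn't pull 1.17.7 back in
sudo dnf install 'dnf-command(versionlock)'
sudo dnf versionlock add nvidia-container-toolkit libnvidia-container1
```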
Best regards
sieveLau
(Sieve Lau)
May 25, 2025, 3:51pm
4
You are my hero! I've spent a whole day trying to downgrade this and that, but nothing worked until I saw your post. Thanks!
friki67
(Friki67)
May 27, 2025, 8:22am
5
I’m glad to hear it. Good luck!