Cannot get nvidia card working in container

I’m tearing my hair out trying to solve this! It was working fine last week before I wiped the machine and reinstalled.

OS: ubuntu 18.04

nvidia-smi shows this:

mishac@host:~$ nvidia-smi
Sat Apr 20 10:44:56 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:01:00.0 Off |                  N/A |
| 45%   28C    P0    N/A /  75W |      0MiB /  1997MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Minimal example showing it not working:

mishac@host:~$ lxc launch ubuntu:18.04 testo
Creating testo
Starting testo
mishac@host:~$ lxc config device add testo gpu gpu
Device gpu added to testo
mishac@host:~$ lxc config set testo nvidia.runtime true
mishac@host:~$ lxc restart testo
mishac@host:~$ lxc exec testo bash
root@testo:~# nvidia-smi
No devices were found

/dev/dri on host:

mishac@host:~$ ls -la /dev/dri
total 0
drwxr-xr-x  3 root root       100 Apr 20 10:24 .
drwxr-xr-x 24 root root      6300 Apr 20 10:24 ..
drwxr-xr-x  2 root root        80 Apr 20 10:24 by-path
crw-rw----  1 root video 226,   0 Apr 20 10:24 card0
crw-rw----  1 root video 226, 128 Apr 20 10:24 renderD128

nvidia devices on host:

mishac@host:~$ ls -la /dev/nvid*
crw-rw-rw- 1 root root 195,   0 Apr 20 10:24 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Apr 20 10:24 /dev/nvidiactl
crw-rw-rw- 1 root root 236,   0 Apr 20 10:24 /dev/nvidia-uvm
crw-rw-rw- 1 root root 236,   1 Apr 20 10:24 /dev/nvidia-uvm-tools

in container:

root@testo:~# ls -la /dev/dri
total 2
drwxr-xr-x 2 root root       80 Apr 20 14:50 .
drwxr-xr-x 9 root root      580 Apr 20 14:50 ..
crw-rw---- 1 root root 226,   0 Apr 20 14:50 card0
crw-rw---- 1 root root 226, 128 Apr 20 14:50 renderD128

root@testo:~# ls -la /dev/nvidi*
crw-rw-rw- 1 nobody nogroup 236,   0 Apr 20 14:24 /dev/nvidia-uvm
crw-rw-rw- 1 nobody nogroup 236,   1 Apr 20 14:24 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root   root    195,   0 Apr 20 14:50 /dev/nvidia0
crw-rw-rw- 1 nobody nogroup 195, 255 Apr 20 14:24 /dev/nvidiactl

I’ve tried installing and uninstalling CUDA both in the container and on the host, and installing, uninstalling, and reinstalling nvidia-container-runtime on the host. I’ve also tried several driver versions (390, 415, 418) and several kernel versions (currently the HWE kernel, 4.18, but the same happened on the stock kernel, 4.15).

Not sure where to go from here.

Did you ever figure this out? I’m having the same problem on Ubuntu 18.04 with LXD 3.15. I just cannot get the NVIDIA GPU to work inside the container: whenever I run nvidia-smi, it just says "No devices were found".

Hi!

If you get

ubuntu@mycontainer:~$ nvidia-smi 
No devices were found

then most likely the container does not see the correct GPU device. That is, there is some issue with lxc config device add mycontainer gpu gpu.

For example, you may have two GPUs, and the command (as it is) may pick the other GPU by default. In that case, you would need to specify the NVIDIA GPU explicitly. See how to specify the GPU id or PCI ID in LXD.
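As a rough sketch of selecting one card by PCI address: the LXD gpu device type accepts a pci= key, and nvidia-smi prints the bus ID with an 8-digit domain (00000000:01:00.0 in the output at the top of this thread), whereas PCI addresses are usually written with a 4-digit domain. The helper below is hypothetical (smi_bus_to_pci is a name made up for this example), and it assumes the upper four domain digits are zeros; any other value is left untouched.

```shell
# Hypothetical helper: convert an nvidia-smi bus ID to the shorter
# 4-digit-domain form, lowercased.
# Assumption: the upper four hex digits of the domain are zeros;
# if they are not, the value is passed through unchanged.
smi_bus_to_pci() {
    printf '%s\n' "$1" \
        | sed -E 's/^0000([0-9A-Fa-f]{4}:)/\1/' \
        | tr 'A-F' 'a-f'
}

# Example (on the host, not executed here):
#   nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader
#   lxc config device add mycontainer gpu gpu pci="$(smi_bus_to_pci 00000000:01:00.0)"
```

If the container already has a device named gpu, remove it first with lxc config device remove mycontainer gpu before adding it again with the pci= key.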

To figure out whether the GPU device (lxc config device add mycontainer gpu gpu) was added successfully, check the file /var/snap/lxd/common/lxd/logs/mycontainer/lxc.conf. There should be a line specifically for the NVIDIA device (lxc.mount.entry = /var/snap/lxd/common/lxd/devices/mycontainer/unix.gpu.dev-nvidia0 dev/nvidia0 none bind,create=file 0 0).
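One way to do that check without reading the whole file is to grep for mount entries whose target is an NVIDIA device node. This is only a sketch: nvidia_mount_entries is a made-up function name, and the path in the usage comment assumes the snap install of LXD mentioned above.

```shell
# Hypothetical helper: print the lxc.mount.entry lines in a generated
# lxc.conf whose target inside the container is dev/nvidia<something>.
# Returns non-zero (grep's exit status) when no such line exists.
nvidia_mount_entries() {
    conf_file="$1"
    grep -E '^lxc\.mount\.entry *=.* dev/nvidia[^ ]* ' "$conf_file"
}

# Example (on the host, not executed here):
#   nvidia_mount_entries /var/snap/lxd/common/lxd/logs/mycontainer/lxc.conf
```

No output means the device was never bind-mounted into the container, which would point at the lxc config device add step rather than at the driver.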

Thanks for the reply. It seems to pass the correct GPU; I get the same error when I use the PCI ID. I also see the card in the container:

$ ll /dev/nvidia*
crw-rw-rw- 1 nobody nogroup 234,   0 Aug  6 12:00 /dev/nvidia-uvm
crw-rw-rw- 1 nobody nogroup 234,   1 Aug  6 12:00 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root   video   195,   0 Aug  6 12:05 /dev/nvidia0
crw-rw-rw- 1 nobody nogroup 195, 255 Aug  6 12:00 /dev/nvidiactl

$ ll /dev/dri/*
crw-rw---- 1 root video 226,   0 Aug  6 12:05 /dev/dri/card0
crw-rw---- 1 root video 226, 128 Aug  6 12:05 /dev/dri/renderD128

And I also see the entry in the lxc.conf file.

Can you install the strace package and run strace -f nvidia-smi inside the container?
The output may show what’s going on.
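The full strace output is long, so it can help to keep only the failed system calls that touch the NVIDIA device nodes. A minimal sketch, assuming strace's default output format (failed calls end in "= -1 ERRNAME (...)"); failed_nvidia_calls is a name invented for this example.

```shell
# Hypothetical filter: read strace output on stdin and keep only the
# failed calls (return value -1) that reference a /dev/nvidia* path.
failed_nvidia_calls() {
    grep -E '"/dev/nvidia[^"]*".* = -1 E[A-Z]+'
}

# Example (inside the container, not executed here).
# strace writes its trace to stderr, hence the 2>&1:
#   strace -f nvidia-smi 2>&1 | failed_nvidia_calls
```

An EACCES or EPERM on one of the device nodes narrows the problem down to permissions rather than a missing device.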

Here’s the output:

$ strace -f nvidia-smi
execve("/usr/bin/nvidia-smi", ["nvidia-smi"], 0x7ffd19eabfd8 /* 21 vars */) = 0
brk(NULL)                               = 0x8fe000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=23230, ...}) = 0
mmap(NULL, 23230, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f784f145000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000b\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=144976, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f784f143000
mmap(NULL, 2221184, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f784ed05000
mprotect(0x7f784ed1f000, 2093056, PROT_NONE) = 0
mmap(0x7f784ef1e000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19000) = 0x7f784ef1e000
mmap(0x7f784ef20000, 13440, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f784ef20000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14560, ...}) = 0
mmap(NULL, 2109712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f784eb01000
mprotect(0x7f784eb04000, 2093056, PROT_NONE) = 0
mmap(0x7f784ed03000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f784ed03000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f784e710000
mprotect(0x7f784e8f7000, 2097152, PROT_NONE) = 0
mmap(0x7f784eaf7000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f784eaf7000
mmap(0x7f784eafd000, 15072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f784eafd000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\"\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=31680, ...}) = 0
mmap(NULL, 2128864, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f784e508000
mprotect(0x7f784e50f000, 2093056, PROT_NONE) = 0
mmap(0x7f784e70e000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f784e70e000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f784f141000
arch_prctl(ARCH_SET_FS, 0x7f784f141b80) = 0
mprotect(0x7f784eaf7000, 16384, PROT_READ) = 0
mprotect(0x7f784ef1e000, 4096, PROT_READ) = 0
mprotect(0x7f784e70e000, 4096, PROT_READ) = 0
mprotect(0x7f784ed03000, 4096, PROT_READ) = 0
mprotect(0x7f784f14b000, 4096, PROT_READ) = 0
munmap(0x7f784f145000, 23230)           = 0
set_tid_address(0x7f784f141e50)         = 546
set_robust_list(0x7f784f141e60, 24)     = 0
rt_sigaction(SIGRTMIN, {sa_handler=0x7f784ed0acb0, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7f784ed17890}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {sa_handler=0x7f784ed0ad50, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO, sa_restorer=0x7f784ed17890}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
futex(0x7f784ed040c8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
brk(NULL)                               = 0x8fe000
brk(0x91f000)                           = 0x91f000
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=23230, ...}) = 0
mmap(NULL, 23230, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f784f145000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\355\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=1528376, ...}) = 0
mmap(NULL, 6352680, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f784def9000
mprotect(0x7f784e058000, 2093056, PROT_NONE) = 0
mmap(0x7f784e257000, 94208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15e000) = 0x7f784e257000
mmap(0x7f784e26e000, 2727720, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f784e26e000
close(3)                                = 0
munmap(0x7f784f145000, 23230)           = 0
getpid()                                = 546
openat(AT_FDCWD, "/proc/modules", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "binfmt_misc 24576 1 - Live 0x000"..., 1024) = 1024
read(3, "x0000000000000000\nkvm_intel 2375"..., 1024) = 1024
read(3, "000000000000000\nsnd_seq_midi 204"..., 1024) = 1024
read(3, "84 1 asus_wmi, Live 0x0000000000"..., 1024) = 1024
read(3, "0000000\nnf_log_common 16384 2 nf"..., 1024) = 1024
read(3, "00000000000\nnf_defrag_ipv4 16384"..., 1024) = 1024
read(3, " 77824 2 zfs,zcommon, Live 0x000"..., 1024) = 1024
close(3)                                = 0
openat(AT_FDCWD, "/proc/driver/nvidia/params", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=649, ...}) = 0
read(3, "Mobile: 4294967295\nResmanDebugLe"..., 4096) = 649
close(3)                                = 0
openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd2, 0x48), 0x7ffd61ce6ac0) = 0
openat(AT_FDCWD, "/sys/devices/system/memory/block_size_bytes", O_RDONLY) = 4
read(4, "8000000\n", 99)                = 8
close(4)                                = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd6, 0x8), 0x7ffd61ce6bc0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xca, 0x4), 0x7f784e506cc0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xc8, 0xa00), 0x7f784e506d20) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x20), 0x7ffd61ce6be0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd61ce6bc0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd61ce6bc0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd61ce6bc0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd61ce6bc0) = 0
openat(AT_FDCWD, "/proc/driver/nvidia/params", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=649, ...}) = 0
read(4, "Mobile: 4294967295\nResmanDebugLe"..., 4096) = 649
close(4)                                = 0
openat(AT_FDCWD, "/dev/nvidia0", O_RDWR) = -1 EACCES (Permission denied)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd61ce6bc0) = 0
getpid()                                = 546
fstat(1, {st_mode=S_IFREG|0644, st_size=7522, ...}) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd61ce9210) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x29, 0x10), 0x7ffd61ce9240) = 0
close(3)                                = 0
write(1, "No devices were found\n", 22No devices were found
) = 22
exit_group(6)                           = ?
+++ exited with 6 +++

What happens if you run it with sudo?

Same thing, No devices were found.

And do you get this in the strace as well?

Unfortunately I never did figure it out. I ended up passing the GPU to a KVM virtual machine instead, which actually did work.

Yes, same line when I try as root:

openat(AT_FDCWD, "/dev/nvidia0", O_RDWR) = -1 EACCES (Permission denied)

Here is the full strace as root:

execve("/usr/bin/nvidia-smi", ["nvidia-smi"], 0x7fffe172dda8 /* 21 vars */) = 0
brk(NULL)                               = 0x102e000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=23230, ...}) = 0
mmap(NULL, 23230, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f24d63be000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000b\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=144976, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f24d63bc000
mmap(NULL, 2221184, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f24d5f7e000
mprotect(0x7f24d5f98000, 2093056, PROT_NONE) = 0
mmap(0x7f24d6197000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19000) = 0x7f24d6197000
mmap(0x7f24d6199000, 13440, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f24d6199000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14560, ...}) = 0
mmap(NULL, 2109712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f24d5d7a000
mprotect(0x7f24d5d7d000, 2093056, PROT_NONE) = 0
mmap(0x7f24d5f7c000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f24d5f7c000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f24d5989000
mprotect(0x7f24d5b70000, 2097152, PROT_NONE) = 0
mmap(0x7f24d5d70000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f24d5d70000
mmap(0x7f24d5d76000, 15072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f24d5d76000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\"\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=31680, ...}) = 0
mmap(NULL, 2128864, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f24d5781000
mprotect(0x7f24d5788000, 2093056, PROT_NONE) = 0
mmap(0x7f24d5987000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f24d5987000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f24d63ba000
arch_prctl(ARCH_SET_FS, 0x7f24d63bab80) = 0
mprotect(0x7f24d5d70000, 16384, PROT_READ) = 0
mprotect(0x7f24d6197000, 4096, PROT_READ) = 0
mprotect(0x7f24d5987000, 4096, PROT_READ) = 0
mprotect(0x7f24d5f7c000, 4096, PROT_READ) = 0
mprotect(0x7f24d63c4000, 4096, PROT_READ) = 0
munmap(0x7f24d63be000, 23230)           = 0
set_tid_address(0x7f24d63bae50)         = 430
set_robust_list(0x7f24d63bae60, 24)     = 0
rt_sigaction(SIGRTMIN, {sa_handler=0x7f24d5f83cb0, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7f24d5f90890}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {sa_handler=0x7f24d5f83d50, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO, sa_restorer=0x7f24d5f90890}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
futex(0x7f24d5f7d0c8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
brk(NULL)                               = 0x102e000
brk(0x104f000)                          = 0x104f000
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=23230, ...}) = 0
mmap(NULL, 23230, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f24d63be000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\355\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=1528376, ...}) = 0
mmap(NULL, 6352680, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f24d5172000
mprotect(0x7f24d52d1000, 2093056, PROT_NONE) = 0
mmap(0x7f24d54d0000, 94208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15e000) = 0x7f24d54d0000
mmap(0x7f24d54e7000, 2727720, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f24d54e7000
close(3)                                = 0
munmap(0x7f24d63be000, 23230)           = 0
getpid()                                = 430
openat(AT_FDCWD, "/proc/modules", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "unix_diag 16384 0 - Live 0x00000"..., 1024) = 1024
read(3, "0000000000000000\nmac80211 815104"..., 1024) = 1024
read(3, ", Live 0x0000000000000000\nbtrtl "..., 1024) = 1024
read(3, "idia_uvm 798720 0 - Live 0x00000"..., 1024) = 1024
read(3, "0000000\nnf_log_common 16384 2 nf"..., 1024) = 1024
read(3, "ntrack_ftp, Live 0x0000000000000"..., 1024) = 1024
read(3, " 77824 2 zfs,zcommon, Live 0x000"..., 1024) = 1024
close(3)                                = 0
openat(AT_FDCWD, "/proc/driver/nvidia/params", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=649, ...}) = 0
read(3, "Mobile: 4294967295\nResmanDebugLe"..., 4096) = 649
close(3)                                = 0
openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd2, 0x48), 0x7ffd734dc400) = 0
openat(AT_FDCWD, "/sys/devices/system/memory/block_size_bytes", O_RDONLY) = 4
read(4, "8000000\n", 99)                = 8
close(4)                                = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd6, 0x8), 0x7ffd734dc500) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xca, 0x4), 0x7f24d577fcc0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xc8, 0xa00), 0x7f24d577fd20) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x20), 0x7ffd734dc520) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd734dc500) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd734dc500) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd734dc500) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd734dc500) = 0
openat(AT_FDCWD, "/proc/driver/nvidia/params", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=649, ...}) = 0
read(4, "Mobile: 4294967295\nResmanDebugLe"..., 4096) = 649
close(4)                                = 0
openat(AT_FDCWD, "/dev/nvidia0", O_RDWR) = -1 EACCES (Permission denied)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd734dc500) = 0
getpid()                                = 430
fstat(1, {st_mode=S_IFREG|0644, st_size=7524, ...}) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd734deb50) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x29, 0x10), 0x7ffd734deb80) = 0
close(3)                                = 0
write(1, "No devices were found\n", 22No devices were found
) = 22
exit_group(6)                           = ?
+++ exited with 6 +++

Ah, we’ve seen that before on some systems where /dev/nvidiaX on the host isn’t 0666. Is that the case here?

The permissions on an Ubuntu system look like:

root@vm10:~# ls -lh /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Aug  8 15:38 /dev/nvidia0
crw-rw-rw- 1 root root 195,   1 Aug  8 15:38 /dev/nvidia1
crw-rw-rw- 1 root root 195, 255 Aug  8 15:38 /dev/nvidiactl
crw-rw-rw- 1 root root 242,   0 Aug  8 15:38 /dev/nvidia-uvm
crw-rw-rw- 1 root root 242,   1 Aug  8 15:38 /dev/nvidia-uvm-tools
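To check this on the host without eyeballing the listing, something like the following should work. It is a sketch only: non_0666_nodes is a made-up name, and it relies on GNU stat's -c option as shipped with Ubuntu.

```shell
# Hypothetical helper: for each path given, print "<mode> <path>" when
# the permission bits are anything other than 666.
# Paths that do not exist are skipped silently.
non_0666_nodes() {
    for node in "$@"; do
        [ -e "$node" ] || continue
        mode=$(stat -c '%a' "$node")   # GNU stat: octal permission bits
        [ "$mode" = "666" ] || printf '%s %s\n' "$mode" "$node"
    done
}

# Example (on the host, not executed here):
#   non_0666_nodes /dev/nvidia*
```

No output means every node passed in is already 0666.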

On my Ubuntu 18.04 host:

$ ll /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Aug  9 09:53 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Aug  9 09:53 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Aug  9 09:53 /dev/nvidia-modeset
crw-rw-rw- 1 root root 235,   0 Aug  9 09:53 /dev/nvidia-uvm
crw-rw-rw- 1 root root 235,   1 Aug  9 09:53 /dev/nvidia-uvm-tools

In the container:

$ ll /dev/nvidia*
crw-rw-rw- 1 nobody nogroup 235,   0 Aug  9 07:53 /dev/nvidia-uvm
crw-rw-rw- 1 nobody nogroup 235,   1 Aug  9 07:53 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root   video   195,   0 Aug  9 07:53 /dev/nvidia0
crw-rw-rw- 1 nobody nogroup 195, 255 Aug  9 07:53 /dev/nvidiactl