Cannot get nvidia card working in container

I’m tearing my hair out trying to solve this! It was working fine last week before I wiped the machine and reinstalled.

OS: Ubuntu 18.04

nvidia-smi shows this:

mishac@host:~$ nvidia-smi
Sat Apr 20 10:44:56 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:01:00.0 Off |                  N/A |
| 45%   28C    P0    N/A /  75W |      0MiB /  1997MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Minimal example showing it not working:

mishac@host:~$ lxc launch ubuntu:18.04 testo
Creating testo
Starting testo
mishac@host:~$ lxc config device add testo gpu gpu
Device gpu added to testo
mishac@host:~$ lxc config set testo nvidia.runtime true
mishac@host:~$ lxc restart testo
mishac@host:~$ lxc exec testo bash
root@testo:~# nvidia-smi
No devices were found

/dev/dri on host:

mishac@host:~$ ls -la /dev/dri
total 0
drwxr-xr-x  3 root root       100 Apr 20 10:24 .
drwxr-xr-x 24 root root      6300 Apr 20 10:24 ..
drwxr-xr-x  2 root root        80 Apr 20 10:24 by-path
crw-rw----  1 root video 226,   0 Apr 20 10:24 card0
crw-rw----  1 root video 226, 128 Apr 20 10:24 renderD128

nvidia devices on host:

mishac@host:~$ ls -la /dev/nvid*
crw-rw-rw- 1 root root 195,   0 Apr 20 10:24 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Apr 20 10:24 /dev/nvidiactl
crw-rw-rw- 1 root root 236,   0 Apr 20 10:24 /dev/nvidia-uvm
crw-rw-rw- 1 root root 236,   1 Apr 20 10:24 /dev/nvidia-uvm-tools

in container:

root@testo:~# ls -la /dev/dri
total 2
drwxr-xr-x 2 root root       80 Apr 20 14:50 .
drwxr-xr-x 9 root root      580 Apr 20 14:50 ..
crw-rw---- 1 root root 226,   0 Apr 20 14:50 card0
crw-rw---- 1 root root 226, 128 Apr 20 14:50 renderD128

root@testo:~# ls -la /dev/nvidi*
crw-rw-rw- 1 nobody nogroup 236,   0 Apr 20 14:24 /dev/nvidia-uvm
crw-rw-rw- 1 nobody nogroup 236,   1 Apr 20 14:24 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root   root    195,   0 Apr 20 14:50 /dev/nvidia0
crw-rw-rw- 1 nobody nogroup 195, 255 Apr 20 14:24 /dev/nvidiactl

I’ve tried installing and uninstalling CUDA in the container and on the host, and installing, uninstalling, and reinstalling nvidia-container-runtime on the host. I’ve also tried several driver versions (390, 415, 418) and several kernel versions (currently the HWE kernel, 4.18, but the same happened on the stock 4.15 kernel).

Not sure where to go from here :frowning:

Did you ever figure this out? I’m having the same problem with Ubuntu 18.04 and LXD 3.15. I just cannot get the NVIDIA GPU to work inside the container; whenever I run nvidia-smi, it just says “No devices were found”.

Hi!

If you get

ubuntu@mycontainer:~$ nvidia-smi 
No devices were found

then most likely the container does not see the correct GPU device. That is, there is some issue with lxc config device add mycontainer gpu gpu.

For example, you may have two GPUs, and the command (as written) may pick the other GPU by default. In that case, you would need to specify the NVIDIA GPU. See how to specify the GPU id or PCI ID in LXD.
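
As a rough sketch of pinning the device to one card: the Bus-Id column of nvidia-smi uses an 8-digit PCI domain, while lspci -D prints a 4-digit one. The helper name bus_id_to_pci is hypothetical, and the pci= key in the commented commands is assumed from the LXD gpu device options, so check it against your LXD version.

```shell
#!/bin/sh
# Hypothetical helper: turn the Bus-Id printed by nvidia-smi
# (8-digit PCI domain, e.g. 00000000:01:00.0) into the lowercase
# 4-digit-domain form that lspci -D prints (0000:01:00.0).
bus_id_to_pci() {
    printf '%s\n' "$1" \
        | sed -E 's/^[0-9A-Fa-f]{4}([0-9A-Fa-f]{4}:)/\1/' \
        | tr 'A-F' 'a-f'
}

# Assumed usage on the host; "mycontainer" and the address are examples.
# lxc config device remove mycontainer gpu
# lxc config device add mycontainer gpu gpu pci="$(bus_id_to_pci 00000000:01:00.0)"
```

An address that already has a 4-digit domain passes through unchanged apart from lowercasing.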

To figure out whether the GPU device (lxc config device add mycontainer gpu gpu) was added successfully, see the file /var/snap/lxd/common/lxd/logs/mycontainer/lxc.conf. There should be a line specifically for NVIDIA (lxc.mount.entry = /var/snap/lxd/common/lxd/devices/mycontainer/unix.gpu.dev-nvidia0 dev/nvidia0 none bind,create=file 0 0).
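
A minimal sketch of that check, assuming the snap install path quoted above and a container named mycontainer. The helper name has_nvidia_mount is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given lxc.conf contains a bind-mount
# entry whose target is an NVIDIA GPU node such as dev/nvidia0.
has_nvidia_mount() {
    grep -Eq '^lxc\.mount\.entry *= *.*dev/nvidia[0-9]+ ' "$1"
}

# Path as quoted above for the snap package; adjust the container name.
conf=/var/snap/lxd/common/lxd/logs/mycontainer/lxc.conf
if [ -r "$conf" ] && has_nvidia_mount "$conf"; then
    echo "nvidia mount entry present"
else
    echo "nvidia mount entry missing (or $conf unreadable)"
fi
```

Note that a matching entry only shows that the device node was passed in. As the rest of this thread shows, nvidia-smi can still fail afterwards.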

Thanks for the reply. It seems to pass the correct GPU; I get the same error when I use the PCI ID. I also see the card in the container:

$ ll /dev/nvidia*
crw-rw-rw- 1 nobody nogroup 234,   0 Aug  6 12:00 /dev/nvidia-uvm
crw-rw-rw- 1 nobody nogroup 234,   1 Aug  6 12:00 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root   video   195,   0 Aug  6 12:05 /dev/nvidia0
crw-rw-rw- 1 nobody nogroup 195, 255 Aug  6 12:00 /dev/nvidiactl

$ ll /dev/dri/*
crw-rw---- 1 root video 226,   0 Aug  6 12:05 /dev/dri/card0
crw-rw---- 1 root video 226, 128 Aug  6 12:05 /dev/dri/renderD128

And I also see the entry in the lxc.conf file.

Can you install the strace package and run strace -f nvidia-smi inside the container?
The output may show what’s going on.

Here’s the output:

$ strace -f nvidia-smi
execve("/usr/bin/nvidia-smi", ["nvidia-smi"], 0x7ffd19eabfd8 /* 21 vars */) = 0
brk(NULL)                               = 0x8fe000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=23230, ...}) = 0
mmap(NULL, 23230, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f784f145000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000b\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=144976, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f784f143000
mmap(NULL, 2221184, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f784ed05000
mprotect(0x7f784ed1f000, 2093056, PROT_NONE) = 0
mmap(0x7f784ef1e000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19000) = 0x7f784ef1e000
mmap(0x7f784ef20000, 13440, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f784ef20000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14560, ...}) = 0
mmap(NULL, 2109712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f784eb01000
mprotect(0x7f784eb04000, 2093056, PROT_NONE) = 0
mmap(0x7f784ed03000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f784ed03000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f784e710000
mprotect(0x7f784e8f7000, 2097152, PROT_NONE) = 0
mmap(0x7f784eaf7000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f784eaf7000
mmap(0x7f784eafd000, 15072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f784eafd000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\"\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=31680, ...}) = 0
mmap(NULL, 2128864, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f784e508000
mprotect(0x7f784e50f000, 2093056, PROT_NONE) = 0
mmap(0x7f784e70e000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f784e70e000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f784f141000
arch_prctl(ARCH_SET_FS, 0x7f784f141b80) = 0
mprotect(0x7f784eaf7000, 16384, PROT_READ) = 0
mprotect(0x7f784ef1e000, 4096, PROT_READ) = 0
mprotect(0x7f784e70e000, 4096, PROT_READ) = 0
mprotect(0x7f784ed03000, 4096, PROT_READ) = 0
mprotect(0x7f784f14b000, 4096, PROT_READ) = 0
munmap(0x7f784f145000, 23230)           = 0
set_tid_address(0x7f784f141e50)         = 546
set_robust_list(0x7f784f141e60, 24)     = 0
rt_sigaction(SIGRTMIN, {sa_handler=0x7f784ed0acb0, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7f784ed17890}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {sa_handler=0x7f784ed0ad50, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO, sa_restorer=0x7f784ed17890}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
futex(0x7f784ed040c8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
brk(NULL)                               = 0x8fe000
brk(0x91f000)                           = 0x91f000
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=23230, ...}) = 0
mmap(NULL, 23230, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f784f145000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\355\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=1528376, ...}) = 0
mmap(NULL, 6352680, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f784def9000
mprotect(0x7f784e058000, 2093056, PROT_NONE) = 0
mmap(0x7f784e257000, 94208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15e000) = 0x7f784e257000
mmap(0x7f784e26e000, 2727720, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f784e26e000
close(3)                                = 0
munmap(0x7f784f145000, 23230)           = 0
getpid()                                = 546
openat(AT_FDCWD, "/proc/modules", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "binfmt_misc 24576 1 - Live 0x000"..., 1024) = 1024
read(3, "x0000000000000000\nkvm_intel 2375"..., 1024) = 1024
read(3, "000000000000000\nsnd_seq_midi 204"..., 1024) = 1024
read(3, "84 1 asus_wmi, Live 0x0000000000"..., 1024) = 1024
read(3, "0000000\nnf_log_common 16384 2 nf"..., 1024) = 1024
read(3, "00000000000\nnf_defrag_ipv4 16384"..., 1024) = 1024
read(3, " 77824 2 zfs,zcommon, Live 0x000"..., 1024) = 1024
close(3)                                = 0
openat(AT_FDCWD, "/proc/driver/nvidia/params", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=649, ...}) = 0
read(3, "Mobile: 4294967295\nResmanDebugLe"..., 4096) = 649
close(3)                                = 0
openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd2, 0x48), 0x7ffd61ce6ac0) = 0
openat(AT_FDCWD, "/sys/devices/system/memory/block_size_bytes", O_RDONLY) = 4
read(4, "8000000\n", 99)                = 8
close(4)                                = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd6, 0x8), 0x7ffd61ce6bc0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xca, 0x4), 0x7f784e506cc0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xc8, 0xa00), 0x7f784e506d20) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x20), 0x7ffd61ce6be0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd61ce6bc0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd61ce6bc0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd61ce6bc0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd61ce6bc0) = 0
openat(AT_FDCWD, "/proc/driver/nvidia/params", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=649, ...}) = 0
read(4, "Mobile: 4294967295\nResmanDebugLe"..., 4096) = 649
close(4)                                = 0
openat(AT_FDCWD, "/dev/nvidia0", O_RDWR) = -1 EACCES (Permission denied)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd61ce6bc0) = 0
getpid()                                = 546
fstat(1, {st_mode=S_IFREG|0644, st_size=7522, ...}) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd61ce9210) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x29, 0x10), 0x7ffd61ce9240) = 0
close(3)                                = 0
write(1, "No devices were found\n", 22No devices were found
) = 22
exit_group(6)                           = ?
+++ exited with 6 +++

What happens if you run it with sudo?

Same thing, No devices were found.

And do you get this in the strace as well?

Unfortunately I never did figure it out. I ended up passing the GPU to a KVM virtual machine instead, which actually did work.

Yes, same line when I try as root:

openat(AT_FDCWD, "/dev/nvidia0", O_RDWR) = -1 EACCES (Permission denied)

Here is the full strace as root:

execve("/usr/bin/nvidia-smi", ["nvidia-smi"], 0x7fffe172dda8 /* 21 vars */) = 0
brk(NULL)                               = 0x102e000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=23230, ...}) = 0
mmap(NULL, 23230, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f24d63be000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000b\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=144976, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f24d63bc000
mmap(NULL, 2221184, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f24d5f7e000
mprotect(0x7f24d5f98000, 2093056, PROT_NONE) = 0
mmap(0x7f24d6197000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19000) = 0x7f24d6197000
mmap(0x7f24d6199000, 13440, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f24d6199000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14560, ...}) = 0
mmap(NULL, 2109712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f24d5d7a000
mprotect(0x7f24d5d7d000, 2093056, PROT_NONE) = 0
mmap(0x7f24d5f7c000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f24d5f7c000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f24d5989000
mprotect(0x7f24d5b70000, 2097152, PROT_NONE) = 0
mmap(0x7f24d5d70000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f24d5d70000
mmap(0x7f24d5d76000, 15072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f24d5d76000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\"\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=31680, ...}) = 0
mmap(NULL, 2128864, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f24d5781000
mprotect(0x7f24d5788000, 2093056, PROT_NONE) = 0
mmap(0x7f24d5987000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f24d5987000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f24d63ba000
arch_prctl(ARCH_SET_FS, 0x7f24d63bab80) = 0
mprotect(0x7f24d5d70000, 16384, PROT_READ) = 0
mprotect(0x7f24d6197000, 4096, PROT_READ) = 0
mprotect(0x7f24d5987000, 4096, PROT_READ) = 0
mprotect(0x7f24d5f7c000, 4096, PROT_READ) = 0
mprotect(0x7f24d63c4000, 4096, PROT_READ) = 0
munmap(0x7f24d63be000, 23230)           = 0
set_tid_address(0x7f24d63bae50)         = 430
set_robust_list(0x7f24d63bae60, 24)     = 0
rt_sigaction(SIGRTMIN, {sa_handler=0x7f24d5f83cb0, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7f24d5f90890}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {sa_handler=0x7f24d5f83d50, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO, sa_restorer=0x7f24d5f90890}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
futex(0x7f24d5f7d0c8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
brk(NULL)                               = 0x102e000
brk(0x104f000)                          = 0x104f000
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=23230, ...}) = 0
mmap(NULL, 23230, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f24d63be000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\355\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=1528376, ...}) = 0
mmap(NULL, 6352680, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f24d5172000
mprotect(0x7f24d52d1000, 2093056, PROT_NONE) = 0
mmap(0x7f24d54d0000, 94208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15e000) = 0x7f24d54d0000
mmap(0x7f24d54e7000, 2727720, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f24d54e7000
close(3)                                = 0
munmap(0x7f24d63be000, 23230)           = 0
getpid()                                = 430
openat(AT_FDCWD, "/proc/modules", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "unix_diag 16384 0 - Live 0x00000"..., 1024) = 1024
read(3, "0000000000000000\nmac80211 815104"..., 1024) = 1024
read(3, ", Live 0x0000000000000000\nbtrtl "..., 1024) = 1024
read(3, "idia_uvm 798720 0 - Live 0x00000"..., 1024) = 1024
read(3, "0000000\nnf_log_common 16384 2 nf"..., 1024) = 1024
read(3, "ntrack_ftp, Live 0x0000000000000"..., 1024) = 1024
read(3, " 77824 2 zfs,zcommon, Live 0x000"..., 1024) = 1024
close(3)                                = 0
openat(AT_FDCWD, "/proc/driver/nvidia/params", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=649, ...}) = 0
read(3, "Mobile: 4294967295\nResmanDebugLe"..., 4096) = 649
close(3)                                = 0
openat(AT_FDCWD, "/dev/nvidiactl", O_RDWR) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd2, 0x48), 0x7ffd734dc400) = 0
openat(AT_FDCWD, "/sys/devices/system/memory/block_size_bytes", O_RDONLY) = 4
read(4, "8000000\n", 99)                = 8
close(4)                                = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd6, 0x8), 0x7ffd734dc500) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xca, 0x4), 0x7f24d577fcc0) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xc8, 0xa00), 0x7f24d577fd20) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2b, 0x20), 0x7ffd734dc520) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd734dc500) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd734dc500) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd734dc500) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd734dc500) = 0
openat(AT_FDCWD, "/proc/driver/nvidia/params", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=649, ...}) = 0
read(4, "Mobile: 4294967295\nResmanDebugLe"..., 4096) = 649
close(4)                                = 0
openat(AT_FDCWD, "/dev/nvidia0", O_RDWR) = -1 EACCES (Permission denied)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd734dc500) = 0
getpid()                                = 430
fstat(1, {st_mode=S_IFREG|0644, st_size=7524, ...}) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x2a, 0x20), 0x7ffd734deb50) = 0
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0x29, 0x10), 0x7ffd734deb80) = 0
close(3)                                = 0
write(1, "No devices were found\n", 22No devices were found
) = 22
exit_group(6)                           = ?
+++ exited with 6 +++
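
In both traces the only call that fails with a permission error is the openat on /dev/nvidia0. A small sketch for pulling such lines out of a long trace follows; the helper name denied_opens is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read strace output on stdin and print only the
# open/openat calls that failed with a permission error. The optional
# "[pid N] " prefix is what strace -f adds for child processes.
denied_opens() {
    grep -E '^(\[pid +[0-9]+\] )?open(at)?\(.*= -1 (EACCES|EPERM)'
}

# Usage inside the container (strace writes its trace to stderr):
#   strace -f nvidia-smi 2>&1 >/dev/null | denied_opens
```

The many ENOENT lines from the dynamic loader are filtered out, which leaves just the device nodes that could not be opened.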

Ah, we’ve seen that before on some systems where /dev/nvidiaX on the host isn’t 0666. Is that the case here?

The permissions on an Ubuntu system look like:

root@vm10:~# ls -lh /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Aug  8 15:38 /dev/nvidia0
crw-rw-rw- 1 root root 195,   1 Aug  8 15:38 /dev/nvidia1
crw-rw-rw- 1 root root 195, 255 Aug  8 15:38 /dev/nvidiactl
crw-rw-rw- 1 root root 242,   0 Aug  8 15:38 /dev/nvidia-uvm
crw-rw-rw- 1 root root 242,   1 Aug  8 15:38 /dev/nvidia-uvm-tools
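
A quick way to check this on the host, as a sketch: it assumes GNU stat (as shipped on Ubuntu), and the helper name not_world_rw is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every existing path given as an argument whose
# permission bits are not 666, followed by its actual mode.
not_world_rw() {
    for node in "$@"; do
        [ -e "$node" ] || continue
        mode=$(stat -c '%a' "$node")   # GNU stat: octal permission bits
        [ "$mode" = "666" ] || echo "$node $mode"
    done
}

# On the host: no output means every node found is already 0666.
not_world_rw /dev/nvidia*
```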

On my Ubuntu 18.04 host:

$ ll /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Aug  9 09:53 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Aug  9 09:53 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Aug  9 09:53 /dev/nvidia-modeset
crw-rw-rw- 1 root root 235,   0 Aug  9 09:53 /dev/nvidia-uvm
crw-rw-rw- 1 root root 235,   1 Aug  9 09:53 /dev/nvidia-uvm-tools

In the container:

$ ll /dev/nvidia*
crw-rw-rw- 1 nobody nogroup 235,   0 Aug  9 07:53 /dev/nvidia-uvm
crw-rw-rw- 1 nobody nogroup 235,   1 Aug  9 07:53 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root   video   195,   0 Aug  9 07:53 /dev/nvidia0
crw-rw-rw- 1 nobody nogroup 195, 255 Aug  9 07:53 /dev/nvidiactl

I don’t suppose anyone ever came up with a solution for this? It’s almost a year later, I’m on a newer kernel and LXD 3.21, and I’m still getting “No devices were found” when running nvidia-smi in the container.

I don’t have an answer as to why your install is failing, but I’m able to get containers to see both GPUs on the host. The hosts are provisioned with MAAS using the 18.04 image (if that makes a difference). During provisioning, the LXD package that ships with 18.04 is removed and the latest snap is downloaded and installed. Then lxd init is run with a preseed YAML file, and a profile is created for the container creation step. The following script is run to install CUDA.

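  #!/bin/bash
  #
  # Script to install CUDA on the host (needs root for apt and /etc).
  #
  # Pin the CUDA repository so its packages take priority.
  wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
  mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
  # -y keeps this step non-interactive, like the cuda install below.
  apt-get -y install gnupg-curl
  # Trust the repository signing key, then add the repository itself.
  apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
  add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
  apt-get update
  apt-get -y install cuda
  # Record the driver state once the install finishes.
  nvidia-smi > /home/ubuntu/nvidia-smi.out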

The host ends up looking like:

ubuntu@lxd110h00:~$ lxc version
Client version: 3.21
Server version: 3.21
ubuntu@lxd110h00:~$ which lxc
/snap/bin/lxc
ubuntu@lxd110h00:~$ uname -a
Linux lxd110h00 4.15.0-76-generic #86-Ubuntu SMP Fri Jan 17 17:24:28 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@lxd110h00:~$ cat /etc/rel
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.4 LTS"
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
ubuntu@lxd110h00:~$ nvidia-smi
Sun Feb 23 18:24:25 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:3B:00.0 Off |                  N/A |
|  0%   31C    P8    20W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  On   | 00000000:AF:00.0 Off |                  N/A |
|  0%   29C    P8    20W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
ubuntu@lxd110h00:~$ lsmod | grep nvid
nvidia_uvm 929792 0
nvidia_drm 45056 0
nvidia_modeset 1110016 3 nvidia_drm
nvidia 19890176 33 nvidia_uvm,nvidia_modeset
drm_kms_helper 172032 2 ast,nvidia_drm
ipmi_msghandler 53248 4 ipmi_devintf,ipmi_si,nvidia,ipmi_ssif
drm 401408 5 drm_kms_helper,ast,nvidia_drm,ttm
ubuntu@lxd110h00:~$ ps xau | grep nvid
root 582 0.0 0.0 0 0 ? S Feb15 0:00 [nvidia-modeset/]
root 583 0.0 0.0 0 0 ? S Feb15 0:00 [nvidia-modeset/]
root 1499 0.0 0.0 8872 1448 ? Ss Feb15 0:01 /usr/bin/nvidia-persistenced --verbose
root 1502 0.0 0.0 0 0 ? S Feb15 1:41 [irq/255-nvidia]
root 1503 0.0 0.0 0 0 ? S Feb15 0:00 [nvidia]
root 1616 0.0 0.0 0 0 ? S Feb15 2:10 [irq/280-nvidia]
root 1617 0.0 0.0 0 0 ? S Feb15 0:00 [nvidia]
ubuntu 10486 0.0 0.0 14856 1116 pts/0 S+ 18:49 0:00 grep --color=auto nvid

The container profile used to create an Ubuntu 16.04 container looks like:

ubuntu@lxd110h00:~$ lxc profile show corr
config:
  user.user-data: |
    #cloud-config
    package_upgrade: true
    packages:
      - emacs24-nox
      - gnupg-curl
      - ssh
    timezone: UTC
    runcmd:
      - [wget, "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-ubuntu1604.pin"]
      - [mv, cuda-ubuntu1604.pin, /etc/apt/preferences.d/cuda-repository-pin-600]
      - [apt-key, adv, --fetch-keys, "http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub"]
      - [add-apt-repository, "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/ /"]
      - [apt-get, update]
      - [apt-get, -y, install, cuda]
description: Correlator profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br2
    type: nic
  gpu:
    type: gpu
  root:
    path: /
    pool: local
    type: disk
name: corr
used_by:
- /1.0/instances/corr00

Note that the cloud-init commands pull down the CUDA packages for Ubuntu 16.04. Container creation:

lxc launch ubuntu:16.04 corr00 -p corr

Monitor cloud-init inside the container:

lxc exec corr00 bash
tail -f /var/log/cloud-init-output.log

When it completes (about 700 seconds on my system), run:

root@corr00:~# nvidia-smi
Sun Feb 23 17:21:25 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:3B:00.0 Off |                  N/A |
|  0%   31C    P8    20W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:AF:00.0 Off |                  N/A |
|  0%   29C    P8    20W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Hope this helps. It’s still WIP on my end.