Now we're getting to the reason why I'm so confused here.
My system has two GPUs installed…
And LXD sees both of them…
$ lxc info --resources
...
GPUs:
  Card 0:
    NUMA node: 0
    Vendor: Intel Corporation (8086)
    Product: HD Graphics 530 (191b)
    PCI address: 0000:00:02.0
    Driver: i915 (4.15.0-70-generic)
    DRM:
      ID: 0
      Card: card0 (226:0)
      Control: controlD64 (226:0)
      Render: renderD128 (226:128)
  Card 1:
    NUMA node: 0
    Vendor: NVIDIA Corporation (10de)
    Product: GM107GLM [Quadro M1000M] (13b1)
    PCI address: 0000:01:00.0
    Driver: nvidia (418.87.01)
    DRM:
      ID: 1
      Card: card1 (226:1)
      Render: renderD129 (226:129)
    NVIDIA information:
      Architecture: 5.0
      Brand: Quadro
      Model: Quadro M1000M
      CUDA Version: 10.1
      NVRM Version: 418.87.01
      UUID: GPU-1899bb61-a5bf-8db5-d80a-63f55ac1bfc1
...
This is why inside my Nvidia profile I specify the card via PCI address.
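For context, pinning a profile's GPU device to a PCI address can be done roughly like this (a sketch; the device name gpu is just an example, and pci is the standard LXD gpu device property):

$ lxc profile device add nvidia gpu gpu pci=0000:01:00.0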
However, on my host system, I only see /tmp/.X11-unix/X0…
$ ls -alF /tmp/.X11-unix/
total 16
drwxrwxrwt 2 root root 4096 Nov 24 09:59 ./
drwxrwxrwt 16 root root 12288 Nov 24 10:14 ../
srwxrwxrwx 1 root root 0 Nov 24 09:59 X0=
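That X0 socket is what the gui profile passes into the container, with something along these lines (a sketch; the device name X0 and the DISPLAY value are assumptions based on my setup):

$ lxc profile device add gui X0 disk source=/tmp/.X11-unix/X0 path=/tmp/.X11-unix/X0
$ lxc profile set gui environment.DISPLAY :0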
In case it matters… I am using the non-open-source (binary) Nvidia driver.
nvidia-418
Version 418.87.01-0ubuntu1
NVIDIA binary driver
I decided to rebuild my container, after deleting the old one of course…
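Teardown was nothing special, roughly:

$ lxc stop gui
$ lxc delete gui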
$ lxc launch --profile default --profile gui --profile nvidia ubuntu:16.04 gui
Verified that xclock still works for both the root and ubuntu users.
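The check was along these lines (DISPLAY should come from the gui profile, but passing it explicitly works too):

$ lxc exec gui --env DISPLAY=:0 -- xclock
$ lxc exec gui --env DISPLAY=:0 -- sudo -u ubuntu xclock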
Verified that nvidia-smi still works as well.
$ lxc exec gui -- nvidia-smi
Sun Nov 24 15:39:21 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M1000M       Off  | 00000000:01:00.0  On |                  N/A |
| N/A   32C    P8    N/A /  N/A |    374MiB /  2002MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
Also verified a quick CUDA demo program…
$ lxc file push /usr/local/cuda-10.1/extras/demo_suite/bandwidthTest gui/root/
$ lxc exec gui /root/bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: Quadro M1000M
Quick Mode
 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12112.9

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12045.4

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     61677.8
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
However, when I run glxinfo -B it fails… =(
$ lxc exec gui -- glxinfo -B
name of display: :0
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
Error: couldn't find RGB GLX visual or fbconfig
Reading this thread gave me the idea to try manually mounting the Nvidia library paths…
But first let’s see what my container currently has…
Checking for GPU Drivers…
$ lxc exec gui -- ls -alF /dev | grep nvidia
crw-rw-rw- 1 nobody nogroup 195, 254 Nov 24 14:59 nvidia-modeset
crw-rw-rw- 1 nobody nogroup 236, 0 Nov 24 14:59 nvidia-uvm
crw-rw-rw- 1 root root 195, 0 Nov 24 15:38 nvidia0
crw-rw-rw- 1 nobody nogroup 195, 255 Nov 24 14:59 nvidiactl
That’s good, but I'm not sure whether the nobody/nogroup ownership would cause any problems…
Now let’s check the lib paths…
$ lxc exec gui -- ls -alF /usr/lib | grep nvidia
Returns nothing… =(
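For completeness, the same thing can be checked through the container's linker cache:

$ lxc exec gui -- sh -c 'ldconfig -p | grep -i nvidia'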
Looking at what’s on the host now…
avondollen@Host $ ls -alF /usr/lib | grep nvidia
-rw-r--r-- 1 root root 1475856 Nov 13 16:28 libnvidia-gtk2.so.440.33.01
-rw-r--r-- 1 root root 1484528 Nov 13 16:28 libnvidia-gtk3.so.440.33.01
lrwxrwxrwx 1 root root 53 Oct 31 17:42 libvdpau_nvidia.so -> /etc/alternatives/x86_64-linux-gnu_libvdpau_nvidia.so
drwxr-xr-x 2 root root 4096 Oct 31 17:42 nvidia/
drwxr-xr-x 6 root root 12288 Oct 31 17:42 nvidia-418/
drwxr-xr-x 2 root root 4096 Oct 31 17:41 nvidia-418-prime/
drwxr-xr-x 2 root root 4096 Nov 24 2017 nvidia-prime-applet/
I'm pretty confident that this is the source of my problem.
So I decided to try mounting the /usr/lib/nvidia-418 folder into the container, similar to how we mounted the /tmp/.X11-unix folder (including the permissions), but it did not work; it just caused my Cinnamon desktop to crash…
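The mount part of that attempt was roughly along these lines (a sketch; the device name nvidia-libs is arbitrary):

$ lxc config device add gui nvidia-libs disk source=/usr/lib/nvidia-418 path=/usr/lib/nvidia-418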