Now we're getting to the reason why I'm so confused here.
My system has two GPUs installed…
And LXD sees both of them…
$ lxc info --resources
...
GPUs:
  Card 0:
    NUMA node: 0
    Vendor: Intel Corporation (8086)
    Product: HD Graphics 530 (191b)
    PCI address: 0000:00:02.0
    Driver: i915 (4.15.0-70-generic)
    DRM:
      ID: 0
      Card: card0 (226:0)
      Control: controlD64 (226:0)
      Render: renderD128 (226:128)
  Card 1:
    NUMA node: 0
    Vendor: NVIDIA Corporation (10de)
    Product: GM107GLM [Quadro M1000M] (13b1)
    PCI address: 0000:01:00.0
    Driver: nvidia (418.87.01)
    DRM:
      ID: 1
      Card: card1 (226:1)
      Render: renderD129 (226:129)
    NVIDIA information:
      Architecture: 5.0
      Brand: Quadro
      Model: Quadro M1000M
      CUDA Version: 10.1
      NVRM Version: 418.87.01
      UUID: GPU-1899bb61-a5bf-8db5-d80a-63f55ac1bfc1
...
This is why inside my Nvidia profile I specify the card via PCI address.
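For context, pinning a profile's GPU device to a PCI address can be done roughly like this (a sketch; the device name gpu is just an example, and pci is the standard LXD gpu device property):

$ lxc profile device add nvidia gpu gpu pci=0000:01:00.0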
However, on my host system, I only see /tmp/.X11-unix/X0…
$ ls -alF /tmp/.X11-unix/
total 16
drwxrwxrwt 2 root root 4096 Nov 24 09:59 ./
drwxrwxrwt 16 root root 12288 Nov 24 10:14 ../
srwxrwxrwx 1 root root 0 Nov 24 09:59 X0=
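That X0 socket is what the gui profile passes into the container, with something along these lines (a sketch; the device name X0 and the DISPLAY value are assumptions based on my setup):

$ lxc profile device add gui X0 disk source=/tmp/.X11-unix/X0 path=/tmp/.X11-unix/X0
$ lxc profile set gui environment.DISPLAY :0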
In case it matters… I am using the non-open-source (binary) Nvidia driver.
nvidia-418
Version 418.87.01-0ubuntu1
NVIDIA binary driver
I decided to rebuild my container, after deleting the old one of course…
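Teardown was nothing special, roughly:

$ lxc stop gui
$ lxc delete gui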
$ lxc launch --profile default --profile gui --profile nvidia ubuntu:16.04 gui
Verified that xclock still works for both the root and ubuntu users.
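The check was along these lines (DISPLAY should come from the gui profile, but passing it explicitly works too):

$ lxc exec gui --env DISPLAY=:0 -- xclock
$ lxc exec gui --env DISPLAY=:0 -- sudo -u ubuntu xclock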
Verified that nvidia-smi still works as well.
$ lxc exec gui -- nvidia-smi
Sun Nov 24 15:39:21 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M1000M       Off  | 00000000:01:00.0  On |                  N/A |
| N/A   32C    P8    N/A /  N/A |    374MiB /  2002MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
Also verified a quick CUDA demo program…
$ lxc file push /usr/local/cuda-10.1/extras/demo_suite/bandwidthTest gui/root/
$ lxc exec gui /root/bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: Quadro M1000M
Quick Mode
 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12112.9

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12045.4

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     61677.8
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
However, when I run glxinfo -B it fails… =(
$ lxc exec gui -- glxinfo -B
name of display: :0
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
Error: couldn't find RGB GLX visual or fbconfig
Reading this thread gave me the idea to try manually mounting the Nvidia library paths…
But first let’s see what my container currently has…
Checking for GPU Drivers…
$ lxc exec gui -- ls -alF /dev | grep nvidia
crw-rw-rw- 1 nobody nogroup 195, 254 Nov 24 14:59 nvidia-modeset
crw-rw-rw- 1 nobody nogroup 236, 0 Nov 24 14:59 nvidia-uvm
crw-rw-rw- 1 root root 195, 0 Nov 24 15:38 nvidia0
crw-rw-rw- 1 nobody nogroup 195, 255 Nov 24 14:59 nvidiactl
That’s good, but I'm not sure whether the nobody/nogroup ownership would cause any problems…
Now let’s check the lib paths…
$ lxc exec gui -- ls -alF /usr/lib | grep nvidia
Returns nothing… =(
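For completeness, the same thing can be checked through the container's linker cache:

$ lxc exec gui -- sh -c 'ldconfig -p | grep -i nvidia'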
Looking at what’s on the host now…
avondollen@Host $ ls -alF /usr/lib | grep nvidia
-rw-r--r-- 1 root root 1475856 Nov 13 16:28 libnvidia-gtk2.so.440.33.01
-rw-r--r-- 1 root root 1484528 Nov 13 16:28 libnvidia-gtk3.so.440.33.01
lrwxrwxrwx 1 root root 53 Oct 31 17:42 libvdpau_nvidia.so -> /etc/alternatives/x86_64-linux-gnu_libvdpau_nvidia.so
drwxr-xr-x 2 root root 4096 Oct 31 17:42 nvidia/
drwxr-xr-x 6 root root 12288 Oct 31 17:42 nvidia-418/
drwxr-xr-x 2 root root 4096 Oct 31 17:41 nvidia-418-prime/
drwxr-xr-x 2 root root 4096 Nov 24 2017 nvidia-prime-applet/
I'm pretty confident that this is the source of my problem.
So I decided to try mounting the /usr/lib/nvidia-418 folder into the container, similar to how we mounted the /tmp/.X11-unix folder (including the permissions), but it did not work; it just caused my Cinnamon desktop to crash…
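The mount part of that attempt was roughly along these lines (a sketch; the device name nvidia-libs is arbitrary):

$ lxc config device add gui nvidia-libs disk source=/usr/lib/nvidia-418 path=/usr/lib/nvidia-418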