Background
I’m attempting to containerize my development environment for ROS development.
For the past year I’ve been using Docker for this, but I don’t like Docker’s layered-filesystem approach. Since I want my container instances to be long-lived, I feel that OS-level containerization with LXC is the best solution for me.
I’m hoping to have one “ROS-TOOLS” container and a series of “Project” containers. Each project container will have a catkin workspace in it, while ROS-TOOLS will have mapviz, rviz, roswtf, and gazebo. This should minimize any rosdep conflicts that may arise between my projects.
Some ROS tools require GPU acceleration to function, like viewing lidar data in rviz.
While I’ve been successful in getting GPU pass-through to work and getting a CUDA program to execute correctly, I’ve been unable to get OpenGL to work.
System Information
OS: Linux Mint 18.3 Sylvia
LXD Version: Client 3.18, Server 3.18
Which lxc: /snap/bin/lxc
How I’m initializing my container instance
lxc launch --profile default --profile gui --profile nvidia ubuntu:16.04 gui
Profiles
- default

$ lxc profile show default
config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: default
used_by:
- /1.0/containers/gui
- gui

$ lxc profile show gui
config:
  environment.DISPLAY: :0
  raw.idmap: both 1000 1000
  user.user-data: |
    #cloud-config
    package_upgrade: true
    packages:
      - x11-apps
      - mesa-utils
description: Enables X forwarding to host
devices:
  X0:
    path: /tmp/.X11-unix
    source: /tmp/.X11-unix
    type: disk
name: gui
used_by:
- /1.0/containers/gui
- nvidia

$ lxc profile show nvidia
config:
  nvidia.driver.capabilities: graphics, compute, display, utility, video
  nvidia.runtime: "true"
description: Enables GPU pass-through for container
devices:
  Quadro-M100M:
    pci: "0000:01:00.0"
    type: gpu
name: nvidia
used_by:
- /1.0/containers/gui
Permissions
$ cat /etc/subuid
avondollen:1000000:1000
avondollen:100000:65536
avondollen:1000:1
lxd:231072:65536
root:1000:1
root:231072:65536
$ cat /etc/subgid
avondollen:1000000:1000
avondollen:100000:65536
avondollen:1000:1
lxd:231072:65536
root:1000:1
root:231072:65536
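For context on why the root:1000:1 entries are there: the raw.idmap: both 1000 1000 line in the gui profile asks LXD to map uid/gid 1000 in the container directly onto uid/gid 1000 on the host, so the container user can talk to the shared /tmp/.X11-unix socket. As I understand it (this is my reading of the subuid(5) format, not something from the LXD docs), each line is name:first_id:count, and a quick sketch of how one entry reads:

```shell
# Each /etc/subuid line is "name:first_id:count"; the entry below is what
# lets root (i.e. the LXD daemon) delegate host uid 1000 into a container.
entry="root:1000:1"
name=${entry%%:*}
rest=${entry#*:}
first=${rest%%:*}
count=${rest#*:}
echo "$name may map host uids $first..$((first + count - 1))"
# prints: root may map host uids 1000..1000
```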
What Works
$ lxc exec gui -- nvidia-smi
Fri Nov 22 21:30:39 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01 Driver Version: 418.87.01 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro M1000M Off | 00000000:01:00.0 On | N/A |
| N/A 36C P8 N/A / N/A | 742MiB / 2002MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
What Doesn’t Work
$ lxc exec gui -- sudo --login --user ubuntu glxinfo -B
name of display: :0
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
Error: couldn't find RGB GLX visual or fbconfig
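The “failed to load driver: swrast” line suggests libGL is falling back to the Mesa software rasterizer. To narrow that down, this is the kind of check I’d run inside the container (a sketch, assuming ldconfig is available there; libGLX_nvidia.so.0 and libEGL_nvidia.so.0 are the GLVND vendor libraries I’d expect the NVIDIA runtime to provide):

```shell
# Report which GL-related libraries the dynamic linker can actually
# resolve. If only Mesa's libGL.so.1 shows up and libGLX_nvidia.so.0 is
# missing, a GLX app like glxinfo will fall back to swrast.
for lib in libGL.so.1 libGLX_nvidia.so.0 libEGL_nvidia.so.0; do
    if ldconfig -p 2>/dev/null | grep -q "$lib"; then
        echo "found:   $lib"
    else
        echo "missing: $lib"
    fi
done
```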
Investigation…
Okay, so we know the NVIDIA driver is present, since nvidia-smi works.
Let’s check whether the NVIDIA OpenGL library is being properly linked.
avondollen@Host ~ $ find /usr -iname "*libGL.so*" -exec ls -l -- {} +
lrwxrwxrwx 1 root root 10 Sep 25 18:53 /usr/lib32/nvidia-418/libGL.so -> libGL.so.1
lrwxrwxrwx 1 root root 18 Sep 25 18:53 /usr/lib32/nvidia-418/libGL.so.1 -> libGL.so.418.87.01
-rw-r--r-- 1 root root 1275664 Sep 25 01:36 /usr/lib32/nvidia-418/libGL.so.418.87.01
lrwxrwxrwx 1 root root 14 Jun 14 2018 /usr/lib/i386-linux-gnu/mesa/libGL.so.1 -> libGL.so.1.2.0
-rw-r--r-- 1 root root 457256 Jun 14 2018 /usr/lib/i386-linux-gnu/mesa/libGL.so.1.2.0
lrwxrwxrwx 1 root root 10 Sep 25 18:53 /usr/lib/nvidia-418/libGL.so -> libGL.so.1
lrwxrwxrwx 1 root root 18 Sep 25 18:53 /usr/lib/nvidia-418/libGL.so.1 -> libGL.so.418.87.01
-rw-r--r-- 1 root root 1275664 Sep 25 01:36 /usr/lib/nvidia-418/libGL.so.418.87.01
lrwxrwxrwx 1 root root 13 Jun 14 2018 /usr/lib/x86_64-linux-gnu/libGL.so -> mesa/libGL.so
lrwxrwxrwx 1 root root 14 Jun 14 2018 /usr/lib/x86_64-linux-gnu/mesa/libGL.so -> libGL.so.1.2.0
lrwxrwxrwx 1 root root 14 Jun 14 2018 /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1 -> libGL.so.1.2.0
-rw-r--r-- 1 root root 471680 Jun 14 2018 /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1.2.0
$ lxc exec gui -- find /usr -iname "*libGL.so*" -exec ls -l -- {} +
lrwxrwxrwx 1 root root 14 Jun 14 2018 /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1 -> libGL.so.1.2.0
-rw-r--r-- 1 root root 471680 Jun 14 2018 /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1.2.0
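Before concluding the libraries are simply absent, it may be worth checking whether the runtime hook bind-mounted anything at all, since mounts added at container start-up show up in /proc/mounts even if a find over /usr misses the paths I expected. A hedged sketch (run inside the container; on a machine without the NVIDIA runtime it just prints the fallback line):

```shell
# List any NVIDIA-related bind mounts the nvidia.runtime hook set up.
grep -i nvidia /proc/mounts || echo "no nvidia bind mounts visible"
```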
Hmm… I may be mistaken, but it appears that the container does not have access to the NVIDIA OpenGL libraries, which is odd considering that I have the “graphics” flag set in nvidia.driver.capabilities…
Looking at the nvidia-container-runtime documentation, I see that the “graphics” flag is indeed what enables running OpenGL and Vulkan applications.
Considering that the “compute” and “utility” flags seem to work (nvidia-smi and CUDA applications run inside my container), I feel there may be either a bug, or the graphics flag is just not supported in LXD as of now.
I’d greatly appreciate any help in clearing this up.