Nvidia/opengl in the container

Hi,
So far I have been using Ubuntu 18.04 as the host, with containers running either 16.04 or 18.04.
To get NVIDIA + OpenGL working in the containers, I had to install the same NVIDIA driver in each container as on the host (plus the right profile).
That works, but I would like to get rid of the driver installation.

Still on an 18.04 host, with LXD 3.11 (snap), I am now trying to get NVIDIA + OpenGL working through the NVIDIA runtime instead.
I have two test containers, one 18.04 and one 16.04, and both have:

  • nvidia.runtime “true”
  • nvidia.driver.capabilities “compute, display, graphics, utility, video”
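
For anyone wanting to reproduce this setup, here is a minimal sketch that builds the `lxc config set` commands for the two keys above. The container name `c1604` is hypothetical, and I am assuming the capabilities are passed as a comma-separated list without spaces; check that against your LXD version before relying on it.

```python
import shlex


def nvidia_config_commands(container, capabilities):
    """Build `lxc config set` argument lists for the two NVIDIA keys.

    Assumption: the capabilities value is joined with commas and no spaces.
    """
    caps = ",".join(c.strip() for c in capabilities)
    return [
        ["lxc", "config", "set", container, "nvidia.runtime", "true"],
        ["lxc", "config", "set", container, "nvidia.driver.capabilities", caps],
    ]


# "c1604" is a hypothetical container name used only for illustration.
for argv in nvidia_config_commands(
    "c1604", ["compute", "display", "graphics", "utility", "video"]
):
    # Print the commands instead of running them, so they can be reviewed first.
    print(" ".join(shlex.quote(a) for a in argv))
```

The commands are only printed, not executed; paste them into a shell on the host (or feed each list to `subprocess.run`) once they look right.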

The 18.04 container works: I can run OpenGL applications.
The 16.04 container does not work (although nvidia-smi does): it seems to have only the OpenGL ES/EGL libraries, while the OpenGL ones are missing.

I don’t really have a clear idea whether this is a configuration issue or something else, and I could not find much online.
Any pointers would be helpful,
thanks.

Compare /proc/self/mountinfo in both containers; that will show you everything that nvidia-container has passed into each container.

I’d normally expect the two lists to be a pretty good match. If nothing is missing in there, then the problem may be that the libraries are not usable on 16.04 because other libraries they depend on are at different versions.

If that’s the case, then installing the driver in the 16.04 containers may be your only obvious way out of this.
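
The mountinfo comparison suggested above could be sketched roughly like this. It assumes you have saved the output of `cat /proc/self/mountinfo` from each container to a text file; the function names are my own, and the script only compares mount points (the fifth field of each line), nothing else.

```python
def mount_points(mountinfo_text):
    """Return the set of mount points in a /proc/self/mountinfo dump."""
    points = set()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # Field 5 (index 4) is the mount point as seen inside the container.
        if len(fields) >= 5:
            points.add(fields[4])
    return points


def compare_mounts(text_a, text_b):
    """Return (only_in_a, only_in_b) as two sorted lists of mount points."""
    a, b = mount_points(text_a), mount_points(text_b)
    return sorted(a - b), sorted(b - a)
```

Read the two saved dumps, pass their contents to `compare_mounts`, and look at whichever list is not empty. Two empty lists mean the same paths were passed into both containers, which points towards a library-version problem rather than a missing file.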

Thanks for your reply.
I’ve updated the host to an NVIDIA driver version compatible with both 16.04 and 18.04 (418.56), but no luck.

Comparing /proc/self/mountinfo did not show any difference.

Anyway, I was wrong in my previous message when I wrote:

"The 16.04 container does not work (nvidia-smi works), it seems to have only the opengles/egl libraries while the opengl are missing."

This is not the case! I get exactly the same libraries in both containers.
However, what is missing in the 16.04 container is /usr/share/lxc/hooks/nvidia.

thanks

That file being missing is perfectly fine, assuming you’re not running nested LXC containers anyway :slight_smile:

You may have some luck figuring out what’s going on by running a very simple program that uses your GPU (and fails on 16.04) under strace -f; that may show you which library can’t be found or what else is missing.
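
To sift through the strace output, something along these lines may help. It is only a sketch: it assumes the trace was saved to a text file, looks only at `open`/`openat` lines that completed on a single line, and reports shared-library names that failed to open at least once and were never opened successfully anywhere else in the trace. The function name is my own.

```python
import os
import re

# Matches e.g.  open("/path/libGL.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (...)
# and           openat(AT_FDCWD, "/path/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
_OPEN_CALL = re.compile(
    r'\b(?:open|openat)\((?:[^,"]+,\s*)?"([^"]+)"[^\n]*\)\s*=\s*(-?\d+)'
)


def never_found_libraries(strace_text):
    """List library file names that failed to open and never succeeded."""
    failed, found = [], set()
    for line in strace_text.splitlines():
        match = _OPEN_CALL.search(line)
        if not match:
            continue
        name = os.path.basename(match.group(1))
        if ".so" not in name:
            continue  # only shared libraries are of interest here
        if int(match.group(2)) >= 0:
            found.add(name)
        elif name not in failed:
            failed.append(name)
    # Failed lookups are normal while the loader probes its search path,
    # so keep only the names that were not found in any location.
    return [name for name in failed if name not in found]
```

Running this on traces from both the 16.04 and the 18.04 container and comparing the two results should narrow things down. Calls that strace splits into "unfinished"/"resumed" pairs are ignored by this sketch.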

I haven’t had time to keep digging into this yet, but my interest is not dead :smiley: