Thanks @simos for your response.
I installed nvidia-container-toolkit following this nvidia guide for Ubuntu: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
Without nvidia-container-toolkit, running a Docker container with GPUs fails with this error:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container: context canceled
I ran a couple more experiments as suggested. Here are the outcomes of both:
1st experiment:
Added security and nvidia parameters during launch:
lxc launch ubuntu plex -c security.nesting=true -c security.privileged=true -c nvidia.runtime=true -c nvidia.driver.capabilities=compute,utility
Got this output:
Creating plex
The local image 'ubuntu' couldn't be found, trying 'ubuntu:' instead.
Starting plex
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart plex /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/plex/lxc.conf
The container failed to start during the launch itself.
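The forkstart message alone does not say why the container failed to start. A minimal sketch for collecting more detail, assuming the container is named plex as above; the helper name collect_start_diagnostics and the LXC variable are made up for illustration (the variable only exists so the commands can be dry-run with echo):

```shell
#!/bin/sh
# Hypothetical helper, not part of LXD itself.
# LXC can be overridden (for example LXC=echo) to print the commands
# instead of running them.
LXC=${LXC:-lxc}

collect_start_diagnostics() {
    # Show the instance state together with the tail of its LXC log.
    "$LXC" info "$1" --show-log
    # Show the effective config, including keys inherited from profiles.
    "$LXC" config show "$1" --expanded
}

# Example: collect_start_diagnostics plex
```

The log printed by the first command is normally where the underlying reason for a failed start shows up.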
2nd experiment:
Launched the container with only the security parameters, then added the NVIDIA config afterwards:
lxc launch ubuntu plex -c security.nesting=true -c security.privileged=true
lxc config set plex nvidia.runtime true
lxc config set plex nvidia.driver.capabilities compute
lxc config device add plex gpu gpu id=0
lxc exec plex nvidia-smi
On running nvidia-smi in the LXC container, it asked me to install the NVIDIA driver:
Command 'nvidia-smi' not found, but can be installed with:
apt install nvidia-utils-390 # version 390.138-0ubuntu0.20.04.1, or
apt install nvidia-utils-440 # version 440.100-0ubuntu0.20.04.1
apt install nvidia-utils-450 # version 450.80.02-0ubuntu0.20.04.2
apt install nvidia-utils-450-server # version 450.80.02-0ubuntu0.20.04.3
apt install nvidia-340 # version 340.108-0ubuntu2
apt install nvidia-utils-435 # version 435.21-0ubuntu7
apt install nvidia-utils-418-server # version 418.152.00-0ubuntu0.20.04.1
apt install nvidia-utils-440-server # version 440.95.01-0ubuntu0.20.04.1
apt install nvidia-utils-455 # version 455.38-0ubuntu0.20.04.1
I installed as follows in the LXC container:
sudo apt-get update
sudo apt install nvidia-utils-450
On running nvidia-smi again, it generated the following error:
Failed to initialize NVML: Unknown Error
Then I rebooted the LXC container to make sure the driver was installed properly. It failed to reboot, and when I then ran lxc start plex, it displayed the same error:
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart plex /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/plex/lxc.conf:
When I remove the security.privileged setting, the LXC container is able to start again. But then running nvidia-smi in the LXC container gives this error:
Failed to initialize NVML: Driver/library version mismatch
Docker with GPUs doesn’t work in that state either.
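The "Driver/library version mismatch" message from NVML generally means the user-space NVIDIA libraries in use come from a different driver release than the kernel module that is loaded. A minimal sketch for comparing the two, assuming the host exposes its module version at /sys/module/nvidia/version and that nvidia-utils-450 is the package installed in the container (as above); the helper names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers, not part of LXD or the NVIDIA tooling.

# Strip the packaging suffix: "450.80.02-0ubuntu0.20.04.2" -> "450.80.02".
upstream_version() {
    printf '%s\n' "${1%%-*}"
}

# Succeeds only if both version strings name the same driver release.
same_driver_release() {
    [ "$(upstream_version "$1")" = "$(upstream_version "$2")" ]
}

# Usage on a real system (left commented out: it needs an NVIDIA host
# and the plex container from this thread):
# host_ver=$(cat /sys/module/nvidia/version)
# ct_ver=$(lxc exec plex -- dpkg-query -W -f='${Version}' nvidia-utils-450)
# same_driver_release "$host_ver" "$ct_ver" ||
#     echo "mismatch: host $host_ver, container $ct_ver"
```

If the two releases differ, that would be consistent with the error above, since the container's libraries have to match the host's kernel module exactly.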