I can't get NVIDIA Docker working correctly in LXD. What config should I write?
I have this config:

security.nesting: "true"
nvidia.runtime: "true"
gpu:
  type: gpu
But I can't run an NVIDIA Docker container as expected. It shows:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
And I can run the hello-world Docker image as expected.
There are some other similar questions on the web: linuxcontainer, linuxcontainer2, nvidia-docker, nvidia-docker2.
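For diagnosis, the "libnvidia-ml.so.1: cannot open shared object file" error usually means nvidia-container-cli cannot see the driver libraries that nvidia.runtime is supposed to mount into the LXD container. A quick hedged check, run inside the LXD container (the library path below is a typical Ubuntu location and may differ on other distributions):

ldconfig -p | grep libnvidia-ml                     # should list libnvidia-ml.so.1
ls -l /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1   # typical Ubuntu path; adjust for your distro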
friki67 (Friki67), July 7, 2023, 12:08pm
Hello. Have you installed the nvidia-container runtime on your host? That is, does nvidia-smi work on your host? And inside the container?
nvidia-smi works normally on the host and in the container. How do I install the nvidia-container runtime? Which package should I install, on the host or in the container?
friki67 (Friki67), July 7, 2023, 2:21pm
If nvidia-smi works in the container, it is installed on the host.
I really had a hard time trying to get NVIDIA working inside Docker in an LXC container.
I changed my focus and installed the apps directly in the LXC. In my case they were Jellyfin and Plex, so they were easy to set up.
Good luck.
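For reference, "installing the nvidia-container runtime" nowadays means the nvidia-container-toolkit package, installed wherever Docker itself runs (here, inside the LXD container). A hedged sketch for Ubuntu, assuming NVIDIA's apt repository has already been set up per their install docs:

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# register the NVIDIA runtime with Docker, then restart it
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker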
Not sure if this helps, but I got stuck forever trying to get NVIDIA Docker to run inside an unprivileged LXC. The fix for me was to set "no-cgroups = true" in the NVIDIA Docker config file, /etc/nvidia-container-runtime/config.toml.
I'm not sure why this information is so hard to find online, but I hope it helps.
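In other words, the relevant stanza is (a minimal sketch; the rest of the file can keep its defaults):

# in /etc/nvidia-container-runtime/config.toml, inside the container running Docker
[nvidia-container-cli]
no-cgroups = true

After editing, restart Docker (e.g. sudo systemctl restart docker) so the change takes effect.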
I tried it, but it didn't work. So sad.
Well, I'm stumped. I built a brand-new LXC container to make sure it worked for me. Could you share your /etc/nvidia-container-runtime/config.toml and lxc config so I can see if there are any obvious differences?
The LXD version is 5.15.
docker version
Client: Docker Engine - Community
 Version:           24.0.5
 API version:       1.43
 Go version:        go1.20.6
 Git commit:        ced0996
 Built:             Fri Jul 21 20:35:18 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.6
  Git commit:       a61e2b4
  Built:            Fri Jul 21 20:35:18 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
/etc/nvidia-container-runtime/config.toml
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false
[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
no-cgroups = true
#user = "root:video"
ldconfig = "@/sbin/ldconfig.real"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
# Specify the runtimes to consider. This list is processed in order and the PATH
# searched for matching executables unless the entry is an absolute path.
runtimes = [
    "docker-runc",
    "runc",
]
mode = "auto"
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
config:
  boot.autostart: "true"
  nvidia.runtime: "true"
  security.nesting: "true"
description: Default LXD profile
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  gpu:
    type: gpu
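For comparison, an equivalent profile can be built up with lxc commands (a sketch targeting the default profile; the device name "gpu" is just a label):

lxc profile set default boot.autostart true
lxc profile set default nvidia.runtime true
lxc profile set default security.nesting true
lxc profile device add default gpu gpu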
Now the error is:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0:
error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: mount error: stat failed: /proc/driver/nvidia/gpus/0000:04:00.0: no such file or directory: unknown.
And nvidia-smi works normally in both the LXC and the host.
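The stat failure suggests the GPU's /proc/driver/nvidia entry is not visible inside the LXD container. A hedged check, run on the host and then in the container (the PCI address 0000:04:00.0 is taken from the error above):

ls /proc/driver/nvidia/gpus/
ls /proc/driver/nvidia/gpus/0000:04:00.0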
itzsimpl (Iztok Lebar Bajec), August 7, 2023, 1:04pm
I solved the issue by installing Podman, which supports the CDI interface; as per Can rootless and rooted both docker use GPU without changing no-cgroups? · Issue #85 · NVIDIA/nvidia-container-toolkit · GitHub, Docker should support it too as of release 25.
By also installing the docker-compose-plugin and setting the DOCKER_HOST variable to

export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock

as per https://github.com/containers/podman/blob/main/docs/tutorials/socket_activation.md, one can use podman commands and docker compose commands (backed by Podman).
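Per that tutorial, the user-level Podman socket has to be enabled before pointing DOCKER_HOST at it; a minimal sketch:

systemctl --user enable --now podman.socket
export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock
docker compose version   # should now talk to the Podman socket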
When running a Podman command, the GPUs get mounted in the nested container as they should.
$ podman run --rm --device=nvidia.com/gpu=all ubuntu nvidia-smi -L
GPU 0: Quadro RTX 6000 (UUID: GPU-a155687d-84cf-fa3b-c8dd-63861c195f9d)
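Note that for --device=nvidia.com/gpu=all to resolve, a CDI specification generally has to be generated first with the NVIDIA Container Toolkit; a hedged sketch:

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk cdi list   # should list nvidia.com/gpu=all among the devices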
Sadly, when using the standard approach to enable a GPU in docker-compose.yml, they still don't get passed through.
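The "standard approach" here presumably refers to Compose's device reservations, which looks roughly like this (a sketch with a hypothetical service name):

services:
  test:
    image: ubuntu
    command: nvidia-smi -L
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]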