How to run NVIDIA Docker inside an LXD (LXC) container

I can’t get NVIDIA Docker working correctly inside LXD. What config should I write?

I have the config security.nesting: "true", nvidia.runtime: "true", and

gpu:
  type: gpu

But I can’t run nvidia docker as expected.
It shows

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

And I can run hello-world docker as expected.
There are some other related questions on the web: linuxcontainer, linuxcontainer2, nvidia-docker, nvidia-docker2.
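A quick way to check whether the libnvidia-ml.so.1 that the hook complains about is visible inside the LXC at all (a diagnostic sketch, not from the original post; the container name and library path are placeholders for a typical Ubuntu layout):

$ lxc exec <container> -- ldconfig -p | grep libnvidia-ml
$ lxc exec <container> -- ls -l /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1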

Hello. Have you installed the nvidia-container runtime on your host? That is, is nvidia-smi working on your host? And inside the container?

nvidia-smi works normally on the host and in the container. How do I install the nvidia-container runtime? Which package should I install, and should it go on the host or in the container?

If nvidia-smi works in the container, it is installed on the host.
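For reference, the hook in the error comes from the NVIDIA Container Toolkit, which generally has to be installed wherever the Docker daemon runs, i.e. inside the LXC here. A rough install sketch for Ubuntu/Debian, assuming NVIDIA's apt repository has already been added per their install guide:

$ sudo apt-get update
$ sudo apt-get install -y nvidia-container-toolkit
$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker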

I really had a hard time trying to get NVIDIA working inside Docker in an LXC container.

I changed my focus and installed the apps directly in the LXC. In my case they were Jellyfin and Plex, so they were easy to set up.

Good luck.

Not sure if this helps, but I got stuck forever trying to get NVIDIA Docker to run inside a non-privileged LXC. The fix for me was to set “no-cgroups = true” in the NVIDIA container runtime config file, /etc/nvidia-container-runtime/config.toml.

I’m not sure why this information is so hard to find online, but I hope it helps.
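For reference, the change described above is a one-line edit in the [nvidia-container-cli] section of /etc/nvidia-container-runtime/config.toml inside the container that runs Docker; a rough sketch of applying it:

$ sudo sed -i 's/^#\? *no-cgroups *=.*/no-cgroups = true/' /etc/nvidia-container-runtime/config.toml
$ sudo systemctl restart docker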


I tried it, but it didn’t work. So sad.

Well, I’m stumped. I built a brand-new LXC container to make sure it worked for me. Could you share your /etc/nvidia-container-runtime/config.toml and lxc config so I can see if there are any obvious differences?

  • lxd version is 5.15

  • docker version

Client: Docker Engine - Community
 Version:           24.0.5
 API version:       1.43
 Go version:        go1.20.6
 Git commit:        ced0996
 Built:             Fri Jul 21 20:35:18 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.6
  Git commit:       a61e2b4
  Built:            Fri Jul 21 20:35:18 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
  • /etc/nvidia-container-runtime/config.toml
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false

[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
no-cgroups = true
#user = "root:video"
ldconfig = "@/sbin/ldconfig.real"

[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"

# Specify the runtimes to consider. This list is processed in order and the PATH
# searched for matching executables unless the entry is an absolute path.
runtimes = [
    "docker-runc",
    "runc",
]

mode = "auto"

[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

  • lxc profile is
config:
  boot.autostart: "true"
  nvidia.runtime: "true"
  security.nesting: "true"
description: Default LXD profile
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  gpu:
    type: gpu

Now the error is:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0:
error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: mount error: stat failed: /proc/driver/nvidia/gpus/0000:04:00.0: no such file or directory: unknown.

And nvidia-smi works normally in both the LXC and the host.
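The stat error above points at /proc/driver/nvidia/gpus/0000:04:00.0, so one quick check is whether those per-GPU proc entries are visible inside the LXC at all (a diagnostic sketch; the container name is a placeholder):

# on the host
$ ls /proc/driver/nvidia/gpus/
# inside the LXC
$ lxc exec <container> -- ls /proc/driver/nvidia/gpus/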

I solved the issue by installing Podman, which supports the CDI interface, as per Can rootless and rooted both docker use GPU without changing no-cgroups? · Issue #85 · NVIDIA/nvidia-container-toolkit · GitHub; Docker should support it too as of release 25.

By also installing the docker-compose-plugin and setting the DOCKER_HOST variable to
export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock
as per https://github.com/containers/podman/blob/main/docs/tutorials/socket_activation.md, one can use both podman commands and docker compose commands (backed by Podman).
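Roughly, per the linked tutorial, the Podman socket is enabled for the user session and the Docker CLI/compose then talk to it (a sketch, assuming a systemd user session):

$ systemctl --user enable --now podman.socket
$ export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock
$ docker compose up -d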

When running a Podman command, the GPUs get mounted in the nested container as they should.

$ podman run --rm --device=nvidia.com/gpu=all ubuntu nvidia-smi -L
GPU 0: Quadro RTX 6000 (UUID: GPU-a155687d-84cf-fa3b-c8dd-63861c195f9d)
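The nvidia.com/gpu=all device name refers to a CDI spec; if no spec exists yet, it can typically be generated with the toolkit's nvidia-ctk (a sketch, not from the original post):

$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
$ nvidia-ctk cdi list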

Sadly, when using the standard approach to enable a GPU in docker-compose.yml, they still don’t get passed through.
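For reference, the “standard approach” meant here is the device reservation block in docker-compose.yml, along these lines (service name and image are illustrative):

services:
  test:
    image: ubuntu
    command: nvidia-smi -L
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]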
