Docker into Incus and NVIDIA

Hello. I’ve just migrated from LXD (see Migrating from LXD 6.1? - #2 by simos).

I can’t get Docker containers inside my Incus container to work with my GPU.

This was discussed here: Incus, docker and NVIDIA GPU - #6 by C0rn3j, and I had been working this way in LXD.

I’ve recreated the Incus containers because I hit a strange Docker failure after the migration (Docker can’t run because of some kernel modules; see my migration post). The container is configured like this:

architecture: x86_64
config:
  image.architecture: amd64
  image.description: Rockylinux 9 amd64 (20240830_02:06)
  image.os: Rockylinux
  image.release: "9"
  image.requirements.cdrom_agent: "true"
  image.serial: "20240830_02:06"
  image.type: squashfs
  image.variant: default
  raw.idmap: both 1000 1000
  security.nesting: "true"
  security.syscalls.intercept.mknod: "true"
  security.syscalls.intercept.setxattr: "true"
  volatile.base_image: b535a4a982ade5ecbf8e2926f5704d4bc9a268f59d2965517fa95cc5adda0f1c
  volatile.cloud-init.instance-id: a5d1c5ed-0d50-4849-9d3c-d274eb1039df
  volatile.eth0.host_name: vethcb9cf0e0
  volatile.eth0.hwaddr: 00:16:3e:a3:ea:75
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000},{"Isuid":true,"Isgid":true,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1001001,"Nsid":1001,"Maprange":64535},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000},{"Isuid":false,"Isgid":true,"Hostid":1001001,"Nsid":1001,"Maprange":64535}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000},{"Isuid":true,"Isgid":true,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1001001,"Nsid":1001,"Maprange":64535},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000},{"Isuid":false,"Isgid":true,"Hostid":1001001,"Nsid":1001,"Maprange":64535}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: dc48cdeb-04dc-4eae-8eb1-f53ea8fce22d
  volatile.uuid.generation: dc48cdeb-04dc-4eae-8eb1-f53ea8fce22d
devices:
  config:
    path: /mnt/whishper
    source: /mnt/main/docker/whishper
    type: disk
  gpu:
    id: "0"
    type: gpu
ephemeral: false
profiles:
- default
stateful: false
description: ""

I installed the NVIDIA driver in the container, along with nvidia-container-toolkit.

In the Rocky container, /etc/docker/daemon.json is configured for the NVIDIA runtime too.
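
For reference, a minimal sketch of a daemon.json that registers the NVIDIA runtime with Docker. This is an assumption about what such a file usually looks like, not a copy of the one from this setup, and it relies on nvidia-container-runtime being on the PATH inside the container.

```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime"
        }
    }
}
```

Docker has to be restarted after changing this file before the runtime becomes available.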

I needed to set “no-cgroups = true” in the NVIDIA container runtime config file, /etc/nvidia-container-runtime/config.toml.
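
A sketch of the relevant part of that file, assuming the stock layout where the option lives in the [nvidia-container-cli] section. As far as I understand it, the setting tells the NVIDIA container CLI not to manage device cgroups itself, which it typically cannot do from inside an unprivileged container.

```toml
# /etc/nvidia-container-runtime/config.toml (excerpt; the rest of the file is unchanged)
[nvidia-container-cli]
# Skip device cgroup setup. Assumed to be needed here because the nested,
# unprivileged container is not allowed to modify cgroups.
no-cgroups = true
```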

And it worked.

Now the Docker containers are not able to access the GPU. The strange thing for me is that I can run nvidia-smi in the Incus container AND in the Docker container, and it works!

Can you see what I am doing wrong?

What’s the host operating system?


NAME="Rocky Linux"
VERSION="9.4 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.4"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.4 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2032-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.4"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.4"

The container must be the same, because the driver and kernel module versions must match for this to work.

I don’t really know why this is happening.

I recreated one of the containers again, installed Docker, the NVIDIA driver and the container toolkit, and now the app worked. But I’m unable to do the same with the other Docker image.

Anyway, thank you. Now one of them is working as a Docker image inside the container, and the other is running directly in Incus as a container app.

A comment identified and fixed this: Install NVIDIA KVM driver on the host machine, how to use CUDA in MIG instance - #4 by osch