nvidia.runtime error on NixOS host

Hello,
I am trying to use an NVIDIA GPU in an incus container with NixOS as the host. If I try to set nvidia.runtime: true I get an error:

Config parsing error: Initialize LXC: The NVIDIA LXC hook couldn't be found
Press enter to open the editor again or ctrl+c to abort change

I have enabled hardware.nvidia-container-toolkit.enable in NixOS.

My NVIDIA Nix config is as follows:

{ config, pkgs, ... }:
{
  # Nvidia specific
  nixpkgs.config.allowUnfree = true;
  environment.systemPackages = with pkgs; [
    # cudaPackages_12.cudatoolkit
  ];
  # Some programs need SUID wrappers, can be configured further or are
  # started in user sessions.

  # REGION NVIDIA / CUDA

  # Enable OpenGL
  hardware.graphics = {
    enable = true;
    enable32Bit = true;
  };

  hardware.nvidia-container-toolkit.enable = true;


  # Load nvidia driver for Xorg and Wayland
  services.xserver.videoDrivers = [ "nvidia" ];

  # see https://nixos.wiki/wiki/Nvidia#CUDA_and_using_your_GPU_for_compute
  hardware.nvidia = {
    # Modesetting is required.
    modesetting.enable = true;

    # Nvidia power management. Experimental, and can cause sleep/suspend to fail.
    powerManagement.enable = true;
    powerManagement.finegrained = false;

    open = false;

    # Enable the Nvidia settings menu,
    # accessible via `nvidia-settings`.
    nvidiaSettings = true;

    package = config.boot.kernelPackages.nvidiaPackages.production;
  };
  # ENDREGION
}

The configuration.nix has the following for incus,

  virtualisation.incus.package = pkgs.incus;
  virtualisation.incus.enable = true;
  networking.nftables.enable = true;

  systemd.services.incus.path = [ pkgs.libnvidia-container ];

Any idea how to fix this?

@adamcstephens may know

Is the guest NixOS, or what distro?

The guest is Ubuntu 24.04 LTS and the host is NixOS 24.11.

Looks like we may be missing something for supporting this on non-NixOS guests. This seems to avoid the initial error. Can you try it and report back if everything works as expected? I’ll add to the incus module if so.

systemd.services.incus.environment.INCUS_LXC_HOOK = "${config.virtualisation.incus.lxcPackage}/share/lxc/hooks";
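After a rebuild the incus service should pick up the new environment (restart it manually if it doesn't). Roughly, and from memory:

$ sudo nixos-rebuild switch
$ systemctl show incus.service --property=Environment | grep INCUS_LXC_HOOK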

Adding libnvidia-container to the path is unnecessary as it is already in the service path.

I’d not seen hardware.nvidia-container-toolkit before. I’d be curious to know if this is required for the incus nvidia integration to work, or not. Mind trying both?

The hook change got me past the error. Can that be made the default?

The following failed though,

$ incus launch images:ubuntu/24.04 c1
Launching c1

$ incus config device add c1 gpu gpu id=0
Device gpu added to c1

$ incus config set c1 nvidia.driver.capabilities=all nvidia.runtime="true"

$ incus exec c1 -- nvidia-smi
Error: Command not found

I installed nvidia-utils-550 inside the container and then nvidia-smi started to work.
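For reference, that was roughly the following from the host (adjust the package version to whatever matches the host driver):

$ incus exec c1 -- apt-get update
$ incus exec c1 -- apt-get install -y nvidia-utils-550
$ incus exec c1 -- nvidia-smi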

My plan is to run Docker inside the incus container. For Docker to pick up the GPU I had to install nvidia-container-toolkit following this.

In addition, the following things were also required:

  1. fix-gpu-passthrough.service
# cat /etc/systemd/system/fix-gpu-passthrough.service 
[Unit]
Description=Creates Symlink required for LXC/Nvidia to Docker passthrough
Before=docker.service

[Service]
User=root
Group=root
ExecStart=/bin/bash -c 'mkdir -p /proc/driver/nvidia/gpus && ln -s /dev/nvidia0 /proc/driver/nvidia/gpus/0000:02:00.0'
Type=oneshot

[Install]
WantedBy=multi-user.target
  2. Fix /etc/nvidia-container-runtime/config.toml
# cat /etc/nvidia-container-runtime/config.toml
disable-require = false

[nvidia-container-cli]
environment = []
ldconfig = "@/sbin/ldconfig.real"
load-kmods = true
no-cgroups = true

[nvidia-container-runtime]
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc"]

[nvidia-container-runtime.modes]

[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

With these changes I am able to use the GPU in a Docker container inside an incus container on a NixOS host.
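For anyone following along, a rough way to verify from inside the incus container; the CUDA image tag is only an example, so substitute one that exists on Docker Hub:

root@c1:~# nvidia-ctk runtime configure --runtime=docker
root@c1:~# systemctl restart docker
root@c1:~# docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi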

Yeah, I’ll get the hook env var added as default.

The other two files are inside the incus container?

Yes, steps 1 and 2 were inside the container.

What surprised me was that I had to install nvidia-utils-550 inside the container to get nvidia-smi working. I thought nvidia.runtime would expose it automatically, but that wasn't the case here.

@adamcstephens I came across this snippet on the internet for using NVIDIA with LXD. Can it be adapted for incus?

let
  libnvidia-container = pkgs.callPackage "${inputs.nixpkgs}/pkgs/by-name/li/libnvidia-container/package.nix" {};
in {
  systemd.services.lxd = {
    environment = let
      path =
        pkgs.lib.makeBinPath
        (with pkgs; [which libnvidia-container util-linux]);
      hook =
        (
          pkgs.srcOnly {
            name = "lxc-hooks";
            src = "${pkgs.lxc}/share/lxc/hooks";
            nativeBuildInputs = [pkgs.makeWrapper];
          }
        )
        .overrideAttrs (
          oldAttrs: {
            installPhase = ''
              ${oldAttrs.installPhase}
              wrapProgram $out/nvidia --prefix PATH : ${path}
            '';
          }
        );
    in {
      LXD_LXC_HOOK = "${hook}";
    };
  };

  virtualisation.lxd = {
    enable = true;
    ui.enable = true;
    # This turns on a few sysctl settings that the LXD documentation recommends
    # for running in production.
    recommendedSysctlSettings = true;

    package = pkgs.lxd-lts.override {
      lxd-unwrapped-lts = pkgs.lxd-unwrapped-lts.overrideAttrs (
        oldAttrs: {
          postPatch = ''
            ${oldAttrs.postPatch}
            substituteInPlace lxd/instance/drivers/driver_lxc.go \
              --replace "nvidia-container-cli" "${libnvidia-container}/bin/nvidia-container-cli"
          '';
        }
      );
    };
  };
}

I’m not seeing how that will improve anything. We’re already putting libnvidia-container in the path for incus.

I thought you got it working, is that not the case?

Yes, I did get it working.

However, I had to install nvidia-utils-550 inside the container to get nvidia-smi. This video from @stgraber shows that with just nvidia.runtime set we should have nvidia-smi inside the container. So I guess incus on NixOS is not setting it up correctly?

Not sure if this is the reason.

I’m not sure where stgraber got nvidia-smi from, but using the NixOS host’s nvidia-smi isn’t going to work as desired even if you copied it:

✗ ldd $(which nvidia-smi)
	linux-vdso.so.1 (0x00007f394ddcc000)
	libpthread.so.0 => /nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/libpthread.so.0 (0x00007f394ddc1000)
	libm.so.6 => /nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/libm.so.6 (0x00007f394dcda000)
	libdl.so.2 => /nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/libdl.so.2 (0x00007f394dcd5000)
	libc.so.6 => /nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/libc.so.6 (0x00007f394dada000)
	librt.so.1 => /nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/librt.so.1 (0x00007f394dad5000)
	/nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/ld-linux-x86-64.so.2 => /nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib64/ld-linux-x86-64.so.2 (0x00007f394ddce000)

@stgraber How does nvidia-smi get exposed in the container?

It’s bind-mounted into place by the nvidia LXC hook.
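(The hook relies on nvidia-container-cli, which is why libnvidia-container needs to be on the incus service's PATH. On the host, something like the following should show roughly which binaries and libraries it can mount; exact flags may vary by libnvidia-container version.)

$ nix-shell -p libnvidia-container --run "nvidia-container-cli list --binaries --libraries"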

@adamcstephens Based on what @stgraber said above, how can we make the NVIDIA LXC hook expose nvidia-smi on NixOS?

I copied nvidia-smi from the host to the container but it doesn't work even though the dependencies seem to be satisfied.

$ incus file push $(which nvidia-smi) c1/root/
$ incus exec c1 -- ldd /root/nvidia-smi
        linux-vdso.so.1 (0x00007fc327ffe000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc327ff0000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc327f07000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc327f02000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc327cee000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fc327ce9000)
        /nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007fc328000000)

$ incus exec c1 -- /root/nvidia-smi
Error: Command not found

Ahh, I only looked at incus and didn’t see anything about nvidia-smi. We can try patching or wrapping the LXC hook.
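A rough, untested sketch of what wrapping could look like, borrowing from the LXD snippet above; the runCommand name, the tool list, and whether this actually gets nvidia-smi mounted are all assumptions:

{ config, pkgs, ... }:
let
  # Copy the hooks shipped with incus' LXC package and wrap the nvidia hook so
  # the tools it shells out to are on its PATH. (Sketch only, untested.)
  wrappedHooks = pkgs.runCommand "incus-lxc-hooks" {
    nativeBuildInputs = [ pkgs.makeWrapper ];
  } ''
    cp -r ${config.virtualisation.incus.lxcPackage}/share/lxc/hooks $out
    chmod -R u+w $out
    # The LXD snippet only adds which/libnvidia-container/util-linux; for
    # nvidia-smi itself the driver's bin dir would probably also be needed
    # (untested assumption).
    wrapProgram $out/nvidia --prefix PATH : \
      ${pkgs.lib.makeBinPath (with pkgs; [ which libnvidia-container util-linux ])}
  '';
in {
  systemd.services.incus.environment.INCUS_LXC_HOOK = "${wrappedHooks}";
}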

Command not found can have multiple meanings, one of which is missing dynamic libraries. If you get a shell with exec bash, do you get the same error when trying to run the manually copied nvidia-smi?

This is what I get:

$ incus exec c1 bash
root@c1:~# /root/nvidia-smi 
bash: /root/nvidia-smi: cannot execute: required file not found

root@c1:~# ldd /root/nvidia-smi 
        linux-vdso.so.1 (0x00007f029438c000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f029437e000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0294295000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f0294290000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f029407c000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f0294077000)
        /nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007f029438e000)

root@c1:~# file /root/nvidia-smi 
/root/nvidia-smi: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=ee25fbd45a994e3ca42bf4186574808865915235, stripped

Then I strongly suspect that even if we fix bind-mounting nvidia-smi into the container, it still won't work. Could be a glibc incompatibility or something; the downside of dynamically linked libraries.

Is this unique to NixOS? I had no such issue with Arch Linux as the host and Ubuntu as the container.

As you can see from your ldd output, where it's looking for /nix/store..., we do linking differently on NixOS, yes. I'll still try to fix the hook so it can find nvidia-smi anyway, but I'm not hopeful it'll do what you want.
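If you want to experiment in the meantime, something like this inside the container would at least point the binary at Ubuntu's ELF interpreter (patchelf is in the Ubuntu repos); it may still fail with glibc symbol-version errors, since the binary was built against NixOS' newer glibc:

root@c1:~# apt-get install -y patchelf
root@c1:~# patchelf --set-interpreter /lib64/ld-linux-x86-64.so.2 /root/nvidia-smi
root@c1:~# /root/nvidia-smi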

But this is only one minor part of the integration. Does the GPU work as intended, besides the missing nvidia-smi CLI tool?