Nvidia.runtime error on nixos host

Great. The GPU works fine after following the steps mentioned previously.

libnvidia-container hardcodes an expectation that nvidia-smi is in /usr/bin, which is not a valid assumption on NixOS. There have been some tweaks to our libnvidia-container package recently, but I’m not sure yet whether they’ll fix the problem of libnvidia-container failing to find the binaries. I’ll check back in when I can confirm either way, but the changes are unlikely to be backported to stable 24.11.
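
For anyone who wants to confirm where nvidia-smi actually lives on their host, here is a minimal sketch. The helper name find_in_dirs is made up for illustration, and /run/current-system/sw/bin is only assumed to be the relevant NixOS profile directory; adjust the list for your setup.

```shell
#!/bin/sh
# find_in_dirs NAME DIR...
# Print the first DIR/NAME that exists and is executable; return 1 if none does.
# (Hypothetical helper, not part of libnvidia-container or Incus.)
find_in_dirs() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# /usr/bin is the location reported as hardcoded in the post above.
# /run/current-system/sw/bin is an assumed NixOS system profile path.
find_in_dirs nvidia-smi /usr/bin /run/current-system/sw/bin \
    || echo "nvidia-smi not found in the checked directories" >&2
```

If the only hit is outside /usr/bin, that matches the failure described above.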

@stgraber one thing I noticed when debugging this is that the nvidia hook was failing to create /var/lib/incus/storage-pools/default/containers/noble-molly/hook. When I created it and made the permissions wide open, I noticed that the hook runs as the container’s root UID and not the host’s. This prevents libnvidia-container from writing its log file.

Have you seen this before?

I suspect that’s normal. The hook was written for LXC, so it expects a path like /var/lib/lxc/NAME where it has write access.

Under Incus we’ve tightened permissions a fair bit more, so that’s causing this issue.
Is that fatal, though, or does it just prevent logging?

It only prevents logging from what I’ve seen. The logging was helpful for some of the troubleshooting I’m doing, but I can just mkdir/chown during that.
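
In case it helps anyone else troubleshooting, the mkdir/chown workaround could look roughly like this. It is a sketch only: prepare_hook_dir is a made-up helper name, and the UID in the commented example is a placeholder for whatever host UID the container’s root is mapped to on your system.

```shell
#!/bin/sh
# prepare_hook_dir CONTAINER_DIR OWNER_UID
# Create CONTAINER_DIR/hook and hand it to OWNER_UID so the hook,
# which runs as the container's root UID, can write its log file there.
# (Hypothetical helper for troubleshooting, not an Incus command.)
prepare_hook_dir() {
    hook_dir=$1/hook
    mkdir -p "$hook_dir" || return 1
    chown "$2" "$hook_dir" || return 1
}

# Example, run as root on the host. The path is the one from the post above;
# 1000000 is a placeholder UID, check your container's idmap for the real value.
# prepare_hook_dir /var/lib/incus/storage-pools/default/containers/noble-molly 1000000
```

This avoids opening the directory up to everyone, since only the mapped root UID needs write access.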

@stgraber @adamcstephens I am trying to set up a container on another host. If I specify nvidia.runtime: "true", the container doesn’t start.

$ incus start dockerblr
Error: Failed to run: /nix/store/2ypj6mwrs14wzwf18avqx0nm5n8r41vg-incus-6.11.0/bin/incusd forkstart dockerblr /var/lib/incus/containers /run/incus/dockerblr/lxc.conf: exit status 1
Try `incus info --show-log dockerblr` for more info

$ incus info --show-log dockerblr
Error: Invalid PID 'ļæ½'

My Incus is set up as follows:

  #incus
  virtualisation.incus.package = pkgs.incus;
  virtualisation.incus.enable = true;
  systemd.services.incus.environment.INCUS_LXC_HOOK =
    "${config.virtualisation.incus.lxcPackage}/share/lxc/hooks";

Once I remove nvidia.runtime, the container starts up fine.

Sorry, I don’t have the bandwidth to look into this further right now. I don’t use this feature, and it’s difficult or impossible for us to write NixOS tests for it given the hardware requirement.

I’d invite you to file an issue on the nixpkgs repo to track the problem, preferably with any more detail you can provide. Unfortunately, unless you’re willing/able to do the deep investigation yourself, I suspect little progress will be made.