trumee
January 12, 2025, 4:43am
1
Hello,
I am trying to use an NVIDIA GPU in an Incus container, with NixOS as the host. If I try to set nvidia.runtime: true,
I get an error:
Config parsing error: Initialize LXC: The NVIDIA LXC hook couldn't be found
Press enter to open the editor again or ctrl+c to abort change
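For reference, I am applying the setting through the config editor, roughly like this (the container name c1 is just an example):
$ incus config edit c1
# in the editor, under config:, add:
#   nvidia.runtime: "true"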
I have enabled hardware.nvidia-container-toolkit.enable in NixOS.
My NVIDIA Nix config is as follows:
{ config, pkgs, ... }:
{
  # Nvidia specific
  nixpkgs.config.allowUnfree = true;

  environment.systemPackages = with pkgs; [
    # cudaPackages_12.cudatoolkit
  ];

  # Some programs need SUID wrappers, can be configured further or are started in user sessions.

  # REGION NVIDIA / CUDA
  # Enable OpenGL
  hardware.graphics = {
    enable = true;
  };
  hardware.graphics.enable32Bit = true;

  hardware.nvidia-container-toolkit.enable = true;

  # Load nvidia driver for Xorg and Wayland
  services.xserver.videoDrivers = [ "nvidia" ];

  # see https://nixos.wiki/wiki/Nvidia#CUDA_and_using_your_GPU_for_compute
  hardware.nvidia = {
    # Modesetting is required.
    modesetting.enable = true;

    # Nvidia power management. Experimental, and can cause sleep/suspend to fail.
    powerManagement.enable = true;
    powerManagement.finegrained = false;

    open = false;

    # Enable the Nvidia settings menu,
    # accessible via `nvidia-settings`.
    nvidiaSettings = true;

    package = config.boot.kernelPackages.nvidiaPackages.production;
  };
  # ENDREGION
}
The configuration.nix has the following for Incus:
virtualisation.incus.package = pkgs.incus;
virtualisation.incus.enable = true;
networking.nftables.enable = true;
systemd.services.incus.path = [ pkgs.libnvidia-container ];
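As a quick host-side sanity check (a sketch, assuming flakes are enabled), nvidia-container-cli from that same libnvidia-container package can be asked to talk to the driver:
$ nix shell nixpkgs#libnvidia-container --command nvidia-container-cli info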
Any idea how to fix this?
Is the guest nixos, or what distro?
trumee
January 12, 2025, 3:50pm
4
The guest is Ubuntu 24.04 LTS and the host is NixOS 24.11.
Looks like we may be missing something for supporting this on non-NixOS guests. The following seems to avoid the initial error. Can you try it and report back if everything works as expected? I'll add it to the incus module if so.
systemd.services.incus.environment.INCUS_LXC_HOOK = "${config.virtualisation.incus.lxcPackage}/share/lxc/hooks";
Adding libnvidia-container to the path is unnecessary, as it is already in the service path.
I'd not seen hardware.nvidia-container-toolkit before. I'd be curious to know if this is required for the incus nvidia integration to work, or not. Mind trying both?
trumee
January 13, 2025, 4:25am
7
The hook change got me past the error. Can that be made the default?
The following failed, though:
$ incus launch images:ubuntu/24.04 c1
Launching c1
$ incus config device add c1 gpu gpu id=0
Device gpu added to c1
$ incus config set c1 nvidia.driver.capabilities=all nvidia.runtime="true"
$ incus exec c1 -- nvidia-smi
Error: Command not found
I installed nvidia-utils-550 inside the container, and then nvidia-smi started to work.
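Roughly, that was just the stock Ubuntu driver userspace package (a sketch; the exact package name depends on the driver branch in use):
$ incus exec c1 -- apt-get update
$ incus exec c1 -- apt-get install -y nvidia-utils-550
$ incus exec c1 -- nvidia-smi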
My plan is to run Docker inside the Incus container. For Docker to pick up the GPU, I had to install nvidia-container-toolkit following this.
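Roughly, that boils down to the following inside the container (a sketch; it assumes NVIDIA's apt repository has already been added as per their install docs):
# apt-get install -y nvidia-container-toolkit
# nvidia-ctk runtime configure --runtime=docker
# systemctl restart docker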
In addition, the following things were also required:
fix-gpu-passthrough.service
# cat /etc/systemd/system/fix-gpu-passthrough.service
[Unit]
Description=Creates Symlink required for LXC/Nvidia to Docker passthrough
Before=docker.service
[Service]
User=root
Group=root
ExecStart=/bin/bash -c 'mkdir -p /proc/driver/nvidia/gpus && ln -s /dev/nvidia0 /proc/driver/nvidia/gpus/0000:02:00.0'
Type=oneshot
[Install]
WantedBy=multi-user.target
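With the unit saved at /etc/systemd/system/fix-gpu-passthrough.service as shown, it is enabled like any other oneshot:
# systemctl daemon-reload
# systemctl enable --now fix-gpu-passthrough.service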
Fix /etc/nvidia-container-runtime/config.toml
# cat /etc/nvidia-container-runtime/config.toml
disable-require = false
[nvidia-container-cli]
environment = []
ldconfig = "@/sbin/ldconfig.real"
load-kmods = true
no-cgroups = true
[nvidia-container-runtime]
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
With these changes I am able to use the GPU in a Docker container inside an Incus container on a NixOS host.
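A quick end-to-end check from inside the Incus container (a sketch; the image tag is just an example):
$ docker run --rm --gpus all ubuntu:24.04 nvidia-smi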
Yeah, I’ll get the hook env var added as default.
The other two files are inside the incus container?
trumee
January 14, 2025, 3:54am
9
Yes, steps 1 and 2 were inside the container.
What surprised me was that I had to install nvidia-utils-550 inside the container to get nvidia-smi working. I thought nvidia.runtime would expose it automatically, but that wasn't the case here.
trumee
January 21, 2025, 3:47am
10
@adamcstephens I came across this snippet on the internet for using NVIDIA with LXD. Can it be adapted for Incus?
let
  libnvidia-container = pkgs.callPackage "${inputs.nixpkgs}/pkgs/by-name/li/libnvidia-container/package.nix" { };
in {
  systemd.services.lxd = {
    environment = let
      path = pkgs.lib.makeBinPath (with pkgs; [ which libnvidia-container util-linux ]);
      hook =
        (pkgs.srcOnly {
          name = "lxc-hooks";
          src = "${pkgs.lxc}/share/lxc/hooks";
          nativeBuildInputs = [ pkgs.makeWrapper ];
        }).overrideAttrs (oldAttrs: {
          installPhase = ''
            ${oldAttrs.installPhase}
            wrapProgram $out/nvidia --prefix PATH : ${path}
          '';
        });
    in {
      LXD_LXC_HOOK = "${hook}";
    };
  };

  virtualisation.lxd = {
    enable = true;
    ui.enable = true;
    # This turns on a few sysctl settings that the LXD documentation recommends
    # for running in production.
    recommendedSysctlSettings = true;
    package = pkgs.lxd-lts.override {
      lxd-unwrapped-lts = pkgs.lxd-unwrapped-lts.overrideAttrs (oldAttrs: {
        postPatch = ''
          ${oldAttrs.postPatch}
          substituteInPlace lxd/instance/drivers/driver_lxc.go \
            --replace "nvidia-container-cli" "${libnvidia-container}/bin/nvidia-container-cli"
        '';
      });
    };
  };
}
I’m not seeing how that will improve anything. We’re already putting libnvidia-container in the path for incus.
I thought you got it working, is that not the case?
trumee
January 21, 2025, 4:51pm
12
Yes, I did get it working.
However, I had to install nvidia-utils-550 inside the container to get nvidia-smi. This video from @stgraber shows that with just nvidia.runtime set we should have nvidia-smi inside the container. So I guess Incus on NixOS is not setting it up correctly? Not sure if this is the reason.
I'm not sure where stgraber got nvidia-smi from, but using the NixOS host's nvidia-smi isn't going to work as desired even if you copied it:
✗ ldd $(which nvidia-smi)
linux-vdso.so.1 (0x00007f394ddcc000)
libpthread.so.0 => /nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/libpthread.so.0 (0x00007f394ddc1000)
libm.so.6 => /nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/libm.so.6 (0x00007f394dcda000)
libdl.so.2 => /nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/libdl.so.2 (0x00007f394dcd5000)
libc.so.6 => /nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/libc.so.6 (0x00007f394dada000)
librt.so.1 => /nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/librt.so.1 (0x00007f394dad5000)
/nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/ld-linux-x86-64.so.2 => /nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib64/ld-linux-x86-64.so.2 (0x00007f394ddce000)
trumee
January 29, 2025, 4:11am
14
@stgraber How does nvidia-smi get exposed in the container?
stgraber
(Stéphane Graber)
January 29, 2025, 3:59pm
15
It’s bind-mounted into place by the nvidia LXC hook.
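The hook relies on nvidia-container-cli from libnvidia-container to work out which host files (driver libraries plus utilities such as nvidia-smi) to map in. You can preview roughly what it would mount, assuming nvidia-container-cli is on the host PATH:
$ nvidia-container-cli list --binaries --libraries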
trumee
January 29, 2025, 5:16pm
16
@adamcstephens See @stgraber's reply above; how can we make the NVIDIA LXC hook expose nvidia-smi on NixOS?
I copied nvidia-smi from the host to the container, but it doesn't work even though the dependencies seem to be satisfied.
$ incus file push $(which nvidia-smi) c1/root/
$ incus exec c1 -- ldd /root/nvidia-smi
linux-vdso.so.1 (0x00007fc327ffe000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc327ff0000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc327f07000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc327f02000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc327cee000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fc327ce9000)
/nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007fc328000000)
$ incus exec c1 -- /root/nvidia-smi
Error: Command not found
Ahh, I only looked at incus and didn't see anything about nvidia-smi. We can try patching or wrapping the LXC hook.
Command not found can have multiple meanings, one of which is missing dynamic libraries. If you get a shell (exec bash), do you see the same error when trying to run the manually copied nvidia-smi?
trumee
January 29, 2025, 6:45pm
18
This is what I get:
$ incus exec c1 bash
root@c1:~# /root/nvidia-smi
bash: /root/nvidia-smi: cannot execute: required file not found
root@c1:~# ldd /root/nvidia-smi
linux-vdso.so.1 (0x00007f029438c000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f029437e000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0294295000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f0294290000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f029407c000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f0294077000)
/nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007f029438e000)
root@c1:~# file /root/nvidia-smi
/root/nvidia-smi: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/nqb2ns2d1lahnd5ncwmn6k84qfd7vx2k-glibc-2.40-36/lib/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=ee25fbd45a994e3ca42bf4186574808865915235, stripped
Then I strongly suspect that even if we do fix bind-mounting nvidia-smi into the container, it still won't work. Could be a glibc incompatibility or something. The downside of dynamically linked libraries.
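The "required file not found" almost certainly refers to the ELF interpreter baked into the binary (the /nix/store/.../ld-linux-x86-64.so.2 path in the file output above), which doesn't exist in the Ubuntu rootfs. As a purely hypothetical experiment, assuming patchelf is installed in the container, one could repoint it, though the binary's RPATH would still reference /nix/store libraries, so it would likely still not run:
root@c1:~# patchelf --set-interpreter /lib64/ld-linux-x86-64.so.2 /root/nvidia-smi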
trumee
January 29, 2025, 6:52pm
20
Is this unique to NixOS? I had no such issue with ArchLinux as the host and Ubuntu as the container.
As you can see from your ldd output, where it's looking for /nix/store..., we do linking differently on NixOS, yes. I'll still try and fix the hook so it can find nvidia-smi anyway, but I'm not hopeful it'll do what you want.
But this is only one minor part of the integration. Does the GPU work as intended, besides the missing nvidia-smi CLI tool?