Nvidia nvenc in lxd privileged container


Hi,

I have a running LXD container that mounts NFS inside it. Nesting and privileged are both set to true.
Setting nvidia.runtime to true is mutually exclusive with those two settings.

Can someone tell me how to get a container running with both NFS and NVENC?

Best, Matthias

You should probably mount NFS on the host and then pass the resulting mount point to the container. That way you don’t need a privileged container, which fixes the NVIDIA part.
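As a rough sketch of that suggestion (not a tested recipe): the functions below mount an NFS export on the LXD host and then hand the mount point to the container as a disk device. The export `nfs.example.com:/export/media` is a hypothetical placeholder, the `run` helper and `share_nfs_with_container` function are illustrative names, and the device name `media-files` is simply the one used later in this thread. Write access from an unprivileged container still depends on how UIDs and GIDs map onto the share.

```shell
#!/bin/sh
# Sketch only. Run on the LXD host, not inside the container.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

share_nfs_with_container() {
    container="$1"    # e.g. tdarr-node
    export_src="$2"   # hypothetical, e.g. nfs.example.com:/export/media
    mount_point="$3"  # e.g. /media

    # Mount the NFS export on the host.
    run mount -t nfs "$export_src" "$mount_point"
    # Pass the host mount point into the container as a disk device.
    run lxc config device add "$container" media-files disk \
        source="$mount_point" path="$mount_point"
}

# Example (dry run):
# DRY_RUN=1 share_nfs_with_container tdarr-node nfs.example.com:/export/media /media
```

With this layout the container never performs the NFS mount itself, so it can stay unprivileged and keep nvidia.runtime enabled.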

Is there an alternative to that?
I would like to keep the NFS mounts inside the containers.

You can try a FUSE implementation of NFS: sahlberg/fuse-nfs on GitHub (a FUSE module for NFSv3/4).

I tried it with an unprivileged container config.

nvidia-smi runs and I have a read-only NFS share.

I want a read/write NFS share.

This doesn’t work:

lxc config device add tdarr-node media-files disk source=/media/ path=/media/
lxc config set tdarr-node raw.idmap "both 0 0"

lxc info --show-log tdarr-node
Name: tdarr-node
Status: STOPPED
Type: container
Architecture: x86_64
Location: lxd01
Created: 2024/01/03 04:42 UTC
Last Used: 2024/01/03 05:49 UTC

Log:

lxc tdarr-node 20240103054938.771 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3621 - newuidmap binary is missing
lxc tdarr-node 20240103054938.771 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3627 - newgidmap binary is missing
lxc tdarr-node 20240103054938.775 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3621 - newuidmap binary is missing
lxc tdarr-node 20240103054938.775 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3627 - newgidmap binary is missing
lxc tdarr-node 20240103054938.930 ERROR    conf - ../src/src/lxc/conf.c:run_buffer:322 - Script exited with status 1
lxc tdarr-node 20240103054938.930 ERROR    conf - ../src/src/lxc/conf.c:lxc_setup:4437 - Failed to run mount hooks
lxc tdarr-node 20240103054938.930 ERROR    start - ../src/src/lxc/start.c:do_start:1272 - Failed to setup container "tdarr-node"
lxc tdarr-node 20240103054938.931 ERROR    sync - ../src/src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc tdarr-node 20240103054938.936 WARN     network - ../src/src/lxc/network.c:lxc_delete_network_priv:3631 - Failed to rename interface with index 0 from "eth0" to its initial name "vethc2c14874"
lxc tdarr-node 20240103054938.936 ERROR    lxccontainer - ../src/src/lxc/lxccontainer.c:wait_on_daemonized_start:878 - Received container state "ABORTING" instead of "RUNNING"
lxc tdarr-node 20240103054938.936 ERROR    start - ../src/src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "tdarr-node"
lxc tdarr-node 20240103054938.936 WARN     start - ../src/src/lxc/start.c:lxc_abort:1036 - No such process - Failed to send SIGKILL via pidfd 17 for process 974065
lxc 20240103054939.312 ERROR    af_unix - ../src/src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20240103054939.312 ERROR    commands - ../src/src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

I tried it with shift too:

lxc config device add tdarr-node media-files disk source=/media/ path=/media/ shift=true
Device media-files added to tdarr-node
lxc start tdarr-node
Error: Failed to setup device mount "media-files": idmapping abilities are required but aren't supported on system
Try `lxc info --show-log tdarr-node` for more info

What else can I do?

I think the topic is obsolete.

As long as I can’t get NVIDIA decode up and running, the topic itself doesn’t make sense for my use case :frowning:

Does someone have an idea how to fix this?

apt install libnvidia-decode-535
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libnvidia-compute-535
The following NEW packages will be installed:
  libnvidia-compute-535 libnvidia-decode-535
0 upgraded, 2 newly installed, 0 to remove and 5 not upgraded.
Need to get 1893 kB/42.6 MB of archives.
After this operation, 186 MB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 libnvidia-decode-535 amd64 535.129.03-0ubuntu0.22.04.1 [1893 kB]
Fetched 1893 kB in 1s (1826 kB/s)
(Reading database ... 36344 files and directories currently installed.)
Preparing to unpack .../libnvidia-compute-535_535.129.03-0ubuntu0.22.04.1_amd64.deb ...
Unpacking libnvidia-compute-535:amd64 (535.129.03-0ubuntu0.22.04.1) ...
dpkg: error processing archive /var/cache/apt/archives/libnvidia-compute-535_535.129.03-0ubuntu0.22.04.1_amd64.deb (--unpack):
 unable to make backup link of './usr/lib/x86_64-linux-gnu/libcuda.so.535.129.03' before installing new version: Invalid cross-device link
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
Selecting previously unselected package libnvidia-decode-535:amd64.
Preparing to unpack .../libnvidia-decode-535_535.129.03-0ubuntu0.22.04.1_amd64.deb ...
Unpacking libnvidia-decode-535:amd64 (535.129.03-0ubuntu0.22.04.1) ...
Errors were encountered while processing:
 /var/cache/apt/archives/libnvidia-compute-535_535.129.03-0ubuntu0.22.04.1_amd64.deb
needrestart is being skipped since dpkg has failed
E: Sub-process /usr/bin/dpkg returned an error code (1)

You should never install NVIDIA packages in the container. That’s the whole reason nvidia.runtime exists: libraries and tools must be passed in from the host.

You should get all libraries when setting nvidia.driver.capabilities to all.
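A minimal sketch of that configuration, assuming the container is unprivileged and a GPU device is already attached (nvidia-smi was reported working earlier in the thread). The `run` helper and `enable_nvidia_runtime` function are illustrative names, not part of LXD. The container is restarted at the end so the new settings are applied on start.

```shell
#!/bin/sh
# Sketch only. Run on the LXD host.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enable_nvidia_runtime() {
    container="$1"  # e.g. tdarr-node

    # Pass the host's NVIDIA libraries and tools into the container.
    run lxc config set "$container" nvidia.runtime true
    # Expose every driver capability, including video encode/decode.
    run lxc config set "$container" nvidia.driver.capabilities all
    # Restart so the settings are applied when the container starts.
    run lxc restart "$container"
}

# Example (dry run):
# DRY_RUN=1 enable_nvidia_runtime tdarr-node
```

Any NVIDIA packages previously installed inside the container with apt are best removed first, since the libraries are meant to come from the host.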

That was the fix.
Thank you.
Maybe I’ll come back to the NFS share issue later.

It is working like a charm. Thank you. All issues are resolved.