Container does not start with nvidia.runtime: "true"

Hello,

I have an Arch Linux host and a container with a GPU device attached. Unfortunately, the container does not start when nvidia.runtime is set to true.

Here is the log:

$ lxc info --show-log plex
Name: plex
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2019/09/14 18:15 IST
Last Used: 2022/02/16 12:18 IST

Log:

lxc plex 20220216064823.792 WARN     conf - conf.c:lxc_map_ids:3588 - newuidmap binary is missing
lxc plex 20220216064823.792 WARN     conf - conf.c:lxc_map_ids:3594 - newgidmap binary is missing
lxc plex 20220216064823.794 WARN     conf - conf.c:lxc_map_ids:3588 - newuidmap binary is missing
lxc plex 20220216064823.794 WARN     conf - conf.c:lxc_map_ids:3594 - newgidmap binary is missing
lxc plex 20220216064825.298 ERROR    conf - conf.c:run_buffer:321 - Script exited with status 1
lxc plex 20220216064825.298 ERROR    conf - conf.c:lxc_setup:4395 - Failed to run mount hooks
lxc plex 20220216064825.298 ERROR    start - start.c:do_start:1275 - Failed to setup container "plex"
lxc plex 20220216064825.298 ERROR    sync - sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc plex 20220216064825.309 WARN     network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 0 from "eth0" to its initial name "veth88b30092"
lxc plex 20220216064825.310 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:877 - Received container state "ABORTING" instead of "RUNNING"
lxc plex 20220216064825.310 ERROR    start - start.c:__lxc_start:2074 - Failed to spawn container "plex"
lxc plex 20220216064825.310 WARN     start - start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 17 for process 172500
lxc plex 20220216064830.640 WARN     conf - conf.c:lxc_map_ids:3588 - newuidmap binary is missing
lxc plex 20220216064830.641 WARN     conf - conf.c:lxc_map_ids:3594 - newgidmap binary is missing
lxc 20220216064830.759 ERROR    af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220216064830.760 ERROR    commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"
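In LXC logs the first ERROR entry is usually the root cause, and the later ones (do_start, sync_wait, the "ABORTING" state) follow from it. Here that first entry is the mount hook script exiting with status 1. A minimal sketch of a helper that pulls out that first entry, assuming the log line format shown above; the function name `first_lxc_error` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print only the first ERROR entry of an LXC log read on stdin.
# Assumes the log level appears as a standalone "ERROR" field, as in the log above.
first_lxc_error() {
  grep -m 1 -E '[[:space:]]ERROR[[:space:]]'
}

# Usage with the container from this post:
#   lxc info --show-log plex | first_lxc_error
```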

On the host I have the following packages installed:

# pacman -Q|grep nvidia
libnvidia-container 1.8.0-2
libnvidia-container-tools 1.8.0-2
nvidia-container-toolkit 1.8.0-3
nvidia-dkms 510.54-1
nvidia-utils 510.54-1

Any idea what could be the issue?

Could this be related to the thread "Nvidia.runtime enabled containers suddenly not able to start"?
