I am running several GUI-enabled containers following the famous guide by simos. For almost a year I had no problems running GUI apps or headless CUDA programs, but today none of the nvidia.runtime-enabled containers will start. I have not run a manual apt upgrade recently, so perhaps the LXD snap was updated in the background. The error log from lxc info --show-log for my container, called cuda, shows the following:
lxc cuda 20220215163906.960 WARN conf - conf.c:lxc_map_ids:3588 - newuidmap binary is missing
lxc cuda 20220215163906.960 WARN conf - conf.c:lxc_map_ids:3594 - newgidmap binary is missing
lxc cuda 20220215163906.966 WARN conf - conf.c:lxc_map_ids:3588 - newuidmap binary is missing
lxc cuda 20220215163906.966 WARN conf - conf.c:lxc_map_ids:3594 - newgidmap binary is missing
lxc cuda 20220215163906.969 WARN cgfsng - cgroups/cgfsng.c:fchowmodat:1251 - No such file or directory - Failed to fchownat(42, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc cuda 20220215163906.227 ERROR conf - conf.c:run_buffer:321 - Script exited with status 1
lxc cuda 20220215163906.227 ERROR conf - conf.c:lxc_setup:4395 - Failed to run mount hooks
lxc cuda 20220215163906.227 ERROR start - start.c:do_start:1275 - Failed to setup container "cuda"
lxc cuda 20220215163906.227 ERROR sync - sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc cuda 20220215163906.231 WARN network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 0 from "eth0" to its initial name "veth5b1a67f5"
lxc cuda 20220215163906.231 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:877 - Received container state "ABORTING" instead of "RUNNING"
lxc cuda 20220215163906.231 ERROR start - start.c:__lxc_start:2074 - Failed to spawn container "cuda"
lxc cuda 20220215163906.231 WARN start - start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 43 for process 20297
lxc cuda 20220215163911.317 WARN conf - conf.c:lxc_map_ids:3588 - newuidmap binary is missing
lxc cuda 20220215163911.317 WARN conf - conf.c:lxc_map_ids:3594 - newgidmap binary is missing
lxc 20220215163911.354 ERROR af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220215163911.354 ERROR commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"
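Most of the log above is WARN noise, so it may help to look at the ERROR lines alone. Below is a minimal sketch of a filter for that; the function name lxc_errors is hypothetical (not part of LXD), and it simply assumes the log level appears as a space-separated word, as in the output above.

```shell
# Hypothetical helper: keep only ERROR-level lines from an LXC log read on stdin.
# Assumes the level is a standalone word surrounded by spaces, as in the log above.
lxc_errors() {
    grep ' ERROR '
}

# Intended use (requires the lxc client and a container named cuda):
#   lxc info --show-log cuda | lxc_errors
```

Run against the log above, this should leave the mount hook failure ("Failed to run mount hooks") as the first error, which is where the container start appears to break.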
My LXD version (snap) is 4.23 and the NVIDIA driver is 470.103.01, all running on Ubuntu 20.04.3 with the Linux 5.13.0-28-generic x86_64 kernel.
I would be grateful for any advice on how to fix this.