I am running several GUI-enabled containers, set up by following the famous guide by simos. For almost a year I had no problems running GUI apps or headless CUDA programs, but today none of the nvidia.runtime-enabled containers will start. I have not run a manual apt upgrade recently; perhaps the LXD snap was updated in the background. The output of lxc info --show-log for my container, called cuda, shows the following:
lxc cuda 20220215163906.960 WARN conf - conf.c:lxc_map_ids:3588 - newuidmap binary is missing
lxc cuda 20220215163906.960 WARN conf - conf.c:lxc_map_ids:3594 - newgidmap binary is missing
lxc cuda 20220215163906.966 WARN conf - conf.c:lxc_map_ids:3588 - newuidmap binary is missing
lxc cuda 20220215163906.966 WARN conf - conf.c:lxc_map_ids:3594 - newgidmap binary is missing
lxc cuda 20220215163906.969 WARN cgfsng - cgroups/cgfsng.c:fchowmodat:1251 - No such file or directory - Failed to fchownat(42, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc cuda 20220215163906.227 ERROR conf - conf.c:run_buffer:321 - Script exited with status 1
lxc cuda 20220215163906.227 ERROR conf - conf.c:lxc_setup:4395 - Failed to run mount hooks
lxc cuda 20220215163906.227 ERROR start - start.c:do_start:1275 - Failed to setup container "cuda"
lxc cuda 20220215163906.227 ERROR sync - sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc cuda 20220215163906.231 WARN network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 0 from "eth0" to its initial name "veth5b1a67f5"
lxc cuda 20220215163906.231 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:877 - Received container state "ABORTING" instead of "RUNNING"
lxc cuda 20220215163906.231 ERROR start - start.c:__lxc_start:2074 - Failed to spawn container "cuda"
lxc cuda 20220215163906.231 WARN start - start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 43 for process 20297
lxc cuda 20220215163911.317 WARN conf - conf.c:lxc_map_ids:3588 - newuidmap binary is missing
lxc cuda 20220215163911.317 WARN conf - conf.c:lxc_map_ids:3594 - newgidmap binary is missing
lxc 20220215163911.354 ERROR af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220215163911.354 ERROR commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"
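Since most of that log is WARN noise, here is a minimal Python sketch for reducing it to the ERROR entries. The regex is an assumption based only on the line layout seen in this log (it is not a documented LXC format), and the names LINE_RE, LogEntry and parse_errors are purely illustrative.

```python
import re
from typing import List, NamedTuple, Optional

# Layout assumed from the log lines in this post (not a documented format):
#   lxc [container] TIMESTAMP LEVEL subsystem - file:function:line - message
LINE_RE = re.compile(
    r"^lxc(?: (?P<container>\S+))? "
    r"(?P<timestamp>\d{14}\.\d{3}) "
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<subsystem>\S+) - "
    r"(?P<location>\S+) - "
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    container: Optional[str]  # None when the line carries no container name
    timestamp: str
    level: str
    subsystem: str
    location: str
    message: str


def parse_errors(log_text: str) -> List[LogEntry]:
    """Return only the ERROR entries, in the order they appear."""
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "ERROR":
            entries.append(LogEntry(**match.groupdict()))
    return entries


# A few lines copied from the log above, one entry per line.
SAMPLE = """\
lxc cuda 20220215163906.960 WARN conf - conf.c:lxc_map_ids:3588 - newuidmap binary is missing
lxc cuda 20220215163906.227 ERROR conf - conf.c:run_buffer:321 - Script exited with status 1
lxc cuda 20220215163906.227 ERROR conf - conf.c:lxc_setup:4395 - Failed to run mount hooks
lxc 20220215163911.354 ERROR af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
"""

if __name__ == "__main__":
    for entry in parse_errors(SAMPLE):
        print(entry.location, "-", entry.message)
```

Run against the full log, the first ERROR entries it reports are "Script exited with status 1" and "Failed to run mount hooks", so the start seems to fail while running the mount hooks; the later errors look like consequences of that.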
My LXD version (snap) is 4.23 and the NVIDIA driver is 470.103.01, all running on Ubuntu 20.04.3 with the Linux 5.13.0-28-generic x86_64 kernel.
I would be grateful for any advice on how to fix this.