New containers won't start: No such process - Failed to move network device


Suddenly I can't create new containers (or at least they won't start or come online). This has worked before.

This is how it looks when I try to start them:

$ lxc launch ubuntu:focal tunis1:c1 
Creating c1
Starting c1
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart c1 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/c1/lxc.conf: exit status 1
Try `lxc info --show-log tunis1:c1` for more info
$ lxc info --show-log tunis1:c1
Name: c1
Type: container
Architecture: x86_64
Created: 2023/05/08 14:12 CEST
Last Used: 2023/05/08 14:12 CEST


lxc c1 20230508121234.787 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3621 - newuidmap binary is missing
lxc c1 20230508121234.787 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3627 - newgidmap binary is missing
lxc c1 20230508121234.788 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3621 - newuidmap binary is missing
lxc c1 20230508121234.788 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3627 - newgidmap binary is missing
lxc c1 20230508121234.788 WARN     cgfsng - ../src/src/lxc/cgroups/cgfsng.c:fchowmodat:1619 - No such file or directory - Failed to fchownat(40,, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc c1 20230508121234.791 WARN     start - ../src/src/lxc/start.c:lxc_spawn:1832 - Bad file descriptor - Failed to allocate new network namespace id
lxc c1 20230508121234.797 ERROR    network - ../src/src/lxc/network.c:lxc_network_move_created_netdev_priv:3549 - No such process - Failed to move network device "veth685074f6" with ifindex 188 to network namespace 3348002 and rename to physYIbNno
lxc c1 20230508121234.798 ERROR    start - ../src/src/lxc/start.c:lxc_spawn:1840 - Failed to create the network
lxc c1 20230508121234.801 ERROR    lxccontainer - ../src/src/lxc/lxccontainer.c:wait_on_daemonized_start:878 - Received container state "ABORTING" instead of "RUNNING"
lxc c1 20230508121234.801 ERROR    start - ../src/src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "c1"
lxc c1 20230508121234.801 WARN     start - ../src/src/lxc/start.c:lxc_abort:1036 - No such process - Failed to send SIGKILL via pidfd 41 for process 3348002
lxc 20230508121234.821 ERROR    af_unix - ../src/src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20230508121234.821 ERROR    commands - ../src/src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

The container seems to have been partially created, but something prevents new containers from starting:

$ lxc list tunis1:
|     NAME      |  STATE  |            IPV4            |                         IPV6                         |   TYPE    | SNAPSHOTS |
| c1            | STOPPED |                            |                                                      | CONTAINER | 0         |
| juju-3eef05-0 | RUNNING | (eth0)      |                                                      | CONTAINER | 0         |
| juju-f2d3e9-0 | STOPPED |                            |                                                      | CONTAINER | 0         |
| juju-f2d3e9-1 | STOPPED |                            |                                                      | CONTAINER | 0         |

What's going on? How can I fix this?

$ lxc version tunis1:
Client version: 5.13
Server version: 5.13


This looks familiar. Are you by any chance running Mullvad VPN? If so, please take a look at HELP ! HELP ! HELP ! Cgroup2 related issue on ubuntu jammy with mullvad and PrivateInternetAccess VPN - #14 by tomp.

Otherwise, could you please start LXD in debug mode and share the logs? You can enable debug mode with `snap set lxd daemon.debug=true`.
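In case the logs are hard to find, this is roughly where to look on a snap-based install (the reload step and the log path are assumptions based on the standard snap layout, not something verified on your system):

```
$ snap set lxd daemon.debug=true
$ systemctl reload snap.lxd.daemon
$ tail -f /var/snap/lxd/common/lxd/logs/lxd.log
```

Running `lxc monitor --pretty` in another terminal while reproducing the failed launch should also show the debug output live.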


Hey @monstermunchkin,

Thanks for answering! No VPN is used. We tried to enable debug mode on LXD, but we don't see any difference in the logs. Maybe we are looking in the wrong place.

One thing that might matter for troubleshooting: in netplan we have configured bonding between two interfaces (active-backup) and a bridge on top of that bond, which LXD uses. We also have a failover solution using two routers with VRRP configured.
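Roughly, the netplan side looks like this sketch (the interface names, bridge name, and DHCP setting are placeholders, not our exact config):

```yaml
network:
  version: 2
  ethernets:
    eno1: {}
    eno2: {}
  bonds:
    bond0:
      interfaces: [eno1, eno2]
      parameters:
        mode: active-backup
        primary: eno1
  bridges:
    br0:
      interfaces: [bond0]
      dhcp4: true
```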

When this problem occurred we noticed that the backup router was active instead of the primary one. We have since restored the primary router as active. I don't think that's the problem, but it's worth mentioning.

Anyone got an idea on what the problem might be?


Found the problem!
It was a limits value in the default profile that had been set incorrectly. This was why the containers wouldn't start: they only had 16 B or 16 KB of RAM.
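For anyone hitting the same symptom, checking and correcting the profile looks roughly like this (the `limits.memory` key and the `4GiB` value are illustrative; our actual misconfigured key and value may have differed):

```
$ lxc profile show default
$ lxc profile set default limits.memory 4GiB
$ lxc profile unset default limits.memory
```

`unset` removes the limit entirely, so instances fall back to having no memory cap.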

Thanks again @monstermunchkin for your help.