Issue with IPvlan and static IPv6

Hello,

I'm configuring a new LXD server where some instances will have a public IPv6 address assigned. From my research there are quite a few options, but as far as I understand each of them needs special setup. In one of my attempts I used ipvlan, as this seems like the simplest way to assign a static IPv6 address. At first it all seemed to work, but when I restart an instance I get the following log:

Name: backup
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2023/03/03 16:57 AWST
Last Used: 2023/03/06 13:52 AWST

Log:

lxc backup 20230306055236.172 WARN     cgfsng - ../lxc-5.0.2/src/lxc/cgroups/cgfsng.c:__cgroup_tree_create:748 - File exists - Creating the final cgroup 10(lxc.monitor.backup) failed
lxc backup 20230306055236.172 WARN     cgfsng - ../lxc-5.0.2/src/lxc/cgroups/cgfsng.c:cgroup_tree_create:808 - File exists - Failed to create monitor cgroup 10(lxc.monitor.backup)
lxc backup 20230306055236.269 WARN     cgfsng - ../lxc-5.0.2/src/lxc/cgroups/cgfsng.c:__cgroup_tree_create:748 - File exists - Creating the final cgroup 10(lxc.payload.backup) failed
lxc backup 20230306055236.269 WARN     cgfsng - ../lxc-5.0.2/src/lxc/cgroups/cgfsng.c:cgroup_tree_create:808 - File exists - Failed to create payload cgroup 10(lxc.payload.backup)
lxc backup 20230306055236.275 WARN     cgfsng - ../lxc-5.0.2/src/lxc/cgroups/cgfsng.c:fchowmodat:1619 - No such file or directory - Failed to fchownat(44, memory.oom.group, 65536, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc backup 20230306055236.275 WARN     cgfsng - ../lxc-5.0.2/src/lxc/cgroups/cgfsng.c:fchowmodat:1619 - No such file or directory - Failed to fchownat(44, memory.reclaim, 65536, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc backup 20230306055236.276 ERROR    network - ../lxc-5.0.2/src/lxc/network.c:lxc_setup_l2proxy:3304 - File exists - Failed to add ipv6 dest "2a01:xxx:yyy:533a::200" for network device "lo"
lxc backup 20230306055236.276 ERROR    network - ../lxc-5.0.2/src/lxc/network.c:lxc_create_network_priv:3423 - File exists - Failed to setup l2proxy
lxc backup 20230306055236.276 ERROR    start - ../lxc-5.0.2/src/lxc/start.c:lxc_spawn:1840 - Failed to create the network
lxc backup 20230306055236.277 ERROR    lxccontainer - ../lxc-5.0.2/src/lxc/lxccontainer.c:wait_on_daemonized_start:878 - Received container state "ABORTING" instead of "RUNNING"
lxc backup 20230306055236.277 ERROR    start - ../lxc-5.0.2/src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "backup"
lxc backup 20230306055236.277 WARN     start - ../lxc-5.0.2/src/lxc/start.c:lxc_abort:1036 - No such process - Failed to send SIGKILL via pidfd 45 for process 15979
lxc backup 20230306055241.402 WARN     cgfsng - ../lxc-5.0.2/src/lxc/cgroups/cgfsng.c:cgroup_tree_remove:490 - No such file or directory - Failed to destroy 23(lxc.payload.backup-1)
lxc 20230306055241.609 ERROR    af_unix - ../lxc-5.0.2/src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20230306055241.609 ERROR    commands - ../lxc-5.0.2/src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_state"

The steps are as follows:

  • create a container (lxc launch …)
  • verify that everything works
  • stop the container
  • edit the config to add a second device:
    eth1:
      ipv6.address: 2a01:xxx:yyy:533a::200
      nictype: ipvlan
      parent: eth0
      type: nic
  • start the container with the new config
  • the public IP can be pinged from different locations
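For reference, a device like the eth1 entry above can also be attached without hand-editing the config, using `lxc config device add <instance> <device> nic key=value...`. The minimal sketch below only builds the argument list, so it can be printed or passed to `subprocess.run`. The helper name is hypothetical, and 2001:db8::200 is a documentation placeholder standing in for the elided public address.

```python
import shlex
from typing import List


def ipvlan_device_add_argv(instance: str, device: str, parent: str, ipv6: str) -> List[str]:
    """Build argv for `lxc config device add` attaching an ipvlan NIC.

    Mirrors the eth1 entry from the edited config: type nic,
    nictype ipvlan, a parent host interface and a static IPv6 address.
    """
    return [
        "lxc", "config", "device", "add", instance, device, "nic",
        "nictype=ipvlan",
        f"parent={parent}",
        f"ipv6.address={ipv6}",
    ]


# Hypothetical usage: print the command instead of running it.
# 2001:db8::200 is a placeholder, not the real (elided) address.
argv = ipvlan_device_add_argv("backup", "eth1", "eth0", "2001:db8::200")
print(shlex.join(argv))
```

Building a list rather than a shell string avoids any quoting issues if the command is later run via `subprocess.run(argv, check=True)`.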

So far everything works fine, with no problems up to this point.

  • stop the container
  • review some settings (no changes to the container config)
  • start the container again

    lxc start backup
    Error: Failed to run: /usr/sbin/lxd forkstart backup /var/lib/lxd/containers /var/log/lxd/backup/lxc.conf: exit status 1
    Try lxc info --show-log backup for more info

lxc info --show-log reports the log above. From this point on, the instance can’t be started again until I either reboot the host or remove the ipvlan configuration.
Another side effect I noticed is that other containers also fail to start, and/or ‘lxc ls’ sometimes hangs.
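To check whether the failure reproduces reliably, the stop/start sequence described above can be scripted. This is a minimal sketch assuming the `lxc` client is on PATH; the helper names are hypothetical, and the command runner is a parameter so the loop can be exercised without a real LXD host.

```python
import subprocess
from typing import Callable, List, Optional

# Hypothetical runner type: takes an argv list, returns the exit status.
Runner = Callable[[List[str]], int]


def run_cmd(argv: List[str]) -> int:
    """Run a command without a shell and return its exit status."""
    return subprocess.run(argv).returncode


def restart_until_failure(instance: str, cycles: int, run: Runner = run_cmd) -> Optional[int]:
    """Stop and start `instance` up to `cycles` times.

    Returns the 1-based cycle whose `lxc start` failed, or None if
    every start succeeded. The exit status of `lxc stop` is ignored,
    since the instance may already be stopped.
    """
    for cycle in range(1, cycles + 1):
        run(["lxc", "stop", instance])
        if run(["lxc", "start", instance]) != 0:
            # Dump the LXC log, which is where the l2proxy error appears.
            run(["lxc", "info", "--show-log", instance])
            return cycle
    return None
```

For example, `restart_until_failure("backup", cycles=3)` would return 2 if the second start fails with the error shown in the log above.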

Looking at the log, I wonder what kind of file hasn’t been cleaned up and where it is located. Any hints or pointers on where to start looking are much appreciated.
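One possible pointer: "File exists" is the standard strerror() text for EEXIST, which netlink also returns when asked to add an entry that is already present (the familiar "RTNETLINK answers: File exists"). So the failing lxc_setup_l2proxy line may not refer to a file on disk at all. The sketch below assumes, without confirmation from the log, that the leftover entry is a /128 route for the container's address on lo, and scans `ip -6 route show table all dev lo` output for it. The helper names are hypothetical and 2001:db8::200 is a placeholder for the elided address.

```python
import errno
import ipaddress
import os
from typing import List

# Route-type keywords that `ip -6 route show table all` may print before the destination.
ROUTE_TYPES = {"unicast", "local", "anycast", "multicast", "broadcast",
               "unreachable", "blackhole", "prohibit", "throw"}


def eexist_text() -> str:
    """The message printed for EEXIST, e.g. after a netlink add request."""
    return os.strerror(errno.EEXIST)


def host_routes_for(route_output: str, address: str) -> List[str]:
    """Return lines of `ip -6 route` output that are /128 routes for `address`."""
    target = ipaddress.IPv6Address(address)
    matches = []
    for line in route_output.splitlines():
        fields = line.split()
        if fields and fields[0] in ROUTE_TYPES:
            fields = fields[1:]  # skip the optional route type keyword
        if not fields:
            continue
        try:
            # A bare address (no /prefix) parses as a /128 host route.
            net = ipaddress.IPv6Network(fields[0], strict=False)
        except ValueError:
            continue  # e.g. "default"
        if net.prefixlen == 128 and net.network_address == target:
            matches.append(line.strip())
    return matches
```

On the host, the input could come from `subprocess.run(["ip", "-6", "route", "show", "table", "all", "dev", "lo"], capture_output=True, text=True).stdout`. If nothing matches, `ip -6 neigh show proxy` lists proxy neighbour entries, another kind of entry that could plausibly be left behind.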

Thanks

LXD version?

Thought that the logs would tell which LXD version :wink:

The system is a fresh Gentoo build (OpenRC) running LXD 5.0.2, and the same goes for the container.

I spent a moment checking the LXC code, and it seems to me that it fails when calling netlink_open?
Probably it wasn’t cleaned up correctly during container shutdown?

Happy to perform some more testing or test builds if required.

Are you able to try LXD 5.12 (https://github.com/lxc/lxd/tree/lxd-5.12)? It includes some changes to the ipvlan NIC.

Be aware that upgrading your current LXD install to 5.12 means you cannot downgrade back to the LXD 5.0.x series, so it’s worth trying on a separate (virtual?) machine for testing.

Hello Tomp,

I’ll give this a try as soon as I have some cycles.

I need to set up a new VM anyway, so testing the changes will be part of that.

Thanks