Second node in LXD cluster: all lxc commands hang after reboot

Hi, I'm looking for suggestions on where to start investigating why all lxc commands hang on the second node of an LXD cluster after that node is rebooted. Everything is fine when the cluster is first built, but after a reboot, commands such as lxc cluster list, lxc list, and lxc storage list all hang on the second node.

[ubuntu@o83sv2 ~]$ lxc cluster list
^C
[ubuntu@o83sv2 ~]$ lxc --version
4.13

[ubuntu@o83sv2 ~]$ sudo snap list
Name    Version   Rev    Tracking       Publisher   Notes
core18  20210309  1997   latest/stable  canonical✓  base
lxd     4.13      20222  latest/stable  canonical✓  -
snapd   2.49.2    11588  latest/stable  canonical✓  snapd
[ubuntu@o83sv2 ~]$ cat /etc/oracle-release
Oracle Linux Server release 8.3
[ubuntu@o83sv2 ~]$

On the first node, with the database on it, everything runs fine:
>
> [ubuntu@o83sv1 ~]$ lxc cluster list
> +--------+----------------------------+----------+--------+-------------------+--------------+----------------+
> |  NAME  |            URL             | DATABASE | STATE  |      MESSAGE      | ARCHITECTURE | FAILURE DOMAIN |
> +--------+----------------------------+----------+--------+-------------------+--------------+----------------+
> | o83sv1 | https://10.209.53.1:8443   | YES      | ONLINE | Fully operational | x86_64       | default        |
> +--------+----------------------------+----------+--------+-------------------+--------------+----------------+
> | o83sv2 | https://10.209.53.201:8443 | NO       | ONLINE | Fully operational | x86_64       | default        |
> +--------+----------------------------+----------+--------+-------------------+--------------+----------------+
> [ubuntu@o83sv1 ~]$ lxc storage list
> +-------+--------+-----------------+---------+---------+
> | NAME  | DRIVER |   DESCRIPTION   | USED BY |  STATE  |
> +-------+--------+-----------------+---------+---------+
> | local | zfs    | o83sv1-olxc-001 | 9       | CREATED |
> +-------+--------+-----------------+---------+---------+
> [ubuntu@o83sv1 ~]$

However, even on node 1, the "lxc list" command takes a very long time to return results, since the cluster-wide query has to wait on the unresponsive node 2.

Thanks

Sounds like the second node is failing to reach the first one and connect to the database. Anything useful in lxd.log on the second node?
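For reference, with the snap package the daemon log lives under the snap's common directory, so on the second node you can check it like this (paths assume a default snap install):

    # Tail the LXD daemon log for clustering/database errors
    sudo tail -f /var/snap/lxd/common/lxd/logs/lxd.log

    # Or follow the daemon's output via the journal
    sudo journalctl -u snap.lxd.daemon -f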

Thanks for the reply!

The issue was an MTU setting on the Open vSwitch bridges. Because the clustering traffic in this case goes over [gre|geneve|vxlan] tunnel ports on those bridges, the bridges carrying the tunnel ports needed a reduced MTU to leave room for the encapsulation overhead. The deployment software has been updated to include the required MTU setting, and with this change everything is working.
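For anyone hitting the same problem, a minimal sketch of the kind of change involved, assuming a tunnel over a standard 1500-byte underlay and an OVS interface named br-tun (the interface name and the exact MTU value are illustrative; the right value depends on which encapsulation you use):

    # Encapsulation adds per-packet overhead (e.g. roughly 50 bytes for VXLAAN
    # over IPv4), so interfaces carried over the tunnel need a smaller MTU.
    # "br-tun" and 1450 are illustrative values, not from the original setup.
    sudo ovs-vsctl set interface br-tun mtu_request=1450

    # Verify the MTU took effect
    ip link show br-tun

The symptom fits: small packets (TLS handshakes, heartbeats) get through, so the node shows as ONLINE, while larger database responses are silently dropped, leaving clients hanging.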