I haven’t root-caused the problem yet; the title is just speculation. I’m here to get more troubleshooting/debugging instructions.
Setup: 3 identical fresh Vultr Ubuntu 18.04 nodes with snap LXD 4.0, installed a few days ago (after 4.0 was released).
After a fresh install, the cluster is fully operational. I can launch containers on all 3 nodes, and inside each container I can resolve all .lxd domains and connect to all containers.
A few days later, the fan network across nodes breaks:
- Inside each container, only the .lxd domains of containers on the same node resolve. Resolving the .lxd domain of a container on another node blocks for 3~4 seconds and returns an empty result.
- Inside each container, pinging a container on a different node results in `Destination Host Unreachable`. Pinging containers on the same node is fine.
- From the host, `dig name.lxd @240.80.0.1` works fine for containers on the same node; it blocks for 3~4 seconds and returns an empty result for containers on another node.
- `lxc list` works on all nodes and shows all containers running with an IPv4 address assigned.
- `lxc cluster list` works on all nodes and shows all nodes fully operational.
- `lxc exec name -- bash` DOES work from any node to any container.
- Restarting containers does NOT resolve the issue.
- `systemctl reload snap.lxd.daemon` on a single node does NOT resolve the issue.
- `systemctl reload snap.lxd.daemon` on all nodes does NOT resolve the issue.
- `systemctl restart snap.lxd.daemon` on a single node does NOT resolve the issue.
- `systemctl restart snap.lxd.daemon` on 2 nodes DOES resolve connectivity/DNS between those 2 nodes. Connectivity/DNS from/to the third node still gives `Destination Host Unreachable` / an empty result.
- `systemctl restart snap.lxd.daemon` on all nodes DOES resolve the issue completely.
However, after a few hours, the network broke again. The only place I found any hint is syslog, and the only thing that happened within those few hours was a snap refresh:
Apr 16 05:17:01 node0 CRON[6829]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Apr 16 05:52:35 node0 snapd[2100]: storehelpers.go:438: cannot refresh: snap has no updates available: “core”, “core18”
Apr 16 05:52:40 node0 systemd[1]: Reloading.
Apr 16 05:52:41 node0 systemd[1]: Starting Message of the Day…
Apr 16 05:52:41 node0 systemd[1]: Mounting Mount unit for lxd, revision 14611…
Apr 16 05:52:41 node0 systemd[1]: Mounted Mount unit for lxd, revision 14611.
Apr 16 05:52:41 node0 systemd[1]: Closed Socket unix for snap application lxd.daemon.
Apr 16 05:52:41 node0 systemd[1]: Stopping Service for snap application lxd.daemon…
Apr 16 05:52:41 node0 50-motd-news[20321]: * Kubernetes 1.18 GA is now available! See https://microk8s.io for docs or
Apr 16 05:52:41 node0 50-motd-news[20321]: install it with:
Apr 16 05:52:41 node0 50-motd-news[20321]: sudo snap install microk8s --channel=1.18 --classic
Apr 16 05:52:41 node0 50-motd-news[20321]: * Multipass 1.1 adds proxy support for developers behind enterprise
Apr 16 05:52:41 node0 50-motd-news[20321]: firewalls. Rapid prototyping for cloud operations just got easier.
Apr 16 05:52:41 node0 50-motd-news[20321]: https://multipass.run/
Apr 16 05:52:41 node0 systemd[1]: Started Message of the Day.
Apr 16 05:52:42 node0 lxd.daemon[20360]: => Stop reason is: snap refresh
Apr 16 05:52:42 node0 lxd.daemon[20360]: => Stopping LXD
Apr 16 05:52:42 node0 lxd.daemon[25944]: t=2020-04-16T05:52:42+0000 lvl=warn msg=“Dqlite client proxy TLS → Unix: read tcp 10.64.96.0.80:48276->10.64.96.0.36:8443: use of closed network connection”
Apr 16 05:52:43 node0 lxd.daemon[25944]: t=2020-04-16T05:52:43+0000 lvl=warn msg=“Dqlite client proxy TLS → Unix: read tcp 10.64.96.0.80:48282->10.64.96.0.36:8443: use of closed network connection”
Apr 16 05:52:43 node0 lxd.daemon[25944]: t=2020-04-16T05:52:43+0000 lvl=warn msg=“Dqlite server proxy TLS → Unix: read tcp 10.64.96.0.80:8443->10.64.96.0.36:43846: use of closed network connection”
Apr 16 05:52:43 node0 lxd.daemon[25944]: => LXD exited cleanly
Apr 16 05:52:43 node0 systemd[1]: Stopped Service for snap application lxd.daemon.
Apr 16 05:52:44 node0 systemd[1]: Reloading.
Apr 16 05:52:44 node0 kernel: [291235.614260] kauditd_printk_skb: 5 callbacks suppressed
Apr 16 05:52:44 node0 kernel: [291235.614261] audit: type=1400 audit(1587016364.457:516): apparmor=“STATUS” operation=“profile_replace” info=“same as current profile, skipping” profile=“unconfined” name=“/snap/core/8935/usr/lib/snapd/snap-confine” pid=20513 comm=“apparmor_parser”
Apr 16 05:52:44 node0 kernel: [291235.614264] audit: type=1400 audit(1587016364.457:517): apparmor=“STATUS” operation=“profile_replace” info=“same as current profile, skipping” profile=“unconfined” name=“/snap/core/8935/usr/lib/snapd/snap-confine//mount-namespace-capture-helper” pid=20513 comm=“apparmor_parser”
Apr 16 05:52:44 node0 kernel: [291235.687538] audit: type=1400 audit(1587016364.529:518): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name=“snap.lxd.activate” pid=20515 comm=“apparmor_parser”
Apr 16 05:52:44 node0 kernel: [291235.757994] audit: type=1400 audit(1587016364.601:519): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name=“snap.lxd.benchmark” pid=20516 comm=“apparmor_parser”
Apr 16 05:52:44 node0 kernel: [291235.821838] audit: type=1400 audit(1587016364.665:520): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name=“snap.lxd.buginfo” pid=20517 comm=“apparmor_parser”
Apr 16 05:52:44 node0 kernel: [291235.892226] audit: type=1400 audit(1587016364.733:521): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name=“snap.lxd.check-kernel” pid=20518 comm=“apparmor_parser”
Apr 16 05:52:44 node0 kernel: [291235.981815] audit: type=1400 audit(1587016364.825:522): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name=“snap.lxd.daemon” pid=20519 comm=“apparmor_parser”
Apr 16 05:52:44 node0 kernel: [291236.068705] audit: type=1400 audit(1587016364.913:523): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name=“snap.lxd.hook.configure” pid=20520 comm=“apparmor_parser”
Apr 16 05:52:44 node0 kernel: [291236.136342] audit: type=1400 audit(1587016364.981:524): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name=“snap.lxd.hook.install” pid=20521 comm=“apparmor_parser”
Apr 16 05:52:45 node0 kernel: [291236.208660] audit: type=1400 audit(1587016365.053:525): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name=“snap.lxd.hook.remove” pid=20522 comm=“apparmor_parser”
Apr 16 05:52:45 node0 systemd[1]: Reloading.
Apr 16 05:52:45 node0 systemd[1]: Listening on Socket unix for snap application lxd.daemon.
Apr 16 05:52:45 node0 systemd[1]: Starting Service for snap application lxd.activate…
Apr 16 05:52:46 node0 lxd.activate[20557]: => Starting LXD activation
Apr 16 05:52:46 node0 lxd.activate[20557]: ==> Loading snap configuration
Apr 16 05:52:46 node0 lxd.activate[20557]: ==> Checking for socket activation support
Apr 16 05:52:46 node0 lxd.activate[20557]: ==> Setting LXD socket ownership
Apr 16 05:52:46 node0 lxd.activate[20557]: ==> Checking if LXD needs to be activated
Apr 16 05:52:47 node0 systemd[1]: Started Service for snap application lxd.daemon.
Apr 16 05:52:47 node0 lxd.daemon[20616]: => Preparing the system (14611)
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Loading snap configuration
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Setting up mntns symlink (mnt:[4026533088])
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Setting up kmod wrapper
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Preparing /boot
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Preparing a clean copy of /run
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Preparing a clean copy of /etc
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Setting up ceph configuration
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Setting up LVM configuration
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Rotating logs
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Setting up ZFS (0.7)
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Escaping the systemd cgroups
Apr 16 05:52:47 node0 lxd.daemon[20616]: ====> Detected cgroup V1
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Escaping the systemd process resource limits
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Disabling shiftfs on this kernel (auto)
Apr 16 05:52:47 node0 lxd.daemon[20616]: => Re-using existing LXCFS
Apr 16 05:52:47 node0 lxd.daemon[20616]: => Starting LXD
Apr 16 05:52:48 node0 lxd.daemon[20616]: t=2020-04-16T05:52:48+0000 lvl=warn msg=" - Couldn’t find the CGroup memory swap accounting, swap limits will be ignored"
Apr 16 05:52:48 node0 lxd.daemon[20616]: t=2020-04-16T05:52:48+0000 lvl=warn msg=“Dqlite: server unavailable err=failed to establish network connection: 503 Service Unavailable address=10.64.96.0.80:8443 attempt=0”
Apr 16 05:53:04 node0 lxd.daemon[20616]: t=2020-04-16T05:53:04+0000 lvl=warn msg=“Dqlite server proxy Unix → TLS: read unix @->@2222a: use of closed network connection”
Apr 16 05:53:04 node0 lxd.daemon[20616]: t=2020-04-16T05:53:04+0000 lvl=warn msg=“Dqlite client proxy Unix → TLS: read unix @->@2222c: use of closed network connection”
Apr 16 05:53:04 node0 lxd.daemon[20616]: t=2020-04-16T05:53:04+0000 lvl=warn msg=“Dqlite client proxy Unix → TLS: read unix @->@2222b: use of closed network connection”
Apr 16 05:53:07 node0 lxd.daemon[25944]: Closed liblxcfs.so
Apr 16 05:53:07 node0 lxd.daemon[25944]: Running destructor lxcfs_exit
Apr 16 05:53:07 node0 kernel: [291258.699936] new mount options do not match the existing superblock, will be ignored
Apr 16 05:53:07 node0 lxd.daemon[25944]: Running constructor lxcfs_init to reload liblxcfs
Apr 16 05:53:07 node0 lxd.daemon[25944]: mount namespace: 5
Apr 16 05:53:07 node0 lxd.daemon[25944]: hierarchies:
Apr 16 05:53:07 node0 lxd.daemon[25944]: 0: fd: 6:
Apr 16 05:53:07 node0 lxd.daemon[25944]: 1: fd: 7: name=systemd
Apr 16 05:53:07 node0 lxd.daemon[25944]: 2: fd: 8: hugetlb
Apr 16 05:53:07 node0 lxd.daemon[25944]: 3: fd: 9: cpu,cpuacct
Apr 16 05:53:07 node0 lxd.daemon[25944]: 4: fd: 10: blkio
Apr 16 05:53:07 node0 lxd.daemon[25944]: 5: fd: 11: memory
Apr 16 05:53:07 node0 lxd.daemon[25944]: 6: fd: 12: freezer
Apr 16 05:53:07 node0 lxd.daemon[25944]: 7: fd: 13: cpuset
Apr 16 05:53:07 node0 lxd.daemon[25944]: 8: fd: 14: net_cls,net_prio
Apr 16 05:53:07 node0 lxd.daemon[25944]: 9: fd: 15: pids
Apr 16 05:53:07 node0 lxd.daemon[25944]: 10: fd: 16: devices
Apr 16 05:53:07 node0 lxd.daemon[25944]: 11: fd: 17: perf_event
Apr 16 05:53:07 node0 lxd.daemon[25944]: 12: fd: 19: rdma
Apr 16 05:53:07 node0 lxd.daemon[25944]: api_extensions:
Apr 16 05:53:07 node0 lxd.daemon[25944]: - cgroups
Apr 16 05:53:07 node0 lxd.daemon[25944]: - sys_cpu_online
Apr 16 05:53:07 node0 lxd.daemon[25944]: - proc_cpuinfo
Apr 16 05:53:07 node0 lxd.daemon[25944]: - proc_diskstats
Apr 16 05:53:07 node0 lxd.daemon[25944]: - proc_loadavg
Apr 16 05:53:07 node0 lxd.daemon[25944]: - proc_meminfo
Apr 16 05:53:07 node0 lxd.daemon[25944]: - proc_stat
Apr 16 05:53:07 node0 lxd.daemon[25944]: - proc_swaps
Apr 16 05:53:07 node0 lxd.daemon[25944]: - proc_uptime
Apr 16 05:53:07 node0 lxd.daemon[25944]: - shared_pidns
Apr 16 05:53:07 node0 lxd.daemon[25944]: - cpuview_daemon
Apr 16 05:53:07 node0 lxd.daemon[25944]: - loadavg_daemon
Apr 16 05:53:07 node0 lxd.daemon[25944]: - pidfds
Apr 16 05:53:07 node0 lxd.daemon[25944]: Reloaded LXCFS
Apr 16 05:53:15 node0 lxd.daemon[20616]: t=2020-04-16T05:53:15+0000 lvl=warn msg=“Dqlite server proxy Unix → TLS: read unix @->@2222a: use of closed network connection”
Apr 16 05:53:15 node0 lxd.daemon[20616]: t=2020-04-16T05:53:15+0000 lvl=warn msg=“Dqlite server proxy Unix → TLS: read unix @->@2222a: use of closed network connection”
Apr 16 05:53:15 node0 lxd.daemon[20616]: t=2020-04-16T05:53:15+0000 lvl=warn msg=“Dqlite client proxy Unix → TLS: read unix @->@2222d: use of closed network connection”
Apr 16 05:53:33 node0 lxd.daemon[20616]: t=2020-04-16T05:53:33+0000 lvl=warn msg=“Dqlite client proxy TLS → Unix: read tcp 10.64.96.0.80:39928->10.64.96.0.80:8443: use of closed network connection”
Apr 16 05:53:33 node0 lxd.daemon[20616]: t=2020-04-16T05:53:33+0000 lvl=warn msg=“Dqlite server proxy Unix → TLS: read unix @->@2222a: use of closed network connection”
Apr 16 05:53:34 node0 kernel: [291286.118449] lxdfan0: port 1(lxdfan0-mtu) entered disabled state
Apr 16 05:53:34 node0 kernel: [291286.119178] device lxdfan0-mtu left promiscuous mode
Apr 16 05:53:34 node0 kernel: [291286.119180] lxdfan0: port 1(lxdfan0-mtu) entered disabled state
Apr 16 05:53:33 node0 lxd.daemon[20616]: t=2020-04-16T05:53:33+0000 lvl=warn msg=“Dqlite server proxy Unix → TLS: read unix @->@2222a: use of closed network connection”
Apr 16 05:53:34 node0 systemd-networkd[1817]: lxdfan0-mtu: Link DOWN
Apr 16 05:53:34 node0 systemd-networkd[1817]: lxdfan0-mtu: Lost carrier
Apr 16 05:53:34 node0 systemd-timesyncd[451]: Network configuration changed, trying to establish connection.
Apr 16 05:53:34 node0 systemd-networkd[1817]: lxdfan0-fan: Link DOWN
Apr 16 05:53:34 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:34 node0 kernel: [291286.136460] lxdfan0: port 2(lxdfan0-fan) entered disabled state
Apr 16 05:53:34 node0 kernel: [291286.138438] device lxdfan0-fan left promiscuous mode
Apr 16 05:53:34 node0 kernel: [291286.138442] lxdfan0: port 2(lxdfan0-fan) entered disabled state
Apr 16 05:53:34 node0 systemd-networkd[1817]: lxdfan0-fan: Lost carrier
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:34 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:34 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:34 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:35 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:35 node0 kernel: [291286.167932] lxdfan0: port 1(lxdfan0-mtu) entered blocking state
Apr 16 05:53:35 node0 kernel: [291286.167935] lxdfan0: port 1(lxdfan0-mtu) entered disabled state
Apr 16 05:53:35 node0 kernel: [291286.169480] device lxdfan0-mtu entered promiscuous mode
Apr 16 05:53:35 node0 systemd-networkd[1817]: lxdfan0-mtu: Link UP
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:35 node0 systemd-networkd[1817]: lxdfan0-mtu: Gained carrier
Apr 16 05:53:35 node0 kernel: [291286.172904] lxdfan0: port 1(lxdfan0-mtu) entered blocking state
Apr 16 05:53:35 node0 kernel: [291286.172906] lxdfan0: port 1(lxdfan0-mtu) entered forwarding state
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:35 node0 systemd-networkd[1817]: lxdfan0-mtu: Gained IPv6LL
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:35 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:35 node0 lxd.daemon[20616]: t=2020-04-16T05:53:35+0000 lvl=eror msg=“Failed to bring up network” err=“Failed to run: ip link set dev lxdfan0 mtu 1450: RTNETLINK answers: Invalid argument” name=lxdfan0
Apr 16 05:53:35 node0 systemd-timesyncd[451]: Synchronized to time server 108.61.73.244:123 (2.time.constant.com).
Apr 16 05:53:35 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:35 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:35 node0 networkd-dispatcher[513]: WARNING:Unknown index 63 seen, reloading interface list
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:35 node0 systemd-udevd[20955]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Apr 16 05:53:35 node0 systemd-udevd[20955]: Could not generate persistent MAC address for lxdfan0-mtu: No such file or directory
Apr 16 05:53:35 node0 systemd-timesyncd[451]: Network configuration changed, trying to establish connection.
Apr 16 05:53:35 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:35 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:35 node0 systemd-timesyncd[451]: Synchronized to time server 108.61.73.244:123 (2.time.constant.com).
Apr 16 05:53:35 node0 systemd[1]: Started Service for snap application lxd.activate.
Apr 16 05:53:35 node0 systemd[1]: Reloading.
Apr 16 05:53:35 node0 lxd.daemon[20616]: => LXD is ready
Apr 16 05:53:37 node0 snapd[2100]: storehelpers.go:438: cannot refresh snap “lxd”: snap has no updates available
Around the same time, I see the following logs on the other two nodes:
Apr 15 05:55:18 node1 systemd-resolved[486]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Apr 15 05:55:18 node1 systemd-resolved[486]: message repeated 2 times: [ Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.]
Apr 15 05:53:34 node2 systemd-resolved[488]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Apr 15 05:53:34 node2 systemd-resolved[488]: message repeated 2 times: [ Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.]
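To compare the three nodes side by side, the syslog can be cut down to the lines that mark the refresh and the fan bridge going down and coming back. The following is only a sketch: the function name `key_events` is made up, and the patterns are taken from the node0 excerpt above, so they may need extending. In that excerpt, `lxdfan0-mtu` reports Link UP again, but no matching Link UP for `lxdfan0-fan` shows up before the "Failed to bring up network" error.

```shell
#!/bin/sh
# Keep only the lines that mark the snap refresh and the fan bridge state.
# Assumption: these patterns come from the node0 excerpt and are not an
# exhaustive list of relevant messages.
key_events() {
  grep -E 'Stop reason is|Failed to bring up network|lxdfan0-(fan|mtu): Link (UP|DOWN)|LXD is ready'
}

# Usage on each node (/var/log/syslog is the Ubuntu 18.04 default path):
#   key_events < /var/log/syslog
```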
Version/setup details:
- `uname -a`:
Linux node0.my.domain 4.15.0-96-generic #97-Ubuntu SMP Wed Apr 1 03:25:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Linux node1.my.domain 4.15.0-96-generic #97-Ubuntu SMP Wed Apr 1 03:25:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Linux node2.my.domain 4.15.0-96-generic #97-Ubuntu SMP Wed Apr 1 03:25:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- Vultr private network (yes, it’s /20, not /24 or /16):
node0: 10.64.96.80/20
node1: 10.64.96.36/20
node2: 10.64.96.100/20
- Fan bridge: 10.64.96.0/24 (as it only supports /24 or /16)
- `lxd --version`: 4.0.0
- `lxc --version`: 4.0.0
- `which lxd`: /snap/bin/lxd (the deb LXD came with the fresh OS install; I removed it with `apt purge lxd lxd-client` before `snap install lxd`)
- `which lxc`: /snap/bin/lxc
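For reference, here is how I understand the fan address mapping with the setup above. This sketch assumes a /24 underlay and the default 240.0.0.0/8 overlay, so the last octet of the host address becomes the second octet of the fan address. That matches the 240.80.0.1 seen on node0; the values for node1 and node2 are derived from the same rule, not copied from the nodes. The function name `fan_gateway` is made up for illustration.

```shell
#!/bin/sh
# Map an underlay host address to the fan bridge address on that host.
# Assumption: /24 underlay and 240.0.0.0/8 overlay.
fan_gateway() {
  addr=${1%/*}            # drop a trailing prefix length such as /20
  last_octet=${addr##*.}  # keep only the last octet of the host address
  echo "240.${last_octet}.0.1"
}

for host in 10.64.96.80/20 10.64.96.36/20 10.64.96.100/20; do
  echo "$host -> $(fan_gateway "$host")"
done
```

If this mapping is right, two hosts in the /20 that differ only in the third octet would end up with the same fan subnet, which is one reason the /20 versus /24 mismatch seems worth ruling out.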
I plan to:
- Run `systemctl restart snap.lxd.daemon` on all nodes, then set up a script that runs `dig ...lxd` every few seconds to see exactly when the network breaks.
- Change the private network to /24 to match the fan bridge config and see if that helps.
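A minimal sketch of the polling script from the first item, assuming `dig` is installed on the host. The 5-second interval, the function names, and the container names in the usage line are placeholders, not values from the cluster. It prints a timestamped line only when the set of results changes, so the moment of breakage is easy to spot.

```shell
#!/bin/sh
# Poll .lxd names against a fan DNS server and log every state change.

# Print "ok" if the dig output contains an IPv4 address, "fail" otherwise.
# dig writes timeout messages (lines starting with ";;") to stdout, so a
# non-empty answer alone is not enough.
classify() {
  if printf '%s\n' "$1" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'; then
    echo ok
  else
    echo fail
  fi
}

# Resolve one name ($1) against one server ($2) with a short timeout.
probe() {
  answer=$(dig +short +time=2 +tries=1 "@$2" "$1" 2>/dev/null)
  classify "$answer"
}

# Loop forever over the given names; print only when something changes.
main() {
  server=$1
  shift
  last=""
  while true; do
    status=""
    for name in "$@"; do
      status="$status $name=$(probe "$name" "$server")"
    done
    if [ "$status" != "$last" ]; then
      echo "$(date -u +%Y-%m-%dT%H:%M:%SZ)$status"
      last=$status
    fi
    sleep 5
  done
}

# Only start polling when a server and at least one name are given.
if [ "$#" -ge 2 ]; then
  main "$@"
fi
```

Saved as, say, `fanwatch.sh` (a hypothetical name), it could be started on each node with something like `sh fanwatch.sh 240.80.0.1 c1.lxd c2.lxd`, where `c1` and `c2` stand for containers on different nodes.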
Any ideas, or ways to increase LXD’s log verbosity to help with debugging?