LXD 4.0 cluster snap refresh breaks fan bridge

I haven’t root-caused the problem yet, so the title is only speculation. I’m here to get more troubleshooting and debugging instructions.

Setup: 3 identical fresh Vultr Ubuntu 18.04 nodes, with the LXD 4.0 snap installed a few days ago (after 4.0 was released).
After the fresh installs, the cluster was fully operational. I could launch containers on all 3 nodes, and inside each container I could resolve all .lxd domains and connect to all containers.
A few days later, the fan network across nodes broke:

  • Inside each container, only the .lxd domains of containers on the same node resolve. Resolving the .lxd domain of a container on another node blocks for 3~4 seconds and returns an empty result.
  • Inside each container, pinging a container on a different node results in Destination Host Unreachable. Pinging containers on the same node works fine.
  • From the host, dig name.lxd @240.80.0.1 works fine for containers on the same node; for containers on another node it blocks for 3~4 seconds and returns an empty result.
  • lxc list works on all nodes and shows all containers running with an IPv4 address assigned.
  • lxc cluster list works on all nodes and shows all nodes fully operational.
  • lxc exec name -- bash DOES work from any node to any container.
  • Restarting containers does NOT resolve the issue.
  • systemctl reload snap.lxd.daemon on a single node does NOT resolve the issue.
  • systemctl reload snap.lxd.daemon on all nodes does NOT resolve the issue.
  • systemctl restart snap.lxd.daemon on a single node does NOT resolve the issue.
  • systemctl restart snap.lxd.daemon on 2 nodes DOES restore connectivity and DNS between those 2 nodes. Connections and DNS lookups to or from the third node still give Destination Host Unreachable / an empty result.
  • systemctl restart snap.lxd.daemon on all nodes DOES resolve the issue completely.

However, after a few hours the network broke again. The only place I found any hint was syslog, and the only thing that happened within those few hours was a snap refresh:

Apr 16 05:17:01 node0 CRON[6829]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Apr 16 05:52:35 node0 snapd[2100]: storehelpers.go:438: cannot refresh: snap has no updates available: “core”, “core18”
Apr 16 05:52:40 node0 systemd[1]: Reloading.
Apr 16 05:52:41 node0 systemd[1]: Starting Message of the Day…
Apr 16 05:52:41 node0 systemd[1]: Mounting Mount unit for lxd, revision 14611…
Apr 16 05:52:41 node0 systemd[1]: Mounted Mount unit for lxd, revision 14611.
Apr 16 05:52:41 node0 systemd[1]: Closed Socket unix for snap application lxd.daemon.
Apr 16 05:52:41 node0 systemd[1]: Stopping Service for snap application lxd.daemon…
Apr 16 05:52:41 node0 50-motd-news[20321]: * Kubernetes 1.18 GA is now available! See https://microk8s.io for docs or
Apr 16 05:52:41 node0 50-motd-news[20321]: install it with:
Apr 16 05:52:41 node0 50-motd-news[20321]: sudo snap install microk8s --channel=1.18 --classic
Apr 16 05:52:41 node0 50-motd-news[20321]: * Multipass 1.1 adds proxy support for developers behind enterprise
Apr 16 05:52:41 node0 50-motd-news[20321]: firewalls. Rapid prototyping for cloud operations just got easier.
Apr 16 05:52:41 node0 50-motd-news[20321]: https://multipass.run/
Apr 16 05:52:41 node0 systemd[1]: Started Message of the Day.
Apr 16 05:52:42 node0 lxd.daemon[20360]: => Stop reason is: snap refresh
Apr 16 05:52:42 node0 lxd.daemon[20360]: => Stopping LXD
Apr 16 05:52:42 node0 lxd.daemon[25944]: t=2020-04-16T05:52:42+0000 lvl=warn msg=“Dqlite client proxy TLS → Unix: read tcp 10.64.96.0.80:48276->10.64.96.0.36:8443: use of closed network connection”
Apr 16 05:52:43 node0 lxd.daemon[25944]: t=2020-04-16T05:52:43+0000 lvl=warn msg=“Dqlite client proxy TLS → Unix: read tcp 10.64.96.0.80:48282->10.64.96.0.36:8443: use of closed network connection”
Apr 16 05:52:43 node0 lxd.daemon[25944]: t=2020-04-16T05:52:43+0000 lvl=warn msg=“Dqlite server proxy TLS → Unix: read tcp 10.64.96.0.80:8443->10.64.96.0.36:43846: use of closed network connection”
Apr 16 05:52:43 node0 lxd.daemon[25944]: => LXD exited cleanly
Apr 16 05:52:43 node0 systemd[1]: Stopped Service for snap application lxd.daemon.
Apr 16 05:52:44 node0 systemd[1]: Reloading.
Apr 16 05:52:44 node0 kernel: [291235.614260] kauditd_printk_skb: 5 callbacks suppressed
Apr 16 05:52:44 node0 kernel: [291235.614261] audit: type=1400 audit(1587016364.457:516): apparmor=“STATUS” operation=“profile_replace” info=“same as current profile, skipping” profile=“unconfined” name=“/snap/core/8935/usr/lib/snapd/snap-confine” pid=20513 comm=“apparmor_parser”
Apr 16 05:52:44 node0 kernel: [291235.614264] audit: type=1400 audit(1587016364.457:517): apparmor=“STATUS” operation=“profile_replace” info=“same as current profile, skipping” profile=“unconfined” name=“/snap/core/8935/usr/lib/snapd/snap-confine//mount-namespace-capture-helper” pid=20513 comm=“apparmor_parser”
Apr 16 05:52:44 node0 kernel: [291235.687538] audit: type=1400 audit(1587016364.529:518): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name=“snap.lxd.activate” pid=20515 comm=“apparmor_parser”
Apr 16 05:52:44 node0 kernel: [291235.757994] audit: type=1400 audit(1587016364.601:519): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name=“snap.lxd.benchmark” pid=20516 comm=“apparmor_parser”
Apr 16 05:52:44 node0 kernel: [291235.821838] audit: type=1400 audit(1587016364.665:520): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name=“snap.lxd.buginfo” pid=20517 comm=“apparmor_parser”
Apr 16 05:52:44 node0 kernel: [291235.892226] audit: type=1400 audit(1587016364.733:521): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name=“snap.lxd.check-kernel” pid=20518 comm=“apparmor_parser”
Apr 16 05:52:44 node0 kernel: [291235.981815] audit: type=1400 audit(1587016364.825:522): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name=“snap.lxd.daemon” pid=20519 comm=“apparmor_parser”
Apr 16 05:52:44 node0 kernel: [291236.068705] audit: type=1400 audit(1587016364.913:523): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name=“snap.lxd.hook.configure” pid=20520 comm=“apparmor_parser”
Apr 16 05:52:44 node0 kernel: [291236.136342] audit: type=1400 audit(1587016364.981:524): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name=“snap.lxd.hook.install” pid=20521 comm=“apparmor_parser”
Apr 16 05:52:45 node0 kernel: [291236.208660] audit: type=1400 audit(1587016365.053:525): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name=“snap.lxd.hook.remove” pid=20522 comm=“apparmor_parser”
Apr 16 05:52:45 node0 systemd[1]: Reloading.
Apr 16 05:52:45 node0 systemd[1]: Listening on Socket unix for snap application lxd.daemon.
Apr 16 05:52:45 node0 systemd[1]: Starting Service for snap application lxd.activate…
Apr 16 05:52:46 node0 lxd.activate[20557]: => Starting LXD activation
Apr 16 05:52:46 node0 lxd.activate[20557]: ==> Loading snap configuration
Apr 16 05:52:46 node0 lxd.activate[20557]: ==> Checking for socket activation support
Apr 16 05:52:46 node0 lxd.activate[20557]: ==> Setting LXD socket ownership
Apr 16 05:52:46 node0 lxd.activate[20557]: ==> Checking if LXD needs to be activated
Apr 16 05:52:47 node0 systemd[1]: Started Service for snap application lxd.daemon.
Apr 16 05:52:47 node0 lxd.daemon[20616]: => Preparing the system (14611)
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Loading snap configuration
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Setting up mntns symlink (mnt:[4026533088])
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Setting up kmod wrapper
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Preparing /boot
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Preparing a clean copy of /run
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Preparing a clean copy of /etc
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Setting up ceph configuration
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Setting up LVM configuration
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Rotating logs
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Setting up ZFS (0.7)
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Escaping the systemd cgroups
Apr 16 05:52:47 node0 lxd.daemon[20616]: ====> Detected cgroup V1
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Escaping the systemd process resource limits
Apr 16 05:52:47 node0 lxd.daemon[20616]: ==> Disabling shiftfs on this kernel (auto)
Apr 16 05:52:47 node0 lxd.daemon[20616]: => Re-using existing LXCFS
Apr 16 05:52:47 node0 lxd.daemon[20616]: => Starting LXD
Apr 16 05:52:48 node0 lxd.daemon[20616]: t=2020-04-16T05:52:48+0000 lvl=warn msg=" - Couldn’t find the CGroup memory swap accounting, swap limits will be ignored"
Apr 16 05:52:48 node0 lxd.daemon[20616]: t=2020-04-16T05:52:48+0000 lvl=warn msg=“Dqlite: server unavailable err=failed to establish network connection: 503 Service Unavailable address=10.64.96.0.80:8443 attempt=0”
Apr 16 05:53:04 node0 lxd.daemon[20616]: t=2020-04-16T05:53:04+0000 lvl=warn msg=“Dqlite server proxy Unix → TLS: read unix @->@2222a: use of closed network connection”
Apr 16 05:53:04 node0 lxd.daemon[20616]: t=2020-04-16T05:53:04+0000 lvl=warn msg=“Dqlite client proxy Unix → TLS: read unix @->@2222c: use of closed network connection”
Apr 16 05:53:04 node0 lxd.daemon[20616]: t=2020-04-16T05:53:04+0000 lvl=warn msg=“Dqlite client proxy Unix → TLS: read unix @->@2222b: use of closed network connection”
Apr 16 05:53:07 node0 lxd.daemon[25944]: Closed liblxcfs.so
Apr 16 05:53:07 node0 lxd.daemon[25944]: Running destructor lxcfs_exit
Apr 16 05:53:07 node0 kernel: [291258.699936] new mount options do not match the existing superblock, will be ignored
Apr 16 05:53:07 node0 lxd.daemon[25944]: Running constructor lxcfs_init to reload liblxcfs
Apr 16 05:53:07 node0 lxd.daemon[25944]: mount namespace: 5
Apr 16 05:53:07 node0 lxd.daemon[25944]: hierarchies:
Apr 16 05:53:07 node0 lxd.daemon[25944]: 0: fd: 6:
Apr 16 05:53:07 node0 lxd.daemon[25944]: 1: fd: 7: name=systemd
Apr 16 05:53:07 node0 lxd.daemon[25944]: 2: fd: 8: hugetlb
Apr 16 05:53:07 node0 lxd.daemon[25944]: 3: fd: 9: cpu,cpuacct
Apr 16 05:53:07 node0 lxd.daemon[25944]: 4: fd: 10: blkio
Apr 16 05:53:07 node0 lxd.daemon[25944]: 5: fd: 11: memory
Apr 16 05:53:07 node0 lxd.daemon[25944]: 6: fd: 12: freezer
Apr 16 05:53:07 node0 lxd.daemon[25944]: 7: fd: 13: cpuset
Apr 16 05:53:07 node0 lxd.daemon[25944]: 8: fd: 14: net_cls,net_prio
Apr 16 05:53:07 node0 lxd.daemon[25944]: 9: fd: 15: pids
Apr 16 05:53:07 node0 lxd.daemon[25944]: 10: fd: 16: devices
Apr 16 05:53:07 node0 lxd.daemon[25944]: 11: fd: 17: perf_event
Apr 16 05:53:07 node0 lxd.daemon[25944]: 12: fd: 19: rdma
Apr 16 05:53:07 node0 lxd.daemon[25944]: api_extensions:
Apr 16 05:53:07 node0 lxd.daemon[25944]: - cgroups
Apr 16 05:53:07 node0 lxd.daemon[25944]: - sys_cpu_online
Apr 16 05:53:07 node0 lxd.daemon[25944]: - proc_cpuinfo
Apr 16 05:53:07 node0 lxd.daemon[25944]: - proc_diskstats
Apr 16 05:53:07 node0 lxd.daemon[25944]: - proc_loadavg
Apr 16 05:53:07 node0 lxd.daemon[25944]: - proc_meminfo
Apr 16 05:53:07 node0 lxd.daemon[25944]: - proc_stat
Apr 16 05:53:07 node0 lxd.daemon[25944]: - proc_swaps
Apr 16 05:53:07 node0 lxd.daemon[25944]: - proc_uptime
Apr 16 05:53:07 node0 lxd.daemon[25944]: - shared_pidns
Apr 16 05:53:07 node0 lxd.daemon[25944]: - cpuview_daemon
Apr 16 05:53:07 node0 lxd.daemon[25944]: - loadavg_daemon
Apr 16 05:53:07 node0 lxd.daemon[25944]: - pidfds
Apr 16 05:53:07 node0 lxd.daemon[25944]: Reloaded LXCFS
Apr 16 05:53:15 node0 lxd.daemon[20616]: t=2020-04-16T05:53:15+0000 lvl=warn msg=“Dqlite server proxy Unix → TLS: read unix @->@2222a: use of closed network connection”
Apr 16 05:53:15 node0 lxd.daemon[20616]: t=2020-04-16T05:53:15+0000 lvl=warn msg=“Dqlite server proxy Unix → TLS: read unix @->@2222a: use of closed network connection”
Apr 16 05:53:15 node0 lxd.daemon[20616]: t=2020-04-16T05:53:15+0000 lvl=warn msg=“Dqlite client proxy Unix → TLS: read unix @->@2222d: use of closed network connection”
Apr 16 05:53:33 node0 lxd.daemon[20616]: t=2020-04-16T05:53:33+0000 lvl=warn msg=“Dqlite client proxy TLS → Unix: read tcp 10.64.96.0.80:39928->10.64.96.0.80:8443: use of closed network connection”
Apr 16 05:53:33 node0 lxd.daemon[20616]: t=2020-04-16T05:53:33+0000 lvl=warn msg=“Dqlite server proxy Unix → TLS: read unix @->@2222a: use of closed network connection”
Apr 16 05:53:34 node0 kernel: [291286.118449] lxdfan0: port 1(lxdfan0-mtu) entered disabled state
Apr 16 05:53:34 node0 kernel: [291286.119178] device lxdfan0-mtu left promiscuous mode
Apr 16 05:53:34 node0 kernel: [291286.119180] lxdfan0: port 1(lxdfan0-mtu) entered disabled state
Apr 16 05:53:33 node0 lxd.daemon[20616]: t=2020-04-16T05:53:33+0000 lvl=warn msg=“Dqlite server proxy Unix → TLS: read unix @->@2222a: use of closed network connection”
Apr 16 05:53:34 node0 systemd-networkd[1817]: lxdfan0-mtu: Link DOWN
Apr 16 05:53:34 node0 systemd-networkd[1817]: lxdfan0-mtu: Lost carrier
Apr 16 05:53:34 node0 systemd-timesyncd[451]: Network configuration changed, trying to establish connection.
Apr 16 05:53:34 node0 systemd-networkd[1817]: lxdfan0-fan: Link DOWN
Apr 16 05:53:34 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:34 node0 kernel: [291286.136460] lxdfan0: port 2(lxdfan0-fan) entered disabled state
Apr 16 05:53:34 node0 kernel: [291286.138438] device lxdfan0-fan left promiscuous mode
Apr 16 05:53:34 node0 kernel: [291286.138442] lxdfan0: port 2(lxdfan0-fan) entered disabled state
Apr 16 05:53:34 node0 systemd-networkd[1817]: lxdfan0-fan: Lost carrier
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:34 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:34 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:34 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:34 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:35 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:35 node0 kernel: [291286.167932] lxdfan0: port 1(lxdfan0-mtu) entered blocking state
Apr 16 05:53:35 node0 kernel: [291286.167935] lxdfan0: port 1(lxdfan0-mtu) entered disabled state
Apr 16 05:53:35 node0 kernel: [291286.169480] device lxdfan0-mtu entered promiscuous mode
Apr 16 05:53:35 node0 systemd-networkd[1817]: lxdfan0-mtu: Link UP
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:35 node0 systemd-networkd[1817]: lxdfan0-mtu: Gained carrier
Apr 16 05:53:35 node0 kernel: [291286.172904] lxdfan0: port 1(lxdfan0-mtu) entered blocking state
Apr 16 05:53:35 node0 kernel: [291286.172906] lxdfan0: port 1(lxdfan0-mtu) entered forwarding state
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:35 node0 systemd-networkd[1817]: lxdfan0-mtu: Gained IPv6LL
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:35 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:35 node0 lxd.daemon[20616]: t=2020-04-16T05:53:35+0000 lvl=eror msg=“Failed to bring up network” err=“Failed to run: ip link set dev lxdfan0 mtu 1450: RTNETLINK answers: Invalid argument” name=lxdfan0
Apr 16 05:53:35 node0 systemd-timesyncd[451]: Synchronized to time server 108.61.73.244:123 (2.time.constant.com).
Apr 16 05:53:35 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:35 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:35 node0 networkd-dispatcher[513]: WARNING:Unknown index 63 seen, reloading interface list
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:35 node0 systemd-udevd[20955]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Apr 16 05:53:35 node0 systemd-udevd[20955]: Could not generate persistent MAC address for lxdfan0-mtu: No such file or directory
Apr 16 05:53:35 node0 systemd-timesyncd[451]: Network configuration changed, trying to establish connection.
Apr 16 05:53:35 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:35 node0 dnsmasq[26167]: reading /etc/resolv.conf
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain 240.in-addr.arpa
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 240.80.0.1#1053 for domain lxd
Apr 16 05:53:35 node0 dnsmasq[26167]: using nameserver 127.0.0.53#53
Apr 16 05:53:35 node0 systemd-timesyncd[451]: Synchronized to time server 108.61.73.244:123 (2.time.constant.com).
Apr 16 05:53:35 node0 systemd[1]: Started Service for snap application lxd.activate.
Apr 16 05:53:35 node0 systemd[1]: Reloading.
Apr 16 05:53:35 node0 lxd.daemon[20616]: => LXD is ready
Apr 16 05:53:37 node0 snapd[2100]: storehelpers.go:438: cannot refresh snap “lxd”: snap has no updates available
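
Most of the dump above is routine refresh output. A minimal sketch for pulling out only the LXD entries at a given level, assuming syslog lines in the format shown above (an `lxd.daemon[...]` tag and a `lvl=` field); the function name `lxd_entries` is illustrative, not part of any LXD tooling:

```python
import re
import sys

# Matches the level field LXD writes, e.g. "lvl=warn" or "lvl=eror".
LEVEL_RE = re.compile(r"\blvl=(\w+)")


def lxd_entries(lines, levels=("eror",)):
    """Yield lxd.daemon lines whose lvl= field is one of `levels`."""
    for line in lines:
        if "lxd.daemon[" not in line:
            continue
        match = LEVEL_RE.search(line)
        if match and match.group(1) in levels:
            yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 lxd_entries.py /path/to/syslog  (script name is hypothetical)
    with open(sys.argv[1], errors="replace") as handle:
        for entry in lxd_entries(handle):
            print(entry)
```

Run against the node0 excerpt above with the default `eror` filter, this leaves only the "Failed to bring up network" entry about `ip link set dev lxdfan0 mtu 1450`.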

Around the same time, I see the following logs on the other two nodes:

Apr 15 05:55:18 node1 systemd-resolved[486]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Apr 15 05:55:18 node1 systemd-resolved[486]: message repeated 2 times: [ Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.]

Apr 15 05:53:34 node2 systemd-resolved[488]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Apr 15 05:53:34 node2 systemd-resolved[488]: message repeated 2 times: [ Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.]

Version/setup details:
uname -a:
Linux node0.my.domain 4.15.0-96-generic #97-Ubuntu SMP Wed Apr 1 03:25:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Linux node1.my.domain 4.15.0-96-generic #97-Ubuntu SMP Wed Apr 1 03:25:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Linux node2.my.domain 4.15.0-96-generic #97-Ubuntu SMP Wed Apr 1 03:25:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Vultr private network (yes, it’s a /20, not a /24 or a /16):
node0: 10.64.96.80/20
node1: 10.64.96.36/20
node2: 10.64.96.100/20
Fan bridge: 10.64.96.0/24 (as it only supports /24 or /16)
lxd --version: 4.0.0
lxc --version: 4.0.0
which lxd: /snap/bin/lxd (the deb lxd came preinstalled with the fresh OS install; I removed it with apt purge lxd lxd-client before running snap install lxd)
which lxc: /snap/bin/lxc
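
The bridge address seen in the logs (240.80.0.1 on node0, whose underlay address is 10.64.96.80) is consistent with LXD's default 240.0.0.0/8 overlay taking the last octet of a /24 underlay address as its second octet. A minimal sketch of that mapping, assuming the default overlay and a /24 underlay; the function name is illustrative, and the node1 and node2 results are what the mapping predicts, not values reported in this thread:

```python
import ipaddress


def fan_bridge_address(underlay_ip, overlay="240.0.0.0/8"):
    """Map a host's underlay IPv4 address to its fan bridge address.

    Only covers the /8 overlay with /24 underlay case discussed here.
    """
    network = ipaddress.ip_network(overlay)
    if network.version != 4 or network.prefixlen != 8:
        raise ValueError("this sketch only handles an IPv4 /8 overlay")
    host = ipaddress.IPv4Address(underlay_ip).packed
    first = network.network_address.packed[0]
    # Overlay first octet, underlay last octet, then .0.1 for the bridge itself.
    return str(ipaddress.IPv4Address(bytes([first, host[3], 0, 1])))


for node, ip in [("node0", "10.64.96.80"),
                 ("node1", "10.64.96.36"),
                 ("node2", "10.64.96.100")]:
    print(node, fan_bridge_address(ip))
```

Comparing the predicted address with what `lxc list` and dnsmasq report on each node is a quick way to confirm every host picked the underlay address it was expected to.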

I plan to do:

  • systemctl restart snap.lxd.daemon on all nodes, then set up a script that runs dig ...lxd every few seconds to see exactly when the network breaks.
  • Change the private network to /24 to match the fan bridge config and see if that helps.
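
A minimal sketch of the polling script from the first item, assuming `dig` is installed and that a non-empty `+short` answer means the name resolved. The function names, the 5-second interval, and the script name in the usage line are illustrative. It prints a timestamped line only when a name flips between resolving and not resolving, so the moment of breakage can be matched against syslog:

```python
import subprocess
import sys
import time
from datetime import datetime, timezone


def resolve(name, server, timeout=2):
    """Return the answers dig reports for `name`, or [] on failure or timeout."""
    cmd = ["dig", "+short", "+time=%d" % timeout, "+tries=1", name, "@" + server]
    try:
        proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
                              universal_newlines=True, timeout=timeout + 3)
    except (OSError, subprocess.TimeoutExpired):
        return []
    if proc.returncode != 0:
        return []
    # dig prints diagnostics such as ";; connection timed out" on stdout too.
    return [l for l in proc.stdout.splitlines() if l and not l.startswith(";")]


def watch(names, server, interval=5, resolver=resolve, rounds=None, sleep=time.sleep):
    """Poll each name; record and print a line whenever its state changes."""
    state = {}
    events = []
    count = 0
    while rounds is None or count < rounds:
        for name in names:
            ok = bool(resolver(name, server))
            if state.get(name) != ok:
                stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
                events.append((stamp, name, ok))
                print(stamp, name, "OK" if ok else "FAILED", flush=True)
                state[name] = ok
        count += 1
        sleep(interval)
    return events


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python3 watch_dns.py 240.80.0.1 name.lxd [more names...]
    watch(sys.argv[2:], sys.argv[1])
```

Running one copy per node, each pointed at its own fan bridge address and given the names of containers on the other nodes, should show which direction breaks first.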

Any ideas, or ways to increase LXD’s log verbosity to help with debugging?

Can you show ps fauxww on all nodes when things are no longer working?

I’m wondering if forkdns is gone somehow.

So I tried manually triggering a snap refresh on node0. It shows:

lxd 4.0.0 from Canonical✓ refreshed

The refresh did break the connection between node0 <=> node1 and between node0 <=> node2. The connection between node1 <=> node2 still works fine.
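
The node0 refresh log ends with LXD failing to run `ip link set dev lxdfan0 mtu 1450`, so once the cross-node connection is broken it may be worth comparing the MTU and link state of the fan interfaces on a working and a broken node. A minimal sketch that reads them from sysfs, assuming the interface names from the kernel log above; the function name and the `root` parameter are illustrative:

```python
import os

# Interface names as they appear in the node0 kernel log.
FAN_INTERFACES = ("lxdfan0", "lxdfan0-mtu", "lxdfan0-fan")


def link_info(names=FAN_INTERFACES, root="/sys/class/net"):
    """Return {name: (mtu, operstate)}, or None for interfaces that are missing."""
    info = {}
    for name in names:
        base = os.path.join(root, name)
        try:
            with open(os.path.join(base, "mtu")) as handle:
                mtu = int(handle.read().strip())
            with open(os.path.join(base, "operstate")) as handle:
                state = handle.read().strip()
        except OSError:
            info[name] = None
            continue
        info[name] = (mtu, state)
    return info


if __name__ == "__main__":
    for name, value in link_info().items():
        print(name, value)
```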

I do see that forkdns is alive.
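
For repeating this check on every node without reading the whole process list, a minimal sketch that scans `ps -eo pid,args` output for the forkdns and dnsmasq command lines. The match strings are taken from the process listing in this thread; the function name is illustrative:

```python
import subprocess


def matching_processes(ps_output, needles=("lxd forkdns", "dnsmasq")):
    """Return {needle: [(pid, command), ...]} from `ps -eo pid,args` output."""
    found = {needle: [] for needle in needles}
    for line in ps_output.splitlines():
        pid, _, command = line.strip().partition(" ")
        if not pid.isdigit():  # skips the header row and blank lines
            continue
        for needle in needles:
            if needle in command:
                found[needle].append((int(pid), command.strip()))
    return found


if __name__ == "__main__":
    try:
        proc = subprocess.run(["ps", "-eo", "pid,args"], stdout=subprocess.PIPE,
                              universal_newlines=True)
    except OSError:
        proc = None
    if proc is not None:
        for needle, hits in matching_processes(proc.stdout).items():
            print(needle, "->", hits if hits else "NOT RUNNING")
```

This only shows that the processes exist. It says nothing about whether forkdns can still reach its peers on the other nodes.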

ps fauxww on node0:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2 0.0 0.0 0 0 ? S Apr12 0:27 [kthreadd]
root 4 0.0 0.0 0 0 ? I< Apr12 0:00 _ [kworker/0:0H]
root 6 0.0 0.0 0 0 ? I< Apr12 0:00 _ [mm_percpu_wq]
root 7 0.0 0.0 0 0 ? S Apr12 0:33 _ [ksoftirqd/0]
root 8 0.0 0.0 0 0 ? I Apr12 2:02 _ [rcu_sched]
root 9 0.0 0.0 0 0 ? I Apr12 0:00 _ [rcu_bh]
root 10 0.0 0.0 0 0 ? S Apr12 0:00 _ [migration/0]
root 11 0.0 0.0 0 0 ? S Apr12 0:00 _ [watchdog/0]
root 12 0.0 0.0 0 0 ? S Apr12 0:00 _ [cpuhp/0]
root 13 0.0 0.0 0 0 ? S Apr12 0:00 _ [kdevtmpfs]
root 14 0.0 0.0 0 0 ? I< Apr12 0:00 _ [netns]
root 15 0.0 0.0 0 0 ? S Apr12 0:00 _ [rcu_tasks_kthre]
root 16 0.0 0.0 0 0 ? S Apr12 0:00 _ [kauditd]
root 17 0.0 0.0 0 0 ? S Apr12 0:00 _ [khungtaskd]
root 18 0.0 0.0 0 0 ? S Apr12 0:00 _ [oom_reaper]
root 19 0.0 0.0 0 0 ? I< Apr12 0:00 _ [writeback]
root 20 0.0 0.0 0 0 ? S Apr12 0:01 _ [kcompactd0]
root 21 0.0 0.0 0 0 ? SN Apr12 0:00 _ [ksmd]
root 22 0.0 0.0 0 0 ? SN Apr12 0:09 _ [khugepaged]
root 23 0.0 0.0 0 0 ? I< Apr12 0:00 _ [crypto]
root 24 0.0 0.0 0 0 ? I< Apr12 0:00 _ [kintegrityd]
root 25 0.0 0.0 0 0 ? I< Apr12 0:00 _ [kblockd]
root 26 0.0 0.0 0 0 ? I< Apr12 0:00 _ [ata_sff]
root 27 0.0 0.0 0 0 ? I< Apr12 0:00 _ [md]
root 28 0.0 0.0 0 0 ? I< Apr12 0:00 _ [edac-poller]
root 29 0.0 0.0 0 0 ? I< Apr12 0:00 _ [devfreq_wq]
root 30 0.0 0.0 0 0 ? I< Apr12 0:00 _ [watchdogd]
root 34 0.1 0.0 0 0 ? S Apr12 9:02 _ [kswapd0]
root 35 0.0 0.0 0 0 ? I< Apr12 0:00 _ [kworker/u3:0]
root 36 0.0 0.0 0 0 ? S Apr12 0:00 _ [ecryptfs-kthrea]
root 78 0.0 0.0 0 0 ? I< Apr12 0:00 _ [kthrotld]
root 79 0.0 0.0 0 0 ? I< Apr12 0:00 _ [acpi_thermal_pm]
root 80 0.0 0.0 0 0 ? S Apr12 0:00 _ [scsi_eh_0]
root 81 0.0 0.0 0 0 ? I< Apr12 0:00 _ [scsi_tmf_0]
root 82 0.0 0.0 0 0 ? S Apr12 0:00 _ [scsi_eh_1]
root 83 0.0 0.0 0 0 ? I< Apr12 0:00 _ [scsi_tmf_1]
root 89 0.0 0.0 0 0 ? I< Apr12 0:00 _ [ipv6_addrconf]
root 98 0.0 0.0 0 0 ? I< Apr12 0:00 _ [kstrp]
root 115 0.0 0.0 0 0 ? I< Apr12 0:00 _ [charger_manager]
root 180 0.0 0.0 0 0 ? I< Apr12 0:06 _ [kworker/0:1H]
root 182 0.0 0.0 0 0 ? I< Apr12 0:00 _ [ttm_swap]
root 283 0.0 0.0 0 0 ? I< Apr12 0:00 _ [raid5wq]
root 339 0.0 0.0 0 0 ? S Apr12 0:10 _ [jbd2/vda1-8]
root 340 0.0 0.0 0 0 ? I< Apr12 0:00 _ [ext4-rsv-conver]
root 370 0.0 0.0 0 0 ? S Apr12 0:00 _ [hwrng]
root 408 0.0 0.0 0 0 ? I< Apr12 0:00 _ [iscsi_eh]
root 414 0.0 0.0 0 0 ? I< Apr12 0:00 _ [ib-comp-wq]
root 415 0.0 0.0 0 0 ? I< Apr12 0:00 _ [ib-comp-unb-wq]
root 416 0.0 0.0 0 0 ? I< Apr12 0:00 _ [ib_mcast]
root 417 0.0 0.0 0 0 ? I< Apr12 0:00 _ [ib_nl_sa_wq]
root 427 0.0 0.0 0 0 ? I< Apr12 0:00 _ [rdma_cm]
root 2039 0.0 0.0 0 0 ? S< Apr12 0:00 _ [loop0]
root 3740 0.0 0.0 0 0 ? I< Apr12 0:00 _ [dio/vda1]
root 3750 0.0 0.0 0 0 ? S< Apr12 0:00 _ [spl_system_task]
root 3751 0.0 0.0 0 0 ? S< Apr12 0:00 _ [spl_delay_taskq]
root 3752 0.0 0.0 0 0 ? S< Apr12 0:06 _ [spl_dynamic_tas]
root 3753 0.0 0.0 0 0 ? S< Apr12 0:32 _ [spl_kmem_cache]
root 3760 0.0 0.0 0 0 ? S< Apr12 0:00 _ [zvol]
root 3762 0.0 0.0 0 0 ? S Apr12 0:00 _ [arc_prune]
root 3771 0.0 0.0 0 0 ? S Apr12 0:07 _ [arc_reclaim]
root 3772 0.0 0.0 0 0 ? S Apr12 0:01 _ [dbu_evict]
root 3773 0.0 0.0 0 0 ? SN Apr12 0:13 _ [dbuf_evict]
root 3774 0.0 0.0 0 0 ? SN Apr12 0:36 _ [z_vdev_file]
root 3775 0.0 0.0 0 0 ? S Apr12 0:02 _ [l2arc_feed]
root 3897 0.0 0.0 0 0 ? S< Apr12 0:03 _ [z_null_iss]
root 3898 0.0 0.0 0 0 ? S< Apr12 0:04 _ [z_null_int]
root 3899 0.0 0.0 0 0 ? S< Apr12 0:00 _ [z_rd_iss]
root 3900 0.0 0.0 0 0 ? S< Apr12 0:13 _ [z_rd_int_0]
root 3901 0.0 0.0 0 0 ? S< Apr12 0:14 _ [z_rd_int_1]
root 3902 0.0 0.0 0 0 ? S< Apr12 0:15 _ [z_rd_int_2]
root 3903 0.0 0.0 0 0 ? S< Apr12 0:15 _ [z_rd_int_3]
root 3904 0.0 0.0 0 0 ? S< Apr12 0:15 _ [z_rd_int_4]
root 3905 0.0 0.0 0 0 ? S< Apr12 0:16 _ [z_rd_int_5]
root 3906 0.0 0.0 0 0 ? S< Apr12 0:15 _ [z_rd_int_6]
root 3907 0.0 0.0 0 0 ? S< Apr12 0:15 _ [z_rd_int_7]
root 3908 0.0 0.0 0 0 ? S< Apr12 3:38 _ [z_wr_iss]
root 3909 0.0 0.0 0 0 ? S< Apr12 0:00 _ [z_wr_iss_h]
root 3910 0.0 0.0 0 0 ? S< Apr12 0:06 _ [z_wr_int_0]
root 3911 0.0 0.0 0 0 ? S< Apr12 0:06 _ [z_wr_int_1]
root 3912 0.0 0.0 0 0 ? S< Apr12 0:06 _ [z_wr_int_2]
root 3913 0.0 0.0 0 0 ? S< Apr12 0:06 _ [z_wr_int_3]
root 3914 0.0 0.0 0 0 ? S< Apr12 0:06 _ [z_wr_int_4]
root 3915 0.0 0.0 0 0 ? S< Apr12 0:06 _ [z_wr_int_5]
root 3916 0.0 0.0 0 0 ? S< Apr12 0:06 _ [z_wr_int_6]
root 3917 0.0 0.0 0 0 ? S< Apr12 0:06 _ [z_wr_int_7]
root 3918 0.0 0.0 0 0 ? S< Apr12 0:00 _ [z_wr_int_h]
root 3919 0.0 0.0 0 0 ? S< Apr12 0:00 _ [z_fr_iss_0]
root 3920 0.0 0.0 0 0 ? S< Apr12 0:00 _ [z_fr_iss_1]
root 3921 0.0 0.0 0 0 ? S< Apr12 0:00 _ [z_fr_iss_2]
root 3922 0.0 0.0 0 0 ? S< Apr12 0:00 _ [z_fr_iss_3]
root 3923 0.0 0.0 0 0 ? S< Apr12 0:00 _ [z_fr_iss_4]
root 3924 0.0 0.0 0 0 ? S< Apr12 0:00 _ [z_fr_iss_5]
root 3925 0.0 0.0 0 0 ? S< Apr12 0:00 _ [z_fr_iss_6]
root 3926 0.0 0.0 0 0 ? S< Apr12 0:00 _ [z_fr_iss_7]
root 3927 0.0 0.0 0 0 ? S< Apr12 0:00 _ [z_fr_int]
root 3928 0.0 0.0 0 0 ? S< Apr12 0:00 _ [z_cl_iss]
root 3929 0.0 0.0 0 0 ? S< Apr12 0:00 _ [z_cl_int]
root 3930 0.0 0.0 0 0 ? S< Apr12 0:00 _ [z_ioctl_iss]
root 3931 0.0 0.0 0 0 ? S< Apr12 0:00 _ [z_ioctl_int]
root 3932 0.0 0.0 0 0 ? S Apr12 0:00 _ [z_zvol]
root 3933 0.0 0.0 0 0 ? S Apr12 0:00 _ [z_prefetch]
root 3934 0.0 0.0 0 0 ? S Apr12 0:00 _ [z_upgrade]
root 3935 0.0 0.0 0 0 ? S< Apr12 0:00 _ [metaslab_group_]
root 3942 0.0 0.0 0 0 ? SN Apr12 0:17 _ [dp_sync_taskq]
root 3944 0.0 0.0 0 0 ? SN Apr12 0:00 _ [dp_zil_clean_ta]
root 3946 0.0 0.0 0 0 ? S Apr12 0:00 _ [z_iput]
root 3947 0.0 0.0 0 0 ? S Apr12 0:00 _ [txg_quiesce]
root 3948 0.0 0.0 0 0 ? S Apr12 0:36 _ [txg_sync]
root 3949 0.0 0.0 0 0 ? S Apr12 0:02 _ [mmp]
root 6351 0.0 0.0 0 0 ? S< Apr15 0:00 _ [loop2]
root 6469 0.0 0.0 0 0 ? S< Apr15 0:00 _ [loop3]
root 20341 0.0 0.0 0 0 ? S< 05:52 0:00 _ [loop4]
root 32545 0.0 0.0 0 0 ? I 18:23 0:00 _ [kworker/0:0]
root 4016 0.0 0.0 0 0 ? I 18:32 0:00 _ [kworker/u2:2]
root 6356 0.0 0.0 0 0 ? I 18:38 0:00 _ [kworker/u2:0]
root 6743 0.0 0.0 0 0 ? I 18:39 0:00 _ [kworker/0:1]
root 7112 0.0 0.0 0 0 ? I 18:39 0:00 _ [kworker/u2:4]
root 8290 0.0 0.0 0 0 ? S< 18:41 0:00 _ [loop1]
root 14917 0.0 0.0 0 0 ? I 18:51 0:00 _ [kworker/u2:1]
root 14918 0.0 0.0 0 0 ? I 18:51 0:00 _ [kworker/u2:3]
root 14928 0.0 0.0 0 0 ? I 18:51 0:00 _ [kworker/0:2]
root 15015 0.4 0.0 0 0 ? I 18:51 0:00 _ [kworker/0:3]
root 17509 0.0 0.0 0 0 ? SN 18:52 0:00 _ [z_vdev_file]
root 17523 0.0 0.0 0 0 ? SN 18:52 0:00 _ [z_vdev_file]
root 17524 0.0 0.0 0 0 ? S< 18:52 0:00 _ [z_wr_int_3]
root 17525 0.0 0.0 0 0 ? S< 18:52 0:00 _ [z_wr_int_4]
root 17527 0.0 0.0 0 0 ? S< 18:52 0:00 _ [z_wr_int_6]
root 17528 0.0 0.0 0 0 ? S< 18:52 0:00 _ [z_wr_int_6]
root 17529 0.0 0.0 0 0 ? SN 18:52 0:00 _ [z_vdev_file]
root 17531 0.0 0.0 0 0 ? S< 18:52 0:00 _ [z_wr_int_7]
root 17533 0.0 0.0 0 0 ? SN 18:52 0:00 _ [z_vdev_file]
root 17534 0.0 0.0 0 0 ? S< 18:52 0:00 _ [z_wr_int_6]
root 17535 0.0 0.0 0 0 ? S< 18:52 0:00 _ [z_wr_int_2]
root 17536 0.0 0.0 0 0 ? S< 18:52 0:00 _ [z_wr_int_2]
root 17537 0.0 0.0 0 0 ? S< 18:52 0:00 _ [z_wr_int_4]
root 17543 0.0 0.0 0 0 ? S< 18:52 0:00 _ [z_wr_int_5]
root 17544 0.0 0.0 0 0 ? S< 18:52 0:00 _ [z_wr_int_5]
root 1 0.0 0.2 225556 6056 ? Ss Apr12 0:05 /sbin/init
root 402 0.0 1.5 163060 30900 ? S<s Apr12 0:11 /lib/systemd/systemd-journald
root 410 0.0 0.0 97708 0 ? Ss Apr12 0:00 /sbin/lvmetad -f
root 412 0.0 0.1 45700 2276 ? Ss Apr12 0:19 /lib/systemd/systemd-udevd
systemd+ 451 0.0 0.0 141936 1288 ? Ssl Apr12 0:00 /lib/systemd/systemd-timesyncd
systemd+ 485 0.0 0.0 70756 1608 ? Ss Apr12 0:02 /lib/systemd/systemd-resolved
message+ 512 0.0 0.0 50100 452 ? Ss Apr12 0:01 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root 513 0.0 0.2 170732 5788 ? Ssl Apr12 0:00 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
root 515 0.0 0.0 31620 560 ? Ss Apr12 0:00 /usr/sbin/cron -f
root 517 0.0 0.0 644444 160 ? Ssl Apr12 0:08 /usr/bin/lxcfs /var/lib/lxcfs/
root 518 0.0 0.0 287940 1760 ? Ssl Apr12 0:06 /usr/lib/accountsservice/accounts-daemon
daemon 521 0.0 0.0 28332 28 ? Ss Apr12 0:00 /usr/sbin/atd -f
root 523 0.0 0.1 70644 2460 ? Ss Apr12 0:00 /lib/systemd/systemd-logind
root 539 0.0 0.0 291452 1612 ? Ssl Apr12 0:00 /usr/lib/policykit-1/polkitd --no-debug
root 549 0.0 0.0 187540 292 ? Ssl Apr12 0:00 /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
root 730 0.0 0.0 16480 0 tty1 Ss+ Apr12 0:00 /sbin/agetty -o -p -- \u --noclear tty1 linux
root 1558 0.0 0.0 72300 980 ? Ss Apr12 0:02 /usr/sbin/sshd -D
root 7079 0.0 0.3 112280 6860 ? Ss 18:39 0:00 _ sshd: #user# [priv]
#user# 7220 0.0 0.1 112280 4032 ? S 18:39 0:00 _ sshd: #user#@pts/0
#user# 7221 0.0 0.2 23196 5588 pts/0 Ss 18:39 0:00 _ -bash
#user# 17546 0.0 0.1 40116 3764 pts/0 R+ 18:52 0:00 _ ps fauxww
systemd+ 1817 0.0 0.0 80188 1948 ? Ss Apr12 0:00 /lib/systemd/systemd-networkd
root 2100 0.0 0.9 681140 19664 ? Ssl Apr12 0:43 /usr/lib/snapd/snapd
syslog 22386 0.0 0.0 263032 584 ? Ssl Apr13 0:03 /usr/sbin/rsyslogd -n
root 26386 0.0 0.0 65988 1368 ? Ss Apr13 0:08 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -S /run/haproxy-master.sock -sf 9534 -x /run/haproxy/admin.sock
haproxy 25262 0.0 0.2 66896 5048 ? S Apr15 0:16 _ /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -S /run/haproxy-master.sock -sf 9534 -x /run/haproxy/admin.sock
root 14593 0.0 0.1 655208 2144 ? Sl 15:35 0:00 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
lxd 14699 0.0 0.0 49968 1560 ? Ss 15:35 0:04 dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --no-ping --interface=lxdfan0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=240.80.0.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.hosts --dhcp-range 240.80.0.2,240.80.0.254,1h -s lxd -S /lxd/240.80.0.1#1053 --rev-server=240.0.0.0/8,240.80.0.1#1053 --conf-file=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.raw -u lxd
root 14700 0.0 1.0 1157476 22188 ? Ssl 15:35 0:05 /snap/lxd/current/bin/lxd forkdns 240.80.0.1:1053 lxd lxdfan0
root 15392 0.0 0.7 1090984 15216 ? Ss 15:35 0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers web
1000000 15449 0.0 0.1 225092 3140 ? Ss 15:35 0:00 _ /sbin/init
1000000 15860 0.0 0.1 86632 3724 ? Ss 15:35 0:00 _ /lib/systemd/systemd-journald
1000000 15940 0.0 0.0 42108 572 ? Ss 15:35 0:00 _ /lib/systemd/systemd-udevd
1000100 18558 0.0 0.0 80056 1680 ? Ss 15:36 0:00 _ /lib/systemd/systemd-networkd
1000101 18671 0.0 0.0 70640 1556 ? Ss 15:36 0:01 _ /lib/systemd/systemd-resolved
1000000 19991 0.0 0.2 170836 5328 ? Ssl 15:36 0:00 _ /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
1000000 19992 0.0 0.0 287992 1604 ? Ssl 15:36 0:00 _ /usr/lib/accountsservice/accounts-daemon
1000000 20014 0.0 0.0 31748 684 ? Ss 15:36 0:00 _ /usr/sbin/cron -f
1000102 20016 0.0 0.0 197636 940 ? Ssl 15:36 0:00 _ /usr/sbin/rsyslogd -n
1000000 20019 0.0 0.2 405648 5340 ? Ss 15:36 0:00 _ php-fpm: master process (/etc/php/7.2/fpm/php-fpm.conf)
1000033 22117 0.0 0.1 407944 3400 ? S 15:36 0:00 | _ php-fpm: pool www
1000033 22118 0.0 0.1 407944 3400 ? S 15:36 0:00 | _ php-fpm: pool www
1000001 20020 0.0 0.0 28332 460 ? Ss 15:36 0:00 _ /usr/sbin/atd -f
1000000 20026 0.0 0.0 62116 1360 ? Ss 15:36 0:00 _ /lib/systemd/systemd-logind
1000103 20042 0.0 0.0 50100 1128 ? Ss 15:36 0:00 _ /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
1000000 20235 0.0 0.0 16412 584 ? Ss+ 15:36 0:00 _ /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 linux
1000000 20348 0.0 0.3 187676 6328 ? Ssl 15:36 0:00 _ /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
1000000 20448 0.0 0.0 72300 900 ? Ss 15:36 0:00 _ /usr/sbin/sshd -D
1000000 20718 0.0 0.0 288884 1192 ? Ssl 15:36 0:00 _ /usr/lib/policykit-1/polkitd --no-debug
1000000 21773 0.0 0.0 84624 1128 ? Ss 15:36 0:00 _ /usr/sbin/apache2 -k start
1000033 22136 0.0 0.0 836924 1148 ? Sl 15:36 0:00 | _ /usr/sbin/apache2 -k start
1000033 22137 0.0 0.0 836924 908 ? Sl 15:36 0:00 | _ /usr/sbin/apache2 -k start
1000000 20973 0.0 0.1 29448 3220 ? Ss 16:51 0:00 _ tmux
1000000 20974 0.0 0.1 23008 2860 pts/0 Ss 16:51 0:00 _ -bash
1000000 20998 0.0 0.3 37536 7088 pts/0 S+ 16:51 0:01 _ python3 dns.py
#user# 7081 0.0 0.3 76792 7220 ? Ss 18:39 0:00 /lib/systemd/systemd --user
#user# 7082 0.0 0.0 263704 1828 ? S 18:39 0:00 _ (sd-pam)
root 8545 0.0 0.0 4640 1788 ? Ss 18:41 0:00 /bin/sh /snap/lxd/14623/commands/daemon.start
root 8662 0.8 5.5 1523764 112840 ? Sl 18:41 0:05 _ lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd

ps fauxww on node1:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2 0.0 0.0 0 0 ? S Apr15 0:01 [kthreadd]
root 4 0.0 0.0 0 0 ? I< Apr15 0:00 _ [kworker/0:0H]
root 6 0.0 0.0 0 0 ? I< Apr15 0:00 _ [mm_percpu_wq]
root 7 0.0 0.0 0 0 ? S Apr15 0:02 _ [ksoftirqd/0]
root 8 0.0 0.0 0 0 ? I Apr15 0:08 _ [rcu_sched]
root 9 0.0 0.0 0 0 ? I Apr15 0:00 _ [rcu_bh]
root 10 0.0 0.0 0 0 ? S Apr15 0:00 _ [migration/0]
root 11 0.0 0.0 0 0 ? S Apr15 0:00 _ [watchdog/0]
root 12 0.0 0.0 0 0 ? S Apr15 0:00 _ [cpuhp/0]
root 13 0.0 0.0 0 0 ? S Apr15 0:00 _ [kdevtmpfs]
root 14 0.0 0.0 0 0 ? I< Apr15 0:00 _ [netns]
root 15 0.0 0.0 0 0 ? S Apr15 0:00 _ [rcu_tasks_kthre]
root 16 0.0 0.0 0 0 ? S Apr15 0:00 _ [kauditd]
root 17 0.0 0.0 0 0 ? S Apr15 0:00 _ [khungtaskd]
root 18 0.0 0.0 0 0 ? S Apr15 0:00 _ [oom_reaper]
root 19 0.0 0.0 0 0 ? I< Apr15 0:00 _ [writeback]
root 20 0.0 0.0 0 0 ? S Apr15 0:00 _ [kcompactd0]
root 21 0.0 0.0 0 0 ? SN Apr15 0:00 _ [ksmd]
root 22 0.0 0.0 0 0 ? SN Apr15 0:00 _ [khugepaged]
root 23 0.0 0.0 0 0 ? I< Apr15 0:00 _ [crypto]
root 24 0.0 0.0 0 0 ? I< Apr15 0:00 _ [kintegrityd]
root 25 0.0 0.0 0 0 ? I< Apr15 0:00 _ [kblockd]
root 26 0.0 0.0 0 0 ? I< Apr15 0:00 _ [ata_sff]
root 27 0.0 0.0 0 0 ? I< Apr15 0:00 _ [md]
root 28 0.0 0.0 0 0 ? I< Apr15 0:00 _ [edac-poller]
root 29 0.0 0.0 0 0 ? I< Apr15 0:00 _ [devfreq_wq]
root 30 0.0 0.0 0 0 ? I< Apr15 0:00 _ [watchdogd]
root 34 0.0 0.0 0 0 ? S Apr15 0:02 _ [kswapd0]
root 35 0.0 0.0 0 0 ? I< Apr15 0:00 _ [kworker/u3:0]
root 36 0.0 0.0 0 0 ? S Apr15 0:00 _ [ecryptfs-kthrea]
root 78 0.0 0.0 0 0 ? I< Apr15 0:00 _ [kthrotld]
root 79 0.0 0.0 0 0 ? I< Apr15 0:00 _ [acpi_thermal_pm]
root 80 0.0 0.0 0 0 ? S Apr15 0:00 _ [scsi_eh_0]
root 81 0.0 0.0 0 0 ? I< Apr15 0:00 _ [scsi_tmf_0]
root 82 0.0 0.0 0 0 ? S Apr15 0:00 _ [scsi_eh_1]
root 83 0.0 0.0 0 0 ? I< Apr15 0:00 _ [scsi_tmf_1]
root 89 0.0 0.0 0 0 ? I< Apr15 0:00 _ [ipv6_addrconf]
root 98 0.0 0.0 0 0 ? I< Apr15 0:00 _ [kstrp]
root 115 0.0 0.0 0 0 ? I< Apr15 0:00 _ [charger_manager]
root 179 0.0 0.0 0 0 ? I< Apr15 0:00 _ [ttm_swap]
root 184 0.0 0.0 0 0 ? I< Apr15 0:02 _ [kworker/0:1H]
root 283 0.0 0.0 0 0 ? I< Apr15 0:00 _ [raid5wq]
root 339 0.0 0.0 0 0 ? S Apr15 0:02 _ [jbd2/vda1-8]
root 340 0.0 0.0 0 0 ? I< Apr15 0:00 _ [ext4-rsv-conver]
root 370 0.0 0.0 0 0 ? S Apr15 0:00 _ [hwrng]
root 413 0.0 0.0 0 0 ? I< Apr15 0:00 _ [iscsi_eh]
root 420 0.0 0.0 0 0 ? I< Apr15 0:00 _ [ib-comp-wq]
root 423 0.0 0.0 0 0 ? I< Apr15 0:00 _ [ib-comp-unb-wq]
root 424 0.0 0.0 0 0 ? I< Apr15 0:00 _ [ib_mcast]
root 425 0.0 0.0 0 0 ? I< Apr15 0:00 _ [ib_nl_sa_wq]
root 432 0.0 0.0 0 0 ? I< Apr15 0:00 _ [rdma_cm]
root 498 0.0 0.0 0 0 ? S< Apr15 0:00 _ [loop0]
root 511 0.0 0.0 0 0 ? S< Apr15 0:00 _ [loop2]
root 513 0.0 0.0 0 0 ? S< Apr15 0:00 _ [loop3]
root 1094 0.0 0.0 0 0 ? I< Apr15 0:00 _ [dio/vda1]
root 1106 0.0 0.0 0 0 ? S< Apr15 0:00 _ [spl_system_task]
root 1107 0.0 0.0 0 0 ? S< Apr15 0:00 _ [spl_delay_taskq]
root 1108 0.0 0.0 0 0 ? S< Apr15 0:00 _ [spl_dynamic_tas]
root 1109 0.0 0.0 0 0 ? S< Apr15 0:00 _ [spl_kmem_cache]
root 1119 0.0 0.0 0 0 ? S< Apr15 0:00 _ [zvol]
root 1121 0.0 0.0 0 0 ? S Apr15 0:00 _ [arc_prune]
root 1130 0.0 0.0 0 0 ? S Apr15 0:01 _ [arc_reclaim]
root 1131 0.0 0.0 0 0 ? S Apr15 0:00 _ [dbu_evict]
root 1132 0.0 0.0 0 0 ? SN Apr15 0:01 _ [dbuf_evict]
root 1178 0.0 0.0 0 0 ? SN Apr15 0:02 _ [z_vdev_file]
root 1179 0.0 0.0 0 0 ? S Apr15 0:00 _ [l2arc_feed]
root 1296 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_null_iss]
root 1305 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_null_int]
root 1311 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_rd_iss]
root 1316 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_rd_int_0]
root 1320 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_rd_int_1]
root 1321 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_rd_int_2]
root 1322 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_rd_int_3]
root 1323 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_rd_int_4]
root 1324 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_rd_int_5]
root 1325 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_rd_int_6]
root 1326 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_rd_int_7]
root 1327 0.0 0.0 0 0 ? S< Apr15 0:09 _ [z_wr_iss]
root 1328 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_wr_iss_h]
root 1329 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_wr_int_0]
root 1330 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_wr_int_1]
root 1331 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_wr_int_2]
root 1332 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_wr_int_3]
root 1333 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_wr_int_4]
root 1334 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_wr_int_5]
root 1335 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_wr_int_6]
root 1336 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_wr_int_7]
root 1337 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_wr_int_h]
root 1338 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_fr_iss_0]
root 1339 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_fr_iss_1]
root 1340 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_fr_iss_2]
root 1341 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_fr_iss_3]
root 1342 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_fr_iss_4]
root 1343 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_fr_iss_5]
root 1344 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_fr_iss_6]
root 1345 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_fr_iss_7]
root 1346 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_fr_int]
root 1347 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_cl_iss]
root 1348 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_cl_int]
root 1349 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_ioctl_iss]
root 1350 0.0 0.0 0 0 ? S< Apr15 0:00 _ [z_ioctl_int]
root 1351 0.0 0.0 0 0 ? S Apr15 0:00 _ [z_zvol]
root 1352 0.0 0.0 0 0 ? S Apr15 0:00 _ [z_prefetch]
root 1353 0.0 0.0 0 0 ? S Apr15 0:00 _ [z_upgrade]
root 1354 0.0 0.0 0 0 ? S< Apr15 0:00 _ [metaslab_group_]
root 1361 0.0 0.0 0 0 ? SN Apr15 0:00 _ [dp_sync_taskq]
root 1363 0.0 0.0 0 0 ? SN Apr15 0:00 _ [dp_zil_clean_ta]
root 1365 0.0 0.0 0 0 ? S Apr15 0:00 _ [z_iput]
root 1461 0.0 0.0 0 0 ? S Apr15 0:00 _ [txg_quiesce]
root 1462 0.0 0.0 0 0 ? S Apr15 0:03 _ [txg_sync]
root 1463 0.0 0.0 0 0 ? S Apr15 0:00 _ [mmp]
root 31075 0.0 0.0 0 0 ? S< 05:53 0:00 _ [loop4]
root 19525 0.0 0.0 0 0 ? I 17:06 0:00 _ [kworker/0:0]
root 23377 0.0 0.0 0 0 ? I 18:28 0:00 _ [kworker/u2:2]
root 27564 0.0 0.0 0 0 ? I 18:39 0:00 _ [kworker/0:1]
root 31047 0.0 0.0 0 0 ? I 18:49 0:00 _ [kworker/u2:1]
root 3722 0.0 0.0 0 0 ? I 19:02 0:00 _ [kworker/0:2]
root 4573 0.0 0.0 0 0 ? I 19:04 0:00 _ [kworker/u2:0]
root 4852 0.0 0.0 0 0 ? SN 19:05 0:00 _ [z_vdev_file]
root 4889 0.0 0.0 0 0 ? SN 19:05 0:00 _ [z_vdev_file]
root 4914 0.0 0.0 0 0 ? SN 19:05 0:00 _ [z_vdev_file]
root 4915 0.0 0.0 0 0 ? S< 19:05 0:00 _ [z_wr_int_2]
root 4916 0.0 0.0 0 0 ? S< 19:05 0:00 _ [z_wr_int_2]
root 4917 0.0 0.0 0 0 ? S< 19:05 0:00 _ [z_wr_int_2]
root 4918 0.0 0.0 0 0 ? S< 19:05 0:00 _ [z_wr_int_2]
root 4919 0.0 0.0 0 0 ? S< 19:05 0:00 _ [z_wr_int_2]
root 4920 0.0 0.0 0 0 ? S< 19:05 0:00 _ [z_wr_int_7]
root 4921 0.0 0.0 0 0 ? S< 19:05 0:00 _ [z_wr_int_7]
root 4923 0.0 0.0 0 0 ? S< 19:05 0:00 _ [z_wr_int_6]
root 4924 0.0 0.0 0 0 ? S< 19:05 0:00 _ [z_wr_int_2]
root 4925 0.0 0.0 0 0 ? S< 19:05 0:00 _ [z_wr_int_7]
root 4926 0.0 0.0 0 0 ? S< 19:05 0:00 _ [z_wr_int_7]
root 4927 0.0 0.0 0 0 ? S< 19:05 0:00 _ [z_wr_int_2]
root 4929 0.0 0.0 0 0 ? S< 19:05 0:00 _ [z_wr_int_1]
root 4930 0.0 0.0 0 0 ? S< 19:05 0:00 _ [z_wr_int_1]
root 1 0.0 0.4 159896 8632 ? Ss Apr15 0:06 /sbin/init
root 405 0.0 4.0 280460 83368 ? S<s Apr15 0:15 /lib/systemd/systemd-journald
root 414 0.0 0.0 97708 1720 ? Ss Apr15 0:00 /sbin/lvmetad -f
root 416 0.0 0.1 45400 3836 ? Ss Apr15 0:01 /lib/systemd/systemd-udevd
systemd+ 531 0.0 0.2 80180 5696 ? Ss Apr15 0:00 /lib/systemd/systemd-networkd
systemd+ 544 0.0 0.1 141936 3288 ? Ssl Apr15 0:00 /lib/systemd/systemd-timesyncd
systemd+ 562 0.0 0.2 70768 5836 ? Ss Apr15 0:00 /lib/systemd/systemd-resolved
daemon 612 0.0 0.1 28332 2148 ? Ss Apr15 0:00 /usr/sbin/atd -f
root 616 0.0 0.1 644544 2628 ? Ssl Apr15 0:00 /usr/bin/lxcfs /var/lib/lxcfs/
message+ 618 0.0 0.1 50092 3932 ? Ss Apr15 0:05 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
syslog 622 0.0 0.2 263032 4444 ? Ssl Apr15 0:03 /usr/sbin/rsyslogd -n
root 623 0.0 0.1 31620 2704 ? Ss Apr15 0:00 /usr/sbin/cron -f
root 624 0.0 0.2 287848 5364 ? Ssl Apr15 0:04 /usr/lib/accountsservice/accounts-daemon
root 625 0.0 0.2 70652 5892 ? Ss Apr15 0:00 /lib/systemd/systemd-logind
root 627 0.0 0.7 170732 15020 ? Ssl Apr15 0:00 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
root 628 0.0 0.9 647580 19252 ? Ssl Apr15 0:09 /usr/lib/snapd/snapd
root 639 0.0 0.2 291452 5508 ? Ssl Apr15 0:03 /usr/lib/policykit-1/polkitd --no-debug
root 646 0.0 0.8 187540 17108 ? Ssl Apr15 0:00 /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
root 676 0.0 0.0 16480 1624 tty1 Ss+ Apr15 0:00 /sbin/agetty -o -p -- \u --noclear tty1 linux
root 679 0.0 0.3 72300 6140 ? Ss Apr15 0:03 /usr/sbin/sshd -D
root 4308 0.0 0.2 72300 5604 ? Ss 19:04 0:00 _ sshd: [accepted]
sshd 4333 0.0 0.1 72300 2836 ? S 19:04 0:00 | _ sshd: [net]
root 4726 0.0 0.3 112280 7316 ? Ss 19:05 0:00 _ sshd: #user# [priv]
#user# 4818 0.0 0.1 112280 3668 ? S 19:05 0:00 _ sshd: #user#@pts/0
#user# 4819 0.2 0.2 23064 5124 pts/0 Ss 19:05 0:00 _ -bash
#user# 4936 0.0 0.1 40116 3856 pts/0 R+ 19:05 0:00 _ ps fauxww
root 10536 0.0 0.0 4640 960 ? Ss 16:49 0:00 /bin/sh /snap/lxd/14611/commands/daemon.start
root 10661 0.7 5.2 1458032 106296 ? Sl 16:49 0:59 _ lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
lxd 10748 0.0 0.0 49968 1724 ? Ss 16:49 0:00 _ dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --no-ping --interface=lxdfan0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=240.36.0.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.hosts --dhcp-range 240.36.0.2,240.36.0.254,1h -s lxd -S /lxd/240.36.0.1#1053 --rev-server=240.0.0.0/8,240.36.0.1#1053 --conf-file=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.raw -u lxd
root 10749 0.0 1.4 1157476 29776 ? Ssl 16:49 0:02 _ /snap/lxd/current/bin/lxd forkdns 240.36.0.1:1053 lxd lxdfan0
root 10649 0.0 0.0 311076 1168 ? Sl 16:49 0:00 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
root 10866 0.0 0.7 1090984 15548 ? Ss 16:50 0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers mysql
1000000 10879 0.0 0.3 159536 6132 ? Ss 16:50 0:00 _ /sbin/init
1000000 11001 0.0 0.3 78456 7488 ? Ss 16:50 0:00 _ /lib/systemd/systemd-journald
1000000 11032 0.0 0.0 42108 1832 ? Ss 16:50 0:00 _ /lib/systemd/systemd-udevd
1000100 11293 0.0 0.1 80056 3608 ? Ss 16:50 0:00 _ /lib/systemd/systemd-networkd
1000101 11311 0.0 0.1 70640 3632 ? Ss 16:50 0:01 _ /lib/systemd/systemd-resolved
1000000 11393 0.0 0.1 62116 3604 ? Ss 16:50 0:00 _ /lib/systemd/systemd-logind
1000001 11394 0.0 0.0 28332 1228 ? Ss 16:50 0:00 _ /usr/sbin/atd -f
1000000 11398 0.0 0.2 287996 4256 ? Ssl 16:50 0:00 _ /usr/lib/accountsservice/accounts-daemon
1000103 11400 0.0 0.1 50052 2492 ? Ss 16:50 0:00 _ /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
1000102 11420 0.0 0.1 197636 2688 ? Ssl 16:50 0:00 _ /usr/sbin/rsyslogd -n
1000000 11421 0.0 0.5 170832 11972 ? Ssl 16:50 0:00 _ /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
1000000 11423 0.0 0.0 31748 1632 ? Ss 16:50 0:00 _ /usr/sbin/cron -f
1000000 11443 0.0 0.0 16412 1344 pts/0 Ss+ 16:50 0:00 _ /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 linux
1000000 11450 0.0 0.6 187676 13044 ? Ssl 16:50 0:00 _ /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
1000000 11462 0.0 0.1 72300 3368 ? Ss 16:50 0:00 _ /usr/sbin/sshd -D
1000000 11478 0.0 0.2 288884 4208 ? Ssl 16:50 0:00 _ /usr/lib/policykit-1/polkitd --no-debug
1000111 11555 0.6 24.3 1383048 497408 ? Ssl 16:50 0:53 _ /usr/sbin/mysqld
1000000 12420 0.0 0.1 29440 3420 ? Ss 16:52 0:01 _ tmux
1000000 12421 0.0 0.1 23008 3860 pts/0 Ss 16:52 0:00 _ -bash
1000000 12442 0.0 0.3 37600 7060 pts/0 S+ 16:52 0:01 _ python3 dns.py
#user# 4728 0.0 0.3 76668 7548 ? Ss 19:05 0:00 /lib/systemd/systemd --user
#user# 4729 0.0 0.1 198044 2640 ? S 19:05 0:00 _ (sd-pam)

ps fauxww on node2:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2 0.0 0.0 0 0 ? S Apr13 0:32 [kthreadd]
root 4 0.0 0.0 0 0 ? I< Apr13 0:00 _ [kworker/0:0H]
root 6 0.0 0.0 0 0 ? I< Apr13 0:00 _ [mm_percpu_wq]
root 7 0.0 0.0 0 0 ? S Apr13 0:17 _ [ksoftirqd/0]
root 8 0.0 0.0 0 0 ? I Apr13 0:35 _ [rcu_sched]
root 9 0.0 0.0 0 0 ? I Apr13 0:00 _ [rcu_bh]
root 10 0.0 0.0 0 0 ? S Apr13 0:00 _ [migration/0]
root 11 0.0 0.0 0 0 ? S Apr13 0:00 _ [watchdog/0]
root 12 0.0 0.0 0 0 ? S Apr13 0:00 _ [cpuhp/0]
root 13 0.0 0.0 0 0 ? S Apr13 0:00 _ [kdevtmpfs]
root 14 0.0 0.0 0 0 ? I< Apr13 0:00 _ [netns]
root 15 0.0 0.0 0 0 ? S Apr13 0:00 _ [rcu_tasks_kthre]
root 16 0.0 0.0 0 0 ? S Apr13 0:00 _ [kauditd]
root 17 0.0 0.0 0 0 ? S Apr13 0:00 _ [khungtaskd]
root 18 0.0 0.0 0 0 ? S Apr13 0:00 _ [oom_reaper]
root 19 0.0 0.0 0 0 ? I< Apr13 0:00 _ [writeback]
root 20 0.0 0.0 0 0 ? S Apr13 0:00 _ [kcompactd0]
root 21 0.0 0.0 0 0 ? SN Apr13 0:00 _ [ksmd]
root 22 0.0 0.0 0 0 ? SN Apr13 0:04 _ [khugepaged]
root 23 0.0 0.0 0 0 ? I< Apr13 0:00 _ [crypto]
root 24 0.0 0.0 0 0 ? I< Apr13 0:00 _ [kintegrityd]
root 25 0.0 0.0 0 0 ? I< Apr13 0:00 _ [kblockd]
root 26 0.0 0.0 0 0 ? I< Apr13 0:00 _ [ata_sff]
root 27 0.0 0.0 0 0 ? I< Apr13 0:00 _ [md]
root 28 0.0 0.0 0 0 ? I< Apr13 0:00 _ [edac-poller]
root 29 0.0 0.0 0 0 ? I< Apr13 0:00 _ [devfreq_wq]
root 30 0.0 0.0 0 0 ? I< Apr13 0:00 _ [watchdogd]
root 34 0.0 0.0 0 0 ? S Apr13 0:50 _ [kswapd0]
root 35 0.0 0.0 0 0 ? I< Apr13 0:00 _ [kworker/u3:0]
root 36 0.0 0.0 0 0 ? S Apr13 0:00 _ [ecryptfs-kthrea]
root 78 0.0 0.0 0 0 ? I< Apr13 0:00 _ [kthrotld]
root 79 0.0 0.0 0 0 ? I< Apr13 0:00 _ [acpi_thermal_pm]
root 80 0.0 0.0 0 0 ? S Apr13 0:00 _ [scsi_eh_0]
root 81 0.0 0.0 0 0 ? I< Apr13 0:00 _ [scsi_tmf_0]
root 82 0.0 0.0 0 0 ? S Apr13 0:00 _ [scsi_eh_1]
root 83 0.0 0.0 0 0 ? I< Apr13 0:00 _ [scsi_tmf_1]
root 89 0.0 0.0 0 0 ? I< Apr13 0:00 _ [ipv6_addrconf]
root 98 0.0 0.0 0 0 ? I< Apr13 0:00 _ [kstrp]
root 115 0.0 0.0 0 0 ? I< Apr13 0:00 _ [charger_manager]
root 184 0.0 0.0 0 0 ? I< Apr13 0:00 _ [ttm_swap]
root 191 0.0 0.0 0 0 ? I< Apr13 2:08 _ [kworker/0:1H]
root 283 0.0 0.0 0 0 ? I< Apr13 0:00 _ [raid5wq]
root 339 0.0 0.0 0 0 ? S Apr13 2:52 _ [jbd2/vda1-8]
root 340 0.0 0.0 0 0 ? I< Apr13 0:00 _ [ext4-rsv-conver]
root 370 0.0 0.0 0 0 ? S Apr13 0:00 _ [hwrng]
root 411 0.0 0.0 0 0 ? I< Apr13 0:00 _ [iscsi_eh]
root 414 0.0 0.0 0 0 ? I< Apr13 0:00 _ [ib-comp-wq]
root 415 0.0 0.0 0 0 ? I< Apr13 0:00 _ [ib-comp-unb-wq]
root 416 0.0 0.0 0 0 ? I< Apr13 0:00 _ [ib_mcast]
root 420 0.0 0.0 0 0 ? I< Apr13 0:00 _ [ib_nl_sa_wq]
root 427 0.0 0.0 0 0 ? I< Apr13 0:00 _ [rdma_cm]
root 2007 0.0 0.0 0 0 ? S< Apr13 0:00 _ [loop0]
root 2997 0.0 0.0 0 0 ? I< Apr13 0:00 _ [dio/vda1]
root 3120 0.0 0.0 0 0 ? S< Apr13 0:00 _ [spl_system_task]
root 3121 0.0 0.0 0 0 ? S< Apr13 0:00 _ [spl_delay_taskq]
root 3122 0.0 0.0 0 0 ? S< Apr13 0:12 _ [spl_dynamic_tas]
root 3123 0.0 0.0 0 0 ? S< Apr13 0:03 _ [spl_kmem_cache]
root 3130 0.0 0.0 0 0 ? S< Apr13 0:00 _ [zvol]
root 3133 0.0 0.0 0 0 ? S Apr13 0:00 _ [arc_prune]
root 3140 0.0 0.0 0 0 ? S Apr13 0:04 _ [arc_reclaim]
root 3141 0.0 0.0 0 0 ? S Apr13 0:00 _ [dbu_evict]
root 3142 0.0 0.0 0 0 ? SN Apr13 0:04 _ [dbuf_evict]
root 3143 0.0 0.0 0 0 ? SN Apr13 2:45 _ [z_vdev_file]
root 3144 0.0 0.0 0 0 ? S Apr13 0:03 _ [l2arc_feed]
root 3150 0.0 0.0 0 0 ? S< Apr13 0:03 _ [z_null_iss]
root 3151 0.0 0.0 0 0 ? S< Apr13 0:21 _ [z_null_int]
root 3152 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_rd_iss]
root 3153 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_rd_int_0]
root 3154 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_rd_int_1]
root 3155 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_rd_int_2]
root 3156 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_rd_int_3]
root 3157 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_rd_int_4]
root 3158 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_rd_int_5]
root 3159 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_rd_int_6]
root 3160 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_rd_int_7]
root 3161 0.0 0.0 0 0 ? S< Apr13 2:53 _ [z_wr_iss]
root 3162 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_wr_iss_h]
root 3163 0.0 0.0 0 0 ? S< Apr13 0:13 _ [z_wr_int_0]
root 3164 0.0 0.0 0 0 ? S< Apr13 0:13 _ [z_wr_int_1]
root 3165 0.0 0.0 0 0 ? S< Apr13 0:13 _ [z_wr_int_2]
root 3166 0.0 0.0 0 0 ? S< Apr13 0:13 _ [z_wr_int_3]
root 3167 0.0 0.0 0 0 ? S< Apr13 0:13 _ [z_wr_int_4]
root 3168 0.0 0.0 0 0 ? S< Apr13 0:13 _ [z_wr_int_5]
root 3169 0.0 0.0 0 0 ? S< Apr13 0:13 _ [z_wr_int_6]
root 3170 0.0 0.0 0 0 ? S< Apr13 0:13 _ [z_wr_int_7]
root 3171 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_wr_int_h]
root 3172 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_fr_iss_0]
root 3173 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_fr_iss_1]
root 3174 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_fr_iss_2]
root 3175 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_fr_iss_3]
root 3176 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_fr_iss_4]
root 3177 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_fr_iss_5]
root 3178 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_fr_iss_6]
root 3179 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_fr_iss_7]
root 3180 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_fr_int]
root 3181 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_cl_iss]
root 3182 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_cl_int]
root 3183 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_ioctl_iss]
root 3184 0.0 0.0 0 0 ? S< Apr13 0:00 _ [z_ioctl_int]
root 3185 0.0 0.0 0 0 ? S Apr13 0:00 _ [z_zvol]
root 3186 0.0 0.0 0 0 ? S Apr13 0:00 _ [z_prefetch]
root 3187 0.0 0.0 0 0 ? S Apr13 0:00 _ [z_upgrade]
root 3188 0.0 0.0 0 0 ? S< Apr13 0:00 _ [metaslab_group_]
root 3194 0.0 0.0 0 0 ? SN Apr13 0:05 _ [dp_sync_taskq]
root 3196 0.0 0.0 0 0 ? SN Apr13 0:00 _ [dp_zil_clean_ta]
root 3198 0.0 0.0 0 0 ? S Apr13 0:00 _ [z_iput]
root 3199 0.0 0.0 0 0 ? S Apr13 0:00 _ [txg_quiesce]
root 3200 0.0 0.0 0 0 ? S Apr13 0:32 _ [txg_sync]
root 3201 0.0 0.0 0 0 ? S Apr13 0:03 _ [mmp]
root 941 0.0 0.0 0 0 ? S< Apr15 0:00 _ [loop2]
root 1045 0.0 0.0 0 0 ? S< Apr15 0:00 _ [loop3]
root 18590 0.0 0.0 0 0 ? S< 05:53 0:00 _ [loop4]
root 28016 0.0 0.0 0 0 ? I 16:09 0:00 _ [kworker/0:2]
root 20279 0.0 0.0 0 0 ? I 17:13 0:00 _ [kworker/0:0]
root 13394 0.0 0.0 0 0 ? I 18:15 0:00 _ [kworker/u2:1]
root 21999 0.0 0.0 0 0 ? I 18:36 0:00 _ [kworker/u2:0]
root 29608 0.0 0.0 0 0 ? I 18:54 0:00 _ [kworker/u2:2]
root 32402 0.0 0.0 0 0 ? S< 19:01 0:00 _ [z_wr_int_2]
root 32407 0.0 0.0 0 0 ? S< 19:01 0:00 _ [z_wr_int_2]
root 32408 0.0 0.0 0 0 ? SN 19:01 0:00 _ [z_vdev_file]
root 32409 0.0 0.0 0 0 ? SN 19:01 0:00 _ [z_vdev_file]
root 32410 0.0 0.0 0 0 ? SN 19:01 0:00 _ [z_vdev_file]
root 32411 0.0 0.0 0 0 ? S< 19:01 0:00 _ [z_wr_int_2]
root 32414 0.0 0.0 0 0 ? SN 19:01 0:00 _ [z_vdev_file]
root 32415 0.0 0.0 0 0 ? SN 19:01 0:00 _ [z_vdev_file]
root 32416 0.0 0.0 0 0 ? SN 19:01 0:00 _ [z_vdev_file]
root 32419 0.0 0.0 0 0 ? SN 19:01 0:00 _ [z_vdev_file]
root 32420 0.0 0.0 0 0 ? SN 19:01 0:00 _ [z_vdev_file]
root 32421 0.0 0.0 0 0 ? SN 19:01 0:00 _ [z_vdev_file]
root 32422 0.0 0.0 0 0 ? SN 19:01 0:00 _ [z_vdev_file]
root 1 0.0 0.2 159976 5968 ? Ss Apr13 0:12 /sbin/init
root 399 0.0 1.5 155256 31372 ? S<s Apr13 0:51 /lib/systemd/systemd-journald
root 409 0.0 0.0 97708 176 ? Ss Apr13 0:00 /sbin/lvmetad -f
root 412 0.0 0.1 45396 2556 ? Ss Apr13 0:01 /lib/systemd/systemd-udevd
systemd+ 461 0.0 0.0 141936 1696 ? Ssl Apr13 0:00 /lib/systemd/systemd-timesyncd
systemd+ 486 0.0 0.0 70760 1568 ? Ss Apr13 0:00 /lib/systemd/systemd-resolved
root 513 0.0 0.0 31620 916 ? Ss Apr13 0:00 /usr/sbin/cron -f
root 514 0.0 0.4 170732 9140 ? Ssl Apr13 0:00 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
root 515 0.0 0.0 287848 1856 ? Ssl Apr13 0:15 /usr/lib/accountsservice/accounts-daemon
syslog 516 0.0 0.0 263032 1668 ? Ssl Apr13 0:12 /usr/sbin/rsyslogd -n
root 518 0.0 0.1 70660 2964 ? Ss Apr13 0:00 /lib/systemd/systemd-logind
daemon 519 0.0 0.0 28332 204 ? Ss Apr13 0:00 /usr/sbin/atd -f
root 521 0.0 0.1 644412 2144 ? Ssl Apr13 0:02 /usr/bin/lxcfs /var/lib/lxcfs/
message+ 522 0.0 0.0 50212 1944 ? Rs Apr13 0:11 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root 540 0.0 0.0 291452 1960 ? Ssl Apr13 0:06 /usr/lib/policykit-1/polkitd --no-debug
root 547 0.0 0.3 187540 7904 ? Ssl Apr13 0:00 /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
root 731 0.0 0.0 16480 132 tty1 Ss+ Apr13 0:00 /sbin/agetty -o -p -- \u --noclear tty1 linux
root 733 0.0 0.0 72300 1316 ? Ss Apr13 0:14 /usr/sbin/sshd -D
root 27503 0.0 0.3 112272 7084 ? Ss 18:49 0:00 _ sshd: #user# [priv]
#user# 27595 0.0 0.2 112272 4456 ? S 18:49 0:00 | _ sshd: #user#@pts/0
#user# 27596 0.0 0.2 23064 4708 pts/0 Ss 18:49 0:00 | _ -bash
#user# 32426 0.0 0.1 40288 3736 pts/0 R+ 19:01 0:00 | _ ps fauxww
root 32424 0.0 0.3 112176 7048 ? Ss 19:01 0:00 _ sshd: unknown [priv]
sshd 32425 0.0 0.1 72300 2836 ? S 19:01 0:00 _ sshd: unknown [net]
root 2070 0.0 0.8 647580 17228 ? Ssl Apr13 0:24 /usr/lib/snapd/snapd
systemd+ 2727 0.0 0.0 80180 1656 ? Ss Apr13 0:00 /lib/systemd/systemd-networkd
root 14478 0.0 0.0 4640 1104 ? Ss 15:39 0:00 /bin/sh /snap/lxd/14611/commands/daemon.start
root 14603 0.4 4.6 1466384 94072 ? Sl 15:39 0:56 _ lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
lxd 14698 0.0 0.0 49968 1576 ? Ss 15:39 0:00 _ dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --no-ping --interface=lxdfan0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=240.100.0.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.hosts --dhcp-range 240.100.0.2,240.100.0.254,1h -s lxd -S /lxd/240.100.0.1#1053 --rev-server=240.0.0.0/8,240.100.0.1#1053 --conf-file=/var/snap/lxd/common/lxd/networks/lxdfan0/dnsmasq.raw -u lxd
root 14699 0.0 0.9 1157476 19912 ? Ssl 15:39 0:03 _ /snap/lxd/current/bin/lxd forkdns 240.100.0.1:1053 lxd lxdfan0
root 14591 0.0 0.0 311076 1060 ? Sl 15:39 0:00 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
root 14748 0.0 0.7 1090984 14560 ? Ss 15:39 0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers ubuntu
1000000 14763 0.0 0.1 225052 3884 ? Ss 15:39 0:00 _ /sbin/init
1000000 15028 0.0 0.1 78448 2544 ? Ss 15:39 0:00 _ /lib/systemd/systemd-journald
1000000 15096 0.0 0.0 42108 1168 ? Ss 15:39 0:00 _ /lib/systemd/systemd-udevd
1000100 15298 0.0 0.0 80056 1692 ? Ss 15:39 0:00 _ /lib/systemd/systemd-networkd
1000101 15313 0.0 0.0 70640 1788 ? Ss 15:39 0:00 _ /lib/systemd/systemd-resolved
1000000 15406 0.0 0.0 62124 1416 ? Ss 15:39 0:00 _ /lib/systemd/systemd-logind
1000000 15407 0.0 0.0 31748 780 ? Ss 15:39 0:00 _ /usr/sbin/cron -f
1000000 15408 0.0 0.0 287996 1432 ? Ssl 15:39 0:00 _ /usr/lib/accountsservice/accounts-daemon
1000001 15409 0.0 0.0 28332 556 ? Ss 15:39 0:00 _ /usr/sbin/atd -f
1000000 15417 0.0 0.4 170828 9220 ? Ssl 15:39 0:00 _ /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
1000102 15418 0.0 0.0 197636 1112 ? Ssl 15:39 0:00 _ /usr/sbin/rsyslogd -n
1000103 15420 0.0 0.0 50104 1144 ? Ss 15:39 0:00 _ /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
1000000 15474 0.0 0.4 187628 8708 ? Ssl 15:39 0:00 _ /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
1000000 15489 0.0 0.0 288884 1408 ? Ssl 15:39 0:00 _ /usr/lib/policykit-1/polkitd --no-debug
1000000 15500 0.0 0.0 16412 640 pts/0 Ss+ 15:39 0:00 _ /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 linux
1000000 15554 0.0 0.0 72300 1032 ? Ss 15:39 0:00 _ /usr/sbin/sshd -D
1000000 11339 0.0 0.1 29444 2468 ? Ss 16:53 0:00 _ tmux
1000000 11340 0.0 0.1 23136 2152 pts/0 Ss 16:53 0:00 _ -bash
1000000 11432 0.0 0.2 37600 6056 pts/0 S+ 16:53 0:01 _ python3 dns.py
#user# 27505 0.0 0.2 76660 6068 ? Ss 18:49 0:00 /lib/systemd/systemd --user
#user# 27506 0.0 0.1 198124 2704 ? S 18:49 0:00 _ (sd-pam)

New findings:
Running systemctl reload snap.lxd.daemon manually also breaks the network, so the problem is not specific to snap refresh. Snap refresh just triggers a reload.

A reload puts the affected node into a broken state. A restart puts the node back into the operational state. If either the source or the target node of a connection is in the broken state, the connection fails. For example, in sequence:

  • Initially, all nodes are operational.
  • Reload node0: node0 is broken, node1 and node2 are operational; connections 0<=>1 and 0<=>2 are broken, 1<=>2 is operational.
  • Reload node1: node0 and node1 are broken, node2 is operational; all connections are broken.
  • Restart node0: node1 is broken, node0 and node2 are operational; connections 0<=>1 and 1<=>2 are broken, 0<=>2 is operational.
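
To make the pattern above easier to check, here is a minimal Python sketch of it. It is only a model of what I observed, not anything taken from LXD itself: the function names apply_action and links and the state labels are made up for illustration, and the rule "a link works only if both endpoints are operational" is my reading of the behaviour, not a confirmed cause.

```python
from itertools import combinations


def apply_action(states, action, node):
    """Return a new state map after one action on one node.

    Assumption from the observations: a reload leaves the node
    broken, a restart makes it operational again.
    """
    new_states = dict(states)
    new_states[node] = "broken" if action == "reload" else "operational"
    return new_states


def links(states):
    """Map each node pair to True if the connection should work.

    Assumption: a connection works only if both ends are operational.
    """
    return {
        (a, b): states[a] == "operational" and states[b] == "operational"
        for a, b in combinations(sorted(states), 2)
    }


if __name__ == "__main__":
    states = {0: "operational", 1: "operational", 2: "operational"}
    # Same sequence as in the list above.
    for action, node in [("reload", 0), ("reload", 1), ("restart", 0)]:
        states = apply_action(states, action, node)
        print(action, "node%d" % node, states, links(states))
```

Every step of the sequence I ran matches this model, which is why I believe the state is per node rather than per connection.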

Now I have a stable repro of the problem. How should I proceed?

I’m still seeing the same issue even after upgrading to 4.0.1.