Networking problem: unstable bridge connection

william.cocker · August 19, 2018, 11:40am

I wonder if it could be linked to the snap update process. Here are two log sequences: the first from when I simply restart the daemon, and the second from when a snap refresh occurs (after a system restart, for instance):

Daemon restart

Aug 19 06:00:02 lxd-metal lxd.daemon[7495]: => Stop reason is: host shutdown
Aug 19 06:00:02 lxd-metal lxd.daemon[7495]: => Stopping LXD (with container shutdown)
Aug 19 06:00:32 lxd-metal lxd.daemon[25500]: action=shutdown created=2018-07-27T15:43:16+0200 ephemeral=false lvl=eror msg="Failed shutting down container" name=quickbox t=2018-08-19T06:00:32+0200 timeout=30s used=2018-08-18T21:51:06+0200
Aug 19 06:00:33 lxd-metal lxd.daemon[25500]: No such file or directory - Failed to receive file descriptor
Aug 19 06:00:33 lxd-metal lxd.daemon[25500]: lvl=warn msg="Unable to update backup.yaml at this time" name=quickbox t=2018-08-19T06:00:33+0200
Aug 19 06:00:33 lxd-metal lxd.daemon[25500]: => LXD exited cleanly
Aug 19 06:00:33 lxd-metal lxd.daemon[7495]: => Stopping LXCFS
Aug 19 06:00:34 lxd-metal lxd.daemon[7495]: umount: /var/snap/lxd/common/ns: not mounted
Aug 19 06:00:34 lxd-metal lxd.daemon[7757]: => Preparing the system
Aug 19 06:00:34 lxd-metal lxd.daemon[7757]: ==> Loading snap configuration
Aug 19 06:00:34 lxd-metal lxd.daemon[7757]: ==> Setting up mntns symlink (mnt:[4026532197])
Aug 19 06:00:34 lxd-metal lxd.daemon[7757]: ==> Setting up kmod wrapper
Aug 19 06:00:34 lxd-metal lxd.daemon[7757]: ==> Preparing /boot
Aug 19 06:00:34 lxd-metal lxd.daemon[7757]: ==> Preparing a clean copy of /run
Aug 19 06:00:34 lxd-metal lxd.daemon[7757]: ==> Preparing a clean copy of /etc
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]: ==> Setting up ceph configuration
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]: ==> Setting up LVM configuration
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]: ==> Rotating logs
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]: ==> Setting up ZFS (0.6)
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]: ==> Escaping the systemd cgroups
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]: ==> Escaping the systemd process resource limits
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]: => Starting LXCFS
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]: => Starting LXD
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]: mount namespace: 5
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]: hierarchies:
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]:   0: fd:   6: devices
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]:   1: fd:   7: memory
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]:   2: fd:   8: perf_event
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]:   3: fd:   9: blkio
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]:   4: fd:  10: freezer
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]:   5: fd:  11: pids
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]:   6: fd:  12: cpuset
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]:   7: fd:  13: hugetlb
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]:   8: fd:  14: cpu,cpuacct
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]:   9: fd:  15: net_cls,net_prio
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]:  10: fd:  16: name=systemd
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]: lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." t=2018-08-19T06:00:35+0200
Aug 19 06:00:36 lxd-metal lxd.daemon[7757]: lvl=warn msg="Unable to update backup.yaml at this time" name=quickbox t=2018-08-19T06:00:36+0200
Aug 19 06:00:37 lxd-metal lxd.daemon[7757]: => LXD is ready

Snap refresh

Aug 18 06:02:17 lxd-metal lxd.daemon[2424]: => Stop reason is: snap refresh
Aug 18 06:02:17 lxd-metal lxd.daemon[2424]: => Stopping LXD
Aug 18 06:02:17 lxd-metal lxd.daemon[914]: lvl=warn msg="Failed to update instance types: Get https://images.linuxcontainers.org/meta/instance-types/.yaml: context canceled" t=2018-08-18T06:02:17+0200
Aug 18 06:02:17 lxd-metal lxd.daemon[914]: => LXD exited cleanly
Aug 18 06:02:23 lxd-metal lxd.daemon[3514]: => Preparing the system
Aug 18 06:02:23 lxd-metal lxd.daemon[3514]: ==> Loading snap configuration
Aug 18 06:02:23 lxd-metal lxd.daemon[3514]: ==> Setting up mntns symlink (mnt:[4026532197])
Aug 18 06:02:23 lxd-metal lxd.daemon[3514]: ==> Setting up kmod wrapper
Aug 18 06:02:23 lxd-metal lxd.daemon[3514]: ==> Preparing /boot
Aug 18 06:02:23 lxd-metal lxd.daemon[3514]: ==> Preparing a clean copy of /run
Aug 18 06:02:23 lxd-metal lxd.daemon[3514]: ==> Preparing a clean copy of /etc
Aug 18 06:02:24 lxd-metal lxd.daemon[3514]: ==> Setting up ceph configuration
Aug 18 06:02:24 lxd-metal lxd.daemon[3514]: ==> Setting up LVM configuration
Aug 18 06:02:24 lxd-metal lxd.daemon[3514]: ==> Rotating logs
Aug 18 06:02:24 lxd-metal lxd.daemon[3514]: ==> Setting up ZFS (0.6)
Aug 18 06:02:24 lxd-metal lxd.daemon[3514]: ==> Escaping the systemd cgroups
Aug 18 06:02:24 lxd-metal lxd.daemon[3514]: ==> Escaping the systemd process resource limits
Aug 18 06:02:24 lxd-metal lxd.daemon[3514]: => Re-using existing LXCFS
Aug 18 06:02:24 lxd-metal lxd.daemon[3514]: => Starting LXD
Aug 18 06:02:24 lxd-metal lxd.daemon[3514]: lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." t=2018-08-18T06:02:24+0200
Aug 18 06:02:25 lxd-metal lxd.daemon[914]: mount namespace: 7
Aug 18 06:02:25 lxd-metal lxd.daemon[914]: hierarchies:
Aug 18 06:02:25 lxd-metal lxd.daemon[914]:   0: fd:   8: devices
Aug 18 06:02:25 lxd-metal lxd.daemon[914]:   1: fd:   9: memory
Aug 18 06:02:25 lxd-metal lxd.daemon[914]:   2: fd:  10: perf_event
Aug 18 06:02:25 lxd-metal lxd.daemon[914]:   3: fd:  11: blkio
Aug 18 06:02:25 lxd-metal lxd.daemon[914]:   4: fd:  12: freezer
Aug 18 06:02:25 lxd-metal lxd.daemon[914]:   5: fd:  13: pids
Aug 18 06:02:25 lxd-metal lxd.daemon[914]:   6: fd:  14: cpuset
Aug 18 06:02:25 lxd-metal lxd.daemon[914]:   7: fd:  15: hugetlb
Aug 18 06:02:25 lxd-metal lxd.daemon[914]:   8: fd:  16: cpu,cpuacct
Aug 18 06:02:25 lxd-metal lxd.daemon[914]:   9: fd:  17: net_cls,net_prio
Aug 18 06:02:25 lxd-metal lxd.daemon[914]:  10: fd:  18: name=systemd
Aug 18 06:02:25 lxd-metal lxd.daemon[914]: lxcfs.c: 105: do_reload: lxcfs: reloaded
Aug 18 06:02:48 lxd-metal lxd.daemon[3514]: err="Failed to run: dnsmasq --strict-order --bind-interfaces --pid-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.pid --except-interface=lo --interface=lxdbr0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.10.10.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts --dhcp-range 10.10.10.2,10.10.10.254,1h -s lxd -S /lxd/ --conf-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.raw -u lxd: dnsmasq: failed to create listening socket for fe80::6c00:d8ff:fea8:6782%lxdbr0: Address already in use" lvl=eror msg="Failed to bring up network" name=lxdbr0 t=2018-08-18T06:02:48+0200
Aug 18 06:02:48 lxd-metal lxd.daemon[3514]: => LXD is ready

So the dnsmasq error at the end of the snap refresh log (dnsmasq: failed to create listening socket for fe80::6c00:d8ff:fea8:6782%lxdbr0: Address already in use, reported with lvl=eror msg="Failed to bring up network" name=lxdbr0 at t=2018-08-18T06:02:48+0200) seems like a clue, but I have no idea how to diagnose it further.
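To make the difference between the two sequences easier to see, here is a minimal Python sketch (my own, not something shipped with LXD) that keeps only the "=>" step markers from each journal excerpt and lists the steps unique to each run. The names `steps` and `only_in` are made up for this post, and the two excerpts are shortened to a few lines copied from the logs above.

```python
import re

# Journal lines look like:
#   "Aug 19 06:00:35 lxd-metal lxd.daemon[7757]: => Starting LXCFS"
# Only the "=> ..." / "==> ..." step markers are kept.
STEP = re.compile(r"lxd\.daemon\[\d+\]: (=+> .*)$")

def steps(journal_text):
    """Return the ordered list of step markers found in a journal excerpt."""
    found = []
    for line in journal_text.splitlines():
        match = STEP.search(line)
        if match:
            found.append(match.group(1))
    return found

def only_in(a, b):
    """Steps present in list a but not in list b, in their original order."""
    seen = set(b)
    return [step for step in a if step not in seen]

# Shortened excerpts copied from the two sequences above.
RESTART = """\
Aug 19 06:00:02 lxd-metal lxd.daemon[7495]: => Stopping LXD (with container shutdown)
Aug 19 06:00:33 lxd-metal lxd.daemon[7495]: => Stopping LXCFS
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]: => Starting LXCFS
Aug 19 06:00:35 lxd-metal lxd.daemon[7757]: => Starting LXD
Aug 19 06:00:37 lxd-metal lxd.daemon[7757]: => LXD is ready
"""

REFRESH = """\
Aug 18 06:02:17 lxd-metal lxd.daemon[2424]: => Stopping LXD
Aug 18 06:02:24 lxd-metal lxd.daemon[3514]: => Re-using existing LXCFS
Aug 18 06:02:24 lxd-metal lxd.daemon[3514]: => Starting LXD
Aug 18 06:02:48 lxd-metal lxd.daemon[3514]: => LXD is ready
"""

print("only in restart:", only_in(steps(RESTART), steps(REFRESH)))
print("only in refresh:", only_in(steps(REFRESH), steps(RESTART)))
```

On these excerpts, the restart run shuts the containers down and restarts LXCFS, while the refresh run keeps the existing LXCFS. Whether that has anything to do with the dnsmasq failure is only a guess on my part.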

This is not a new problem with LXD 3.4: I’ve been dealing with it ever since I started using this server three months ago (so I guess LXD was at v3.1 back then).

I’ve searched this forum and, of course, asked Google, but found nothing similar, which really makes me think it is related to my server configuration.

The closest issue I found was this thread and its associated GitHub issue, but it’s not really the same, since dnsmasq works normally most of the time.

Just in case you ask, here is further information.

ls -lh /etc/resolv.conf
lrwxrwxrwx 1 root root 29 Jul 27 15:20 /etc/resolv.conf -> ../run/resolvconf/resolv.conf

readlink -f /etc/resolv.conf
/run/resolvconf/resolv.conf

ls -lh /proc/$(pgrep daemon.start)/root/etc/
total 20K
lrwxrwxrwx 1 root root 35 Aug 19 06:00 alternatives -> /snap/core/current/etc/alternatives
lrwxrwxrwx 1 root root 31 Aug 19 06:00 apparmor -> /snap/core/current/etc/apparmor
lrwxrwxrwx 1 root root 33 Aug 19 06:00 apparmor.d -> /snap/core/current/etc/apparmor.d
lrwxrwxrwx 1 root root 30 Aug 19 06:00 ceph -> /var/lib/snapd/hostfs/etc/ceph
lrwxrwxrwx 1 root root 31 Aug 19 06:00 group -> /var/lib/snapd/hostfs/etc/group
lrwxrwxrwx 1 root root 34 Aug 19 06:00 hostname -> /var/lib/snapd/hostfs/etc/hostname
lrwxrwxrwx 1 root root 31 Aug 19 06:00 hosts -> /var/lib/snapd/hostfs/etc/hosts
-rw-r--r-- 1 root root 14K Aug 19 06:00 ld.so.cache
lrwxrwxrwx 1 root root 35 Aug 19 06:00 localtime -> /var/lib/snapd/hostfs/etc/localtime
-rw-r--r-- 1 root root 86 Aug 19 06:00 logrotate.status
drwxr-xr-x 2 root root 60 Aug 19 06:00 lvm
lrwxrwxrwx 1 root root 12 Aug 19 06:00 mtab -> /proc/mounts
lrwxrwxrwx 1 root root 39 Aug 19 06:00 nsswitch.conf -> /var/lib/snapd/hostfs/etc/nsswitch.conf
lrwxrwxrwx 1 root root 36 Aug 19 06:00 os-release -> /var/lib/snapd/hostfs/etc/os-release
lrwxrwxrwx 1 root root 32 Aug 19 06:00 passwd -> /var/lib/snapd/hostfs/etc/passwd
lrwxrwxrwx 1 root root 37 Aug 19 06:00 resolv.conf -> /var/lib/snapd/hostfs/etc/resolv.conf
lrwxrwxrwx 1 root root 36 Aug 19 06:00 resolvconf -> /var/lib/snapd/hostfs/etc/resolvconf
lrwxrwxrwx 1 root root 29 Aug 19 06:00 ssl -> /var/lib/snapd/hostfs/etc/ssl
lrwxrwxrwx 1 root root 34 Aug 19 06:00 timezone -> /var/lib/snapd/hostfs/etc/timezone
lrwxrwxrwx 1 root root 26 Aug 19 06:00 vim -> /snap/core/current/etc/vim

ls -lh /proc/$(pgrep daemon.start)/root/run/
total 0
drwxr-xr-x 3 root root 60 Aug 19 06:00 lxc
drwx------ 4 root root 80 Aug 19 06:00 lxcfs
drwxr-xr-x 2 root root 60 Aug 19 06:00 mount
lrwxrwxrwx 1 root root 37 Aug 19 06:00 openvswitch -> /var/lib/snapd/hostfs/run/openvswitch
lrwxrwxrwx 1 root root 36 Aug 19 06:00 resolvconf -> /var/lib/snapd/hostfs/run/resolvconf
lrwxrwxrwx 1 root root 31 Aug 19 06:00 snapd -> /var/lib/snapd/hostfs/run/snapd
lrwxrwxrwx 1 root root 43 Aug 19 06:00 snapd-snap.socket -> /var/lib/snapd/hostfs/run/snapd-snap.socket
lrwxrwxrwx 1 root root 38 Aug 19 06:00 snapd.socket -> /var/lib/snapd/hostfs/run/snapd.socket
lrwxrwxrwx 1 root root 33 Aug 19 06:00 systemd -> /var/lib/snapd/hostfs/run/systemd
-rw------- 1 root root 0 Aug 19 06:00 xtables.lock

And the logs as recorded in /var/snap/lxd/common/lxd/logs (instead of journalctl output):
lvl=info msg="LXD 3.4 is starting in normal mode" path=/var/snap/lxd/common/lxd t=2018-08-18T06:02:24+0200
lvl=info msg="Kernel uid/gid map:" t=2018-08-18T06:02:24+0200
lvl=info msg=" - u 0 0 4294967295" t=2018-08-18T06:02:24+0200
lvl=info msg=" - g 0 0 4294967295" t=2018-08-18T06:02:24+0200
lvl=info msg="Configured LXD uid/gid map:" t=2018-08-18T06:02:24+0200
lvl=info msg=" - u 0 1000000 1000000000" t=2018-08-18T06:02:24+0200
lvl=info msg=" - g 0 1000000 1000000000" t=2018-08-18T06:02:24+0200
lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." t=2018-08-18T06:02:24+0200
lvl=info msg="Initializing local database" t=2018-08-18T06:02:24+0200
lvl=info msg="Initializing database gateway" t=2018-08-18T06:02:25+0200
address= id=1 lvl=info msg="Start database node" t=2018-08-18T06:02:25+0200
lvl=info msg="Raft: Restored from snapshot 1-133744-1534564769446" t=2018-08-18T06:02:25+0200
lvl=info msg="Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]" t=2018-08-18T06:02:25+0200
lvl=info msg="Dqlite: starting event loop" t=2018-08-18T06:02:25+0200
lvl=info msg="LXD isn't socket activated" t=2018-08-18T06:02:25+0200
lvl=info msg="Starting /dev/lxd handler:" t=2018-08-18T06:02:25+0200
lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock t=2018-08-18T06:02:25+0200
lvl=info msg="REST API daemon:" t=2018-08-18T06:02:25+0200
lvl=info msg=" - binding Unix socket" socket=/var/snap/lxd/common/lxd/unix.socket t=2018-08-18T06:02:25+0200
lvl=info msg="Initializing global database" t=2018-08-18T06:02:25+0200
lvl=info msg="Dqlite: handling new connection (fd=20)" t=2018-08-18T06:02:25+0200
lvl=info msg="Raft: Node at 0 [Leader] entering Leader state" t=2018-08-18T06:02:25+0200
lvl=info msg="Dqlite: connected address=0 attempt=0" t=2018-08-18T06:02:25+0200
lvl=info msg="Initializing storage pools" t=2018-08-18T06:02:25+0200
lvl=info msg="Initializing networks" t=2018-08-18T06:02:25+0200
err="Failed to run: dnsmasq --strict-order --bind-interfaces --pid-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.pid --except-interface=lo --interface=lxdbr0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.10.10.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts --dhcp-range 10.10.10.2,10.10.10.254,1h -s lxd -S /lxd/ --conf-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.raw -u lxd: dnsmasq: failed to create listening socket for fe80::6c00:d8ff:fea8:6782%lxdbr0: Address already in use" lvl=eror msg="Failed to bring up network" name=lxdbr0 t=2018-08-18T06:02:48+0200
lvl=info msg="Pruning leftover image files" t=2018-08-18T06:02:48+0200
lvl=info msg="Done pruning leftover image files" t=2018-08-18T06:02:48+0200
lvl=info msg="Loading configuration" t=2018-08-18T06:02:48+0200
lvl=info msg="Connected to MAAS controller" t=2018-08-18T06:02:48+0200
lvl=info msg="Pruning expired images" t=2018-08-18T06:02:48+0200
lvl=info msg="Done pruning expired images" t=2018-08-18T06:02:48+0200
lvl=info msg="Expiring log files" t=2018-08-18T06:02:48+0200
lvl=info msg="Updating instance types" t=2018-08-18T06:02:48+0200
lvl=info msg="Done expiring log files" t=2018-08-18T06:02:48+0200
lvl=info msg="Updating images" t=2018-08-18T06:02:48+0200
lvl=info msg="Done updating images" t=2018-08-18T06:02:48+0200
lvl=info msg="Done updating instance types" t=2018-08-18T06:02:52+0200
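
For what it's worth, the timestamps in this log can be compared with a few lines of Python. This is only a rough sketch under some assumptions: the helper names `parse_logfmt` and `timestamp` are mine, the lines are assumed to use plain ASCII double quotes, and the `err` value is shortened here to the dnsmasq message itself (the long dnsmasq command line is left out).

```python
import shlex
from datetime import datetime

def parse_logfmt(line):
    """Split a key=value log line into a dict; quoted values may contain spaces."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields

def timestamp(fields):
    """Parse the t= field, e.g. 2018-08-18T06:02:25+0200."""
    return datetime.strptime(fields["t"], "%Y-%m-%dT%H:%M:%S%z")

# Three lines from the log above; the err value is shortened (assumption noted in the text).
LOG = '''\
lvl=info msg="Initializing networks" t=2018-08-18T06:02:25+0200
err="dnsmasq: failed to create listening socket for fe80::6c00:d8ff:fea8:6782%lxdbr0: Address already in use" lvl=eror msg="Failed to bring up network" name=lxdbr0 t=2018-08-18T06:02:48+0200
lvl=info msg="Pruning leftover image files" t=2018-08-18T06:02:48+0200
'''

entries = [parse_logfmt(line) for line in LOG.splitlines()]
errors = [e for e in entries if e.get("lvl") == "eror"]
start = next(e for e in entries if e.get("msg") == "Initializing networks")

# Time spent between starting network initialization and the reported failure.
gap = timestamp(errors[0]) - timestamp(start)
print(errors[0]["name"], gap.total_seconds())
```

On these lines, the gap between "Initializing networks" and the lxdbr0 failure comes out at 23 seconds.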