Containers suddenly stopped working since the move to core20 snap - no more IPs assigned

After months of the containers being up and running without a problem, they all stopped working as of last night. Usually a system reboot and/or restarting LXD entirely (snap version, running v4.15) fixes it.

But even so, I can “start” them up just fine; no IP address is being assigned.
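
For reference, this is roughly how the problem shows up (container name znc assumed, as in the config below):

lxc start znc
lxc list znc
# the IPV4 column stays empty even though the state is RUNNING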

Example container info:

architecture: x86_64
config:
  image.architecture: x86_64
  image.description: Ubuntu 18.04 LTS minimal (20200506)
  image.os: ubuntu
  image.release: bionic
  volatile.base_image: 572979f0119c180392944f756f3aa6e402ae7c11ec3380fc2e465b2cc76e309d
  volatile.eth0.host_name: vethe7b3dc8d
  volatile.eth0.hwaddr: 00:16:3e:50:c3:f7
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
  volatile.uuid: b31837cd-d4b6-4024-b188-bd50eff94a6d
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: znc
    type: disk
  znc:
    connect: tcp:127.0.0.1:xxx
    listen: tcp:0.0.0.0:xxx
    type: proxy
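
(For reference, that dump looks like lxc config show output; something like this should reproduce it, with the container name znc assumed:)

lxc config show znc --expanded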

The lxdbr0 bridge that is being used (managed mode):

config:
  ipv4.address: 10.248.110.1/24
  ipv4.nat: "true"
  ipv6.address: fd42:78f0:8cd8:9b63::1/64
  ipv6.nat: "true"
  raw.dnsmasq: |
    auth-zone=lxd
    dns-loop-detect
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/znc
managed: true
status: Created
locations:
- none
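
(And the bridge details above presumably come from:)

lxc network show lxdbr0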

Any idea what might be wrong? Tomp suggested checking the log files, though I have no idea where the log file is located.

This isn’t the first time, by the way, that containers have just stopped working; it has happened a few times in the past few months. But as I mentioned above, a restart usually fixed it. Not this time, though.

Try:

sudo grep /var/snap/lxd/common/lxd/logs/lxd.log dnsmasq

What host OS version are you using?

Getting: dnsmasq: No such file or directory.
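
(Side note: grep expects the pattern before the file name, which is why it complains that dnsmasq does not exist. The intended command was presumably:)

sudo grep dnsmasq /var/snap/lxd/common/lxd/logs/lxd.log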

Can’t find anything useful in the logs, though. The full log is on Pastebin; it starts with: t=2021-06-17T08:30:47+0200 lvl=info msg="LXD 4.15 is starting in normal mode"

As for the OS, sorry, forgot to mention that: Ubuntu 20.04.2 LTS.

That would be it:

t=2021-06-17T08:30:48+0200 lvl=eror msg="The dnsmasq process exited prematurely" driver=bridge err="Process exited with non-zero value 1" network=lxdbr0 project=default

It seems coincidental and might be related to:

But you’re running Focal on the host, which the other issue isn’t, so it might not be related.

Can you provide the output of:

sudo ss -ulpn

Thanks

I notice you have raw.dnsmasq set; just a hunch, but can you try unsetting that:

State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
UNCONN 0 0 10.0.3.1:53 0.0.0.0:* users:(("dnsmasq",pid=14109,fd=6))
UNCONN 0 0 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=13876,fd=12))
UNCONN 0 0 0.0.0.0%lxcbr0:67 0.0.0.0:* users:(("dnsmasq",pid=14109,fd=4))
UNCONN 0 0 0.0.0.0:27015 0.0.0.0:* users:(("hlds_linux",pid=13977,fd=8))

And you mean unsetting it with:

sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- bash
LD_LIBRARY_PATH=/snap/lxd/current/lib/:/snap/lxd/current/lib/x86_64-linux-gnu/ /snap/lxd/current/bin/dnsmasq --help

I’m not familiar with this.

No, not the snap commands; the link I posted showed you how to do it, but it’s:

lxc network unset lxdbr0 raw.dnsmasq
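
(After unsetting it, checking whether dnsmasq still exits prematurely might help; for example:)

sudo grep dnsmasq /var/snap/lxd/common/lxd/logs/lxd.log | tail -n 5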

If I do that, only IPv6 addresses are assigned to the containers.

+--------------+---------+------+------------------------------------------------+-----------+-----------+
| baker        | RUNNING |      | fd42:78f0:8cd8:9b63:216:3eff:fe93:526c (eth1)  | CONTAINER | 0         |
|              |         |      | fd42:78f0:8cd8:9b63:216:3eff:fe69:389b (eth0)  |           |           |
+--------------+---------+------+------------------------------------------------+-----------+-----------+

Can you reload LXD now:

sudo systemctl reload snap.lxd.daemon

And then restart your containers.
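
For example (container names assumed from this thread):

lxc restart znc baker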

Just tried, sorry. Only IPv6 still:

IPs:
eth0: inet6 fd42:78f0:8cd8:9b63:216:3eff:fe69:389b vethfaa4b48f
eth0: inet6 fe80::216:3eff:fe69:389b vethfaa4b48f
eth1: inet6 fd42:78f0:8cd8:9b63:216:3eff:fe93:526c vethc3504a1d
eth1: inet6 fe80::216:3eff:fe93:526c vethc3504a1d
lo: inet 127.0.0.1
lo: inet6 ::1

Can you run dhclient inside your container please?
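
Something along these lines, from the host (container name and interface assumed):

lxc exec znc -- dhclient -v eth0
# -v prints the DHCPDISCOVER/DHCPOFFER exchange, so you can see whether any offer comes back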

Running it doesn’t seem to do anything. It just… “hangs”.

Can you show the output of sudo ss -ulpn on the LXD host please.

You have a DHCP service listening.

BTW what is lxcbr0 (as opposed to lxdbr0)?

Can you check the output of sudo iptables-save and sudo nft list ruleset to see if a firewall could be blocking it?

No idea what lxcbr0 is; I haven’t used it, and it’s set as unmanaged.

As for iptables, I’m using UFW.
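
(If UFW does turn out to be blocking DHCP/DNS on the bridge, rules along these lines are the usual fix; a sketch only, assuming lxdbr0 is the bridge in question:)

sudo ufw allow in on lxdbr0
sudo ufw route allow in on lxdbr0
sudo ufw route allow out on lxdbr0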

Well, in that case, can you kill the dnsmasq process listening on that interface to rule it out? Always best to keep things as simple as possible, in my experience.
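
For example, using the PID from the earlier ss output (14109 here; if that dnsmasq belongs to the classic LXC bridge, stopping its service is the cleaner option, but that is an assumption):

sudo kill 14109
# or, if it was started by the lxc package's bridge service:
sudo systemctl stop lxc-net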

The output of those commands above please.

As for “sudo nft list ruleset”: command not found.
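
(nft ships in the nftables package on Ubuntu 20.04, so if that check is still wanted:)

sudo apt install nftables
sudo nft list ruleset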