Containers suddenly stopped working since move to core20 snap - No more IPs assigned

I notice you have raw.dnsmasq set; just a hunch, but can you try unsetting that:

State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
UNCONN 0 0 10.0.3.1:53 0.0.0.0:* users:(("dnsmasq",pid=14109,fd=6))
UNCONN 0 0 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=13876,fd=12))
UNCONN 0 0 0.0.0.0%lxcbr0:67 0.0.0.0:* users:(("dnsmasq",pid=14109,fd=4))
UNCONN 0 0 0.0.0.0:27015 0.0.0.0:* users:(("hlds_linux",pid=13977,fd=8))

And you mean unsetting it with:

sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- bash
LD_LIBRARY_PATH=/snap/lxd/current/lib/:/snap/lxd/current/lib/x86_64-linux-gnu/ /snap/lxd/current/bin/dnsmasq --help

I’m not familiar with this.

No, not the snap commands. The link I posted showed you how to do it, but it's:

lxc network unset lxdbr0 raw.dnsmasq
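
If you want to double-check what is currently in there before removing it (lxdbr0 assumed, as in your setup), you can view it with:

lxc network get lxdbr0 raw.dnsmasq
lxc network show lxdbr0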

If I do that, only IPv6 addresses are assigned to the containers.

+--------------+---------+------+-----------------------------------------------+-----------+-----------+
| baker        | RUNNING |      | fd42:78f0:8cd8:9b63:216:3eff:fe93:526c (eth1) | CONTAINER | 0         |
|              |         |      | fd42:78f0:8cd8:9b63:216:3eff:fe69:389b (eth0) |           |           |
+--------------+---------+------+-----------------------------------------------+-----------+-----------+

Can you reload LXD now:

sudo systemctl reload snap.lxd.daemon

And then restart your containers.
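
For example, using the container from your listing:

lxc restart baker

(lxc restart --all should also work if you want to restart everything at once.)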

Just tried, sorry. Only IPv6 still:

IPs:
eth0: inet6 fd42:78f0:8cd8:9b63:216:3eff:fe69:389b vethfaa4b48f
eth0: inet6 fe80::216:3eff:fe69:389b vethfaa4b48f
eth1: inet6 fd42:78f0:8cd8:9b63:216:3eff:fe93:526c vethc3504a1d
eth1: inet6 fe80::216:3eff:fe93:526c vethc3504a1d
lo: inet 127.0.0.1
lo: inet6 ::1

Can you run dhclient inside your container please?
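
Something like this should do it (baker and eth0 taken from your listing above):

lxc exec baker -- dhclient -v eth0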

Running it doesn’t seem to do anything. It just… “hangs”.


Can you show the output of sudo ss -ulpn on the LXD host please?

You have DHCP service listening.

BTW what is lxcbr0 (as opposed to lxdbr0)?
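
You can see which bridges LXD knows about, and whether it manages them, with:

lxc network list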

Can you check the output of sudo iptables-save and sudo nft list ruleset to see if a firewall could be blocking it?

No idea what lxcbr0 is; I haven't used it and it's set as unmanaged.

As for iptables, I’m using UFW.

Well, in that case, can you kill the dnsmasq process listening on that interface to rule it out? Always best to keep things as simple as possible, in my experience.
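
Using the PID from your ss output above, something like:

sudo kill 14109

If that dnsmasq came from the classic LXC lxc-net service (an assumption on my part, given the lxcbr0 name), then sudo systemctl stop lxc-net is the cleaner way to stop it.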

The output of those commands above please.

As for “sudo nft list ruleset” → Command not found.

Ah, Docker. See the two threads I linked.

Not sure if it was the first set of UFW rules that did it… but the containers are getting an IP again!


Basically, Docker modifies the iptables rules in a way that blocks the containers' DHCP requests. But depending on the start-up order of Docker vs LXD, it may or may not work without modifying the rules.

But if you then reload LXD, it wipes its own rules and re-adds them, which can then cause the Docker rules to take effect.
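
If you want it to keep working regardless of start-up order, the usual workaround (the generic documented approach, not something specific to your setup) is to explicitly allow traffic on the LXD bridge, for example with UFW:

sudo ufw allow in on lxdbr0
sudo ufw route allow in on lxdbr0
sudo ufw route allow out on lxdbr0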

I assume this is in combination with dnsmasq? It worked fine for many weeks/months before.

It's a well-known historical issue (search the forums for Docker): its firewall rules prevent containers from reaching dnsmasq's DHCP service. But as I explained, it depends on ordering, which can be unpredictable and vary between systems.
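
A quick way to see whether Docker's rules are what is getting in the way is to look for the DOCKER chains and the FORWARD policy in the iptables output, e.g.:

sudo iptables-save | grep -i docker
sudo iptables -S FORWARD | head -n 5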

So it's been confirmed that it's the raw.dnsmasq auth-zone=lxd setting that is causing the problem. See the GitHub issue "dnsmask process exited prematurely if raw.dnsmasq auth-zone set when using core20 snap" (lxc/lxd #8905) for more detail.
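
For anyone else hitting this: a quick way to confirm whether the bridge's dnsmasq has died (these are just generic checks, not taken from the issue) is:

pgrep -af "dnsmasq.*lxdbr0"
sudo snap logs lxd -n=100 | grep -i dnsmasq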