LXD 4.12: Container networking failure with docker

Hello folks!

A container that has been happily working for several months is now unable to access the LAN.

I’ve tried recreating the bridge, rebooting, disabling other network services (like Docker), and restarting/reinstalling LXD.

Setup: Host: Ubuntu Focal Desktop, NetworkManager renderer. Physical wired interface bound to bridge br0. Static or DHCP addressing (tried both). (5.8.0-45-generic #51~20.04.1-Ubuntu SMP Tue Feb 23 13:46:31 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux)

Container: Ubuntu-daily:focal image deployed today (5.8.0-45-generic #51~20.04.1-Ubuntu SMP Tue Feb 23 13:46:31 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux)

Anything in systemctl --failed in the container?


Are other containers on the same host working ok?

Please show the output of ip a and ip r inside the container, as well as ip a, ip r, and lxc config show <instance> --expanded from the LXD host.

I notice now that there’s a wider issue on this host: the VMs have also lost connectivity. Removing and recreating the bridge hasn’t helped. The host itself has no connectivity problems, only the guests do.

ubuntu@test1201:~$ sudo systemctl --failed
  UNIT                       LOAD   ACTIVE SUB    DESCRIPTION                         
● systemd-remount-fs.service loaded failed failed Remount Root and Kernel File Systems

No, other containers and VMs on the host are also broken.
(Container and host ip a / ip r output attached.)

Thanks.

So in these cases (and you’ll find several on these forums), the issue is almost always one of the following:

  1. Some other process listening on the DHCP or DNS ports, preventing LXD’s dnsmasq from starting (a quick check for this is sketched below).
  2. Another firewall on the system preventing DHCP packets from LXD containers from reaching the DHCP server.
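
For the first of those, a quick way to check is to list whatever is bound to the DNS and DHCP UDP ports, for example:

  sudo ss -ulpn | grep -E ':(53|67)\b'

If a service other than LXD’s dnsmasq shows up there, that is the likely conflict.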

In your case you are using an unmanaged bridge, and so LXD won’t be running a DHCP/DNS server for your br0 bridge. This rules out 1.

So it is most likely another application on your host that is creating firewall rules blocking DHCP packets from being sent out of br0 to your network’s DHCP server.

I also note that in your host’s interface output you have a docker0 bridge, suggesting you have Docker running. This is a big red flag to me, as unfortunately Docker is known to modify the host’s firewall in a way that often causes DHCP issues for other guests.

See Lxd and Docker Firewall Redux - How to deal with FORWARD policy set to drop

I’m not certain at this stage that this is the issue, but it is certainly the first path to investigate.
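
A quick way to check whether that is in play (assuming the iptables backend rather than native nftables rules) is to look at the FORWARD chain policy:

  sudo iptables -S FORWARD | head -n 1

If that prints -P FORWARD DROP, forwarded traffic from br0 is being dropped unless another rule explicitly accepts it.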

Please can you show the output of the following on the host:

  • sudo iptables-save
  • sudo nft list ruleset

Tom, thanks so much for the detailed analysis; your valuable time is very much appreciated :slight_smile:
You’ve given me several helpful leads here - I’ll take a detailed look at the firewall and also see what happens if I remove Docker from the host.

I’ll report my findings here :+1:


sudo iptables-save

sudo nft list ruleset
sudo: nft: command not found

Yeah, it’s the classic :FORWARD DROP [0:0] policy that Docker applies, which means any unmatched forwarded traffic gets dropped.

Can you try running:

 sudo iptables -I DOCKER-USER -j ACCEPT

And see if that helps.
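
You can confirm the rule landed at the top of the DOCKER-USER chain with, for example:

 sudo iptables -L DOCKER-USER -n --line-numbers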


Thanks again Tom, that does seem to help. In a quick test a container that previously had no connectivity is now restored to digital glory. I’ll check more thoroughly and mark this thread resolved.

Reading the linked thread, I do agree there’s a race condition here. I observed that containers set to autostart would indeed get an IP after a host reboot most of the time, but restarting a container later would be the end of that.

Ah good. Yes, they probably start before Docker modifies the firewall. But even if they come up fine, once the firewall has been modified they won’t be able to renew their DHCP leases, so they will eventually lose their IPs (assuming they are not statically configured).
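
Also note that rule is not persistent across reboots. One common way to make it stick on Ubuntu (assuming you stay on the iptables backend) is to save the ruleset with iptables-persistent so the DOCKER-USER rule is restored at boot:

 sudo apt install iptables-persistent
 sudo netfilter-persistent save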


You may want to tune that override rule slightly depending on your security needs, as that effectively allows all forwarded packets to/from anywhere to pass the firewall.
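
For example, one way to narrow it (assuming br0 is the only bridge your LXD guests use) is to accept only traffic forwarded via that bridge rather than everything:

 sudo iptables -I DOCKER-USER -i br0 -j ACCEPT
 sudo iptables -I DOCKER-USER -o br0 -j ACCEPT

That still bypasses Docker’s FORWARD drop for your LXD traffic while leaving it in place for everything else.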


That’s what I was seeing, although even when I set static IPs there was no connectivity from containers.

Yup, noted. I am now back in my comfort zone in terms of being able to batten down the hatches where necessary :+1:

My VMs are working again too, thanks so much! Marking resolved.