I was fiddling with resolvectl, trying to configure DNS forwarding to a container within the LXD network, when I noticed that my containers were not getting IP addresses.
lxdbr0 here is a managed bridge, and containers previously acquired DHCP leases normally.
When I assigned IP addresses manually inside the containers, they could ping each other but not the bridge address. In the packets I captured on lxdbr0 with tshark I could see ARP requests for the bridge’s address but no replies. The same happened when pinging containers from the host: ARP requests for the container’s address, without replies.
I created another managed network and attached a couple of containers to it. They acquired addresses (after some reconfiguration), so deleting the default network and recreating it would presumably have resolved the issue. But I wanted to find out the cause and fix it for real. This is where I decided to post on this forum asking for help.
But first I needed some more data to attach, and tshark’s interface isn’t very comfortable for drilling into individual packets, so I fired up Wireshark and started a capture on lxdbr0. Suddenly I noticed something tshark hadn’t shown me: packets from containers were VLAN tagged with VID 1, and packets from the bridge were untagged. I checked with another bridge: packets went untagged in both directions. What could have caused this pointless tagging? A cursory search for "linux bridge vlan tags" got me to this Unix StackExchange question: https://unix.stackexchange.com/questions/546136/bridged-interfaces-and-vlan-tags. So when I typed in the command to view the current VLAN configuration:
    > bridge -d vlan
    port            vlan ids
    lxdbr0          None
    lxdbr1          1 PVID Egress Untagged
    vethe722bae8    1 PVID Egress Untagged
    vethe044557c    1 PVID Egress Untagged
    veth3e3bc5e4    1 PVID Egress Untagged
    veth42e3c8f4    1 PVID Egress Untagged
it left me somewhat confused. It turns out that by default every port gets VLAN 1 as its PVID, untagged on egress, and my bridge had somehow lost that entry. Well, it could be fixed with this somewhat weird incantation:
    bridge vlan add dev lxdbr0 vid 1 pvid untagged self
Containers got their addresses straight away and could access the outside network.
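In case it helps anyone check for the same condition, here is a minimal sketch of a shell helper that reads the output of bridge vlan show on stdin and reports whether a given device has a VLAN marked PVID. The name has_pvid is my own invention, not part of iproute2, and the parsing assumes the plain-text layout shown above (port name in the first column, continuation lines indented); newer iproute2 versions print a slightly different header and may omit ports without VLANs, which this treats the same as "None".

```shell
#!/bin/sh
# has_pvid DEV (hypothetical helper, not an iproute2 command)
# Reads `bridge vlan show` text on stdin; exit status 0 if DEV has a
# VLAN flagged PVID, 1 otherwise (entry is "None" or DEV is not listed).
has_pvid() {
    awk -v dev="$1" '
        /^[^ \t]/ { cur = $1 }                # non-indented line starts a new port entry
        cur == dev && /PVID/ { found = 1 }    # PVID flag on that entry or its continuation lines
        END { exit !found }
    '
}

# Possible usage (needs the bridge tool; the fix itself needs root):
#   if ! bridge vlan show | has_pvid lxdbr0; then
#       echo "lxdbr0 has no PVID; candidate fix:"
#       echo "bridge vlan add dev lxdbr0 vid 1 pvid untagged self"
#   fi
```

The usage lines only print the candidate command instead of running it, since I still don't know what removed the entry in the first place.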
The only questions left are how and why this happened, and whether it can happen again.