LAN seizes up when deploying a new container

Hi

When we deploy a new container to our cluster, all the containers become unresponsive. It looks a bit like this: a new interface is brought up and the other interfaces become unresponsive. Sorry for the rough logs.

Sep 18 16:06:47 container2 systemd-udevd[31736]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 18 16:06:47 container2 systemd-udevd[31737]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 18 16:06:47 container2 systemd-udevd[31737]: Could not generate persistent MAC address for vethe0e6ad52: No such file or directory
Sep 18 16:06:47 container2 systemd-networkd[917]: vethe0e6ad52: Link UP
Sep 18 16:06:47 container2 kernel: [3603425.459129] eth0: renamed from veth75f39ae7
Sep 18 16:06:47 container2 kernel: [3603425.481912] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Sep 18 16:06:47 container2 kernel: [3603425.483082] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Sep 18 16:06:47 container2 kernel: [3603425.483111] br0: port 29(vethe0e6ad52) entered blocking state
Sep 18 16:06:47 container2 kernel: [3603425.483112] br0: port 29(vethe0e6ad52) entered forwarding state
Sep 18 16:06:48 container2 systemd-networkd[917]: vethe0e6ad52: Gained carrier

We’re running LXD 4.5 and kernel 4.15.0-112-generic #113-Ubuntu SMP.
That said, this issue has been around for a while.

Any ideas?

This might be a hint to the problem. There is a discussion at coreos - Could not generate persistent MAC address for vethXXXXX: No such file or directory - Stack Overflow
However, I get that error in my logs as well. From the link above, there is a bug report against systemd that was only closed recently.

I think the issue you are facing is not related to the errors shown above. There should be some other logs that might help.

(P.S. I edited the post to make it easier to read the logs. Click on Edit to see how I set the code environment.)

Thanks. I’m not even sure what to look at at the moment. The whole cluster freezes when I add a new container. The database is often locked, and I have to wait about 5 minutes to do anything after deploying. The cluster also just freezes up from time to time. I’m at a loss.

What’s your networking like?

If you’re directly bridging to the host network, it may be because your bridge doesn’t have a pinned MAC address. Unless you pin one, a bridge takes the lowest MAC of all its members, so it can change MAC every time an instance starts or stops; combined with ARP delays and the distributed database, that could explain what you’re seeing.
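A quick way to check whether that is what’s happening (rough sketch; the instance name and image below are just placeholders) is to compare the bridge MAC before and after starting an instance:

# Note the bridge’s current MAC
ip -br link show br0

# Start a new instance (placeholder image/name)
lxc launch ubuntu:20.04 mac-test

# Compare: an unpinned bridge may have picked up a different MAC
ip -br link show br0
cat /sys/class/net/br0/address

If the MAC on br0 changes between the two checks, pinning it should sort this out.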

Interesting. I have my NICs bonded with a bridge on top of that, so I’m guessing the answer is no?

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond0 state UP group default qlen 1000
link/ether 7e:17:65:63:dd:80 brd ff:ff:ff:ff:ff:ff
3: enp2s0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond0 state UP group default qlen 1000
link/ether 7e:17:65:63:dd:80 brd ff:ff:ff:ff:ff:ff
4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether a2:1a:ee:69:9d:2b brd ff:ff:ff:ff:ff:ff
inet 10.3.0.48/16 brd 10.3.255.255 scope global br0
valid_lft forever preferred_lft forever
inet6 fe80::a01a:eeff:fe69:9d2b/64 scope link
valid_lft forever preferred_lft forever
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP group default qlen 1000
link/ether 7e:17:65:63:dd:80 brd ff:ff:ff:ff:ff:ff

What are you using to configure that setup?

Whatever tool you’re using for that will most likely have a way to set the MAC of the bridge; it’s usually an option called macaddress or hwaddr. Set that to the MAC of your bond, 7e:17:65:63:dd:80, then reboot to confirm it all applies properly on boot. After that you should be good: your bridge will never change MAC address again.

netplan. Will do.

To be clear, do I need to set the MAC on the bond or on the bridge?

On the bridge itself.

The MAC of the bond doesn’t really matter: it will always be that of the first interface added to the bond, with that address then applied to the bond and to the other interfaces in it. But none of that matters here, since you’re putting your IP addresses on the bridge and not on the bond.

For netplan, I believe it’s macaddress.
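Roughly something like this, assuming your existing netplan file already defines bond0 over eno1/enp2s0 with br0 on top (the filename and the rest of the options are placeholders, adapt them to your actual config):

# /etc/netplan/01-br0.yaml (placeholder path)
network:
  version: 2
  ethernets:
    eno1:
      dhcp4: false
    enp2s0:
      dhcp4: false
  bonds:
    bond0:
      interfaces: [eno1, enp2s0]
  bridges:
    br0:
      interfaces: [bond0]
      macaddress: 7e:17:65:63:dd:80   # pin the bridge MAC so it stops following its members
      addresses: [10.3.0.48/16]

Then netplan apply (or a reboot) and check with ip link that br0 keeps that address.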

OK, setting the MAC address to a2:1a:ee:69:9d:2b. Got it.

4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether a2:1a:ee:69:9d:2b brd ff:ff:ff:ff:ff:ff
inet 10.3.0.48/16 brd 10.3.255.255 scope global br0
valid_lft forever preferred_lft forever
inet6 fe80::a01a:eeff:fe69:9d2b/64 scope link
valid_lft forever preferred_lft forever

Good, hopefully that will take care of the network issues.