Deleting a stopped container brings down the fan interface

druggo · August 26, 2020, 1:55pm

I have an LXD 3.0.3 cluster with a fan network that has been running smoothly for a year.
Today I deleted a stopped container on one node, and then tragedy struck:
fanbr0 lost its IP address, and all containers on that node became unreachable from the other nodes.

/var/log/lxd/lxd.log

t=2020-08-26T13:42:16+0800 lvl=info msg="Done pruning expired images" 
t=2020-08-26T16:45:17+0800 lvl=info msg="Deleting container" created=2018-08-13T22:26:30+0800 ephemeral=false name=u1604 used=2019-05-26T15:48:20+0800
t=2020-08-26T16:45:21+0800 lvl=info msg="Deleted container" created=2018-08-13T22:26:30+0800 ephemeral=false name=u1604 used=2019-05-26T15:48:20+0800

At the same time, the fan interfaces got removed too:

Aug 26 16:45:20 dxc1 kernel: [39743274.216544] device fanbr0-mtu left promiscuous mode
Aug 26 16:45:20 dxc1 kernel: [39743274.216551] fanbr0: port 1(fanbr0-mtu) entered disabled state
Aug 26 16:45:20 dxc1 kernel: [39743274.259105] fanbr0: port 2(fanbr0-fan) entered disabled state
Aug 26 16:45:20 dxc1 kernel: [39743274.259320] device fanbr0-fan left promiscuous mode
Aug 26 16:45:20 dxc1 kernel: [39743274.259322] fanbr0: port 2(fanbr0-fan) entered disabled state
Aug 26 16:45:20 dxc1 NetworkManager[912]: <info>  [1598431520.8627] device (fanbr0-fan): state change: disconnected -> unmanaged (reason 'unmanaged') [30 10 3]
Aug 26 16:45:20 dxc1 NetworkManager[912]: <info>  [1598431520.8782] devices removed (path: /sys/devices/virtual/net/fanbr0-fan, iface: fanbr0-fan)
Aug 26 16:45:20 dxc1 NetworkManager[912]: <info>  [1598431520.8791] devices removed (path: /sys/devices/virtual/net/fanbr0-mtu, iface: fanbr0-mtu)
Aug 26 16:45:21 dxc1 NetworkManager[912]: nm_device_get_device_type: assertion 'NM_IS_DEVICE (self)' failed
Aug 26 16:45:21 dxc1 kernel: [39743274.566557] device fanbr0-mtu entered promiscuous mode
Aug 26 16:45:21 dxc1 NetworkManager[912]: <info>  [1598431521.1038] manager: (fanbr0-mtu): new Generic device (/org/freedesktop/NetworkManager/Devices/82)
Aug 26 16:45:21 dxc1 NetworkManager[912]: <info>  [1598431521.3210] devices added (path: /sys/devices/virtual/net/fanbr0-mtu, iface: fanbr0-mtu)
Aug 26 16:45:21 dxc1 NetworkManager[912]: <info>  [1598431521.3211] device added (path: /sys/devices/virtual/net/fanbr0-mtu, iface: fanbr0-mtu): no ifupdown configuration found.

It’s weird that deleting a stopped container can bring down the fan interface. Please help.

stgraber · August 26, 2020, 2:14pm

systemctl restart lxd should do the trick to bring the fan device back up.
If fanbr0 itself still exists, then existing containers should work again without needing a restart.

LXD itself would only ever delete the interfaces if you delete the managed network.
So this would suggest that something else on the system is dropping those interfaces for you. This is particularly confusing given the container in question wasn’t even running.

Anything in dmesg that may explain the two child interfaces going away?

druggo · August 26, 2020, 2:29pm

I have no clue. The entire dmesg output is pasted below:

dxc1:~# dmesg
[33172557.649156] nr_pdflush_threads exported in /proc is scheduled for removal
[39743274.216544] device fanbr0-mtu left promiscuous mode
[39743274.216551] fanbr0: port 1(fanbr0-mtu) entered disabled state
[39743274.259105] fanbr0: port 2(fanbr0-fan) entered disabled state
[39743274.259320] device fanbr0-fan left promiscuous mode
[39743274.259322] fanbr0: port 2(fanbr0-fan) entered disabled state
[39743274.566557] device fanbr0-mtu entered promiscuous mode

In fact, restarting LXD does bring back the fan interface, but with the wrong IP.
Let me explain: after the LXD cluster had been running for a long time, I added a VIP (10.2.1.42) to the loopback device (lo:1) for LVS usage. It seems the fan interface now picks the wrong interface and IP address:

ip -d l show fanbr0-fan
6: fanbr0-fan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65486 qdisc noqueue master fanbr0 state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 4e:3a:18:86:5a:8b brd ff:ff:ff:ff:ff:ff promiscuity 1 
    vxlan id 15728640 fan-map 240.0.0.0/8:10.2.1.0/24 local 10.2.1.42 dev lo srcport 0 0 dstport 8472 ageing 300 
    bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning on flood on addrgenmode none

That’s OK, I can temporarily remove the VIP to fix this.
The real question is still why the fan interface gets removed when a stopped container is deleted.

stgraber · August 26, 2020, 2:34pm

Does it happen again if you create and delete a container?

As for picking the wrong address, you may be able to avoid that by setting fan.underlay_subnet on the network.

druggo · August 26, 2020, 2:38pm

Thank you, but the VIP is in the same subnet as eth0.

As this is a production cluster, I will try to reproduce it at midnight.

And this is the whole command sequence I ran:

login
ifconfig
lxc list
lxc rm u1604 (the stopped container)
lxc list
lxc image list
logout

druggo · August 27, 2020, 4:36am

Before I get permission to reproduce the issue, here is some more information:
after the first node’s fan went down (at the time I deleted the stopped container),
25 minutes later the other node’s fan device lost its IP too.
lxd.log:

t=2020-08-26T10:29:47+0800 lvl=info msg="Done pruning expired images" 
t=2020-08-26T17:12:21+0800 lvl=warn msg="Detected poll(POLLNVAL) event." 
t=2020-08-26T17:13:17+0800 lvl=warn msg="Detected poll(POLLNVAL) event." 
t=2020-08-26T17:13:17+0800 lvl=warn msg="Detected poll(POLLNVAL) event: exiting." 
t=2020-08-26T17:13:18+0800 lvl=warn msg="Failed to get events from node 10.2.1.83:8443: Unable to connect to: 10.2.1.83:8443"

dmesg

Aug 26 17:09:25 dxc2 kernel: [39745925.350838] device fanbr0-mtu left promiscuous mode
Aug 26 17:09:25 dxc2 kernel: [39745925.350851] fanbr0: port 1(fanbr0-mtu) entered disabled state
Aug 26 17:09:25 dxc2 kernel: [39745925.389828] fanbr0: port 2(fanbr0-fan) entered disabled state
Aug 26 17:09:25 dxc2 kernel: [39745925.400737] device fanbr0-fan left promiscuous mode
Aug 26 17:09:25 dxc2 kernel: [39745925.400742] fanbr0: port 2(fanbr0-fan) entered disabled state
Aug 26 17:09:25 dxc2 NetworkManager[907]: <info>  [1598432965.4202] devices removed (path: /sys/devices/virtual/net/fanbr0-fan, iface: fanbr0-fan)
Aug 26 17:09:25 dxc2 NetworkManager[907]: <info>  [1598432965.4204] devices removed (path: /sys/devices/virtual/net/fanbr0-mtu, iface: fanbr0-mtu)
Aug 26 17:09:25 dxc2 NetworkManager[907]: <info>  [1598432965.4906] device (fanbr0-fan): state change: disconnected -> unmanaged (reason 'unmanaged') [30 10 3]
Aug 26 17:09:25 dxc2 NetworkManager[907]: nm_device_get_device_type: assertion 'NM_IS_DEVICE (self)' failed
Aug 26 17:09:25 dxc2 NetworkManager[907]: <info>  [1598432965.7789] manager: (fanbr0-mtu): new Generic device (/org/freedesktop/NetworkManager/Devices/61)
Aug 26 17:09:25 dxc2 kernel: [39745925.835250] device fanbr0-mtu entered promiscuous mode
Aug 26 17:09:26 dxc2 NetworkManager[907]: <info>  [1598432966.0001] devices added (path: /sys/devices/virtual/net/fanbr0-mtu, iface: fanbr0-mtu)
Aug 26 17:09:26 dxc2 NetworkManager[907]: <info>  [1598432966.0001] device added (path: /sys/devices/virtual/net/fanbr0-mtu, iface: fanbr0-mtu): no ifupdown configuration found.
Aug 26 17:27:54 dxc2 kernel: [39747034.555585] audit: type=1400 audit(1598434074.499:170): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/bin/lxc-start" pid=12803 comm="apparmor_parser"

The two nodes have these points in common:

LXD cluster member
same VIP on lo:1 (for the k8s master LB)
k8s master with Calico

stgraber · August 27, 2020, 4:29pm

I’m a bit confused as to why those devices would be in promiscuous mode in the first place too, maybe it’s normal but could also suggest some kind of networking monitoring system messing with them?

druggo · August 28, 2020, 4:20am

Promiscuous mode may be normal. Here is the log from setting up a test cluster:

Wed Aug 26 16:38:20 2020 /snap/lxd/current/bin/lxd forkdns 240.178.147.1:1053 lxd lxdfan0

Aug 26 16:38:20 dq0 kernel: [ 9633.986127] lxdfan0: port 1(lxdfan0-mtu) entered blocking state
Aug 26 16:38:20 dq0 kernel: [ 9633.986130] lxdfan0: port 1(lxdfan0-mtu) entered disabled state
Aug 26 16:38:20 dq0 kernel: [ 9633.986240] device lxdfan0-mtu entered promiscuous mode
Aug 26 16:38:20 dq0 kernel: [ 9633.991093] lxdfan0: port 1(lxdfan0-mtu) entered blocking state
Aug 26 16:38:20 dq0 kernel: [ 9633.991095] lxdfan0: port 1(lxdfan0-mtu) entered forwarding state
Aug 26 16:38:20 dq0 kernel: [ 9634.057238] lxdfan0: port 2(lxdfan0-fan) entered blocking state
Aug 26 16:38:20 dq0 kernel: [ 9634.057240] lxdfan0: port 2(lxdfan0-fan) entered disabled state
Aug 26 16:38:20 dq0 kernel: [ 9634.057435] device lxdfan0-fan entered promiscuous mode
Aug 26 16:38:20 dq0 kernel: [ 9634.059714] lxdfan0: port 2(lxdfan0-fan) entered blocking state
Aug 26 16:38:20 dq0 kernel: [ 9634.059716] lxdfan0: port 2(lxdfan0-fan) entered forwarding state
Aug 26 16:38:20 dq0 kernel: [ 9634.137012] audit: type=1400 audit(1598431100.985:45): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_dnsmasq-lxdfan0_</var/snap/lxd/common/lxd>" pid=5810 comm="apparmor_parser"

It’s still possible that something else is messing with it. I will investigate later, thanks for the advice.

druggo · August 29, 2020, 2:52pm

After digging into the code: deleting a container goes through a network restart, which deletes the tunnel devices at lxd/networks.go:936

	// Cleanup any existing tunnel device
	for _, iface := range ifaces {
		if strings.HasPrefix(iface.Name, fmt.Sprintf("%s-", n.name)) {
			_, err = shared.RunCommand("ip", "link", "del", "dev", iface.Name)
			if err != nil {
				return err
			}
		}
	}
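
To make the effect of that prefix check concrete, here is a minimal standalone sketch. The helper name tunnelDevices is hypothetical and not part of LXD; it only reproduces the name test from the quoted loop and lists which interfaces would be handed to "ip link del", using the interface names seen in the logs above.

```go
package main

import (
	"fmt"
	"strings"
)

// tunnelDevices is a hypothetical helper (not LXD code). It returns the
// interface names that pass the same test as the quoted cleanup loop:
// every name that starts with "<network>-".
func tunnelDevices(network string, ifaceNames []string) []string {
	var matched []string
	prefix := network + "-"
	for _, name := range ifaceNames {
		if strings.HasPrefix(name, prefix) {
			matched = append(matched, name)
		}
	}
	return matched
}

func main() {
	// fanbr0, fanbr0-mtu and fanbr0-fan come from the logs in this thread;
	// lo and eth0 are simply the other host interfaces mentioned here.
	names := []string{"lo", "eth0", "fanbr0", "fanbr0-mtu", "fanbr0-fan"}
	fmt.Println(tunnelDevices("fanbr0", names))
}
```

With these names the match set is fanbr0-mtu and fanbr0-fan, while the bridge fanbr0 itself is left alone, which is consistent with the kernel messages: both child ports leave the bridge, the bridge stays.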

I think it can be reproduced elsewhere,

so I set up a two-node 3.0.3 cluster on Ubuntu 16.04 to emulate production, started four containers, and brought up a VIP (same subnet as eth0) on the host:

ifconfig lo:1 10.2.1.77 netmask 255.255.255.255

Stopping a container is fine.
When I delete the stopped container, the tunnel devices immediately go away:

[ 1725.085003] device lxdfan0-mtu left promiscuous mode
[ 1725.085009] lxdfan0: port 1(lxdfan0-mtu) entered disabled state
[ 1725.121044] lxdfan0: port 2(lxdfan0-fan) entered disabled state
[ 1725.122627] device lxdfan0-fan left promiscuous mode
[ 1725.122631] lxdfan0: port 2(lxdfan0-fan) entered disabled state
[ 1725.164273] device lxdfan0-mtu entered promiscuous mode

Now I notice the last line: the -mtu tunnel comes back up, and with the wrong MTU:

13: lxdfan0-mtu: <BROADCAST,NOARP> mtu 65486 qdisc noop master lxdfan0 state DOWN mode DEFAULT group default qlen 1000
    link/ether c2:66:87:e9:be:6e brd ff:ff:ff:ff:ff:ff

It seems that bringing up the -fan tunnel failed during the network restart. It’s strange that lxd.log shows nothing about it (only normal info messages).

stgraber · August 29, 2020, 3:53pm

That’s odd, we’re not supposed to be restarting the network on container delete.
We kick dnsmasq to have it delete the lease but the rest of the network should not be touched at all.

I don’t suppose you could test this on LXD 4.0.x to see if that’s something we already fixed?

@tomp can you take a look at this next week?

druggo · August 30, 2020, 6:54am

I tested with version 3.0.4 and above (4.0.3, 4.4, 4.5): deleting a container does not bring down the fan interface. It looks like it was fixed by this commit (remove the network restart).

But LXD picking the wrong fan address still happens when there is a VIP on lo in the same subnet as eth0, and it makes the node’s fan network unreachable. Would you consider skipping the loopback device when picking the fan address, or adding a setting like fan.underlay_interface?

tomp · August 30, 2020, 2:11pm

Yes will do.

tomp · September 1, 2020, 9:51am

OK so it is fixed in 3.0.4, excellent.

Please can you describe in more detail what you mean by skipping the loopback device? I do not understand the issue you are describing here.

Thanks

druggo · September 1, 2020, 12:04pm

You can reproduce it like this:

set up an LXD cluster using a fan network
assign an IP address to lo on a node (the IP’s subnet needs to be the same as fan.underlay_subnet)
restart/reload LXD on the node
now the fan bridge IP address has changed and the node is unreachable from the others.
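
Why the bridge address changes can be sketched as follows. This is only an illustration under assumptions, not LXD's code: it assumes the mapping implied by the fan-map 240.0.0.0/8:10.2.1.0/24 shown earlier, where the host bits of the underlay address are placed right after the overlay prefix, and it assumes the bridge takes host number 1 in the resulting range (as in the 240.178.147.1 address seen in the test cluster log). The function name fanBridgeAddress is hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// fanBridgeAddress is a hypothetical sketch (not LXD code). It shifts the
// host bits of the underlay address to sit right after the overlay prefix
// and sets the lowest bit, giving the assumed bridge address ".1".
func fanBridgeAddress(overlayCIDR, underlayCIDR, hostAddr string) (string, error) {
	_, overlay, err := net.ParseCIDR(overlayCIDR)
	if err != nil {
		return "", err
	}
	_, underlay, err := net.ParseCIDR(underlayCIDR)
	if err != nil {
		return "", err
	}
	host := net.ParseIP(hostAddr).To4()
	overlayIP := overlay.IP.To4()
	if host == nil || overlayIP == nil || !underlay.Contains(host) {
		return "", fmt.Errorf("%s is not an IPv4 address inside %s", hostAddr, underlayCIDR)
	}

	overlayBits, _ := overlay.Mask.Size()
	underlayBits, _ := underlay.Mask.Size()
	hostBits := 32 - underlayBits
	shift := 32 - overlayBits - hostBits
	if shift < 1 {
		return "", fmt.Errorf("overlay %s cannot hold the host bits of %s", overlayCIDR, underlayCIDR)
	}

	// Keep only the host part of the underlay address.
	hostPart := binary.BigEndian.Uint32(host) & (uint32(1)<<uint(hostBits) - 1)
	addr := binary.BigEndian.Uint32(overlayIP) | hostPart<<uint(shift) | 1

	out := make(net.IP, 4)
	binary.BigEndian.PutUint32(out, addr)
	return out.String(), nil
}

func main() {
	// 10.2.1.83 is a node address from the lxd.log above; 10.2.1.42 is the VIP on lo.
	for _, host := range []string{"10.2.1.83", "10.2.1.42"} {
		addr, err := fanBridgeAddress("240.0.0.0/8", "10.2.1.0/24", host)
		fmt.Println(host, "->", addr, err)
	}
}
```

Under these assumptions a node whose real address is 10.2.1.83 would get 240.83.0.1, but if the VIP 10.2.1.42 is picked instead the bridge becomes 240.42.0.1, so the other nodes no longer find it where they expect it.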

tomp · September 1, 2020, 2:20pm

So I’m a bit confused by this.

If I setup a cluster, with the local LAN subnet being 10.135.120.0/24 then the fan’s fan.underlay_subnet setting also becomes 10.135.120.0/24.

Node 1: 10.135.120.142/24
Node 2: 10.135.120.18/24
Default gateway: 10.135.120.1

Fan network:

lxc network show lxdfan0
config:
  bridge.mode: fan
  fan.underlay_subnet: 10.135.120.0/24
description: ""
name: lxdfan0
type: bridge

If I then add a “VIP” IP to the lo interface on one of the nodes with the same subnet as the fan’s underlay, say 10.135.120.254 on V2:

V2

ip a add 10.135.120.254/24 dev lo

Then this will create a local route for 10.135.120.0/24 to the lo interface and immediately breaks all cluster communication (as one would expect).

So I don’t see what you are trying to achieve with this.

druggo · September 1, 2020, 2:45pm

A VIP is sometimes used for LVS HA, and its netmask should be /32; see my post above:
ifconfig lo:1 10.2.1.77 netmask 255.255.255.255

So such a VIP will not break the routing table.
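
The difference between the two masks can be checked with a few lines of Go. This is just a sketch of the prefix arithmetic, with hypothetical helper names; it does not query the kernel routing table. It computes the prefix implied by an address and its mask, and whether another node's address falls inside that prefix.

```go
package main

import (
	"fmt"
	"net"
)

// onLinkPrefix returns the prefix implied by an address and its mask,
// e.g. "10.135.120.0/24" for "10.135.120.254/24".
func onLinkPrefix(addrCIDR string) (string, error) {
	_, prefix, err := net.ParseCIDR(addrCIDR)
	if err != nil {
		return "", err
	}
	return prefix.String(), nil
}

// capturesPeer reports whether a peer address lies inside that prefix,
// i.e. whether a route for the prefix would also cover the peer.
func capturesPeer(addrCIDR, peer string) (bool, error) {
	_, prefix, err := net.ParseCIDR(addrCIDR)
	if err != nil {
		return false, err
	}
	ip := net.ParseIP(peer)
	if ip == nil {
		return false, fmt.Errorf("invalid peer address %q", peer)
	}
	return prefix.Contains(ip), nil
}

func main() {
	cases := []struct{ vip, peer string }{
		{"10.135.120.254/24", "10.135.120.142"}, // the /24 test described above
		{"10.2.1.77/32", "10.2.1.83"},           // a /32 VIP next to a node address
	}
	for _, c := range cases {
		prefix, _ := onLinkPrefix(c.vip)
		hit, _ := capturesPeer(c.vip, c.peer)
		fmt.Printf("%s -> prefix %s, covers %s: %v\n", c.vip, prefix, c.peer, hit)
	}
}
```

A /24 on lo yields a prefix that covers the other cluster members, while a /32 yields a prefix that contains only the VIP itself.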

tomp · September 1, 2020, 2:46pm

Ah yes that is how I would normally add a VIP using a /32 but you said " the ip’s subnet need the same as fan.underlay_subnet" which is what confused me. OK I will try again.

druggo · September 1, 2020, 2:53pm

Sorry for the misleading description. The strange behavior comes from the addressForSubnet function, I guess.

tomp · September 1, 2020, 3:06pm

Oh I see. The fan.underlay_subnet setting is effectively used to match the first interface with an address in that subnet, and that interface’s address is then used to derive the fan’s overlay address. So if you add another address in that subnet to a different interface (in this case lo), it causes an early match and the node’s fan overlay address is recomputed differently.

@stgraber there are a couple of options here I can see:

As @druggo says, we could skip the lo interface from being considered as a source of the overlay’s address (i.e. fan.underlay_subnet could never be derived from an address on lo).
As I understand it, addressForSubnet() uses fan.underlay_subnet to find the interface that is expected to be used for the underlay. In that case we could check both that the interface’s address is part of fan.underlay_subnet and that the subnet masks match as well. This way a /32 interface address wouldn’t match and would be skipped.
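
Both options can be sketched in one small function. This is not the real addressForSubnet(); the iface type, the pickUnderlay name and the two flags are hypothetical, and the assignment of 10.2.1.83/24 to eth0 is only an assumed example. The sketch shows a first-match search over interfaces and how each option changes the result when lo carries a /32 VIP.

```go
package main

import (
	"fmt"
	"net"
)

// iface is a hypothetical, simplified stand-in for a host interface.
type iface struct {
	Name     string
	Loopback bool
	Addrs    []string // addresses in CIDR form, e.g. "10.2.1.42/32"
}

// pickUnderlay returns the first address inside the underlay subnet and the
// interface it was found on. skipLoopback models the first option above,
// requireMaskMatch the second one.
func pickUnderlay(ifaces []iface, underlayCIDR string, skipLoopback, requireMaskMatch bool) (string, string, error) {
	_, underlay, err := net.ParseCIDR(underlayCIDR)
	if err != nil {
		return "", "", err
	}
	wantBits, _ := underlay.Mask.Size()

	for _, ifc := range ifaces {
		if skipLoopback && ifc.Loopback {
			continue
		}
		for _, a := range ifc.Addrs {
			ip, ipNet, err := net.ParseCIDR(a)
			if err != nil || !underlay.Contains(ip) {
				continue
			}
			if bits, _ := ipNet.Mask.Size(); requireMaskMatch && bits != wantBits {
				continue // e.g. a /32 VIP inside a /24 underlay
			}
			return ip.String(), ifc.Name, nil
		}
	}
	return "", "", fmt.Errorf("no address found in subnet %s", underlayCIDR)
}

func main() {
	// lo is listed first and carries the VIP from this thread.
	ifaces := []iface{
		{Name: "lo", Loopback: true, Addrs: []string{"127.0.0.1/8", "10.2.1.42/32"}},
		{Name: "eth0", Addrs: []string{"10.2.1.83/24"}},
	}
	fmt.Println(pickUnderlay(ifaces, "10.2.1.0/24", false, false)) // plain first match
	fmt.Println(pickUnderlay(ifaces, "10.2.1.0/24", true, false))  // option 1: skip lo
	fmt.Println(pickUnderlay(ifaces, "10.2.1.0/24", false, true))  // option 2: masks must match
}
```

In this sketch the plain first match returns the VIP on lo, matching the "local 10.2.1.42 dev lo" seen earlier, while either option falls through to the address on eth0. The mask check has the side benefit of also ignoring /32 VIPs placed on interfaces other than lo.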

druggo · September 1, 2020, 3:58pm

That’s it! I’m waiting for the fix.