Deleting a stopped container brings down the fan interface

I have no clue why this happens; the dmesg output is pasted below:

dxc1:~# dmesg
[33172557.649156] nr_pdflush_threads exported in /proc is scheduled for removal
[39743274.216544] device fanbr0-mtu left promiscuous mode
[39743274.216551] fanbr0: port 1(fanbr0-mtu) entered disabled state
[39743274.259105] fanbr0: port 2(fanbr0-fan) entered disabled state
[39743274.259320] device fanbr0-fan left promiscuous mode
[39743274.259322] fanbr0: port 2(fanbr0-fan) entered disabled state
[39743274.566557] device fanbr0-mtu entered promiscuous mode

In fact, restarting LXD brings the fan interface back, but with the wrong IP.
Let me explain: after running the LXD cluster for a long time, I added a VIP (10.2.1.42) to the loopback device lo:1 for LVS usage, and it seems the fan interface chose the wrong interface and IP address:

ip -d l show fanbr0-fan
6: fanbr0-fan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65486 qdisc noqueue master fanbr0 state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 4e:3a:18:86:5a:8b brd ff:ff:ff:ff:ff:ff promiscuity 1 
    vxlan id 15728640 fan-map 240.0.0.0/8:10.2.1.0/24 local 10.2.1.42 dev lo srcport 0 0 dstport 8472 ageing 300 
    bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning on flood on addrgenmode none 

That's OK, I can temporarily remove the VIP to work around this.
The real question is still: why does the fan interface get removed when deleting a stopped container?

Does it happen again if you create and delete a container?

As for picking the wrong address, you may be able to avoid that by setting fan.underlay_subnet on the network.

Thank you, but the VIP is in the same subnet as eth0.

As this is a production cluster, I will try to reproduce it at midnight.

This is the full command sequence I ran:

login
ifconfig
lxc list
lxc rm u1604  ( the stopped container)
lxc list
lxc image list
logout

Before I get permission to reproduce the issue, here is some more information:
after the first node's fan went down (at the time I deleted the stopped container),
the other node's fan device lost its IP too, 25 minutes later.
lxd.log

t=2020-08-26T10:29:47+0800 lvl=info msg="Done pruning expired images" 
t=2020-08-26T17:12:21+0800 lvl=warn msg="Detected poll(POLLNVAL) event." 
t=2020-08-26T17:13:17+0800 lvl=warn msg="Detected poll(POLLNVAL) event." 
t=2020-08-26T17:13:17+0800 lvl=warn msg="Detected poll(POLLNVAL) event: exiting." 
t=2020-08-26T17:13:18+0800 lvl=warn msg="Failed to get events from node 10.2.1.83:8443: Unable to connect to: 10.2.1.83:8443" 

dmesg

Aug 26 17:09:25 dxc2 kernel: [39745925.350838] device fanbr0-mtu left promiscuous mode
Aug 26 17:09:25 dxc2 kernel: [39745925.350851] fanbr0: port 1(fanbr0-mtu) entered disabled state
Aug 26 17:09:25 dxc2 kernel: [39745925.389828] fanbr0: port 2(fanbr0-fan) entered disabled state
Aug 26 17:09:25 dxc2 kernel: [39745925.400737] device fanbr0-fan left promiscuous mode
Aug 26 17:09:25 dxc2 kernel: [39745925.400742] fanbr0: port 2(fanbr0-fan) entered disabled state
Aug 26 17:09:25 dxc2 NetworkManager[907]: <info>  [1598432965.4202] devices removed (path: /sys/devices/virtual/net/fanbr0-fan, iface: fanbr0-fan)
Aug 26 17:09:25 dxc2 NetworkManager[907]: <info>  [1598432965.4204] devices removed (path: /sys/devices/virtual/net/fanbr0-mtu, iface: fanbr0-mtu)
Aug 26 17:09:25 dxc2 NetworkManager[907]: <info>  [1598432965.4906] device (fanbr0-fan): state change: disconnected -> unmanaged (reason 'unmanaged') [30 10 3]
Aug 26 17:09:25 dxc2 NetworkManager[907]: nm_device_get_device_type: assertion 'NM_IS_DEVICE (self)' failed
Aug 26 17:09:25 dxc2 NetworkManager[907]: <info>  [1598432965.7789] manager: (fanbr0-mtu): new Generic device (/org/freedesktop/NetworkManager/Devices/61)
Aug 26 17:09:25 dxc2 kernel: [39745925.835250] device fanbr0-mtu entered promiscuous mode
Aug 26 17:09:26 dxc2 NetworkManager[907]: <info>  [1598432966.0001] devices added (path: /sys/devices/virtual/net/fanbr0-mtu, iface: fanbr0-mtu)
Aug 26 17:09:26 dxc2 NetworkManager[907]: <info>  [1598432966.0001] device added (path: /sys/devices/virtual/net/fanbr0-mtu, iface: fanbr0-mtu): no ifupdown configuration found.
Aug 26 17:27:54 dxc2 kernel: [39747034.555585] audit: type=1400 audit(1598434074.499:170): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/bin/lxc-start" pid=12803 comm="apparmor_parser"

The two nodes have these points in common:

  1. LXD cluster member
  2. the same VIP on lo:1 (for the k8s master LB)
  3. k8s master with Calico

I'm also a bit confused as to why those devices would be in promiscuous mode in the first place; maybe it's normal, but it could also suggest some kind of network monitoring system messing with them?

Promiscuous mode may be normal; here is the setup log from a test cluster:

Wed Aug 26 16:38:20 2020 /snap/lxd/current/bin/lxd forkdns 240.178.147.1:1053 lxd lxdfan0

Aug 26 16:38:20 dq0 kernel: [ 9633.986127] lxdfan0: port 1(lxdfan0-mtu) entered blocking state
Aug 26 16:38:20 dq0 kernel: [ 9633.986130] lxdfan0: port 1(lxdfan0-mtu) entered disabled state
Aug 26 16:38:20 dq0 kernel: [ 9633.986240] device lxdfan0-mtu entered promiscuous mode
Aug 26 16:38:20 dq0 kernel: [ 9633.991093] lxdfan0: port 1(lxdfan0-mtu) entered blocking state
Aug 26 16:38:20 dq0 kernel: [ 9633.991095] lxdfan0: port 1(lxdfan0-mtu) entered forwarding state
Aug 26 16:38:20 dq0 kernel: [ 9634.057238] lxdfan0: port 2(lxdfan0-fan) entered blocking state
Aug 26 16:38:20 dq0 kernel: [ 9634.057240] lxdfan0: port 2(lxdfan0-fan) entered disabled state
Aug 26 16:38:20 dq0 kernel: [ 9634.057435] device lxdfan0-fan entered promiscuous mode
Aug 26 16:38:20 dq0 kernel: [ 9634.059714] lxdfan0: port 2(lxdfan0-fan) entered blocking state
Aug 26 16:38:20 dq0 kernel: [ 9634.059716] lxdfan0: port 2(lxdfan0-fan) entered forwarding state
Aug 26 16:38:20 dq0 kernel: [ 9634.137012] audit: type=1400 audit(1598431100.985:45): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd_dnsmasq-lxdfan0_</var/snap/lxd/common/lxd>" pid=5810 comm="apparmor_parser"

It's still possible something else is messing with it; I will investigate later. Thanks for the advice.

After digging into the code: deleting a container goes through a network restart, which deletes the tunnel devices at lxd/networks.go:936:

	// Cleanup any existing tunnel device
	for _, iface := range ifaces {
		if strings.HasPrefix(iface.Name, fmt.Sprintf("%s-", n.name)) {
			_, err = shared.RunCommand("ip", "link", "del", "dev", iface.Name)
			if err != nil {
				return err
			}
		}
	}

I think this can be reproduced elsewhere,

so I set up a two-node LXD 3.0.3 cluster on Ubuntu 16.04 to emulate production, started four containers, and brought up a VIP (same subnet as eth0) on the host:

ifconfig lo:1 10.2.1.77 netmask 255.255.255.255

Stopping a container is fine.
When I delete the stopped container, the tunnel devices immediately go away:

[ 1725.085003] device lxdfan0-mtu left promiscuous mode
[ 1725.085009] lxdfan0: port 1(lxdfan0-mtu) entered disabled state
[ 1725.121044] lxdfan0: port 2(lxdfan0-fan) entered disabled state
[ 1725.122627] device lxdfan0-fan left promiscuous mode
[ 1725.122631] lxdfan0: port 2(lxdfan0-fan) entered disabled state
[ 1725.164273] device lxdfan0-mtu entered promiscuous mode

Now notice the last line: the -mtu tunnel comes back up, but with the wrong MTU:

13: lxdfan0-mtu: <BROADCAST,NOARP> mtu 65486 qdisc noop master lxdfan0 state DOWN mode DEFAULT group default qlen 1000
    link/ether c2:66:87:e9:be:6e brd ff:ff:ff:ff:ff:ff

It seems bringing up the -fan tunnel failed during the network restart; it's strange that lxd.log shows nothing about it (only normal info messages).

That’s odd, we’re not supposed to be restarting the network on container delete.
We kick dnsmasq to have it delete the lease but the rest of the network should not be touched at all.

I don’t suppose you could test this on LXD 4.0.x to see if that’s something we already fixed?

@tomp can you take a look at this next week?

I tested with version 3.0.4 and above (4.0.3, 4.4, 4.5); deleting a container no longer brings down the fan interface, so this looks like it was fixed by the commit that removed the network restart.

But LXD picking the wrong fan address still happens when a VIP on lo is in the same subnet as eth0, and it makes the node's fan network unreachable. Would you consider skipping the loopback device when picking the fan address, or adding a setting like fan.underlay_interface?

Yes will do.

OK so it is fixed in 3.0.4, excellent.

Please can you describe in more detail what you mean about skipping the loopback device? I do not understand the issue you are describing here.

Thanks

You can reproduce it like this:

  1. Set up an LXD cluster using the fan network.
  2. Assign an IP address to lo on one node (the IP's subnet needs to be the same as fan.underlay_subnet).
  3. Restart/reload LXD on that node.
  4. Now the fan bridge IP address has changed and the node is unreachable from the others.

So I’m a bit confused by this.

If I set up a cluster with the local LAN subnet being 10.135.120.0/24, then the fan's fan.underlay_subnet setting also becomes 10.135.120.0/24.

Node 1: 10.135.120.142/24
Node 2: 10.135.120.18/24
Default gateway: 10.135.120.1

Fan network:

lxc network show lxdfan0
config:
  bridge.mode: fan
  fan.underlay_subnet: 10.135.120.0/24
description: ""
name: lxdfan0
type: bridge

If I then add a “VIP” IP in the same subnet as the fan’s underlay to the lo interface on one of the nodes, say 10.135.120.254 on V2:

V2

ip a add 10.135.120.254/24 dev lo

Then this creates a local route for 10.135.120.0/24 via the lo interface and immediately breaks all cluster communication (as one would expect).

So I don’t see what you are trying to achieve with this.

A VIP is sometimes used for LVS HA, and the netmask should be /32; see my post above:
ifconfig lo:1 10.2.1.77 netmask 255.255.255.255

So such a VIP will not break the routing table.

Ah yes, that is how I would normally add a VIP, using a /32, but you said "the IP's subnet needs to be the same as fan.underlay_subnet", which is what confused me. OK, I will try again.

Sorry about the misleading description; I guess the strange behaviour comes from the addressForSubnet function.

Oh I see: because the fan.underlay_subnet setting is effectively used to match the first interface with an address in that subnet, and that interface's address is then used to derive the fan's overlay address, adding another address to a different interface (in this case lo) can cause an early match and the node's fan overlay address to be recomputed differently.
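
To illustrate the matching behaviour, here is a minimal, simplified sketch of that kind of first-match lookup. It is not the real addressForSubnet() implementation; the function name and structure are illustrative only:

// Simplified stand-in for an addressForSubnet()-style lookup: return the
// first interface address that falls inside the given underlay subnet.
package main

import (
	"fmt"
	"net"
)

func firstAddressInSubnet(subnet *net.IPNet) (net.IP, string, error) {
	ifaces, err := net.Interfaces()
	if err != nil {
		return nil, "", err
	}

	for _, iface := range ifaces {
		addrs, err := iface.Addrs()
		if err != nil {
			continue
		}

		for _, addr := range addrs {
			ip, _, err := net.ParseCIDR(addr.String())
			if err != nil {
				continue
			}

			// First match wins: lo is usually enumerated before eth0, so a
			// /32 VIP on lo that sits inside the underlay subnet can be
			// picked instead of eth0's address, changing the derived fan
			// overlay address.
			if subnet.Contains(ip) {
				return ip, iface.Name, nil
			}
		}
	}

	return nil, "", fmt.Errorf("no address found in %s", subnet)
}

func main() {
	_, subnet, _ := net.ParseCIDR("10.2.1.0/24")
	if ip, ifaceName, err := firstAddressInSubnet(subnet); err == nil {
		fmt.Printf("picked %s on %s\n", ip, ifaceName)
	}
}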

@stgraber there are a couple of options here I can see:

  1. As @druggo says, we could exclude the lo interface from being considered as a source of the overlay's address (i.e. the overlay address could never be derived from an address on lo).
  2. As I understand it, addressForSubnet() uses fan.underlay_subnet to find the interface that is expected to be used for the underlay. In that case we could match on both the interface's address being part of fan.underlay_subnet and the subnet masks matching as well. This way a /32 interface address wouldn't match and would be skipped (see the sketch below).
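
A rough sketch of what that check could look like (illustrative only, not the actual LXD code or the eventual PR; addressMatchesUnderlay is a hypothetical helper):

// Hypothetical helper combining the two options above; not LXD code.
package main

import (
	"fmt"
	"net"
)

func addressMatchesUnderlay(ifaceName, addr string, underlay *net.IPNet) bool {
	// Option 1: never derive the underlay address from the loopback device.
	if ifaceName == "lo" {
		return false
	}

	ip, ipNet, err := net.ParseCIDR(addr)
	if err != nil {
		return false
	}

	// The address must fall inside fan.underlay_subnet...
	if !underlay.Contains(ip) {
		return false
	}

	// ...and, per option 2, carry the same prefix length, so a /32 VIP
	// like 10.2.1.77/32 is skipped while 10.2.1.83/24 on eth0 matches.
	addrOnes, _ := ipNet.Mask.Size()
	underlayOnes, _ := underlay.Mask.Size()
	return addrOnes == underlayOnes
}

func main() {
	_, underlay, _ := net.ParseCIDR("10.2.1.0/24")
	fmt.Println(addressMatchesUnderlay("lo", "10.2.1.77/32", underlay))   // false
	fmt.Println(addressMatchesUnderlay("eth0", "10.2.1.83/24", underlay)) // true
}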

That’s it! I’m waiting for the fix :slight_smile:

I’ve added a PR to fix this


Apparently this fix is causing issues with GCP users as they have /32 addresses on their main interfaces.

I might need to tweak this to only exclude the lo interface, as mentioned earlier, rather than the more general approach of excluding /32 addresses.

See LXD container stuck in "RUNNING" without IP address