"incus network create vpnbr0 bridge.external_interfaces=tap0" fails to link to tap0

Hello

In the process of migrating an LXD setup to Incus, I am encountering a snag regarding bridge networking (Bridge network - Incus documentation).

** Use case:
A local “tap0” interface exists and provides access to an overlay network.
A “vpnbr0” network bridge is declared in Incus to bridge the tap0 interface with the containers’ veth interfaces.
The setup works with LXD but fails with Incus.

** LXD
In the LXD setup, a “vpnbr0” bridge is created to bridge the local “tap0” interface with the containers’ veth interfaces:
lxc network create vpnbr0 bridge.external_interfaces=tap0 ipv4.address=none ipv4.dhcp=false ipv6.address=none ipv6.dhcp=false

Each container receives an additional interface upon creation so that it has access to the bridge:
lxc config device add $CONTAINER eth1 nic name=eth1 nictype=bridged parent=vpnbr0

One can easily check that the setup works through:

$ ip a
(...)
tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master vpnbr0 state UNKNOWN group default qlen 1000
    link/ether 3e:a5:01:7e:88:7d brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.15/24 brd 192.168.255.255 scope global tap0
       valid_lft forever preferred_lft forever
    inet6 fe80::xxx:1ff:xxx:887d/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

(note the “master vpnbr0” in the 1st line)
and

$ lxc network info vpnbr0
Name: vpnbr0
MAC address: 00:16:3e:6f:8e:12
MTU: 1500
State: up
Type: broadcast

Network usage:
  Bytes received: 3.03MB
  Bytes sent: 0B
  Packets received: 88051
  Packets sent: 0

Bridge:
  ID: 8000.00163e6f8e12
  STP: false
  Forward delay: 1500
  Default VLAN ID: 1
  VLAN filtering: true
  Upper devices: tap0, vethxxxxx, vethxxxxx....

** INCUS

When performing the exact same actions on Incus, however, the result differs, and the bridge connection to tap0 appears broken:

incus network create vpnbr0 bridge.external_interfaces=tap0 ipv4.address=none ipv4.dhcp=false ipv6.address=none ipv6.dhcp=false

$ ip a
tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000   # <--- NO MENTION OF "master vpnbr0"
    link/ether 36:67:89:88:db:54 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.15/24 brd 192.168.255.255 scope global tap0
       valid_lft forever preferred_lft forever
    inet6 fe80::3467:89ff:fe88:db54/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

shows no “master vpnbr0”, unlike with LXD.

$ incus network info vpnbr0
Name: vpnbr0
MAC address: 00:16:3e:2f:ee:9a
MTU: 1500
State: up
Type: broadcast

Network usage:
  Bytes received: 650.87kB
  Bytes sent: 0B
  Packets received: 1999
  Packets sent: 0

Bridge:
  ID: 8000.00163e2fee9a
  STP: false
  Forward delay: 1500
  Default VLAN ID: 1
  VLAN filtering: true
  Upper devices: vethxxxxx   # <--- NO MENTION OF TAP0

As a consequence, adding the new interface to each container in Incus is useless, as they do not have access to the overlay network.

Having looked at the Incus documentation, I do not see any change from LXD to Incus; however, the same actions fail silently with Incus, as evident above where the “tap0” interface is missing from the bridge.
I made sure services are started in the same order, since creating the “tap0” interface too late may prevent it from being “claimed” by the bridge. The same setup works with the production LXD servers, so I am looking for any insight on why vpnbr0 fails to attach itself to tap0.

Thanks!

root@castiana:~# ip link show dev tap0
8: tap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
    link/ether 2a:5e:94:bc:f9:c9 brd ff:ff:ff:ff:ff:ff
root@castiana:~# incus network create vpnbr0 bridge.external_interfaces=tap0 ipv4.address=none ipv4.dhcp=false ipv6.address=none ipv6.dhcp=false
Network vpnbr0 created
root@castiana:~# ip link show dev tap0
8: tap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master vpnbr0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 2a:5e:94:bc:f9:c9 brd ff:ff:ff:ff:ff:ff
root@castiana:~# incus network info vpnbr0
Name: vpnbr0
MAC address: 00:16:3e:5f:b8:a9
MTU: 1500
State: up
Type: broadcast

Network usage:
  Bytes received: 0B
  Bytes sent: 0B
  Packets received: 0
  Packets sent: 0

Bridge:
  ID: 8000.00163e5fb8a9
  STP: false
  Forward delay: 1500
  Default VLAN ID: 1
  VLAN filtering: true
  Upper devices: tap0
root@castiana:~# incus version
Client version: 0.6
Server version: 0.6
root@castiana:~# 

Thanks Stéphane.
I checked the logs in /var/log/incus which bring more details.
If the Incus service starts before the service which creates tap0, the following message is displayed:

time="2024-03-12T18:25:21+01:00" level=warning msg="Skipping attaching missing external interface" driver=bridge interface=tap0 network=vpnbr0 project=default

(this was not the case with LXD)

However, if I start the Incus service after tap0 is up, the following error appears:

time="2024-03-12T18:25:21+01:00" level=error msg="Failed initializing network" err="Failed starting: Only unconfigured network interfaces can be bridged" network=vpnbr0 project=default

A bit of a catch-22…

OK, so it appears that the tap0 interface needs to exist in an unconfigured state before the Incus service starts. However, the software that creates tap0 (peervpn) automatically assigns an IP address, preventing Incus from bridging it, as it sees the interface as up and configured.

I will find a way to drop the IP from tap0 before the Incus service is launched, which I believe will solve this issue.
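One way to force that ordering is a systemd drop-in that flushes tap0’s addresses just before Incus starts. This is a minimal sketch, assuming Incus runs as incus.service and peervpn as peervpn.service; the unit names, the drop-in path, and the ip binary path are assumptions and may need adjusting for a given setup:

```
# /etc/systemd/system/incus.service.d/flush-tap0.conf
# Start Incus after the service that creates tap0, then flush any
# address peervpn assigned, so the interface is unconfigured when
# Incus attaches it to the bridge.
[Unit]
After=peervpn.service
Wants=peervpn.service

[Service]
ExecStartPre=/usr/bin/ip addr flush dev tap0
```

After a `systemctl daemon-reload`, restarting incus.service should then find tap0 present but unconfigured, avoiding both the “missing external interface” warning and the “Only unconfigured network interfaces can be bridged” error.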

Thanks again for your feedback @stgraber.

(BTW this behavior did not occur with LXD)

It might depend on the LXD version; this particular requirement for the interface to not be configured is definitely something we introduced a while back in LXD and not something new in Incus. So if LXD still allows it, it’s a bug in LXD :slight_smile:

lxd --version
5.20

The LXD and Incus instances are running on two different (physical) servers, but they run the same baseline OS (Trixie) with the same service scripts deployed, etc. So there may be some difference that I am not aware of, but we have ~20 servers with this setup and this is the first time I have seen this issue.
In any case, a solution is on its way; thanks again for your help.