Incus, instances, bridges, and VLANs

Hi, I’ve been seeing very intermittent behaviour in my environment where connections drop abruptly. While troubleshooting I’ve realised I’m not confident in my network setup. I understand the netplan configuration, so I can configure the host and I can configure the Incus instances, but I am not sure about the interaction between the two, so I wanted some advice. Apologies upfront, therefore, for what are basic questions.

I’m using Ubuntu 24.04 on the host and instances.

The host’s netplan is like this:

network:
  ethernets:
    eno1:
      dhcp4: false
      dhcp6: false
      wakeonlan: true
    enp1s0:
      dhcp4: false
      dhcp6: false
  bridges:
    br0:
      interfaces: [eno1]
      addresses: [10.1.1.30/24]
      nameservers:
        addresses: [10.1.1.42,10.1.1.11]
        search: [redacted]
      routes:
        - to: default
          via: 10.1.1.1
    br1:
      interfaces: [vlan10]
      addresses: [10.1.10.30/24]
    br2:
      interfaces: [vlan20]
      addresses: [10.1.20.30/24]
    br3:
      interfaces: [vlan30]
      addresses: [10.1.30.30/24]
    br4:
      interfaces: [vlan40]
      addresses: [10.1.40.30/24]
  vlans:
    vlan10:
      id: 10
      link: enp1s0
    vlan20:
      id: 20
      link: enp1s0
    vlan30:
      id: 30
      link: eno1
    vlan40:
      id: 40
      link: eno1
  version: 2

with the idea being I may need to attach different instances to different VLANs.

If I now attach an instance, either an LXC container or a VM, to a bridge, will the instance need to be configured to tag its traffic for that VLAN? Ideally I would just give the instances simple netplans where they use their interface (type nic, bridged), with the VLAN tag being applied by the underlying host according to the bridge. But looking at the configuration now, I don’t see any reason that would actually happen, so I assume I’m just dropping untagged frames from the instance onto the bridge and they’re being pushed out onto the wire in the default VLAN.

What is the correct practice here? I don’t think I should be creating all the VLAN interfaces on the host first (enp1s0.10 etc) and creating bridges on top of those - this doesn’t match any examples in the netplan docs - but I think I’m now doubting everything. Can anyone assist?

Or, more generically - what is best practice for configuring Incus on top of a host with multiple VLANs?

I had a similar problem with configuring VLANs in incus recently (essentially, I wanted to use the same nic for both ceph and incus traffic, which should be isolated). So, as I figured out there are 2 options:

  1. You manage your bridges yourself (as you did in your netplan config).
  2. You let bridges be managed by incus (by creating a “network”)

In both cases you should then simply connect bridges to the network devices of containers, e.g., using profiles:

devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br1
    type: nic

Then you configure the network device within the container as usual, e.g., using netplan.
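For reference, a minimal guest netplan to go with that device might look like the following (a sketch assuming the bridge's network runs DHCP; the interface name eth0 matches the device above):

```yaml
# /etc/netplan/10-lan.yaml inside the instance (example; adjust addressing to your network)
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: true
```

For a static setup you would instead set addresses, routes, and nameservers here, exactly as you would on any other Ubuntu machine.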

If you go with Option 2, you should not define the bridges in the host’s netplan config; only define the VLANs there. Then you should set the key bridge.external_interfaces to the corresponding VLAN when creating the network, e.g.,:

incus network create br1 bridge.external_interfaces=vlan10

Then the bridge br1 will be created and managed by incus.

Note that in this case the VLAN interfaces should be unconfigured, that is, they should not have any IP addresses assigned (including IPv6 addresses). Otherwise incus will refuse to create the bridge. In my case, I had to add accept-ra: false to the netplan config for the VLANs to prevent the router from assigning IPv6 addresses to them.
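With this option, the host netplan would carry only the (unaddressed) VLAN definitions; a sketch, reusing the interface names from the config above and adding the accept-ra: false just described:

```yaml
network:
  version: 2
  ethernets:
    enp1s0:
      dhcp4: false
      dhcp6: false
  vlans:
    vlan10:
      id: 10
      link: enp1s0
      dhcp4: false
      accept-ra: false
      link-local: []
```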

By the way, it should, in principle, be possible to let incus manage the VLAN interfaces as well:

incus network create vlan10 --type physical parent=enp1s0 vlan=10

Unfortunately, there appears to be a bug in LXD according to which, after restarting lxd, the managed interfaces are created in the wrong order, which results in bridge br1 not being properly configured because of the (at that time) missing VLAN vlan10. (I just verified that incus is affected by the same bug.)

Oh, it appears that “physical networks” do not seem to work the way I originally thought.
Creating a physical network in incus does not actually create a network interface.

$ incus network list
+-----------------+----------+---------+---------------+------+-------------+---------+---------+
|      NAME       |   TYPE   | MANAGED |     IPV4      | IPV6 | DESCRIPTION | USED BY |  STATE  |
+-----------------+----------+---------+---------------+------+-------------+---------+---------+
...
+-----------------+----------+---------+---------------+------+-------------+---------+---------+
| vlan10          | physical | YES     |               |      |             | 0       | CREATED |
+-----------------+----------+---------+---------------+------+-------------+---------+---------+
$ ip a | grep vlan10
[shows nothing]

Consequently, attaching vlan10 as an external interface of the bridge (silently) fails:

$ incus network set incusbr0 bridge.external_interfaces vlan10
$ sudo journalctl -u incus
...
Apr 03 14:37:32 server1 incusd[2130]: time="2025-04-03T14:37:32Z" level=warning msg="Skipping attaching missing external interface" driver=bridge interface=vlan10 network=incusbr0 project=default

So, interfaces managed by incus are probably not supposed to be used as external interfaces of bridges.

However, for some reason, if the name of the physical network is given as <VLAN-link>.<VLAN-id>, e.g., enp1s0.10 for the example above, the interface on the host is actually created and can subsequently be used for the bridge (but there is still a problem with the order in which they appear after a restart).
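In other words, something like the following appears to work (a sketch; the restart-ordering caveat above still applies):

```shell
# Create a managed "physical" network whose name encodes the link and VLAN id;
# with this naming scheme, incus actually creates the enp1s0.10 interface on the host.
incus network create enp1s0.10 --type physical parent=enp1s0 vlan=10

# The interface now exists, so it can be used as the bridge's external interface.
incus network create br1 bridge.external_interfaces=enp1s0.10
```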

Hmmm. I’ve been doing your option 1 for a while, but I’m still not sure that the VLAN tagging is working, and I still have intermittent drops. The two may be unrelated, but I do want to eliminate as many possibilities as I can.

In desperation I did configure source routing for the VLANs on the underlying hosts, but so far the situation has not improved.

Just saw this thread…

There haven’t been any updates on this issue in a while; has anyone made any progress?

I originally used a configuration like the example provided (interfaces vlan10,20,30,40… connected to br1,2,3,4…), and it worked fine. You make different profiles which connect eth0 to the relevant bridge.

The downside is that every time you want to add a new VLAN you have to create a new VLAN interface and a new bridge in your netplan configuration, which gets messier the more VLANs/bridges you have, and risks service interruption when reconfiguring.

I have now changed to using a single “vlan-aware bridge”. Since netplan doesn’t directly support this, you have to supplement it with a bit of systemd-networkd configuration.

==> /etc/netplan/01-netcfg.yaml <==
network:
  version: 2
  ethernets:
    enp1s0:
      wakeonlan: true
      dhcp4: false
      accept-ra: false
      link-local: []
  bridges:
    br0:
      # Set bridge to have same MAC address as the NIC.
      # See https://bugs.launchpad.net/netplan/+bug/1782221
      macaddress: 11:22:33:44:55:66
      interfaces: [enp1s0]
      parameters:
        stp: false
        forward-delay: 0
      dhcp4: false
      accept-ra: false
      addresses: [10.12.255.13/24, "2001:db8::13/64"]
      routes:
        - to: default
          via: 10.12.255.1
        - to: default
          via: "2001:db8::1"
      nameservers:
        addresses: [10.12.255.1]
        search: [example.net]

==> /etc/systemd/network/10-netplan-br0.netdev.d/vlan.conf <==
[Bridge]
MulticastSnooping=false
VLANFiltering=true

==> /etc/systemd/network/10-netplan-br0.network.d/vlan.conf <==
[BridgeVLAN]
VLAN=2-3
VLAN=248-256
PVID=255
EgressUntagged=255

==> /etc/systemd/network/10-netplan-enp1s0.network.d/vlan.conf <==
[BridgeVLAN]
VLAN=2-3
VLAN=248-256

You can, if you prefer, just set VLAN=2-4094 from the beginning. The only downside is that the bridge vlan output is longer, but you can use bridge -compressvlans vlan for a shorter view.
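If you go that route, both [BridgeVLAN] drop-ins above would simply carry the full range (keeping PVID/EgressUntagged on the br0 one):

```ini
[BridgeVLAN]
VLAN=2-4094
```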

Note that in my network, the management network is VLAN 255, which is tagged to the switch. The reason I set “PVID=255; EgressUntagged=255” on the bridge is to get the server’s own IP address on this VLAN. (Otherwise, I’d have to create a separate br0.255 interface)

Then you just create a new incus profile for each VLAN you want to use, which specifies br0 + the desired VLAN.
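Such a profile can also be built from the CLI; a sketch (the profile name, VLAN id, image, and instance name are examples):

```shell
# Create a profile that bridges eth0 onto br0, tagged with VLAN 254
incus profile create br254
incus profile device add br254 eth0 nic nictype=bridged parent=br0 vlan=254

# Launch an instance with it, stacked on the default profile for the root disk
incus launch images:ubuntu/24.04 myvm -p default -p br254
```

(If, as in the profiles shown below, you also put the root disk device into the VLAN profile, you can use it on its own instead of stacking it on default.)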

# incus profile show br255
config: {}
description: Bridge to management network
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
    vlan: "255"
  root:
    path: /
    pool: default
    type: disk
name: br255
project: default

# incus profile show br254
config: {}
description: Bridge to home network
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
    vlan: "254"
  root:
    path: /
    pool: default
    type: disk
name: br254
project: default

Aside: the old-school way of inspecting bridges is with brctl, which has gone the way of ifconfig - not available by default, but still available in its own package. The new way of doing this is with ip link and bridge:

ip [-d] [-j -p] link show type bridge
ip [-d] [-j -p] link show br0
ip [-d] [-j -p] link show master br0
bridge -compressvlans vlan show
bridge link show

Thank you, that looks more comprehensive and thus more complicated than what I have. :slight_smile: I’ll give it a try when I have a little time.

In the meantime I have somewhat decreased the random drops I was seeing by changing my MTU: I have a 5G internet connection, and it turns out not to use a standard size. I still have issues, which seem to involve flows between hosts on more than one VLAN, so there is more work to do.
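For anyone else chasing MTU problems: you can probe the path MTU with ping and the don’t-fragment flag (addresses and sizes below are examples; IP + ICMP headers add 28 bytes to the payload size):

```shell
# 1472 + 28 = 1500; if this fails with "message too long" but smaller sizes work,
# the path MTU is below 1500
ping -c 3 -M do -s 1472 10.1.1.1

# try smaller payloads until one succeeds, then payload + 28 = usable path MTU
ping -c 3 -M do -s 1400 10.1.1.1
```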

Even my small environment generates around 215-275k internet flows every 24 hours, and many times that internally; tracking down intermittent faults is hard!

Sorry, I can’t help you with your intermittent drops, you’ll need to collect more information (e.g. dmesg and syslog output from around the times of the drops). Make sure you don’t have NetworkManager running or anything else that might be interfering with the network.

I do recommend turning off spanning tree on all the bridges (see the netplan I posted for the settings required), especially when those bridges are all being trunked to other switches: unless your switch is a Cisco running proprietary per-VLAN spanning tree, leaving STP on could make bad things happen.

I also note that it’s not necessary to give your incus host an IP address on the bridges, unless you’re planning to route between the bridges on the host. For example, your original:

    br1:
      interfaces: [vlan10]
      addresses: [10.1.10.30/24]
    br2:
      interfaces: [vlan20]
      addresses: [10.1.20.30/24]
    br3:
      interfaces: [vlan30]
      addresses: [10.1.30.30/24]
    br4:
      interfaces: [vlan40]
      addresses: [10.1.40.30/24]

can become:

    br1:
      interfaces: [vlan10]
      dhcp4: false
      accept-ra: false
      link-local: []
    br2:
      interfaces: [vlan20]
      dhcp4: false
      accept-ra: false
      link-local: []
    br3:
      interfaces: [vlan30]
      dhcp4: false
      accept-ra: false
      link-local: []
    br4:
      interfaces: [vlan40]
      dhcp4: false
      accept-ra: false
      link-local: []

This should make the bridges “transparent”, in the sense that you can’t reach the host through them; they are only usable by incus containers/vms to reach the outside.

I’d like to see your incus network list output, but either way it should show all the bridges as unmanaged.