Started learning/using incus projects and puzzled by project network bridges ip addresses

bmullan · July 1, 2025, 10:14pm

I have 2 servers on Digital Ocean

node1
node2

I had no problem creating 2 projects on each NODE

tenant1
tenant2

my use case involved use of Wireguard installed on each NODE and using
wireguard’s allowed-ips to add each NODE’s subnets (incus bridge networks) to be accessible via VPN tunnel from node1 ↔ node2

But I ran up against something I’m not sure I understand.

Checking the project named “tenant1” on each server NODE:

Top picture is from the Digital Ocean server named node1 and shows
result of executing:

incus network ls --project=tenant1

Bottom picture is from the Digital Ocean server named node2 and shows
result of executing:

incus network ls --project=tenant1

on both node1 and node2 despite being separate Digital Ocean server instances

node1
- br-tenant1 = 10.204.203.1/24
- br-tenant2 = 10.73.7.1/24

node2
- br-tenant1 = 10.204.203.1/24
- br-tenant2 = 10.73.7.1/24

the br-tenant1 ip address on node1 is identical to the br-tenant1 ip address on node2 ??
same for br-tenant2 on both server/NODES

Problem:
on each Wireguard VPN Peer the “allowed-ips” cannot permit the same subnet on two separate Nodes or tunnel traffic will only be directed to only 1 node’s incus subnet (either node1 or node2)

First, I was surprised they were identical on 2 separate servers as I had thought they would be more random IP addressed than that.

I copied & used the same script to create the projects on both node1 & node2.
And that script does not configure any IP address for the project bridges?

Can anyone help me understand how 2 separate servers are getting identical 10.x.x.x/24 addresses assigned to incus project bridges on each server?

note1:
yes I understand that in this test environment config it might be possible for this to occur
but
I’ve repeated this 3-4 times and the results are the same every time.
Each time both server/nodes end up with identical IP addresses for the incus bridges assigned to each project on each node

note2:
I have not looked at using OVN in this yet but have seen some comments that OVN might be a way to resolve this conundrum.

thanks for any ideas/thoughts

simos · July 2, 2025, 6:15am

When a managed (by Incus) network is created, there is code in there to select a 10.x.x.x/24 network that, to the best of its knowledge, will not have any conflicts elsewhere. Obviously, an Incus setup on one server is not aware of a potential conflict on a separate Incus setup on another server.

As far as I can remember, the network is generated randomly; a random network is generated, it’s checked that there is no conflict. If there is conflict, it will generate a new one. And do this for 100 times before giving up and informing the user. It’s weird that you are getting same networks on DO. As if rand() is not properly seeded. Or, something else.

Ok, I searched in there and it’s here.

github.com/lxc/incus

internal/server/network/network_utils.go

main


      
          		// Signal dnsmasq.
          		err = dnsmasq.Kill(network, true)
          		if err != nil {
          			return err
          		}
          	}
          
          	return nil
          }
          
          func randomSubnetV4() (string, error) {
          	for range 100 {
          		cidr := fmt.Sprintf("10.%d.%d.1/24", rand.Intn(255), rand.Intn(255))
          		_, subnet, err := net.ParseCIDR(cidr)
          		if err != nil {
          			continue
          		}
          
          		if inRoutingTable(subnet) {
          			continue
          		}

Can you check whether the server IP addresses on DO are in the 10.x.x.x range? And whether your server is able to somehow access the 10.x.x.x range of other customers?

oddjobz · July 2, 2025, 10:21am

Hi, just jumping ahead a few steps, it the aim to effectively have a virtual network segment that allows tenant1 on node1 to talk to tenant1 on node2?

If so (depending on what you want to do) you might consider using VXLAN over your WG connection to essentially extend a L2 bridge between boxes over your L3 wg tunnel. This means that instances on both nodes share the same network segment (/24 in this case) and behave like they’re all on the same LAN. Downsides are that you need to do the networking yourself and you need to run your own DHCP server. (albeit 4 lines of config)

You can also do the same with a GRE tunnel over WG, maybe fractionally easier but you need to be a little careful with loops if you scale, I found container DHCP doesn’t seem to play so well with STP turned on.

bmullan · July 2, 2025, 12:47pm

@oddjobz

Thanks.
Yes, I’m well aware of VxLAN config & use for a very long time.
About 7 yrs ago I was working on this:
GitHub - bmullan/CIAB.Full-Mesh.VPN.Wireguard.FRR.BGP.VXLAN.Internet.Overlay.Architecture: CIAB Full Mesh VPN Internet Overlay Implemented using Wireguard, FRR, BGP, BGP-VRFs, VXLAN and LXD VMs and Containers

That worked really well, but it took more configuration. Not a lot but…?
I could have containers/VMs on a subnet on one server node work directly with containers/VMs on any other Mesh node at both L2/L3…

I’m retired, but to keep busy I spend my time w networking related linux/incus/lxd things. But much of it is generally around full mesh vpn w/incus compute resources running on cloud hosts (dig ocean, hetzner, aws)

bmullan · July 2, 2025, 1:25pm

@simos
Hey thanks for looking Simos!

From my old networking days…

DHCP automatically assigns IP addresses and other network configuration
information to nodes on a “network”.

After receiving an IP address from a DHCP server, a “client” sends an ARP probe
(re ARP request), but before actually using the IP.

The ARP is used to ensure the DHCP allocated IP address isn’t already in use by
any other device on the “network”.

If a response is received to this ARP probe…
it indicates a conflict, and the “client” will send a DHCP Decline message to the
server, requesting a different new IP address
<rinse & repeat>

That’s why this situation caught my eye. It just seemed weird!

I thought it would be statistically unlikely that 4 repetitions of install on 2 different servers ended with IPs on each server being mirrored somehow (ESP?).

And yes, I realize that the post DHCP IP allocation client ARP would not detect that the IP was already in use on a totally different server. The mystery to me is how this is occurring at all?

So this morning, I created 2 new Digital Ocean servers and retried…

Prep Work

Ubuntu 24.04
Two Server/Nodes on Digital Ocean

Node1
Node2

On each NODE:

apt install incus incus-client
incus admin init (accepted all defaults)

Now with Incus installed on each Node, check IPs of each Node’s Interfaces

NODE1

lo : 127.0.0.1
eth0 : 159.65.47.31
eth0 : 10.17.0.5
eth1 : 10.108.0.6
wt0 : 100.126.112.147
incusbr0 : 10.117.189.1 ??? - Same IP on both Nodes - Reinstalled 4 times & same thing
Server/Node PublicIP : 159.65.47.31

NODE2

lo : 127.0.0.1
eth0 : 174.138.74.11
eth0 : 10.17.0.6
eth1 : 10.108.0.5
wt0 : 100.126.120.51
incusbr0 : 10.117.189.1 ???
Server/Node PublicIP : 174.138.74.11

Note:
Incus Projects have not been created yet!

Last night I just manually assigned different IPs but how this is occurring still bugs me and I wouldn’t want to have to remember to check & do that all the time.

simos · July 2, 2025, 2:11pm

There’s always an explanation, and it’s cool to try to figure out what’s going on.
In terms of random number generators, these are seeded with something random. If they are seeded with the same seed, then they output the same sequence.

In terms of seeding in Incus, here are the places:

I am not sure which invocation of the function GetStableRandomGenerator() is affected in the generation of the managed bridged network. The answer should be somewhere there though.

oddjobz · July 2, 2025, 10:10pm

Sorry, didn’t mean to patronize, it’s just you said;

I have not looked at using OVN in this yet but have seen some comments that OVN might be a way to resolve this conundrum.

After spending a lot of time on OVN recently I may be a little jaded, but using OVN to solve this type of issue feels a little sledge-hammer and nut.

bmullan · July 2, 2025, 11:01pm

like I said I have not looked at using OVN. Never having used it I don’t have any idea good use-cases for it are. Seems like most things I’ve read are OVN & Clusters.

oddjobz · July 2, 2025, 11:19pm

Yeah, that was me in January. I finally got my network stable and live on OVN in June, only to have it fall in a heap a few days later.

GRE works Ok but VxLAN pretty much does everything OVN does (at least within the context of my requirements), possibly with the exception of managed networks. (which I think is just down to Incus’s choice of what to support and how) There seems to be some support for VxLAN talking to Incus BGP for ‘proper’ routing, which might be needed at scale, but for a few nodes arp flooding seems to work fine for me.

Notably there are a number of unexpected quirks I found with OVN. Although you can mesh two sites with OVN-IC so cluster nodes talk directly to each other, it only allows for one route out of the network … so if you have OVN on a three node cluster, the OVN network have one outgoing gateway (on one node), which introduces an unwanted bottleneck. Not sure how this would look with 100 nodes…

bmullan · July 3, 2025, 3:03pm

@oddjobz
I’ve been using a CLI tool for quite awhile that (to me) is pretty unique.

VxWireguard-Generator, Utility to generate VXLAN over Wireguard mesh SD-WAN configuration

You have to build the python app but the author has made everything, including actual use of VxWireguard-Generator really easy. Its also pretty flexible.

oddjobz · July 3, 2025, 4:57pm

@bmullan Thanks, I yet may come to that, but for now on a relatively small-scale it’s probably a good bit more than I need. When it comes to simplicity and maintainability trying for OVN for me was a massive over-reach so I’ve kinda learnt my lesson.

I’m managing to do everything in a single netplan file of 65 lines which makes things very easy. The only gotcha I came across is that netplan doesn’t support scripting so getting the fdb entries set up needs a tiny bash script and “networkd-dispatcher”.

I’m about to start scaling so fingers crossed nothing else unexpected turns up

bmullan · July 3, 2025, 6:24pm

Since you’ve already gone through the pain of figuring out how to do some of this with wireguard using netplan, Maybe think about putting together a “How-to” guide. Who knows if you put it on GitHub others might contribute ideas or improvements.
Good luck…

For what it’s worth, my latest experimenting involves using self-hosted Netbird. Using the web UI is easy and convenient.

And for interconnecting many servers hosting Incus instances… The Web UI is really quick and convenient to add/drop etc different nodes/hosts & their Incus subnets to and from the wireguard mesh you create.

Over the past couple months I built several bash scripts to install and configure. They are even labeled step1 step2 step3 step4

Total self-hosted netbird install takes maybe 10 minutes. But you do need a domain name to use & a public IP for netbird.

It even has OTP security. I have some nodes that are behind CGNAT and they all connect just fine and containers on nodeX can work directly with containers on nodeY

If you ever get interested in wanting to try it send me a PM to get the scripts I use.

oddjobz · July 4, 2025, 12:05am

Yeah I’ve seen a lot about Tailscale recently, Netbird looks to be a similar sort of thing (?) I must admit I’m not a great fan of hosted solutions like this for myself, I’d far rather self-host and where necessary write my own code. (although the UI does look very nice) Another one I started looking at was Pangolin although I’m not sure how it relates / compares to Tailscale / Netbird.

I need to get a self hosted IAM/SSO going too, Auth0 is Ok but I’m not overly happy about having to rely on a third party for security / logins.

I’ll certainly have a crack at a Howto once I’m done, but I currently have 4 different networks / clusters that all need merging into VxLAN first, which may take me a little time

bmullan · July 4, 2025, 1:48am

That’s what I was talking about…

self-hosted Netbird.

My install scripts have everything up & running including OTP authentication in 10 minutes or so.

I do not use netbirds cloud hosted service.

osch · July 4, 2025, 2:08am

Wonder if you have looked at Authentik? It should do all what you need and is very flexible to configure.

oddjobz · July 4, 2025, 3:17am

That’s what I was talking about…

Ahh, I’d not twigged there was a fully self-hosted version, when you said it needs a public IP I assumed that was tying it to some hosting somewhere. I’ll take a closer look

oddjobz · July 4, 2025, 3:18am

Wonder if you have looked at Authentik?

Mmm, not sure I’ve seen that one, thanks, will take a look.

hereisjames · July 13, 2025, 9:11am

It’s pretty heavy if your scale is relatively small and your usecases mostly straightforward. You might also like Pocket-ID + Tinyauth, which is much lighter but not quite as all-encompassing.

Otherwise Authelia, Zitadel etc exist.

candlerb · July 14, 2025, 12:06pm

Another option worth looking into is kanidm. It is lightweight, written in Rust, and does not need an external database (but still supports HA). It also has some interesting features such as a read-only LDAP server and a wifi RADIUS server.

Sadly, the web admin UI has been removed while they reimplement it. So it’s CLI-only management at the moment. And it doesn’t yet have per-user custom attributes.

Authelia needs an ldap server (like LLDAP server) for the user database. Zitadel is IMO somewhat weird, but does support some multi-tenant and B2B use cases.