LXD 3.23 - Cluster setup with 2 networks

I am trying to configure an LXD cluster with 2 networks, but don’t understand what I should do to accomplish this.

My setup:

I have an LXD cluster of 7 hosts, each with 2 network interfaces. One interface is connected to the LAN and picks up its IP address via DHCP. The other, "data" interface is connected to a switch (along with the other hosts in the cluster); it is also configured for DHCP, but does not pick up an address because there is no DHCP server on that network.

When configuring the LXD cluster, I chose fan networking, which created an lxdfan0 bridge on every node, with addresses derived from each host's IP address on the LAN interface. I have verified that connectivity between instances launched on different cluster nodes works as expected.
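(For reference, the fan configuration can be checked on any member with the command below; on my cluster the underlay subnet is the LAN subnet, 10.30.30.0/24, as the database dump further down also shows.)

lxc network show lxdfan0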

I would now like to create a secondary “data” network for the cluster, so I can direct traffic from some of the instances to use the higher-bandwidth data interface/switch.
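(For context, once lxddata0 exists the idea is to give selected instances a second NIC on that bridge, roughly along these lines; "c1" and "eth1" are just placeholder names:)

lxc config device add c1 eth1 nic nictype=bridged parent=lxddata0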

I tried the following, but am clearly doing something wrong …
(eno2 and enp3s0f0 are the host interfaces associated with the "data" network. I have not modified the netplan configuration on the hosts.)

akriadmin@c4akri01:~/scripts$ lxc network create --target c4akri01 lxddata0 bridge.external_interfaces=eno2
Network lxddata0 pending on member c4akri01
akriadmin@c4akri01:~/scripts$ lxc network create --target c4akri02 lxddata0 bridge.external_interfaces=eno2
Network lxddata0 pending on member c4akri02
akriadmin@c4akri01:~/scripts$ lxc network create --target c4akri03 lxddata0 bridge.external_interfaces=eno2
Network lxddata0 pending on member c4akri03
akriadmin@c4akri01:~/scripts$ lxc network create --target c4akri04 lxddata0 bridge.external_interfaces=eno2
Network lxddata0 pending on member c4akri04
akriadmin@c4akri01:~/scripts$ lxc network create --target c4astore01 lxddata0 bridge.external_interfaces=enp3s0f0
Network lxddata0 pending on member c4astore01
akriadmin@c4akri01:~/scripts$ lxc network create --target c4astore02 lxddata0 bridge.external_interfaces=enp3s0f0
Network lxddata0 pending on member c4astore02
akriadmin@c4akri01:~/scripts$ lxc network create --target c4astore03 lxddata0 bridge.external_interfaces=enp3s0f0
Network lxddata0 pending on member c4astore03
akriadmin@c4akri01:~/scripts$ lxc network create lxddata0 ipv4.dhcp=false ipv4.nat=false ipv6.dhcp=false ipv6.nat=false
Error: FOREIGN KEY constraint failed
akriadmin@c4akri01:~/scripts$ 

I tried running the LXD daemon in debug mode, but the logs there also just show a database protocol error.
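(For the snap package, debug logging can be enabled roughly like this; the exact steps may differ depending on how LXD was installed:)

sudo snap set lxd daemon.debug=true
sudo systemctl reload snap.lxd.daemon
sudo journalctl -u snap.lxd.daemon -f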

Hmm, I’m not seeing any obvious reason why this wouldn’t work… It may be a bug.

Can you show the output of lxd sql global "SELECT * FROM networks;" and lxd sql global "SELECT * FROM networks_config;"?

Here you go:

akriadmin@c4akri01:~/scripts$ lxd sql global "SELECT * from networks;"
+----+----------+-------------+-------+
| id |   name   | description | state |
+----+----------+-------------+-------+
| 1  | lxdfan0  |             | 1     |
| 4  | lxddata0 | <nil>       | 0     |
+----+----------+-------------+-------+
akriadmin@c4akri01:~/scripts$ lxd sql global "SELECT * from networks_config;"
+----+------------+---------+----------------------------+---------------+
| id | network_id | node_id |            key             |     value     |
+----+------------+---------+----------------------------+---------------+
| 1  | 1          | <nil>   | bridge.mode                | fan           |
| 2  | 1          | <nil>   | fan.underlay_subnet        | 10.30.30.0/24 |
| 17 | 4          | 1       | bridge.external_interfaces | eno2          |
| 18 | 4          | 2       | bridge.external_interfaces | eno2          |
| 19 | 4          | 3       | bridge.external_interfaces | eno2          |
| 20 | 4          | 4       | bridge.external_interfaces | eno2          |
| 21 | 4          | 5       | bridge.external_interfaces | enp3s0f0      |
| 22 | 4          | 6       | bridge.external_interfaces | enp3s0f0      |
| 23 | 4          | 7       | bridge.external_interfaces | enp3s0f0      |
+----+------------+---------+----------------------------+---------------+

Yeah, the DB state looks good too; this definitely feels like a bug. I'm assuming you've confirmed the interface names are correct on all nodes? In the meantime, I can reproduce the failure on a small test cluster here:

root@cluster2:~# lxc network create --target cluster1 test bridge.external_interfaces=ext1
Network test pending on member cluster1
root@cluster2:~# lxc network create --target cluster2 test bridge.external_interfaces=ext2
Network test pending on member cluster2
root@cluster2:~# lxc network create test ipv4.dhcp=false ipv4.nat=false ipv6.dhcp=false ipv6.nat=false
Error: FOREIGN KEY constraint failed
root@cluster2:~# 

Yes …

akriadmin@c4akri01:~/scripts$ ./run-physical-hosts.sh "ip a | egrep '(eno2|enp3s0f0)'"
[c4akri01]:
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
[c4akri02]:
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
[c4akri03]:
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
[c4akri04]:
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
[c4astore01]:
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
4: enp3s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
[c4astore02]:
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
4: enp3s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
[c4astore03]:
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
4: enp3s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
akriadmin@c4akri01:~/scripts$

Thanks for the quick fix.

Is there a recommended way for me to get these fixes? I expect there is some lag before they make it into a snap release.

We cherry-pick fixes very regularly. I expect this to hit the candidate channel later today, and it will be in the 4.0 stable snap we push out tomorrow.

I have this exact issue as well, and it's blocking a prototype production cluster. Is there any way to get the fix short of moving to the 4.0 candidate branch?

4.0 is rolling out to stable now

Now as in “now now” :smiley:

Oh, yes, I see it is. Any huge gotchas one should be aware of beyond “upgrade all the nodes”?

Nope, it’s a very small bump from 3.23 to 4.0, much smaller than a normal feature release.
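For "upgrade all the nodes" it should just be a snap refresh on each member once the new revision is published; members that have already refreshed will wait until the rest of the cluster is on the same version before resuming normal operation. A rough sketch, assuming the snap package:

sudo snap refresh lxd    # run on every cluster member
lxc cluster list         # members should show as ONLINE once all have refreshed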

Ah, nice. Hopefully it goes well then :smiley: Going to try it now