LXD clustering problem

Very odd issue… I am using LXD 4.1 from the snap on Ubuntu 18.04.
I have just added a second node named lxd02, which has the IP 10.10.10.100.

lxc copy nms1-xdc/snap1 lxd02:nms-xdc --verbose
Error: Failed instance creation: Error transferring instance data: Unable to connect to: 10.10.10.20:8443

lxd01 can connect to and launch containers on lxd02.
lxd01 cannot list the containers on lxd02 directly (lxc list lxd02: comes back empty).

The IP address in the error (10.10.10.20) is a historical one. DNS and the local hosts file both point lxd02 at 10.10.10.100.
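(Two quick checks for where an address like that might be coming from, assuming a standard snap setup: the address the daemon itself listens on, and what the host actually resolves lxd02 to.)

lxc config get core.https_address
getent hosts lxd02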

@lxd01:~$ lxc cluster list
+-------+---------------------------+----------+--------+-------------------+--------------+
| NAME  |            URL            | DATABASE | STATE  |      MESSAGE      | ARCHITECTURE |
+-------+---------------------------+----------+--------+-------------------+--------------+
| lxd01 | https://10.10.10.101:8443 | NO       | ONLINE | fully operational | x86_64       |
+-------+---------------------------+----------+--------+-------------------+--------------+
| lxd02 | https://10.10.10.100:8443 | NO       | ONLINE | fully operational | x86_64       |
+-------+---------------------------+----------+--------+-------------------+--------------+


@lxd02:~$ lxc cluster show lxd01
roles: []
architecture: x86_64
server_name: lxd01
url: https://10.10.10.101:8443
database: false
status: Online
message: fully operational

@lxd02:~$ lxc cluster show lxd02
roles: []
architecture: x86_64
server_name: lxd02
url: https://10.10.10.100:8443
database: false
status: Online
message: fully operational

I am pretty stumped…

Why is it looking at an old IP address, and how do I remove it?
Is there anything else I am doing wrong (aside from only having two machines in the cluster so far)?

Thanks in advance.

https://stgraber.org/2016/10/27/network-management-with-lxd-2-3/
Does this help?

A little bit… some more commands to wield, thanks.
I am using macvlans for my containers… so if I have a look at my interfaces I see:

lxc network show lxdfan0 
config:
  bridge.mode: fan
  fan.underlay_subnet: 10.10.10.0/24
description: ""
name: lxdfan0
type: bridge
used_by:
- /1.0/instances/test1
managed: true
status: Created
locations:
- lxd01
- lxd02


@lxd02:~$ lxc network show eno1
config: {}
description: ""
name: eno1
type: physical
used_by:
- /1.0/instances/jump1-xdc
- /1.0/instances/nms1-xdc
managed: false
status: ""
locations: []

I notice that locations is blank on the physical interface… I am not sure if that’s relevant. Still, I cannot find any reference to the historical IP address of lxd02.
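(For what it’s worth, an unmanaged interface like eno1 is just a discovered host device, so the empty status and locations are expected; only managed networks carry cluster-wide state. On a clustered LXD, the member-specific view of a managed network should be visible with something like:)

lxc network show lxdfan0 --target lxd02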

Can you show output of:

lxc remote ls

Hi,
Thanks for your reply.

The output from lxd01…

@lxd01:~$ lxc remote ls
+-----------------+------------------------------------------+---------------+-------------+--------+--------+
|      NAME       |                   URL                    |   PROTOCOL    |  AUTH TYPE  | PUBLIC | STATIC |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+
| images          | https://images.linuxcontainers.org       | simplestreams | none        | YES    | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+
| local (default) | unix://                                  | lxd           | file access | NO     | YES    |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+
| lxd02           | https://10.10.10.100:8443                | lxd           | tls         | NO     | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+
| ubuntu          | https://cloud-images.ubuntu.com/releases | simplestreams | none        | YES    | YES    |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+
| ubuntu-daily    | https://cloud-images.ubuntu.com/daily    | simplestreams | none        | YES    | YES    |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+

And from lxd02…

lxd02:~$ lxc remote ls
+-----------------+------------------------------------------+---------------+-------------+--------+--------+
|      NAME       |                   URL                    |   PROTOCOL    |  AUTH TYPE  | PUBLIC | STATIC |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+
| images          | https://images.linuxcontainers.org       | simplestreams | none        | YES    | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+
| local (default) | unix://                                  | lxd           | file access | NO     | YES    |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+
| ubuntu          | https://cloud-images.ubuntu.com/releases | simplestreams | none        | YES    | YES    |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+
| ubuntu-daily    | https://cloud-images.ubuntu.com/daily    | simplestreams | none        | YES    | YES    |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+

This just raises more questions for me, as lxd02 doesn’t show lxd01 as a remote. Still no sign of the elusive IP address though.

@freeekanayaka may be able to assist here, as the clustering subsystem is not my forte.

I’m perplexed that lxc cluster list shows NO in the DATABASE column for both of your nodes.

Can you paste the output of lxd sql local "SELECT * FROM raft_nodes" please?

As for the remote, I presume you added lxd02 as a remote yourself, because lxd should not add any remotes automatically (besides the default ones).

Thanks for the reply. To add to the cluster, I ran lxd init on lxd02 and joined it to the cluster, which worked; then, as lxd01 wasn’t seeing lxd02, I added it as a remote manually.
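(For the record, the join itself was the usual flow on lxd02, roughly:)

sudo lxd init
# answer yes to clustering, give this node’s address, then lxd01’s address and the cluster trust password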

@lxd01:~$ lxd sql local "SELECT * FROM raft_nodes"
+----+-------------------+------+
| id |      address      | role |
+----+-------------------+------+
| 1  | 10.10.10.101:8443 | 0    |
| 2  | 10.10.10.100:8443 | 1    |
+----+-------------------+------+
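(Another place I could think to look for a stale address, going by the table names above, is the nodes table in the global database:)

lxd sql global "SELECT id, name, address FROM nodes"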

Can you paste again the output of lxc cluster list, run from both lxd01 and lxd02, please?

As for your original issue, please don’t use remotes with clustering. You should first lxc remote remove lxd02, then, if you want to copy a container snapshot from lxd01 to lxd02, run:

lxc copy nms1-xdc/snap1 nms-xdc --target lxd02
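(As a usage note: cluster members share a single global database, so once the stray remote is gone, a plain lxc list run on either member should show every instance in the cluster, with a LOCATION column saying where each one lives.)

lxc list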

Sure…

LXD01

@lxd01:~$ lxc cluster list
+-------+---------------------------+----------+--------+-------------------+--------------+
| NAME  |            URL            | DATABASE | STATE  |      MESSAGE      | ARCHITECTURE |
+-------+---------------------------+----------+--------+-------------------+--------------+
| lxd01 | https://10.10.10.101:8443 | NO       | ONLINE | fully operational | x86_64       |
+-------+---------------------------+----------+--------+-------------------+--------------+
| lxd02 | https://10.10.10.100:8443 | NO       | ONLINE | fully operational | x86_64       |
+-------+---------------------------+----------+--------+-------------------+--------------+

LXD02

@lxd02:~$ lxc cluster list
+-------+---------------------------+----------+--------+-------------------+--------------+
| NAME  |            URL            | DATABASE | STATE  |      MESSAGE      | ARCHITECTURE |
+-------+---------------------------+----------+--------+-------------------+--------------+
| lxd01 | https://10.10.10.101:8443 | NO       | ONLINE | fully operational | x86_64       |
+-------+---------------------------+----------+--------+-------------------+--------------+
| lxd02 | https://10.10.10.100:8443 | NO       | ONLINE | fully operational | x86_64       |
+-------+---------------------------+----------+--------+-------------------+--------------+

Removing the remote worked.

The copy with --target worked too…

Do I still potentially have an issue with the database?

Thanks again

You might. What’s the output of lxd sql global "SELECT * FROM nodes_roles"?

The output is the same on both lxd01 & lxd02:

lxd sql global "SELECT * FROM nodes_roles"
+---------+------+
| node_id | role |
+---------+------+
+---------+------+

For some reason you don’t have the database role on lxd01. Please run:

lxd sql global "INSERT INTO nodes_roles(node_id, role) VALUES(1, 0)"

and that should fix it.
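(Re-running the earlier query should then show the freshly inserted row, node_id 1 with role 0:)

lxd sql global "SELECT * FROM nodes_roles"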

Diamond! That looks better :smiley:

+-------+---------------------------+----------+--------+-------------------+--------------+
| NAME  |            URL            | DATABASE | STATE  |      MESSAGE      | ARCHITECTURE |
+-------+---------------------------+----------+--------+-------------------+--------------+
| lxd01 | https://10.10.10.101:8443 | YES      | ONLINE | fully operational | x86_64       |
+-------+---------------------------+----------+--------+-------------------+--------------+
| lxd02 | https://10.10.10.100:8443 | NO       | ONLINE | fully operational | x86_64       |
+-------+---------------------------+----------+--------+-------------------+--------------+

Everything seems to work OK now. Thank you very much for your assistance.