How to recover a failed cluster node after "lxc import"

Hello everyone,

I have a small LXD cluster with 3 nodes and local btrfs storage, set up as an experiment. I just wanted to simulate what happens when one of the nodes fails. I don’t need distributed storage; container backups and manual recovery are good enough for our purposes.

So I halted node02:

+------------------+---------------------------+-----------------+--------------+----------------+-------------+---------+----------------------------------------------------------------------------+
|       NAME       |            URL            |      ROLES      | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION |  STATE  |                                  MESSAGE                                   |
+------------------+---------------------------+-----------------+--------------+----------------+-------------+---------+----------------------------------------------------------------------------+
| m2cluster-node01 | https://192.168.64.6:8443 | database-leader | aarch64      | default        |             | ONLINE  | Fully operational                                                          |
|                  |                           | database        |              |                |             |         |                                                                            |
+------------------+---------------------------+-----------------+--------------+----------------+-------------+---------+----------------------------------------------------------------------------+
| m2cluster-node02 | https://192.168.64.7:8443 | database        | aarch64      | default        |             | OFFLINE | No heartbeat for 4m31.479229085s (2023-04-22 17:39:42.305293643 +0000 UTC) |
+------------------+---------------------------+-----------------+--------------+----------------+-------------+---------+----------------------------------------------------------------------------+
| m2cluster-node03 | https://192.168.64.8:8443 | database        | aarch64      | default        |             | ONLINE  | Fully operational                                                          |
+------------------+---------------------------+-----------------+--------------+----------------+-------------+---------+----------------------------------------------------------------------------+

The container “deploy” running on node02 reports ERROR status:

+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
|  NAME   |  STATE  |         IPV4         |                     IPV6                      |   TYPE    | SNAPSHOTS |     LOCATION     |
+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
| admin   | RUNNING | 192.168.64.14 (eth0) | fdad:f3da:86ea:f4b3:216:3eff:fe9e:c48 (eth0)  | CONTAINER | 0         | m2cluster-node03 |
+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
| backend | RUNNING | 192.168.64.15 (eth0) | fdad:f3da:86ea:f4b3:216:3eff:fe29:c383 (eth0) | CONTAINER | 0         | m2cluster-node01 |
+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
| deploy  | ERROR   |                      |                                               | CONTAINER | 0         | m2cluster-node02 |
+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
| public  | RUNNING | 192.168.64.13 (eth0) | fdad:f3da:86ea:f4b3:216:3eff:fe1c:e20 (eth0)  | CONTAINER | 0         | m2cluster-node03 |
+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+

My intention here was to restore “deploy” from a container backup, but I tried a few other things first:

  1. Evacuate node02:
[ rc0 ]-[root@m2cluster-node01]-[~] # lxc cluster evacuate m2cluster-node02
Are you sure you want to evacuate cluster member "m2cluster-node02"? (yes/no) [default=no]: yes
Error: Failed to update cluster member state: Missing event connection with target cluster member
  2. Move the unavailable container to node01:
[ rc0 ]-[root@m2cluster-node01]-[~] # lxc move deploy --target m2cluster-node01
Error: Failed loading instance storage pool: Failed getting instance storage pool name: Instance storage pool not found
  3. Delete the unavailable container from the cluster:
[ rc0 ]-[root@m2cluster-node01]-[~] # lxc delete deploy 
Error: Failed checking instance exists "local:deploy": Missing event connection with target cluster member

So I finally restored the container to node01 from backup:

[ rc0 ]-[root@m2cluster-node01]-[~] # lxc import deploy.img
[ rc0 ]-[root@m2cluster-node01]-[~] # lxc list
+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
|  NAME   |  STATE  |         IPV4         |                     IPV6                      |   TYPE    | SNAPSHOTS |     LOCATION     |
+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
| deploy  | STOPPED |                      |                                               | CONTAINER | 0         | m2cluster-node01 |
+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
[ rc0 ]-[root@m2cluster-node01]-[~] # lxc start deploy
[ rc0 ]-[root@m2cluster-node01]-[~] # lxc list
+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
|  NAME   |  STATE  |         IPV4         |                     IPV6                      |   TYPE    | SNAPSHOTS |     LOCATION     |
+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
| deploy  | RUNNING | 192.168.64.12 (eth0) | fdad:f3da:86ea:f4b3:216:3eff:fef3:b64e (eth0) | CONTAINER | 0         | m2cluster-node01 |
+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+

So far, so good. Then I brought node02 back up. The cluster member state was recovered, and the restored container is running in its new location on node01.

[ rc0 ]-[root@m2cluster-node01]-[~] # lxc cluster list
+------------------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
|       NAME       |            URL            |      ROLES      | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATE  |      MESSAGE      |
+------------------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| m2cluster-node01 | https://192.168.64.6:8443 | database-leader | aarch64      | default        |             | ONLINE | Fully operational |
|                  |                           | database        |              |                |             |        |                   |
+------------------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| m2cluster-node02 | https://192.168.64.7:8443 | database        | aarch64      | default        |             | ONLINE | Fully operational |
+------------------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| m2cluster-node03 | https://192.168.64.8:8443 | database        | aarch64      | default        |             | ONLINE | Fully operational |
+------------------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
[ rc0 ]-[root@m2cluster-node01]-[~] # lxc list
+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
|  NAME   |  STATE  |         IPV4         |                     IPV6                      |   TYPE    | SNAPSHOTS |     LOCATION     |
+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
| admin   | RUNNING | 192.168.64.14 (eth0) | fdad:f3da:86ea:f4b3:216:3eff:fe9e:c48 (eth0)  | CONTAINER | 0         | m2cluster-node03 |
+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
| backend | RUNNING | 192.168.64.15 (eth0) | fdad:f3da:86ea:f4b3:216:3eff:fe29:c383 (eth0) | CONTAINER | 0         | m2cluster-node01 |
+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
| deploy  | RUNNING | 192.168.64.12 (eth0) | fdad:f3da:86ea:f4b3:216:3eff:fef3:b64e (eth0) | CONTAINER | 0         | m2cluster-node01 |
+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
| public  | RUNNING | 192.168.64.13 (eth0) | fdad:f3da:86ea:f4b3:216:3eff:fe1c:e20 (eth0)  | CONTAINER | 0         | m2cluster-node03 |
+---------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+

Meanwhile, node02 still has the old storage subvolume for container “deploy”:

[ rc0 ]-[root@m2cluster-node02]-[~] # btrfs su li /mnt/lxd
ID 257 gen  51 top level 5 path images/4ba589a5d05a4cc...
ID 271 gen 112 top level 5 path containers/web-dev_deploy

Now I tried to move the container back to its original location, node02:

[ rc0 ]-[root@m2cluster-node01]-[~] # lxc move deploy --target m2cluster-node02
Error: Rename instance operation failed: Rename instance: UNIQUE constraint failed: storage_volumes.storage_pool_id, storage_volumes.node_id, storage_volumes.project_id, storage_volumes.name, storage_volumes.type

[ rc1 ]-[root@m2cluster-node01]-[~] # lxc list
+--------------------------------------------------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
|                       NAME                       |  STATE  |         IPV4         |                     IPV6                      |   TYPE    | SNAPSHOTS |     LOCATION     |
+--------------------------------------------------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
| admin                                            | RUNNING | 192.168.64.14 (eth0) | fdad:f3da:86ea:f4b3:216:3eff:fe9e:c48 (eth0)  | CONTAINER | 0         | m2cluster-node03 |
+--------------------------------------------------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
| backend                                          | RUNNING | 192.168.64.15 (eth0) | fdad:f3da:86ea:f4b3:216:3eff:fe29:c383 (eth0) | CONTAINER | 0         | m2cluster-node01 |
+--------------------------------------------------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
| lxd-move-of-869f0171-403e-4950-85d5-624016f6faf7 | STOPPED |                      |                                               | CONTAINER | 0         | m2cluster-node02 |
+--------------------------------------------------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+
| public                                           | RUNNING | 192.168.64.13 (eth0) | fdad:f3da:86ea:f4b3:216:3eff:fe1c:e20 (eth0)  | CONTAINER | 0         | m2cluster-node03 |
+--------------------------------------------------+---------+----------------------+-----------------------------------------------+-----------+-----------+------------------+

[ rc0 ]-[root@m2cluster-node02]-[~] # btrfs su li /mnt/lxd
ID 257 gen  51 top level 5 path images/4ba589a5d05a4cc...
ID 271 gen 112 top level 5 path containers/web-dev_deploy
ID 272 gen 120 top level 5 path containers/web-dev_lxd-move-of-869f0171-403e-4950-85d5-624016f6faf7

I can’t rename the container back to its original name, even after deleting the storage subvolume. It seems the database must be cleaned up somehow.
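
(For reference, I deleted the subvolume on node02 with btrfs subvolume delete, and had a read-only look at the leftover rows through LXD’s built-in database access; the LIKE pattern below is just my guess at the relevant volume names.)

# on node02: delete the stale container subvolume (path from the listing above)
btrfs subvolume delete /mnt/lxd/containers/web-dev_deploy

# on any online member: inspect the conflicting volume records (read-only)
lxd sql global "SELECT id, name, node_id, type FROM storage_volumes WHERE name LIKE '%deploy%'"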

Is there a better procedure for node failover? Should I simply wipe all the LXD content, create a new node, and join it to the cluster?

Thank you very much - and sorry for the long description.

When using instances on local storage pools, you cannot recover them onto a different cluster member, because the instance's storage only exists on that offline cluster member.

You also should not have been able to import the instance from backup, as the instance name should have conflicted with the record for the existing instance on the offline member.

What confused me from your post is that after doing lxc import deploy.img, the lxc list output only showed the newly imported instance and not the other ones that were previously shown.

So I tried recreating it…

Create a 3 member cluster:

v1:

root@v1:~# lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: yes
What IP address or DNS name should be used to reach this server? [default=10.21.203.9]: 
Are you joining an existing cluster? (yes/no) [default=no]: 
What member name should be used to identify this server in the cluster? [default=v1]: 
Do you want to configure a new local storage pool? (yes/no) [default=yes]: 
Name of the storage backend to use (dir, lvm, zfs, btrfs) [default=zfs]: btrfs
Create a new BTRFS pool? (yes/no) [default=yes]: 
Would you like to use an existing empty block device (e.g. a disk or partition)? (yes/no) [default=no]: 
Size in GiB of the new loop device (1GiB minimum) [default=5GiB]: 
Do you want to configure a new remote storage pool? (yes/no) [default=no]: 
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: 
Would you like to create a new Fan overlay network? (yes/no) [default=yes]: 
What subnet should be used as the Fan underlay? [default=auto]: 
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]: 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 
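
As an aside, the same bootstrap answers can be replayed non-interactively with a preseed. This is only a rough sketch mirroring the answers above (not verbatim lxd init output), so treat it as a starting point:

# replayable bootstrap preseed; values mirror the interactive answers above
lxd init --preseed <<EOF
config:
  core.https_address: 10.21.203.9:8443
storage_pools:
- name: local
  driver: btrfs
  config:
    size: 5GiB
networks:
- name: lxdfan0
  type: bridge
  config:
    bridge.mode: fan
cluster:
  server_name: v1
  enabled: true
EOF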

root@v1:~# lxc cluster add v2
Member v2 join token:
eyJzZXJ2ZXJfbmFtZSI6InYyIiwiZmluZ2VycHJpbnQiOiJlMTRjMzQ3Yjg1MmE5N2Q0NzczZTVlYWZhMDU4NjNjMmZmNWQ4ZTJlYTg4MjZkOTFmZmQzMmNiMTRmMGM5MDkxIiwiYWRkcmVzc2VzIjpbIjEwLjIxLjIwMy45Ojg0NDMiXSwic2VjcmV0IjoiNTEwNWZkMWM4ODJiOWNjM2E2ODMwZTk2NzAzMzQwNGY5MjkzODA3OTg2NTA2MWFiZGMzYzY4N2EyMmVkNjYxZCIsImV4cGlyZXNfYXQiOiIyMDIzLTA1LTEyVDEyOjUwOjAxLjU4ODU0OTg5MloifQ==

root@v1:~# lxc cluster add v3
Member v3 join token:
eyJzZXJ2ZXJfbmFtZSI6InYzIiwiZmluZ2VycHJpbnQiOiJlMTRjMzQ3Yjg1MmE5N2Q0NzczZTVlYWZhMDU4NjNjMmZmNWQ4ZTJlYTg4MjZkOTFmZmQzMmNiMTRmMGM5MDkxIiwiYWRkcmVzc2VzIjpbIjEwLjIxLjIwMy45Ojg0NDMiXSwic2VjcmV0IjoiMDBhM2FiMTljMmE5MWEyZDY4YmI0ZDc4ZTQ0NzE3MzU5MDMwODc3ZWZlMThiZDhmZDg5ZGQ0YTkzMmFmNzM4NSIsImV4cGlyZXNfYXQiOiIyMDIzLTA1LTEyVDEyOjUwOjAyLjQyNDA0MzAxOVoifQ==

v2:

root@v2:~# lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: yes
What IP address or DNS name should be used to reach this server? [default=10.21.203.8]: 
Are you joining an existing cluster? (yes/no) [default=no]: yes
Do you have a join token? (yes/no/[token]) [default=no]: eyJzZXJ2ZXJfbmFtZSI6InYyIiwiZmluZ2VycHJpbnQiOiJlMTRjMzQ3Yjg1MmE5N2Q0NzczZTVlYWZhMDU4NjNjMmZmNWQ4ZTJlYTg4MjZkOTFmZmQzMmNiMTRmMGM5MDkxIiwiYWRkcmVzc2VzIjpbIjEwLjIxLjIwMy45Ojg0NDMiXSwic2VjcmV0IjoiNTEwNWZkMWM4ODJiOWNjM2E2ODMwZTk2NzAzMzQwNGY5MjkzODA3OTg2NTA2MWFiZGMzYzY4N2EyMmVkNjYxZCIsImV4cGlyZXNfYXQiOiIyMDIzLTA1LTEyVDEyOjUwOjAxLjU4ODU0OTg5MloifQ==
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Choose "size" property for storage pool "local": 
Choose "source" property for storage pool "local": 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 

v3:

root@v3:~# lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: yes
What IP address or DNS name should be used to reach this server? [default=10.21.203.7]: 
Are you joining an existing cluster? (yes/no) [default=no]: yes
Do you have a join token? (yes/no/[token]) [default=no]: eyJzZXJ2ZXJfbmFtZSI6InYzIiwiZmluZ2VycHJpbnQiOiJlMTRjMzQ3Yjg1MmE5N2Q0NzczZTVlYWZhMDU4NjNjMmZmNWQ4ZTJlYTg4MjZkOTFmZmQzMmNiMTRmMGM5MDkxIiwiYWRkcmVzc2VzIjpbIjEwLjIxLjIwMy45Ojg0NDMiXSwic2VjcmV0IjoiMDBhM2FiMTljMmE5MWEyZDY4YmI0ZDc4ZTQ0NzE3MzU5MDMwODc3ZWZlMThiZDhmZDg5ZGQ0YTkzMmFmNzM4NSIsImV4cGlyZXNfYXQiOiIyMDIzLTA1LTEyVDEyOjUwOjAyLjQyNDA0MzAxOVoifQ==
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Choose "size" property for storage pool "local": 
Choose "source" property for storage pool "local": 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 

Now launch an instance:

root@v3:~# lxc launch images:ubuntu/jammy c1 --target=v3
root@v3:~# lxc list
+------+---------+-------------------+------+-----------+-----------+----------+
| NAME |  STATE  |       IPV4        | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
+------+---------+-------------------+------+-----------+-----------+----------+
| c1   | RUNNING | 240.7.0.72 (eth0) |      | CONTAINER | 0         | v3       |
+------+---------+-------------------+------+-----------+-----------+----------+

Now export it for later use, and copy the backup file off of v3 so it is still available once v3 goes down.

root@v3:~# lxc export c1 c1.tar.gz
Backup exported successfully!
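
The backup file needs to live somewhere that survives v3, so copy it to another host; a plain scp is enough (the destination host and path here are just examples):

# copy the backup off the doomed member before stopping it
scp c1.tar.gz root@10.21.203.9:/root/c1.tar.gz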

Now stop the v3 host and check its status from another member:

root@v1:~# lxc cluster list
+------+--------------------------+-----------------+--------------+----------------+-------------+---------+--------------------------------------------------------------------------+
| NAME |           URL            |      ROLES      | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION |  STATE  |                                 MESSAGE                                  |
+------+--------------------------+-----------------+--------------+----------------+-------------+---------+--------------------------------------------------------------------------+
| v1   | https://10.21.203.9:8443 | database-leader | x86_64       | default        |             | ONLINE  | Fully operational                                                        |
|      |                          | database        |              |                |             |         |                                                                          |
+------+--------------------------+-----------------+--------------+----------------+-------------+---------+--------------------------------------------------------------------------+
| v2   | https://10.21.203.8:8443 | database        | x86_64       | default        |             | ONLINE  | Fully operational                                                        |
+------+--------------------------+-----------------+--------------+----------------+-------------+---------+--------------------------------------------------------------------------+
| v3   | https://10.21.203.7:8443 | database        | x86_64       | default        |             | OFFLINE | No heartbeat for 21.726806449s (2023-05-12 09:55:08.588114431 +0000 UTC) |
+------+--------------------------+-----------------+--------------+----------------+-------------+---------+--------------------------------------------------------------------------+

root@v1:~# lxc list
+------+-------+------+------+-----------+-----------+----------+
| NAME | STATE | IPV4 | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
+------+-------+------+------+-----------+-----------+----------+
| c1   | ERROR |      |      | CONTAINER | 0         | v3       |
+------+-------+------+------+-----------+-----------+----------+

Right, now let’s try to import the backup onto a different cluster member:

root@v1:~# lxc import c1.tar.gz
root@v1:~# lxc ls                     
+------+---------+------+------+-----------+-----------+----------+
| NAME |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
+------+---------+------+------+-----------+-----------+----------+
| c1   | STOPPED |      |      | CONTAINER | 0         | v1       |
+------+---------+------+------+-----------+-----------+----------+

Hrm, ok, so that worked, which is weird. Ah, I see now that createFromBackup calls internalImportFromBackup with the force argument set to true, which will delete any existing DB records.

So that explains that. Although I’m not sure it should be setting force to true, as it’s guaranteed to leave the storage in an inconsistent state (as you noticed).
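
You can see the effect in the global database: after the import, only the new record remains and it is homed on the importing member. Here is a read-only check via lxd sql (table and column names as in the current LXD schema):

# list instance records and the member each one lives on
lxd sql global "SELECT instances.name, nodes.name FROM instances JOIN nodes ON nodes.id = instances.node_id"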

I think what I would do is this:

When the instance is in an error state because the cluster member is permanently offline, remove that member from the cluster:

Force-removing a cluster member will leave the member’s database in an inconsistent state (for example, the storage pool on the member will not be removed). As a result, it will not be possible to re-initialize LXD later, and the server must be fully reinstalled.

root@v1:~# lxc cluster remove --force v3
Forcefully removing a server from the cluster should only be done as a last
resort.

The removed server will not be functional after this action and will require a
full reset of LXD, losing any remaining instance, image or storage volume
that the server may have held.

When possible, a graceful removal should be preferred, this will require you to
move any affected instance, image or storage volume to another server prior to
the server being cleanly removed from the cluster.

The --force flag should only be used if the server has died, been reinstalled
or is otherwise never expected to come back up.

Are you really sure you want to force removing v3? (yes/no): yes
Member v3 removed

root@v1:~# lxc cluster list
+------+--------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| NAME |           URL            |      ROLES      | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATE  |      MESSAGE      |
+------+--------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| v1   | https://10.21.203.9:8443 | database-leader | x86_64       | default        |             | ONLINE | Fully operational |
|      |                          | database        |              |                |             |        |                   |
+------+--------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| v2   | https://10.21.203.8:8443 |                 | x86_64       | default        |             | ONLINE | Fully operational |
+------+--------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
root@v1:~# lxc ls
+------+-------+------+------+------+-----------+----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS | LOCATION |
+------+-------+------+------+------+-----------+----------+

Now the instance records for that removed member are gone and we can import the instances from backup cleanly.

The offline member can then be destroyed, rebuilt and re-added to the cluster afresh.
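
Putting the whole recovery flow together, it looks like this (a sketch using the member and file names from my repro; adjust them to your cluster):

# 1. drop the dead member; this also removes its instance DB records
lxc cluster remove --force v3

# 2. restore the instance from backup on a healthy member and start it
lxc import c1.tar.gz
lxc start c1

# 3. reinstall the old host, then re-join it with a fresh token
lxc cluster add v3    # on an existing member; prints a join token
lxd init              # on the rebuilt v3; answer yes to joining and paste the token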

@stgraber would you expect that lxc import <file> should fail if a conflicting instance already exists, or should it delete the existing instance and import over the top?