Phantom DB entries after (somewhat) botched lxd recover

Today I was recovering my cluster node after partial data loss and forgot to restore one storage volume first. Now I have phantom DB entries and cannot continue.

So, it started like this:

[root@lxd11 ~]# lxd recover                                                                                                                                                                               
This LXD server currently has the following storage pools:                                                                                                                                                
 - ee (backend="zfs", source="")                                                                                                                                                                          
The recovery process will be scanning the following storage pools:                                                                                                                                        
 - EXISTING: "ee" (backend="zfs", source="")                                                                                                                                                              
Would you like to continue with scanning for lost volumes? (yes/no) [default=yes]: yes                                                                                                                    
Scanning for unknown volumes...                                                                                                                                                                           
The following unknown volumes have been found:                                                                                                                                                            
 - Container "vaulttest" on pool "ee" in project "default" (includes 7 snapshots)                                                                                                                         
 - Container "vaulttest1" on pool "ee" in project "default" (includes 7 snapshots)                                                                                                                        
 - Container "vaulttest2" on pool "ee" in project "default" (includes 7 snapshots)                                                                                                                        
 - Container "vaulttest3" on pool "ee" in project "default" (includes 7 snapshots)                                                                                                                        
 - Container "vaulttest4" on pool "ee" in project "default" (includes 7 snapshots)                                                                                                                        
 - Container "vaulttest5" on pool "ee" in project "default" (includes 7 snapshots)                                                                                                                        
Would you like those to be recovered? (yes/no) [default=no]: yes                                                                                                                                          
Starting recovery...                                                                                                                                                                                      
Error: Failed import request: Failed creating instance "vaulttest" record in project "default": Failed creating instance record: Failed initialising instance: Failed add validation for device "testvol": Failed loading custom volume: Storage volume not found

Oh well, OK, my bad. Let’s restore the missing storage volume and retry.
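
For context, "testvol" is a custom volume attached to the container as a disk device, which is why the import bails out when its record is missing. How the restore looks depends on where the data lives; for a ZFS pool it boils down to getting the volume’s dataset back under the pool’s custom/ hierarchy so the next recover run can pick it up, roughly like the sketch below (dataset names are purely illustrative, following the <pool>/custom/<project>_<volume> convention):

[root@lxd11 ~]# zfs rename lxd11/lxdold/custom/default_testvol lxd11/lxd/custom/default_testvol
[root@lxd11 ~]# zfs list -r lxd11/lxd/custom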

[root@lxd11 ~]# lxd recover
This LXD server currently has the following storage pools:
 - ee (backend="zfs", source="")
The recovery process will be scanning the following storage pools:
 - EXISTING: "ee" (backend="zfs", source="")
Would you like to continue with scanning for lost volumes? (yes/no) [default=yes]: yes
Scanning for unknown volumes...
Error: Failed validation request: Failed checking volumes on pool "ee": Instance "vaulttest3" in project "default" already has storage DB record

Oops, so the first run didn’t completely fail, but it left ghost DB entries behind. When I moved vaulttest3 away, the same error repeated for vaulttest4 and 5. In the end it succeeded for vaulttest, vaulttest1 and vaulttest2, but for vaulttest3, 4 and 5 I now have phantom entries that won’t go away, e.g.:

[root@lxd11 ~]# lxc launch c8 vaulttest5 --target lxd11
Creating vaulttest5
Error: Failed creating instance from image: Error inserting volume "vaulttest5" for project "default" in pool "ee" of type "containers" into database "UNIQUE constraint failed: index 'storage_volumes_unique_storage_pool_id_node_id_project_id_name_type'"
[root@lxd11 ~]# lxc rm -f vaulttest5
Error: Failed checking instance exists "local:vaulttest5": Instance not found

How do I get rid of this? I see I have the following in my global DB; is it enough if I just delete those rows, or will I make it worse?

[root@lxd11 ~]# lxd sql global .dump | fgrep vaulttest5
INSERT INTO storage_volumes VALUES(911409,'vaulttest5',1,10,0,'',1,0,'2023-01-02 14:23:39.047069147+00:00');
[root@lxd11 ~]# lxd sql global .dump | fgrep 911409
INSERT INTO storage_volumes VALUES(911409,'vaulttest5',1,10,0,'',1,0,'2023-01-02 14:23:39.047069147+00:00');
INSERT INTO storage_volumes_snapshots VALUES(911410,911409,'auto-20230430-230157','','0001-01-01 00:00:00+00:00','2023-04-30 23:01:57.938295922+00:00');
INSERT INTO storage_volumes_snapshots VALUES(911411,911409,'auto-20230519-230234','','0001-01-01 00:00:00+00:00','2023-05-19 23:02:35.425842938+00:00');
INSERT INTO storage_volumes_snapshots VALUES(911412,911409,'auto-20230520-230548','','0001-01-01 00:00:00+00:00','2023-05-20 23:05:48.687867843+00:00');
INSERT INTO storage_volumes_snapshots VALUES(911413,911409,'auto-20230522-110210','','0001-01-01 00:00:00+00:00','2023-05-22 11:02:10.849646612+00:00');
INSERT INTO storage_volumes_snapshots VALUES(911414,911409,'auto-20230522-120223','','0001-01-01 00:00:00+00:00','2023-05-22 12:02:24.120855158+00:00');
INSERT INTO storage_volumes_snapshots VALUES(911415,911409,'auto-20230522-130223','','0001-01-01 00:00:00+00:00','2023-05-22 13:02:24.259932839+00:00');
INSERT INTO storage_volumes_snapshots VALUES(911416,911409,'auto-20230522-140242','','0001-01-01 00:00:00+00:00','2023-05-22 14:02:43.226609057+00:00');
INSERT INTO storage_volumes_config VALUES(1862,911409,'zfs.use_refquota','true');
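
(Side note: instead of grepping the whole dump, the same phantom rows can also be listed directly, since lxd sql global accepts plain SELECT queries too, e.g.:)

[root@lxd11 ~]# lxd sql global "SELECT id, name FROM storage_volumes WHERE name LIKE 'vaulttest%'"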

Hi,

If you delete vaulttest5 from the storage_volumes table then it should also remove the associated entries from storage_volumes_snapshots and storage_volumes_config.

lxd sql global 'delete from storage_volumes where id = n'
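
To double-check that the snapshot and config rows really went away after the delete, queries like these should both return 0 (I’m assuming here that the foreign-key column in those tables is called storage_volume_id):

lxd sql global 'select count(*) from storage_volumes_snapshots where storage_volume_id = n'
lxd sql global 'select count(*) from storage_volumes_config where storage_volume_id = n'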

Please can you also log an issue so we can fix the reversion logic on failure:

https://github.com/lxc/lxd/issues

I’m happy to report that after deleting the entries in storage_volumes, lxd recover went fine:

[root@lxd11 ~]# lxd sql global 'delete from storage_volumes where id = 911409'
Rows affected: 1
[root@lxd11 ~]# lxd sql global 'delete from storage_volumes where id = 911401'
Rows affected: 1
[root@lxd11 ~]# lxd sql global 'delete from storage_volumes where id = 911393'
Rows affected: 1
[root@lxd11 ~]# zfs rename lxd11/lxdold/containers/vaulttest3 lxd11/lxd/containers/vaulttest3
[root@lxd11 ~]# zfs rename lxd11/lxdold/containers/vaulttest4 lxd11/lxd/containers/vaulttest4
[root@lxd11 ~]# zfs rename lxd11/lxdold/containers/vaulttest5 lxd11/lxd/containers/vaulttest5
[root@lxd11 ~]# lxd recover 
This LXD server currently has the following storage pools:
 - ee (backend="zfs", source="")
The recovery process will be scanning the following storage pools:
 - EXISTING: "ee" (backend="zfs", source="")
Would you like to continue with scanning for lost volumes? (yes/no) [default=yes]: yes
Scanning for unknown volumes...
The following unknown volumes have been found:
 - Container "vaulttest4" on pool "ee" in project "default" (includes 7 snapshots)
 - Container "vaulttest5" on pool "ee" in project "default" (includes 7 snapshots)
 - Container "vaulttest3" on pool "ee" in project "default" (includes 7 snapshots)
Would you like those to be recovered? (yes/no) [default=no]: yes
Starting recovery...
[root@lxd11 ~]# lxc start vaulttest3 vaulttest4 vaulttest5
[root@lxd11 ~]# lxc list vaulttest
+------------+-------+-----------------------+------+-----------+-----------+----------+
|    NAME    | STATE |         IPV4          | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
+------------+-------+-----------------------+------+-----------+-----------+----------+
| vaulttest  | READY | 192.168.222.72 (eth0) |      | CONTAINER | 9         | lxd11    |
+------------+-------+-----------------------+------+-----------+-----------+----------+
| vaulttest1 | READY | 192.168.222.37 (eth0) |      | CONTAINER | 9         | lxd11    |
+------------+-------+-----------------------+------+-----------+-----------+----------+
| vaulttest2 | READY | 192.168.222.92 (eth0) |      | CONTAINER | 9         | lxd11    |
+------------+-------+-----------------------+------+-----------+-----------+----------+
| vaulttest3 | READY | 192.168.222.36 (eth0) |      | CONTAINER | 7         | lxd11    |
+------------+-------+-----------------------+------+-----------+-----------+----------+
| vaulttest4 | READY | 192.168.222.86 (eth0) |      | CONTAINER | 7         | lxd11    |
+------------+-------+-----------------------+------+-----------+-----------+----------+
| vaulttest5 | READY | 192.168.222.58 (eth0) |      | CONTAINER | 7         | lxd11    |
+------------+-------+-----------------------+------+-----------+-----------+----------+

I’ve opened https://github.com/lxc/lxd/issues/11728 to get this fixed.
