Restoring lost VMs using Incus

I was happily using lxc to run a bunch of containers and VMs. Then my hard disk failed. Fortunately, I was able to recover the data: I got the new drive plugged in and was able to import the zpool. This is what I see when I run sudo zpool list:

NAME            SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
default        99.5G  13.7G  85.8G        -         -     2%    13%  1.00x    ONLINE  -
default2        186G  60.7G   125G        -      372G    73%    32%  1.00x    ONLINE  -
incus-default  29.5G   824K  29.5G        -         -     0%     0%  1.00x    ONLINE  -

The default2 pool holds the data from the failed drive. As you can see, I am trying to migrate from lxc to incus, but I am not sure I am handling the zpools correctly.
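
For reference, importing a recovered pool under a new name goes roughly like this; the -d search directory and the old pool name below are placeholders, not what I actually typed:

# Sketch only: bring in the recovered pool under a new name, because a
# pool called "default" already exists on this machine.
# <old-pool-name> and the -d search directory are placeholders.
sudo zpool import -d /dev/disk/by-id <old-pool-name> default2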

I can also see all my old containers and VMs when I run sudo zfs list:

...
default2/virtual-machines                                                                       53.3G   120G    24K  legacy
default2/virtual-machines/community-public-do-vm                                                7.56M  92.4M  7.56M  legacy
default2/virtual-machines/community-public-do-vm.block                                          14.3G   120G  14.5G  -
default2/virtual-machines/extrastatic-runner-01                                                 7.67M  92.3M  7.67M  legacy
default2/virtual-machines/extrastatic-runner-01.block                                           3.42G   120G  4.04G  -
default2/virtual-machines/github-runner                                                         7.67M  92.3M  7.68M  legacy
default2/virtual-machines/github-runner.block                                                   1.79G   120G  2.66G  -
default2/virtual-machines/jaanee-content                                                        7.67M  92.3M  7.67M  legacy
default2/virtual-machines/jaanee-content.block                                                  5.26G   120G  6.11G  -
default2/virtual-machines/lapis-testing                                                         7.67M  92.3M  7.67M  legacy
default2/virtual-machines/lapis-testing.block                                                    786M   120G  1.63G  -
default2/virtual-machines/reservation-build-system                                              7.67M  92.3M  7.67M  legacy
default2/virtual-machines/reservation-build-system.block                                        4.79G   120G  5.52G  -
default2/virtual-machines/storage                                                               7.67M  92.3M  7.67M  legacy
default2/virtual-machines/storage.block                                                          703M   120G  1.36G  -
...

And I can mount one of the VM datasets using sudo mount -t zfs default2/virtual-machines/community-public-do-vm /media/zfsmounts, and I can see files inside it:

sudo ls /media/zfsmounts/
agent-client.crt  agent-client.key  agent.crt  agent.key  backup.yaml  config  metadata.yaml  OVMF_VARS.4MB.ms.fd  qemu.nvram  templates

But I don’t know the steps to restore that VM from the backup drive.

I tried to start it with these commands:

$ sudo incus start   community-public-do-vm
Error: Failed to fetch instance "community-public-do-vm" in project "default": Instance not found
$ sudo incus start --project default2  community-public-do-vm
Error: Project not found

I assumed --project was not the correct switch, but I don’t know how else to tell incus that it should look for that VM inside the default2 zpool.
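
My understanding (possibly wrong) is that incus only considers the storage pools registered in its own database, so the recovered zpool is simply invisible to it. Something like the following should show what incus currently knows about:

# List the storage pools registered with incus (the recovered default2
# zpool presumably will not be listed here yet).
sudo incus storage list

# List the instances incus knows about in the default project.
sudo incus list --project default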

Also, do I need to worry that the backup.yaml file for the VM has “default” specified rather than “default2” since I imported the zpool with a different name?

...
  devices:
    root:
      path: /
      pool: default
...

Thank you in advance for your help.

incus admin recover is the tool for this kind of thing. It lets you define a lost storage pool (if any), then scans your existing and newly defined pools for instances that are missing from the database.

That looks promising. But it errors out with this message: Failed checking volumes on pool "default2": Instance "github-runner" in project "default" has pool name mismatch in its backup file ("default" doesn't match's pool's "default2").

I believe this happens because the pool name in those backup files is default, which is the name lxc created for its own pool, not the name I used when importing. I am not using the existing default pool at all; should I delete it and then rename default2 to default? Or is there another way to import?
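
If renaming is the right approach, I assume it would go roughly like this, though I have not tried it and it assumes nothing still needs the existing default pool:

# Untested sketch: free up the name "default", then re-import the
# recovered pool under it.  zpool destroy is irreversible, so only do
# this if the existing "default" pool really holds nothing of value.
sudo zpool destroy default
sudo zpool export default2
sudo zpool import default2 default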

$ incus admin recover
This server currently has the following storage pools:
 - incus-default (backend="zfs", source="/var/lib/incus/disks/incus-default.img")
Would you like to recover another storage pool? (yes/no) [default=no]: yes
Name of the storage pool: default2
Name of the storage backend (lvm, lvmcluster, zfs, dir): zfs
Source of the storage pool (block device, volume group, dataset, path, ... as applicable): default2
Additional storage pool configuration property (KEY=VALUE, empty when done): 
Would you like to recover another storage pool? (yes/no) [default=no]: 
The recovery process will be scanning the following storage pools:
 - EXISTING: "incus-default" (backend="zfs", source="/var/lib/incus/disks/incus-default.img")
 - NEW: "default2" (backend="zfs", source="default2")
Would you like to continue with scanning for lost volumes? (yes/no) [default=yes]: yes
Scanning for unknown volumes...
Error: Failed validation request: Failed checking volumes on pool "default2": Instance "github-runner" in project "default" has pool name mismatch in its backup file ("default" doesn't match's pool's "default2")

Seems like this is the solution.

But I’m editing these backup.yaml files, and there are projects named default as well as storage pools named default, so it is tricky to be sure I’m editing the right thing. It is a tedious process: mount each instance’s dataset, change file permissions, edit the file by hand, then attempt the recovery again (and retype all the details about the pool). I can’t really tell whether I’m making progress.

For example, inside the backup.yaml file I see this:

pool:
  config:
    size: 200GB
    source: /var/snap/lxd/common/lxd/disks/default.img
    volume.size: 400GB
    zfs.pool_name: default2
  description: ""
  name: default
  driver: zfs
  used_by: []
  status: Created
  locations:
  - none

Do I need to change only the zfs.pool_name to default2, or should the name be changed as well? They are both under the pool key, and it is unclear to me which one matters.
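
If it turns out to be the name and device entries, and if I were scripting the edits instead of doing them by hand, I imagine something roughly like this; it is untested, and the sed patterns are only guesses based on the excerpts above:

# Untested sketch: mount each VM's small config dataset (the non-.block
# ones), patch the pool references in backup.yaml, and unmount again.
# A similar loop would be needed for default2/containers.
for ds in $(sudo zfs list -H -o name -r default2/virtual-machines | grep -v '\.block$'); do
    sudo mount -t zfs "$ds" /media/zfsmounts
    if [ -f /media/zfsmounts/backup.yaml ]; then
        # "default" also appears as a project name in these files, so
        # review each file after editing instead of trusting the regexes.
        sudo sed -i -e 's/^  name: default$/  name: default2/' \
                    -e 's/^      pool: default$/      pool: default2/' \
                    /media/zfsmounts/backup.yaml
    fi
    sudo umount /media/zfsmounts
done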

I edited all the files and the recovery got past that error. I’m able to use my containers and VMs again. Thanks!