[LXD] New disaster recovery tool

Project LXD
Status Implemented
Author(s) @tomp
Approver(s) @stgraber
Release LXD 4.17
Internal ID LX001

Abstract

Simplify the recovery of a LXD installation when the database has been removed but the storage pool(s) still exist, with a new interactive lxd recover tool that will assist with accessing the storage pools and attempt to recreate the database records for the instances and custom volumes present.

The goal of lxd recover is disaster recovery. It is not intended to be a substitute for taking backups of the LXD database and its files. It will not be able to recover all the contents of a LXD database (such as profiles, networks and images). Things will be missing and the user will need to manually reconfigure them.

What it does need to handle are the bits which cannot be re-created/re-configured: instances, instance snapshots, custom volumes and custom volume snapshots.

Rationale

Currently the lxd import command provides the ability to recover instances that are stored on storage pools after the LXD database has been removed. However it requires the storage pools to be mounted (inside the LXD snap’s mount namespace) by the user before running the tool. This is non-trivial and requires the user to understand the intricacies of both snap packaging mount namespaces and LXD’s mount path layouts. This is a bad user experience at the best of times, and during a disaster recovery situation it is even more important for the user to be able to recover their LXD installation quickly and easily.

Specification

Design

Instances store a copy of their DB configuration (including a copy of the storage pool and volume config) in a file called backup.yaml in their storage volume. This can be used as the basis of recreating the missing DB records during a disaster recovery scenario.
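
As an illustration, here is a minimal Go sketch of reading the pool section out of an instance's backup.yaml once its volume is mounted. The file path and the field names in backupConfig are assumptions for illustration, not LXD's exact backup format.

package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v2"
)

// backupConfig mirrors the kind of information assumed to be stored in
// backup.yaml: the instance record plus the pool and volume config it was
// created with. Field names here are illustrative.
type backupConfig struct {
	Container map[string]interface{} `yaml:"container"`
	Pool      map[string]interface{} `yaml:"pool"`
	Volume    map[string]interface{} `yaml:"volume"`
}

func main() {
	// Assumed snap-packaged path to a mounted instance volume.
	path := "/var/snap/lxd/common/lxd/storage-pools/default/containers/c1/backup.yaml"

	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("volume not mounted or backup.yaml missing:", err)
		return
	}

	var cfg backupConfig
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		fmt.Println("failed to parse backup.yaml:", err)
		return
	}

	// The pool section is what allows the pool DB record to be recreated
	// when at least one instance is present on the pool.
	fmt.Printf("pool config: %v\n", cfg.Pool)
}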

However, some storage pool types do not allow access to these backup.yaml files without first activating the pool itself and mounting the per-instance storage volume.

This creates something of a chicken/egg problem, as in order to access the backup.yaml file to restore the DB records we need the info that was in the DB previously.

To overcome this, the recovery tool will need to ask the user questions about the storage pool(s) they want to recover, in order to ascertain the minimum amount of info needed to activate the storage pool and mount the storage volume(s).

This design would introduce a new lxd recover command that would provide an interactive CLI (similar to lxd init) that would use both the new /internal/recover API endpoints (for pool and volume discovery and recovery) and the existing public API endpoints (to validate whether there are any existing or conflicting entities).

First it will list the existing pools in the DB (if any). Then it will ask the user to add any additional pools to recover.

For each it will ask the user the following questions:

  1. Name of storage pool to recover.
  2. Driver type of storage pool to recover (dir, zfs etc).
  3. Source of the storage pool.

At this point the tool will check whether the storage pool already exists in the DB and that it matches the specified driver type.

  • If it does, then it will use the existing DB record config to access the storage pool and will not require asking the user additional questions.
  • If the storage pool exists in the DB already but the driver types do not match then an error will be displayed and recovery cannot continue.
  • If the storage pool doesn’t exist in the DB then the user will be asked additional questions:
    1. Source of storage pool to recover (this should match what was in the source property of the original storage pool DB record).
    2. Depending on the driver type there will also need to be per-driver questions (to be defined). These will be asked in a loop, with each question/answer step providing one option value until the user is done, e.g. “Additional storage pool configuration property (KEY=VALUE, empty to skip)” (see the sketch after this list).
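
For illustration, a small Go sketch of such a question loop, collecting KEY=VALUE pairs until the user submits an empty answer. The prompt text is taken from the example above; the parsing itself is an assumption.

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	config := map[string]string{}
	scanner := bufio.NewScanner(os.Stdin)

	for {
		fmt.Print("Additional storage pool configuration property (KEY=VALUE, empty to skip): ")
		if !scanner.Scan() {
			break
		}

		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			break // Empty answer ends the loop.
		}

		key, value, found := strings.Cut(line, "=")
		if !found {
			fmt.Println("Please use KEY=VALUE format.")
			continue
		}
		config[key] = value
	}

	fmt.Println(config) // e.g. map[btrfs.mount_options:noatime]
}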

Once these questions have been answered and all storage pools have been defined, the tool will make a call to the /internal/recover/validate endpoint with a POST body containing the requested pool info. LXD will proceed to load the relevant storage pool drivers and attempt to mount the requested storage pools (without creating any new DB records).

If this succeeds, the storage pools will be inspected for their volumes and these will be compared to the current DB records. The following validation checks will then be run for each instance found on the storage pool that is not already in the DB (in this way lxd recover can be run on an existing, partially imported storage pool):

  1. Check for entities referenced in its config file that don’t exist (e.g. networks, profiles, projects) - if any are found then an error will be shown and the user will be expected to fix this manually using the existing lxc commands. The user will be asked if they want to retry the validation once they have fixed the problem (to avoid them having to enter all the information again). A sketch of this check appears after this list.
  2. Check that a conflicting record doesn’t exist on a different storage pool - if it does then an error will be shown, and it may be that the user decides to manually delete the conflicting instance or remove the volume from the storage pool that is being recovered.
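
A hypothetical sketch of the dependency check from step 1, with stubbed-out DB lookups in place of LXD's real internal queries:

package main

import "fmt"

// Stub DB lookups standing in for real database queries.
func projectExists(name string) bool          { return name == "default" }
func profileExists(project, name string) bool { return name == "default" }
func networkExists(project, name string) bool { return name == "lxdbr0" }

// checkDependencies returns human-readable errors for any entity referenced
// by a recovered instance's config that is missing from the database.
func checkDependencies(project string, profiles, networks []string) []string {
	var missing []string

	if !projectExists(project) {
		missing = append(missing, fmt.Sprintf("Project %q", project))
	}

	for _, p := range profiles {
		if !profileExists(project, p) {
			missing = append(missing, fmt.Sprintf("Profile %q in project %q", p, project))
		}
	}

	for _, n := range networks {
		if !networkExists(project, n) {
			missing = append(missing, fmt.Sprintf("Network %q in project %q", n, project))
		}
	}

	return missing // Surfaced to the user, who fixes them and retries the scan.
}

func main() {
	fmt.Println(checkDependencies("default", []string{"default", "bar"}, []string{"lxdbr1"}))
	// Output: [Profile "bar" in project "default" Network "lxdbr1" in project "default"]
}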

Once these checks succeed, then a list of instances and volumes to be recovered from all pools will be displayed. At this point no modifications have been made to the database.

The user will be asked if they wish to proceed with the import.

If the user wishes to proceed, a request is made to the /internal/recover/import endpoint with info about each pool, instance and custom volume DB record that will be recreated.

If the recovery fails, the DB records created for that pool will be rolled back.

API changes

As this is an internal command with associated internal API routes, there will be no public API changes.

There will be some additional internal API endpoints:

Validate

POST /internal/recover/validate
LXD iterates through all the existing and new pools, looks for unknown volumes and reports any missing dependencies that would prevent importing those volumes.

// RecoverValidatePost is used to initiate a recovery validation scan.
type RecoverValidatePost struct {
	Pools []StoragePoolsPost // Pools to scan (minimum population is Name for existing pool and Name, Driver and Config["source"] for unknown pools).
}

// RecoverValidateVolume provides info about a missing volume that the recovery validation scan found.
type RecoverValidateVolume struct {
	Name          string // Name of volume.
	Type          string // Same as Type from StorageVolumesPost (container, custom or virtual-machine).
	SnapshotCount int    // Count of snapshots found for volume.
	Project       string // Project the volume belongs to.
}

// RecoverValidateResult returns the result of the validation scan.
type RecoverValidateResult struct {
	UnknownVolumes   []RecoverValidateVolume // Volumes that could be imported.
	DependencyErrors []string                 // Errors that are preventing import from proceeding.
}

The request will only fail if a pool can’t be found at all; in all other cases it returns a RecoverValidateResult which contains all that was found and all that’s missing (dependencies).
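
For illustration, a hypothetical client-side call to the validate endpoint over the local unix socket using plain net/http (the lxd recover tool itself would go through the LXD client library). The snap socket path and the lower-case JSON field names are assumptions here.

package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
)

func main() {
	// Dial the local LXD daemon over its unix socket (snap path assumed).
	client := &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", "/var/snap/lxd/common/lxd/unix.socket")
			},
		},
	}

	// Minimum fields for an unknown pool: name, driver and config["source"].
	body, _ := json.Marshal(map[string]any{
		"pools": []map[string]any{{
			"name":   "foo",
			"driver": "btrfs",
			"config": map[string]string{"source": "/dev/sdb"},
		}},
	})

	resp, err := client.Post("http://unix/internal/recover/validate", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	// Expect the validation result wrapped in the standard LXD response envelope.
	var result map[string]any
	json.NewDecoder(resp.Body).Decode(&result)
	fmt.Println(result)
}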

Import

POST /internal/recover/import
LXD will proceed with the recovery and have all unknown volumes imported into the database.

// RecoverImportPost initiates the import of missing storage pools, instances and volumes.
type RecoverImportPost struct {
	Pools []StoragePoolsPost // Pools to scan (minimum population is Name for existing pool and Name, Driver and Config["source"] for unknown pools).
}

The request will fail if any dependency is still missing (which shouldn’t be possible at this point); otherwise it will process the import, failing if any volume fails to import for some reason. The Pools property will be used as follows (see the sketch after this list):

  1. Check if the storage pool already exists in the database, and doesn’t need to be created.
  2. If the storage pool doesn’t exist, then it will be mounted from the info provided.
  3. If there are no instances to be imported then the config supplied will be used to create a new storage pool DB record.
  4. Otherwise LXD will prefer to recreate the storage pool DB record from the backup.yaml file stored with one of the instances being recovered.
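
A sketch of that per-pool decision flow, using placeholder helpers and a placeholder Pool type in place of LXD's internal DB and storage driver calls:

package main

import "fmt"

type Pool struct {
	Name   string
	Driver string
	Config map[string]string
}

// Stubs standing in for DB and storage-driver operations.
func poolExistsInDB(name string) bool                { return name == "local" }
func mountPool(p Pool) error                         { return nil }
func createPoolFromConfig(p Pool) error              { fmt.Println("pool record from user config"); return nil }
func createPoolFromBackupYAML(instance string) error { fmt.Println("pool record from backup.yaml"); return nil }

func recoverPool(p Pool, unknownInstances []string) error {
	// 1. Existing pool record: nothing to create.
	if poolExistsInDB(p.Name) {
		return nil
	}

	// 2. Unknown pool: mount it from the info the user provided.
	if err := mountPool(p); err != nil {
		return err
	}

	// 3. No instances found: fall back to the supplied config.
	if len(unknownInstances) == 0 {
		return createPoolFromConfig(p)
	}

	// 4. Prefer the original config stored in an instance's backup.yaml.
	return createPoolFromBackupYAML(unknownInstances[0])
}

func main() {
	p := Pool{Name: "foo", Driver: "btrfs", Config: map[string]string{"source": "/dev/sdb"}}
	fmt.Println(recoverPool(p, []string{"a1", "a2"}))
}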

CLI changes

A new interactive lxd recover command will be added with an example user experience below:

This LXD server currently has the following storage pools:
 - "local" (backend="zfs", source="castiana/lxd")
 - "remote" (backend="ceph", source="ceph")
Would you like to recover another storage pool? (yes/no) [default=no]: **yes**
Name of the storage pool: **foo**
Name of the storage backend (btrfs, dir, lvm, zfs, ceph, cephfs): **btrfs**
Source of the storage pool (block device, volume group, dataset, path, ... as applicable): **/dev/sdb**
Additional storage pool configuration property (KEY=VALUE, empty to skip): **btrfs.mount_options=noatime**
Additional storage pool configuration property (KEY=VALUE, empty to skip): ****
Would you like to recover another storage pool? (yes/no) [default=no]: **no**
The recovery process will be scanning the following storage pools:
 - EXISTING: "local" (backend="zfs", source="castiana/lxd")
 - EXISTING: "remote" (backend="ceph", source="ceph")
 - NEW: "foo" (backend="btrfs", source="/dev/sdb")
Would you like to continue with scanning for lost volumes? (yes/no) [default=yes]: **yes**
The following unknown volumes have been found:
 - Instance "bar" on pool "local" in project "blah"
 - Volume "blah" on pool "remote" in project "demo"
 - Instance "a1" on pool "foo" in project "default"
 - Instance "a2" on pool "foo" in project "default" (includes 3 snapshots)
 - Instance "a3" on pool "foo" in project "default"
 - Volume "vol1" on pool "foo" in project "blah" (includes 2 snapshots)
 - Volume "vol2" on pool "foo" in project "blah"
 - Volume "vol3" on pool "foo" in project "blah"
You are currently missing the following:
 - Network "lxdbr1" in project "default"
 - Project "demo"
 - Profile "bar" in project "blah"
Please create those missing entries and then hit ENTER:
You are currently missing the following:
 - Profile "bar" in project "blah"
Please create those missing entries and then hit ENTER:
The following unknown volumes have been found:
 - Instance "bar" on pool "local" in project "blah"
 - Volume "blah" on pool "remote" in project "demo"
 - Instance "a1" on pool "foo" in project "default"
 - Instance "a2" on pool "foo" in project "default" (includes 3 snapshots)
 - Instance "a3" on pool "foo" in project "default"
 - Volume "vol1" on pool "foo" in project "blah" (includes 2 snapshots)
 - Volume "vol2" on pool "foo" in project "blah"
 - Volume "vol3" on pool "foo" in project "blah"
Would you like those to be recovered? (yes/no) [default=no]: **yes**
All unknown volumes have now been recovered!

Database changes

No database changes are required.

Upgrade handling

The plan is to remove the lxd import functionality; running lxd import will instead return an error instructing the user to use lxd recover. The documentation will also be updated to reference lxd recover.

Further information and considerations

Unlike instance volumes, custom volumes do not have their DB configuration written to a backup.yaml file. This means we have to be able to derive all information required to recreate their DB records using just the supplied pool configuration and the name of the custom volume on the storage pool.

An issue exists due to the way we encode the project and LXD custom volume name into the underlying storage pool volume name using the underscore as a delimiter. The issue is that, unlike instance names (which must be valid hostnames), both projects and custom volume names are currently allowed to contain underscores.

This means it is impossible to ascertain where the project name ends and the custom volume name starts.

The problem of reversing custom storage volume names back into database records can be exemplified with the ZFS driver. Currently, creating a project called test_test and then creating a custom storage volume called test_test on a ZFS pool inside that project results in a ZFS volume called:

zfs/custom/test_test_test_test

However without having the database record available, it is impossible to ascertain where the project name ends and the custom volume name starts. The project could equally be called test with a volume called test_test_test.

To work around this, whilst trying to support as many existing custom volumes as possible, we will take the following steps:

  • If the custom storage volume name only has 1 underscore in it we can know that the part before the underscore is the project name and the part after is the LXD volume name (because custom volumes always have their project prefixed).
  • If there are >1 underscores in the custom storage volume name we cannot know if we are splitting it correctly, so a warning will be displayed and the recovery of the volume will be skipped.

In order to prevent the 2nd scenario in the future we will prevent the use of underscores in new project names.
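
A small Go sketch of the splitting rule above; splitPoolVolumeName is an illustrative helper, not an existing LXD function:

package main

import (
	"fmt"
	"strings"
)

// splitPoolVolumeName reverses "project_volume" back into its parts.
// With exactly one underscore the split is unambiguous; with more than
// one it is not, so the volume must be skipped with a warning.
func splitPoolVolumeName(poolVolName string) (project string, volume string, ok bool) {
	parts := strings.Split(poolVolName, "_")
	if len(parts) != 2 {
		return "", "", false // Ambiguous (or no project prefix at all).
	}
	return parts[0], parts[1], true
}

func main() {
	for _, name := range []string{"default_vol1", "test_test_test_test"} {
		if project, volume, ok := splitPoolVolumeName(name); ok {
			fmt.Printf("%q -> project %q, volume %q\n", name, project, volume)
		} else {
			fmt.Printf("%q -> ambiguous, skipping with a warning\n", name)
		}
	}
}

With this rule, the test_test_test_test name from the ZFS example above is reported as ambiguous and skipped, while a single-underscore name splits cleanly into project and volume.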

Would it not be simpler to keep a small catalog of instances (e.g. recovery.xml) on each storage pool? Then, when adding a storage pool to LXD, you can specify an option to import/discover existing instances.

$ lxd init
Create a new ZFS pool? (yes/no) [default=yes]: no
Name of the existing ZFS pool or dataset: lxdpool
Existing instances have been found, do you want to import ? (yes/no) [default=yes]: no

So rather than a recovery tool, it becomes a discovery tool when adding a storage pool. The only time recovery.xml is modified is when an instance is created or deleted (or if data is modified that is required by recovery, e.g. ports or static IP addresses).

We have discussed potentially storing custom volume metadata on a per-storage-pool config volume (remember that not all storage pool types can actually ‘store’ files without creating a volume).

In terms of integrating this into lxd init rather than lxd recover, @stgraber what is the current expectation around ordering with the lxd import command? Is it expected that lxd init be run first (to allow recreation of non-storage settings like networking) and then lxd import run afterwards, or is it the other way round?

There also needs to be a consideration around clustered filesystems (like ceph and cephfs) where we only need to recreate the DB records once (although again each volume can have per-cluster-member config so not strictly true), but we do need to re-create the directories on each cluster member (where the volumes will be mounted).

If you can make a recovery volume, which is hidden, that should solve the problem about storing files across multiple storage pool types.

Surely any recovery would mean attaching the existing storage pool to LXD at some point; that is why I suggested adding it to lxd init and anywhere else that a storage pool can be added (I use the API, but I guess there is a command somewhere).

This way the discovery/recovery happens when adding an existing storage pool. This should cover reinstalling LXD or the host operating system, as well as recovering the storage pool from backups, I guess.

Yeah, I see what you’re saying: we could detect the existence of a recovery config volume when doing lxc storage create (or the equivalent in lxd init) and restore the instances and custom volumes then.

@stgraber any thoughts? I guess it still boils down to the expected workflow around lxd recover vs lxd init.

I am just thinking that if the end goal is to recreate the database from an existing pool so that recovery can be carried out, the recovery tool does not need to mount any storage pools; it should only run on storage pools that users have already added. This removes a lot of the complexity, and a dry-run option could be run by lxd init (and the lxc command to add a storage pool) to detect if there are any instances on the pool.

If you skip the whole mounting thing, and only work on already mounted pools, the process could be as simple as this.

$ lxc recover default
- No database entry for `apache`, would you like to add? [y/n]

Listing the items that are not in the database.

We will definitely need to mount the storage pools; that’s the whole point of the project, to avoid needing the user to do it for us like today with lxd import :grinning:

Maybe you are misunderstanding me or I am misunderstanding you. Your original idea talks about asking the user for storage pool details, types etc., in order to mount the pool temporarily to recover the database.

I am suggesting that the tool runs on existing pools that are available from lxc storage list. I am assuming that if I deleted my OS, then when I reinstall the OS and add the ZFS partition, I run lxc storage recover default, which then scans the storage and adds missing database entries. And the same if the user is recovering another storage pool: they add it with lxc storage create if it's not default, and the recover tool scans that. My understanding is you want the tool to mount it to find the info; I am suggesting that the tool runs on the added storage pools. Not sure I am explaining this properly?

Yes, perhaps it could be broken into two tools: the first being a tool to recreate the storage pools in the database from config files in an existing storage pool (via lxc storage create), and then lxd recover would just scan the storage pools in the DB afterwards, mounting each instance volume it finds to restore the instance DB records.

Alright, going to try to answer some of the points above :slight_smile:

So, the goal absolutely isn’t to recreate all database records. We don’t want to end up duplicating all our DB records to text files. Users should be backing up the database in the first place. This is disaster recovery and will (and to an extent should) come with consequences. Just like the “lxd cluster” commands to recover a cluster, this is meant to be a last resort.

I don’t like the idea of relying on a recovery volume on the storage pool because it’s getting dangerously close to duplicating the database and also isn’t actually going to help with our most common recovery case. Currently the most common recovery case we’ve seen is someone manually restoring a pool (through zfs/btrfs send/receive or the like), either on a completely new system or on another system. So we can’t fully rely on all the volumes having actually belonged to the pool at the same time or the pool containing all the volumes it used to.

I also wouldn't want to integrate disaster recovery features into "lxd init". "lxd init" uses our stable public REST API and isn't supposed to have low level recovery access to data. We also can't actually make things as interactive as suggested above, as all "lxd init" actually does is build up a preseed configuration file and then apply it in one go at the end. "lxd init" has no idea if a storage pool name is valid, if it already exists, … LXD only tells it that when it tries to create it.

I also don’t think we should allow re-using dirty storage pools at creation time (“lxd init” or “lxc storage create”) as if everything isn’t fully in line with LXD’s expectations (which can vary based on versions), we may find ourselves causing data loss down the line or just odd behaviors. This is definitely a risk when performing data recovery through lxd recover/import but that’s something we can clearly state in the disaster recovery docs. Suddenly making it easy for folks to cause similar potential damage through the production API seems problematic to me.

Now let me try to clarify what it is I think we want to see covered here.

  1. The goal of “lxd recover” is disaster recovery. Things will be missing and the user will need to manually reconfigure things. What we need to handle are the bits which cannot be re-created/re-configured. That’s instances, instance snapshots, custom volumes and custom volume snapshots. We won’t try to deal with images as there is no way to re-generate a tarball with the correct fingerprint solely from the data in a storage pool.
  2. Only “lxd recover” should be allowed to create “dirty” storage pools, that is, storage pools which aren’t empty at the time LXD loads them.
  3. In a complete recovery situation where the entire DB is gone, the user must first run “lxd recover”. After recovering their storage volumes, they can then either use “lxd init”, skipping the storage part or directly use “lxc config”, “lxc profile” and “lxc network” to setup the rest of the system to their liking.
  4. In a partial recovery situation where the storage pools are defined but some data is missing, they can just run “lxd recover” again, not define any new pool through it and will be presented with anything that’s found on disk but is missing in the DB which they can then import.

There are a few places where chicken-and-egg type situations can happen during “lxd recover”, so I think we’ll look at something like:

  • List current storage pools
  • Allow entering additional storage pools
  • Access those storage pools temporarily and pull a list of everything that can be recovered
  • Validate everything in that list and build up a list of missing dependencies
  • If anything that we depend on is missing, tell the user and fail the recovery
  • If all dependencies are satisfied, proceed with the recovery

Dependencies here are things like projects, profiles, networks, … which we may need in order to re-create the database records successfully but which aren’t themselves defined in the data we’re recovering. The user will need to manually re-create those bits and then run the recovery again.

If there’s something present which they don’t want to recover, they’ll have to manually go delete the offending dataset/directory/volume/… as I don’t think we want our recovery API to have the ability to perform such destructive actions on unknown volumes.

So if the user gets a list of what can be recovered from a temporarily mounted pool, then what? Are you talking about copying data over to an existing mounted storage pool (I think that would be a bad idea, because in this situation it would likely be all instances, not just one, and then you run into disk space problems), or would the user need to add the storage pool to the storage pool list so that they could recover? If that is the case, then it makes no sense. Surely any recovery would be to use an existing pool as opposed to transferring data from it.

We’re just getting a list of things in the pools that we can recover, then we check for missing dependencies and finally we show the list to the user for confirmation.

Assuming there’s no missing dependency and the user tells us to go ahead with what we found, then things get imported into the database.

Yep makes sense, thanks for the detail. I’ve been updating the design above today based on your notes and thinking around the process and API endpoints needed.

RE the issue around reversing the custom volume names to ascertain the project and LXD volume name (because both can have underscores in them, and the underscore is used as the delimiter between project and LXD volume name):

  1. If the custom storage volume name only has 1 underscore in it we can know that the part before the underscore is the project name and the part after is the LXD volume name (because custom volumes always have their project prefixed).
  2. The problem presents itself if there are >1 underscores in the custom storage volume name, in which case we cannot know if we are splitting it correctly.

In order to prevent scenario 2 for new volumes, we could prevent the use of underscores in either the project name or the custom volume name. The latter option would align with instances (which already don’t allow underscores), meaning that we could continue to allow underscores in project names.

For existing custom volumes that have >1 underscores in them we could skip over them when recovering them (with a warning).

I think restricting the project name will be less problematic as far fewer people are using projects than custom volumes. And from personal experience with those, I feel I am a lot more likely to have used an underscore in a volume name than in a project name.

As we’ve been using the underscore as delimiter for a lot of internal objects, I also don’t think putting the restriction on storage volumes would be quite enough. If we were going that direction, we’d also need to add it to networks and potentially any further object we decide to tie to projects.

Adding a naming restriction to projects seems like the easiest and least disruptive approach here.

Sounds good! Will add those 2 points to the draft.

Rough draft of the UX:

This LXD server currently has the following storage pools:
 - "local" (backend="zfs", source="castiana/lxd")
 - "remote" (backend="ceph", source="ceph")
Would you like to recover another storage pool? (yes/no) [default=no]: **yes**
Name of the storage pool: **foo**
Name of the storage backend (btrfs, dir, lvm, zfs, ceph, cephfs): **btrfs**
Source of the storage pool (block device, volume group, dataset, path, ... as applicable): **/dev/sdb**
Additional storage pool configuration property (KEY=VALUE, empty to skip): **btrfs.mount_options=noatime**
Additional storage pool configuration property (KEY=VALUE, empty to skip): ****
Would you like to recover another storage pool? (yes/no) [default=no]: **no**
The recovery process will be scanning the following storage pools:
 - EXISTING: "local" (backend="zfs", source="castiana/lxd")
 - EXISTING: "remote" (backend="ceph", source="ceph")
 - NEW: "foo" (backend="btrfs", source="/dev/sdb")
Would you like to continue with scanning for lost volumes? (yes/no) [default=yes]: **yes**
The following unknown volumes have been found:
 - Instance "bar" on pool "local" in project "blah"
 - Volume "blah" on pool "remote" in project "demo"
 - Instance "a1" on pool "foo" in project "default"
 - Instance "a2" on pool "foo" in project "default" (includes 3 snapshots)
 - Instance "a3" on pool "foo" in project "default"
 - Volume "vol1" on pool "foo" in project "blah" (includes 2 snapshots)
 - Volume "vol2" on pool "foo" in project "blah"
 - Volume "vol3" on pool "foo" in project "blah"
You are currently missing the following:
 - Network "lxdbr1" in project "default"
 - Project "demo"
 - Profile "bar" in project "blah"
Please create those missing entries and then hit ENTER:
You are currently missing the following:
 - Profile "bar" in project "blah"
Please create those missing entries and then hit ENTER:
The following unknown volumes have been found:
 - Instance "bar" on pool "local" in project "blah"
 - Volume "blah" on pool "remote" in project "demo"
 - Instance "a1" on pool "foo" in project "default"
 - Instance "a2" on pool "foo" in project "default" (includes 3 snapshots)
 - Instance "a3" on pool "foo" in project "default"
 - Volume "vol1" on pool "foo" in project "blah" (includes 2 snapshots)
 - Volume "vol2" on pool "foo" in project "blah"
 - Volume "vol3" on pool "foo" in project "blah"
Would you like those to be recovered? (yes/no) [default=no]: **yes**
All unknown volumes have now been recovered!

This is not the most complex case as someone could define more than one additional storage pool, but it is otherwise pretty much a worst case scenario where we have found a whole bunch of things across both existing and new pools, are missing a bunch of dependencies and have a user who’s not solved all the dependencies on the first pass.

I’d expect the bulk of users to have a shorter user experience, either because they don’t need to define a pool or because they don’t have quite as many missing things to resolve.

As far as API, the tool would pretty much just do:

  • GET /1.0/storage-pools to get the existing pools at the beginning
  • POST /internal/recover/validate to have LXD iterate through all the existing and new pools, look for unknown volumes and report any missing dependencies
  • POST /internal/recover/import to proceed with the recovery and have all unknown volumes imported into the database

I suspect both /validate and /import will take a similar struct which would primarily contain a list of storage pool definitions (for all the new ones). Validate would only fail if a pool can’t be found at all, in all other cases it would return a struct which contains all that was found and all that’s missing (dependencies). Import would fail if any dependency is still missing (shouldn’t be possible) and then will process the import, failing if something fails to import for some reason.

Thanks, I’ve updated the spec with those now and will think about the specifics of the POST struct for the doc.