[LXD] New disaster recovery tool

PR to prevent underscores in new project names:

@stgraber the existing lxd import tool has a client-side restriction to only allow running it as the root user. Should this also be maintained for the new lxd recover tool?

No, this can be dropped. I’m not even completely sure why we had it for lxd import in the first place (unless it was actually performing direct disk access at some point?).

@stgraber cool, thanks. IIRC, internalImport() was writing an “importing” file to the instance’s root volume to prevent it being deleted on failure. That might be the reason.

The only other thing I can think of (that may still be relevant) is the probing of supported storage pool drivers using storageDrivers.SupportedDrivers(), which I am using to populate the possible storage driver option question, and which lxd import used too. See

Ah yeah, for the supported drivers thing, I wonder if we shouldn’t just expose that as a comma separated list in /1.0 so both lxd init and lxd recover can just fetch it over the API and avoid direct probing. What do you think?

I’m happy to expose it over an API endpoint, the function itself comes with a warning “This can take a long time if a driver is not supported.” so perhaps /1.0 isn’t the most appropriate place for it? Although the slow parts should only occur on first call of that function (as the storage driver should then cache the features/version supported).

Yeah, I definitely don’t want to hit a slow path with each /1.0 call, but having it done on daemon startup with the result kept in memory would be fine and in line with other things we expose through /1.0.
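A minimal Go sketch of that startup-caching idea, just to illustrate the shape of it. The names here (probeDrivers, SupportedDrivers) are hypothetical stand-ins, not LXD’s actual code; the real probe inspects the host for each driver’s tools and kernel support:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// probeDrivers is a stand-in for storageDrivers.SupportedDrivers(),
// which can be slow on first call while each driver checks for its
// tools and kernel support. Here it just returns a fixed list.
func probeDrivers() []string {
	return []string{"btrfs", "dir", "lvm", "zfs"}
}

var (
	driversOnce   sync.Once
	cachedDrivers string
)

// SupportedDrivers returns the comma-separated driver list, probing
// only once (e.g. at daemon startup) so that later /1.0 requests
// never hit the slow path.
func SupportedDrivers() string {
	driversOnce.Do(func() {
		cachedDrivers = strings.Join(probeDrivers(), ",")
	})
	return cachedDrivers
}

func main() {
	fmt.Println(SupportedDrivers())
}
```

With this shape, lxd init and lxd recover could both read the cached list over the API rather than probing locally.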


I think we have a var that contains that already that we can just expose indeed.

That would be good too as it would remove the (for me, annoying) pause during lxd init as it probes for available storage engines. This way it will have been done already when LXD starts up.

Yeah, that pause is a bit annoying :wink:

The PR that implements returning available storage driver info in the API is here:

Well now I’m “salty”: Feature Request: API Returning Supported Storage Drivers · Issue #5955 · lxc/lxd · GitHub :wink:

Handy addition though!


Ah yeah, it’s a bit different from your original ask, as it’s not listing what the API supports so much as what the system supports, but for you that’s probably equivalent in this case :slight_smile:

For us, it will allow lxd recover and lxd init to build up a list of local and remote storage drivers without having to use the internal logic they currently rely on.

Something occurred to me today: a storage pool (such as LVM) can have volumes on it that use a different filesystem than the default, that is, the pool’s default, or the LXD default if the pool DB record is missing as well.

In these cases we are in a chicken-and-egg situation: we cannot mount the instance volume without knowing which filesystem it uses, yet we need to mount it in order to read the backup.yaml file (whose volume config records the filesystem to use).

And for custom volumes it’s worse, because there is no config file at all.

In cases like this, I can see recovery being blocked for all volumes, because the mount failure for the first such volume would fail the entire recovery process.

Perhaps we should skip over volumes that cannot be mounted and present a list of volumes that are considered unknown but cannot be recovered, and then allow the ones that can be recovered to proceed?

Or we could have a function which guesses the filesystem :slight_smile:
We only really support 3 filesystems (ext4, xfs or btrfs), so even if we had to iterate until one of them mounts properly, it wouldn’t be too bad. But we should also be able to just use auto and let the kernel superblock parser figure it out then just record whatever ended up being used.


Fair enough, I’ll look at integrating that into the relevant storage drivers’ mount logic.

This is my approach to providing the option to probe the filesystem type when mounting a volume without the original DB records available:


The PRs for this feature are here: