How to properly backup/restore on incus-controlled zfs-pools

I’m trying to do a full restore of incus from backups taken from /var/lib/incus and exported containers, but I’m not sure if I’m doing this properly. The following is how I initialized the incus server, the way I tried restoring, and the observations from my attempts.

Setup:

disk0 - main host storage, debian, just for running incus server

disk1 - 1 TB SSD, passed by-id to incus to create an incus zfs pool (incus pool default)

disk2,3 - 2x 8 TB HDD, passed by-id to incus to create a mirrored zfs pool (incus pool hachi)

default_images set to disk1 (default)

default_backups set to disk2,3 (hachi)

hachi used for data volumes (nas, www/data, etc.) attached to instances

Backups taken from tar’d /var/lib/incus and exported containers (stored in BackBlaze)
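
In other words, the backup side was roughly this (the instance name is just a placeholder):

tar -cpzf incus-var-lib.tar.gz -C / var/lib/incus #state directory, permissions preserved
incus export mycontainer mycontainer-backup.tar.gz #repeated for each container
#both archives then uploaded to BackBlaze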

To restore, I

  1. stopped/disabled incus service/socket
  2. wiped /var/lib/incus
  3. rebooted, just in case
  4. unpacked tar to /var/lib/incus (with sudo to keep permissions/owner as root)
  5. started incus service/socket
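
In shell terms, the restore attempt was roughly this (the systemd unit names are whatever the Debian packaging provides; the archive name is a placeholder):

systemctl stop incus.service incus.socket
systemctl disable incus.service incus.socket
rm -rf /var/lib/incus
reboot
#after the reboot:
sudo tar -xpf incus-var-lib.tar.gz -C / #recreates var/lib/incus owned by root
systemctl enable --now incus.socket incus.service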

After restoring, I noticed that backing up /var/lib/incus again afterward appeared to include all my data, including what’s on disk2,3. I verified with incus storage info hachi that the pool still had the same device by-ids.
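
(I suspect that is because the zfs datasets from both pools are mounted under /var/lib/incus/storage-pools/ while incus is running, so the tar walks straight into them - something like this should confirm it, if I understand the layout correctly:)

zfs list -r -o name,mounted,mountpoint default hachi #datasets on both incus pools
mount | grep /var/lib/incus/storage-pools #what is actually mounted under the state dir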

My understanding from the incus docs was that zfs pools would need to be backed up and restored separately.

Some questions:

  1. Did I do this correctly, or did I miss a step? If I missed something, what is the proper restore procedure for this setup?
  2. Does tar’ing /var/lib/incus include all zfs pools (and volumes) managed by incus?
  3. Do I need to destroy and recreate the zfs pools after wiping /var/lib/incus?
  4. How can I ensure my tar of /var/lib/incus excludes specific volumes? (rough sketch of what I mean below)
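
For question 4, I’m imagining something along these lines (the exclude pattern is a guess at where the hachi volumes are mounted):

tar -cpzf incus-var-lib.tar.gz --exclude='var/lib/incus/storage-pools/hachi/*' -C / var/lib/incus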

Thanks in advance for any help

If you have your instances in a zfs dataset, then it should be possible to fully recover them.

I have used the following approach several times, so I know it CAN work. I don’t know the official way to do it, so hopefully an expert can advise, BUT here’s what I do. (Note you must have an intact zpool of your incus instances - if not, please don’t even consider this):-

DO NOT MAKE CHANGES TO YOUR ZPOOL DIRECTLY. Assuming the pool itself is intact and working, the following has been solid for me:

Export your zpool #Probably not essential, but I like to be sure it’s not going to get affected
apt remove --purge incus - just get rid of it.
apt autoremove #This is to clear it all out - I found this is ESSENTIAL
rm -r /var/lib/{incus-directories} - anything to do with incus, just delete it all.
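
The zpool part of that, concretely, is just something like this (the pool names here are the ones from your setup - check zpool list on your own system):

zpool list #confirm the pool names incus created
zpool export default #may complain if datasets are still busy or mounted
zpool export hachi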

Consider a reboot, especially if the setup was somehow corrupted, just to make sure you start with a fresh OS that has nothing left of the old incus still stuck in a dead process or whatever (probably overkill).

apt install incus
zpool import (pool)
incus admin init #We do a PARTIAL INIT here:
do NOT init storage (select ‘n’ for storage)
But do set up incusbr0 to your liking and let the init finish.
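
I do that init interactively, but if you prefer it scripted, I believe the preseed equivalent of “network yes, storage no” is roughly this (I haven’t verified this exact snippet):

incus admin init --preseed <<'EOF'
networks:
- name: incusbr0
  type: bridge
  config:
    ipv4.address: auto
    ipv6.address: auto
profiles:
- name: default
  devices:
    eth0:
      name: eth0
      network: incusbr0
      type: nic
EOF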

Do not launch an instance - you have no storage yet, so it won’t work. Your default profile will look something like this:-

incus profile show default
config: {}
description: Default Incus profile
devices:
  eth0:
    name: eth0
    network: incusbr0
    type: nic
name: default
used_by:
project: default

If you have more than the default profile in your old setup, you can either manually re-create them (if you remember them), or do what I do and just ‘incus profile copy default {new-profile}’ so they exist - they don’t have to be correct yet. You have to recreate these for the recovery to work. The good news is it will error if a profile is missing - just create one (incus profile copy default {missing-profile}) and then you can retry the recover command below if it fails the first time:

incus admin recover

Point it to the zpool/dataset and instruct it to recover (follow prompts). Make sure you call the storage pool the same as before (probably ‘default’?) - that’s also important.
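
So if it does fail on a missing profile, the loop looks roughly like this (the profile name is just an example):

incus admin recover #errors because a profile referenced by an instance is missing
incus profile copy default www #'www' is only an example name; fix its contents later
incus admin recover #retry - this time it should import everything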

…And for me, it imports everything. But you’re not done yet.

The profiles are all broken, so you need to add the root storage device back to the default profile (e.g. with incus profile edit default) - add something like this to it:

  root:
    path: /
    pool: default
    type: disk
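
If you’d rather do that with one command than edit the YAML, I believe the equivalent is:

incus profile device add default root disk path=/ pool=default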

So your default will look something like:

incus profile show default
config: {}
description: Default Incus profile
devices:
  eth0:
    name: eth0
    network: incusbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: default
used_by:
- /1.0/instances/{instance names}
project: default

Also, if you did have other profiles, they too now need to be fixed or the instances will not work properly - the root storage device at the very least, plus whatever else they contained before. But otherwise, it should all work - list, start, stop instances.

There is NO WARRANTY on this method. It’s just what I now do after e.g. a full OS upgrade - I export my zpool before I flash a new OS, then after I reinstall I do a basic init, then the recovery, then the profile fix. It’s much faster than copying giant instances from my backup servers. It’s never failed me so far - touch wood. Everything else I have tried always seems to error out and leave me in no man’s land. So I always do this now. I am not saying you should! LOL

Let me know, however, if you do try this. So far, it’s been solid for me. But it’ll be interesting to hear what the pros say about this admittedly blunt-force approach. :slight_smile:

Good Luck!

Andrew


PS - you have to keep pointing incus admin recover to each of your zpools in turn. It recovers them one at a time - I failed to mention that. I typically recover 1-3 pools per system. I have never lost an instance SO FAR. :slight_smile:

Hi Andrew,

thanks for the thorough reply!

I will keep your solution in mind; however, I don’t think it’s really what I was trying to do.

In your solution it sounds like you’re fully leveraging zfs for the actual data recovery and incus admin recover to sort of re-sync with the underlying data.

I would like to do a full recovery from incus backups onto a broken system where I can’t guarantee the zfs pools have been untouched. I want to ‘copy giant instances from my backup servers’, as that is the current state of my system.


Not sure if this helps you either, but my backup leverages the very good suggestions here: https://www.cyberciti.biz/faq/how-to-backup-and-restore-lxd-containers/
… with appropriate changes when I migrated from LXD to Incus.


Hi James,

Yes! This is closer to the kind of restore I’m trying to do; however, it doesn’t cover having separate storage pools on zfs, or whether backing up /var/lib/incus (in the article, /var/snap/lxd/common/lxd) effectively includes the storage pool data across different zfs pools (and how to restore that data). The article doesn’t mention the zfs pool again after deleting it in the first half.
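
For the exported containers at least, I assume re-importing them into a specific pool would be something like this (I’m assuming incus import kept the --storage flag from lxc import; names are placeholders):

incus import mycontainer-backup.tar.gz --storage default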

PS - I’ve pretty much started from scratch despite having the backups. I will most likely rely on Andrew’s solution moving forward; however, I can imagine other people have run into similar situations.

Would be great to have better documentation on this or a trail for someone to follow.