Encrypted ZFS dataset: incus copy --refresh fails

I have two Incus hosts that use encrypted ZFS datasets for storage. I can copy instances from one host to the other, but if I then use the --refresh flag, the copy fails with an error message:

Error: Failed instance creation: Error transferring instance data:
Failed migration on target: Failed creating instance on target:
Failed receiving volume "testvm": Problem with zfs receive:
([exit status 1 write |1: broken pipe]) cannot receive new
filesystem stream: zfs receive -F cannot be used to destroy
an encrypted filesystem or overwrite an unencrypted one
with an encrypted one

Steps to reproduce:
I tested this with a new container (shown below) and also with a new VM; in both cases the error occurs:

server-1:~$ incus launch images:ubuntu/22.04 testcontainer
server-2:~$ incus copy server-1:testcontainer testcontainer --stateless
(works)
server-1:~$ incus exec testcontainer -- touch /root/testfile-01
server-2:~$ incus copy server-1:testcontainer testcontainer --stateless --refresh

Error: Failed instance creation: Error transferring instance data: Failed migration on target: Failed creating instance on target: Failed receiving volume "testcontainer": Problem with zfs receive: ([exit status 1 write |1: broken pipe]) cannot receive new filesystem stream: zfs receive -F cannot be used to destroy an encrypted filesystem or overwrite an unencrypted one with an encrypted one

server-1 runs Ubuntu 22.04.3 LTS, server-2 Debian 12; both use Incus version 0.5.1 (on server-1 migrated from LXD, on server-2 newly installed).

I don’t know ZFS very well, so I’m not sure if this is relevant, but while searching for the error message I found related discussion in the ZFS GitHub repository.

So, is it expected that incus copy ... --refresh does not work on encrypted ZFS datasets? Does anybody else have the same problems?

Yeah, this is something I’ve tried to resolve in the past without too much success.

When a dataset is encrypted, you can transfer it over, but it will arrive still encrypted and the target server won’t know how to decrypt it, at least not without a manual zfs load-key being run for the dataset on the target.
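For reference, checking and manually unlocking a received encrypted dataset on the target looks roughly like this (the pool and dataset names below are placeholders, not from the original post):

```shell
# Hypothetical dataset path; adjust to your storage pool layout.
# Check whether the key for the received dataset is loaded, and where
# ZFS expects to find it:
zfs get keystatus,keylocation,encryptionroot tank/incus/containers/testcontainer

# Load the key manually (prompts for a passphrase, or reads the keyfile
# named by the keylocation property):
zfs load-key tank/incus/containers/testcontainer

# Once the key is loaded, the dataset can be mounted and rolled back as usual:
zfs mount tank/incus/containers/testcontainer
```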

The reason is that during refresh we need to:

  • Revert the target to the most recent snapshot
  • Transfer any new snapshots
  • Transfer a temporary migration snapshot for the current state of the dataset
  • Get rid of the temporary snapshot

The revert isn’t possible because it needs access to the encrypted data. The rest would be fine: if all we were doing was transferring or removing snapshots, there’d be no problem. It’s the fact that refresh also needs to sync the current state of the dataset itself that causes the issue.
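At the ZFS level, the refresh steps above roughly correspond to something like the following (dataset and snapshot names are illustrative, not taken from Incus internals); it’s the initial rollback on the target that fails when the dataset’s key isn’t loaded:

```shell
# On the target: revert to the most recent common snapshot.
# This is the step that fails on an encrypted dataset without its key loaded.
zfs rollback -r tank/incus/containers/testcontainer@snap2

# On the source: send any newer snapshots incrementally to the target.
zfs send -I @snap2 tank/incus/containers/testcontainer@snap3 | \
    ssh target zfs receive -F tank/incus/containers/testcontainer

# On the source: capture the current state in a temporary migration
# snapshot, send it, then discard it.
zfs snapshot tank/incus/containers/testcontainer@migration-tmp
zfs send -i @snap3 tank/incus/containers/testcontainer@migration-tmp | \
    ssh target zfs receive -F tank/incus/containers/testcontainer
zfs destroy tank/incus/containers/testcontainer@migration-tmp
```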

It’s possible that there’s something we can do in the case where the key for the dataset is already loaded on the target, but it’s definitely tricky and would need quite a bit of new logic…

Thanks for the explanation! Since it seems I can’t choose rsync instead of zfs send/receive with incus copy ... --refresh, I’ll have to think about alternatives; maybe ZFS on LUKS…

Yeah, we should still be able to detect this situation and have the target server decline ZFS as a migration protocol, instead forcing a fallback to rsync. That won’t be particularly fast, but it should at least be possible.

I just hit this copying between two pools on the same host. Both are encrypted and both happen to use the same keyfile in the same location. Odd!

I know this is closed, but may I suggest an alternative approach that still retains the encrypted status of the container? The good news is that once it’s done, it’s 100% transparent. The bad news is that you need an encrypted OS install (i.e. re-install your host and select LVM/LUKS encryption during the install procedure). Hear me out before you dismiss me:

  1. Install your OS using an LVM/LUKS-encrypted setup
  2. Optionally use tang-clevis (I do) to manage your server’s key decryption service
    => This adds a great deal of user-configurable convenience to decrypting the OS at boot
    => Mine basically auto-decrypts, but I can stop that on a dime when I want or need to.
  3. Create your zpool on a LUKS-encrypted disk on your new LUKS-backed host
    => The keyfile lives in the OS /root/ folder, which is readable after manual or tang-service unlocking, so not just the OS disk is LUKS-encrypted, ALL of them are
  4. Install incus and create containers on the LUKS-backed pool (100% user-transparent)
  5. copy --refresh now works (also 100% transparent).
  6. All files on the OS LUKS drive and the LUKS pools are encrypted at rest (100% transparent), but during operation all files are decrypted. There’s no need for the more complicated ZFS encryption; this is no less secure at rest and far more convenient. Incus snapshot create/restore, copy, move, etc. all work - you don’t even know the disks are encrypted until you power off the server (when it’s all unreadable).
  7. If either the host or the remote server goes down, its LUKS disks auto-lock everything on that server
    => Files are secure from at-rest prying eyes until you manually (or via tang-clevis) decrypt and boot
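Step 3 above, creating the zpool on a LUKS-backed disk, might look roughly like this (the device name and keyfile path are placeholders, not from the original post):

```shell
# Placeholder device (/dev/sdb) and keyfile; adapt to your setup.
# Generate a keyfile on the (already LUKS-encrypted) OS root filesystem:
dd if=/dev/urandom of=/root/zpool.key bs=64 count=1
chmod 0400 /root/zpool.key

# Format the data disk with LUKS and open it as a mapper device:
cryptsetup luksFormat --key-file /root/zpool.key /dev/sdb
cryptsetup open --key-file /root/zpool.key /dev/sdb zpool_crypt

# Create a plain (unencrypted) zpool on the decrypted mapper device.
# ZFS itself sees ordinary storage, so incus copy --refresh works normally,
# while the data on /dev/sdb is still encrypted at rest.
zpool create tank /dev/mapper/zpool_crypt
```

An entry in /etc/crypttab (or a tang-clevis binding) can then open the mapper device at boot so the pool imports without manual steps.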

This works. It adds a little extra hassle setting up tang-clevis on a fresh OS install, but it’s 100% transparent after that. It’s how I run ALL my systems, and it’s super convenient.

V/R

Andrew