Encrypted ZFS dataset: incus copy --refresh fails

I have two Incus hosts that use encrypted ZFS datasets for storage. I can copy instances from one host to the other, but if I then use the --refresh flag, the copy fails with an error message:

Error: Failed instance creation: Error transferring instance data:
Failed migration on target: Failed creating instance on target:
Failed receiving volume "testvm": Problem with zfs receive:
([exit status 1 write |1: broken pipe]) cannot receive new
filesystem stream: zfs receive -F cannot be used to destroy
an encrypted filesystem or overwrite an unencrypted one
with an encrypted one

Steps to reproduce:
I tested this with a new container (shown below) and also with a new VM; in both cases the error occurs:

server-1:~$ incus launch images:ubuntu/22.04 testcontainer
server-2:~$ incus copy server-1:testcontainer testcontainer --stateless
(works)
server-1:~$ incus exec testcontainer -- touch /root/testfile-01
server-2:~$ incus copy server-1:testcontainer testcontainer --stateless --refresh

Error: Failed instance creation: Error transferring instance data: Failed migration on target: Failed creating instance on target: Failed receiving volume "testcontainer": Problem with zfs receive: ([exit status 1 write |1: broken pipe]) cannot receive new filesystem stream: zfs receive -F cannot be used to destroy an encrypted filesystem or overwrite an unencrypted one with an encrypted one

server-1 runs Ubuntu 22.04.3 LTS, server-2 Debian 12; both use Incus version 0.5.1 (on server-1 migrated from LXD, on server-2 newly installed).

I don’t know ZFS very well, so I’m not sure if this is relevant, but while searching for the error message I found related discussion in the ZFS GitHub repository.

So, is it expected that incus copy ... --refresh does not work on encrypted ZFS datasets? Does anybody else have the same problems?

Yeah, this is something I’ve tried to resolve in the past without too much success.

When a dataset is encrypted, you can transfer it over, but it will arrive still encrypted and the target server won’t know how to decrypt it, at least not without a manual zfs load-key being run for the dataset on the target.
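For reference, checking and manually unlocking a received encrypted dataset on the target looks roughly like this (the pool and dataset names below are placeholders, not from the original post):

```shell
# Hypothetical dataset path; adjust to your storage pool layout.
# Check whether the key for the received dataset is loaded, and where
# ZFS expects to find it:
zfs get keystatus,keylocation,encryptionroot tank/incus/containers/testcontainer

# Load the key manually (prompts for a passphrase, or reads the keyfile
# named by the keylocation property):
zfs load-key tank/incus/containers/testcontainer

# Once the key is loaded, the dataset can be mounted and rolled back as usual:
zfs mount tank/incus/containers/testcontainer
```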

The reason is that during refresh we need to:

  • Revert the target to the most recent snapshot
  • Transfer any new snapshots
  • Transfer a temporary migration snapshot for the current state of the dataset
  • Get rid of the temporary snapshot

The revert isn’t possible because it needs access to the encrypted data. The rest would be fine: if all we were doing was transferring or removing snapshots, there’d be no problem. It’s the fact that refresh also needs to sync the current state of the dataset itself that causes the issue.
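At the ZFS level, the refresh steps above roughly correspond to something like the following (dataset and snapshot names are illustrative, not taken from Incus internals); it’s the initial rollback on the target that fails when the dataset’s key isn’t loaded:

```shell
# On the target: revert to the most recent common snapshot.
# This is the step that fails on an encrypted dataset without its key loaded.
zfs rollback -r tank/incus/containers/testcontainer@snap2

# On the source: send any newer snapshots incrementally to the target.
zfs send -I @snap2 tank/incus/containers/testcontainer@snap3 | \
    ssh target zfs receive -F tank/incus/containers/testcontainer

# On the source: capture the current state in a temporary migration
# snapshot, send it, then discard it.
zfs snapshot tank/incus/containers/testcontainer@migration-tmp
zfs send -i @snap3 tank/incus/containers/testcontainer@migration-tmp | \
    ssh target zfs receive -F tank/incus/containers/testcontainer
zfs destroy tank/incus/containers/testcontainer@migration-tmp
```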

It’s possible that there’s something we can do in the case where the key for the dataset is already loaded on the target, but it’s definitely tricky and would need quite a bit of new logic…

Thanks for the explanation! Since it seems I can’t choose rsync instead of zfs send/receive with incus copy ... --refresh, I’ll have to think about alternatives; maybe ZFS on LUKS…

Yeah, we should still be able to detect this situation and have the target server decline ZFS as a migration protocol, instead forcing a fallback to rsync. That won’t be particularly fast, but it should at least be possible.

I just hit this copying between two pools on the same host. Both are encrypted and both happen to use the same keyfile in the same location. Odd!

I know this is closed, but may I suggest an alternative approach that still retains the encrypted status of the container? The good news is that once it’s done, it’s 100% transparent. The bad news is that you need an encrypted OS install (i.e. re-install your host and select LVM/LUKS encryption during the install procedure). Hear me out before you dismiss me:

  1. Install your OS using an LVM/LUKS-encrypted setup
  2. Optionally use tang-clevis (I do) to manage your server’s key decryption service
    => This adds a great deal of user-configurable convenience to decrypting the OS at boot
    => Mine basically auto-decrypts, but I can stop that on a dime when I want or need to.
  3. Create your zpool on a LUKS-encrypted disk on your new LUKS-backed host
    => The keyfile lives in the OS /root/ folder, which is readable after manual or tang-service unlocking, so not just the OS disk is LUKS-encrypted, ALL of them are
  4. Install incus and create containers on the LUKS-backed pool (100% user-transparent)
  5. copy --refresh now works (also 100% transparent).
  6. All files on the OS LUKS drive and the LUKS pools are encrypted at rest (100% transparent), but during operation all files are decrypted. There’s no need for the more complicated ZFS encryption; this is no less secure at rest and far more convenient. Incus snapshot create/restore, copy, move, etc. all work - you don’t even know the disks are encrypted until you power off the server (when it’s all unreadable).
  7. If either the host or the remote server goes down, its LUKS disks auto-lock everything on that server
    => Files are secure from at-rest prying eyes until you manually (or via tang-clevis) decrypt and boot
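Step 3 above, creating the zpool on a LUKS-backed disk, might look roughly like this (the device name and keyfile path are placeholders, not from the original post):

```shell
# Placeholder device (/dev/sdb) and keyfile; adapt to your setup.
# Generate a keyfile on the (already LUKS-encrypted) OS root filesystem:
dd if=/dev/urandom of=/root/zpool.key bs=64 count=1
chmod 0400 /root/zpool.key

# Format the data disk with LUKS and open it as a mapper device:
cryptsetup luksFormat --key-file /root/zpool.key /dev/sdb
cryptsetup open --key-file /root/zpool.key /dev/sdb zpool_crypt

# Create a plain (unencrypted) zpool on the decrypted mapper device.
# ZFS itself sees ordinary storage, so incus copy --refresh works normally,
# while the data on /dev/sdb is still encrypted at rest.
zpool create tank /dev/mapper/zpool_crypt
```

An entry in /etc/crypttab (or a tang-clevis binding) can then open the mapper device at boot so the pool imports without manual steps.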

This works. It adds a little extra hassle setting up tang-clevis on a fresh OS install, but it’s 100% transparent after that. It’s how I run ALL my systems, and it’s super convenient.

V/R

Andrew