Reading through the Incus documentation and source code, I’ve come to reflect a bit on the interplay of ZFS encryption with migration and also delegation. The dataset of the storage volume carrying a system container instance can contain further datasets, which can now bring their own encryptionroot or keylocation. This poses difficulties for the automatic replication of instances/datasets across storage pools and Incus nodes, which have already surfaced here and there in the past.
This document is mainly a collection of resources on the subject, an investigation into how they play together and a sketch of possible resolution vectors that come to mind.
- Current situation
- Edge cases
- Resolution vectors
I hope that by following this reflection we end up with a better understanding of the different parts at play.
Current situation
The Incus ZFS driver is a wrapper with custom logic around the ZFS binaries zfs and zpool, and it inherits all of their side effects. It contains special handlers for volumes and migrations.
Limitations of the ZFS driver
It has a few limitations with regard to restoring from older snapshots, observing I/O quotas and feature support across different ZFS versions.
Encryption of a storage pool
The ZFS encryption of a storage pool is currently transparent to Incus, as long as the pool is unlocked beforehand.
We will take this presumption at face value later.
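As a minimal sketch, assuming a hypothetical pool named mpool, unlocking beforehand could look like this:
$ zfs load-key -r mpool        # load the keys for all encryption roots below mpool
$ zfs mount -a                 # mount the now unlockable datasets
$ systemctl start incus.service incus.socket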
Replication of an instance to another host
In daily operation, an Incus server administrator deals with preparation for disaster recovery scenarios. This includes maintaining several instance copies of various forms. We are primarily looking at using
copy, which gives us the means to act on “instances within or in between servers”. The --refresh* and --stateless options control the flow of the operation. It would be nice if we could schedule replication the way we can schedule snapshots.
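As a brief sketch, with a hypothetical instance c1 and remote backup-host, a repeated replication run could look like this:
$ incus copy c1 backup-host:c1                         # initial full copy to the remote server
$ incus copy c1 backup-host:c1 --refresh               # later runs only transfer the changes
$ incus copy c1 backup-host:c1 --refresh --stateless   # additionally skip the runtime state
The scheduling of the refresh runs is left to an external timer today.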
Other ways to move data between Incus hosts
Instead of building solely on snapshotting, a full export/import cycle may be preferable for a complete backup. Instances and storage volumes (and storage buckets, not as relevant here) can be imported and exported.
- Instances: import, export
- Volumes: storage volume import, storage volume export
- (Buckets: storage bucket import, storage bucket export)
Encrypting your backups separately is highly recommended. Here we want to focus on the replication case above, as it involves the delicate situation of native encryption handling in a distributed setting.
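A hedged sketch of such a cycle, with made-up instance, pool and volume names:
$ incus export c1 c1-backup.tar.gz                        # full instance backup, including snapshots
$ incus import c1-backup.tar.gz                           # restore on this or another server
$ incus storage volume export default vol1 vol1-backup.tar.gz
$ incus storage volume import default vol1-backup.tar.gz
# encrypt the resulting tarballs separately before shipping them off-site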
Migration of an instance to another host
Concerning the migration of an instance, we can focus on the case of moving existing Incus instances between servers.
move also operates on “instances within or in between servers”. The --instance-only and --stateless options control the flow here.
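Again a short sketch with hypothetical names:
$ incus move c1 other-host:c1                    # move the instance including its snapshots
$ incus move c1 other-host:c1 --instance-only    # leave the snapshots behind
$ incus move c1 other-host:c1 --stateless        # do not transfer the runtime state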
Snapshots of an instance and volumes
snapshot helps to maintain the lifecycle of instance snapshots. This becomes especially useful around and together with a snapshot restore.
Individual snapshots of the instance and storage volumes can be handled independently with the storage volume snapshot command, e.g. as a lightweight way to back up custom storage volumes.
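A short sketch, assuming the instance c1 and the pool default with a custom volume vol1 from before; note that snapshots.schedule offers the scheduling that copy/move currently lack:
$ incus snapshot create c1 pre-upgrade
$ incus snapshot restore c1 pre-upgrade
$ incus config set c1 snapshots.schedule "@daily"               # automatic instance snapshots
$ incus storage volume snapshot create default vol1 pre-upgrade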
Replication and migration of volumes to another host
Using ZFS dataset snapshot send/receive for the mechanics, Incus also knows how to move or copy storage volumes between Incus servers.
Similar to instances above, the transfer of volumes can be conducted with the storage volume copy and storage volume move commands.
The storage volume copy command knows the same --refresh* parameters as above, which similarly qualifies it for automatic scheduling.
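A sketch with the same hypothetical names:
$ incus storage volume copy default/vol1 backup-host:default/vol1            # initial copy
$ incus storage volume copy default/vol1 backup-host:default/vol1 --refresh  # incremental refresh
$ incus storage volume move default/vol1 other-host:default/vol1             # move instead of copy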
Internals
The Incus migration API covers the clients and the server. The server also contains a lower-level ZFS driver, to which we will return at the end. The high-level API is really concise.
High-level definitions and invocations of the migration API
- incus/client/incus_instances.go at 5d701f72540897b947d75c45a814f0379d5f3f4c · lxc/incus · GitHub
- incus/cmd/incusd/instances_post.go at 5d701f72540897b947d75c45a814f0379d5f3f4c · lxc/incus · GitHub
- incus/cmd/incus/move.go at 5d701f72540897b947d75c45a814f0379d5f3f4c · lxc/incus · GitHub
The tests show us how it is being used.
Watch out for an easter egg in the first file, migration.sh.
- incus/test/suites/migration.sh at main · lxc/incus · GitHub
- incus/test/suites/clustering_move.sh at main · lxc/incus · GitHub
- incus/test/suites/container_move.sh at main · lxc/incus · GitHub
The tests do not distinguish between separate cases for un-/encrypted source and/or target volumes, with or without recursive datasets. Where encryption is present, we additionally distinguish whether the encryption keys are loaded or not.
- Volume: source, target
- Recursive: yes, no
- Encryption: not present, key not available, key available
This equates to 2 ✕ 2 ✕ 3 = 12 possible cases. Please correct me if this is wrong.
In the case of recursive datasets, e.g. when using delegation, creative combinations of encryptionroot and keystatus on host and guest systems can be assumed. This point will come up again.
Edge cases
Both Incus and ZFS bring edge cases with their implementations, whose side effects are not strictly isolated from each other and cannot be. This is due to the tight coupling of the Incus storage pool and volume mechanics with the pool and dataset mechanics of the underlying file system.
While the high-level Incus API streamlines operations by making educated, opinionated choices, it also carries a weight of presumptions which may not hold in all potential use cases, especially those employing ZFS encryption. Incus does not offer an API to manipulate the encryption properties of a storage pool.
One problematic aspect of this is the perceived general instability of the ZFS encryption implementation, which can only be called rather incomplete from an operational point of view. The main hindrances are that the Initialisation Vector (IV) of a dataset is tied to its lifetime and that it is not possible to provide alternative key slots. It is also often not known that inheriting a key does not just tie the secret for opening the key slot to the descendants, in the sense that it is merely referenced and not copied, but that descendant datasets become tied to the Initialisation Vector only present in their encryptionroot dataset higher up (see the small illustration after the issue list below).
- Multiple encryption keys/key methods · Issue #6824 · openzfs/zfs · GitHub
- Encryption keys/roots management tools needed · Issue #12649 · openzfs/zfs · GitHub
- Tool for Emergency Master-Key Recovery · Issue #15952 · openzfs/zfs · GitHub
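As a minimal illustration of that inheritance behaviour, with made-up pool and dataset names:
$ zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt tank/enc
$ zfs create tank/enc/child                        # inherits encryption from the parent
$ zfs get -Ho value encryptionroot tank/enc/child
tank/enc
# zfs change-key tank/enc/child would turn the child into its own encryptionroot,
# but there is no tooling to add a second key slot or to back up the wrapping material.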
These two factors often work against each other when an Incus migration meets encrypted pools.
Incus constraints
When the ZFS version is sufficiently new, all transfers will be considered raw (-w).
This poses the first challenge for encrypted workloads.
Any send of a dataset with sub-datasets will be provided as a replication stream package (-R); we find this pattern in other places of the code base as well.
This poses the second challenge in (partially) encrypted environments.
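Put together, the transfer the driver performs is conceptually close to the following, with hypothetical dataset and host names (the actual invocation in the driver differs):
$ zfs snapshot -r mpool/USERDATA/incus/containers/instance@migration
$ zfs send -w -R mpool/USERDATA/incus/containers/instance@migration \
    | ssh target-host zfs receive -F npool/USERDATA/incus/containers/instance
# -w keeps the blocks encrypted in transit, -R pulls in all descendant datasets and snapshots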
ZFS constraints
ZFS is not a distributed file system per se (see the former OpenEBS cStor) and all snapshot and replication handling has to be done by external systems. ZFS itself will not manage the life cycle of snapshots, sends and receives for you, but it will carry them out.
We know that encryption handling for raw replication streams comes with dangers and pitfalls that have in the past led to loss of user data. Prominent examples were given above.
The details of these currently open issues are worrisome, but insightful:
- zfs receive -F cannot be used to destroy an encrypted filesystem · Issue #6793 · openzfs/zfs · GitHub
- Replicating encrypted child dataset + change-key + incremental receive overwrites master key of replica, causes permission denied on remount · Issue #12614 · openzfs/zfs · GitHub
- Use recv_fix_encryption_hierarchy for non-recursive send/receive so that encryptionroot is inherited on received side · Issue #15687 · openzfs/zfs · GitHub
- Enable zfs send|receive for encrypted root dataset · Issue #17724 · openzfs/zfs · GitHub
What is especially worrisome is that there are so many ongoing problems with ZFS encryption, here and above, with no resources currently being dedicated to them. The encryption code can just as well be considered unmaintained, as no one is directly appointed to it.
In other places it was said that this part of the Incus code is fairly new. There is also the existence of issues like Repair encryption hierarchy of 'send -Rw | recv -d' datasets that do not bring their encryption root · Issue #12000 · openzfs/zfs · GitHub, in which receiving with -d (sister of -e) for a while led to raw sends of replication packages arriving without their encryptionroot, and with that without access to their IV. Tricky, but fixed in the meantime and not in use by Incus. Still, it is a cautionary example of just how recent the fixes and ongoing concerns with the implementation of ZFS encryption are, despite it performing well within its bounds.
Both Incus and ZFS are not stable around handling replication and encryption together, each one amplifying the downsides of the other. Careful deliberation might help to identify some cases in which we can work around or beyond these limitations. We have seen many modifiable constraints above. Which of them can we perhaps loosen to bring movement into the situation?
Forward
With a bit of forum exegesis and against the background of former and known regressions, we might be able to boil down the actual error condition seen on replicated instances, where the target dataset does not decrypt despite available key material, to a possible loss of the original IV.
Side-effects and perceived regressions
Let’s consider this source layout:
mpool # per convention unencrypted pool, encryptionroot -
mpool/USERDATA # second-level encryptionroot, per convention, holds IV
mpool/USERDATA/incus # Incus storage pool root, encryptionroot mpool/USERDATA
mpool/USERDATA/incus/containers # encryptionroot mpool/USERDATA
mpool/USERDATA/incus/containers/instance # encryptionroot mpool/USERDATA
And this target layout:
npool # per convention unencrypted pool, encryptionroot -
npool/USERDATA # encryptionroot, holds _different_ IV or unencrypted
npool/USERDATA/incus # encryptionroot npool/USERDATA or unencrypted
npool/USERDATA/incus/containers # encryptionroot npool/USERDATA or unencrypted
npool/USERDATA/incus/containers/instance # replicated instance, raw
The replicated instance npool/USERDATA/incus/containers/instance needs the mpool/USERDATA IV to decrypt, which isn’t available. Is it possible that during copy and move operations we keep the key material for unlocking the IV, and the IV itself, intact, but detach the dataset from its encryption hierarchy during the raw send?
Would that hypothesis hold? Can we find counter-examples?
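One way to probe it, sketched with the hypothetical layout from above, is to compare the encryption properties on both ends after a replication:
$ zfs get -r -o name,property,value encryptionroot,keystatus,keylocation \
    mpool/USERDATA/incus/containers/instance          # on the source host
$ zfs get -r -o name,property,value encryptionroot,keystatus,keylocation \
    npool/USERDATA/incus/containers/instance          # on the target host
$ zfs load-key npool/USERDATA/incus/containers/instance   # expected to fail if the hypothesis holds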
Returning to the introductory quote, can we still assume that:
I hope this post contributes to shining light at some edge cases that we invite for when using raw sends with encrypted datasets.
Here we are ultimately blocked by the lack of key material tooling in ZFS. Every LUKS admin is used to keeping a backup of their encryption headers. Key exchange, possibly through a KMS like OpenBao, would otherwise seem a viable option.
Could the availability of delegation also mean that, during the initial creation of an instance’s dataset, the storage driver provisions its own key material and places the encryptionroot right at the level of that dataset, so that the IV never leaves it? Would that count as a confidential/trusted compute hardening of the Incus cloud platform?
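A hedged sketch of what that could look like if done by hand today, with a made-up key path (this is not something the driver currently does):
$ head -c 32 /dev/urandom > /var/lib/incus/keys/instance.key      # per-instance wrapping key
$ zfs create -o encryption=on -o keyformat=raw \
    -o keylocation=file:///var/lib/incus/keys/instance.key \
    mpool/USERDATA/incus/containers/instance
$ zfs get -Ho value encryptionroot mpool/USERDATA/incus/containers/instance
mpool/USERDATA/incus/containers/instance
# the dataset is its own encryptionroot, so a raw send carries everything needed to unlock it
# on the target, provided the key file is exchanged over a separate channel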
Implicitly, Incus does now allow users to bring their own keys, which complicates the situation even further. The key will naturally be available to the host system, but a guest cannot access other users’ keys. This is now a valid course of action:
$ incus storage show default | yq '@json' | jq '{config, driver}' | yq -P
config:
  source: rpool/ROOT/ubuntu_d4psvq/var/incus
  volatile.initial_source: rpool/ROOT/ubuntu_d4psvq/var/incus
  volume.zfs.delegate: "true"
  zfs.pool_name: rpool/ROOT/ubuntu_d4psvq/var/incus
driver: zfs
$ incus launch images:ubuntu/noble u1 -c security.nesting=true -c security.syscalls.intercept.mknod=true -c security.syscalls.intercept.setxattr=true
$ incus exec u1 -- bash
Continuing inside:
root@u1:~# apt update
root@u1:~# apt upgrade -y
root@u1:~# apt install -y curl wget
root@u1:~# curl -fsSL https://pkgs.zabbly.com/key.asc -o /etc/apt/keyrings/zabbly.asc
root@u1:~# sh -c 'cat <<EOF > /etc/apt/sources.list.d/zabbly-incus-stable.sources
Enabled: yes
Types: deb
URIs: https://pkgs.zabbly.com/incus/stable
Suites: $(. /etc/os-release && echo ${VERSION_CODENAME})
Components: main
Architectures: $(dpkg --print-architecture)
Signed-By: /etc/apt/keyrings/zabbly.asc
EOF'
root@u1:~# apt update
root@u1:~# apt install -y incus zfsutils-linux jq yq
root@u1:~# zfs list -Ho name /
rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1
root@u1:~# zfs create -o canmount=off -o encryption=on -o keylocation=prompt -o keyformat=passphrase rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/USERDATA
Enter new passphrase:
Re-enter new passphrase:
root@u1:~# zfs get -Ho value keystatus rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/USERDATA
available
Back on the host system, we see the same, which is always to be kept in mind:
$ zfs get -Ho value keystatus rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/USERDATA
available
Continuing inside:
root@u1:~# systemctl stop incus.service incus.socket # not yet initialised; just in case to start fresh
root@u1:~# rm -rf /var/lib/incus
root@u1:~# zfs create -o mountpoint=/var/lib/incus rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/USERDATA/incus
root@u1:~# zfs create rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/USERDATA/incus/storage-pools
root@u1:~# zfs create rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/USERDATA/incus/storage-pools/s1
root@u1:~# systemctl start incus.service incus.socket
root@u1:~# incus admin init --preseed <<< '
config: {}
networks:
- name: incusbr0
  type: bridge
  config:
    ipv4.address: auto
    ipv6.address: auto
storage_pools:
- config:
    source: rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/USERDATA/incus/storage-pools/s1
  description: ""
  name: default
  driver: zfs
profiles:
- devices:
    eth0:
      name: eth0
      network: incusbr0
      type: nic
    root:
      path: /
      pool: default
      type: disk
  name: default
cluster: null
'
root@u1:~# incus storage show default | yq '@json' | jq 'fromjson | {config, driver}' | yq -y
config:
  source: rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/USERDATA/incus/storage-pools/s1
  volatile.initial_source: rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/USERDATA/incus/storage-pools/s1
  zfs.pool_name: rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/USERDATA/incus/storage-pools/s1
driver: zfs
# We cannot delegate ZFS to it another level down, so we skip it.
root@u1:~# incus launch images:ubuntu/noble u2 -c security.nesting=true -c security.syscalls.intercept.mknod=true -c security.syscalls.intercept.setxattr=true
Launching u2
root@u1:~# zfs get -Ho value encryptionroot rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/USERDATA/incus/storage-pools/s1/containers/u2
rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/USERDATA
We find that the encryptionroot, and with it the IV, of the u2 container’s dataset on the u1 Incus host lives outside of the root of its storage pool. When the Incus host in the u1 container generates and sends a replication stream package from the u2 dataset, the IV at the encryptionroot is lost, and the dataset of a replicated Incus instance that originated from an encrypted dataset cannot be unlocked anymore. There is no tooling to work around this.
This kind of nested setup becomes useful when someone wants to offer their users the same level of logical separation and flexibility that they use themselves.
root@u1:~# incus launch images:fedora/43 f1 -c security.nesting=true -c security.syscalls.intercept.mknod=true -c security.syscalls.intercept.setxattr=true
root@u1:~# incus exec f1 sh -- -c 'dnf install --assumeyes podman'
root@u1:~# incus exec f1 sh -- -c 'podman run hello-world | head -n 1'
!... Hello Podman World ...!
One way forward that meets the constraints posed by Incus, see the following quote, is to have it deal with encryption itself by provisioning (separate) key material for each creation of an instance’s dataset. A new IV is then also generated and replicated together with the rest of the dataset within the replication stream package. The assumption is that the replicated dataset of an encrypted instance can later successfully be unlocked.
Another, cruder way could be to create the encryptionroot in the dataset for the Incus storage pool. Given careful planning, this dataset could be raw-replicated across Incus hosts before they initialise their storage pools on it. The pool datasets would then all carry the same IV and thus allow their child datasets to use it for decryption as their interchangeable encryptionroot, which in theory would also be available to the datasets of migrated Incus instances.
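A hedged sketch of that idea, with made-up pool and host names and before any incus admin init has run:
$ zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt mpool/incus-pool
$ zfs snapshot mpool/incus-pool@seed
$ zfs send -w mpool/incus-pool@seed | ssh host-b zfs receive npool/incus-pool
# then, on host-b:
$ zfs load-key npool/incus-pool                   # same passphrase, and per the hypothesis the same IV
# afterwards each host points its Incus storage pool source at its local <pool>/incus-pool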
Assuming the original hypothesis stands uncorrected and this is the behaviour that is at work here.
Commenting on the first premise, maybe there actually are situations where trust exists? The scenarios described here often came from people in possession of valid credentials. Between my own hosts (in a cluster) there is already a trust relationship, and I am reinforcing it by providing additional proof with the supplied key material. Possibly some of the guarantees that Incus tries to achieve here can be shifted towards adding encryption into the equation.
If the zero-trust requirement can be pushed one layer down, to ZFS, it would loosen the restrictions and constraints of the higher-level implementation, Incus. That could, for some cases, free us from the assumption that we always want to conduct raw sends. Regular incremental sends, decrypting in transit and re-encrypting the dataset at rest when keys are loaded on both ends, work just fine, but need external scheduling and especially recursive enumeration of snapshots on source and target with their respective incremental ranges.
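A sketch of such a non-raw incremental transfer, with hypothetical dataset and host names and keys loaded on both ends:
$ zfs snapshot mpool/USERDATA/incus/containers/instance@sync2
$ zfs send -I @sync1 mpool/USERDATA/incus/containers/instance@sync2 \
    | ssh host-b zfs receive npool/USERDATA/incus/containers/instance
# the data travels decrypted and is re-encrypted on arrival under the target's own
# encryptionroot and IV; in-transit protection is left to ssh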
Maybe the assumption that load-key would be sufficient for a raw encrypted dataset on a remote machine does not hold up? “Without too much success” and “pretty tricky” suggest that the mental model used when developing against this surface did not often match up with reality. Was IV placement considered?
A physical revert, if I understand it correctly, happens with rolling back a dataset. This worked:
root@u1:/srv# zfs mount rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/ROOT
cannot mount 'rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/ROOT': encryption key not loaded
root@u1:/srv# zfs list -t snap rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/ROOT
NAME USED AVAIL REFER MOUNTPOINT
rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/ROOT@0 144K - 192K -
rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/ROOT@1 112K - 228K -
rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/ROOT@2 0B - 264K -
root@u1:/srv# zfs rollback -r rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/ROOT@1
root@u1:/srv# zfs list -t snap rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/ROOT
NAME USED AVAIL REFER MOUNTPOINT
rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/ROOT@0 144K - 192K -
rpool/ROOT/ubuntu_d4psvq/var/incus/containers/u1/ROOT@1 0B - 228K -
Did I get something wrong? Maybe encrypted --refresh syncs can happen fine as well?
Is this observation possibly another occurrence of mixing up the key slot of a dataset with its IV, which will be different on both servers even when set up with the same key material?
Potential resolution vectors
The methods concerned with sending and receiving in the ZFS driver, and the places they are called from, would be the natural place to start. Could making them encryption-aware possibly help with working around some of the deficiencies of using Incus together with encrypted storage pools, wherever IV and encryptionroot may live?
That’s what I was wondering. Many thanks for your interest and keep up the good spirit.