ZFS Replication for CT and VM with Incus?

Hello,

I’m in the process of evaluating Incus as an alternative to Proxmox. Proxmox has served me well for over a decade now, but the lack of Docker image (OCI image) support is a real burden. The latest release of Incus, with its support for OCI, is a game changer.

I’ve already started exploring Incus and its web UI. However, I haven’t found much information about its replication support. In Proxmox, I use this feature to replicate storage across 2 or 3 nodes, which is useful in two scenarios: (1) when I need to perform maintenance on one node, I can migrate all the containers and VMs to other nodes where the data is already replicated, making the migration much faster; (2) it also serves as a backup to recover from in case of data loss.

Is there anything similar I could do with Incus?


I think you should look into Incus clusters. A cluster needs a minimum of three nodes, and you can use Ceph for the storage if you’re able to set it up.
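If you want to experiment with that non-interactively, the bootstrap can be driven from a preseed file. This is only a rough sketch based on the documented preseed format; the address and server name are placeholders I made up, so check the clustering how-to in the Incus docs for the exact keys in your version before relying on it:

```bash
# Untested sketch: bootstrap the first node of a cluster from a preseed.
# 10.0.0.1 and "node1" are placeholders for your own address/name.
cat <<'EOF' | incus admin init --preseed
config:
  core.https_address: 10.0.0.1:8443
cluster:
  server_name: node1
  enabled: true
EOF
```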

> I’ve already started exploring Incus and its web UI. However, I haven’t found much information about its replication support.

It’s hard to find in the documentation, even when you know it’s there, but another option is to use ZFS or Btrfs storage with snapshots and perform incremental copies with the `--refresh` flag.

See optimized storage volume transfer and periodic snapshots.
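On the snapshot side, the scheduling lives in per-instance configuration keys, so no external cron job is needed for the snapshots themselves. For example (the instance name foo is just a placeholder; the schedule value is standard cron syntax):

```bash
# Snapshot "foo" every night at 02:00 and expire snapshots after two weeks.
incus config set foo snapshots.schedule "0 2 * * *"
incus config set foo snapshots.expiry 2w
```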

Unfortunately, you’d still have to script a periodic call to `incus copy --refresh foo bar:` for each instance. But it might be worth considering.
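A minimal sketch of such a script, assuming you’ve already added the target server as a remote named backup (via `incus remote add backup <address>`; both names here are placeholders):

```bash
#!/bin/sh
# Untested sketch: refresh a copy of every local instance onto the
# remote "backup". --refresh only transfers what is missing on the
# target, so after the initial copy the runs are incremental.
for name in $(incus list --format csv --columns n); do
    incus copy --refresh "$name" "backup:$name"
done
```

You could run it from cron or a systemd timer at whatever interval matches the replication granularity you had in Proxmox.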

In the long run, it would be great if Incus gained a storage driver for LINSTOR; that has been mooted for a long time.

BTW, personally I avoid Incus clustering, as it adds new failure modes that can be difficult to recover from, and it’s only really needed if you have shared storage. You should be able to use `incus copy --refresh` between independent Incus nodes (or independent Incus clusters) just fine.

EDIT: if you copy to an independent node or cluster, then the copy can have the same name as the original. This may either be helpful or confusing, depending on your use case. If you copy within a cluster, even to a different node, then the copy must have a different name.
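To make the independent-node setup concrete, the flow is roughly this (the host and instance names are placeholders, and trust setup is elided; see `incus config trust` in the docs):

```bash
# On the source server: register the standby machine as a remote.
incus remote add standby standby.example.net

# The initial full copy keeps the same name on the standby;
# subsequent --refresh runs only send the differences.
incus copy web1 standby:web1
incus copy --refresh web1 standby:web1
```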

We recently moved 2 of our 3 hosting setups to Proxmox, from plain LXC without replication and from LXC with iSCSI block devices backed by zvols on TrueNAS storage respectively. But we continue to have problems with unexpected hard reboots, so Incus is on our radar too.

Currently, the easy and manageable ZFS replication between nodes is the “killer feature” that keeps us with Proxmox; I agree with Patrik that easy replication and efficient node-switching are crucial. We could live without Proxmox’s HA feature (although it’s quite nice) and switch nodes manually, and we’re almost agnostic between ZFS and Btrfs (although there’s an IP risk hovering over ZFS).

I wouldn’t call CephFS an option. We’ve made 2 or 3 attempts at it, and all ended with a clear NO: too complicated, too slow, too many risks.
Same for DRBD, which we abandoned several years ago after all kinds of negative experiences, including a bug that had been reported 3+ times but wasn’t taken seriously until we hired a kernel expert (and committer) to track it down, fix it, and get the fix into vanilla upstream (and even backported to LTS kernels).

So, thanks Simos for the Incus clusters link; it looks like it’s becoming a real option (it was too much of an infant back when we decided on Proxmox as an intermediate-to-mid-term strategy).

ZFS has shipped in Ubuntu since 2016 (16.04), and nothing material has happened since, so I don’t think there’s anything to worry about. In any case, it’s unclear who would sue whom.

(I am not a lawyer, don’t take any of this as legal advice, always seek your own legal advice)

A common misconception with the whole ZFS thing is that Oracle could get grumpy about it and cause trouble. That’s incorrect. ZFS is released under the CDDL license which is a pretty permissive Open Source license.

The issue with the Linux kernel is about ZFS integrating with (linking against) GPL code, as this may constitute a derived work and thus violate the GPLv2 license of the Linux kernel.

Whether ZFS can be considered a derived work of the Linux kernel, when it originated on a completely different operating system, is where things get pretty murky. It’s generally an area that most think is better left grey, as a decision in either direction could have a significant impact on many other pieces of software.

Now, if ZFS were found to be in violation of the Linux kernel’s GPLv2 license, the harmed party wouldn’t be Oracle but the individual Linux kernel developers who wrote the kernel APIs and code that ZFS uses. Those are the people who would hold a valid claim against whoever distributes ZFS linked with the Linux kernel.

On that last point, a common loophole used by a number of distributions is to not ship ZFS pre-linked. In theory this avoids the whole issue by having individual users perform the linking (or, in the case of DKMS, the entire build) locally on their own system, shifting any potential license violation onto the individual end user and avoiding ever distributing potentially problematic binaries.