Incus containers: CARP protocol?

Hi All,

I cannot use Ceph for redundancy. I now have two systems that both use a ZFS backend.

Is it possible to use, for example, the CARP protocol to have failover for containers and VMs?

And what would I do to keep the containers and VMs in sync on disk?

Could you advise please?

> I can not use ceph for redundancy.

Aside: if you want to deploy networked block storage that is not as complicated as Ceph, Incus now supports Linstor as a storage backend. Linstor configures DRBD replication, which is essentially RAID1-style mirroring over the network.

> I now have 2 systems that both use zfs backend.

And are those two systems independent incus hosts, not clustered? That’s simple and reliable (and what I do). Incus clustering adds additional failure modes which can bring down the whole cluster.

However, if you want to use Incus with Linstor or Ceph, you will need a cluster, I believe.

> Is it possible to use for example the carp protocol to have a failover for containers and vm machines?

It depends.

If your containers are all bridged onto the same IP network, then you can run container foo1 on host1 and container foo2 on host2, each with a different IP on the same subnet. Within those containers you can run keepalived or CARP with a third, shared virtual IP address, and then point the users of those applications at the virtual address. It works fine, but you have to configure it yourself in each pair of containers, and each pair needs its own virtual address.
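For illustration, a minimal keepalived sketch for one such pair. The interface name, router ID and addresses here are made up; the real values depend on your bridge and subnet:

```
# /etc/keepalived/keepalived.conf inside container foo1
# (foo2 uses "state BACKUP" and a lower priority, e.g. 90)
vrrp_instance VI_1 {
    state MASTER
    interface eth0            # the bridged interface
    virtual_router_id 51      # must match on both containers, unique per pair
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.0.2.10/24         # the shared virtual IP clients connect to
    }
}
```

If foo1 stops advertising (container or host down), foo2 takes over 192.0.2.10 within a few seconds.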

If you can put all your applications behind a reverse proxy which has its own health checking built in, then you only need to make the proxy itself a VRRP pair. The proxy could also be configured for full active-active operation, i.e. send some requests to one backend and some to the other.
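A sketch of what that could look like with HAProxy. The names, addresses and health-check endpoint are hypothetical; the bind address would be the proxy pair's VRRP virtual IP:

```
# haproxy.cfg fragment: active-active across both hosts, with health checks
frontend www
    bind 192.0.2.10:80            # the shared virtual IP
    default_backend apps

backend apps
    balance roundrobin
    option httpchk GET /healthz   # assumes the apps expose a health endpoint
    server app1 192.0.2.11:8080 check
    server app2 192.0.2.12:8080 check
```

With the `check` option, HAProxy stops sending traffic to a backend as soon as its health check fails, which is what gives you the automatic failover at the application level.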

> And what would i do to keep the containers and vms in sync on disk?

Since you have a ZFS backend, you could do incremental snapshot copies between the two hosts using Incus's optimized volume transfer, basically `incus copy ... --refresh`. You can configure Incus to generate the snapshots on a schedule, although you'd still have to run `incus copy ... --refresh` yourself periodically.
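Roughly like this, assuming host2 has already been added as a remote (the instance and remote names are just examples):

```shell
# Snapshot foo1 daily and keep a week's worth
incus config set foo1 snapshots.schedule "@daily"
incus config set foo1 snapshots.expiry 7d

# Initial full copy to the other host
incus copy foo1 host2:foo1

# Thereafter, incremental sync: only missing snapshots/deltas are transferred
incus copy foo1 host2:foo1 --refresh
```

The `--refresh` run could go in a cron job so the replica on host2 never falls far behind.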

However, the copies would then be exact clones (including their IP addresses). That would be fine if, in a failure situation, you decided to shut down foo1 and start the replica foo2. (In which case you don't want CARP or VRRP anyway.)

I think that is a simple, reliable and easy-to-understand approach. You won’t get automated failover if a host dies, but with 2 nodes that would be a dangerous thing anyway. (Read up about “split brain” to understand why).

If you want to use keepalived/CARP, then maybe you just want to replicate the important application state out-of-band, e.g. using rsync or a similar tool. If your applications use databases, there will be database-level replication mechanisms that you can configure. In all cases you'd have to be careful to reverse the direction of replication on failover; some data loss is very likely when that occurs.
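For example, a cron-driven rsync push of application state. The paths and hostnames are hypothetical:

```shell
# crontab entry on foo1: every 5 minutes, mirror the app's data dir to foo2
# (-a preserves permissions/times, -z compresses, --delete removes stale files)
*/5 * * * * rsync -az --delete /var/lib/myapp/ foo2:/var/lib/myapp/
```

Note that after a failover the direction must be swapped, otherwise the stale former primary will overwrite whatever was written on foo2 in the meantime.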

It all depends very much on the specifics of what you're doing. There's no simple one-size-fits-all solution, and high availability is very hard to get right.