Each night, we use lxc to copy a large GitLab container (server with DB) and its separate LXD storage volumes (total size: 1.6 TB) from host1 to host2 (a “cold” standby in case of emergency). Our hope is to be ready on host2 within minutes should host1 fail.
We create snapshots of the server (type container) and of all its attached volumes (type custom). As I understand it, a snapshot records the deltas between the source dataset and its state at the point in time the snapshot was taken (COW). To avoid any inconsistencies between the GitLab server (with its DB) and the attached volumes, we therefore stop everything while the backup is running (‘lxc storage volume copy …’, followed by ‘lxc copy …’ to host2). However, this backup job takes too long, and the downtime has become unacceptable.
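To make the current job concrete, here is a rough sketch of its order of operations. The container name, pool name, volume names and remote name are all placeholders I made up for this post, not our real ones, and the script only prints the commands unless DRY_RUN=0 is set. It shows the first full copy only; it does not handle the target already existing on host2.

```shell
#!/bin/sh
# Sketch of the nightly job as described above.
# All names below are hypothetical placeholders.
CONTAINER="gitlab"
POOL="default"
VOLUMES="gitlab-data gitlab-repos"
REMOTE="host2"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

backup_job() {
    # Downtime starts here: everything is stopped for the whole copy.
    run lxc stop "$CONTAINER"
    # Copy each custom volume to the standby host.
    for vol in $VOLUMES; do
        run lxc storage volume copy "$POOL/$vol" "$REMOTE:$POOL/$vol"
    done
    # Copy the container itself.
    run lxc copy "$CONTAINER" "$REMOTE:$CONTAINER"
    # Downtime ends only after all 1.6 TB have been transferred.
    run lxc start "$CONTAINER"
}

backup_job
```

The point of the sketch is that the stop and the start bracket the entire transfer, which is where the long downtime comes from.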
How does one handle consistency between a container snapshot and the snapshots of its separate storage volumes? Can snapshots help here at all? Are we using snapshots the wrong way?
Versions: Ubuntu 20.04 LTS, LXD 4.3, ZFS fs backend