Incus backup server

Hi

I have been reading documentation and have some thoughts about backup.

Based on “How to back up instances” in the Incus documentation:

  1. Snapshots are really nice, but if I have a problem with the Incus server's disk I will be in big trouble.

  2. Export from the Incus server and import on an Incus backup server.
    This solves my problem, but depending on how long a restore takes and the number of containers, this could be a problem.

  3. Copy an instance to a backup server.
    Reading this, it seems I should have an Incus cluster… not an option for me because the great majority of my clients have only 2 servers…

  4. What about DRBD, Ceph or any other system? Could it solve my problem? Looking at /var/lib/incus, I think I could put it inside a Ceph/DRBD volume and use a storage driver like dir… or, if I use LVM, I could keep the containers on LVM and put /var/lib/incus inside a DRBD/Ceph system.

Is number 4 feasible? Or is there another solution?

I saw this post about it:

But I get confused because the docs say that an Incus cluster should have 3 machines at minimum…

It would be really nice if I could use an Incus cluster, but the problem is needing 3 servers…

best regards

Hi

This is one way we do it; there are other ways that might be more appropriate for you.

Our servers have local storage of different types (SSD, NVMe, SAS HDD), and on top of them we run Incus-managed ZFS storage pools. The cluster we just call the incus-cluster. The backup server at the same site we call the site-local stand-alone; it is spec’d to handle all site-local snapshots and to take on workloads should a physical server be down. The remote DR server, in relation to the incus-cluster, we call the site-remote stand-alone: a minimally spec’d Incus server with loads of storage, able to receive and store snapshots from the production sites.

Snapshots are local to the server the instance or volume is on, so best practice is to send them somewhere else. The destination does not have to be a cluster, nor a member of the cluster that the instances, volumes and their snapshots are on. You can use the Incus client and its remotes to copy instances, volumes and their snapshots to a remote, or a remote client can push, pull or relay them via its remotes, and the transfers can be incremental with the --refresh parameter.
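
For example, with a hypothetical backup host “backup01” and instance “web01” (names are placeholders), the flow from the production side looks roughly like this:

# add the backup server as a remote (it must trust our client first)
incus remote add backup01 https://backup01.example.net:8443

# initial full copy of the instance together with its snapshots
incus copy web01 backup01:web01

# later runs only transfer the differences since the last copy
incus copy web01 backup01:web01 --refresh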

As you mentioned, there are additional options like export/import, and you also have filesystem-layer options depending on what you're using, e.g. ZFS replication.

Sticking to Incus tools, which we find very convenient: we have clusters doing frequent snapshots, and these are then incrementally copied (well, pulled) to a site-local backup Incus server using incus copy ... --refresh. On that server we do different things with the snapshots, such as:

  • turn some daily snapshots into weekly backups (take static copy)
  • turn some weekly snapshots into monthly backups (take static copy)
  • …yearly (take static copy)

and then we ship some of them off to an off-site remote for off-site DR, again using incus copy
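
To sketch the schedule side (instance, remote and timings below are just placeholders): the snapshots themselves can be automated per instance, while the copies are something you schedule yourself, e.g. from cron on the backup server:

# on production: snapshot every night at 02:00 and expire snapshots after 4 weeks
incus config set web01 snapshots.schedule "0 2 * * *"
incus config set web01 snapshots.expiry 4w

# on the site-local backup server: cron job pulling the changes incrementally
incus copy prod:web01 web01 --refresh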

We are able to restore all of our environments at the DR site very quickly and easily from these snapshots, without changing platform requirements regardless of where they are housed, so long as we use Incus and the CPU architecture we copy/restore them to is the same.

Hope this helps as an option.


Hi Remmy

Your description helps a lot. I'll try some of these commands, and I would like to take a next step using rsync, DRBD or Ceph. But you've helped me a lot. As soon as I try these I'll post a note here.
regards


Hi Remmy

Some news about backup.

  1. Try incus export at server_a:
    incus export cerberus
    OK, it generates a backup tarball cerberus.tar.gz.
    Copy this tarball to server_b.

incus import cerberus.tar.gz
Error: Failed importing backup: Failed loading profiles for instance: Profile not found

Hmm, at server_b, which is a new one only for backups, there are no profiles.

  • I could recreate the same profiles from server_a on server_b, but is this the best method?
  2. Backup via incus copy to a remote:
    At server_a:
    incus config trust add server_b
    Client server_b certificate add token:
    xxxxxx

At server_b:
incus remote add server_a --accept-certificate

incus copy --mode=pull server_a:cerberus cerberus
Error: Failed instance creation: Requested profile “11” doesn’t exist

Hmm, profile error again…

Which is the best method to back up profiles? A tar.gz of all of /var/lib/incus from server_a? Or is there another method?

regards

Hi, for profiles, we’ve not settled on a specific backup format for them so we’re doing 2 types:

  1. An incus profile copy into the same project name on the site-local stand-alone (backup) server. I think you need an initial incus profile copy to get it copied first, and then incus profile copy --refresh thereafter to keep it maintained on the backup server.
  2. incus profile show {profile-name} > {some/path}/{exported-profile}.yaml to back it up as YAML. This happens on the cluster member with the database-leader role (just to choose a server), and then it's copied into a custom storage volume that's included with the rest of the storage volume backups.

With the first option, you can schedule and keep them sync’d much like the instance and its snapshots. This seems to be the easiest option.

Other things like networks don’t have a copy option that I’m aware of, so those get exported into YAMLs as described with profiles.
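
For reference, a small sketch of that YAML export side; the target path and the csv parsing are just assumptions about how you might script it:

# dump every profile and every network definition to YAML files
mkdir -p /srv/incus-config-backup
for p in $(incus profile list -f csv | cut -d, -f1); do
    incus profile show "$p" > "/srv/incus-config-backup/profile-$p.yaml"
done
for n in $(incus network list -f csv | cut -d, -f1); do
    incus network show "$n" > "/srv/incus-config-backup/network-$n.yaml"
done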

Regarding 3: copying to a backup server does not require the server to be in the same cluster. incus copy srv1:foo srv2: works just fine (and indeed, the instance can be called “foo” on both source and destination, which would not work within a cluster). See incus copy.

Using incus copy --refresh will give an incremental copy when supported, e.g. when the source pool and destination pool are both ZFS and there’s at least one snapshot, so only the differences are copied. It’s a bit hard to find in the documentation, unless you know the term you’re looking for is optimized volume transfer.

Combined with scheduled snapshots this makes a very nice replication/DR option, although you still have to schedule the copies yourself. For DR you can quickly bring up any container on the target node or cluster.
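
As a rough illustration of the DR side (instance and snapshot names are placeholders), bringing a copied container up on the target is just a start, optionally after rolling back to a particular snapshot:

# on the DR server, where the copy already lives
incus snapshot restore web01 snap3   # optional: roll back to a known-good snapshot
incus start web01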

Regarding 4: drbd and ceph are not backups - they are replicated storage. If you delete or corrupt a file, those changes will be instantly propagated. Combined with snapshots though it’s “almost” backup. You certainly don’t want to run the “dir” driver on top of CephFS though.

There’s no native drbd support but maybe one day linstor will be added, and you might even be able to use it manually today.

Using vanilla drbd you could replicate a whole partition across to another server, but then you won’t be able to move individual containers. (Possibly for a 2-node setup you could use DRBD in dual-primary mode with LVM cluster driver; I’ve never tried it and it sounds a bit hairy. That would require an incus cluster though, and every cluster really needs a third node, even if it’s just a tiny RPi to act as a tiebreaker)

You might as well use RAID1 within a server, and if that server fails (i.e. your case 1), move the disks over to another server. Using two disks in the same server allows you to use zmirror, which is much more robust than RAID1, since it can detect and correct errors.
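
A minimal sketch of that setup, with purely hypothetical disk and pool names, would be a mirrored ZFS pool that Incus then uses as its storage backend:

# create a two-disk ZFS mirror (device names are examples only)
zpool create tank mirror /dev/sda /dev/sdb

# point an Incus storage pool at the existing zpool
incus storage create default zfs source=tank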

But really I think of RAID1/zmirror/drbd as “high availability” not “backup”. That is, your server keeps running if a disk fails, but it doesn’t protect you against data deletion, corruption, ransomware etc.

Hi Brian

You are correct.

I am planning to go step by step: from 1 simple Incus server, to Incus server + backup, and later a high-availability Incus server.
Certainly DRBD, RAID1 and Ceph are really not backups.
I started playing with Incus lvmcluster + DRBD, but it is not that simple. It involves pacemaker, corosync, Incus, LVM, DRBD and so on… 8(
Certainly the best would be native DRBD support, but I'll try DRBD with the Incus LVM driver just to see if it can help me while there is no DRBD driver for Incus.

lvm
drbd
o.s.

As soon as I finish the tests I'll describe the results here.

Thanks for the clarification.

Which part requires those? I thought lvmcluster required only lvmlockd and sanlock.

Hi Remmy

Just to answer with some examples.

In the case of backing up profiles at the Incus server (server_a):

  1. incus profile list
    11 12 24 default (my profiles)
    incus profile show 11 > 11.yaml

With this, in the worst-case scenario I can open 11.yaml, take the config and recreate the profile at the backup (server_b).

  2. server_b as backup:
    After “joining” server_b as server_a's client, I could import the profiles.

At server_b (the backup server):
incus profile copy server_a:11 local:11 --refresh

Voilà. With this I could copy profile 11 from server_a to server_b (the backup Incus server).

Perfect, now I have a backup server with containers and profiles up to date… 80)

Thanks for helping me to understand a little more incus.

Yes, lvmcluster requires only lvmlockd and sanlock, but if you want high availability you will need pacemaker, corosync and others.

Reading man page:
man LVMLOCKD(8)

DESCRIPTION
LVM commands use lvmlockd to coordinate access to shared storage.
When LVM is used on devices shared by multiple hosts, locks will:
• coordinate reading and writing of LVM metadata
• validate caching of LVM metadata
• prevent conflicting activation of logical volumes
lvmlockd uses an external lock manager to perform basic locking.

Lock manager (lock type) options are:
• sanlock: places locks on disk within LVM storage.
• dlm: uses network communication and a cluster manager.

Maybe I am wrong, but I think that you will need pacemaker + corosync + dlm + lvmlockd to start with Incus using lvmcluster…

Unfortunately there is not much information about how to set up lvmlockd + DRBD. The best that I could find was the SUSE docs (SLE HA 15 SP2 | Administration Guide | DRBD).

Sorry, I don’t get it. What’s wrong with using sanlock, instead of dlm?

Unfortunately there is not much information about how to set up lvmlockd + DRBD.

drbd doesn’t care. It provides shared block storage (when running dual-primary). It’s up to the lvmcluster layer to partition it, and lvmlockd/sanlock protects concurrent access by the LVM manager.
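
A rough sketch of that layering, assuming the dual-primary DRBD resource shows up as /dev/drbd0 on both nodes (device, VG and pool names are placeholders, and the last step assumes your Incus version ships the lvmcluster driver):

# on both nodes: set use_lvmlockd = 1 in /etc/lvm/lvm.conf, then
systemctl enable --now lvmlockd sanlock

# on one node: create a shared VG on top of the DRBD device
vgcreate --shared incus /dev/drbd0

# on every node: start the lockspace for the shared VG
vgchange --lock-start

# finally, point Incus at the shared VG
incus storage create shared lvmcluster source=incus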

Hi

Sorry for the delay…

drbd + lvmlockd + sanlock works like a charm…

The next step is to include a second machine to see how it works with high availability…

As soon as I do, I'll post here… 80)

Hi again

OS: Ubuntu 24.04 LTS

Playing with lvmcluster + lvmlockd + sanlock.

The first Incus server is OK:

drbd as primary-primary - ok
#) cat /proc/drbd
version: 8.4.11 (api:1/proto:86-101)
srcversion: 211FB288A383ED945B83420
0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
ns:2621440 nr:317066365 dw:317066245 dr:2646506 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

  1. lvmlockd - ok

  2. sanlock - ok

I can create as many machines as I like.
As an example:
— Logical volume —
LV Path /dev/incus/containers_painel
LV Name containers_painel
VG Name incus
LV UUID FzxnG1-zm8Y-FoHa-Cxos-YP0Q-0rTc-lBM56X
LV Write Access read/write
LV Creation host, time zeus, 2024-09-03 15:39:31 -0300
LV snapshot status source of
containers_painel-snap1 [active]
containers_painel-snap0 [active]
LV Status available
# open 1
LV Size 10,00 GiB
Current LE 2560
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 252:25

BUT
second machine:

  1. drbd ok:

#) cat /proc/drbd
version: 8.4.11 (api:1/proto:86-101)
srcversion: 211FB288A383ED945B83420
0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
ns:2621440 nr:317066365 dw:317066245 dr:2646506 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

  2. lvmlockd: I think it is OK, but I do not know how to check:

systemctl status lvmlockd

● lvmlockd.service - LVM lock daemon
Loaded: loaded (/usr/lib/systemd/system/lvmlockd.service; disabled; preset: enabled)
Active: active (running) since Wed 2024-09-11 09:11:28 -03; 18min ago
Docs: man:lvmlockd(8)
Main PID: 182953 (lvmlockd)
Tasks: 3 (limit: 38157)
Memory: 628.0K (peak: 1.1M)
CPU: 9ms
CGroup: /system.slice/lvmlockd.service
└─182953 /usr/sbin/lvmlockd --foreground

set 11 09:11:28 pauloric systemd[1]: Starting lvmlockd.service - LVM lock daemon…
set 11 09:11:28 pauloric lvmlockd[182953]: [D] creating /run/lvm/lvmlockd.socket
set 11 09:11:28 pauloric lvmlockd[182953]: 1726056688 lvmlockd started
set 11 09:11:28 pauloric systemd[1]: Started lvmlockd.service - LVM lock daemon.

  3. sanlock: I think it is OK, but I do not know how to check:

systemctl status sanlock.service

● sanlock.service - Shared Storage Lease Manager
Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; preset: enabled)
Active: active (running) since Wed 2024-09-11 09:11:55 -03; 20min ago
Docs: man:sanlock(8)
Process: 183001 ExecStart=/usr/sbin/sanlock daemon $sanlock_opts (code=exited, status=0/SUCCESS)
Main PID: 183002 (sanlock)
Tasks: 6 (limit: 38157)
Memory: 13.9M (peak: 14.9M)
CPU: 106ms
CGroup: /system.slice/sanlock.service
├─183002 /usr/sbin/sanlock daemon
└─183003 /usr/sbin/sanlock daemon

set 11 09:11:55 pauloric systemd[1]: Starting sanlock.service - Shared Storage Lease Manager…
set 11 09:11:55 pauloric systemd[1]: Started sanlock.service - Shared Storage Lease Manager.

#) lvdisplay. Hmm, I can see all the machines, but they are “NOT available”.
— Logical volume —
LV Path /dev/incus/containers_painel
LV Name containers_painel
VG Name incus
LV UUID FzxnG1-zm8Y-FoHa-Cxos-YP0Q-0rTc-lBM56X
LV Write Access read/write
LV Creation host, time zeus, 2024-09-03 15:39:31 -0300
LV snapshot status source of
containers_painel-snap1 [INACTIVE]
containers_painel-snap0 [INACTIVE]
LV Status NOT available
LV Size 10,00 GiB
Current LE 2560
Segments 1
Allocation inherit
Read ahead sectors auto

Questions:

  1. Should they all be “NOT available”?
  2. On the second Incus server I could not initialize Incus using:
    incus admin init… (whether or not I create storage…).

I would appreciate any help on this subject. 8)

OK, some commands that could help.
At the first Incus server, which is running OK:
zeus:~# lvmlockctl -i
VG incus lock_type=sanlock uAqoPm-vxzU-rwsm-PBVa-S2DY-aPUX-y00uOA
LS sanlock lvm_incus
LK VG un ver 55
LK LV ex XW6etX-TBhq-LVih-AebN-M96C-BaWx-OMbcbR
LK LV ex ZSfxhU-pgoD-oHSS-JhBc-Mt9n-TS8n-4dkMV9
LK LV ex FzxnG1-zm8Y-FoHa-Cxos-YP0Q-0rTc-lBM56X
LK GL un ver 0

zeus# sanlock client host_status
lockspace lvm_incus
5 timestamp 5896

zeus:~# sanlock client status
daemon 737fc3e8-8bc4-449c-8dc0-692292346911.zeus
p -1 helper
p -1 listener
p 4626 lvmlockd
p -1 status
s lvm_incus:5:/dev/mapper/incus-lvmlock:0
r lvm_incus:FzxnG1-zm8Y-FoHa-Cxos-YP0Q-0rTc-lBM56X:/dev/mapper/incus-lvmlock:71303168:11 p 4626
r lvm_incus:ZSfxhU-pgoD-oHSS-JhBc-Mt9n-TS8n-4dkMV9:/dev/mapper/incus-lvmlock:74448896:8 p 4626
r lvm_incus:XW6etX-TBhq-LVih-AebN-M96C-BaWx-OMbcbR:/dev/mapper/incus-lvmlock:72351744:13 p 4626

Second Incus server:
pauloric:~# lvmlockctl -i
Hmm, nothing…

pauloric:~# sanlock client status
daemon 6cab46ef-3905-4a5c-b3b7-6576e6e7e686.pauloric
p -1 helper
p -1 listener
p -1 status

pauloric:~# sanlock client host_status
pauloric:~#
nothing

Hi, me again.

Of course it was something stupid on my part…

pauloric#) lvmlockctl -i

nothing

pauloric#) vgchange --lock-start

# lvmlockctl -i
VG incus lock_type=sanlock uAqoPm-vxzU-rwsm-PBVa-S2DY-aPUX-y00uOA
LS sanlock lvm_incus
LK VG un ver 64

pauloric#) sanlock client status

daemon 75fa3d65-6a51-4812-89ba-7b6bd01dc761.pauloric
p -1 helper
p -1 listener
p 386478 lvmlockd
p -1 status
s lvm_incus:3:/dev/mapper/incus-lvmlock:0
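
One follow-up note in case it helps anyone: the lock-start step is per boot, so it has to be repeated after a reboot. A hedged sketch (lvm2 upstream ships an lvmlocks.service unit that runs vgchange --lock-start; check whether your distribution packages it):

# if the unit is packaged, let it start the lockspaces at boot
systemctl enable --now lvmlocks.service

# otherwise, run this manually (or from your own unit) after lvmlockd and sanlock are up
vgchange --lock-start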