LXC snapshots.schedule recommendations and questions

Hi there!

I just enabled automated snapshots and I was wondering if anyone could provide any safe recommendations for the values of the different parameters.

My current values are:

lxc config set c1 snapshots.expiry "1w"
lxc config set c1 snapshots.schedule "1 * * * *"
lxc config set c1 snapshots.schedule.stopped false
lxc config set c1 snapshots.pattern "snapshot-%d"

Questions:

  • Is it possible to specify an alternative destination/target drive/pool for snapshots?
  • How hard does it get on the storage I/O? I am worried about snapshots for DB containers.
  • Any example of a slightly more sophisticated pongo2 pattern?
  • Is there any reason not to include this in the default profile so it’s used by every running container?
  • Is there a way to delete expired snapshots only “up to a point”? For example, stopped containers produce no new snapshots, so after a week all of their snapshots will have expired and none will remain. Is there a way to keep, let’s say, the last 10 snapshots, even if expired?

Thank you!

  1. No, snapshots are always stored on the same pool as the volume/container they belong to, because the majority of backends require that
  2. Depends on the storage backend in use, it’s virtually free on zfs/btrfs, a bit more costly on lvm/ceph and insanely costly on dir
  3. Not off the top of my head. The only exported property is creation_date, so you can format the date whichever way you want but that’s about it
  4. Not really, especially on a storage backend where it’s cheap, having snapshots taken regularly can be quite useful
  5. We don’t currently have such a logic, no. You could have an external script handling this though, or we may add a feature to set a minimum number of snapshots to keep at some later point.
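A minimal sketch of the kind of external script mentioned in point 5, not anything LXD ships. It assumes snapshots.expiry is left unset (so LXD does not remove snapshots on its own), that jq is installed, and that the server exposes /1.0/instances (older releases use /1.0/containers). The function names select_prunable and prune_snapshots are made up for illustration.

```shell
#!/bin/sh
# Sketch: keep only the newest KEEP snapshots of an instance.

# Read snapshot names on stdin, oldest first, and print every name
# except the last KEEP ones. The printed names are the ones to delete.
select_prunable() {
    awk -v keep="$1" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }
    '
}

# Hypothetical wrapper: list the snapshots of instance $1 through the
# REST API, sort them oldest first, and delete all but the newest $2.
prune_snapshots() {
    instance="$1"
    keep="$2"
    lxc query "/1.0/instances/$instance/snapshots?recursion=1" \
        | jq -r 'sort_by(.created_at) | .[].name' \
        | sed 's|.*/||' \
        | select_prunable "$keep" \
        | while IFS= read -r snap; do
              # Redirect stdin so lxc cannot consume the list being read.
              lxc delete "$instance/$snap" </dev/null
          done
}
```

Something like `prune_snapshots c1 10` could then be run from cron after sourcing the file. The selection step is kept separate from the lxc calls so it can be checked on plain text before anything is deleted.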

Thanks a lot @stgraber!

Good for me, I’m using ZFS.

I left all the containers on the cluster creating snapshots every minute for a few hours and it’s running smoothly. I will leave it going all night, just because I can… and then set it to one snapshot per hour.

There’s only one possible issue I can think of: when adding those settings on the default profile, every container on the cluster will snapshot at the very same time, it might be a good idea to enable some way of delaying (cascading) this event based on the number of containers using the profile.

Cheers!

Whether done through the profile or individually per container, we’re still effectively waking up once per minute and processing them all at that time.

The initial implementation had second granularity, which avoided that problem, but it also caused a full process/CPU wake-up every second, completely destroying the battery life of laptops shipping with LXD (like Chromebooks) and making LXD consume a much higher than normal amount of resources even on servers.

I believe we’re currently doing those snapshots sequentially, so things shouldn’t be too resource intensive when they trigger.
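For anyone who still wants hourly snapshots spread across the hour rather than all firing on the same minute, here is a rough sketch of one workaround: give each container its own snapshots.schedule, which overrides the profile value, with the minute derived from the container name. The helper names stagger_minute and apply_staggered_schedule are hypothetical, and the checksum is only a cheap way to get a stable number per name.

```shell
#!/bin/sh
# Sketch: derive a stable minute (0-59) from a container name so that
# hourly snapshots do not all start at the same minute.

stagger_minute() {
    # cksum prints "<crc> <size>"; reduce the CRC to a minute of the hour.
    printf '%s' "$1" | cksum | awk '{ print $1 % 60 }'
}

# Hypothetical helper: set a per-container hourly schedule on every
# container returned by "lxc list".
apply_staggered_schedule() {
    lxc list --format csv -c n | while IFS= read -r name; do
        minute="$(stagger_minute "$name")"
        lxc config set "$name" snapshots.schedule "$minute * * * *" </dev/null
    done
}
```

Two containers can still land on the same minute, so this only spreads the load on average. It does not guarantee distinct slots.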

Got it! I’m amazed by the depth of LXD, every day I learn some extra capabilities it brings to LXC, you rock!

After letting the cluster make snapshots every minute for every container for one night (about 18 containers on 3 nodes), I’ve noticed some discrepancies, for example:

| test01 | STOPPED |  |  | PERSISTENT | 939 | host01 |
+--------+---------+--+--+------------+-----+--------+
| test02 | STOPPED |  |  | PERSISTENT | 944 | host02 |
+--------+---------+--+--+------------+-----+--------+
| test31 | STOPPED |  |  | PERSISTENT | 956 | host01 |
+--------+---------+--+--+------------+-----+--------+
| test32 | STOPPED |  |  | PERSISTENT | 961 | host02 |

All containers started taking snapshots at the very same time (I applied the schedule to the default profile), but the number of snapshots differs…

Another issue: As per the docs:
snapshots.expiry string - no snapshot_expiry Controls when snapshots are to be deleted (expects expression like 1M 2H 3d 4w 5m 6y )

So I would guess 1M is one minute and 1m is one month (tested both by applying it to the default profile)
The expected result was: all snapshots older than 1 minute would be deleted.

However, nothing happened at all.

Does this mean the snapshot itself is given an expiry date and will live that long even if you change the expiry date in the profile after it was created?

Is it possible to set more than one parameter at a time with lxc profile set?

lxc profile set m5k limits.cpu 1 (works fine)

lxc profile set m5k limits.cpu 1 limits.cpu.priority 0
returns: Error: Invalid key=value configuration: limits.cpu

lxc profile set m5k limits.cpu 1, limits.cpu.priority 0
also returns: Error: Invalid key=value configuration: limits.cpu

limits.cpu=1 limits.cpu.priority=0 should work.

snapshots.expiry indeed sets the default expiry_date for the snapshot when it gets created (if none is provided by the user), existing snapshots will not be altered.

As for the different number of snapshots, that’s interesting. Have all those containers been running the entire time? If so, then look at lxd.log to see if an error got logged.

limits.cpu=1 limits.cpu.priority=0 should work.

Yes it does! Thank you. However, it’s a bit weird that it accepts the value without “=” for a single parameter but requires it for multiple parameters. It’s kind of confusing (it was to me).

snapshots.expiry indeed sets the default expiry_date for the snapshot when it gets created
Understood.

Question now is, how do I delete all those snapshots from the containers I want to keep?
I tried using:

lxc rm con01/snap*
and
lxc rm con01/*

but it doesn’t like the asterisk:
Error: Failed to fetch snapshot "snap*" of instance "con01" in project "default": No such object
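The error suggests the client passes snap* through as a literal snapshot name instead of expanding it. One possible workaround, sketched here under a few assumptions, is to list the snapshot names yourself, filter them with a shell glob, and delete them one at a time. It assumes jq is installed, the default project, and a server that exposes /1.0/instances. The function names match_snapshots and delete_snapshots are invented for this example.

```shell
#!/bin/sh
# Sketch: delete the snapshots of an instance whose names match a glob.

# Read snapshot names on stdin and print those matching glob pattern $1.
match_snapshots() {
    pattern="$1"
    while IFS= read -r name; do
        case "$name" in
            # Left unquoted on purpose so the shell treats it as a glob.
            $pattern) printf '%s\n' "$name" ;;
        esac
    done
}

# Hypothetical wrapper: fetch the snapshot URLs of instance $1, reduce
# them to bare names, and delete the ones matching pattern $2.
delete_snapshots() {
    instance="$1"
    glob="$2"
    lxc query "/1.0/instances/$instance/snapshots" \
        | jq -r '.[]' \
        | sed 's|.*/||' \
        | match_snapshots "$glob" \
        | while IFS= read -r snap; do
              # Redirect stdin so lxc cannot consume the list being read.
              lxc delete "$instance/$snap" </dev/null
          done
}
```

With the file sourced, `delete_snapshots con01 'snap*'` would remove the matching snapshots and `delete_snapshots con01 '*'` all of them. Quote the pattern so the shell does not expand it against local files first.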