LXD stopped taking container snapshots

Hi

I have LXD running on Ubuntu 21.04 with ZFS. Since 4/13/21 no container snapshots have been taken. My LXD is a snap package, which updates automatically. Was there any change in the last update that affected snapshot taking? Currently my snapshot setup is as follows:

  snapshots.expiry: 2w
  snapshots.pattern: arch-snapshot-%d
  snapshots.schedule: 0 6,12-22/2 * * *
  snapshots.schedule.stopped: "no"
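
For reference, here is a minimal Python sketch of which hours the schedule above is expected to fire on. The helper `expand_field` is a hypothetical name written for this illustration and is not part of LXD. It handles only plain numeric cron fields (lists, ranges, steps and `*`), which is enough for this pattern.

```python
def expand_field(field, low, high):
    """Expand one numeric cron field (e.g. "6,12-22/2") into a sorted list."""
    values = set()
    for part in field.split(","):            # a comma separates alternatives
        rng, _, step = part.partition("/")   # optional "/step" suffix
        step = int(step) if step else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return sorted(values)


# The first two fields of a five-field cron spec are minute and hour.
minute, hour, *_ = "0 6,12-22/2 * * *".split()
print(expand_field(minute, 0, 59))  # [0]
print(expand_field(hour, 0, 23))    # [6, 12, 14, 16, 18, 20, 22]
```

Read this way, a snapshot should be taken on the hour at 06:00 and then every two hours from 12:00 through 22:00.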

Any ideas how to restore snapshot taking?
I can take manual snapshots with the command “lxc snapshot”, but the scheduled snapshots do not work.

Anything in lxd.log that would point to what’s wrong with the snapshot scheduling?

I’ve checked on mine and my daily snapshots are still happening but I don’t have as complex a pattern as yours.

Looking at lxd.log, I am only getting messages about pruning old snapshots, not about creating new ones:

t=2021-04-21T19:13:21-0700 lvl=info msg="Started container" action=start created=2021-04-19T11:49:05-0700 ephemeral=false instance=manjaro instanceType=container project=default stateful=false used=2021-04-21T18:13:43-0700
t=2021-04-21T19:22:31-0700 lvl=info msg="Shut down container" action=stop created=2021-04-19T11:49:05-0700 ephemeral=false instance=manjaro instanceType=container project=default stateful=false used=2021-04-21T19:13:21-0700
t=2021-04-21T19:22:35-0700 lvl=info msg="Starting container" action=start created=2021-04-19T11:49:05-0700 ephemeral=false instance=manjaro instanceType=container project=default stateful=false used=2021-04-21T19:13:21-0700
t=2021-04-21T19:22:35-0700 lvl=info msg="Started container" action=start created=2021-04-19T11:49:05-0700 ephemeral=false instance=manjaro instanceType=container project=default stateful=false used=2021-04-21T19:13:21-0700
t=2021-04-21T19:24:14-0700 lvl=info msg="Shut down container" action=stop created=2021-04-19T11:49:05-0700 ephemeral=false instance=manjaro instanceType=container project=default stateful=false used=2021-04-21T19:22:35-0700
t=2021-04-21T19:31:59-0700 lvl=info msg="Pruning expired instance backups" 
t=2021-04-21T19:31:59-0700 lvl=info msg="Updating images" 
t=2021-04-21T19:31:59-0700 lvl=info msg="Done pruning expired instance backups" 
t=2021-04-21T19:31:59-0700 lvl=info msg="Done updating images" 
t=2021-04-21T20:00:25-0700 lvl=info msg="Pruning expired instance snapshots" 
t=2021-04-21T20:00:25-0700 lvl=info msg="Deleting container" created=2021-04-07T20:00:06-0700 ephemeral=false instance=arch/arch-snapshot-130 instanceType=container project=default used=0001-01-01T00:00:00+0000
t=2021-04-21T20:00:25-0700 lvl=info msg="Done pruning expired instance snapshots" 
t=2021-04-21T20:00:25-0700 lvl=info msg="Deleted container" created=2021-04-07T20:00:06-0700 ephemeral=false instance=arch/arch-snapshot-130 instanceType=container project=default used=0001-01-01T00:00:00+0000
t=2021-04-21T20:31:59-0700 lvl=info msg="Updating images" 
t=2021-04-21T20:31:59-0700 lvl=info msg="Pruning expired instance backups" 
t=2021-04-21T20:31:59-0700 lvl=info msg="Done updating images" 
t=2021-04-21T20:31:59-0700 lvl=info msg="Done pruning expired instance backups" 
t=2021-04-21T21:31:59-0700 lvl=info msg="Pruning expired instance backups" 
t=2021-04-21T21:31:59-0700 lvl=info msg="Updating images" 
t=2021-04-21T21:31:59-0700 lvl=info msg="Done updating images" 
t=2021-04-21T21:31:59-0700 lvl=info msg="Done pruning expired instance backups" 
t=2021-04-21T21:50:39-0700 lvl=info msg="Starting container" action=start created=2021-04-19T11:49:05-0700 ephemeral=false instance=manjaro instanceType=container project=default stateful=false used=2021-04-21T19:22:35-0700
t=2021-04-21T21:50:40-0700 lvl=info msg="Started container" action=start created=2021-04-19T11:49:05-0700 ephemeral=false instance=manjaro instanceType=container project=default stateful=false used=2021-04-21T19:22:35-0700
t=2021-04-21T21:55:14-0700 lvl=info msg="Starting container" action=start created=2021-04-19T11:49:05-0700 ephemeral=false instance=manjaro instanceType=container project=default stateful=false used=2021-04-21T21:50:39-0700
t=2021-04-21T21:55:15-0700 lvl=info msg="Started container" action=start created=2021-04-19T11:49:05-0700 ephemeral=false instance=manjaro instanceType=container project=default stateful=false used=2021-04-21T21:50:39-0700
t=2021-04-21T22:08:05-0700 lvl=info msg="Starting container" action=start created=2021-04-19T11:49:05-0700 ephemeral=false instance=manjaro instanceType=container project=default stateful=false used=2021-04-21T21:55:14-0700
t=2021-04-21T22:08:05-0700 lvl=info msg="Started container" action=start created=2021-04-19T11:49:05-0700 ephemeral=false instance=manjaro instanceType=container project=default stateful=false used=2021-04-21T21:55:14-0700
t=2021-04-22T07:10:42-0700 lvl=info msg="Updating images" 
t=2021-04-22T07:10:42-0700 lvl=info msg="Pruning expired instance backups" 
t=2021-04-22T07:10:42-0700 lvl=info msg="Done updating images" 
t=2021-04-22T07:10:42-0700 lvl=info msg="Done pruning expired instance backups" 
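
A log like this can be narrowed down to snapshot-related entries with a few lines of Python. This is only a sketch: it assumes each line is a series of `key=value` pairs with double-quoted values, as in the excerpt above. The helper names `parse_line` and `snapshot_messages` are made up for the example.

```python
import shlex


def parse_line(line):
    """Split one key=value log line into a dict; quoted values keep spaces."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def snapshot_messages(lines):
    """Yield (timestamp, message) for entries whose msg mentions snapshots."""
    for line in lines:
        entry = parse_line(line)
        if "snapshot" in entry.get("msg", "").lower():
            yield entry.get("t"), entry["msg"]


# Two lines taken from the log excerpt above.
sample = [
    't=2021-04-21T20:00:25-0700 lvl=info msg="Pruning expired instance snapshots" ',
    't=2021-04-21T20:31:59-0700 lvl=info msg="Updating images" ',
]
for when, msg in snapshot_messages(sample):
    print(when, msg)
```

Passing an open file handle for lxd.log instead of `sample` would list every entry whose message mentions snapshots. The filter looks only at the `msg` field, so entries such as "Deleting container" are not matched even when the instance name contains "snapshot".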

Where is the script that generates the snapshots? Maybe I can run it manually and see what the issue is.

Ok, can you run lxc monitor --type=logging --pretty in a terminal and let it run until at least the next scheduled snapshot should happen? That may give us some more details on what’s going on.

LXD doesn’t run scripts to do this kind of stuff, we have an internal scheduler which interprets those cron patterns and then triggers the snapshot when needed. I suspect we may have broken something when adding the new alias logic.

My gut feeling is that the comma in the pattern is the issue but maybe I’m wrong :slight_smile:

At 6pm a snapshot of the arch container was supposed to be created. It deleted an old snapshot, but no new snapshot was created. Here is the log.

DBUG[04-22|18:01:19] New task Operation: 9169cae9-2207-4b89-bf10-88021289c8d3 
INFO[04-22|18:01:19] Pruning expired instance snapshots 
DBUG[04-22|18:01:19] Started task operation: 9169cae9-2207-4b89-bf10-88021289c8d3 
INFO[04-22|18:01:19] Deleting container                       used="0001-01-01 00:00:00 +0000 UTC" created="2021-04-08 18:00:41.181594745 -0700 PDT" ephemeral=false instance=arch/arch-snapshot-131 instanceType=container project=default
INFO[04-22|18:01:19] Done pruning expired instance snapshots 
DBUG[04-22|18:01:19] DeleteInstanceSnapshot started           pool=lxd project=default driver=zfs instance=arch/arch-snapshot-131
DBUG[04-22|18:01:19] Deleting instance snapshot volume        pool=lxd project=default snapshotName=arch-snapshot-131 volName=arch driver=zfs instance=arch/arch-snapshot-131
DBUG[04-22|18:01:19] DeleteInstanceSnapshot finished          driver=zfs instance=arch/arch-snapshot-131 pool=lxd project=default
DBUG[04-22|18:01:19] UpdateInstanceBackupFile started         instance=arch pool=lxd project=default driver=zfs
DBUG[04-22|18:01:19] Skipping unmount as in use               driver=zfs pool=lxd refCount=1
DBUG[04-22|18:01:19] Success for task operation: 9169cae9-2207-4b89-bf10-88021289c8d3 
DBUG[04-22|18:01:19] UpdateInstanceBackupFile finished        driver=zfs instance=arch pool=lxd project=default
INFO[04-22|18:01:19] Deleted container                        created="2021-04-08 18:00:41.181594745 -0700 PDT" ephemeral=false instance=arch/arch-snapshot-131 instanceType=container project=default used="0001-01-01 00:00:00 +0000 UTC"

I have a temporary workaround.
I changed

snapshots.schedule: 0 6,12-22/2 * * *
to
snapshots.schedule: '@hourly'

Now I am getting snapshots. There is an issue with the old syntax.

stgraber@castiana:~$ lxc list a -c n,snapshots.schedule,S
+------+--------------------+-----------+
| NAME | SNAPSHOTS SCHEDULE | SNAPSHOTS |
+------+--------------------+-----------+
| a1   | 0 6,12-22/2 * * *  | 0         |
+------+--------------------+-----------+
| a2   | 0 6-22/2 * * *     | 0         |
+------+--------------------+-----------+
| a3   | 0 6,12-22 * * *    | 0         |
+------+--------------------+-----------+
| a4   | 0 6-22 * * *       | 0         |
+------+--------------------+-----------+
| a5   | 0 6,7,8,9,10 * * * | 0         |
+------+--------------------+-----------+

Running this here, let’s see if any of them trigger.

Confirmed that the comma is the issue, it’s breaking the cron pattern.
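
To illustrate how a comma can break an otherwise valid pattern, here is a hypothetical Python sketch. It is not LXD's actual code. It assumes a scheduler that accepts several schedules in one string and separates them on every comma. All three function names are invented for the example, and the comma-plus-space variant is shown only as one possible way to avoid the ambiguity.

```python
def split_naive(value):
    """Hypothetical: treat every comma as a separator between schedules."""
    return [s.strip() for s in value.split(",")]


def split_on_comma_space(value):
    """Hypothetical alternative: only ", " separates schedules."""
    return [s.strip() for s in value.split(", ")]


def looks_like_cron(spec):
    """Rough shape check only: an @alias or exactly five fields."""
    return spec.startswith("@") or len(spec.split()) == 5


schedule = "0 6,12-22/2 * * *"
for splitter in (split_naive, split_on_comma_space):
    parts = splitter(schedule)
    print(splitter.__name__, parts, [looks_like_cron(p) for p in parts])
```

Under the naive split, the schedule is cut into "0 6" and "12-22/2 * * *", and neither fragment is a five-field cron spec. The second splitter leaves the pattern whole while still separating a list such as "@hourly, @daily".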

https://github.com/lxc/lxd/pull/8711 should fix your issue and also properly allow for multiple schedules.