We seem to be running into a possible filesystem race condition when creating snapshots of running LXD containers. To reproduce, I started a container and then ran two concurrent loops inside it (in /root), one deleting files and one creating them:
$ while true; do for i in $(seq 1 100); do rm -f file$i; done; done &
$ while true; do for i in $(seq 1 100); do touch file$i; done; done &
While those loops are running, create a (stateless) snapshot as usual. Occasionally, it fails with the following message:
$ lxc snapshot test snap
Error: Create instance snapshot: lstat /var/lib/lxd/storage-pools/default/containers/test/rootfs/root/file23: no such file or directory
I’ve reproduced this on 5.0.2 LTS, and we’ve also run into it on the older 4.0.7 LTS. We’re using btrfs storage. Even though the command fails, the underlying btrfs snapshot is still created, so a snapshot with the same name cannot be created again until that leftover is cleaned up manually.
Stopping the container before taking the snapshot is a sufficient workaround for us, and retrying may also be an option. I just wanted to raise awareness in case this isn’t the intended behaviour.
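If retrying is acceptable, a small wrapper along these lines could automate it. This is a minimal sketch, not an LXD feature: `snapshot_with_retry` is a hypothetical helper, and the attempt count and one-second pause are arbitrary. Based on the behaviour described above, it assumes a failed attempt can leave a stale btrfs snapshot behind that blocks reusing its name, so each attempt uses a fresh name instead of retrying the same one. It does not clean up those leftovers.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper, not part of LXD): retry a stateless
# snapshot of a running container a few times.
# Assumption: a failed attempt may leave a stale btrfs snapshot behind that
# blocks reusing its name, so every attempt uses a fresh name.
# Stale leftovers are NOT cleaned up here; that still has to be done by hand.

# Usage: snapshot_with_retry CONTAINER PREFIX [MAX_ATTEMPTS]
# Prints the name of the snapshot that succeeded and returns 0,
# or returns 1 if every attempt failed.
snapshot_with_retry() {
    local container=$1
    local prefix=$2
    local max_attempts=${3:-5}
    local attempt=1
    local name

    while [ "$attempt" -le "$max_attempts" ]; do
        name="${prefix}-${attempt}"
        # Send any lxc stdout to stderr so stdout carries only the result name.
        if lxc snapshot "$container" "$name" >&2; then
            echo "$name"
            return 0
        fi
        echo "attempt $attempt ($name) failed" >&2
        attempt=$((attempt + 1))
        # Short pause before the next attempt, skipped after the last one.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep 1
        fi
    done
    return 1
}
```

For example, `snapshot_with_retry test "snap-$(date +%Y%m%d-%H%M%S)"` would try `snap-<timestamp>-1`, `snap-<timestamp>-2`, and so on against the container from the reproduction above. Any names from failed attempts would still need the same manual cleanup.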