Running LXD container snapshot - race condition

We seem to be running into a possible filesystem race condition when creating snapshots of running LXD containers. To reproduce, I started a container then ran two concurrent jobs in it to create and delete files:

$ while true; do for i in $(seq 1 100); do rm -f file$i; done; done &
$ while true; do for i in $(seq 1 100); do touch file$i; done; done &

While that’s running, create a (stateless) snapshot as usual. Occasionally, it’ll fail with the following message:

$ lxc snapshot test snap
Error: Create instance snapshot: lstat /var/lib/lxd/storage-pools/default/containers/test/rootfs/root/file23: no such file or directory

I’ve reproduced this on the 5.0.2 LTS, and we’ve also run into it on the older 4.0.7 LTS. We’re using btrfs storage. Even when the command fails, the underlying btrfs snapshot is created successfully, and that leftover prevents creating another snapshot of the same name until it is cleaned up manually.

Stopping the container is a sufficient workaround for us, and retrying may also be an option - just wanted to bring some awareness in case this isn’t the intended behaviour.
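For anyone who wants to try the retry route, here is a rough sketch of how it could look. It assumes a fresh snapshot name on every attempt, since the leftover btrfs snapshot blocks reusing the same name. The function name `snapshot_with_retry`, the `-N` suffix scheme, and the `RETRY_DELAY` variable are made up for illustration and are not part of LXD. Leftovers from failed attempts are not cleaned up here.

```shell
#!/bin/sh
# Hypothetical helper: retry "lxc snapshot" with a new name per attempt.
# Usage: snapshot_with_retry <instance> <base-name> [max-attempts]
snapshot_with_retry() {
    instance=$1
    base=$2
    max=${3:-5}
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        # A distinct name avoids colliding with a leftover from a failed attempt.
        name="${base}-${attempt}"
        # Send any lxc output to stderr so stdout carries only the final name.
        if lxc snapshot "$instance" "$name" >&2; then
            echo "$name"
            return 0
        fi
        attempt=$((attempt + 1))
        # RETRY_DELAY is an assumed knob; it defaults to one second.
        sleep "${RETRY_DELAY:-1}"
    done
    return 1
}
```

Calling `snapshot_with_retry test snap` prints the name of the snapshot that succeeded, or returns non-zero if every attempt failed.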

I have encountered the same issue, thanks for explaining it.

Are you able to reproduce this using the latest/edge channel?

Please note that you should not upgrade your existing system to this channel. Instead, use a fresh system that can be destroyed afterwards:

sudo snap refresh lxd --channel=latest/edge

I’m able to reproduce on the following revision:

$ sudo snap list lxd
Name  Version      Rev    Tracking     Publisher   Notes
lxd   git-49b9c78  24817  latest/edge  canonical✓  -

It seems to be specific to the btrfs driver. I gave it a try with zfs and it didn’t reproduce, but with btrfs it took only a few seconds to fail when creating and deleting snapshots in a loop.
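For reference, the create/delete loop can be sketched roughly as below. This is an illustration rather than the exact commands I ran: the function name `snapshot_until_failure` and the iteration cap are assumptions, and it expects the file create/delete jobs to already be running inside the container.

```shell
#!/bin/sh
# Hypothetical helper: snapshot and delete repeatedly until a snapshot fails.
# Usage: snapshot_until_failure <instance> <snapshot-name> [max-iterations]
snapshot_until_failure() {
    instance=$1
    name=$2
    max=${3:-1000}
    i=0
    while [ "$i" -lt "$max" ]; do
        if ! lxc snapshot "$instance" "$name" >&2; then
            # Print how many iterations succeeded before the failure.
            echo "$i"
            return 1
        fi
        # Remove the snapshot so the same name can be reused next time round.
        lxc delete "$instance/$name" >&2
        i=$((i + 1))
    done
    # No failure observed within the cap.
    echo "$i"
    return 0
}
```

Running `snapshot_until_failure test snap` prints the number of successful iterations and returns non-zero as soon as a snapshot fails.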

Thanks for confirming.

Could you please open an issue here: https://github.com/lxc/lxd/issues

Done, https://github.com/lxc/lxd/issues/11682
