Transient Error -- Error: LXD is shutting down

It might just be a coincidence, but I’ve recently started using the security.nesting=true workaround to get Arch Linux containers working correctly on Ubuntu (see No IPv4 on unprivileged Arch container), and after a while I get the following intermittent error when I try to execute an lxc command:

Error: LXD is shutting down

In one case I have a CI script that fires up a container and then uses lxc file push + lxc exec four times to test some scripts inside the container. It occasionally fails with the error on the lxc stop at the end, during cleanup.

In another case I have a local container which I have been jumping in and out of with lxc exec to test some script error handling and after a handful of execs I got the error.

I’ve not managed to reproduce this directly by just running lots of lxc commands in quick succession, but the CI build only lasts 30 minutes and uses a fresh container each time. Likewise the local container was up overnight, but the error appeared within about the same time frame (purely anecdotal, sorry).

All three machines I’ve seen this on are:

$ lxd --version
$ lsb_release -d
Description:    Ubuntu 18.04.5 LTS

This feels a bit woolly so is there something else I can do or run when it appears to help unearth what might be going on?

Note: I don’t remember seeing this error before in the last 12 months since starting to use LXD and googling didn’t turn anything up either which surprised me.

Can you enable debug logging and see what the logs say when you get that error:

sudo snap set lxd daemon.debug=true; sudo systemctl reload snap.lxd.daemon
sudo tail -f /var/snap/lxd/common/lxd/logs/lxd.log

It does indeed appear to be purely a coincidence and I’m not sure how I’ve not run into this before. After doing some more digging I ran snap changes on my own machine and it said:

ID   Status  Spawn               Ready               Summary
252  Done    today at 10:26 GMT  today at 10:31 GMT  Auto-refresh snap "lxd"

I went and checked the other machines and they all had a “snap refresh” during the time when the CI build failed. Hence I’m guessing it was simply bad luck that I managed to be running an lxc command during the refresh on multiple machines!
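
For anyone who wants to check for the same thing, here is a minimal sketch that filters the snap changes output down to refreshes touching the lxd snap. It assumes the summary column looks like the line quoted above; the function name lxd_refreshes is made up for illustration.

```shell
# Hypothetical helper (the name is illustrative): reads `snap changes`
# output on stdin and prints only the refresh entries that mention the
# lxd snap. Assumes the summary column looks like the line quoted above.
lxd_refreshes() {
    # `|| true` keeps the exit status at 0 when nothing matches.
    grep -i 'refresh' | grep '"lxd"' || true
}

# Example: snap changes | lxd_refreshes
```

Comparing the Spawn/Ready times it prints against the time of the failed command should show whether a refresh was in progress. If the refreshes keep colliding with CI runs, snapd’s refresh.timer system option can move them to a quieter window, though I haven’t worked out the right value for this setup.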

LXD is pretty much the only thing I’m using snap for at the moment so I guess I need to do some reading 🙂. Sorry for the disturbance.

You’ll get that error for up to 5 minutes when LXD is being updated.
Basically that’s a 5-minute grace period for any client (lxc exec mostly) to disconnect before LXD forcefully disconnects everything.

During that time you can run lxc operation list to get an idea of what’s going on; once the list is empty, LXD will restart.
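
For a CI script, one way to ride out that window is to retry only when the command fails with this specific message. The following is a rough sketch, not an LXD feature: the function name and the RETRY_MAX / RETRY_DELAY variables are made up, and the defaults (30 attempts, 10 seconds apart) are simply chosen to cover roughly the 5 minutes mentioned above.

```shell
# Hypothetical wrapper: re-run a command while it fails with the
# "LXD is shutting down" message, and give up on any other failure.
# RETRY_MAX and RETRY_DELAY are illustrative names, not LXD settings.
retry_while_shutting_down() {
    local max="${RETRY_MAX:-30}"      # attempts; 30 x 10s is about 5 minutes
    local delay="${RETRY_DELAY:-10}"  # seconds to wait between attempts
    local attempt=1 out rc

    while true; do
        # Capture stdout and stderr together so the error text can be checked.
        if out="$("$@" 2>&1)"; then rc=0; else rc=$?; fi

        if [ "$rc" -eq 0 ]; then
            if [ -n "$out" ]; then printf '%s\n' "$out"; fi
            return 0
        fi

        case "$out" in
            *"LXD is shutting down"*)
                # Transient error: retry unless the attempts are used up.
                if [ "$attempt" -ge "$max" ]; then
                    printf '%s\n' "$out" >&2
                    return "$rc"
                fi
                ;;
            *)
                # Any other failure is passed straight back to the caller.
                printf '%s\n' "$out" >&2
                return "$rc"
                ;;
        esac

        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example (container name is a placeholder):
#   retry_while_shutting_down lxc stop my-container
```

Because the output is captured before being printed, this only suits non-interactive commands such as lxc stop or lxc file push, not an interactive lxc exec session.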
