LXD failing to start due to failing snap update

I’m seeing an issue where its seems a snap update has failed and is not succeeding on subsequent tries. The update caused my LXD to shutdown, but snap never restarts it because the update is failing.

$ snap changes
ID   Status  Spawn                   Ready                   Summary
117  Error   yesterday at 16:53 CDT  yesterday at 17:14 CDT  Auto-refresh snap "lxd"
118  Error   yesterday at 19:33 CDT  yesterday at 20:00 CDT  Auto-refresh snap "lxd"
119  Error   today at 00:43 CDT      today at 01:03 CDT      Auto-refresh snap "lxd"
120  Error   today at 11:48 CDT      today at 12:13 CDT      Auto-refresh snaps "core20", "lxd"
121  Error   today at 12:18 CDT      today at 12:39 CDT      Auto-refresh snap "lxd"

Some interesting syslog messages:

Oct  5 15:44:20 box1 systemd[1]: snap.lxd.daemon.service: Trying to enqueue job snap.lxd.daemon.service/stop/replace
Oct  5 15:44:20 box1 systemd[1]: Added job snap.lxd.daemon.service/stop to transaction.
Oct  5 15:44:20 box1 systemd[1]: snap.lxd.daemon.service: Installed new job snap.lxd.daemon.service/stop as 7378
Oct  5 15:44:20 box1 systemd[1]: snap.lxd.daemon.service: Enqueued job snap.lxd.daemon.service/stop as 7378
Oct  5 15:44:20 box1 systemd[1]: snap.lxd.daemon.service: About to execute: /usr/bin/snap run --command=stop lxd.daemon
Oct  5 15:44:20 box1 systemd[1]: snap.lxd.daemon.service: Forked /usr/bin/snap as 36905
Oct  5 15:44:20 box1 systemd[36905]: snap.lxd.daemon.service: Executing: /usr/bin/snap run --command=stop lxd.daemon
Oct  5 15:44:20 box1 systemd[1]: snap.lxd.daemon.service: Changed running -> stop
Oct  5 15:44:20 box1 systemd[1]: Stopping Service for snap application lxd.daemon...
Oct  5 15:44:20 box1 lxd.daemon[36905]: WARNING: cgroup v2 is not fully supported yet, proceeding with partial confinement
Oct  5 15:44:20 box1 lxd.daemon[36905]: => Stop reason is: snap refresh
Oct  5 15:44:20 box1 lxd.daemon[36905]: => Stopping LXD
Oct  5 15:44:21 box1 lxd.daemon[2952]: => LXD exited cleanly
Oct  5 15:44:21 box1 systemd[1]: Received SIGCHLD from PID 2952 (daemon.start).
Oct  5 15:44:21 box1 systemd[1]: Child 2952 (daemon.start) died (code=exited, status=0/SUCCESS)
Oct  5 15:44:21 box1 systemd[1]: snap.lxd.daemon.service: Child 2952 belongs to snap.lxd.daemon.service.
Oct  5 15:44:21 box1 systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=0/SUCCESS
Oct  5 15:44:22 box1 lxd.daemon[36905]: ==> Stopped LXD
Oct  5 15:44:22 box1 systemd[1]: systemd-journald.service: Received EPOLLHUP on stored fd 40 (stored), closing.
Oct  5 15:44:22 box1 systemd[1]: Received SIGCHLD from PID 36905 (daemon.stop).
Oct  5 15:44:22 box1 systemd[1]: Child 36905 (daemon.stop) died (code=exited, status=0/SUCCESS)
Oct  5 15:44:22 box1 systemd[1]: snap.lxd.daemon.service: Child 36905 belongs to snap.lxd.daemon.service.
Oct  5 15:44:22 box1 systemd[1]: snap.lxd.daemon.service: Control process exited, code=exited, status=0/SUCCESS
Oct  5 15:44:22 box1 systemd[1]: snap.lxd.daemon.service: Got final SIGCHLD for state stop.
Oct  5 15:44:22 box1 systemd[1]: snap.lxd.daemon.service: Succeeded.
Oct  5 15:44:22 box1 systemd[1]: snap.lxd.daemon.service: Service restart not allowed.
Oct  5 15:44:22 box1 systemd[1]: snap.lxd.daemon.service: Changed stop -> dead
Oct  5 15:44:22 box1 systemd[1]: snap.lxd.daemon.service: Job 7378 snap.lxd.daemon.service/stop finished, result=done
Oct  5 15:44:22 box1 systemd[1]: Stopped Service for snap application lxd.daemon.
Oct  5 15:44:22 box1 systemd[1]: snap.lxd.daemon.service: Control group is empty.
Oct  5 15:44:22 box1 systemd[1]: Failed to read pids.max attribute of cgroup root, ignoring: No such file or directory
Oct  5 15:44:22 box1 systemd[1]: Found unit snap.lxd.daemon.service at /etc/systemd/system/snap.lxd.daemon.service (regular file)
Oct  5 15:44:22 box1 systemd[1]: Preset files don't specify rule for snap.lxd.daemon.service. Enabling.
...
Oct  5 15:54:28 box1 snapd[2024]: taskrunner.go:271: [change 113 "Mount snap \"lxd\" (21624)" task] failed: snap-lxd-21624.mount failed to stop: timeout
Oct  5 15:54:33 box1 snapd[2024]: handlers.go:512: Reported install problem for "lxd" as 70612784-261e-11ec-be98-fa163e983629 OOPSID
...
Oct  5 17:39:09 box1 systemd[3284]: snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope: Failed to load configuration: No such file or directory
Oct  5 17:39:09 box1 systemd[3284]: snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope: Trying to enqueue job snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope/start/fail
Oct  5 17:39:09 box1 systemd[3284]: Added job snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope/start to transaction.
Oct  5 17:39:09 box1 systemd[3284]: Pulling in -.slice/start from snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope/start
Oct  5 17:39:09 box1 systemd[3284]: Added job -.slice/start to transaction.
Oct  5 17:39:09 box1 systemd[3284]: Pulling in shutdown.target/stop from snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope/start
Oct  5 17:39:09 box1 systemd[3284]: snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope: Installed new job snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope/start as 434
Oct  5 17:39:09 box1 systemd[3284]: snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope: Enqueued job snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope/start as 434
Oct  5 17:39:09 box1 systemd[3284]: Failed to read pids.max attribute of cgroup root, ignoring: No such file or directory
Oct  5 17:39:09 box1 systemd[3284]: snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope changed dead -> running
Oct  5 17:39:09 box1 systemd[3284]: snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope: Job 434 snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope/start finished, result=done
Oct  5 17:39:09 box1 systemd[3284]: Started snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope.
Oct  5 17:39:09 box1 systemd[3284]: snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope: cgroup is empty
Oct  5 17:39:09 box1 systemd[3284]: snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope: Succeeded.
Oct  5 17:39:09 box1 systemd[3284]: snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope changed running -> dead
Oct  5 17:39:09 box1 systemd[3284]: snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope: Consumed 34ms CPU time.
Oct  5 17:39:09 box1 systemd[3284]: snap.lxd.lxc.2e3841b9-b8e1-467d-b70e-4bc22e49cfdd.scope: Collecting.
...
Oct  5 17:56:52 box1 kernel: [76153.267445] audit: type=1400 audit(1633474612.900:143): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-devel_</var/snap/lxd/common/lxd>" name="/run/systemd/unit-root/var/cache/private/fwupdmgr/" pid=42006 comm="(fwupdmgr)" flags="rw, nosuid, remount, bind"

These seem suspicious:

Oct  5 19:38:00 box1 systemd[1]: var-snap.mount: Failed to load configuration: No such file or directory
Oct  5 19:38:00 box1 systemd[1]: var-snap-lxd.mount: Failed to load configuration: No such file or directory
Oct  5 19:38:00 box1 systemd[1]: var-snap-lxd-21497.mount: Failed to load configuration: No such file or directory
Oct  5 19:38:00 box1 systemd[1]: snap.mount: Failed to load configuration: No such file or directory
Oct  5 19:38:00 box1 systemd[1]: snap-lxd.mount: Failed to load configuration: No such file or directory
Oct  5 19:38:00 box1 systemd[1]: var-lib-snapd-snaps.mount: Failed to load configuration: No such file or directory
Oct  5 19:38:00 box1 systemd[1]: var-lib-snapd-snaps-lxd_21624.snap.mount: Failed to load configuration: No such file or directory

And lots of these in syslog about 4 a second.

Oct  5 15:44:22 box1 systemd[1]: Got message type=method_call sender=n/a destination=org.freedesktop.systemd1 path=/org/freedesktop/systemd1/unit/snap_2elxd_2edaemon_2eservice interface=org.freedesktop.DBus.Properties member=GetAll cookie=1 reply_cookie=0 signature=s error-name=n/a error-message=n/a
Oct  5 15:44:22 box1 systemd[1]: Failed to read pids.max attribute of cgroup root, ignoring: No such file or directory

Now all lxc commands error with:

$ sudo lxc list
[sudo] password for user:
WARNING: cgroup v2 is not fully supported yet, proceeding with partial confinement
Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: connection refused

I’ve also recently enabled cgroups v2 and disabled v1 from hijacking the controllers, but apparently LXD shouldn’t care. It looks like this might be more a snapd issue than LXD, but I’m not sure. I can see that my containers are started as I see for example this process in ps output [lxc monitor] /var/snap/lxd/common/lxd/containers devel. This is probably because they were started before LXD went down. What can I do to get LXD up and running again?

Hi,
You can debug the error to stop the daemon and execute the lxd process by manual and post the output of the error in here.

sudo systemctl stop snap.lxd.daemon
lxd --debug --group lxd

It seems to startup fine. Perhaps this is a snapd issue? And why is it automatically deleting images?

Terminating the LXD process run manually above and then restarting the systemd unit, the unit starts fine and LXD is accessible.

$ sudo lxd --debug --group lxd
WARNING: cgroup v2 is not fully supported yet, proceeding with partial confinement
DBUG[10-08|01:42:17] Connecting to a local LXD over a Unix socket
DBUG[10-08|01:42:17] Sending request to LXD                   method=GET url=http://unix.socket/1.0 etag=
INFO[10-08|01:42:17] LXD is starting                          version=4.18 mode=normal path=/var/snap/lxd/common/lxd
INFO[10-08|01:42:17] Kernel uid/gid map:
INFO[10-08|01:42:17]  - u 0 0 4294967295
INFO[10-08|01:42:17]  - g 0 0 4294967295
INFO[10-08|01:42:17] Configured LXD uid/gid map:
INFO[10-08|01:42:17]  - u 0 1000000 1000000000
INFO[10-08|01:42:17]  - g 0 1000000 1000000000
INFO[10-08|01:42:17] Kernel features:
INFO[10-08|01:42:17]  - closing multiple file descriptors efficiently: yes
INFO[10-08|01:42:17]  - netnsid-based network retrieval: yes
INFO[10-08|01:42:17]  - pidfds: yes
INFO[10-08|01:42:17]  - uevent injection: yes
INFO[10-08|01:42:17]  - seccomp listener: yes
INFO[10-08|01:42:17]  - seccomp listener continue syscalls: yes
INFO[10-08|01:42:17]  - seccomp listener add file descriptors: yes
INFO[10-08|01:42:17]  - attach to namespaces via pidfds: yes
INFO[10-08|01:42:17]  - safe native terminal allocation : yes
INFO[10-08|01:42:17]  - unprivileged file capabilities: yes
INFO[10-08|01:42:17]  - cgroup layout: cgroup2
WARN[10-08|01:42:17]  - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored
WARN[10-08|01:42:17]  - Couldn't find the CGroup network priority controller, network priority will be ignored
INFO[10-08|01:42:17]  - shiftfs support: yes
INFO[10-08|01:42:17] Initializing local database
DBUG[10-08|01:42:17] Refreshing local trusted certificate cache
INFO[10-08|01:42:17] Set client certificate to server certificate fingerprint=a0e0caf67ec256f05a90ab6ea80679e442c4fe77d7f90162fc9a1e26d0c9cbec
DBUG[10-08|01:42:17] Initializing database gateway
INFO[10-08|01:42:17] Starting database node                   id=1 address=1 role=voter
WARN[10-08|01:42:17] Failed settting up shared mounts         err="no such file or directory"
DBUG[10-08|01:42:17] Connecting to a local LXD over a Unix socket
DBUG[10-08|01:42:17] Sending request to LXD                   method=GET url=http://unix.socket/1.0 etag=
DBUG[10-08|01:42:17] Detected stale unix socket, deleting
INFO[10-08|01:42:17] Starting /dev/lxd handler:
INFO[10-08|01:42:17]  - binding devlxd socket                 socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[10-08|01:42:17] REST API daemon:
INFO[10-08|01:42:17]  - binding Unix socket                   socket=/var/snap/lxd/common/lxd/unix.socket
INFO[10-08|01:42:17] Initializing global database
DBUG[10-08|01:42:17] Dqlite: attempt 1: server 1: connected
INFO[10-08|01:42:18] Firewall loaded driver                   driver=nftables
INFO[10-08|01:42:18] Initializing storage pools
DBUG[10-08|01:42:18] Initializing and checking storage pool   pool=pool1
DBUG[10-08|01:42:18] Mount started                            driver=btrfs pool=pool1
DBUG[10-08|01:42:18] Mount finished                           driver=btrfs pool=pool1
INFO[10-08|01:42:18] Initializing daemon storage mounts
INFO[10-08|01:42:18] Loading daemon configuration
INFO[10-08|01:42:18] Initializing networks
DBUG[10-08|01:42:18] Start                                    project=default driver=bridge network=brlxd1
DBUG[10-08|01:42:18] Setting up network                       project=default driver=bridge network=brlxd1
DBUG[10-08|01:42:18] Stable MAC generated                     project=default driver=bridge network=brlxd1 hwAddr=00:16:3e:42:f6:33 seed=a0e0caf67ec256f05a90ab6ea80679e442c4fe77d7f90162fc9a1e26d0c9cbec.0.1
DBUG[10-08|01:42:18] Clearing firewall                        project=default driver=bridge network=brlxd1
DBUG[10-08|01:42:19] Setting up firewall                      project=default driver=bridge network=brlxd1
DBUG[10-08|01:42:19] New task Operation: 6002d2c6-d67d-4e42-9b95-754334ed244b
INFO[10-08|01:42:19] Pruning leftover image files
DBUG[10-08|01:42:19] Started task operation: 6002d2c6-d67d-4e42-9b95-754334ed244b
INFO[10-08|01:42:19] Done pruning leftover image files
DBUG[10-08|01:42:19] Initialized inotify with file descriptor 23
DBUG[10-08|01:42:19] Registering running instances
DBUG[10-08|01:42:19] Success for task operation: 6002d2c6-d67d-4e42-9b95-754334ed244b
DBUG[10-08|01:42:19] Initialising mounted disk ref counter    device=root project=default instance=devel driver=disk
DBUG[10-08|01:42:19] MountInstance started                    driver=btrfs pool=pool1 project=default instance=devel
DBUG[10-08|01:42:19] MountInstance finished                   driver=btrfs pool=pool1 project=default instance=devel
DBUG[10-08|01:42:19] Initialising mounted disk ref counter    project=default instance=grub-devel driver=disk device=root
DBUG[10-08|01:42:19] MountInstance started                    driver=btrfs pool=pool1 project=default instance=grub-devel
DBUG[10-08|01:42:19] MountInstance finished                   driver=btrfs pool=pool1 project=default instance=grub-devel
DBUG[10-08|01:42:19] Initialising mounted disk ref counter    driver=disk device=bootloader-src project=default instance=grub-devel
DBUG[10-08|01:42:19] Initialising mounted disk ref counter    project=default instance=grub-devel driver=disk device=grub-src
DBUG[10-08|01:42:19] Initialising mounted disk ref counter    driver=disk device=root project=default instance=grub-devel-debian11
DBUG[10-08|01:42:19] MountInstance started                    driver=btrfs pool=pool1 project=default instance=grub-devel-debian11
DBUG[10-08|01:42:19] MountInstance finished                   driver=btrfs pool=pool1 project=default instance=grub-devel-debian11
DBUG[10-08|01:42:19] Initialising mounted disk ref counter    driver=disk device=bootloader-src project=default instance=grub-devel-debian11
DBUG[10-08|01:42:19] Initialising mounted disk ref counter    instance=grub-devel-debian11 driver=disk device=grub-src project=default
INFO[10-08|01:42:19] Started seccomp handler                  path=/var/snap/lxd/common/lxd/seccomp.socket
DBUG[10-08|01:42:19] Refreshing trusted certificate cache
DBUG[10-08|01:42:19] New task Operation: be7607e8-c6f3-4f80-b123-e63e17ab942a
INFO[10-08|01:42:19] Pruning expired images
DBUG[10-08|01:42:19] Started task operation: be7607e8-c6f3-4f80-b123-e63e17ab942a
INFO[10-08|01:42:19] Done pruning expired images
DBUG[10-08|01:42:19] New task Operation: 5eb288dd-829d-4e98-9b63-e487eed867e2
INFO[10-08|01:42:19] Pruning expired instance backups
DBUG[10-08|01:42:19] Started task operation: 5eb288dd-829d-4e98-9b63-e487eed867e2
INFO[10-08|01:42:19] Done pruning expired instance backups
DBUG[10-08|01:42:19] New task Operation: 3656c138-29ca-434a-9deb-9b7174dee85a
DBUG[10-08|01:42:19] Success for task operation: 5eb288dd-829d-4e98-9b63-e487eed867e2
DBUG[10-08|01:42:19] New task Operation: 90396f4b-ef8b-4515-840c-12bd8679ad1e
INFO[10-08|01:42:19] Updating instance types
DBUG[10-08|01:42:19] Started task operation: 3656c138-29ca-434a-9deb-9b7174dee85a
DBUG[10-08|01:42:19] New task Operation: 07172d5f-5317-41af-891c-d114f47be0b1
INFO[10-08|01:42:19] Pruning resolved warnings
DBUG[10-08|01:42:19] Started task operation: 90396f4b-ef8b-4515-840c-12bd8679ad1e
INFO[10-08|01:42:19] Expiring log files
DBUG[10-08|01:42:19] Started task operation: 07172d5f-5317-41af-891c-d114f47be0b1
INFO[10-08|01:42:19] Done expiring log files
INFO[10-08|01:42:19] Done updating instance types
INFO[10-08|01:42:19] Done pruning resolved warnings
DBUG[10-08|01:42:19] New task Operation: ed7961dc-8671-4b5b-a6c1-3430c8867ca2
DBUG[10-08|01:42:19] DeleteImage started                      driver=btrfs pool=pool1 fingerprint=b858d693a55fcf9b1a656fd13d577104fd1dc6f2f1481f5ca35a08ec25d28a6c
DBUG[10-08|01:42:19] Success for task operation: 90396f4b-ef8b-4515-840c-12bd8679ad1e
INFO[10-08|01:42:19] Updating images
DBUG[10-08|01:42:19] Started task operation: ed7961dc-8671-4b5b-a6c1-3430c8867ca2
DBUG[10-08|01:42:19] Success for task operation: 07172d5f-5317-41af-891c-d114f47be0b1
INFO[10-08|01:42:19] Done updating images
DBUG[10-08|01:42:19] Processing image                         protocol=simplestreams alias=debian/11 fingerprint=aaa432b342c26f8c39fa7aa6f115cc14fd6f26359089ade52fd24f2906328336 server=https://images.linuxcontainers.org
DBUG[10-08|01:42:19] Connecting to a remote simplestreams server URL=https://images.linuxcontainers.org
DBUG[10-08|01:42:20] DeleteImage finished                     driver=btrfs pool=pool1 fingerprint=b858d693a55fcf9b1a656fd13d577104fd1dc6f2f1481f5ca35a08ec25d28a6c
DBUG[10-08|01:42:20] Success for task operation: be7607e8-c6f3-4f80-b123-e63e17ab942a
DBUG[10-08|01:42:33] Acquiring lock for image download of "39ecd099da82c617355262b300ff90c2a85859a2831f0c74a520491070dba1ab"
DBUG[10-08|01:42:33] Lock acquired for image download of "39ecd099da82c617355262b300ff90c2a85859a2831f0c74a520491070dba1ab"
DBUG[10-08|01:42:33] Database error: &errors.errorString{s:"No such object"}
DBUG[10-08|01:42:33] Database error: &errors.errorString{s:"No such object"}
INFO[10-08|01:42:33] Downloading image                        server=https://images.linuxcontainers.org alias=debian/11
DBUG[10-08|01:42:34] Success for task operation: 3656c138-29ca-434a-9deb-9b7174dee85a
DBUG[10-08|01:42:52] Handling                                 method=GET url=/1.0 ip=@ username=crass protocol=unix
DBUG[10-08|01:42:52] Handling                                 username=crass protocol=unix method=GET url="/1.0/instances?recursion=2" ip=@
DBUG[10-08|01:42:52] GetInstanceUsage started                 driver=btrfs pool=pool1 project=default instance=test-tumbleweed
DBUG[10-08|01:42:52] GetInstanceUsage started                 driver=btrfs pool=pool1 project=default instance=keepass2-devel
DBUG[10-08|01:42:52] GetInstanceUsage finished                driver=btrfs pool=pool1 project=default instance=test-tumbleweed
DBUG[10-08|01:42:52] GetInstanceUsage started                 pool=pool1 driver=btrfs project=default instance=devel
DBUG[10-08|01:42:52] GetInstanceUsage finished                driver=btrfs pool=pool1 project=default instance=keepass2-devel
DBUG[10-08|01:42:52] GetInstanceUsage started                 driver=btrfs pool=pool1 project=default instance=grub-devel
DBUG[10-08|01:42:52] GetInstanceUsage finished                pool=pool1 driver=btrfs project=default instance=devel
DBUG[10-08|01:42:52] GetInstanceUsage finished                driver=btrfs pool=pool1 project=default instance=grub-devel
DBUG[10-08|01:42:52] GetInstanceUsage started                 driver=btrfs pool=pool1 project=default instance=grub-devel-debian11
DBUG[10-08|01:42:52] GetInstanceUsage finished                driver=btrfs pool=pool1 project=default instance=grub-devel-debian11
DBUG[10-08|01:43:29] EnsureImage started                      driver=btrfs pool=pool1 fingerprint=39ecd099da82c617355262b300ff90c2a85859a2831f0c74a520491070dba1ab
DBUG[10-08|01:43:29] Database error: &errors.errorString{s:"No such object"}
DBUG[10-08|01:43:29] Running filler function                  driver=btrfs pool=pool1 dev= path=/var/snap/lxd/common/lxd/storage-pools/pool1/images/39ecd099da82c617355262b300ff90c2a85859a2831f0c74a520491070dba1ab
DBUG[10-08|01:43:34] Database error: &errors.errorString{s:"No such object"}
DBUG[10-08|01:43:34] EnsureImage finished                     driver=btrfs pool=pool1 fingerprint=39ecd099da82c617355262b300ff90c2a85859a2831f0c74a520491070dba1ab
INFO[10-08|01:43:34] Image downloaded                         alias=debian/11 server=https://images.linuxcontainers.org
DBUG[10-08|01:43:34] DeleteImage started                      driver=btrfs pool=pool1 fingerprint=aaa432b342c26f8c39fa7aa6f115cc14fd6f26359089ade52fd24f2906328336
DBUG[10-08|01:43:34] DeleteImage finished                     driver=btrfs pool=pool1 fingerprint=aaa432b342c26f8c39fa7aa6f115cc14fd6f26359089ade52fd24f2906328336
DBUG[10-08|01:43:34] Database error: &errors.errorString{s:"No such object"}
EROR[10-08|01:43:34] Failed to get image                      err="No such object" project=default fingerprint=b858d693a55fcf9b1a656fd13d577104fd1dc6f2f1481f5ca35a08ec25d28a6c
DBUG[10-08|01:43:34] Success for task operation: ed7961dc-8671-4b5b-a6c1-3430c8867ca2

May be, related with image interval.
Have a look at it.
https://linuxcontainers.org/lxd/docs/master/image-handling

LXD deletes old images once it has updated them to the latest version.

Any ideas how to interpret these snap errors @stgraber ?

The snap mount failure is quite suspicious and likely the problem. The pids.max errors are a bit odd but they seem to just be warnings.

We test LXD on cgroup2 systems and it usually works fine, so it could be systemd having gotten very confused about things or an issue with snapd.

Normally when in such state, the easiest way to get a somewhat reasonable state back would be a reboot, then see if things are still failing when systemd and snapd are both in a clean state.

Well, I really disfavor rebooting, that’s a Windows™ solution. I have been known to do it, but there generally has to be a real need. So I might depending on how bad this gets.

Looks like snap shutdown down again (to replace it?) and failed to bring it back up.

I got a bunch of these messages

Oct  8 03:29:03 box1 systemd[1]: Got message type=method_call se
nder=n/a destination=org.freedesktop.systemd1 path=/org/freedesktop/systemd1/uni
t/snap_2elxd_2edaemon_2eunix_2esocket interface=org.freedesktop.DBus.Properties 
member=GetAll cookie=1 reply_cookie=0 signature=s error-name=n/a error-message=n
/a
Oct  8 03:29:03 box1 systemd[1]: Failed to read pids.max attribu
te of cgroup root, ignoring: No such file or directory
Oct  8 03:29:03 box1 systemd[1]: Sent message type=method_return
 sender=org.freedesktop.systemd1 destination=n/a path=n/a interface=n/a member=n
/a cookie=1 reply_cookie=1 signature=a{sv} error-name=n/a error-message=n/a
Oct  8 03:29:03 box1 systemd[1]: Bus private-bus-connection: cha
nging state RUNNING → CLOSING
Oct  8 03:29:03 box1 systemd[1]: Bus private-bus-connection: cha
nging state CLOSING → CLOSED
Oct  8 03:29:03 box1 systemd[1]: Got disconnect on private conne
ction.
Oct  8 03:29:03 box1 systemd[1]: Bus private-bus-connection: cha
nging state UNSET → OPENING
Oct  8 03:29:03 box1 systemd[1]: Bus private-bus-connection: cha
nging state OPENING → AUTHENTICATING
Oct  8 03:29:03 box1 systemd[1]: Accepted new private connection.
Oct  8 03:29:03 box1 systemd[1]: Bus private-bus-connection: changing state AUTHENTICATING → RUNNING

And then this, which it when lxd was shutdown.

Oct  8 03:29:03 box1 systemd[1]: Got message type=method_call sender=n/a destination=org.freedesktop.systemd1 path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=StopUnit cookie=1 reply_cookie=0 signature=ss error-name=n/a error-message=n/a
Oct  8 03:29:03 box1 systemd[1]: snap.lxd.daemon.service: Trying to enqueue job snap.lxd.daemon.service/stop/replace
Oct  8 03:29:03 box1 systemd[1]: Added job snap.lxd.daemon.service/stop to transaction.
Oct  8 03:29:03 box1 systemd[1]: snap.lxd.daemon.service: Installed new job snap.lxd.daemon.service/stop as 21780
Oct  8 03:29:03 box1 systemd[1]: snap.lxd.daemon.service: Enqueued job snap.lxd.daemon.service/stop as 21780

After which the first messages which had been happening for 10 minutes, then stopped. That was a couple hours after I restarted lxd service manually. I’ve noticed that the snap.lxd.daemon.unix.socket service was inactive, so I actived it and start the lxd service manually again. I’m thinking the lxd or snap service was trying to connect to the unix socket and failed too often, so it shutdown lxd.

I’ve got them both on and then I did a snap refresh to manually trigger an update. Snap downloaded a new update, but stalled with the message 2021-10-09T15:32:37-05:00 INFO Waiting for snap.lxd.daemon.unix.socket to stop.

The systemctl status snap.lxd.daemon.unix.socket shows it as dead, but when I try to restart it, it fails. In the journal I get these messages:

Oct 09 15:38:22 box1 systemd[1]: snap.lxd.daemon.unix.socket: Socket service snap.lxd.daemon.service already active, refusing.
Oct 09 15:38:22 box1 systemd[1]: Failed to listen on Socket unix for snap application lxd.daemon.

And snap is still waiting on snap.lxd.daemon.service to stop…

Looks like I didn’t have a choice about the reboot. When the system came back up, lxd is running apparently fine. Btw, I’m on revision 21497 and snap wants to upgrade to 21624, current stable.

However, when I do snap refresh, I get similar behavior as above. WIth output:

2021-10-09T22:04:13-05:00 INFO Waiting for snap.lxd.activate.service to stop.
2021-10-09T22:04:32-05:00 INFO Waiting for snap.lxd.activate.service to stop.
2021-10-09T22:04:42-05:00 INFO snap.lxd.activate.service refused to stop,
killing.
2021-10-09T22:04:48-05:00 INFO Waiting for snap.lxd.daemon.unix.socket to stop.
2021-10-09T22:05:07-05:00 INFO Waiting for snap.lxd.daemon.unix.socket to stop.
...
2021-10-09T22:14:27-05:00 INFO Waiting for snap.lxd.daemon.unix.socket to stop.
2021-10-09T22:14:50-05:00 INFO Waiting for snap.lxd.daemon.service to stop.
...
2021-10-09T22:24:30-05:00 INFO Waiting for snap.lxd.daemon.service to stop.
2021-10-09T22:24:50-05:00 INFO snap.lxd.daemon.service refused to stop, killing.
error: cannot perform the following tasks:
- Mount snap "lxd" (21624) (snap-lxd-21624.mount failed to stop: timeout)
- Stop snap "lxd" services (snap.lxd.daemon.unix.socket failed to stop: timeout)

Here’s a syslog message from when I started the snap refresh:

Oct  9 22:04:47 box1 systemd[1]: snap.lxd.daemon.unix.socket: Su
cceeded.
Oct  9 22:04:47 box1 systemd[1]: Closed Socket unix for snap application lxd.daemon.
...
Oct  9 22:20:05 box1 snapd[2055]: snap.go:182: cannot get full a
pp info for snap "lxd": cannot get status of services of app "activate": cannot 
get unit status: bad line "Bus n/a: changing state UNSET → OPENING" in ‘systemct
l show’ output
...
Oct  9 22:24:55 box1 snapd[2055]: taskrunner.go:271: [change 133 "Stop snap \"lxd\" services" task] failed: snap.lxd.daemon.unix.socket failed to stop: timeout
Oct  9 22:24:56 box1 snapd[2055]: taskrunner.go:271: [change 133
 "Mount snap \"lxd\" (21624)" task] failed: snap-lxd-21624.mount failed to stop:
 timeout
Oct  9 22:24:56 box1 snapd[2055]: handlers.go:512: Reported inst
all problem for "lxd" as already-reported

Looks like some unexpected output there, but that happens after waiting a long time, so I don’t think that is the cause of the issue. It looks like the unix socket is closed, but snap isn’t recognizing that. And if that socket is for communicating with lxd, why is it shutdown first? I would think snap might need it to communicate to lxd that it needs to be shutdown.

Another thing that seems perhaps odd, is it seems I’ve got two version of the snap installed. I’m guessing its fine and for the case where I want to revert to a previous version. I see messages like:

Oct  9 21:56:00 box1 systemd[2583]: Remounted /run/systemd/unit-root/snap/lxd/21390 read-only.

This is an odd message that happened at boot, but it didn’t prevent lxd from starting. That’s the new revision whose update is failing, so perhaps its related to a partial install.

Oct  9 21:55:57 box1 kernel: [   12.674227] /var/lib/snapd/snaps/lxd_21624.snap: Can't open blockdev

Seeing these regularly, not really correlated with anything:

Oct  9 21:55:58 box1 systemd[2260]: snap-lxd.mount: Failed to load configuration: No such file or directory

It seems this is a reproducible issue for me (ie reboot doesn’t help). I don’t think its related to cgroups v2. Could it be the revision has an issue with its systemd scripts?

And why is it that when snap.lxd.daemon.unix.socket and snap.lxd.daemon.service are both inactive, that snap refresh still stalls on Waiting for snap.lxd.daemon.unix.socket to stop.? Shouldn’t it see that lxd is stopped and do the upgrade? Here’s that output:

$ snap refresh
2021-10-10T00:05:41-05:00 INFO Waiting for snap.lxd.daemon.unix.socket to stop.
2021-10-10T00:06:01-05:00 INFO Waiting for snap.lxd.daemon.unix.socket to stop.
2021-10-10T00:06:21-05:00 INFO Waiting for snap.lxd.daemon.unix.socket to stop.
2021-10-10T00:06:41-05:00 INFO Waiting for snap.lxd.daemon.unix.socket to stop.
Stop snap "lxd" services                                                       /error: cannot perform the following tasks:
- Mount snap "lxd" (21624) (snap-lxd-21624.mount failed to stop: timeout)
- Stop snap "lxd" services (snap.lxd.daemon.unix.socket failed to stop: timeout)

Why is it trying to stop 21624? That revision hasn’t successfully been installed yet. Perhaps there’s been a partial install that’s messing things up. Do I need to remove that systemd mount unit?

Does that revision show up in snap list --all?
snapd by default keeps two revision of the snap (current and a previous one).

Yes, it does and under notes it says disabled. So that seems fine.

Here’s some snap output on what snap is doing during refresh for one of the failed refresh jobs:

$ snap change 137
Status  Spawn               Ready               Summary
Done    today at 10:01 CDT  today at 10:22 CDT  Ensure prerequisites for "lxd" are available
Undone  today at 10:01 CDT  today at 10:22 CDT  Download snap "lxd" (21624) from channel "latest/stable"
Done    today at 10:01 CDT  today at 10:22 CDT  Fetch and check assertions for snap "lxd" (21624)
Error   today at 10:01 CDT  today at 10:22 CDT  Mount snap "lxd" (21624)
Undone  today at 10:01 CDT  today at 10:22 CDT  Run pre-refresh hook of "lxd" snap if present
Error   today at 10:01 CDT  today at 10:22 CDT  Stop snap "lxd" services
Hold    today at 10:01 CDT  today at 10:22 CDT  Remove aliases for snap "lxd"
Hold    today at 10:01 CDT  today at 10:22 CDT  Make current revision for snap "lxd" unavailable
Hold    today at 10:01 CDT  today at 10:22 CDT  Copy snap "lxd" data
Hold    today at 10:01 CDT  today at 10:22 CDT  Setup snap "lxd" (21624) security profiles
Hold    today at 10:01 CDT  today at 10:22 CDT  Make snap "lxd" (21624) available to the system
Hold    today at 10:01 CDT  today at 10:22 CDT  Automatically connect eligible plugs and slots of snap "lxd"  
Hold    today at 10:01 CDT  today at 10:22 CDT  Set automatic aliases for snap "lxd"
Hold    today at 10:01 CDT  today at 10:22 CDT  Setup snap "lxd" aliases
Hold    today at 10:01 CDT  today at 10:22 CDT  Run post-refresh hook of "lxd" snap if present
Hold    today at 10:01 CDT  today at 10:22 CDT  Start snap "lxd" (21624) services
Hold    today at 10:01 CDT  today at 10:22 CDT  Remove data for snap "lxd" (21390)
Hold    today at 10:01 CDT  today at 10:22 CDT  Remove snap "lxd" (21390) from the system
Hold    today at 10:01 CDT  today at 10:22 CDT  Clean up "lxd" (21624) install
Hold    today at 10:01 CDT  today at 10:22 CDT  Run configure hook of "lxd" snap if present
Hold    today at 10:01 CDT  today at 10:22 CDT  Run health check of "lxd" snap
Done    today at 10:01 CDT  today at 10:22 CDT  Consider re-refresh of "lxd"

......................................................................
Mount snap "lxd" (21624)

2021-10-10T10:22:22-05:00 INFO Waiting for snap-lxd-21624.mount to stop.
2021-10-10T10:22:23-05:00 ERROR snap-lxd-21624.mount failed to stop: timeout

......................................................................
Stop snap "lxd" services

2021-10-10T10:19:37-05:00 INFO Waiting for snap.lxd.daemon.service to stop.
2021-10-10T10:19:57-05:00 INFO Waiting for snap.lxd.daemon.service to stop.
2021-10-10T10:20:17-05:00 INFO Waiting for snap.lxd.daemon.service to stop.
2021-10-10T10:20:37-05:00 INFO Waiting for snap.lxd.daemon.service to stop.
2021-10-10T10:20:57-05:00 INFO Waiting for snap.lxd.daemon.service to stop.
2021-10-10T10:21:17-05:00 INFO Waiting for snap.lxd.daemon.service to stop.
2021-10-10T10:21:37-05:00 INFO Waiting for snap.lxd.daemon.service to stop.
2021-10-10T10:21:57-05:00 INFO Waiting for snap.lxd.daemon.service to stop.
2021-10-10T10:22:17-05:00 INFO snap.lxd.daemon.service refused to stop, killing.
2021-10-10T10:22:22-05:00 ERROR snap.lxd.daemon.unix.socket failed to stop: timeout

May be the root cause is snap-lxd-21624.mount failing to mount?

What’s odd is that I see messages indicating success, even though it fails:

Oct 10 10:01:41 box1 systemd[1]: Mounting Mount unit for lxd, re
vision 21624...
Oct 10 10:01:41 box1 systemd[3462]: dev-loop23.device: Changed d
ead -> tentative
Oct 10 10:01:41 box1 systemd[3462]: snap.mount: Failed to load c
onfiguration: No such file or directory
Oct 10 10:01:41 box1 systemd[3462]: snap-lxd.mount: Failed to lo
ad configuration: No such file or directory
Oct 10 10:01:41 box1 systemd[3462]: blockdev@dev-loop23.target: 
Failed to load configuration: No such file or directory
Oct 10 10:01:41 box1 systemd[3462]: snap-lxd-21624.mount: Change
d dead -> mounted
Oct 10 10:01:41 box1 systemd[3462]: snap.mount: Collecting.
Oct 10 10:01:41 box1 systemd[3462]: snap-lxd.mount: Collecting.
Oct 10 10:01:41 box1 systemd[3462]: snap-lxd-21624.mount: lost dependency Wants=dev-loop23.device
Oct 10 10:01:41 box1 systemd[3462]: dev-loop23.device: lost dependency WantedBy=snap-lxd-21624.mount
Oct 10 10:01:41 box1 systemd[3462]: dev-loop23.device: lost dependency Before=snap-lxd-21624.mount
Oct 10 10:01:41 box1 systemd[3462]: dev-loop23.device: lost dependency ReferencedBy=snap-lxd-21624.mount
Oct 10 10:01:41 box1 systemd[3462]: snap-lxd-21624.mount: lost dependency After=dev-loop23.device
Oct 10 10:01:41 box1 systemd[3462]: snap-lxd-21624.mount: lost dependency After=blockdev@dev-loop23.target
Oct 10 10:01:41 box1 systemd[3462]: blockdev@dev-loop23.target: lost dependency Before=snap-lxd-21624.mount
Oct 10 10:01:41 box1 systemd[3462]: blockdev@dev-loop23.target: lost dependency ReferencedBy=snap-lxd-21624.mount
Oct 10 10:01:41 box1 systemd[3462]: snap-lxd-21624.mount: lost dependency References=dev-loop23.device
Oct 10 10:01:41 box1 systemd[3462]: snap-lxd-21624.mount: lost dependency References=blockdev@dev-loop23.target
Oct 10 10:01:41 box1 systemd[1]: Mounted Mount unit for lxd, revision 21624.
Oct 10 10:01:41 box1 systemd[3462]: Added job dev-loop23.device/nop to transaction.
Oct 10 10:01:41 box1 systemd[3462]: dev-loop23.device: Installed new job dev-loop23.device/nop as 466
Oct 10 10:01:41 box1 systemd[3462]: sys-devices-virtual-block-loop23.device: Changed dead -> plugged
Oct 10 10:01:41 box1 systemd[3462]: dev-loop23.device: Changed tentative -> plugged
Oct 10 10:01:41 box1 systemd[3462]: dev-loop23.device: Job 466 dev-loop23.device/nop finished, result=done

I seem to be running into this same condition with snap failing to upgrade lxd (21780). Did you ever resolve it?

Nope, failing on upgrade to 21780 also. What’s even more annoying is that I told snap not to attempt an upgrade for 2 months, but it seems to have disregarded that when 21780 came out. This causes all my containers to become inaccessible because lxd is shutdown and never comes back up (yes, I could setup ssh to get around this). I wish LXD weren’t tied to snap as it is…

Do you have kernel debug on? I found this after I posted here and it seems to have fixed the issue for me:

Wow, thank you! I have not had the time to look at this deeply yet, but it looks highly probably that this is the culprit. I just recently added debug to the kernel command line and this failure coincides with it.

What’s a little confusing is that it seems its an issue with systemd having log_level=debug, which is set that way when debug appears on the kernel command line. However, I’ve set systemd back to info using systemctl log-level info, which I would think might get rid of the issue. Regardless, I think this is the missing part to solving this.

Ok, I finally found decent solution that allows me to keep the debug kernel parameter and for snap refresh lxd to continue working. Systemd recognizes the systemd.log_level kernel parameter, so I put systemd.log_level=info on the command line to effectively have systemd ignore the debug kernel parameter. One thing to note though, this parameter MUST come after the debug parameter. It seems as though systemd goes through parameters from left to right, so the last one will override any previous parameters affecting the log level.