Running LXD as a long running service is instable

We run LXD on a server as a service and use it to provision LXD containers (~3k/month).
And from time to time the LXD socket becomes unresponsive.

This shows as all commands via LXC fail due to a timeout.

Just a simple snap restart LXD is not enough (also it takes a long time and sometimes we need to help it by killing all LXD related processes).

One thing I observed is that $ /usr/bin/snap run --debug-log lxd.activate fails with
cannot send command 1 to helper process: Broken pipe (full log at the end)

A full restart of the system seems ā€˜solve’ the issue.
Any idea what the problem can be here?
Which logs should I collect next time this happens? (snap log does not seem to have anything useful)

FULL LOG

$ /usr/bin/snap run --debug-log lxd.activate
2022/08/26 09:29:08.829152 cmd_run.go:486: DEBUG: enabled debug logging of early snap startup
2022/08/26 09:29:08.830069 cmd_run.go:1035: DEBUG: executing snap-confine from /snap/snapd/16292/usr/lib/snapd/snap-confine
2022/08/26 09:29:08.830843 cmd_run.go:438: DEBUG: SELinux not enabled
2022/08/26 09:29:08.830942 cmd_run.go:1204: DEBUG: service app not tracked by systemd
2022/08/26 09:29:08.830960 tracking.go:46: DEBUG: creating transient scope snap.lxd.activate
2022/08/26 09:29:08.830990 tracking.go:189: DEBUG: session bus is not available: cannot find session bus
2022/08/26 09:29:08.831001 tracking.go:191: DEBUG: falling back to system bus
2022/08/26 09:29:08.832345 tracking.go:196: DEBUG: using system bus now, session bus was not available
2022/08/26 09:29:08.835032 tracking.go:319: DEBUG: create transient scope job: /org/freedesktop/systemd1/job/1947357
2022/08/26 09:29:08.975167 tracking.go:419: DEBUG: job result is "done"
2022/08/26 09:29:08.975201 tracking.go:426: DEBUG: transient scope snap.lxd.activate.d6cca01f-9b43-4292-b745-2002ee0c7c30.scope created
2022/08/26 09:29:08.975723 tracking.go:146: DEBUG: waited 143.250548ms for tracking
DEBUG: umask reset, old umask was  022
DEBUG: security tag: snap.lxd.activate
DEBUG: executable:   /usr/lib/snapd/snap-exec
DEBUG: confinement:  non-classic
DEBUG: base snap:    core20
DEBUG: ruid: 0, euid: 0, suid: 0
DEBUG: rgid: 0, egid: 0, sgid: 0
DEBUG: apparmor label on snap-confine is: /snap/snapd/16292/usr/lib/snapd/snap-confine
DEBUG: apparmor mode is: enforce
DEBUG: creating lock directory /run/snapd/lock (if missing)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: opening lock directory /run/snapd/lock
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: opening lock file: /run/snapd/lock/.lock
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: sanity timeout initialized and set for 30 seconds
DEBUG: acquiring exclusive lock (scope (global), uid 0)
DEBUG: sanity timeout reset and disabled
DEBUG: ensuring that snap mount directory is shared
DEBUG: unsharing snap namespace directory
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: releasing lock 5
DEBUG: opened snap-update-ns executable as file descriptor 5
DEBUG: opened snap-discard-ns executable as file descriptor 6
DEBUG: creating lock directory /run/snapd/lock (if missing)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: opening lock directory /run/snapd/lock
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: opening lock file: /run/snapd/lock/lxd.lock
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: sanity timeout initialized and set for 30 seconds
DEBUG: acquiring exclusive lock (scope lxd, uid 0)
DEBUG: sanity timeout reset and disabled
DEBUG: initializing mount namespace: lxd
DEBUG: setting up device cgroup
DEBUG: libudev has current tags support
DEBUG: no devices tagged with snap_lxd_activate, skipping device cgroup setup
DEBUG: forked support process 2971122
DEBUG: unsharing the mount namespace (per-snap)
DEBUG: changing apparmor hat to mount-namespace-capture-helper
DEBUG: helper process waiting for command
DEBUG: sanity timeout initialized and set for 30 seconds
DEBUG: scratch directory for constructing namespace: /tmp/snap.rootfs_kkBzn4
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: pivot_root /tmp/snap.rootfs_kkBzn4 /tmp/snap.rootfs_kkBzn4//var/lib/snapd/hostfs
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: rmdir /var/lib/snapd/hostfs//tmp/snap.rootfs_kkBzn4
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: calling snapd tool snap-update-ns
DEBUG: waiting for snapd tool snap-update-ns to terminate
DEBUG: requesting changing of apparmor profile on next exec to snap-update-ns.lxd
common.go:60: DEBUG: locking mount namespace of snap "lxd"
common.go:81: DEBUG: freezing processes of snap "lxd"
freezer.go:197: DEBUG: freeze, skipping own group /sys/fs/cgroup/system.slice/snap.lxd.activate.d6cca01f-9b43-4292-b745-2002ee0c7c30.scope
change.go:503: DEBUG: desired mount entries
change.go:503: DEBUG: desired mount entries (sorted)
change.go:578: DEBUG: desiredIDs: map[]
change.go:579: DEBUG: reuse: map[]
change.go:636: DEBUG: processing mount entries
change.go:503: DEBUG: mount entries ordered as they will be applied
common.go:89: DEBUG: unlocking mount namespace of snap "lxd"
common.go:91: DEBUG: thawing processes of snap "lxd"
sanity timeout expired: Interrupted system call
DEBUG: snap-update-ns finished successfully
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: saved mount namespace meta-data to /run/snapd/ns/snap.lxd.info
DEBUG: sending command 1 to helper process (pid: 2971122)
cannot send command 1 to helper process: Broken pipe

What LXD version please?

The contents of /var/snap/lxd/common/lxd/logs/lxd.log should be useful too when it happens.

$ snap list
Name    Version      Rev    Tracking       Publisher   Notes
core20  20220805     1611   latest/stable  canonicalāœ“  base
lxd     5.2-79c3c3b  23155  5.2/stable     canonicalāœ“  -

I will check next time right now the log is full with

time="2022-08-26T14:53:48+02:00" level=error msg="Failed closing listener connection" err="close unix /var/snap/lxd/common/lxd/unix.socket->@: use of closed network connection" listener=81385d15-1a08-415d-ba6c-330bad2214c4

Maybe thats not a good sign in the first place.

Please can you upgrade to LXD 5.5 (which is supported unlike 5.2) and check if you still have problems.

I put it on snap refresh lxd --channel=latest/stable which gives me 5.4. But tbh I don’t think it matters too much im a fairly certain we seen the exact same issue on systems where we tracked latest/stable.

You could try it on latest/candidate which has LXD 5.5 as that will be going to stable shortly.

The reason for that is that error ā€œFailed closing listener connectionā€ no longer exists in LXD, so would be good to get a fresh error condition.

2 Likes