Running LXD as a long running service is instable

We run LXD on a server as a service and use it to provision LXD containers (~3k/month).
And from time to time the LXD socket becomes unresponsive.

This shows as all commands via LXC fail due to a timeout.

Just a simple snap restart LXD is not enough (also it takes a long time and sometimes we need to help it by killing all LXD related processes).

One thing I observed is that $ /usr/bin/snap run --debug-log lxd.activate fails with
cannot send command 1 to helper process: Broken pipe (full log at the end)

A full restart of the system seems ‘solve’ the issue.
Any idea what the problem can be here?
Which logs should I collect next time this happens? (snap log does not seem to have anything useful)

FULL LOG

$ /usr/bin/snap run --debug-log lxd.activate
2022/08/26 09:29:08.829152 cmd_run.go:486: DEBUG: enabled debug logging of early snap startup
2022/08/26 09:29:08.830069 cmd_run.go:1035: DEBUG: executing snap-confine from /snap/snapd/16292/usr/lib/snapd/snap-confine
2022/08/26 09:29:08.830843 cmd_run.go:438: DEBUG: SELinux not enabled
2022/08/26 09:29:08.830942 cmd_run.go:1204: DEBUG: service app not tracked by systemd
2022/08/26 09:29:08.830960 tracking.go:46: DEBUG: creating transient scope snap.lxd.activate
2022/08/26 09:29:08.830990 tracking.go:189: DEBUG: session bus is not available: cannot find session bus
2022/08/26 09:29:08.831001 tracking.go:191: DEBUG: falling back to system bus
2022/08/26 09:29:08.832345 tracking.go:196: DEBUG: using system bus now, session bus was not available
2022/08/26 09:29:08.835032 tracking.go:319: DEBUG: create transient scope job: /org/freedesktop/systemd1/job/1947357
2022/08/26 09:29:08.975167 tracking.go:419: DEBUG: job result is "done"
2022/08/26 09:29:08.975201 tracking.go:426: DEBUG: transient scope snap.lxd.activate.d6cca01f-9b43-4292-b745-2002ee0c7c30.scope created
2022/08/26 09:29:08.975723 tracking.go:146: DEBUG: waited 143.250548ms for tracking
DEBUG: umask reset, old umask was  022
DEBUG: security tag: snap.lxd.activate
DEBUG: executable:   /usr/lib/snapd/snap-exec
DEBUG: confinement:  non-classic
DEBUG: base snap:    core20
DEBUG: ruid: 0, euid: 0, suid: 0
DEBUG: rgid: 0, egid: 0, sgid: 0
DEBUG: apparmor label on snap-confine is: /snap/snapd/16292/usr/lib/snapd/snap-confine
DEBUG: apparmor mode is: enforce
DEBUG: creating lock directory /run/snapd/lock (if missing)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: opening lock directory /run/snapd/lock
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: opening lock file: /run/snapd/lock/.lock
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: sanity timeout initialized and set for 30 seconds
DEBUG: acquiring exclusive lock (scope (global), uid 0)
DEBUG: sanity timeout reset and disabled
DEBUG: ensuring that snap mount directory is shared
DEBUG: unsharing snap namespace directory
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: releasing lock 5
DEBUG: opened snap-update-ns executable as file descriptor 5
DEBUG: opened snap-discard-ns executable as file descriptor 6
DEBUG: creating lock directory /run/snapd/lock (if missing)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: opening lock directory /run/snapd/lock
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: opening lock file: /run/snapd/lock/lxd.lock
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: sanity timeout initialized and set for 30 seconds
DEBUG: acquiring exclusive lock (scope lxd, uid 0)
DEBUG: sanity timeout reset and disabled
DEBUG: initializing mount namespace: lxd
DEBUG: setting up device cgroup
DEBUG: libudev has current tags support
DEBUG: no devices tagged with snap_lxd_activate, skipping device cgroup setup
DEBUG: forked support process 2971122
DEBUG: unsharing the mount namespace (per-snap)
DEBUG: changing apparmor hat to mount-namespace-capture-helper
DEBUG: helper process waiting for command
DEBUG: sanity timeout initialized and set for 30 seconds
DEBUG: scratch directory for constructing namespace: /tmp/snap.rootfs_kkBzn4
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: pivot_root /tmp/snap.rootfs_kkBzn4 /tmp/snap.rootfs_kkBzn4//var/lib/snapd/hostfs
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: rmdir /var/lib/snapd/hostfs//tmp/snap.rootfs_kkBzn4
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: performing operation: (disabled) use debug build to see details
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: calling snapd tool snap-update-ns
DEBUG: waiting for snapd tool snap-update-ns to terminate
DEBUG: requesting changing of apparmor profile on next exec to snap-update-ns.lxd
common.go:60: DEBUG: locking mount namespace of snap "lxd"
common.go:81: DEBUG: freezing processes of snap "lxd"
freezer.go:197: DEBUG: freeze, skipping own group /sys/fs/cgroup/system.slice/snap.lxd.activate.d6cca01f-9b43-4292-b745-2002ee0c7c30.scope
change.go:503: DEBUG: desired mount entries
change.go:503: DEBUG: desired mount entries (sorted)
change.go:578: DEBUG: desiredIDs: map[]
change.go:579: DEBUG: reuse: map[]
change.go:636: DEBUG: processing mount entries
change.go:503: DEBUG: mount entries ordered as they will be applied
common.go:89: DEBUG: unlocking mount namespace of snap "lxd"
common.go:91: DEBUG: thawing processes of snap "lxd"
sanity timeout expired: Interrupted system call
DEBUG: snap-update-ns finished successfully
DEBUG: set_effective_identity uid:0 (change: no), gid:0 (change: yes)
DEBUG: saved mount namespace meta-data to /run/snapd/ns/snap.lxd.info
DEBUG: sending command 1 to helper process (pid: 2971122)
cannot send command 1 to helper process: Broken pipe

What LXD version please?

The contents of /var/snap/lxd/common/lxd/logs/lxd.log should be useful too when it happens.

$ snap list
Name    Version      Rev    Tracking       Publisher   Notes
core20  20220805     1611   latest/stable  canonical✓  base
lxd     5.2-79c3c3b  23155  5.2/stable     canonical✓  -

I will check next time right now the log is full with

time="2022-08-26T14:53:48+02:00" level=error msg="Failed closing listener connection" err="close unix /var/snap/lxd/common/lxd/unix.socket->@: use of closed network connection" listener=81385d15-1a08-415d-ba6c-330bad2214c4

Maybe thats not a good sign in the first place.

Please can you upgrade to LXD 5.5 (which is supported unlike 5.2) and check if you still have problems.

I put it on snap refresh lxd --channel=latest/stable which gives me 5.4. But tbh I don’t think it matters too much im a fairly certain we seen the exact same issue on systems where we tracked latest/stable.

You could try it on latest/candidate which has LXD 5.5 as that will be going to stable shortly.

The reason for that is that error “Failed closing listener connection” no longer exists in LXD, so would be good to get a fresh error condition.

2 Likes