Status on issues with LXD 3.4 snap

Here is a summary of the issues we’ve found on the 3.4 snap and their status:

FIXED: Complete failure to start on s390x

This issue was caused by a big-endian handling bug in libdqlite. The bug has now been fixed upstream and the snap updated with the fix.

FIXED: Some unusable executables on systems lacking unprivileged file caps

On systems lacking unprivileged file capabilities (most kernels prior to 4.14), LXD still shifts the capabilities and updates the xattr, but in a format which the kernel can't understand.

The result is that such binaries become impossible to execute in the container.
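As a diagnostic, one way to see what the kernel is actually being handed is to dump the raw capability xattr on an affected binary. This is a sketch rather than anything LXD ships, and it assumes the getfattr tool (from the attr package) is available:

```shell
#!/bin/sh
# Dump the raw security.capability xattr on a binary, in hex.
# On affected systems, the value written during shifting is in a format
# the running kernel cannot parse, which is what makes the binary
# impossible to execute.
show_caps() {
    file="$1"
    # getfattr exits non-zero when the xattr is absent (or the tool is missing)
    getfattr -n security.capability -e hex "$file" 2>/dev/null \
        || echo "no security.capability xattr on $file"
}
```

If the xattr is present, the revision encoded in its header distinguishes the older format from the namespaced one introduced for unprivileged file capabilities.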

A fix has been merged upstream and an updated snap published.

FIXED: Database errors on startup, leading to 500 errors on all API calls

This was tracked down to a bug in dqlite and go-dqlite when handling some sqlite3 messages. The issue was fixed upstream and the snap has been updated with the fix.

FIXED: Upgrade issue with leftover lxd process

In some cases the upgrade ends up starting a second copy of the daemon while an older one is left running, which causes both to fail with database-related errors.

A change to the LXD startup logic has been added to the snap; it should identify leftover LXD instances and kill them during startup.
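The idea behind that change can be sketched in shell. This is a rough illustration of the approach, not LXD's actual startup code; the process name and kill policy here are assumptions:

```shell
#!/bin/sh
# Before starting a new daemon, look for leftover "lxd" processes
# (other than ourselves) and terminate them.

# Drop a given PID from a newline-separated PID list on stdin.
exclude_pid() {
    grep -v -x "$1" || true
}

cleanup_leftovers() {
    # pgrep -x matches the exact process name
    pgrep -x lxd | exclude_pid "$$" | while read -r pid; do
        echo "killing leftover lxd process $pid"
        kill "$pid" 2>/dev/null || true
    done
}
```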

Snap upgrade ordering/timing issue with LXD clustering

LXD clusters require that all nodes run the same LXD version (currently 3.4). Unfortunately, as the snap is refreshed at a random time within a 24-hour window, cluster users are likely to end up with only some of their nodes running the newer release.

When this happens, all LXD API calls are held (frozen) until the remaining nodes have completed the upgrade.

Currently the best way to get past this is to run snap refresh lxd on all nodes, at which point the cluster will come back online.
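Until an automatic mechanism exists, that manual refresh can be scripted. A minimal sketch, assuming SSH access to each member; the hostnames are hypothetical, and setting DRY_RUN=1 prints the commands instead of running them:

```shell
#!/bin/sh
# Refresh the LXD snap on every cluster member so that all nodes end up
# on the same version and the cluster comes back online.
NODES="${NODES:-node1 node2 node3}"   # hypothetical member hostnames

refresh_cluster() {
    for node in $NODES; do
        # With DRY_RUN set, prefix the command with "echo" to preview it.
        ${DRY_RUN:+echo} ssh "$node" -- sudo snap refresh lxd
    done
}
```

Set NODES to your real member hostnames and call refresh_cluster; each node will refresh in turn.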

We’re investigating ways to have LXD trigger a refresh automatically when this happens, which would avoid the same issue recurring when we release LXD 3.5.