Deb to snap: problems and workarounds

Firstly, thank you to all involved in the development of LXD. It's a fantastic tool.

We're currently running LXD from the deb package on Ubuntu 18.04. When we upgrade to 20.04, LXD will only be available as a snap. We've run into several problems while migrating to the snap version of LXD; below are the problems and our workarounds. Perhaps this will be useful to others preparing to upgrade.

Only one layer of nesting

We use LXD to simulate a computer network. Previously we had four levels of nesting; nested LXD made it easy to segment the network and delegate administration to different team members.

The snap version of LXD only supports one level of nesting (LXD in LXD), while the deb version allows multiple levels. It took us a long time to discover that this was a regression rather than a mistake on our part; the only notice of it we could find is buried in a comment on a GitHub issue. I think this should be acknowledged more prominently, especially as 20.04 draws closer.

We failed to find a workaround to regain deeper nesting. We eventually settled on reworking our architecture to a single layer of nesting, with multiple network bridges and all containers at the same level of the hierarchy, as sketched below. This works, but the previous architecture was much easier to manage.
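
For illustration, a rough sketch of the flattened layout (bridge and container names are made up for this example):

    # One bridge per simulated network segment
    lxc network create seg-alpha
    lxc network create seg-beta

    # All containers live at the same level, attached to the relevant segment
    lxc launch ubuntu:20.04 alpha-web --network seg-alpha
    lxc launch ubuntu:20.04 alpha-db --network seg-alpha
    lxc launch ubuntu:20.04 beta-app --network seg-beta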

lxc exec container bash mysteriously dying

A snap refresh kills active lxc exec connections. By default, snapd attempts a refresh several times a day, so whenever an LXD update lands, any active lxc exec sessions die. This is obviously frustrating when you're in the middle of working on something inside a container.

There is no supported way to disable snap refreshes entirely. You can hold them for up to 60 days (640kb ought to be enough for anyone) by setting a far-future hold date with snap set system refresh.hold=2038-01-01T00:00:00+00:00, and then manually run snap refresh at a convenient time.
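
For reference, the commands involved (the date is arbitrary; snapd caps the effective hold at roughly 60 days, so this needs repeating):

    # Hold automatic refreshes for as long as snapd allows
    sudo snap set system refresh.hold=2038-01-01T00:00:00+00:00

    # See when the next refresh would otherwise happen
    snap refresh --time

    # Refresh manually at a convenient time
    sudo snap refresh lxd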

Alternatively, memory-holing api.snapcraft.io via /etc/hosts also works to prevent unexpected refreshes.
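
A sketch of that /etc/hosts entry (note this blocks all snap store traffic, not just LXD refreshes, until you remove it):

    # /etc/hosts: send the snap store into a black hole to stop refreshes
    127.0.0.1   api.snapcraft.io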

Only one layer of nesting

On the nesting front, you are correct: this is an AppArmor restriction where only one level of AppArmor namespacing may be used. With the deb, LXD would effectively degrade when nesting deeper than that, no longer providing an individual AppArmor namespace per container. With the snap, that isn't possible, as snapd requires a working AppArmor namespace to function.

There may be a workaround you can pull off there, though. Snapd does work on systems without AppArmor, so if you can trick snapd into thinking it's on such a system, you'd be back to being able to nest LXD up to 32 levels deep (the kernel limit).
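
A rough, untested sketch of the idea (the container name "outer" is a placeholder; security.nesting is needed for any nested LXD anyway, and the mount mask is an assumption about how snapd detects AppArmor, not a confirmed recipe):

    # Needed for any nested LXD, snap or deb
    lxc config set outer security.nesting true

    # Speculative: hide the AppArmor securityfs from the container so snapd
    # treats it as a system without AppArmor (path and fstab line are assumptions)
    lxc config set outer raw.lxc 'lxc.mount.entry = tmpfs sys/kernel/security/apparmor tmpfs ro 0 0'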

Background refreshes

Indeed, background refreshes are a core feature of snaps, and while the containers themselves don't get restarted, current connections to the LXD API do get killed during a refresh. There are improvements we can work on there, and have been investigating, so that the most critical operations (think container creation, image unpack, …) don't get killed halfway through. But exec sessions aren't something we can really block on, as they may never disconnect and would cause bugfixes and security fixes to be held back indefinitely.

I've been thinking about notifying the client in some way that a restart is in progress and that the session will be terminated in, say, 30 seconds. The problem is how to indicate this without messing with the normal screen output of the exec session; you wouldn't want a script talking to an LXD container to suddenly receive output that didn't originate from within the container.

As for refresh control: for major releases we now publish tracks, so you can stop LXD from automatically moving to the next major version. For bugfixes within a major release, your best bet is indeed to set a maintenance window or otherwise prevent refreshes. For larger organizations, another option is the commercial snap proxy, which lets all your machines pull from a local server on which you can override the wanted revision and channel of any snap. Combined with a refresh window, this gives you exact control over which revision is deployed and when.
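
For example, a sketch of pinning to a track and confining refreshes to a maintenance window (track name and window are illustrative):

    # Follow the 4.0 LTS track rather than latest/stable
    sudo snap refresh lxd --channel=4.0/stable

    # Only allow automatic refreshes during a weekly window
    sudo snap set system refresh.timer=sat,02:00-04:00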