Repeatable LXD installations

You can install a specific current version and stay on it; see Pinned Feature Channels in the aforementioned guide about snap management. However, for the same reasons, we don’t allow new installs of old versions.

As things currently stand, we’d rather not keep those old channels open after we stop supporting those releases. This gets particularly problematic when we need to deal with security updates.

If there was a way to just hide a channel or to mark a channel as read-only and deprecated (showing a suitable warning/confirmation on install), then we could do that. But as snapd doesn’t offer such facilities at the moment, we intend to stick to only guaranteeing that the current and past feature releases will be available, along with your choice of LTS releases.


Hi everyone,

Thanks for taking the time to help us, once again. I have been reviewing the “Managing the LXD snap” document hoping to understand the best course of action.

If I understand correctly, in order to install a specific version and keep it forever we need:

  • to determine which is the latest version (let’s call it X)
  • to execute
sudo snap install lxd --channel=[X-1]/stable

By installing the release before the latest we ensure it will never be updated (if we did this with the latest it would get updated at some point according to the document).

If X=2.0 or 3.0 or 4.0 we would be able to do this as many times as necessary, but since we need lxd >= 4.9 we are unable to target a specific version for several installations that would get done in the course of months, because as new versions are released the older ones are deleted.

Am I correct?

Assuming I am correct in interpreting the document this would mean that the next opportunity to target a specific release would be when 5.1 is available. When 5.1 is available, doing:

sudo snap install lxd --channel=5.0/stable

would install 5.0 in such a way that it would not get updated without explicit action from us. But at that point, if a bug was found in 5.0 that would be fixed, say in 5.2, we would again be unable to target a specific version, until 6.1.

So, if my understanding is correct we need to forget the idea of targeting the same version for the different installations that we will do during the course of a year. We will be able to keep a certain version fixed on each installation with:

sudo snap install lxd --channel=[X-1]/stable

where X is the latest version, but we will need to do acceptance tests on version X-1 for every installation.

I kindly ask you to confirm if this is the correct interpretation of the document.

Thanks again.

Not quite.

There are two types of LXD releases: feature and LTS.
The feature releases are approximately once a month and are taken from the main branch and given a release number like 4.x (e.g. 4.16, 4.17 etc).
The LXD team also maintains several LTS series (currently 4.0, 3.0 and 2.0) for 5 years, each series receiving less frequent periodic releases (e.g. 4.0.1, 4.0.2 etc).

You can see these here:

These releases are then packaged into a snap and are made available from the snap store via “channels”.
These channels include the release, plus additional cherry-picks for fixes that occur between releases.

The following channels are available:

  • latest/stable - this is the latest feature release (4.x) plus any interim cherry-pick fixes and will automatically update to the next feature release.
  • 4.0/stable - this is the latest 4.0.x LTS release plus any interim cherry-pick fixes and will automatically update to the next LTS point release in the series.
  • 4.x/stable - this is only available for the last 2 feature releases (4.x), so currently that is 4.16/stable and 4.17/stable. These channels receive cherry-pick fixes until the next feature release is made, after which they are never updated again (no security or bug fixes). Releases older than the last 2 feature releases are not available for new installs. So you can’t, for instance, install 4.15/stable now, as that channel no longer exists for new installs; however, if you had previously installed using the 4.15/stable channel, your system would remain on that release until the channel was manually changed.

The channels available are in the drop down on the top right here Install LXD on Linux | Snap Store
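
To make the two-release window concrete, here is a minimal shell sketch. The function name installable_feature_channels is hypothetical; it only encodes the rule stated above (the last 2 feature releases stay open to new installs) and does not query the store. It assumes the latest minor number is at least 1.

```shell
#!/bin/sh
# Hypothetical helper: given the latest feature release (major, minor),
# print the feature channels still open to NEW installs, following the
# "last 2 feature releases" rule described in this thread.
# Assumption: minor >= 1 (x.0 releases are LTS and have their own channel).
installable_feature_channels() {
    major="$1"
    minor="$2"
    prev=$((minor - 1))
    echo "${major}.${prev}/stable ${major}.${minor}/stable"
}

# Example: with 4.17 as the latest feature release
installable_feature_channels 4 17
```

On a real system, snap info lxd lists the channels that are actually published at that moment, which is the authoritative answer; the helper above is only an illustration of the policy.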

Because the LXD daemon is run as root it is important to keep it up to date, either via the LTS channel or via the feature channel. This is why we are keen not to have people pinned on unsupported releases.

Hi @tomp ,

Thank you again. If I understand you, according to your answer:

  • if we need to install 6 independent LXD systems in the course of a year (ex: 1 every 2 months) we either go for 4.0/stable (which has stoppers) or all of them will have 4.X with a different X.
  • even if we went for 4.0/stable it would be automatically updated to 5.0, once released, putting production systems at risk

So either we have 4.0, which has stopper issues and will automatically upgrade to a newer LTS (big jump), or we have a collection of slightly different 4.X systems to maintain. Neither situation seems ideal.

From an IaC (Infrastructure as Code) perspective it is critical to have:

  1. repeatability: the ability to install identical systems over time
  2. configuration management: the ability to perform changes in a controlled manner (including planned cycles for security updates)
  3. traceability: the ability to audit those changes over time

This would be possible if a reasonable number of previous releases was kept in the archive but seems impossible otherwise.

We could say that the same happens for any Debian package, say, firefox, where a new system install updates to the latest version (only if the user chooses to perform updates; otherwise the version on the ISO remains). But unlike the case of firefox, where each package update affects a single machine and those machines can be grouped in update batches (update group, get feedback, update another group, …), updating lxd could affect hundreds of containers and VMs.

So the tradeoff of security risk versus server instability risk from a not well enough tested version, expressed in this sentence

Because the LXD daemon is run as root it is important to keep it up to date, either via the LTS channel or via the feature channel. This is why we are keen not to have people pinned on unsupported releases.

is a bit hard for me to work out. OPS engineers know that they have to scan security notices and perform security updates in a controlled manner.

For example, if the LXD daemon is only accessible via SSH sessions, which are only possible from a private network, is the exposure worth the loss of repeatability, config management and traceability?

No that is not correct.

The 4.0/stable LTS channel will only track the 4.0.x LTS series of releases, it will not automatically switch to 5.0 LTS. This is similar to how apt will get security and bug fix updates in an LTS release of an OS.

The latest/stable channel will track the latest feature release.

The 4.x/stable feature release channels will be pinned at a specific version, but the channels themselves are only available for a short period of time (for the next 2 feature releases). We don’t really recommend using this, but it is there for those who want to install a feature release and handle their own updates but don’t want to use one of the recommended ways to pin to a specific snap version.

Have you considered using one of the other mechanisms snap provides for controlling rollout of versions?

Namely Cohorts Pinning and Snap Store Proxy, which are described in the aforementioned guide:

It sounds like one of those would achieve what you want by pinning to a snap revision and controlling when that gets deployed (which is subtly different from the original question about targeting a specific LXD version).

And of course you can also take the upstream release tarball and package LXD as you need, which will then put you in complete control of when it is applied (this is what some distributions do).
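
For the cohort option mentioned above, a minimal sketch of how the pieces might fit together. The helper name cohort_key_from_yaml is hypothetical, and the YAML shape shown in the comment is my assumption about what snap create-cohort prints; please check it against your snapd version before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: read the output of `snap create-cohort <snap>` on
# stdin and print the cohort key it contains.
# Assumed output shape (not verified against every snapd version):
#   cohorts:
#     lxd:
#       cohort-key: <opaque key>
cohort_key_from_yaml() {
    awk '$1 == "cohort-key:" { print $2; exit }'
}

# Intended use on a real system (not run here):
#   key="$(snap create-cohort lxd | cohort_key_from_yaml)"
#   sudo snap install lxd --channel=4.0/stable --cohort="$key"
```

The idea is that every machine installed with the same key sees the same revision for as long as the cohort remains valid, so the key would need to be stored alongside the rest of the deployment configuration.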

Thanks for the clarifications regarding 4.0/stable not being updated to 5.0. I had misunderstood that.

From what I have seen, cohorts limit the number of updates, and the snap store proxy seems like another service to maintain… all this to work around the fact that snap performs automatic updates whether people want them or not :) Seems a bit contra-natura.

From what I know, OPS teams want to do the updates when the risk/benefit of the circumstances tells them to. That is what is done with apt, and the criteria depend on the specific projects and their tradeoffs.

I understand part of the problem comes from how snap works and is not at all lxd specific. All that lxd could do would be leaving more channels available.

We can as well build and package the upstream LXD and rebuild every time we need to do security updates. But that is a negative incentive to updating and a loss of efficiency. All we wanted was to be able to control the versions and the moments of updating in a responsible manner :)

From the ops point of view, the fact that you’re never forced to the next channel works fine. If you snap install lxd --channel=4.17 today, you can stay that way for years.

You just won’t be able to install NEW systems on 4.17 after 4.19 is released, but that doesn’t affect existing users that want to schedule their upgrades.

Hi @stgraber

I wish that was the case, but it is not what is written in the document. Please look at this image:

According to this document I would have to determine the version before the latest, which in this case would be 4.16.

So to be clear, if you install --channel=4.17/stable you will NEVER be moved to 4.18.
What you will be getting is typically 3-4 tiny bugfix updates which we do to fix regressions and important bugs following a LXD release.

Once 4.18 releases, you won’t ever get another update as we only fix bugs on the current release, not on prior ones.

Thanks for this clarification @stgraber .

One last question: what is the impact of a background update (either bugfix update or release update) of lxd on a production system being used?

Typically it’s a 30s or so downtime of the API. All instances will keep running so there’s no impact on the actual workloads.

Existing API calls like lxc exec will have up to 5min to complete or will be forcefully disconnected (5min is default timeout, it can be configured).

Thank you Stéphane. One other thing popped up. When we target, for example, 4.17/stable and get automatically

3-4 tiny bugfix updates which we do to fix regressions and important bugs following a LXD release

is there a number that changes? (snap revision?)

The snap revision will go up indeed.
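
For traceability, the version and revision can be recorded after each change. A minimal sketch, assuming the usual snap list column order (Name, Version, Rev, Tracking, Publisher, Notes); the helper name lxd_version_and_rev is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `snap list lxd` output on stdin and print
# "<version> <revision>" for the lxd row.
# Assumed column order: Name Version Rev Tracking Publisher Notes
lxd_version_and_rev() {
    awk '$1 == "lxd" { print $2, $3 }'
}

# Intended use on a real system (not run here):
#   snap list lxd | lxd_version_and_rev >> /var/log/lxd-revisions.log
```

The log path in the comment is only an example. Comparing the recorded revision across hosts is one way to check whether two installations on the same channel are actually identical.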

Thank you for all the answers. All things considered I find it a bit disappointing that a super powerful and sophisticated solution such as LXD is being delivered in a way that is rather OPS and IaC unfriendly. The recommended way lacks repeatability and fine grained control of security update cycles.

We will probably move along the following lines:

  1. repeatability: we are forced to accept that installations are not repeatable, which will therefore force a full test every time we deploy a new cluster or standalone server; the corresponding effort will be reflected in a couple of ways over time, rather than in planned feature update cycles

  2. control of updates: we will implement a workaround that prevents automatic updates, probably based on the suggestions written here:

and share the results with the community.

Problem 1) would be easily fixed if older channels were kept for a reasonable amount of time. In the replies written here, there seems to be an implicit assumption that OPS teams will not do “the right thing” if they are not forced to, “the right thing” being updating their system once updates are available. Because “the right thing” depends on a risk/benefit analysis which is particular to each project where lxd is used, I find this “one size fits all” assumption a bit too radical. It kind of says that the security risk from not updating immediately is always higher than the risk of unpredictable behaviour on a production system arising from slight differences in the software. My experience with risk analysis tells me that it really depends on the exposure and impact, among other things.

Problem 2) is a serious design problem with snap, not related to lxd. It needs to be fixed, unfortunately by means of a workaround, for any critical infrastructure component that is delivered this way.

We will share our experience.

Thanks everyone.

On that point, snapd does support the ability to defer updates for up to 60 days and also specify the times when these updates occur, so it doesn’t have to be immediately updated as the software is released.

https://snapcraft.io/blog/how-to-make-snaps-and-configuration-management-tools-work-together
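
As a rough sketch of the deferral described above: the refresh.hold and refresh.timer system options are set with snap set system. The helper name hold_until is hypothetical, and the 30 day value and the Saturday window are only example values.

```shell
#!/bin/sh
# Hypothetical helper: print an RFC 3339 UTC timestamp N days from now,
# suitable as a value for the refresh.hold system option.
hold_until() {
    days="$1"
    now="$(date +%s)"
    # Add N days (86400 seconds each) to the current epoch time.
    date -u -d "@$((now + days * 86400))" +%Y-%m-%dT%H:%M:%SZ
}

# Intended use on a real system (not run here):
#   sudo snap set system refresh.hold="$(hold_until 30)"
#   sudo snap set system refresh.timer=sat,02:00-04:00
#   snap refresh --time    # show the resulting refresh schedule
```

Keeping these values in configuration management makes the update window an explicit, auditable setting rather than a default.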

That doesn’t help with your repeatability of freshly installed systems goal though. Only snap proxy really helps with that.

https://docs.ubuntu.com/snap-store-proxy/en/overrides

Also note that all members of an LXD cluster have to run the same LXD version, so they should be updated together, shortly after one another (irrespective of whether using snap or not).

The point of immediately here was about being forced to adopt newer versions on fresh installs almost after they are released (only an older one is kept). This assumes that OPS teams will always be wrong in their management, that they must be forced to use the latest or they will never update.

In the context of snap refreshes, 60 days being reasonable is also an assumption. Whether 60 days is too much or too little depends on the context. In the blog article I linked above, a former Snap Advocate at Canonical includes a cron job that postpones the refresh every day so that there is never a deadline. I guess that says a lot.

I am aware of Snap Proxy - it just seems to me contra natura to have to do that. It also introduces other potential problems.

Also, note, that LXD clusters all have to run the same LXD version, so they should be updated together close after one and other (irrespective of whether using snap or not).

Right. We will take this into account. Thanks.

So this is what we decided to do:

  1. retest latest 4.0/stable and re-validate (4.0.7 seems to work for all our scenarios)
  2. install via SNAP in a way that updates are never applied automatically
  3. develop a script that updates the snap at the moment we choose

Step 2 is a declarative way of doing this:

snap download lxd --channel=4.0 --basename=lxd --target-directory=/root/snap
snap install /root/snap/lxd.snap --dangerous

Thank you everyone for the multiple comments on this matter.


I would like to do a gentle follow-up on the matter of reproducibility and immutability of the installations. The workaround above no longer seems to work with the latest LXD snap, and the need for reproducibility and immutability of the installations remains. Can you advise?

Thanks in advance.