Repeatable LXD installations

Yes, and which part has stopped working? Is there an error?

"The workaround above seems to not work with the latest LXD snap "

Yes, I can no longer install the local snap file. The error is this

This worked fine on previous servers, which have been running without problems for a long time.

Can you look in dmesg output for DENIED entries to do with apparmor?
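A minimal sketch of that check, assuming the kernel log uses the usual apparmor="DENIED" marker. The helper name apparmor_denials is made up for illustration:

```shell
# Hypothetical helper: keep only AppArmor denial records from kernel log
# text supplied on stdin. "|| true" avoids a non-zero exit when nothing matches.
apparmor_denials() {
    grep -F 'apparmor="DENIED"' || true
}

# Usage (reading the kernel log usually needs root):
#   sudo dmesg | apparmor_denials
```

Any lines that mention an LXD or snap profile are the ones worth posting here.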

As an aside, the official way to do this with snap is to use the snap store proxy.

See the proxy section in Managing the LXD snap

It allows you to pin a certain snap revision so all of your machines get same version.

Meanwhile I found something recent:

I am repeating the installation and will provide feedback ASAP.

Messages from apparmor.

The problem seems to go away if we remove the --dangerous option, which in the past was necessary. Things changing behaviour inside an LTS version :thinking:

It seems that without the --dangerous option (which worked before and now does not…) we also need to run snap ack. However, running it appears to enable automatic refreshes, which defeats the purpose of this exercise.
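For reference, the sequence without --dangerous could look roughly like the sketch below. The function name install_local_snap and the file base name lxd_12345 are placeholders, not real revision numbers. The SNAP variable is only there so the steps can be previewed with SNAP=echo:

```shell
# Command prefix; override with SNAP=echo to print the steps instead of running them.
SNAP="${SNAP:-sudo snap}"

# Hypothetical helper: import the assertion, then install the matching snap file.
# $1 is the shared base name, e.g. lxd_12345 for lxd_12345.assert and lxd_12345.snap.
install_local_snap() {
    base="$1"
    $SNAP ack "${base}.assert" && $SNAP install "${base}.snap"
}

# Both files come from a machine with store access, for example:
#   snap download lxd --channel=4.0/stable
# Usage:
#   install_local_snap lxd_12345
```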

As a matter of fact, a snap refresh just killed one of our existing test LXD servers that had been installed with snap ack, by updating it from 4.x stable to 5.x (VM terminal access stopped working).

This is breaking business continuity for serious users of LXD: we simply can’t have production servers updated in an uncontrolled manner. I am aware of the snap store proxy, but that adds unnecessary complexity. The point is running a reliable virtualization server, not a server that works around snap problems.

Would you be able to suggest a simple solution to not have our LXD servers updated except when we want to?

Is updating to snap 2.58 the simplest option?

If you are using the 5.0/stable LTS channel then it won’t auto upgrade between major versions.

You could use snap hold too and then manage when you get security updates manually.

We are using 4.0/stable since we don’t have everything certified for 5.0 yet.

So 4.0/stable won’t switch to 5.x automatically.
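To confirm which channel a given machine is actually tracking, something like the following sketch may help. It assumes the installed snap's `snap info` output contains a "tracking:" line. The helper name tracking_channel is made up:

```shell
# Hypothetical helper: print the channel from the "tracking:" line of
# `snap info <name>` output supplied on stdin.
tracking_channel() {
    awk '$1 == "tracking:" { print $2 }'
}

# Usage:
#   snap info lxd | tracking_channel
# To move a machine back onto the intended track:
#   sudo snap switch --channel=4.0/stable lxd
```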

Well but it somehow did!

We are going to try the latest snapd so that --hold can be used. I will report the result. Thank you.
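A rough sketch of what that could look like in IaC code, assuming GNU sort (for -V) and the 2.58 threshold mentioned above. The helper names are invented for illustration, and SNAP can be set to echo for a dry run:

```shell
# Command prefix; override with SNAP=echo to print instead of run.
SNAP="${SNAP:-sudo snap}"

# Extract the snapd version from `snap version` output on stdin.
snapd_version() {
    awk '$1 == "snapd" { print $2 }'
}

# Succeeds when version $1 is at least version $2 (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hold LXD refreshes indefinitely if the given snapd version supports --hold.
hold_lxd() {
    if version_ge "$1" 2.58; then
        $SNAP refresh --hold lxd
    else
        echo "snapd $1 is older than 2.58; --hold is not available" >&2
        return 1
    fi
}

# Usage:
#   hold_lxd "$(snap version | snapd_version)"
# Before a planned upgrade, release the hold with:
#   sudo snap refresh --unhold lxd
```

Checking the version first keeps the script idempotent across servers that ended up with different snapd releases.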

The aftermath of this:

  • the new --hold flag works
  • due to the confusion introduced by the change of snap behaviour (inside an LTS version!), two machines took unintended LXD upgrades (from 4.0.X to 5.12), which also interrupted running production systems

After the unintended upgrades to 5.12 we can’t downgrade even to 5.0.X because the lxd recover command does not recognize the machines. On version 5.12:

sudo lxc exec my-vm bash

fails with

Error: Failed to connect to lxd-agent

This worked 100% before the upgrade. I tried to upgrade lxd inside the VM but that did not work either. Any ideas on how to solve?

I’d be interested to understand what the change in snap behavior was that caused an inadvertent channel change?

This was the solution that we found in the past to stabilize the LXD version:

It stopped working but our IaC code relied on it. Then we changed our IaC code and there was a series of events that led to the problem.

To be precise, servers installed with Ubuntu 20.04 LTS at different times got different versions of snap, which behaved differently. Because our IaC needs to be applied idempotently to each server (all of them on 20.04), we were surprised by inconsistent behaviour across servers, and the attempts to fix it resulted in some servers being upgraded. Snap should not change its behaviour inside an LTS, as that defeats the purpose of having an LTS.

Anyway, at least the new snap has the --hold flag which allows for stabilization of the version. But now our VMs do not allow “exec”.

OK replying to myself with the solution:

  • the VM had to be rebooted so that lxd-agent was properly started
  • but this was not enough … the VM also had to be stopped/started on the LXD host

After this exec works again.

So it seems that after upgrading LXD the VMs need rebooting.
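In case it helps someone else, the stop/start step on the host can be sketched like this. The helper name restart_vm is made up, my-vm is the example instance from above, and LXC can be set to echo to preview the commands:

```shell
# Command prefix; override with LXC=echo to print instead of run.
LXC="${LXC:-lxc}"

# Hypothetical helper: fully stop and then start a VM from the LXD host,
# rather than only rebooting from inside the guest.
restart_vm() {
    vm="$1"
    $LXC stop "$vm" && $LXC start "$vm"
}

# Usage:
#   restart_vm my-vm
#   lxc exec my-vm bash
```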

(I don’t know if updating the LXD snap inside the VM made any difference in the process. I suspect that it is not relevant, as we don’t really use LXD inside the VM.)

Normally when upgrading LXD the VMs don’t need to be restarted.

However there was a bug fix between LXD 4.x and LXD 5.0.1 where the VM agent protocol changed from being incorrectly double TLS encrypted to correctly just being encrypted once with TLS.

See LXD 5.0.1 LTS has been released

I’ve also updated the snap tutorial with a reference to snap refresh --hold:

Thank you for this comment @tomp . This (i.e., the need of a VM reboot) also affects upgrades from 5.0 to 5.0.1 if I understand correctly. Can you confirm?

Yes, that is correct. It was unfortunate, and we tried many ways to avoid needing to reboot the instances, but we were ultimately not able to find a way around it.
