Repeatable LXD installations

I’d be interested to understand which change in snap behavior caused the inadvertent channel change.

This was the solution that we found in the past to stabilize the LXD version:

It stopped working but our IaC code relied on it. Then we changed our IaC code and there was a series of events that led to the problem.

To be precise, servers installed with Ubuntu 20.04 LTS at different points in time got different versions of snap, which behaved differently. Because our IaC needs to be applied idempotently to each server (all of them on 20.04), we were surprised by inconsistent behaviour across servers, and the attempts to fix it resulted in some servers being upgraded. Snap should not change its behaviour within an LTS, as that defeats the purpose of having an LTS.
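Since the snapd version is what differed between otherwise identical 20.04 servers, one thing an IaC run could do is record it per host before applying anything. A minimal sketch, assuming the usual `snap version` output with a `snapd` line; the function name `snapd_version_from` is made up for illustration:

```shell
# Hypothetical helper: read `snap version` output on stdin and print
# the value of the "snapd" line (the daemon version).
snapd_version_from() {
    awk '$1 == "snapd" { print $2 }'
}
```

On a server this would be used as `snap version | snapd_version_from`, and the result compared across hosts before relying on any particular snap behaviour.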

Anyway, at least the new snap has the --hold flag which allows for stabilization of the version. But now our VMs do not allow “exec”.
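For reference, the hold mentioned above can be sketched roughly like this. It assumes a snapd recent enough to support `snap refresh --hold`; the `run` helper and the `DRY_RUN` variable are hypothetical and only there so the commands can be previewed without touching the system:

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hold automatic refreshes of the lxd snap (no duration given, so the
# hold stays until it is removed).
hold_lxd() {
    run snap refresh --hold lxd
}

# Remove the hold again, e.g. before a planned upgrade.
unhold_lxd() {
    run snap refresh --unhold lxd
}
```

Calling `hold_lxd` repeatedly should be harmless, which is what an idempotent IaC run needs; `unhold_lxd` would only be called when an upgrade is actually intended.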

OK replying to myself with the solution:

  • the VM had to be rebooted so that lxd-agent was properly started
  • but this was not enough … the VM also had to be stopped/started on the LXD host

After this exec works again.
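The steps above can be sketched as a small script run on the LXD host. This is only a sketch: it assumes that a full stop/start from the host also covers the in-guest reboot (I have not verified that), and the function names, the retry count, and the `DRY_RUN` switch are all hypothetical:

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Retry `lxc exec` until the agent inside the VM answers, since the
# agent needs some time after the VM starts.
wait_for_exec() {
    vm="$1"
    tries="${2:-30}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        if run lxc exec "$vm" -- true 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep 2
    done
    return 1
}

# Full stop/start of the VM from the host, then confirm exec works.
restart_vm_and_check() {
    vm="$1"
    run lxc stop "$vm" &&
    run lxc start "$vm" &&
    wait_for_exec "$vm"
}
```

With `DRY_RUN=1 restart_vm_and_check myvm` the three `lxc` commands are only printed, which is a cheap way to review what would happen before doing it on every VM after an LXD upgrade.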

So it seems that after upgrading LXD the VMs need rebooting.

(I don’t know if updating the LXD snap inside the VM made any difference in the process. I suspect that it is not relevant, as we don’t really use LXD inside the VM.)

Normally when upgrading LXD the VMs don’t need to be restarted.

However, there was a bug fix between LXD 4.x and LXD 5.0.1 in which the VM agent protocol changed from being incorrectly double TLS encrypted to correctly being encrypted just once with TLS.

See “LXD 5.0.1 LTS has been released”.

I’ve also updated the snap tutorial with a reference to snap refresh --hold:

Thank you for this comment @tomp. This (i.e., the need for a VM reboot) also affects upgrades from 5.0 to 5.0.1, if I understand correctly. Can you confirm?

Yes, that is correct. It was unfortunate, and we tried many ways to avoid needing to reboot the instances, but we were ultimately not able to find a way around it.

we don’t have everything certified for 5.0 yet