Weekly status #219


Weekly status for the week of the 4th of October to the 10th of October.

LXD

This week we added core scheduling support for VMs and merged a slew of improvements and bug fixes. The focus has been on improving the stability of the clustering tests, along with fixes to the recently released metrics exporter and changes that make ZFS volume mounting more reliable when running inside the snap package.

@stgraber has also produced a video on using cloud-init with LXD.

https://www.youtube.com/watch?v=8OCG15TAldI
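
As a taste of what the video covers, a minimal cloud-init user data snippet for an LXD instance might look like the following. This is an illustrative sketch only (the package and command are placeholders), assuming an image with cloud-init installed and LXD's cloud-init.user-data config key:

```yaml
#cloud-config
# Illustrative first-boot configuration: install a package and drop a marker file.
packages:
  - htop
runcmd:
  - echo "configured by cloud-init" > /root/provisioned.txt
```

This could then be applied at launch with something like lxc launch ubuntu:20.04 c1 -c cloud-init.user-data="$(cat user-data.yaml)".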

New features:

  • Added core scheduling support for VMs and containers (even without support in liblxc).
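
For background, Linux core scheduling (available since kernel 5.14) is driven through the prctl(2) PR_SCHED_CORE interface, which gives a group of tasks a shared "cookie" so that only mutually trusting tasks share an SMT core. This is not LXD's actual implementation (which lives in Go and liblxc); it is a minimal Python sketch of the underlying kernel interface, with constants taken from linux/prctl.h:

```python
import ctypes

# prctl(2) constants for core scheduling (linux/prctl.h, kernel >= 5.14).
PR_SCHED_CORE = 62
PR_SCHED_CORE_GET = 0
PR_SCHED_CORE_CREATE = 1
PIDTYPE_PID = 0  # scope the operation to a single task

def create_core_sched_domain(pid=0):
    """Ask the kernel to give `pid` (0 = the calling task) its own core
    scheduling cookie, so it never shares an SMT core with other domains.
    Returns True on success, False if the kernel lacks support."""
    libc = ctypes.CDLL(None, use_errno=True)
    ret = libc.prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE,
                     ctypes.c_ulong(pid), ctypes.c_ulong(PIDTYPE_PID),
                     ctypes.c_ulong(0))
    return ret == 0

if __name__ == "__main__":
    print("core scheduling domain created:", create_core_sched_domain())
```

On kernels without core scheduling support the prctl call simply fails, which is why LXD can offer the feature even without support in liblxc by falling back to its own handling.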

Improvements:

  • Clustering role handover: Over the last couple of weeks we have been focussing on making the clustering tests in our test suite more reliable, and part of this has been to make the clustering role change process itself more predictable (so that the tests can check for it). One of the recent changes was to make a LXD server that is shutting down hand over its role earlier in the sequence, so that cluster operations continue smoothly whilst the server finishes its shutdown process. However, this exacerbated an issue where the cluster role rebalancing process triggered at the end of a heartbeat round would detect that the shutting-down server was still online and re-promote it just after it had stepped down. We have now introduced a change that allows a server that is shutting down to indicate in its heartbeat response that, although it is online, it does not want to be a candidate for role promotion.
  • Clustering error handling: If a leader member’s address cannot be ascertained, an error is now returned rather than nil. This helps when a server that is shutting down cannot access the global database (perhaps because it has lost quorum), as it will detect the issue sooner and switch to offline shutdown mode earlier.
  • Storage ZFS volume mounting: Previously LXD set the mountpoint property on ZFS volume datasets to the desired mount path and used the zfs mount command to actually trigger a dataset to be mounted. However we were seeing issues when running LXD inside the snap package’s mount namespace when using a ZFS storage pool that was part of a wider ZFS dataset that was mounted on the host’s mount namespace. It seems that sometimes the ZFS mount subsystem was causing the LXD ZFS volume dataset to be mounted in the host’s mount namespace rather than the snap’s mount namespace (even though the zfs mount command was being executed inside the snap’s namespace). We believe we have worked around this issue by now setting the mountpoint=none property on all new and existing LXD volume datasets, and using the normal mount syscall (that we use for all other storage pool drivers) to mount the datasets. We were already using the normal syscall approach to unmount ZFS volumes (due to previous namespace issues).
  • Bridge networks now show OVN networks’ router addresses in the lxc network list-leases command output.
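
The heartbeat change above boils down to letting a member veto its own promotion. A simplified Python model of the rebalancing decision (the names here are hypothetical; the real implementation is in LXD's Go clustering code) could look like:

```python
from dataclasses import dataclass

@dataclass
class HeartbeatResponse:
    address: str
    online: bool
    # Cleared by a member that is shutting down: it is still online,
    # but must not be re-promoted after stepping down from its role.
    candidate: bool = True

def promotion_candidates(responses):
    """Return members eligible for role promotion during rebalancing."""
    return [r.address for r in responses if r.online and r.candidate]

members = [
    HeartbeatResponse("10.0.0.1:8443", online=True),
    HeartbeatResponse("10.0.0.2:8443", online=True, candidate=False),  # shutting down
    HeartbeatResponse("10.0.0.3:8443", online=False),
]
print(promotion_candidates(members))  # only 10.0.0.1:8443 remains eligible
```

The key point is that "online" and "promotable" are now distinct signals, so a shutting-down member no longer flip-flops back into a role it just handed over.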

Bug fixes:

  • A regression in VM backup import has been fixed to allow non-optimised backups to be restored.
  • A race condition in the image download logic has been fixed so that if multiple instances using the same image are launched concurrently, they won’t cause image DB record errors as they each try to download the image.
  • When calling lxc stop on a VM instance without a timeout, if the VM took longer than 30s to actually stop, but did eventually stop, an error would be shown indicating that the default 30s operation lock timeout had been exceeded. This has now been fixed so that the operation lock is kept alive until the stop process has ended or has been interrupted by another lxc stop or lxc stop -f request. The same behaviour has also been ported to the container driver.
  • A regression in the lxd shutdown command has been fixed. Previously the shutdown API endpoint was asynchronous, but it was recently made synchronous so that lxd shutdown can block until LXD has shut down without depending on the global database and the event stream (which it did before). However, that change kept the event stream check around as a “belt-and-braces” measure, which ended up causing LXD to start again when run inside the snap due to the snap package’s socket activation, so that additional shutdown check has now been removed.
  • Also related to lxd shutdown: during offline shutdown mode, when the global database is not available, the shutdown process has been reworked to try to use each instance’s backup.yaml file to load information about the instance so it can be shut down cleanly. This now supports shutting down VMs properly too.
  • Several issues with the metrics exporter have been fixed.
  • An AppArmor fix was added to allow remounting when noatime is set.
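
The VM stop fix above is essentially a keepalive on the operation lock: instead of a fixed 30s deadline, the lock is refreshed for as long as the stop is still in progress. A rough Python sketch of that pattern (hypothetical names and timings; LXD's real operation lock is implemented in Go):

```python
import threading
import time

class OperationLock:
    """Toy lock with a deadline that must be periodically extended."""
    def __init__(self, timeout=30.0):
        self.timeout = timeout
        self.deadline = time.monotonic() + timeout

    def refresh(self):
        """Push the expiry deadline forward by another timeout interval."""
        self.deadline = time.monotonic() + self.timeout

    def expired(self):
        return time.monotonic() > self.deadline

def stop_instance(lock, wait_stopped, interval=0.05):
    """Keep `lock` alive until `wait_stopped` (the actual stop) returns."""
    done = threading.Event()

    def keepalive():
        while not done.is_set():
            lock.refresh()       # extend the deadline while stop is in progress
            done.wait(interval)

    t = threading.Thread(target=keepalive)
    t.start()
    try:
        wait_stopped()           # e.g. wait for the QEMU process to exit
    finally:
        done.set()
        t.join()

lock = OperationLock(timeout=0.2)             # short timeout for the demo
stop_instance(lock, lambda: time.sleep(0.5))  # the "stop" outlives the timeout
print("lock expired during stop:", lock.expired())
```

Because the keepalive only ends when the stop itself ends (or is superseded by a forced stop), a slow but successful shutdown no longer reports a spurious timeout error.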

LXC

Improvements:

  • Logging improvements when using musl.

LXCFS

Bug fixes:

  • Fixed handling of -v and --version.

Distrobuilder

An attempt was made to remove NoNewPrivileges=no from the systemd generator, as it was thought this was no longer needed. However, the change was reverted shortly afterwards as some images still require it on certain kernels.

Dqlite (RAFT library)

Bug fixes:

  • An issue seen intermittently in the LXD clustering test suite has been fixed: when shutting down multiple members concurrently, a node could remain leader after its role had changed to RAFT_SPARE, which was causing assertion crashes.

YouTube channel

We’ve started a YouTube channel with live streams covering LXD releases and its use in the wider ecosystem.

You may want to give it a watch and/or subscribe for more content in the coming weeks.

https://www.youtube.com/lxd-videos

Contribute to LXD

Ever wanted to contribute to LXD but not sure where to start?
We’ve recently gone through some effort to properly tag issues suitable for new contributors on GitHub: Easy issues for new contributors

Upcoming events

  • Nothing to report this week

Ongoing projects

The list below is feature or refactoring work which will span several weeks/months and can’t be tied directly to a single GitHub issue or pull request.

  • Distrobuilder Windows support
  • Virtual networks in LXD
  • Various kernel work
  • Stable release work for LXC, LXCFS and LXD

Upstream changes

The items listed below are highlights of the work which happened upstream over the past week and which will be included in the next release.

LXD

LXC

LXCFS

Distrobuilder

Dqlite (RAFT library)

Dqlite (database)

  • Nothing to report this week

Dqlite (Go bindings)

  • Nothing to report this week

LXD Charm

  • Nothing to report this week

Distribution work

This section is used to track the work done in downstream Linux distributions to ship the latest LXC, LXD and LXCFS as well as work to get various software to work properly inside containers.

Ubuntu

  • Nothing to report this week

Snap

  • Tweaked cohort key handling on refresh
  • lxd: Cherry-picked upstream bugfixes