Weekly status for the week of the 4th of October to the 10th of October.
LXD
This week we added core scheduling support for VMs and merged a slew of improvements and bug fixes. The focus was on improving the stability of the clustering tests, alongside fixes to the recently released metrics exporter and some changes to the way we mount ZFS volumes so that it is more reliable when operating inside the snap package.
@stgraber has also produced a video on using cloud-init with LXD.
https://www.youtube.com/watch?v=8OCG15TAldI
New features:
- Added core scheduling support for VMs and containers (even without support in liblxc).
Improvements:
- Clustering role handover: Over the last couple of weeks we have been focussing on making the clustering tests in our test suite more reliable, and part of this has been to make the clustering role change process itself more predictable (so that the tests can check for it). One of the recent changes we made was to make a LXD server that is shutting down hand over its role sooner in the sequence, so that cluster operation continues smoothly whilst the server finishes its shutdown process. However, this exacerbated an issue where the cluster role rebalancing process that is triggered at the end of a heartbeat round detected that the shutting down server was still online and re-promoted it just after it had stepped down. We have now introduced a change that allows a server that is shutting down to indicate in its heartbeat response that, although it is online, it does not want to be a candidate for role promotion.
- Clustering error handling: If a leader member address cannot be ascertained then an error is now returned rather than nil. This helps when a server that cannot access the global database (perhaps because it has lost quorum) is shutting down, as it will detect the issue sooner and switch to offline shutdown mode earlier.
- Storage ZFS volume mounting: Previously LXD set the mountpoint property on ZFS volume datasets to the desired mount path and used the zfs mount command to actually trigger a dataset to be mounted. However we were seeing issues when running LXD inside the snap package’s mount namespace when using a ZFS storage pool that was part of a wider ZFS dataset that was mounted on the host’s mount namespace. It seems that sometimes the ZFS mount subsystem was causing the LXD ZFS volume dataset to be mounted in the host’s mount namespace rather than the snap’s mount namespace (even though the zfs mount command was being executed inside the snap’s namespace). We believe we have worked around this issue by now setting the mountpoint=none property on all new and existing LXD volume datasets, and using the normal mount syscall (that we use for all other storage pool drivers) to mount the datasets. We were already using the normal syscall approach to unmount ZFS volumes (due to previous namespace issues).
- Bridge networks will now show OVN network’s router addresses in the lxc network list-leases command output.
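The clustering role handover change described above can be illustrated with a small sketch. It is not LXD's actual code (LXD is written in Go; Python is used here only for brevity), and the HeartbeatInfo type, its fields and the promotion_candidates function are hypothetical names chosen for this example. It only shows the idea that a member can be online and still opt out of promotion.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HeartbeatInfo:
    """Hypothetical per-member state gathered during one heartbeat round."""
    name: str
    online: bool
    shutting_down: bool = False  # member is online but asks not to be promoted


def promotion_candidates(members: List[HeartbeatInfo]) -> List[str]:
    """Return names of members that may be given a role when rebalancing.

    A member must be online, and must not have flagged that it is
    shutting down in its heartbeat response.
    """
    return [m.name for m in members if m.online and not m.shutting_down]


members = [
    HeartbeatInfo("node1", online=True),
    HeartbeatInfo("node2", online=True, shutting_down=True),  # just stepped down
    HeartbeatInfo("node3", online=False),
]
print(promotion_candidates(members))  # ['node1']
```

Before the change, the equivalent check would have looked only at the online state, so node2 in this example would have been re-promoted straight after handing over its role.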
Bug fixes:
- A regression in VM backup import has been fixed to allow non-optimised backups to be restored.
- A race condition in the image download logic has been fixed so that if multiple instances that use the same image are launched concurrently they won’t cause image DB record errors as they all try to download the image.
- When calling lxc stop on a VM instance without a timeout, if the VM took longer than 30s to actually stop, but did eventually stop, then an error would be shown indicating that the operation lock timeout exceeded the default 30s. This issue has now been fixed so that the operation lock is kept alive until the stop process has ended, or has been interrupted by another lxc stop or lxc stop -f request. This behaviour has also been ported to the container driver.
- A regression in the lxd shutdown command has been fixed. Previously the shutdown API endpoint was asynchronous, but it was recently made synchronous so that lxd shutdown can block until LXD has shut down without depending on the global database and the event stream (which it did before). However, this change kept the event stream check around as a “belt-and-braces” approach, and this ended up causing LXD to start again when run inside the snap due to the snap package’s socket activation, so that additional shutdown check has now been removed.
- Also related to lxd shutdown: during offline shutdown mode, when the global database is not available, the shutdown process has been reworked to try to use the instance’s backup.yaml file to load information about the instance so it can be shut down cleanly. This now also supports shutting down VMs properly.
- Several issues with the metrics exporter have been fixed.
- An AppArmor fix was added to allow remount when noatime is set.
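To illustrate the lxc stop fix described above, here is a minimal sketch of an operation lock that is kept alive while a stop is still in progress. It is an illustration under stated assumptions rather than LXD's implementation (which is written in Go): the OperationLock class, the wait_for_stop function and their parameters are hypothetical, and interruption by a second stop request is left out.

```python
import time
from typing import Callable


class OperationLock:
    """Hypothetical lock that expires unless it is refreshed in time."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._deadline = clock() + timeout
        self.done = False

    def reset(self) -> None:
        """Keep the lock alive by pushing the deadline forward one full timeout."""
        self._deadline = self._clock() + self._timeout

    def expired(self) -> bool:
        """True if the lock was not finished or refreshed before its deadline."""
        return not self.done and self._clock() >= self._deadline


def wait_for_stop(
    lock: OperationLock,
    is_stopped: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
    poll_interval: float = 1.0,
) -> None:
    """Poll until the instance has stopped, refreshing the lock on every poll."""
    while not is_stopped():
        if lock.expired():
            raise TimeoutError("operation lock timed out")
        lock.reset()
        sleep(poll_interval)
    lock.done = True
```

With a 30 second timeout, a stop that takes 45 seconds completes without a timeout error here, because each poll refreshes the deadline. A lock that nobody refreshes still expires after 30 seconds. The clock and sleep parameters exist only so that the behaviour can be exercised without real waiting.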
LXC
Improvements:
- Logging improvements when using musl.
LXCFS
Bug fixes:
- Fixes -v and --version handling.
Distrobuilder
An attempt was made to remove NoNewPrivileges=no from the systemd generator, as it was thought that it was no longer needed. However, the change was reverted shortly afterwards because some images still require it on certain kernels.
Dqlite (RAFT library)
Bug fixes:
- An issue that was being seen intermittently in the LXD clustering test suite has been fixed. When shutting down multiple members concurrently, a node could stay leader after its role had changed to RAFT_SPARE, which was causing assertion crashes.
YouTube channel
We’ve started a YouTube channel with live streams covering LXD releases and its use in the wider ecosystem.
You may want to give it a watch and/or subscribe for more content in the coming weeks.
https://www.youtube.com/lxd-videos
Contribute to LXD
Ever wanted to contribute to LXD but not sure where to start?
We’ve recently gone through some effort to properly tag issues suitable for new contributors on GitHub: Easy issues for new contributors
Upcoming events
- Nothing to report this week
Ongoing projects
The list below covers feature or refactoring work which will span several weeks/months and can’t be tied directly to a single GitHub issue or pull request.
- Distrobuilder Windows support
- Virtual networks in LXD
- Various kernel work
- Stable release work for LXC, LXCFS and LXD
Upstream changes
The items listed below are highlights of the work which happened upstream over the past week and which will be included in the next release.
LXD
- Clustering: Prevent a member that is shutting down from being promoted
- Generator: Insert into certificates_projects table
- Fix metrics issues
- Instance: Use project and instance name for operation locks
- Instance: Rework instancesOnDisk to load config from backup.yaml if available
- lxd-agent: Drop aggregated cpu stats in metrics
- test: Kill LXD process if doesn’t start in time
- lxd/main/shutdown: Fix shutdown regression when running in snap
- Suggest Ubuntu 20.04 instead of 18.04
- lxc: update wording when a cert is successfully trusted by a remote
- Update protobuf code
- Introduce downstream networks in leases
- lxd/apparmor: Allow remount using noatime
- Apparmor simplification
- test/suites: Fix cephfs backup tests
- Cluster: Error when no leader address found during handover
- Instance: Keep instance operation lock alive whilst waiting for instance to shutdown
- Instance: Fix image download race condition in instanceCreateFromImage
- Storage: Use normal mount rather than zfs mount for ZFS volumes
- Simpler filters
- lxd/network: Move Leases to network package
- lxd: core scheduling support for virtual machines and container core scheduling even without LXC shared library support
- Storage: Set mountpoint=none for ZFS filesystem volumes
- lxd/instance/lxc: Properly report mapped memory
- Network: Rework network loading functionality
- Clustering: Improve reliablity of remove raft node test
- Instance: Fix container restart locking
- lxd/network/driver/ovn: Fix comment on getLoadBalancerName
- lxd/network/ovn: Add support for leases
- Backup: Fix regression of VM backup imports
LXC
LXCFS
Distrobuilder
- main: remove NoNewPrivileges=no from systemd-generator
- Revert “main: remove NoNewPrivileges=no from systemd-generator”
Dqlite (RAFT library)
Dqlite (database)
- Nothing to report this week
Dqlite (Go bindings)
- Nothing to report this week
LXD Charm
- Nothing to report this week
Distribution work
This section is used to track the work done in downstream Linux distributions to ship the latest LXC, LXD and LXCFS as well as work to get various software to work properly inside containers.
Ubuntu
- Nothing to report this week
Snap
- Tweaked cohort key handling on refresh
- lxd: Cherry-picked upstream bugfixes