Weekly status for the week of the 4th April to the 10th April.
Introduction
This past week has been focused on working through our issues backlog and trying to resolve as many of them as possible to coincide with the Ubuntu Jammy release.
LXD
Improvements:
- VMs can now use custom firmware or a custom kernel by using the `-bios` or `-kernel` QEMU options inside `raw.qemu`.
- API and CLI errors now return 404 when a resource (or sub-resource) cannot be found, and the error includes the type of resource that cannot be found rather than just saying “Not found” as it did before. This is useful because when an operation can fail due to any of several resources not being found, knowing specifically which one wasn’t found is beneficial.
- Added `total` field to the `GET /1.0/storage-pools/{name}/volumes/{type}/{volume}/state` API to allow getting total and used size in a single request.
- When recovering an instance, if its backup.yaml file does not contain an instance type field, then assume it is a container.
- Added HTTP `HEAD` verb support in file API to allow getting metadata for a file without downloading it.
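To illustrate the 404 change listed above, here is a minimal Go sketch of an error that carries an HTTP status code together with a message naming the missing resource type. The `statusError` type and the `statusErrorf`, `statusOf` and `attachVolume` helpers are hypothetical names made up for this example; they are not LXD's own code, only an approximation of the idea.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// statusError is a hypothetical error type that pairs a message with an HTTP status.
type statusError struct {
	status int
	msg    string
}

func (e statusError) Error() string { return e.msg }

// statusErrorf builds a statusError from a format string (illustrative helper).
func statusErrorf(status int, format string, args ...any) error {
	return statusError{status: status, msg: fmt.Sprintf(format, args...)}
}

// statusOf returns the status carried anywhere in err's chain, or 500 if none.
func statusOf(err error) int {
	var se statusError
	if errors.As(err, &se) {
		return se.status
	}
	return http.StatusInternalServerError
}

// attachVolume can fail on either of two lookups; the error says which one.
func attachVolume(instances, volumes map[string]bool, instance, volume string) error {
	if !instances[instance] {
		return statusErrorf(http.StatusNotFound, "Instance not found")
	}
	if !volumes[volume] {
		return statusErrorf(http.StatusNotFound, "Storage volume not found")
	}
	return nil
}

func main() {
	instances := map[string]bool{"c1": true}
	volumes := map[string]bool{}

	err := attachVolume(instances, volumes, "c1", "vol1")
	wrapped := fmt.Errorf("Failed attaching volume: %w", err)

	// The status survives wrapping, and the message names the missing resource.
	fmt.Println(statusOf(wrapped), wrapped)
	// prints: 404 Failed attaching volume: Storage volume not found
}
```

Because the status travels with the error, a caller can still map it to a 404 response after other layers have added context to the message.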
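The `HEAD` support mentioned in the list above can be pictured with a small Go sketch. It runs against a stand-in `httptest` server rather than a real LXD daemon, and the `X-LXD-type` and `X-LXD-mode` header names are an assumption for this example (borrowed from what the file API is understood to return on `GET`). The `newFileServer` and `headMetadata` helpers are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newFileServer starts a stand-in server (not LXD) that serves one file and
// reports its metadata in response headers.
func newFileServer() *httptest.Server {
	handler := func(w http.ResponseWriter, r *http.Request) {
		// Header names are assumed for this sketch.
		w.Header().Set("X-LXD-type", "file")
		w.Header().Set("X-LXD-mode", "0644")
		if r.Method == http.MethodHead {
			return // metadata only, no body
		}
		io.WriteString(w, "file contents\n")
	}
	return httptest.NewServer(http.HandlerFunc(handler))
}

// headMetadata sends a HEAD request and returns the mode header plus the
// number of body bytes received, which should be zero.
func headMetadata(url string) (string, int, error) {
	resp, err := http.Head(url)
	if err != nil {
		return "", 0, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", 0, err
	}
	return resp.Header.Get("X-LXD-mode"), len(body), nil
}

func main() {
	srv := newFileServer()
	defer srv.Close()

	// The path only mimics the shape of a file API URL; the stub ignores it.
	mode, n, err := headMetadata(srv.URL + "/1.0/instances/c1/files?path=/etc/hostname")
	fmt.Println(mode, n, err)
}
```

The point of the sketch is that the client learns the file's mode from headers alone while transferring zero bytes of content.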
Bug fixes:
- The VM disk hotplug feature was causing some strange behavior in unrelated parts of Go (such as `lxc exec` websockets being closed unexpectedly or `os.Exec` calls not returning even after the command had finished). This was tracked down to the way that file descriptors opened to the disks (in order to pass them to QEMU) were being stored as file descriptor numbers (outside of Go’s own reference keeping) and then later closed by the Go garbage collector. This unfortunately meant that Go was reusing the FD numbers for other operations, and when the FD numbers stored within the `os.File` reference were closed, unrelated operations were interrupted. This has been fixed by not storing the FD numbers passed to QEMU in an `[]*os.File` slice that was never used for disks anyway.
- We fixed an issue with VMs running on LVM not properly cleaning up when being stopped after performing a lot of I/O operations (such as exporting the instance). We now allow extra time after the QEMU process has ended for the VM’s pending I/O to be flushed to the LVM subsystem. We do this by trying to unmount the VM’s volume without the MNT_DETACH option, to ensure that the unmount has completed successfully, and if it’s still in use we retry several times.
- We have introduced an instance update operation lock, as it was possible to issue multiple concurrent updates to an instance and for it to arrive at an inconsistent state.
- Similarly we have also introduced a lock preventing concurrent deletion of an instance.
- Cross-pool BTRFS optimized refresh has been fixed. Previously it was failing and leaving the copy on the target server in an inconsistent state.
- It was observed that copying multiple `dir`-based VMs concurrently was causing extremely high load and memory usage, and often caused the host OS to grind to a halt and/or the operation to fail. This was fixed in several ways. Firstly, it was found that the VM or block volume raw disk image files (which can be very large) were being copied twice per volume. Additionally, copying the volume image files was polluting the page cache, causing additional I/O. To avoid these issues, the block volume image files are now copied once using `dd` running at low priority and using direct I/O where possible.
- Added support for filesystems that don’t support `llistxattr`.
- Fixed an intermittent freeze during ZFS copying.
- When nesting VMs, we now avoid using conflicting vsock IDs, which was preventing use of the `lxd-agent` inside some nested VMs.
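The disk hotplug bug in the list above comes down to two owners closing the same file descriptor number. The following Go sketch reproduces that hazard in miniature, assuming a Linux or other Unix host: it wraps an open file's descriptor number in a second `os.File`, closes that alias explicitly, and then checks that the original handle can no longer write. In the bug described above the extra close came from the garbage collector rather than an explicit call, and the function name `aliasCloseBreaksOriginal` is hypothetical; none of this is LXD's code.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// aliasCloseBreaksOriginal reports whether closing a second *os.File built
// from the same descriptor number makes writes on the original fail with EBADF.
func aliasCloseBreaksOriginal() bool {
	f, err := os.CreateTemp("", "fd-alias-demo")
	if err != nil {
		return false
	}
	defer os.Remove(f.Name())

	// Wrap the same descriptor number in a second, independent *os.File.
	alias := os.NewFile(f.Fd(), "alias")

	// Closing the alias closes the descriptor itself, not just the wrapper.
	if err := alias.Close(); err != nil {
		return false
	}

	// The original handle still holds the now stale number, so the write fails.
	_, writeErr := f.WriteString("hello")

	// Close the original right away (this also fails) so that no finalizer
	// closes the stale number later, after it may have been reused.
	_ = f.Close()

	return errors.Is(writeErr, syscall.EBADF)
}

func main() {
	fmt.Println("original broken by alias close:", aliasCloseBreaksOriginal())
}
```

In a busy daemon the stale number is quickly handed out again for a socket or pipe, so the late close hits an unrelated operation instead of failing cleanly as it does here.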
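The LVM fix in the list above retries a plain unmount until the volume is no longer busy. A rough Go sketch of such a retry loop follows. The `tryUnmount` and `busyFor` names, the attempt count and the delay are all assumptions for illustration, and the unmount itself is faked so the example runs without privileges; a real caller would pass a function that performs the unmount without MNT_DETACH.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

// tryUnmount calls unmount up to attempts times, waiting between tries for as
// long as the target is reported busy. Any other error is returned at once.
func tryUnmount(unmount func() error, attempts int, delay time.Duration) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = unmount()
		if err == nil {
			return nil
		}
		if !errors.Is(err, syscall.EBUSY) {
			return err
		}
		time.Sleep(delay)
	}
	return fmt.Errorf("Failed to unmount after %d attempts: %w", attempts, err)
}

// busyFor returns a fake unmount function that reports EBUSY n times and then
// succeeds, standing in for a volume with pending I/O.
func busyFor(n int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return syscall.EBUSY
		}
		return nil
	}
}

func main() {
	// Busy twice, then free: succeeds within five attempts.
	fmt.Println(tryUnmount(busyFor(2), 5, 10*time.Millisecond))

	// Still busy after three attempts: the last EBUSY is returned, wrapped.
	fmt.Println(tryUnmount(busyFor(9), 3, 10*time.Millisecond))
}
```

Unlike a lazy detach, a successful return here means the unmount really completed, which is what makes it safe to continue with cleanup.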
Distrobuilder
Bug fixes:
- Fixed an issue where the image target would ignore `ImageTargetAll` when building LXD images.
YouTube videos
The LXD team is running a YouTube channel with live streams covering LXD releases and weekly videos on different aspects of LXD. You may want to give it a watch and/or subscribe for more content in the coming weeks.
Contribute to LXD
Ever wanted to contribute to LXD but not sure where to start?
We’ve recently gone through some effort to properly tag issues suitable for new contributors on GitHub: Easy issues for new contributors
Upcoming events
- Nothing planned currently.
Ongoing projects
The list below is feature or refactoring work which will span several weeks/months and can’t be tied directly to a single GitHub issue or pull request.
- Stable release work for LXC, LXCFS and LXD
Upstream changes
The items listed below are highlights of the work which happened upstream over the past week and which will be included in the next release.
LXD
- Database Refactor Part 1: Decouple Certificates from their Projects.
- doc: move Sphinx extensions to a separate repo
- Storage volume total size and used in a single query
- lxd/cluster: Don’t overwrite original volatile.evacuate.origin
- DB: Update generator to use api.StatusErrorf(http.StatusNotFound)
- Image: Prevent concurrent delete race
- doc/rest-api: Refresh swagger YAML
- lxd/util/net: Assign default port if no port given
- Storage: Use TryUnmount without MNT_DETACH in DiskMountClear
- lxd/instance/qemu: Allow using external firmware or kernel
- Lxd 100 network bgp
- lxd/storage/drivers/btrfs: Fix optimized refresh
- lxd/instance/qemu: Tweak warning on -bios/-kernel
- lxd/instance: Fix RuntimeLiblxcVersionAtLeast to handle ~
- shared: allow EOPNOTSUPP from llistxattr()
- Network types documentation
- VM: Fix disk hotplugging issues
- Introduction operationlock around Update
- Miscellaneous fixes
- Misc
- LXD: Replace use of ErrNotFound with api.StatusError with code set to http.StatusNotFound
- Backup: Default to container instance type if not specified in backup config
- lxd/storage/drivers/zfs: Close stderr after copy
- Storage: Updates genericVFSCopyVolume to not copy block volume files twice
- Update doc links
- Properly handle nesting a manually built LXD
- lxd/instance/qemu: Avoid conflicting vsock IDs
- Storage: VM copy dir driver peformance improvements
- Implement HEAD in file API
LXC
- Nothing to report this week
LXCFS
Distrobuilder
Dqlite (RAFT library)
Dqlite (database)
Dqlite (Go bindings)
- Nothing to report this week
LXD Charm
- Nothing to report this week
Distribution work
This section is used to track the work done in downstream Linux distributions to ship the latest LXC, LXD and LXCFS as well as work to get various software to work properly inside containers.
Ubuntu
- LXC 5.0 pre-release is in Ubuntu 22.04
Snap
- sshfs: Tweaked the error message
- lxd: Cherry-pick upstream bugfixes