Weekly status #243

tomp · April 11, 2022, 1:18pm

Weekly status for the week of the 4th April to the 10th April.

Introduction

This past week has been focused on working through our issues backlog and trying to resolve as many of them as possible to coincide with the Ubuntu Jammy release.

LXD

Improvements:

VMs can now use custom firmware or kernel by using the -bios or -kernel QEMU options inside raw.qemu.
API and CLI errors now return 404 when a resource (or sub-resource) cannot be found, and the error includes the type of resource that was not found rather than just saying “Not found” as before. This is useful because an operation can fail due to any of several resources not being found, and it helps to know specifically which one was missing.
Added a total field to the GET /1.0/storage-pools/{name}/volumes/{type}/{volume}/state API so that the total and used sizes can be retrieved in a single request.
When recovering an instance, if its backup.yaml file does not contain an instance type field, LXD now assumes it is a container.
Added HTTP HEAD verb support to the file API to allow getting the metadata for a file without downloading it.
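
As a rough illustration of the HEAD support, the Go sketch below requests a file's metadata without downloading its contents. The endpoint path and the X-LXD-* header names are assumptions that should be checked against the LXD REST API documentation. headFile, fileMeta and fakeLXD are hypothetical names, and the stand-in server exists only so that the example runs without an LXD daemon (a real client would talk to LXD over its Unix socket or HTTPS).

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// fileMeta holds the metadata read from the response headers.
type fileMeta struct {
	UID, GID, Mode, Type string
}

// headFile issues a HEAD request for a file inside an instance and reads the
// metadata from the response headers. No file content is transferred.
func headFile(c *http.Client, base, instance, path string) (fileMeta, error) {
	// Assumed endpoint layout: /1.0/instances/{name}/files?path=...
	u := base + "/1.0/instances/" + url.PathEscape(instance) + "/files?path=" + url.QueryEscape(path)

	resp, err := c.Head(u)
	if err != nil {
		return fileMeta{}, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fileMeta{}, fmt.Errorf("HEAD %s: %s", path, resp.Status)
	}

	// Assumed header names; verify them against the API documentation.
	return fileMeta{
		UID:  resp.Header.Get("X-LXD-uid"),
		GID:  resp.Header.Get("X-LXD-gid"),
		Mode: resp.Header.Get("X-LXD-mode"),
		Type: resp.Header.Get("X-LXD-type"),
	}, nil
}

// fakeLXD is a stand-in server for this example only. It knows about a single
// file and answers 404 for everything else.
func fakeLXD() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodHead || r.URL.Query().Get("path") != "/etc/hostname" {
			http.Error(w, "File not found", http.StatusNotFound)
			return
		}
		w.Header().Set("X-LXD-uid", "0")
		w.Header().Set("X-LXD-gid", "0")
		w.Header().Set("X-LXD-mode", "0644")
		w.Header().Set("X-LXD-type", "file")
	}))
}

func main() {
	srv := fakeLXD()
	defer srv.Close()

	meta, err := headFile(srv.Client(), srv.URL, "c1", "/etc/hostname")
	fmt.Printf("meta=%+v err=%v\n", meta, err)

	_, err = headFile(srv.Client(), srv.URL, "c1", "/missing")
	fmt.Printf("missing file: err=%v\n", err)
}
```

The second call shows how a missing file surfaces as a 404 status that the caller can turn into an error.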

Bug fixes:

The VM disk hotplug feature was causing some strange behavior in unrelated parts of Go (such as lxc exec websockets being closed unexpectedly or os.Exec calls not returning even after the command had finished). This was tracked down to the way that file descriptors opened to the disks (in order to pass them to QEMU) were being stored as file descriptor numbers (outside of Go’s own reference keeping) and then later closed by the Go garbage collector. Unfortunately, this meant that Go was reusing the FD numbers for other operations, and when the FD numbers stored within the os.File reference were closed, unrelated operations were interrupted. This has been fixed by no longer storing the FD numbers passed to QEMU in an []*os.File slice, which was never used for disks anyway.
We fixed an issue with VMs running on LVM not properly cleaning up when stopped after performing a lot of I/O operations (such as exporting the instance). We now allow extra time after the QEMU process has ended for the VM’s pending I/O to be flushed to the LVM subsystem. We do this by trying to unmount the VM’s volume without the MNT_DETACH option, which ensures that the unmount has actually completed; if the volume is still in use, we retry several times.
We have introduced an instance update operation lock, as it was possible to issue multiple concurrent updates to an instance and leave it in an inconsistent state.
Similarly, we have also introduced a lock preventing concurrent deletion of an instance.
Cross-pool BTRFS optimized refresh has been fixed; previously it was failing and leaving the copy on the target server in an inconsistent state.
It was observed that copying multiple dir-based VMs concurrently caused extremely high load and memory usage, and often caused the host OS to grind to a halt and/or the operation to fail. This was addressed in several ways. First, it was found that the VM or block volume raw disk image files (which can be very large) were being copied twice per volume. Additionally, copying the volume image files was polluting the page cache, causing additional I/O. To avoid these issues, the block volume image files are now copied once, using dd running at low priority and with direct I/O where possible.
Added support for filesystems that don’t support llistxattr.
Fixed an intermittent freeze during ZFS copying.
When nesting VMs, we now avoid using conflicting vsock IDs, which were preventing use of the lxd-agent inside some nested VMs.

Distrobuilder

Bug fixes:

Fixed an issue where the image target would ignore ImageTargetAll when building LXD images.

YouTube videos

The LXD team is running a YouTube channel with live streams covering LXD releases and weekly videos on different aspects of LXD. You may want to give it a watch and/or subscribe for more content in the coming weeks.

Contribute to LXD

Ever wanted to contribute to LXD but not sure where to start?
We’ve recently gone through some effort to properly tag issues suitable for new contributors on GitHub: Easy issues for new contributors

Upcoming events

Nothing planned currently.

Ongoing projects

The list below covers feature or refactoring work which will span several weeks/months and can’t be tied directly to a single GitHub issue or pull request.

Stable release work for LXC, LXCFS and LXD

Upstream changes

The items listed below are highlights of the work which happened upstream over the past week and which will be included in the next release.

LXD

LXC

Nothing to report this week

LXCFS

Distrobuilder

main: Fix image targets for LXD

Dqlite (RAFT library)

Dqlite (database)

Dqlite (Go bindings)

Nothing to report this week

LXD Charm

Nothing to report this week

Distribution work

This section is used to track the work done in downstream Linux distributions to ship the latest LXC, LXD and LXCFS as well as work to get various software to work properly inside containers.

Ubuntu

LXC 5.0 pre-release is in Ubuntu 22.04

Snap

sshfs: Tweaked the error message
lxd: Cherry-pick upstream bugfixes