Weekly status #221

tomp · October 25, 2021, 8:29am

Weekly status for the week of the 18th of October to the 24th of October.

Introduction

This past week has been a busy one for LXD. It has gained two OVN networking features (network-to-network routing and network source address spoof protection), as well as fanotify support for filesystem event watchers and partial support for VM stateful migrations. In addition, there has been a focus on improving cluster failover reliability.

LXD

@stgraber has added a new video covering deployment of LXD clusters with Juju:

https://www.youtube.com/watch?v=JgNVAvcXR9Q

New features:

OVN network-to-network routing (peering). This is one of our roadmap items and you can read more about its purpose and design in the design specification document.
OVN network source address spoofing protection. As part of implementing network-to-network peering we wanted to ensure that asymmetric routing was not possible, so we added router security policy rules to ensure that only traffic from known source addresses of the network is allowed to be routed down the peer connection. We have also extended this validation so that it applies to all egress traffic from the network. This means OVN networks now prevent instance NICs on the network from sending traffic with a source address outside of the known allowed subnets for that network. Note that this does not prevent an instance NIC from assuming the IP of another instance NIC on the same network.
VM stateful migrations. Initial support for VM stateful migrations has been added when using clustering (with the --target flag) or when moving a VM between local storage pools. This action causes the VM to be stopped statefully (i.e. with its memory state saved to a file), migrated, and then started back up again with its memory state restored.
Ability to specify sysctl settings for containers. When using LXD containers you can now specify that certain sysctl settings are applied inside the container on start using the linux.sysctl.* settings.

Improvements:

Fanotify support for filesystem event monitoring. LXD now supports (and prefers) using fanotify to watch for filesystem events. These events are used to support automatic hot-plugging of devices into instances.

Bug fixes:

Bridged NICs now prevent the use of the ipv{n}.address settings when connected to an unmanaged bridge. This avoids confusion where a static IP is set but cannot take effect due to not being connected to LXD’s DHCP server.
A recent regression in listing active managed bridge DHCP leases has been fixed.
lxd-p2c gains support for passing an existing certificate.
No longer auto fill cluster member scheduler.instance config when adding new member.

Clustering failover fixes:

There’s been a focus on improving the reliability of clustering fail-over this past week. We observed that if the LXD dqlite leader abruptly became unreachable to the other cluster members (for example because it went offline or there was a network issue), and there was still data in the TCP send queue of the DB connection, the remaining cluster members could block for up to 15 minutes before failing the ongoing query and recovering. During that time, all operations that required DB access on those members would block, which effectively prevented fail-over from occurring for up to 15 minutes. The cause was that the data in the TCP send queue prevented the normal TCP keep-alive timers from taking effect, so the OS’s TCP re-transmission timers took precedence. By default these keep trying to re-transmit the data to the unreachable server for 15 minutes.

At the same time we observed that the LXD event connection from the unreachable server was also left blocked for 15 minutes.

Resolving these issues required several fixes:

Events web socket API now sends heartbeats to connected clients and expects replies. If the replies do not come in time then the socket connection is closed down.
DB queries (which go to the leader server) now have a 10s timeout, implemented as a TCP read deadline. When a remote leader server becomes unreachable, ongoing queries block for up to 10s before the broken connection is detected and a re-connection attempt (possibly to a new leader server) can proceed. This effectively shortens the cluster fail-over time from 15 minutes to 10s.
The Dqlite proxy subsystem inside LXD (which handles incoming DB connections and outgoing Raft connections for other cluster members) now uses the TCP_USER_TIMEOUT connection setting to set the maximum time that a connection can remain open with unacknowledged sent data. This means that if data is stuck in the TCP send queue for too long, the socket is forcefully closed (preventing connections and goroutines from hanging around for up to 15 minutes).
Retry cluster transactions once on query timeout so that if the query timed out due to a leader election it will retry automatically once the leader election has finished.

LXC

New features:

You can now specify how many RX and TX queues are configured with veth NICs using the veth.n_rxqueues and veth.n_txqueues settings respectively. This allows for distributing traffic over multiple CPU cores.

Improvements:

Detect and prevent rootfs being over-mounted using lxc.mount.entry setting, as this causes confusion during container setup.
Handle kernels without or not using SMT.

Bug fixes:

Support restoring containers with pre-created veth devices (CRIU).

Distrobuilder

Improvements:

The rootfs-http downloader now supports local files with a prefix of file://.

Bug fixes:

Various fixes for the Oracle image.

Dqlite (database)

Bug fixes:

A statement leak has been fixed that was causing an assert to be hit in leader__close.
Fix page numbers leak.

Dqlite (Go bindings)

Bug fixes:

Fixes an issue that was preventing storing and retrieving a sql.NullTime.

YouTube channel

We’ve started a YouTube channel with live streams covering LXD releases and its use in the wider ecosystem.

You may want to give it a watch and/or subscribe for more content in the coming weeks.

https://www.youtube.com/lxd-videos

Contribute to LXD

Ever wanted to contribute to LXD but not sure where to start?
We’ve recently gone through some effort to properly tag issues suitable for new contributors on GitHub: Easy issues for new contributors

Upcoming events

Nothing to report this week

Ongoing projects

The list below covers feature or refactoring work which will span several weeks/months and can’t be tied directly to a single GitHub issue or pull request.

Distrobuilder Windows support
Virtual networks in LXD
Various kernel work
Stable release work for LXC, LXCFS and LXD

Upstream changes

The items listed below are highlights of the work which happened upstream over the past week and which will be included in the next release.

LXD

LXC

LXCFS

Distrobuilder

Dqlite (RAFT library)

Nothing to report this week

Dqlite (database)

Dqlite (Go bindings)

message: Fix sql.NullTime losing Nullness

LXD Charm

Distribution work

This section is used to track the work done in downstream Linux distributions to ship the latest LXC, LXD and LXCFS as well as work to get various software to work properly inside containers.

Ubuntu

Nothing to report this week

Snap

snapcraft: Simplified cohort handling
lxd: Cherry-pick upstream bugfixes
lxcfs: Bump to 4.0.11
lxc: Bump to 4.0.11
lxc: Cherry-pick upstream bugfixes

turtle0x1 · October 29, 2021, 5:16pm

Please could someone elaborate on this? I’ve had a socket open for a while now and I’m not seeing any pings. Perhaps the node websocket implementation automagically handles this?

I’m using snap version git-54ed3dc, which appears to have the aforementioned fix.

tomp · October 29, 2021, 7:24pm

So the heartbeat implementation is here:

github.com

tomponline/lxd/blob/480f4956b474ff626b0478f00190045c9316c467/lxd/events/events.go#L189-L231

    
      
    func (e *Listener) heartbeat() {
    	defer e.Close()

    	pingInterval := time.Second * 5
    	e.lastPong = time.Now() // To allow initial heartbeat ping to be sent.

    	e.SetPongHandler(func(msg string) error {
    		e.lastPong = time.Now()
    		return nil
    	})

    	// Run a blocking reader to detect if the remote side is closed.
    	// We don't expect to get anything from the remote side, so this should remain blocked until disconnected.
    	go func() {
    		e.Conn.NextReader()
    		e.Close()
    	}()

    	for {
    		if e.IsClosed() {

This file has been truncated.

It uses the gorilla websocket package’s built-in heartbeat support, which relies on websocket control messages sent from the LXD server. That is perhaps why you don’t see it: your websocket client may be handling them automatically for you (that is what the gorilla websocket client does by default).

turtle0x1 · October 30, 2021, 11:33am

I had seen the code but probably should have paid more attention. As we both suspected, the ws node package appears to do this automagically.

Sorry & Thanks