[LXD charm] New implementation (Operator Framework)

Project LXD
Status Implemented
Author(s) @sdeziel
Approver(s) @stgraber, @morphis, @jamespage, Michael Skalka
Release Not applicable
Internal ID LX003

Abstract

Provide an easy way to use Juju to deploy standalone or clustered LXD machines. This new charm will be implemented using the Charmed Operator Framework.

Rationale

Over time, LXD was “charmed” multiple times. Here’s a non-exhaustive list of prior charms, all based on the older Reactive Framework:

  • Initial lxd charm by the OpenStack team for use with the now defunct nova-compute-lxd. It only supported standalone mode and was quite OpenStack specific.
  • The lxd charm by the Anbox team which is used to set up LXD clusters specifically for use by Anbox.
  • The lxd-cluster charm by Michael Skalka which was an older attempt at a generic charm to assemble LXD clusters.

The new charm will employ the Operator Framework from the get-go as it is the modern way of “charming” an application. It will also provide first-class support for clustering, allowing secure, reliable and easy scaling up/down.

Specification

Design

The new charm will only support LXD 4.0 and later, installed through the snap. The snap track/channel will be configurable. LXD deployments driven by Juju will provide an easy way to make fleet-wide configuration changes that traditionally required manual intervention (like snap set lxd lxcfs.loadaverage=true) on each of the cluster members.

When in cluster mode, Juju will leverage cluster join tokens to seamlessly and safely join new cluster members. The charm will rely on the fact that Juju automatically elects an application leader among the deployed units. This leader will be responsible for creating join tokens for joining units/LXD members. Each join token will be securely relayed (through Juju) to the joining LXD member, which will incorporate it in the preseed file used for its initial LXD configuration. The application leader will also be responsible for removing members from the cluster. The removal will fail if any instances are left on the departing member, as those need to be evacuated by the cluster operator beforehand. Juju will also provide the block device(s) to use for storage and the network interfaces to be handed over to LXD.
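As a rough sketch of the joining side, the preseed document fed to lxd init --preseed could be assembled from the relayed token like this. This is a hedged illustration: the function name is hypothetical and the key names reflect my reading of LXD's preseed format for cluster joins (JSON being valid YAML keeps the sketch stdlib-only).

```python
import json


def build_join_preseed(token, cluster_address, member_config):
    """Sketch: build the preseed document a joining unit could pass to
    `lxd init --preseed`. Key names follow LXD's preseed format for
    cluster joins; the helper itself is hypothetical."""
    preseed = {
        "cluster": {
            "enabled": True,
            "server_address": cluster_address,  # address this member listens on
            "cluster_token": token,             # join token issued by the leader
            "member_config": member_config,     # per-member config, e.g. storage source
        }
    }
    # JSON is a subset of YAML, so this is directly usable as a preseed.
    return json.dumps(preseed)
```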

The charm will support the use of Juju Network Spaces, allowing the operator to bind various LXD components (cluster, https, etc.) to different network spaces managed directly by Juju. Block devices will also be made available to LXD through Juju-managed storage.

In addition to configuring LXD, the charm will tune various sysctl settings for production usage. Also, some files/directories under /proc and /sys will have their permissions restricted to prevent container name leakage.
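The sysctl tuning could be applied by rendering a fragment under /etc/sysctl.d/. The sketch below uses illustrative key/value pairs, not the charm's actual tuning list, and the function name is hypothetical:

```python
# Illustrative production tunings only; the charm's real list may differ.
SYSCTL_TUNINGS = {
    "fs.inotify.max_user_instances": 1024,
    "kernel.keys.maxkeys": 2000,
}


def render_sysctl_conf(tunings):
    """Render a /etc/sysctl.d/ fragment (key=value, one per line) from a
    dict of sysctl tunings, sorted for stable output."""
    lines = [f"{key}={value}" for key, value in sorted(tunings.items())]
    return "\n".join(lines) + "\n"
```

The rendered string would then be written to something like /etc/sysctl.d/zz-lxd-charm.conf (path is an assumption) and applied with sysctl -p.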

If problems are encountered in the field, the charm will provide a debug action to collect the information necessary to investigate and, if needed, report bugs to LXD upstream. The charm will also make it possible to sideload an LXD binary or a full LXD snap to assist in debugging.

If Juju is configured to use an HTTP/HTTPS proxy, the corresponding core.proxy_* configuration keys will be set during the initialization of the LXD units. Subsequent changes to the core.proxy_* keys will need to be done through LXD’s REST API.
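A minimal sketch of that mapping, assuming the charm reads Juju's standard JUJU_CHARM_*_PROXY environment variables (the charm might instead query the model config; the function name is hypothetical):

```python
import os


def juju_proxy_to_lxd(environ=None):
    """Map Juju's proxy environment variables to the LXD core.proxy_*
    keys to set at initialization time. Only variables that are present
    and non-empty are mapped."""
    if environ is None:
        environ = os.environ
    mapping = {
        "JUJU_CHARM_HTTP_PROXY": "core.proxy_http",
        "JUJU_CHARM_HTTPS_PROXY": "core.proxy_https",
        "JUJU_CHARM_NO_PROXY": "core.proxy_ignore_hosts",
    }
    return {lxd_key: environ[env_key]
            for env_key, lxd_key in mapping.items()
            if environ.get(env_key)}
```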

Operating modes

Two modes of operation will be supported: standalone and cluster. The default mode will be standalone, where each unit is independent of the others. In this mode, there won’t be any interaction between units.

The other mode, cluster, will have Juju orchestrate cluster management, including the addition and removal of members as the cluster is scaled up or down.

Resources

For debugging purposes, the charm will allow sideloading an LXD binary (lxd-binary) or a full LXD snap (lxd-snap) by attaching resources at deploy time or later on. Both resources will also accept tarballs containing architecture-specific assets to support mixed architecture deployments. Those tarballs will contain files at the root named lxd_${ARCH} for the lxd-binary resource and lxd_${ARCH}.snap for the lxd-snap resource.
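Selecting the right asset on each unit then amounts to matching the unit's architecture against the naming scheme above. A sketch (the helper name is hypothetical; a real implementation would read the member list from the tarball with the tarfile module):

```python
def select_arch_asset(tar_members, resource, arch):
    """Pick the per-architecture file from a multi-arch resource tarball:
    lxd_${ARCH} for lxd-binary, lxd_${ARCH}.snap for lxd-snap."""
    suffix = ".snap" if resource == "lxd-snap" else ""
    wanted = f"lxd_{arch}{suffix}"
    if wanted not in tar_members:
        raise FileNotFoundError(f"no asset {wanted!r} in resource tarball")
    return wanted
```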

To detach a resource, the operator will need to attach an empty file, as Juju does not provide a mechanism to do this.

Endpoints

The charm will offer the following endpoints to bind to Juju Network Spaces (covers LXD features to be released this cycle):

  • bgp (core.bgp_address)
  • cluster (cluster.https_address)
  • https (core.https_address)

Since Juju automatically binds endpoints to network spaces, each endpoint comes with a configuration key (lxd-listen-<name>) to enable/disable the corresponding listener. The operator can bind an endpoint to any space they want; the charm will then pick the first address from that space and configure LXD accordingly.

The default port for each endpoint will be used; additional configuration keys may be added later on to allow alternative ports.
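Turning the first space address into an LXD listen address needs one small wrinkle: IPv6 literals must be bracketed. A sketch of that step (hypothetical helper; 8443 is LXD's default HTTPS port):

```python
import ipaddress


def listen_address(space_addresses, port=8443):
    """Take the first address Juju reports for the bound space and format
    it as an LXD listen address, bracketing IPv6 literals."""
    addr = space_addresses[0]
    if isinstance(ipaddress.ip_address(addr), ipaddress.IPv6Address):
        return f"[{addr}]:{port}"
    return f"{addr}:{port}"
```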

Changes to those will be applied live to the local LXD configuration, with two exceptions:

  • When deploying a cluster, the cluster.https_address will be set to the first address in the network space provided at deploy time.
  • Again for clusters, the cluster endpoint will be effectively read-only. Attempts to modify it will result in the charm entering a blocked state and asking the user to undo their change.

Note: due to a limitation in Juju, if the cluster space should use IPv6, the proper space binding needs to be configured at deploy time because the default alpha space doesn’t come with IPv6 subnets.

LXD relations

cluster peer

This relation (known as a “peer relation”) is formed between all the units of the Juju LXD application. Juju automatically manages it when new units are created (juju add-unit and juju scale-application) and provides the hostname of the joining unit to LXD, which then issues a cluster join token. Upon unit removal (juju remove-unit), LXD will remove the departing node as the relation is dropped.

Output for a joining member unit (lxd/82 with hostname of bar) in the application data bag:

{
    "version": "1.0",
    "lxd/82": "abcd....1234",
    "member_config": "[{\"entity\":\"storage-pool\",\"name\":\"local\",\"key\":\"source\",\"value\":\"\"}]",
    "members": "{\"lxd/81\": \"foo\", \"lxd/82\": \"bar\"}"
}
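Since Juju data bags only carry strings, the nested values above are JSON-encoded strings that the consuming side decodes itself. A sketch of that decoding step (the function name is hypothetical):

```python
import json


def parse_cluster_bag(bag, unit):
    """Extract a joining unit's token plus the decoded members map
    (unit name -> hostname) and member_config from the peer relation's
    application data bag. Nested values are JSON-encoded strings."""
    token = bag[unit]
    members = json.loads(bag["members"])
    member_config = json.loads(bag["member_config"])
    return token, members, member_config
```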

https provider

The charm will provide an https relation to be used/consumed by other charms. The connecting charm will have to provide an X.509 certificate (in PEM format) and, optionally, the list of projects to get access to. On the LXD side, the certificate will be added to the trust store and its name property will be set to juju-relation-<model>-<unit>, with :autoremove appended if the certificate should be removed automatically when the relation is dropped. Important connection information will be returned to the connecting charm.

Input in the unit data bag:

{
    "version": "1.0",
    "certificate": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----",
    "projects": "foo,bar",
    "autoremove": true
}

Output in the application data bag:

{
    "version": "1.0",
    "certificate": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----",
    "certificate_fingerprint": "1234...abcd",
    "addresses": "192.0.2.1:8443,[2001:db8::1]:8443"
}

Note: if the projects member is not provided, the client will have full, unrestricted access to everything. If, however, the projects list is an empty string, the client will not have access to anything. The projects and addresses members are strings containing comma separated lists because Juju data bags require string values.
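The absent-vs-empty distinction above is easy to get wrong, so here is a sketch of the rule (hypothetical helper; None stands for unrestricted access):

```python
def parse_projects(bag):
    """Apply the access rule for the https relation's projects member:
    key absent -> unrestricted access (None), empty string -> access to
    nothing ([]), otherwise a comma separated list of project names."""
    if "projects" not in bag:
        return None   # full, unrestricted access
    raw = bag["projects"]
    if raw == "":
        return []     # access to nothing
    return raw.split(",")
```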

Configuration options

| Key | Type | Default | Live update | Description |
|---|---|---|---|---|
| kernel-hardening | boolean | true | yes | Restrict access to sensitive files/dirs (/proc/sched_debug, /sys/kernel/slab) |
| lxd-listen-bgp | boolean | false | yes | Enable/disable the core.bgp_address listener |
| lxd-listen-https | boolean | false | yes | Enable/disable the core.https_address listener |
| lxd-preseed | string | | no | Pass a YAML configuration to lxd init on initial start |
| mode | string | “standalone” | no | Operate in standalone or cluster mode |
| snap-channel | string | “latest/stable” | yes | The snap store channel for LXD to use |
| snap-config-ceph-builtin | boolean | | yes | Use snap-specific ceph configuration |
| snap-config-ceph-external | boolean | | yes | Use the system’s ceph tools (ignores snap-config-ceph-builtin) |
| snap-config-criu-enable | boolean | | yes | Enable experimental live-migration support |
| snap-config-daemon-debug | boolean | | yes | Increase logging to debug level |
| snap-config-daemon-group | string | | yes | Set group of users that can interact with LXD |
| snap-config-daemon-syslog | boolean | | yes | Send LXD log events to syslog |
| snap-config-lvm-external | boolean | | yes | Use the system’s LVM tools |
| snap-config-lxcfs-cfs | boolean | | no | Consider CPU shares for CPU usage |
| snap-config-lxcfs-loadavg | boolean | | no | Start tracking per-container load average |
| snap-config-lxcfs-pidfd | boolean | | no | Start per-container process tracking |
| snap-config-openvswitch-builtin | boolean | | yes | Run a snap-specific OVS daemon |
| snap-config-shiftfs-enable | string | | yes | Enable shiftfs support |
| sysctl-tuning | boolean | true | yes | Apply production sysctl tuning |

The kernel-hardening option controls if some files/directories under /proc and /sys should have their permissions adjusted to prevent leaking container names to other containers.

The lxd-listen-* keys enable/disable the corresponding network listener.

The lxd-preseed key can only be set before LXD is initialized; further change attempts will be rejected. The reason for this restriction is that LXD can be configured over its REST API without Juju’s knowledge, which would lead to two sources of truth. As such, the charm is responsible for the initial configuration of LXD, and further configuration is expected to be done through LXD itself. All the other configuration keys are managed by Juju because they are outside of LXD’s control despite being related to its operation. This setting can only be used in cluster mode.

The snap-channel key allows changing the snap track/channel used to deploy LXD. This can be changed at any point in time, with the caveat that LXD doesn’t support being downgraded.

The snap-config-lxcfs-* keys all require a reboot to take effect because LXCFS cannot be reconfigured live; it requires a restart, which would break all running instances. Rebooting will provide a new instance of LXCFS with the new configuration. As such, changing those keys will cause the unit to enter a blocked state waiting to be rebooted. Further configuration changes will be deferred until the unit is rebooted. The show-pending-config action (described below) provides a way to show the deferred configuration changes.

The other snap-config-* keys require no special consideration and will be applied immediately (unless a reboot is pending).

The sysctl-tuning option controls if sysctl keys should be tuned for production usage.

Actions

| Action | Parameters | Description |
|---|---|---|
| add-trusted-client | name, cert, cert-url, projects | Add a client certificate to the trusted list |
| debug | - | Collect useful information for bug reports (calls lxd.buginfo) |
| show-pending-config | - | Show the currently pending configuration changes (queued for after the reboot) |

The add-trusted-client action takes multiple parameters:

  • name: user-specified identifier (optional)
  • cert: raw X.509 PEM client certificate (required if cert-url isn’t set)
  • cert-url: HTTP/HTTPS URL to fetch the client certificate from (required if cert isn’t set)
  • projects: comma separated list of projects to restrict the client certificate to (optional)
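The cert parameter will typically be passed as cert=$(cat my-cert.crt), which collapses the PEM's newlines into spaces. A sketch of how the action could restore the PEM on the receiving end (hypothetical helper; assumes a single-certificate PEM):

```python
import textwrap


def normalize_pem(cert):
    """Re-wrap a PEM certificate whose newlines were collapsed to
    whitespace by shell command substitution: strip the markers, drop
    all whitespace from the base64 body, and re-wrap at 64 columns."""
    header = "-----BEGIN CERTIFICATE-----"
    footer = "-----END CERTIFICATE-----"
    body = cert.replace(header, "").replace(footer, "")
    b64 = "".join(body.split())  # remove spaces/newlines from the body
    wrapped = "\n".join(textwrap.wrap(b64, 64))
    return f"{header}\n{wrapped}\n{footer}\n"
```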

The debug action invokes lxd.buginfo that collects various useful debugging information to diagnose problems or report bugs to LXD upstream.

The show-pending-config action will display the delta between the unit’s configuration and the desired application configuration set. This is useful as changes requiring a reboot to take effect will be deferred. Those will be accumulated and applied independently for each unit. This action allows the Juju operator to see what’s left to be applied.
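At its core, that delta is a per-unit dictionary difference between the desired application configuration and what the unit has actually applied. A minimal sketch (the function name is hypothetical):

```python
def pending_config(desired, applied):
    """Sketch of the delta show-pending-config could report: every key
    whose desired (application-level) value differs from the value the
    unit has actually applied so far."""
    return {key: value for key, value in desired.items()
            if applied.get(key) != value}
```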

Upgrade handling

It will be possible to introduce new charm configuration keys during upgrades with the caveat that any new lxd-* configuration keys won’t apply to already deployed units.

I don’t think I’ve heard of that one before, what’s the use case for that?

show-config-delta displays the delta between the unit’s config and the desired app config set. Since some changes are not applied immediately as they require a reboot, they “accumulate” and you can see which are pending on a per unit basis.


show-pending-config maybe?

Thanks for the much better name, s/show-config-delta/show-pending-config/ done.

This part is a bit inaccurate, there were many LXD charms:

  • Initial lxd charm by the OpenStack team for use with the now defunct nova-compute-lxd. This one only supported standalone mode and was quite OpenStack specific. lxd | Juju. I believe it’s a Reactive charm.
  • The lxd charm by the Anbox team which is used to set up LXD clusters for use by Anbox. I don’t know that this has a standalone mode and is quite Anbox specific. lxd | Juju. I believe it’s also a Reactive charm.
  • The lxd-cluster charm by Michael Skalka which was an older attempt at a generic charm to assemble LXD clusters. lxd cluster | Juju and is likely also Reactive.

I’d add and safely.

Tiny nit, responsible for removing members

I don’t actually think we should be doing that. It gets very tricky when you have many projects, storage volumes, … I’d much prefer we just fail if it wasn’t properly emptied ahead of time.

We should probably explain why. The reason is that LXD can be configured over its REST API without Juju’s knowledge and we don’t want to end up with two sources of truth for that. So the Charm is responsible for an initial configuration to make LXD function properly. Further configuration is expected to be done through LXD.

All the other configuration keys are for things related to LXD but outside of LXD’s own control and so managed by Juju instead.

Probably should explain that this is needed because LXCFS cannot be reconfigured live, it requires a restart and such a restart would break all instances. As a result, we require a system reboot which will give us a new instance of LXCFS with its new config.

I’d drop that part and just have the snap_config_* line point to https://snapcraft.io/lxd for details on what the keys do. The LXCFS section should stay though as that’s got special handling (requires reboot).

I’d really love us to have an alternative to this which just adds an entry to the trust store instead but passing a file to an action is quite impractical… @morphis any ideas of what we could do here?

I think we’ll have to reconsider this one actually, possibly having config keys for:

  • lxd_https_address
  • lxd_debug_address
  • lxd_cluster_address

But rather than taking an actual address, they would likely need to use a <space>:<port> syntax and have the charm lookup a suitable address in the provided space (if provided). The reason for this being that Juju config applies to the application, not to the unit. So a specific IPv4/IPv6 address makes no sense here.

Ah, an additional thing we should just do behind the scenes is if Juju was configured to use a proxy, we should set core.proxy_* to match that config during the initial LXD config. That should be quite useful to some folks.

The password can be provided to the action:

$ juju run-action lxd/leader set-trust-password password='foobar'
Action queued with id: "170"
$ juju show-action-output 170
UnitId: lxd/188
id: "170"
results:
  result: core.trust_password set successfully
status: completed
timing:
  completed: 2021-07-08 20:16:18 +0000 UTC
  enqueued: 2021-07-08 20:16:13 +0000 UTC
  started: 2021-07-08 20:16:17 +0000 UTC

Yeah, I meant that I’d have loved for the charm to not do anything with passwords but instead have a juju run-action lxd/leader add-trusted-client cert=my-certificate.crt projects=foo,bar

That’d have been really neat, avoiding having to deal with passwords that end up in logs and grant full admin rights.

It should be possible to add that add-trusted-client action. So you’d like this to replace the password version?

I would indeed. The catch is that there’s no easy way to transfer a file.
Maybe we can make it so that:

juju run-action lxd/leader add-trusted-client cert=$(cat my-cert.crt) projects=foo,bar works by figuring out how to re-shuffle the PEM on the receiving end :slight_smile:

So I guess the action would be add-trusted-client with:

  • name user-specified identifier (optional)
  • cert being a raw X509 PEM cert (required if cert-url isn’t set)
  • cert-url being an alternative HTTP/HTTPS URL to fetch the cert from (required if cert isn’t set)
  • projects being a comma separated list of projects to restrict the cert to (optional)