[LXD] First class cloud-init support

Project: LXD
Status: Implemented
Author(s): @monstermunchkin
Approver(s): @stgraber @cpaelzer @Chad_Smith
Release: 4.21
Internal ID: LX009

Abstract

Add first class support for cloud-init.

Rationale

LXD currently supports cloud-init for VMs using the user config keys user.vendor-data, user.user-data, and user.network-config. For containers, the cloud-init config cannot be changed using these keys.

Specification

Design

The cloud-init configuration will be done using the following new cloud-init config keys:

  • cloud-init.vendor-data
  • cloud-init.user-data
  • cloud-init.network-config

These keys will be exposed through /dev/lxd as well.
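As a rough illustration of reading these keys from inside an instance, here is a minimal Python sketch. It assumes the /dev/lxd socket lives at /dev/lxd/sock and that a single key is served at /1.0/config/<key>, as the socket URL quoted later in this thread suggests. The helper names (UnixHTTPConnection, config_path, read_key) are hypothetical, and treating any non-200 response as "not set" is a simplification.

```python
import http.client
import socket

# Assumption: path of the /dev/lxd socket inside the instance.
DEVLXD_SOCKET = "/dev/lxd/sock"

# The three new keys named in this specification.
CLOUD_INIT_KEYS = (
    "cloud-init.vendor-data",
    "cloud-init.user-data",
    "cloud-init.network-config",
)


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix socket instead of TCP."""

    def __init__(self, socket_path):
        # The host name is only used to fill in the Host header.
        super().__init__("lxd")
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def config_path(key):
    """Build the request path for a single config key."""
    return "/1.0/config/" + key


def read_key(key, socket_path=DEVLXD_SOCKET):
    """Return the value of a config key, or None on any non-200 response."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", config_path(key))
        resp = conn.getresponse()
        body = resp.read().decode()
        return body if resp.status == 200 else None
    finally:
        conn.close()
```

Inside an instance, calling read_key(key) for each entry of CLOUD_INIT_KEYS would return whatever is set for that key.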

API changes

The following new endpoint will be added to /dev/lxd:

  • GET /1.0/devices (list instance devices)

Furthermore, the instance type (container or virtual-machine) will be exposed through /dev/lxd in GET /1.0.
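To make the two additions more concrete, the sketch below parses example response bodies in Python. The payloads are hypothetical: the field name instance_type and the shape of the device map (device name mapped to its config) are assumptions for illustration, not something this specification defines.

```python
import json

# Hypothetical response bodies; field names and values are assumptions.
SAMPLE_INFO = '{"api_version": "1.0", "instance_type": "virtual-machine"}'
SAMPLE_DEVICES = '{"eth0": {"type": "nic"}, "root": {"type": "disk", "path": "/"}}'


def instance_type(info_body):
    """Extract the instance type from a GET /1.0 response body."""
    info = json.loads(info_body)
    return info.get("instance_type")


def device_names_by_type(devices_body, dev_type):
    """List device names of a given type from a GET /1.0/devices body."""
    devices = json.loads(devices_body)
    return sorted(
        name for name, cfg in devices.items() if cfg.get("type") == dev_type
    )
```

A consumer such as a cloud-init datasource could use the instance type to decide between container and VM behaviour, and the device list to discover network interfaces.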

Upgrade handling

Once the new cloud-init config keys are available, the contents of the previously used user config keys should be moved over.
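One way to picture the move is as a key rename on an instance or profile config map. The Python sketch below is an illustration only, not LXD's actual upgrade code. The names KEY_MAP and migrate_config are hypothetical, and it assumes that a cloud-init key which is already set takes precedence over the old user key.

```python
# Old user keys and the cloud-init keys that replace them.
KEY_MAP = {
    "user.vendor-data": "cloud-init.vendor-data",
    "user.user-data": "cloud-init.user-data",
    "user.network-config": "cloud-init.network-config",
}


def migrate_config(config):
    """Return a copy of config with the old user keys moved over."""
    migrated = dict(config)
    for old_key, new_key in KEY_MAP.items():
        if old_key not in migrated:
            continue
        value = migrated.pop(old_key)
        # Assumption: an already-set cloud-init key wins over the old value.
        migrated.setdefault(new_key, value)
    return migrated
```

Other user.* keys are left untouched, since only the three cloud-init related keys are in scope.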

Maybe I’m early, but this seems important: are you suggesting removing the user keys and replacing them with cloud-init ones? What are the benefits? Slightly clearer commands? Seems pointless (signed, a noted pessimist).

I’m suggesting removing these three user keys, yes. The cloud-init key namespace will make cloud-init support more official as user keys can be just anything. To users this most likely will also make cloud-init support obvious.

cloud-init support seems like something someone would have to read the docs for regardless.

Do you really gain anything from a new top-level key? For me it’s just another doc / code entry like:

if(supports_extension("cloud-init")){
    config["cloud-init.user-data"] = "TEXT/LINK/ZIP/WHATEVER"
}else{
    config["user.user-data"] = "TEXT/LINK/ZIP/WHATEVER"
}

I don’t wanna hold up progress, it just seems like a pointless / superficial change.

@stgraber you’re the person here, tell me to stop complaining and I will.

Having a separate namespace (cloud-init) in this case does make it feel more official and also simplifies security handling in a few places.

The user.XYZ type keys should be entirely under the user’s control and should never be read or interacted with by LXD itself. Currently that’s true everywhere except for the cloud-init keys. So we’re quite keen on fixing that.

We haven’t done it before because of the lack of proper LXD support in cloud-init itself, so it was all a big hack and we decided to wait until we saw a clear path forward.

This has now changed as the cloud-init team is working on a proper LXD datasource which will use the /dev/lxd API to talk to LXD rather than rely on all our images shipping the right set of file templates.

So the changes in this specification let us move to a place where we have cloud-init properly interacting with LXD over an API, using proper configuration keys which LXD can validate and apply restrictions to and have nice clear documentation on both the LXD and cloud-init side on how to use cloud-init with LXD.

@stgraber anything else that needs to go into the spec? I believe I covered everything that was mentioned in the issue.

@monstermunchkin Question: the cloud-init LXD datasource also consumes user.network_mode == “link-local” to determine whether to render default network config v1 as “manual” for “eth0”. Not sure whether that should also be reflected in the cloud-init.* keys?

network_mode effectively went away with LXD 2.1 so we can ignore it and likely deprecate whatever logic may be left depending on that.

It dates back to when LXD couldn’t create and run a full network bridge of its own.

ok good to hear @stgraber.

@monstermunchkin, cloud-init approves this spec and we will prefer cloud-init.* configuration keys when present and fall back to user.vendor-data|network-config|meta-data when absent. My expectation is that the keys will exist under the socket API endpoints:
http://lxd/1.0/config/cloud-init.(vendor-data|user-data|network-config). If those config keys are absent, cloud-init will fall back to config/user.(user-data|vendor-data|network-config).

cloud-init will also drop consuming user.network_mode from the LXD datasource.
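The lookup order described above can be sketched in Python as follows. This is an illustration of the stated preference, not the actual cloud-init datasource code. The function name resolve is hypothetical, and config stands in for whatever key/value map the datasource has fetched from the socket.

```python
def resolve(config, name):
    """Pick the value for vendor-data, user-data or network-config.

    The cloud-init.* key is preferred; the user.* key is the fallback.
    Returns (key_used, value), or (None, None) if neither key is set.
    """
    for prefix in ("cloud-init.", "user."):
        key = prefix + name
        if key in config:
            return key, config[key]
    return None, None
```

With both keys present, only the cloud-init.* value is used, so an image configured through the old keys keeps working until the new ones are set.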

Excellent, thanks!

Marking as approved.

Additional work we’ll need to do in support of this:

  • Update our LXD templates in distrobuilder to prefer the new variables over the old ones (@monstermunchkin)
  • Send a matching update to CPC for the official Ubuntu images (@stgraber)
  • Update the Ubuntu community images to prefer the LXD datasource in cloud-init (@monstermunchkin)

@stgraber, @monstermunchkin LGTM as well. The one idea that came to mind (but that might be a follow-on step) is that cloud-init is planning to provide schema validation, as invalid data is one of the most common sources of issues. Maybe LXD could somehow expose that ability or even auto-validate what an LXD admin sets up to pass to the container.

@cpaelzer, good thought and suggestion. I think something like this would be a follow-up item for next cycle, when cloud-init schema validation has strict coverage for all config modules. Cloud-init’s plan for this cycle will only be the ability to validate #cloud-config user-data via the command cloud-init devel schema --config-file=<some.yaml> --annotate. LXD could potentially invoke that directly in the container image in the future to get machine-readable validation of the cloud-init.user-data and cloud-init.vendor-data provided. But I do think we’ll need to iron out how best LXD can be informed about supported cloud-init schemas, so that it doesn’t have to exec into the image to do validation when someone runs lxc config set cloud-init.user-data="…". For instance, LXD might also want to perform validation on cloud-init.(user-data|vendor-data) at profile creation time.

Yeah, moving those keys to a dedicated namespace (cloud-init instead of user) will definitely let us set up proper validation for them.

We’ll have to see what ends up being the best way to validate though. Calling an external python script every time a config needs validation would likely get very very expensive (modifying a profile causes all downstream instances to get re-validated).

If there was some kind of standard format for a YAML schema with a Go implementation of such a validator, then it’d likely be quite cheap and something we’d definitely like to do. Folks making a typo and ending up with non-working cloud-init is a very very common thing :)

I’ve added schema validation to our ideas list for next cycle. We’ll have to see how things stand at that point and if there’s something we can use to validate this cheaply.

If you’re looking for a schema validation tool with native Go libraries, I recommend the CUE configuration language. We have been using it to great effect for generating Kubernetes YAML from Go. It’s a newer project, but I’ve become a big fan.

You would have to take on a dependency on pre-1.0 CUE to build this into LXD, but at least it’s a pure-Go dependency.

@stgraber I believe there is also a user.meta-data config key which is tied to cloud-init. Did we just forget to mention it here and in the issue, or must this remain as user.meta-data?

We will not keep that configuration key moving forward. It’s always been a very odd one with no real use cases, so it will just go away completely.

@monstermunchkin @stgraber is it intentional that for ubuntu:focal you still need to use user.user-data, while for images:ubuntu/focal/cloud you have to use cloud-init.user-data?

It is quite confusing if someone starts looking at cloud-init for the first time (like I did). The cloud-init page of the LXD documentation does not say a word about it. It suggests that both ubuntu:focal and images:ubuntu/focal/cloud should work with cloud-init.user-data.