[LXD] Instance ready state

Project LXD
Status Approved
Author(s) @monstermunchkin
Approver(s) @stgraber @tomp
Release LXD 5.5
Internal ID LX021

Abstract

Add a new Ready state to instances, which indicates that an instance is ready to be used.

Rationale

Once an instance has been started successfully, it goes into Running state. This state doesn’t indicate that the instance is ready, only that it’s running.

Introducing a Ready state solves this problem. Once an instance is ready, it can (but doesn’t need to) notify LXD about this.

Specification

Design

Instances will use devlxd to notify LXD that they are ready. This will be done using a PATCH to devlxd’s /1.0 endpoint, using {"state":"Ready"} as the payload.
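As a rough illustration of the notification described above, the following Go sketch shows how a program inside the instance might send it. It assumes devlxd is reachable over the unix socket at /dev/lxd/sock; the helper names (statePayload, newStateRequest, unixClient) and the placeholder host name in the URL are made up for this example and are not part of LXD.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
	"os"
)

// devlxdSocket is where devlxd is assumed to be exposed inside the instance.
const devlxdSocket = "/dev/lxd/sock"

// statePayload mirrors the {"state": "..."} payload from the design (hypothetical name).
type statePayload struct {
	State string `json:"state"`
}

// newStateRequest builds the PATCH /1.0 request for the given state.
// The host part of the URL is a placeholder: the connection goes over the unix socket.
func newStateRequest(state string) (*http.Request, error) {
	body, err := json.Marshal(statePayload{State: state})
	if err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPatch, "http://devlxd/1.0", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}

	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// unixClient returns an HTTP client that always dials the given unix socket.
func unixClient(path string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", path)
			},
		},
	}
}

func main() {
	req, err := newStateRequest("Ready")
	if err != nil {
		fmt.Fprintln(os.Stderr, "building request failed:", err)
		return
	}

	resp, err := unixClient(devlxdSocket).Do(req)
	if err != nil {
		// Expected when run outside an instance, where the socket does not exist.
		fmt.Fprintln(os.Stderr, "notifying LXD failed:", err)
		return
	}
	defer resp.Body.Close()

	fmt.Println("devlxd responded:", resp.Status)
}
```

An init script or service manager hook could run something like this once the workload has finished its own startup.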

Once the instance has notified LXD that it is ready, the volatile.last_state.ready config key will be set to true. When the status code is requested and the instance is both running and ready, LXD will return the new Ready status.

When an instance is shut down, the aforementioned config key will be set to false. During daemon initialization, the config key is unset for all instances. That is because LXD doesn’t truly know whether or not an instance is ready when the daemon starts; it’s possible that instances were left running while the server was shut down.

It is possible for an instance to return to Running state by calling PATCH /1.0 with {"state":"Started"} as the payload.
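The lifecycle of the config key described in this section can be sketched as follows. This is a simplified model for illustration only: the instance type, its methods, and daemonInit are hypothetical names, and a real instance has more statuses than the three used here.

```go
package main

import "fmt"

// readyKey is the config key named in the design.
const readyKey = "volatile.last_state.ready"

// instance is a hypothetical, heavily simplified stand-in for an LXD instance.
type instance struct {
	running bool
	config  map[string]string
}

// status reports Ready only when the instance is running and the key is "true".
func (i *instance) status() string {
	if !i.running {
		return "Stopped"
	}

	if i.config[readyKey] == "true" {
		return "Ready"
	}

	return "Running"
}

// markReady models a PATCH /1.0 with {"state":"Ready"}.
func (i *instance) markReady() { i.config[readyKey] = "true" }

// markStarted models a PATCH /1.0 with {"state":"Started"}.
func (i *instance) markStarted() { i.config[readyKey] = "false" }

// shutdown models an instance stop: the key is set to false.
func (i *instance) shutdown() {
	i.running = false
	i.config[readyKey] = "false"
}

// daemonInit models daemon startup: the key is unset for every instance,
// because readiness cannot be known for instances left running.
func daemonInit(instances []*instance) {
	for _, inst := range instances {
		delete(inst.config, readyKey)
	}
}

func main() {
	inst := &instance{running: true, config: map[string]string{}}
	fmt.Println(inst.status())

	inst.markReady()
	fmt.Println(inst.status())

	daemonInit([]*instance{inst})
	fmt.Println(inst.status())
}
```

Deriving the status from the running flag plus the config key keeps Ready a refinement of Running, so a stale key on a stopped instance can never surface as Ready.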

API changes

The devlxd server will gain the following new endpoint:

  • PATCH /1.0

This endpoint accepts the following payload:

type devlxdPut struct {
	State string `json:"state" yaml:"state"`
}
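To show how this payload might be consumed, here is a minimal handler sketch. It is not LXD's actual implementation: handleStatePatch, patchState, and the setReady callback are hypothetical, and the choice of HTTP status codes for bad input is an assumption. Only the struct and the two state values ("Ready" and "Started") come from the text above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// devlxdPut is the payload accepted by PATCH /1.0, as given in the spec.
type devlxdPut struct {
	State string `json:"state" yaml:"state"`
}

// handleStatePatch returns a hypothetical handler that decodes the payload
// and reports the requested readiness through the setReady callback.
func handleStatePatch(setReady func(bool)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPatch {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}

		var req devlxdPut
		err := json.NewDecoder(r.Body).Decode(&req)
		if err != nil {
			http.Error(w, "invalid JSON", http.StatusBadRequest)
			return
		}

		switch req.State {
		case "Ready":
			setReady(true)
		case "Started":
			// Returns the instance to the plain Running state.
			setReady(false)
		default:
			http.Error(w, "unsupported state", http.StatusBadRequest)
			return
		}

		w.WriteHeader(http.StatusOK)
	}
}

// patchState sends a PATCH /1.0 to the handler in memory and returns the status code.
func patchState(h http.Handler, body string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPatch, "/1.0", strings.NewReader(body))
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	ready := false
	h := handleStatePatch(func(v bool) { ready = v })

	fmt.Println(patchState(h, `{"state":"Ready"}`), ready)
	fmt.Println(patchState(h, `{"state":"Started"}`), ready)
	fmt.Println(patchState(h, `{"state":"Bogus"}`), ready)
}
```

Rejecting unknown state values keeps the endpoint open to further states later without silently accepting typos.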

CLI changes

No CLI changes.

Database changes

No database changes.

Upgrade handling

No upgrade handling.

Further information

No further information.

Would it be worth clarifying here that the instance ready state is different from the existing instance lxd-agent ready state?


I wonder if we should change LXD-AGENT-READY to LXD-AGENT-STARTED before LXD 5.4 hits, to align it with the “STARTED” status we’re getting from the ring buffer. That would avoid any confusion with the new instance ready state.

What do you think?

If you believe this might be confusing, we can change it. I don’t mind.

In what way? I was thinking something like this:

if !m.agentStarted && m.eventHandler != nil {
    go m.eventHandler("LXD-AGENT-STARTED", nil)
}

Which is effectively mapping the ring-buffer STARTED state (which indicates the LXD agent has started) into an LXD-AGENT-STARTED VM event.

Either that or use “LXD-AGENT-RUNNING” perhaps, that avoids using “STARTED” or “READY”.

Well, LXD-AGENT-STARTED is misleading as that’s what STARTED is for.

We are using:

  • STARTED
  • STOPPED

Which are basic agent states (binary running). We then need a 3rd status to indicate that the agent is able to reach the host system. Might just go with CONNECTED which is still a bit misleading as we don’t remain connected, but is better than what we have today I think.

I think I’m not explaining it well. I’m proposing changing the existing LXD-AGENT-READY event type we have to LXD-AGENT-STARTED (that indicates the agent has started) so that it doesn’t get confused with the incoming instance ready state.

The existing “STARTED” state from the ring buffer is mapped into LXD-AGENT-READY event (so as not to conflict with any event names coming from QMP directly).

I’m not proposing we use LXD-AGENT-STARTED for the new ready state event.

I’m concerned that when we introduce the new instance ready state that it will be confusing to have an existing LXD-AGENT-READY event type that actually indicates the agent has started.

Ah right and I just confirmed that we are using STARTED/STOPPED/CONNECTED on the ring buffer, so that’s fine.

The name of the internal event doesn’t really matter, so sure, we can make it LXD-AGENT-STARTED and also make it a const.

Yeah constant is a good idea.

It was introduced with this commit (“Bidirectional vsock interface” by monstermunchkin, Pull Request #10610, lxc/lxd on GitHub), and LXD-AGENT-READY made sense at the time, as there was already an internal agentReady var whose change it was indicating.

But if we change agentReady to agentStarted and LXD-AGENT-READY to LXD-AGENT-STARTED before LXD 5.4 is released, then that will ensure there is no confusion when the new ready state is introduced here.

I’m not sure this is the correct reason for this project. My understanding is that we would provide the ability for an application inside the instance to call via devlxd to indicate that the instance is ready from the perspective of what the admin wants “ready” to mean (this might be that certain applications have started up and performed their initialisation).

We technically already know when the instance agent has started, and lxc exec and the like won’t even try to connect to the VM until it has detected the agent is running.

My understanding is that it is an application inside the instance, aside from lxd-agent, that will call this endpoint.

I don’t think this is needed.

I prefer this one.

Note that we’ll also need to make sure that this gets cleared on startup so we don’t end up incorrectly marking instances as ready.

I covered that with

Also, during daemon initialization, the value is set to `false` for all instances.

The further information bit can go now that we changed the lxd-agent ready state to started.


Should an instance be able to reset its state to Running?

Yeah, that’d be good to have I think.

By “reset” do you mean remove the “ready” status and/or set the “ready” status again?