[LXD] Instance ready state

monstermunchkin · July 20, 2022, 2:37pm


Project	LXD
Status	Implemented
Author(s)	@monstermunchkin
Approver(s)	@stgraber @tomp
Release	LXD 5.5
Internal ID	LX021

Abstract

Add a new Ready state to instances, indicating that an instance is ready to work with.

Rationale

Once an instance has been started successfully, it goes into Running state. This state doesn’t indicate that the instance is ready, only that it’s running.

Introducing a Ready state solves this problem. Once an instance is ready, it can (but doesn’t need to) notify LXD about this.

Specification

Design

Instances will use devlxd to notify LXD that they are ready. This will be done using a PATCH to devlxd’s /1.0 endpoint, using {"state":"Ready"} as the payload.

Once the instance has notified LXD that it is ready, the volatile.last_state.ready config key will be set to true. When the status code is requested and the instance is both running and ready, LXD will return the new Ready status.

When an instance is shut down, the aforementioned config key will be set to false. During daemon initialization, the config key is unset for all instances. That is because LXD doesn’t truly know whether or not an instance is ready when starting; it’s possible that instances have been left running while the server has been shut down.
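The lifecycle of the config key described above can be modelled roughly as follows. This is only a sketch: the instance type, its methods and daemonInit are hypothetical stand-ins rather than LXD’s actual code, and the "Stopped" status is a simplification of the non-running states.

```go
package main

import "fmt"

// readyKey is the config key named in this spec.
const readyKey = "volatile.last_state.ready"

// instance is a hypothetical, simplified stand-in for an LXD instance.
type instance struct {
	running bool
	config  map[string]string
}

// status reports Ready only when the instance is running and flagged ready.
func (i *instance) status() string {
	if !i.running {
		return "Stopped" // simplification: other non-running states are ignored
	}

	if i.config[readyKey] == "true" {
		return "Ready"
	}

	return "Running"
}

// markReady models the instance notifying LXD through devlxd.
func (i *instance) markReady() {
	i.config[readyKey] = "true"
}

// shutdown models an instance shutdown, which sets the key to false.
func (i *instance) shutdown() {
	i.running = false
	i.config[readyKey] = "false"
}

// daemonInit models daemon initialization, which unsets the key everywhere.
func daemonInit(instances []*instance) {
	for _, inst := range instances {
		delete(inst.config, readyKey)
	}
}

func main() {
	inst := &instance{running: true, config: map[string]string{}}
	fmt.Println(inst.status()) // Running

	inst.markReady()
	fmt.Println(inst.status()) // Ready

	// The daemon restarts while the instance keeps running.
	daemonInit([]*instance{inst})
	fmt.Println(inst.status()) // Running

	inst.shutdown()
	fmt.Println(inst.status()) // Stopped
}
```

An instance left running across a daemon restart therefore shows up as Running until it notifies LXD again.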

It is possible for an instance to return to Running state by calling PATCH /1.0 with {"state":"Started"} as the payload.
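From inside an instance, the notification could look roughly like the sketch below. It assumes the devlxd socket is at /dev/lxd/sock and that a successful request returns 200 OK; the helper names (newUnixClient, setState) and the placeholder host name in the URL are made up for this example.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
	"os"
)

// devlxdSocket is the assumed path of the devlxd socket inside the instance.
const devlxdSocket = "/dev/lxd/sock"

// statePayload mirrors the payload described in this spec.
type statePayload struct {
	State string `json:"state"`
}

// newUnixClient returns an HTTP client that always dials the given unix
// socket, whatever host name appears in the request URL.
func newUnixClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// setState sends PATCH /1.0 with {"state": state} using the given client.
func setState(client *http.Client, state string) error {
	body, err := json.Marshal(statePayload{State: state})
	if err != nil {
		return err
	}

	// "devlxd" is a placeholder host; the transport ignores it.
	req, err := http.NewRequest(http.MethodPatch, "http://devlxd/1.0", bytes.NewReader(body))
	if err != nil {
		return err
	}

	req.Header.Set("Content-Type", "application/json")

	resp, err := client.Do(req)
	if err != nil {
		return err
	}

	defer resp.Body.Close()

	// Assumption: devlxd answers 200 OK on success.
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}

	return nil
}

func main() {
	// Call this once the workload considers itself ready.
	err := setState(newUnixClient(devlxdSocket), "Ready")
	if err != nil {
		fmt.Fprintln(os.Stderr, "failed to notify LXD:", err)
	}
}
```

Sending "Started" instead of "Ready" through the same helper would return the instance to the Running state.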

API changes

The devlxd server will gain the following new endpoint:

PATCH /1.0

This endpoint accepts the following payload:

type devlxdPut struct {
	State string `json:"state" yaml:"state"`
}
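As an illustration of how such a payload might be interpreted on the receiving side, here is a small sketch. The readyFromPayload helper is hypothetical, and rejecting any state other than Ready or Started is an assumption rather than something this spec states.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// devlxdPut is the payload type given in this spec.
type devlxdPut struct {
	State string `json:"state" yaml:"state"`
}

// readyFromPayload decodes the request body and reports whether the instance
// asked to be marked ready (true) or to go back to Running (false).
func readyFromPayload(body []byte) (bool, error) {
	var req devlxdPut

	err := json.Unmarshal(body, &req)
	if err != nil {
		return false, fmt.Errorf("invalid payload: %w", err)
	}

	switch req.State {
	case "Ready":
		return true, nil
	case "Started":
		return false, nil
	default:
		// Assumption: unknown states are rejected.
		return false, fmt.Errorf("unsupported state %q", req.State)
	}
}

func main() {
	bodies := []string{`{"state":"Ready"}`, `{"state":"Started"}`, `{"state":"Frozen"}`}
	for _, body := range bodies {
		ready, err := readyFromPayload([]byte(body))
		fmt.Println(body, ready, err)
	}
}
```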

CLI changes

No CLI changes.

Database changes

No database changes.

Upgrade handling

No upgrade handling.

Further information

No further information.

tomp · July 20, 2022, 2:41pm

Would it be worth clarifying here that the instance ready state is different from the existing instance lxd-agent ready state?

github.com

lxc/lxd/blob/master/lxd/instance/drivers/qmp/monitor.go#L64

		}

		// Extract the last entry.
		entries := strings.Split(resp.Return, "\n")
		if len(entries) > 1 {
			status := entries[len(entries)-2]

			m.agentReadyMu.Lock()
			if status == "STARTED" {
				if !m.agentReady && m.eventHandler != nil {
					go m.eventHandler("LXD-AGENT-READY", nil)
				}

				m.agentReady = true
			} else if status == "STOPPED" {
				m.agentReady = false
			}

			m.agentReadyMu.Unlock()
		}
	}

tomp · July 20, 2022, 3:08pm

I wonder if we should change LXD-AGENT-READY to LXD-AGENT-STARTED before LXD 5.4 hits, to align it with the “STARTED” status we’re getting from the ring buffer. That would avoid any confusion with the new instance ready state.

What do you think?

monstermunchkin · July 20, 2022, 3:13pm

If you believe this might be confusing, we can change it. I don’t mind.

tomp · July 20, 2022, 3:16pm

In what way? I was thinking something like this:

if !m.agentStarted && m.eventHandler != nil {
    go m.eventHandler("LXD-AGENT-STARTED", nil)
}

This effectively maps the ring-buffer STARTED state (which indicates the LXD agent has started) to an LXD-AGENT-STARTED VM event.

tomp · July 20, 2022, 3:19pm

Either that or use “LXD-AGENT-RUNNING” perhaps, that avoids using “STARTED” or “READY”.

stgraber · July 20, 2022, 3:24pm

Well, LXD-AGENT-STARTED is misleading as that’s what STARTED is for.

We are using:

STARTED
STOPPED

Which are basic agent states (binary running). We then need a 3rd status to indicate that the agent is able to reach the host system. Might just go with CONNECTED which is still a bit misleading as we don’t remain connected, but is better than what we have today I think.

tomp · July 20, 2022, 3:31pm

I think I’m not explaining it well. I’m proposing changing the existing LXD-AGENT-READY event type we have to LXD-AGENT-STARTED (that indicates the agent has started) so that it doesn’t get confused with the incoming instance ready state.

The existing “STARTED” state from the ring buffer is mapped into LXD-AGENT-READY event (so as not to conflict with any event names coming from QMP directly).

I’m not proposing we use LXD-AGENT-STARTED for the new ready state event.

I’m concerned that when we introduce the new instance ready state that it will be confusing to have an existing LXD-AGENT-READY event type that actually indicates the agent has started.

stgraber · July 20, 2022, 3:36pm

Ah right and I just confirmed that we are using STARTED/STOPPED/CONNECTED on the ring buffer, so that’s fine.

The name of the internal event doesn’t really matter, so sure, we can make it LXD-AGENT-STARTED and also make it a const.

tomp · July 20, 2022, 3:37pm

Yeah constant is a good idea.

It was introduced with the pull request “Bidirectional vsock interface” by monstermunchkin (Pull Request #10610, lxc/lxd), and LXD-AGENT-READY made sense at the time, as there was already an internal agentReady var whose change it indicated.

But if we change agentReady to agentStarted and LXD-AGENT-READY to LXD-AGENT-STARTED before LXD 5.4 is released, then that will ensure there is no confusion when the new ready state is introduced here.
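A rough sketch of what that rename could look like, with the event name pulled into a constant. The constant name EventAgentStarted, the trimmed-down monitor type and the handleRingBuffer method are made up for this example; unlike the snippet quoted earlier, the handler is called synchronously after the lock is released, just to keep the sketch deterministic.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// EventAgentStarted is a hypothetical constant name for the renamed event.
const EventAgentStarted = "LXD-AGENT-STARTED"

// monitor is a trimmed-down stand-in for the QMP monitor.
type monitor struct {
	agentStartedMu sync.Mutex
	agentStarted   bool
	eventHandler   func(name string, data map[string]interface{})
}

// handleRingBuffer looks at the last complete entry of the ring buffer and
// emits EventAgentStarted the first time the agent reports STARTED.
func (m *monitor) handleRingBuffer(contents string) {
	entries := strings.Split(contents, "\n")
	if len(entries) <= 1 {
		return
	}

	// Extract the last entry (the final element is the text after the last newline).
	status := entries[len(entries)-2]

	notify := false

	m.agentStartedMu.Lock()
	switch status {
	case "STARTED":
		notify = !m.agentStarted && m.eventHandler != nil
		m.agentStarted = true
	case "STOPPED":
		m.agentStarted = false
	}
	m.agentStartedMu.Unlock()

	if notify {
		m.eventHandler(EventAgentStarted, nil)
	}
}

func main() {
	m := &monitor{
		eventHandler: func(name string, _ map[string]interface{}) {
			fmt.Println("event:", name)
		},
	}

	m.handleRingBuffer("STARTED\n") // emits the event
	m.handleRingBuffer("STARTED\n") // already started, nothing emitted
}
```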

tomp · July 21, 2022, 10:04am

I’m not sure this is the correct reason for this project. My understanding is that we would provide the ability for an application inside the instance to call via devlxd to indicate that the instance is ready, from the perspective of what the admin wants “ready” to mean (this might be that certain applications have started up and performed their initialisation).

We technically already know when the instance agent has started, and lxc exec and the like won’t even try to connect to the VM until it has detected the agent is running.

tomp · July 21, 2022, 10:07am

My understanding is that it is an application inside the instance, aside from lxd-agent, that will call this endpoint.

tomp · July 21, 2022, 10:08am

I don’t think this is needed.

tomp · July 21, 2022, 10:09am

I prefer this one.

stgraber · July 21, 2022, 7:48pm

Note that we’ll also need to make sure that this gets cleared on startup so we don’t end up incorrectly marking instances as ready.

monstermunchkin · July 21, 2022, 7:50pm

I covered that with

Also, during daemon initialization, the value is set to `false` for all instances.

tomp · July 21, 2022, 7:53pm

The further information bit can go now that we changed the LXD agent ready state to started.

monstermunchkin · July 21, 2022, 7:58pm

Should an instance be able to reset its state to Running?

stgraber · July 21, 2022, 8:33pm

Yeah, that’d be good to have I think.

tomp · July 21, 2022, 8:39pm

By “reset” do you mean remove the “ready” status and/or set the “ready” status again?