[LXD] Bidirectional vsock interface for VMs

stgraber · June 15, 2022, 3:30pm

Yeah, it would connect to validate that the port and creds are functional and then immediately disconnect. As much as possible, we don’t want to have each instance maintain connections to LXD as that causes scalability issues (number of file descriptors, number of goroutines, …).
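A minimal sketch of that connect, check, disconnect pattern, assuming the agent speaks plain HTTP/1.0 to LXD over an already-established stream socket. The function name `probe_lxd` and its `connect` factory argument are hypothetical. The real agent would dial LXD over vsock (`socket.AF_VSOCK` on Linux), but that is abstracted behind `connect` here so the logic can run anywhere.

```python
import socket
from typing import Callable


def probe_lxd(connect: Callable[[], socket.socket], timeout: float = 1.0) -> bool:
    """Open one connection, issue GET /1.0, then close it again.

    Hypothetical helper: returns True only if the peer answers with an
    HTTP 200 status line. The connection is closed on every path.
    """
    try:
        sock = connect()
    except OSError:
        return False  # port closed or refused: not functional
    try:
        sock.settimeout(timeout)
        sock.sendall(b"GET /1.0 HTTP/1.0\r\nHost: lxd\r\n\r\n")
        data = b""
        # Read until we have at least the status line.
        while b"\r\n" not in data:
            chunk = sock.recv(1024)
            if not chunk:
                break
            data += chunk
        parts = data.split(b"\r\n", 1)[0].split()
        return len(parts) >= 2 and parts[1] == b"200"
    except OSError:  # includes socket timeouts
        return False
    finally:
        # Disconnect immediately so LXD doesn't hold an fd for this instance.
        sock.close()
```

On a real guest, `connect` would create an `AF_VSOCK` socket and connect it to the host CID (2) on whatever port LXD handed out; that port is an assumption and isn't shown here.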

stgraber · June 15, 2022, 3:33pm

The agent state isn’t usually visible. We could have lxd-agent log it when run in debug mode, but otherwise it’s really all internal to LXD and not something we expose to the user. Additionally, LXD itself could log debug messages as it notices changes in the ring buffer.

LXD uses that information to know whether it can connect to the agent, so we never hit the vsock connection timeout, which would otherwise delay all instance operations by 3s per instance.
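To illustrate why LXD tracks this, here is a small sketch of the gating logic. It assumes a hypothetical `AgentStateTracker` that records the states LXD notices in the ring buffer, with a bounded `collections.deque` standing in for that buffer. The STARTED and CONNECTED names come from this thread; the class, the helper functions and the `UNKNOWN` placeholder are illustrative only.

```python
import logging
from collections import deque
from enum import Enum
from typing import Dict, List

log = logging.getLogger("lxd.agent-state")

# Per the thread: blindly dialing an agent that isn't up costs ~3s.
VSOCK_CONNECT_TIMEOUT_S = 3.0


class AgentState(Enum):
    UNKNOWN = "UNKNOWN"  # hypothetical placeholder, not a state from the spec
    STARTED = "STARTED"
    CONNECTED = "CONNECTED"


class AgentStateTracker:
    def __init__(self, size: int = 16) -> None:
        # Bounded history standing in for the ring buffer.
        self._events: deque = deque(maxlen=size)

    def observe(self, state: AgentState) -> None:
        # Only record (and debug-log) actual changes.
        if self._events and self._events[-1] is state:
            return
        log.debug("agent state changed to %s", state.value)
        self._events.append(state)

    @property
    def current(self) -> AgentState:
        return self._events[-1] if self._events else AgentState.UNKNOWN

    def can_dial_agent(self) -> bool:
        # Only dial once the agent reported itself up, so instance
        # operations never sit in the vsock connect timeout.
        return self.current in (AgentState.STARTED, AgentState.CONNECTED)


def agents_to_query(trackers: Dict[str, AgentStateTracker]) -> List[str]:
    """Instances whose agent LXD would actually try to contact."""
    return sorted(name for name, t in trackers.items() if t.can_dial_agent())


def blind_dial_penalty(trackers: Dict[str, AgentStateTracker]) -> float:
    """Worst-case seconds lost if LXD dialed every agent without checking."""
    return sum((VSOCK_CONNECT_TIMEOUT_S for t in trackers.values()
                if not t.can_dial_agent()), 0.0)
```

The point of the design is that checking a locally held state is effectively free, while a failed dial costs the full timeout for each instance whose agent isn't up.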

tomp · June 15, 2022, 3:43pm

Yep, makes sense. Good to have that clarified, as it wasn’t clear in the spec.

tomp · June 15, 2022, 3:45pm

We have that case handled today with the “STARTED” state, but if we aren’t going to expose the agent state anywhere, what purpose does it serve internally for LXD to know that the agent has connected to its vsock (the “CONNECTED” state)? This is what I’m not quite getting yet.

stgraber · June 15, 2022, 3:54pm

Currently, knowing whether the VM has been able to connect isn’t useful. It will, however, be a useful debugging tool when we start working on the ready state, and will likely be a requirement when we’re dealing with driving nested LXD at some point next year.

tomp · June 15, 2022, 4:02pm

I see, thanks. That makes things clearer now; I couldn’t see before how the “CONNECTED” state was useful in this context.

I do think it would be useful to put this into the lxc info <instance> output, so it can be inspected both when debugging users’ systems and in automated testing, seeing as we’re going to be storing it internally anyway. At the moment it’s only implicit from lxc ls: if you don’t get the VM’s NIC interface name, the agent isn’t connected.

Do you see any issue with doing that?

Presumably we will also be adding a field to lxc info <instance> for the ready state too when we come to add that feature?

stgraber · June 15, 2022, 4:41pm

I don’t like the idea of extending /1.0/instances/NAME/state purely to expose a debugging detail which we don’t intend for user consumption. If we absolutely need it in the API, it should be under /internal.

The READY state will be a state just like RUNNING is today, so it won’t be a new field.

tomp · June 15, 2022, 6:03pm

Fair enough. Yes, /internal would be fine too (I believe we already have at least one route we use for testing introspection like that). It would be good to catch regressions in both LXD and the agent in our VM tests.
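For the VM regression tests mentioned here, a minimal sketch of what an /internal route and a test-side poller could look like. The route `/internal/instances/<name>/agent-state`, the JSON field names, and both helper functions are hypothetical; the thread only settles that the endpoint belongs under /internal rather than /1.0/instances/NAME/state.

```python
import json
import re
import time
from typing import Callable, Dict, Tuple

# Hypothetical route: the thread only says it should live under /internal.
_ROUTE = re.compile(r"^/internal/instances/(?P<name>[^/]+)/agent-state$")


def handle_internal_get(path: str, agent_states: Dict[str, str]) -> Tuple[int, str]:
    """Return (HTTP status, JSON body) for a GET on the internal route."""
    m = _ROUTE.match(path)
    if m is None:
        return 404, json.dumps({"error": "not found"})
    name = m.group("name")
    if name not in agent_states:
        return 404, json.dumps({"error": f"instance {name!r} not found"})
    return 200, json.dumps({"instance": name, "agent_state": agent_states[name]})


def wait_for_state(fetch: Callable[[], str], want: str,
                   attempts: int = 30, interval: float = 1.0) -> bool:
    """Poll `fetch` until it returns `want`; a test fails if this is False."""
    for _ in range(attempts):
        if fetch() == want:
            return True
        time.sleep(interval)
    return False
```

A VM test could boot an instance and then call `wait_for_state` with a `fetch` that queries the route, failing if the agent never reaches CONNECTED.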

stgraber · June 16, 2022, 4:45pm

@monstermunchkin
Based on the above chat with @tomp, can you add:

- Mention that lxd-agent will attempt a quick connection to LXD’s /1.0 endpoint over the devlxd API to validate that things are functional before transitioning to CONNECTED.
- Mention that the connection to LXD isn’t persistent. The agent will connect to LXD as needed, based on requests on its devlxd listener.
- Add an internal API endpoint that reports the agent state for a given instance.
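The first two points can be sketched from the agent's side as a small state model. This is a hypothetical illustration, not lxd-agent's actual code: `Agent`, `try_connect`, `handle_devlxd`, and the injected `probe` and `forward` callables are made-up names, with `probe` standing in for the quick connect-and-disconnect check against /1.0 and `forward` for a fresh, short-lived LXD connection per devlxd request.

```python
from enum import Enum
from typing import Callable


class AgentState(Enum):
    STARTED = "STARTED"
    CONNECTED = "CONNECTED"


class Agent:
    """Hypothetical model of the agent side of the proposal."""

    def __init__(self, probe: Callable[[], bool],
                 forward: Callable[[str], str]) -> None:
        self.state = AgentState.STARTED
        self._probe = probe      # quick connect/disconnect check against /1.0
        self._forward = forward  # opens a new LXD connection for one request
        self.open_connections = 0

    def try_connect(self) -> AgentState:
        # Only move to CONNECTED once the probe confirmed port and creds work.
        if self.state is AgentState.STARTED and self._probe():
            self.state = AgentState.CONNECTED
        return self.state

    def handle_devlxd(self, path: str) -> str:
        # No persistent connection: one short-lived connection per request,
        # released as soon as the request is answered.
        self.open_connections += 1
        try:
            return self._forward(path)
        finally:
            self.open_connections -= 1
```

Keeping `open_connections` at zero between requests is the property that avoids the per-instance file descriptor and goroutine cost on the LXD side.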

@tomp does that cover it for you?

tomp · June 16, 2022, 6:12pm

Spot on, thanks.

tomp · July 11, 2022, 9:59am