[LXD] Bidirectional vsock interface for VMs

Yeah, it would connect to validate that the port and creds are functional and then immediately disconnect. As much as possible, we don’t want to have each instance maintain connections to LXD as that causes scalability issues (number of file descriptors, number of goroutines, …).

The agent state isn’t usually visible. We could have lxd-agent log it when run in debug mode, but otherwise it’s really all internal to LXD and not something we expose to the user. We could also have LXD itself log debug messages as it notices the changes in the ring buffer.

LXD uses that information to know whether it can connect to the agent so we don’t ever hit the vsock connection timeout which would otherwise delay all instance operations by 3s per instance.

Yep, makes sense. Good to have it clarified, as it wasn’t clear in the spec.

We have that case handled today with the “STARTED” state, but if we aren’t going to expose the agent state anywhere, what is the purpose internally for LXD to know the agent has connected to its vsock via the “CONNECTED” state? This is what I’m not quite getting yet.

Currently, being able to know whether the VM has been able to connect or not isn’t useful yet. It will however be a useful debugging tool when we start working on the ready state and will likely be a requirement when we’re dealing with driving nested LXD at some point next year.

I see, thanks. That makes things clearer now; I couldn’t see before how the “CONNECTED” state was useful in this context.

I do think it would be useful to put this into the lxc info <instance> output, so it can be inspected both for debugging users’ systems and for automated testing, seeing as we’re going to be storing it internally anyway. At the moment it’s only implicit from lxc ls: if you don’t get the VM’s NIC interface name, the agent isn’t reachable.

Do you see any issue with doing that?

Presumably we will also be adding a field to lxc info <instance> for the ready state too when we come to add that feature?

I don’t like the idea of extending /1.0/instances/NAME/state purely to expose a debugging detail which we don’t intend for user consumption. If we absolutely need it in the API, it should be under /internal.

The READY state will be a state just like RUNNING is today, so it won’t be a new field.

Fair enough. Yes, /internal would be fine too (I believe we already have at least one route we use for testing introspection like that). It would be good to catch regressions in both LXD and the agent in our VM tests.

@monstermunchkin
Based on the above chat with @tomp, can you add:

  • Mention that lxd-agent will attempt a quick connection to LXD’s /1.0 endpoint on the devlxd API to validate that things are functional before transitioning to CONNECTED.
  • Mention that the connection to LXD isn’t persistent. The agent will connect to LXD as needed based on requests on its devlxd listener.
  • Add an internal API endpoint that reports the agent state for a given instance.

@tomp does that cover it for you?

Spot on, thanks.