Have LXD itself listen for VM socket (vsock) connections.
Rationale
Currently, the lxd-agent listens on the vsock inside the VM once it’s up and running. This allows requests from LXD to the VM, but there’s no way for the VM to talk to LXD.
If LXD were to listen on the vsock, the VM could send requests to it. This would, for example, enable the VM to notify LXD that it’s ready (i.e. done booting).
Specification
Design
The LXD server will listen on the vsock. Clients (VMs) can connect to the vsock using their client certificate, and make requests through it. The server will be listening on a random port. The port and context ID are communicated to an instance via PUT /1.0 on the lxd-agent.
Once LXD detects the existing STARTED state in the ring-buffer, it will call PUT /1.0 on the lxd-agent. The instance (using lxd-agent) will then attempt a quick connection to /1.0 on LXD’s devlxd API to validate that things are functional. If successful, it will set the state to CONNECTED. If it fails, or later gets disconnected, the state should go back to STARTED, effectively waiting for LXD to call PUT /1.0 again.
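The state handling described above could be sketched roughly as follows. This is only an illustration, not the agent's actual code: the `Event` type, its values, and the `Next` function are hypothetical names; only the STARTED, STOPPED and CONNECTED strings come from the spec.

```go
package main

// State is the status string the agent writes to the serial ring-buffer.
type State string

const (
	StateStarted   State = "STARTED"
	StateStopped   State = "STOPPED"
	StateConnected State = "CONNECTED"
)

// Event is a hypothetical description of what just happened on the agent side.
type Event int

const (
	EventConnectOK     Event = iota // validation request to LXD succeeded
	EventConnectFailed              // validation request to LXD failed
	EventDisconnected               // a later request to LXD failed
)

// Next returns the state the agent should advertise after an event.
func Next(cur State, ev Event) State {
	switch ev {
	case EventConnectOK:
		return StateConnected
	case EventConnectFailed, EventDisconnected:
		// Fall back to STARTED so LXD knows to call PUT /1.0 again.
		return StateStarted
	}

	// Unknown events leave the state unchanged.
	return cur
}
```

Going back to STARTED on any failure keeps the recovery path identical to the initial boot path, so LXD only needs one way of (re)sending connection details.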
The connection isn’t persistent. The agent will connect to LXD as needed based on requests on its devlxd listener.
The current /dev/lxd handler in VMs will be replaced by this interface. The instance-data file containing various information can be removed as it will not be needed anymore.
Hmm, no, that’d conflict with a potential VM. Instead we should use 0 (hypervisor) or 2 (host) for this and have one listener per daemon, we can easily identify what instance is talking to us from the certificate they use.
We’ll also need to handle two other cases. One is running multiple LXD daemons on the same host (as will be the case in Jenkins), so the host can’t rely on the default port of 8443. The other is LXD itself running inside of a VM, where it likely can’t use 2 as the address and will instead have to use the address of that VM (its CID).
So we effectively need a way to communicate and update the host CID and port to the agent in the guest. We may be able to do that over the existing serial port and its ring buffer.
I just looked into the vsock package, and it’s possible to listen on a random port. Would that be OK, or should we start at a specific port, e.g. 9000, and increase this number depending on the number of LXD daemons?
Random port is fine. It’s not user visible so as long as we can communicate the port to all running instances on startup so they can figure out how to reach LXD, that’s fine.
For that part, I was thinking of using the ring-buffer of our serial device.
Currently it can contain:
STARTED
STOPPED
That’s written by lxd-agent to allow LXD to quickly check whether we have an agent or not without having to hit vsock. We could extend this with another state:
CONNECTED
And so when LXD detects the STARTED state, it would write something like CONNECT 2:12345, and the agent would then try to connect. If it succeeds, it would update the state to CONNECTED. If it fails or later gets disconnected, it should go back to STARTED, effectively waiting for LXD to write a new CONNECT XYZ string.
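As a rough sketch of the agent side, parsing a string like CONNECT 2:12345 could look like this. The `ParseConnect` name is made up for the example, and the exact message format is only what is proposed above, not something that exists today; CID and port are treated as 32-bit unsigned values, as vsock addresses are.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ParseConnect parses a message of the form "CONNECT <cid>:<port>"
// and returns the vsock context ID and port to connect to.
func ParseConnect(msg string) (uint32, uint32, error) {
	fields := strings.Fields(msg)
	if len(fields) != 2 || fields[0] != "CONNECT" {
		return 0, 0, fmt.Errorf("not a CONNECT message: %q", msg)
	}

	parts := strings.SplitN(fields[1], ":", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("expected <cid>:<port>, got %q", fields[1])
	}

	cid, err := strconv.ParseUint(parts[0], 10, 32)
	if err != nil {
		return 0, 0, fmt.Errorf("invalid CID %q: %w", parts[0], err)
	}

	port, err := strconv.ParseUint(parts[1], 10, 32)
	if err != nil {
		return 0, 0, fmt.Errorf("invalid port %q: %w", parts[1], err)
	}

	return uint32(cid), uint32(port), nil
}
```

Anything that isn't a well-formed CONNECT line (including the plain STARTED/STOPPED strings) returns an error, so the agent can simply ignore it and keep waiting.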
Am I correct in understanding this to mean that each VM instance will keep a persistent vsock connection open to the LXD process?
Also, could we simplify the ring-buffer protocol, such that the lxd-agent would periodically write “STARTED” in a loop, and the LXD process would write “CONNECT …” every time it reads “STARTED” from the lxd-agent?
On the lxd-agent side it would need to try to read from the ring-buffer and wait until the LXD process has written the current “CONNECT …” string before writing the “STARTED” message again.
I’m not sure what the need is for the “CONNECTED” message, as this can be inferred from the “STARTED” message. If we do need to know whether the lxd-agent has connected back to the LXD vsock, we should keep track of those connections in LXD and use that as the indication itself.
Will this be done as part of this specification? If so, how will it be done?
IIRC it will be a /dev/lxd request that a VM application can make to indicate it’s finished booting?
lxd-agent will connect to LXD when it needs something from it.
The only case where we’d keep a persistent connection is if the user inside of the VM accesses /1.0/events on /dev/lxd/sock as that requires a persistent connection to LXD to get the events.
So now that I understand the VM->LXD connections won’t be persistent, I’m still not clear what purpose the “CONNECTED” message serves. It’s not the same as considering the VM “booted”, which will presumably be posted via the /dev/lxd connection (and established on demand).
Although one thing we should consider is what happens after the VM user application has indicated it has finished booting, but then LXD restarts. We’d need the lxd-agent to remember the booted status and feed it back to LXD. But that is probably more relevant when we come to add that feature.
Have LXD know when an agent is available (above, this would be the STARTED, CONNECT or CONNECTED states)
Have the agent know when to update its connection information for the host (that’s the CONNECT state)
Have the agent be able to report that it’s working (CONNECTED state)
Have the agent be able to report that it’s not working and needs updated information (goes back to STARTED state)
Have the host LXD update all agents on startup with new connection information (CONNECT)
As the connections aren’t persistent, the CONNECTED state means that the agent managed to connect the last time it tried. An attempt will be made immediately upon receiving a CONNECT.
Because reading the ring-buffer isn’t a free operation, we could in theory adjust the frequency based on the state: pretty aggressive with checking when in CONNECT, very slow when in CONNECTED, and somewhere in between when in STOPPED.
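Something along these lines could pick the polling interval on the LXD side. The `PollInterval` name and the actual durations are placeholders I made up for illustration; only their relative ordering reflects what's described above.

```go
package main

import "time"

// PollInterval returns how long LXD should wait before reading the
// ring-buffer again, given the last state it saw.
// The durations are illustrative assumptions, not tuned values.
func PollInterval(state string) time.Duration {
	switch state {
	case "CONNECT":
		// Waiting for the agent to react to new connection info: check often.
		return 500 * time.Millisecond
	case "CONNECTED":
		// Agent reported a working connection: check rarely.
		return 30 * time.Second
	default:
		// STOPPED and anything else: somewhere in between.
		return 5 * time.Second
	}
}
```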
Yeah, this is the bit from the current spec that confuses me. It says it will immediately connect, but does it disconnect immediately afterwards? That is a key part that is missing if that is the intention.
Just for my own knowledge, does writing to the ring-buffer (on either side) mean that one will immediately be able to read back what was just written (i.e. shared memory), or is it that messages sent on one side only appear on the other side?
Also it would be good for the spec to indicate where this state info will be presented, as it will be very useful when adding tests for this feature to be able to access the agent status from the lxc command.
I see, thanks. So coordinating that would mean that each side keeps reading what it just wrote over and over again and, when it sees the content change, takes that as a message from the other side.