[LXD] Stream lifecycle and log events to Loki

On that reconnection point, will LXD buffer any log messages if the connection to Loki drops temporarily?

Well, that part was for the lxc monitor case mentioned above.

For the LXD to Loki case, as Loki will be an internal log handler, we can have the handler buffer or block on the reconnection.

That would help a lot, as people use different tools for data gathering and processing.
Just to mention, I have log rotation set up for that simple text file, which is later processed further by filtering on project, instance, and event type.

I see the issue of reconnection as well. In my case nodemon/pm2 help keep the Node websocket up, and it reconnects if it terminates.
But as for LXD, I have no idea how to achieve that.

OK, I’ll keep an eye out for that in the implementation. Thanks.

@stgraber and @tomp is the spec OK? If so, we can mark it as approved.

Are loki.api.cert and loki.api.key files? If so can we name them loki.api.cert_file and loki.api.key_file? For consistency with cephobject.radosgw.endpoint_cert_file.

Will they also be removed from the resulting log message?

Actually, both keys are strings not file paths.

Yes, they will. I just clarified this in the spec.


I wonder if they should be files. Do we have a precedent of storing certs/keys in the database vs files?

Also is this cert/key a per-cluster-member key or a global key?

As I am not familiar with Loki protocol, it would be great to describe it at a top level.
Some questions I am thinking about are:

  • Is it a persistent TCP connection, or opened per event (doubtful but worth checking)?
  • Assuming it’s persistent, how will we detect losing a connection (does it support TCP keepalives)?
  • How will we deal with re-connections? Especially if multiple events are coming through that need to be delivered?
  • In the case that a connection is closed, how long/how many events will we buffer to redeliver before dropping them?

We have private and public keys for rbac: rbac.agent.private_key and rbac.agent.public_key.

All config keys are cluster-wide (lxd/cluster/config/config.go).

No, it’s a REST API, so we call <host>/loki/api/v1/push for each event.
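As a rough illustration of what one such call would carry, here is a minimal sketch of the JSON body that Loki's push endpoint accepts: a list of streams, each with a label set and [timestamp in nanoseconds, line] pairs. The function name encodeEvent and the label names are hypothetical and not taken from the spec.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"time"
)

// stream is one labelled stream in a Loki push request.
type stream struct {
	Stream map[string]string `json:"stream"`
	Values [][2]string       `json:"values"`
}

// pushRequest is the body POSTed to <host>/loki/api/v1/push.
type pushRequest struct {
	Streams []stream `json:"streams"`
}

// encodeEvent (hypothetical helper) wraps a single event in a push request.
// Loki expects the timestamp as a string of nanoseconds since the Unix epoch.
func encodeEvent(labels map[string]string, tsNano int64, line string) ([]byte, error) {
	req := pushRequest{Streams: []stream{{
		Stream: labels,
		Values: [][2]string{{strconv.FormatInt(tsNano, 10), line}},
	}}}
	return json.Marshal(req)
}

func main() {
	// The labels below are assumptions chosen for illustration only.
	labels := map[string]string{"app": "lxd", "type": "lifecycle"}
	body, err := encodeEvent(labels, time.Now().UnixNano(), "instance-started")
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(string(body))
}
```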

Not persistent.

Each event will cause a POST to the aforementioned endpoint. If the host cannot be reached for whatever reason, we could just retry every X seconds, and discard the event after Y seconds.
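The "retry every X seconds, discard after Y seconds" idea could look roughly like the sketch below. It is only an illustration: deliver, errUnreachable, and the fake sender in main are hypothetical names, and the real sender would be the POST to the push endpoint.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnreachable is a placeholder error used by the demo sender below.
var errUnreachable = errors.New("host unreachable")

// deliver calls send until it succeeds, sleeping retryEvery (X) between
// attempts. Once another retry would go past discardAfter (Y), it gives up
// and returns an error, i.e. the event is discarded.
func deliver(send func() error, retryEvery, discardAfter time.Duration) error {
	deadline := time.Now().Add(discardAfter)
	for {
		err := send()
		if err == nil {
			return nil
		}
		if time.Now().Add(retryEvery).After(deadline) {
			return fmt.Errorf("event discarded: %w", err)
		}
		time.Sleep(retryEvery)
	}
}

func main() {
	attempts := 0
	// Hypothetical sender: fails twice, then succeeds.
	send := func() error {
		attempts++
		if attempts < 3 {
			return errUnreachable
		}
		return nil
	}
	err := deliver(send, 10*time.Millisecond, time.Second)
	fmt.Println("attempts:", attempts, "error:", err)
}
```

Note that this version blocks the caller while it retries, so it would have to run outside the code path that emits the events.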

See answer above.

OK makes sense, so for cluster wide config we use key variable settings (which avoids the need to replicate the config files onto each cluster member manually). Cool.

This surprised me.

It sounds like it wouldn’t perform well, and if we were sending lots of events concurrently we would end up opening many connections to the Loki server, potentially overwhelming it.

So I looked at some of the official clients for Loki and came across Promtail (which is a standalone command rather than a package).

However, inside it there is a client package we could potentially use:

https://pkg.go.dev/github.com/grafana/loki/pkg/promtail/client

But aside from potentially being able to use it, I was interested in seeing how it managed connections to the Loki server(s).

We can see that the New() function returns a client that internally has a single goroutine that handles entries from:

and batches them up

It also has the concept of retries with backoff delays.

So I suspect we should be doing something similar, if not using this client package directly.
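To make the pattern concrete, here is a minimal sketch of a single goroutine that drains a channel and batches entries, flushing when a batch is full or when a timer fires. It is not the Promtail client's code; entry, runBatcher, and the flush callback are hypothetical names, and retries are left out.

```go
package main

import (
	"fmt"
	"time"
)

// entry is a hypothetical log entry; a real client would also carry labels.
type entry struct {
	ts   time.Time
	line string
}

// runBatcher starts one goroutine that reads from entries and calls flush
// when the batch reaches maxSize or each time the maxWait ticker fires.
// The returned channel is closed after entries is closed and the final
// batch has been flushed.
func runBatcher(entries <-chan entry, maxSize int, maxWait time.Duration, flush func([]entry)) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		ticker := time.NewTicker(maxWait)
		defer ticker.Stop()

		var batch []entry
		send := func() {
			if len(batch) > 0 {
				flush(batch)
				batch = nil
			}
		}

		for {
			select {
			case e, ok := <-entries:
				if !ok {
					send() // flush whatever is left before exiting
					return
				}
				batch = append(batch, e)
				if len(batch) >= maxSize {
					send()
				}
			case <-ticker.C:
				send()
			}
		}
	}()
	return done
}

func main() {
	entries := make(chan entry)
	done := runBatcher(entries, 2, time.Second, func(b []entry) {
		// A real flush would POST the whole batch in one request.
		fmt.Println("flushing", len(b), "entries")
	})
	for i := 0; i < 5; i++ {
		entries <- entry{ts: time.Now(), line: fmt.Sprintf("event %d", i)}
	}
	close(entries)
	<-done
}
```

With this shape, many concurrent events share one connection and one request per batch instead of opening a connection each.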


@tomp I had a look at the client, and we’ll be doing something similar. But that needn’t be mentioned in the spec as that’s specific to the implementation.

OK thanks. In that case the spec looks good to me.

@stgraber is there anything I should add, or can I mark the spec as approved?

I think it’s fine.

Regarding authentication, Loki itself doesn’t do authentication. Instead, they suggest using a reverse proxy. The way the spec is written now, we only support mTLS or no authentication. Should we also add support for basic authentication?

If so, we might want to consider using the following keys:

  • loki.auth.type (takes "" (none), "mtls", "basic")
  • loki.auth.cert
  • loki.auth.key
  • loki.auth.ca_cert
  • loki.auth.username
  • loki.auth.password

Yeah, I suspect we should probably start with just basic auth. That would make things a bit cleaner, and it’s likely what most folks will do, as TLS-based auth is annoying to set up in something like nginx.

Even if we end up supporting TLS-based auth, we wouldn’t need/want the type key, as it’d technically be possible to do both. We should just use basic auth if it’s provided and TLS auth if it’s provided; if both are provided, then do both.

Anyway, for now, I think we can drop the certificate ones and stick with just username and password.

I’d probably do:

loki.api.url => URL to Loki endpoint
loki.api.ca_cert => If provided, CA cert for server
loki.auth.username => username for basic auth
loki.auth.password => password for basic auth
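For illustration, here is a sketch of how these four keys might map onto a Go HTTP client and request. The lokiConfig struct and both helper functions are hypothetical, and it assumes loki.api.url holds only the base URL, with the push path appended by the caller.

```go
package main

import (
	"bytes"
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// lokiConfig mirrors the four proposed keys; the struct itself is hypothetical.
type lokiConfig struct {
	URL      string // loki.api.url
	CACert   string // loki.api.ca_cert (PEM string, optional)
	Username string // loki.auth.username (optional)
	Password string // loki.auth.password (optional)
}

// newClient trusts only the given CA when one is provided, and otherwise
// falls back to the default transport and system roots.
func newClient(cfg lokiConfig) (*http.Client, error) {
	if cfg.CACert == "" {
		return &http.Client{}, nil
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM([]byte(cfg.CACert)) {
		return nil, errors.New("loki.api.ca_cert contains no valid PEM certificate")
	}
	transport := &http.Transport{TLSClientConfig: &tls.Config{RootCAs: pool}}
	return &http.Client{Transport: transport}, nil
}

// newPushRequest builds the POST and sets basic auth only if a username is set.
func newPushRequest(cfg lokiConfig, body []byte) (*http.Request, error) {
	url := strings.TrimSuffix(cfg.URL, "/") + "/loki/api/v1/push"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if cfg.Username != "" {
		req.SetBasicAuth(cfg.Username, cfg.Password)
	}
	return req, nil
}

func main() {
	// Example values only; the host and credentials are made up.
	cfg := lokiConfig{URL: "https://loki.example.com:3100", Username: "lxd", Password: "secret"}
	if _, err := newClient(cfg); err != nil {
		fmt.Println("client error:", err)
		return
	}
	req, err := newPushRequest(cfg, []byte(`{"streams":[]}`))
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	user, _, ok := req.BasicAuth()
	fmt.Println(req.Method, req.URL, "basic auth user:", user, ok)
}
```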
