On that reconnection point, will lxd be buffering any log messages if the connection to Loki drops temporarily?
Well, that part was for the lxc monitor case mentioned above.
For the LXD to Loki case, as Loki will be an internal log handler, we can have the handler buffer or block on the reconnection.
That would help a lot, as people use different tools for data gathering and processing.
Just to mention, I have log rotation set up for that simple text file, which is later processed further by selecting projects, instances, event types …
I see the issue of reconnection as well. In my case nodemon/pm2 help keep the node websocket up, and if it is terminated it reconnects.
But as for lxd, I have no idea how to achieve that.
OK, I'll keep an eye out for that in the implementation. Thanks.
Are loki.api.cert and loki.api.key files? If so, can we name them loki.api.cert_file and loki.api.key_file, for consistency with cephobject.radosgw.endpoint_cert_file?
Will they also be removed from the resulting log message?
Actually, both keys are strings not file paths.
Yes, they will. I just clarified this in the spec.
I wonder if they should be files. Do we have a precedent of storing certs/keys in the database vs files?
Also is this cert/key a per-cluster-member key or a global key?
As I am not familiar with the Loki protocol, it would be great to describe it at a high level.
Some questions I am thinking about are:
- Is it a persistent TCP connection, or opened per event (doubtful but worth checking)?
- Assuming it's persistent, how will we detect losing a connection (does it support TCP keepalives)?
- How will we deal with re-connections? Especially if multiple events are coming through that need to be delivered?
- In the case that a connection is closed, how long/how many events will we buffer to redeliver before dropping them?
We have private and public keys for rbac: rbac.agent.private_key and rbac.agent.public_key.
All config keys are cluster-wide (lxd/cluster/config/config.go).
No, it's a REST API, so we call <host>/loki/api/v1/push for each event.
Not persistent.
Each event will cause a POST to the aforementioned endpoint. If the host cannot be reached for whatever reason, we could just retry every X seconds, and discard the event after Y seconds.
See answer above.
OK makes sense, so for cluster wide config we use key variable settings (which avoids the need to replicate the config files onto each cluster member manually). Cool.
This surprised me.
It sounds like it wouldn't perform well, and if we were sending lots of events concurrently we would end up opening many connections to the Loki server, potentially overwhelming it.
So I looked at some of the official clients for Loki and came across Promtail (which is a standalone command rather than a package).
However, inside it is a client package we could potentially use:
https://pkg.go.dev/github.com/grafana/loki/pkg/promtail/client
But aside from potentially being able to use it, I was interested in seeing how it managed connections to the Loki server(s).
We can see that the New() function returns a client that internally has a single goroutine that handles entries from:
and batches them up
It also has the concept of retries with backoff delays.
So I suspect we should be doing something similar, if not using this client package directly.
@tomp I had a look at the client, and we'll be doing something similar. But that needn't be mentioned in the spec as that's specific to the implementation.
OK thanks. In that case the spec looks good to me.
@stgraber is there anything I should add, or can I mark the spec as approved?
I think it's fine.
Regarding authentication, Loki itself doesn't do authentication. Instead, they suggest using a reverse proxy. The way the spec is written now, we only support mTLS or no authentication. Should we also add support for basic authentication?
If so, we might want to consider using the following keys:
- loki.auth.type (takes "" (none), "mtls", "basic")
- loki.auth.cert
- loki.auth.key
- loki.auth.ca_cert
- loki.auth.username
- loki.auth.password
Yeah, I suspect we should probably start with just basic auth. That would make things a bit cleaner, and that's likely what most folks will do, as TLS based auth is annoying to set up in something like nginx.
Even if we end up supporting TLS based auth, we wouldn't need/want the type one, as it'd technically be possible to do both. We should just set basic auth if provided and TLS auth if provided; if both are provided, then do both.
Anyway, for now, I think we can drop the certificate ones and stick with just username and password.
I'd probably do:
- loki.api.url => URL to Loki endpoint
- loki.api.ca_cert => If provided, CA cert for server
- loki.auth.username => username for basic auth
- loki.auth.password => password for basic auth
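For illustration, here is a minimal Go sketch of how those four keys could map onto an HTTP client and request. It assumes loki.api.ca_cert holds PEM content rather than a file path (in line with the earlier point that config keys are strings), and `lokiConfig`, `newClient` and `newPushRequest` are hypothetical names, not LXD code.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"strings"
)

// lokiConfig holds the four proposed keys; the struct itself is hypothetical.
type lokiConfig struct {
	URL      string // loki.api.url
	CACert   string // loki.api.ca_cert (assumed to be PEM content, not a path)
	Username string // loki.auth.username
	Password string // loki.auth.password
}

// newClient returns an HTTP client that trusts CACert when it is set,
// and falls back to the system trust store otherwise.
func newClient(cfg lokiConfig) (*http.Client, error) {
	transport := &http.Transport{}
	if cfg.CACert != "" {
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM([]byte(cfg.CACert)) {
			return nil, fmt.Errorf("loki.api.ca_cert contains no valid PEM certificate")
		}
		transport.TLSClientConfig = &tls.Config{RootCAs: pool}
	}
	return &http.Client{Transport: transport}, nil
}

// newPushRequest builds the POST to the push endpoint and adds basic auth
// only when a username is configured.
func newPushRequest(cfg lokiConfig, body string) (*http.Request, error) {
	url := strings.TrimSuffix(cfg.URL, "/") + "/loki/api/v1/push"
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if cfg.Username != "" {
		req.SetBasicAuth(cfg.Username, cfg.Password)
	}
	return req, nil
}

func main() {
	// Example values only.
	cfg := lokiConfig{URL: "https://loki.example.com", Username: "lxd", Password: "secret"}

	if _, err := newClient(cfg); err != nil {
		fmt.Println(err)
		return
	}
	req, err := newPushRequest(cfg, `{"streams":[]}`)
	if err != nil {
		fmt.Println(err)
		return
	}
	_, _, hasAuth := req.BasicAuth()
	fmt.Println(req.URL.String(), "basic auth:", hasAuth)
}
```

Because each setting is applied only when provided, TLS client certificates could be added later in the same way without a separate type key.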