Incus certificate management - am I overthinking this?

I am running an Incus cluster across 3 machines (successfully) and am curious how others are handling day-to-day operations.

Specifically:

  • How do you manage remote access for team members? (certs, auth, etc.)
  • Backup strategy? Built-in tools or custom scripts?
  • Any dashboard/monitoring you’re using, or just CLI?

Currently doing most of this manually and wondering if I’m missing something or if that’s just the Incus way.

What does your setup look like?

Thanks,

Danny

For shared environments, I use OIDC (with Keycloak as the IdP) and OpenFGA. That provides the most flexibility, especially for quickly extending or revoking access, and it lets the IdP enforce 2FA and the like.
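For reference, Incus wires this up through server config keys: oidc.issuer and oidc.client.id (optionally oidc.audience) for OIDC, and openfga.api.url, openfga.api.token and openfga.store.id for OpenFGA. The following is a minimal dry-run sketch that only prints the matching incus config set commands. The Keycloak realm URL, client ID, store ID and token are placeholder assumptions, and the key names should be checked against the server configuration docs for your Incus version.

```python
import shlex

# Dry run: builds `incus config set <key> <value>` commands, never executes them.
# All values are placeholders; key names follow the Incus server config docs.
def auth_config_commands(issuer, client_id, fga_url, fga_token, fga_store_id, audience=None):
    settings = {
        "oidc.issuer": issuer,           # IdP issuer URL (e.g. a Keycloak realm)
        "oidc.client.id": client_id,     # OIDC client registered in the IdP
        "openfga.api.url": fga_url,      # OpenFGA server
        "openfga.api.token": fga_token,  # pre-shared API token for OpenFGA
        "openfga.store.id": fga_store_id,
    }
    if audience:
        settings["oidc.audience"] = audience  # only needed if the IdP requires it
    return [["incus", "config", "set", key, value] for key, value in settings.items()]


if __name__ == "__main__":
    for argv in auth_config_commands(
        issuer="https://keycloak.example.com/realms/incus",
        client_id="incus",
        fga_url="https://openfga.example.com",
        fga_token="REPLACE_ME",
        fga_store_id="REPLACE_ME",
    ):
        print(shlex.join(argv))
```

Printing rather than running keeps the token out of process listings until you have reviewed the commands; 2FA itself is then a policy on the Keycloak side, not an Incus setting.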

For my own stuff, I have a backup of /var/lib/incus/database (excluding any instance/volume/image data), mostly to be able to recover the database if needed.
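A minimal sketch of that kind of database-only backup, archiving just /var/lib/incus/database into a timestamped tarball so instance, volume and image data are excluded by construction. It assumes the archive is taken while the daemon is stopped or idle; the sketch does not coordinate with a running Incus, so a live copy may not be consistent. The function name and destination layout are hypothetical.

```python
import tarfile
import time
from pathlib import Path


def backup_incus_database(dest_dir, src="/var/lib/incus/database"):
    """Write a timestamped .tar.gz of the Incus database directory only.

    Hypothetical helper: only `src` is archived, so everything else under
    /var/lib/incus (instances, volumes, images) is left out.
    """
    src_path = Path(src)
    if not src_path.is_dir():
        raise FileNotFoundError(f"{src_path} is not a directory")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"incus-database-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps entries relative ("database/...") inside the archive
        tar.add(src_path, arcname=src_path.name)
    return archive
```

Running it from a systemd timer or cron job and shipping the tarball to the backup server would give a small, restorable copy of the database that does not depend on the instance backups.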

Then I usually run backups from within the guests to a backup server, as that provides the best, most consistent backups. All instances and volumes have daily snapshots with a one-week expiry, which handles the vast majority of recovery cases (undoing a recent mistake).
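The snapshot policy maps onto the snapshots.schedule and snapshots.expiry keys, which exist both on instances (here set through a profile) and on custom storage volumes. A minimal dry-run sketch, assuming a profile named "default" and a hypothetical custom volume "data" in pool "default":

```python
import shlex

SCHEDULE = "@daily"  # snapshots.schedule accepts a cron expression or alias like @daily
EXPIRY = "1w"        # snapshots.expiry: each snapshot is deleted one week after creation


def snapshot_policy_commands(profile, volumes):
    """Return incus argv lists applying daily snapshots with weekly expiry.

    profile: profile whose instances get the policy (e.g. "default").
    volumes: iterable of (pool, volume) pairs for custom storage volumes.
    """
    volumes = list(volumes)
    cmds = []
    for key, value in (("snapshots.schedule", SCHEDULE), ("snapshots.expiry", EXPIRY)):
        cmds.append(["incus", "profile", "set", profile, key, value])
        for pool, volume in volumes:
            cmds.append(["incus", "storage", "volume", "set", pool, volume, key, value])
    return cmds


if __name__ == "__main__":
    for argv in snapshot_policy_commands("default", [("default", "data")]):
        print(shlex.join(argv))
```

With a daily schedule and a one-week expiry, each instance or volume settles at roughly seven automatic snapshots at any time.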

I’ve got all metrics pulled into Prometheus and all logs sent to Loki, which then lets me use Grafana dashboards and alerts. That said, as with backups, the most useful metrics aren’t the ones exposed by Incus but the ones retrieved directly from the applications, which can then be combined with the Incus ones when debugging an issue.
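Incus exposes its own metrics in Prometheus text format at /1.0/metrics (optionally on a dedicated core.metrics_address listener), authenticated with a trusted client certificate. For ad-hoc debugging outside Grafana, a minimal sketch could fetch and filter those samples. The certificate paths are assumptions, cafile is assumed to be the server certificate (hostname checking may reject it if the cert name doesn't match the URL host), and the parser is deliberately simple: escaped characters in label values are kept as-is.

```python
import re
import ssl
import urllib.request

# name{labels} value [timestamp]  -- simplified Prometheus text exposition format
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url, certfile, keyfile, cafile=None):
    """Fetch metrics text from e.g. https://<host>:8443/1.0/metrics with a client cert."""
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.load_cert_chain(certfile, keyfile)
    with urllib.request.urlopen(url, context=ctx) as resp:
        return resp.read().decode()


def parse_metrics(text, prefix="incus_"):
    """Return (name, labels, value) tuples for samples whose name starts with prefix."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):  # skip HELP/TYPE comments and blanks
            continue
        m = SAMPLE_RE.match(line)
        if not m or not m.group(1).startswith(prefix):
            continue
        name, labels, value = m.groups()
        samples.append((name, dict(LABEL_RE.findall(labels or "")), float(value)))
    return samples
```

Filtering on labels such as name or project (for example, all samples where labels["name"] == "web") makes it easy to line Incus numbers up against an application's own metrics for the same instance.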

Thanks Stéphane - really helpful to see your production setup.

So you’re running the full stack: Keycloak + OIDC + OpenFGA for auth, then Prometheus + Loki + Grafana for observability. That’s proper enterprise-grade, but yeah, quite a few moving parts to wire together.

I’m running a 3-node Incus 6.16 cluster on NixOS (two servers, one desktop) with a bunch of containers and a few VMs spread across the nodes, and I’m considering adding more nodes as things grow. I’ve been manually stitching together cert management and backup orchestration, and trying to get unified visibility across the cluster. Reading incus list on a tablet without proper font rendering (like in Termius) can be tough.
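One workaround for the tablet problem is to skip the box-drawing table entirely: incus list --format json returns the instance objects, which can be rendered as plain padded columns. A minimal sketch, assuming the name, status, type and location fields of the JSON output (location is the cluster member); the helper names are hypothetical.

```python
import json
import subprocess


def list_instances():
    """Run `incus list --format json` and return the decoded list of instances."""
    result = subprocess.run(
        ["incus", "list", "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def plain_listing(instances):
    """Render instances as space-padded ASCII columns: name, status, type, location."""
    rows = [
        (i.get("name", ""), i.get("status", ""), i.get("type", ""), i.get("location") or "-")
        for i in instances
    ]
    if not rows:
        return ""
    widths = [max(len(row[col]) for row in rows) for col in range(4)]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )
```

Usage would be print(plain_listing(list_instances())). incus list --format csv is the zero-code alternative if plain comma-separated output is readable enough.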

I’m considering building something that integrates these capabilities - basically bundling what you’ve described (OIDC, RBAC, monitoring, backups) with a management UI. The goal would be making that stack accessible to teams who need it but don’t have the bandwidth to architect and maintain it themselves.

Two questions if you have a minute:

  1. For production Incus deployments, do you see value in an integrated solution that handles auth + monitoring + backups out of the box? Or is the flexibility of separate tools more important for most users?

  2. Would something like this make sense as an independent tool, or would it be better positioned as contributions to the Incus ecosystem itself?

I’m trying to figure out if I’m solving a real problem or just over-engineering my own setup. Don’t want to build something that duplicates effort or misses the mark on what the community actually needs.

Appreciate any thoughts.

A bit off topic, but something I sometimes struggle with is extracting instance YAML config from an old Incus/LXD installation where the service is not running. Is there a tool that can be used to extract instance/profile/network/storage YAML from an offline database?
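Not a database extractor, but a partial workaround: Incus (and LXD) keep a backup.yaml inside each instance's storage volume, which is what incus admin recover reads. It is not a full export (networks, for example, are not in it), and on backends such as ZFS or LVM the volumes may need to be mounted manually first. A minimal sketch that locates those files, assuming the default /var/lib/incus layout of storage-pools/<pool>/<containers|virtual-machines>/<instance>/backup.yaml (LXD installs use a different root directory):

```python
from pathlib import Path


def find_backup_yaml(incus_dir="/var/lib/incus"):
    """Map 'pool/kind/instance' keys to backup.yaml paths under the storage pools.

    Assumes the storage-pools/<pool>/<kind>/<instance>/backup.yaml layout;
    project-scoped instances may appear with a '<project>_' name prefix.
    """
    root = Path(incus_dir) / "storage-pools"
    found = {}
    for kind in ("containers", "virtual-machines"):
        for path in sorted(root.glob(f"*/{kind}/*/backup.yaml")):
            pool = path.parents[2].name    # storage-pools/<pool>
            instance = path.parent.name    # .../<kind>/<instance>
            found[f"{pool}/{kind}/{instance}"] = path
    return found
```

The files themselves are plain YAML, so once found they can be read with any YAML parser to recover the instance configuration.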

Would you mind posting a diagram (Mermaid or somesuch) of the 2FA integration? I have protected my Incus interfaces and just use port-forwarded SSH to control them. Your config (with secrets removed) would also be interesting!