We are encountering various I/O and CPU bottlenecks on multiple LXD bare-metal servers running production containers. We have traced the I/O spikes to the backup routines, which take snapshots of containers and export them for replication to a NAS. The issue is that the LXD “publish” and “export” operations on a snapshot seem to hog the disk and compete with the services running inside the containers for disk access. As a result, we experience production downtime during the backup windows.
Our attempts to “renice” and “ionice” the backup processes in order to deprioritise them have failed. I suspect this is because both “lxc publish” and “lxc image export” are only the client side of the operation, so setting a nice priority on them has no impact on the server side of LXD, which actually performs the publication and export at its default “high” priority and causes the bottleneck.
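To illustrate the approach that did not help, here is a minimal sketch rather than our exact script. It uses nice at launch instead of renice afterwards for brevity. The helper name low_prio, the container c1, the snapshot snap0, the alias backup-c1 and the path /mnt/nas/c1 are all placeholders. The helper only prints the command it would run, so it can be tried safely on a machine without LXD.

```shell
#!/bin/sh
# Hypothetical helper: prefix a command with the lowest CPU priority
# (nice 19) and the idle I/O scheduling class (ionice -c 3).
# It echoes the resulting command line instead of executing it.
low_prio() {
    echo nice -n 19 ionice -c 3 "$@"
}

# Placeholder names: c1/snap0 (snapshot), backup-c1 (image alias),
# /mnt/nas/c1 (export target on the NAS mount).
low_prio lxc publish c1/snap0 --alias backup-c1
low_prio lxc image export backup-c1 /mnt/nas/c1
```

Dropping the echo would actually run the commands, but the lowered priority would then apply only to the lxc client process, which is the limitation I suspect above.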
I am not in favor of applying nice and ionice to the LXD daemon itself, as this would affect the whole system and not just the backup operations.
What would be the recommended way to deprioritise (in terms of CPU and I/O) a specific set of operations performed by the LXD server?