Elevated LXD host disk utilization - 'global' database 'operations' table

rhodgkin · January 5, 2021, 9:41pm

Hi guys,
We are running LXD container instances on Ubuntu 18.04 hosts in non-clustered mode, LXD version 3.0.3. We periodically execute various “lxc exec ${INSTANCE_NAME} …” commands on the host to pull BGP/IPSec/etc status from the containers, and use logstash to glean the desired information. As we scale up the number of containers, and with it the overall number of “lxc exec” commands, we are seeing highly elevated disk usage. It seems to be caused by disk I/O writes to the ‘global’ LXD database (/var/lib/lxd/database/global), and from what we can see it is mainly the ‘operations’ table. Our assumption is that every “lxc exec” command execution is getting logged there. Does this sound accurate?
If we disable these periodic “lxc exec” commands in logstash, our disk I/O issues go away. Are there any knobs that we can tweak or disable to reduce writes to the LXD database?

thanks!

tomp · January 11, 2021, 9:19am

Hi,

The lxc exec commands, along with most other LXD API requests, create an Operation record, which can then be used to refer to the ongoing operation asynchronously by its UUID. It also provides some locking mechanisms.
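
If each lxc exec call creates one Operation record, one way to cut the number of records is to run several status commands in a single lxc exec call per container. Here is a minimal sketch of that idea, not a tested fix. The helper names, the delimiter and the placeholder commands are all hypothetical, and it assumes the container has `sh` available.

```python
import shlex
import subprocess

# Marker printed between command outputs so they can be split apart again.
# Hypothetical value: pick something your status commands never print.
DELIM = "-----8<-----"


def build_batched_exec(instance, commands):
    """Build the argv for ONE `lxc exec` call that runs every command in turn."""
    separator = "; echo {}; ".format(shlex.quote(DELIM))
    script = separator.join(commands)
    return ["lxc", "exec", instance, "--", "sh", "-c", script]


def run_batched(instance, commands):
    """Run the batch and return one output string per command."""
    argv = build_batched_exec(instance, commands)
    # stdout=PIPE / universal_newlines keep this usable on Python 3.6.
    result = subprocess.run(
        argv, stdout=subprocess.PIPE, universal_newlines=True, check=True
    )
    return [part.strip() for part in result.stdout.split(DELIM)]


if __name__ == "__main__":
    # Placeholder commands; substitute your real status commands.
    print(build_batched_exec("c1", ["echo bgp-placeholder", "echo ipsec-placeholder"]))
```

Note that the commands are joined with `;`, so a failing command does not stop the later ones, and `check=True` only reflects the exit status of the last command.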

What are you running with lxc exec?

stgraber · January 11, 2021, 2:13pm

The DB record for an operation is very light (just id, type and node_id). It allows quick lookup of operations in clusters, which avoids having to query every node, and it provides basic locking in some instances (preventing two operations of the same type from running at once, which is only used for a subset of operation types).

In theory someone could contribute logic to skip this step and rely solely on the in-memory state when not clustered, though particular care would need to be taken when joining a cluster (at the very least, all in-memory operations would have to be replicated to the DB).
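
To make the idea concrete, here is a toy model of it. This is not LXD's code (LXD is written in Go) and nothing like it exists in LXD today; the class, its methods and the dict standing in for the operations table are all made up for illustration.

```python
import uuid


class OperationRegistry:
    """Toy model: skip the DB row while standalone, backfill on cluster join."""

    def __init__(self, node_id=1, clustered=False):
        self.node_id = node_id
        self.clustered = clustered
        self.in_memory = {}  # uuid -> operation type, always populated
        self.db_rows = {}    # stand-in for the 'operations' table

    def _write_row(self, op_id, op_type):
        # One light record per operation: type and node_id, keyed by UUID.
        self.db_rows[op_id] = {"type": op_type, "node_id": self.node_id}

    def create(self, op_type):
        op_id = str(uuid.uuid4())
        self.in_memory[op_id] = op_type
        if self.clustered:
            # Only a clustered host pays for the database write.
            self._write_row(op_id, op_type)
        return op_id

    def finish(self, op_id):
        self.in_memory.pop(op_id, None)
        self.db_rows.pop(op_id, None)

    def join_cluster(self):
        # Replicate every operation that so far lives only in memory,
        # so other nodes can look it up once this host is clustered.
        for op_id, op_type in self.in_memory.items():
            if op_id not in self.db_rows:
                self._write_row(op_id, op_type)
        self.clustered = True
```

In this model a standalone host never touches `db_rows`, and `join_cluster()` is the one place where in-flight operations have to be written out.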

rhodgkin · January 11, 2021, 10:37pm

Thanks much for the response, that makes sense. Yes, we are using ‘lxc exec’ with various extensions to periodically pull BGP/IPSec and other status from inside each container. It sounds like these operation records are vital to proper async operation in LXD and cannot currently be minimized or disabled.

thanks!
-Rob