LXC commands under heavy load

mezobari · May 9, 2023, 6:12pm

Sometimes under heavy load (many VMs) I get an i/o timeout error when running lxc commands:

write unix @->/var/snap/lxd/common/lxd/unix.socket: i/o timeout

It’s due to heavy load on the system. I’m using a retry mechanism to reduce failures, but I still get them occasionally. Would increasing system limits help?
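A minimal sketch of the kind of retry wrapper described above, assuming exponential backoff with jitter is acceptable for the workload. The helper name `run_with_retry`, the attempt count, and the delay values are hypothetical, not taken from the original script or from LXD recommendations; it retries only when stderr contains the `i/o timeout` text shown above.

```python
import random
import subprocess
import time


def run_with_retry(cmd, attempts=5, base_delay=0.5, max_delay=8.0,
                   runner=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying with exponential backoff and jitter.

    Hypothetical helper: attempts and delays are illustrative assumptions.
    Returns the result of the last run (successful or not).
    """
    last = None
    for attempt in range(attempts):
        last = runner(cmd, capture_output=True, text=True)
        if last.returncode == 0:
            return last
        # Retry only on the transient socket timeout; other errors fail fast.
        if "i/o timeout" not in last.stderr:
            break
        if attempt < attempts - 1:
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from many concurrent callers.
            sleep(delay + random.uniform(0, delay / 2))
    return last
```

It could be called as `run_with_retry(["lxc", "list", "--format", "csv"])`. Backoff reduces pressure on an already loaded daemon, but it only masks the timeout; it does not address the underlying load.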

According to this doc, there is no need to touch /etc/security/limits.conf since I’m using the snap.

I did update /etc/sysctl.conf accordingly:

fs.aio-max-nr = 524288
fs.inotify.max_queued_events = 1048576
fs.inotify.max_user_instances = 1048576
fs.inotify.max_user_watches = 1048576
kernel.dmesg_restrict = 1
kernel.keys.maxbytes = 2000000
kernel.keys.maxkeys = 2000
net.core.bpf_jit_limit = 3000000000
net.ipv4.neigh.default.gc_thresh3 = 8192
net.ipv6.neigh.default.gc_thresh3 = 8192
vm.max_map_count = 262144
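To see which of these keys the running kernel actually accepted, the file can be compared against the live values under /proc/sys. This is a rough sketch, not an LXD tool: the function names are hypothetical, and the key-to-path mapping assumes no key component contains a dot (true for the keys listed here, not for e.g. VLAN interface names).

```python
from pathlib import Path


def parse_sysctl(text):
    """Parse 'key = value' lines, skipping blank lines and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def proc_path(key, root="/proc/sys"):
    """Map a sysctl key such as vm.max_map_count to its /proc/sys path."""
    return Path(root) / key.replace(".", "/")


def check(settings, root="/proc/sys"):
    """Return {key: (wanted, actual)} for keys that are missing or differ."""
    mismatches = {}
    for key, wanted in settings.items():
        try:
            # Normalize whitespace so multi-value keys compare cleanly.
            actual = " ".join(proc_path(key, root).read_text().split())
        except OSError:
            actual = None  # key absent, or unreadable without privileges
        if actual != wanted:
            mismatches[key] = (wanted, actual)
    return mismatches
```

Running `check(parse_sysctl(Path("/etc/sysctl.conf").read_text()))` returns an empty dict when every listed key is present with the requested value; any key the kernel rejected or does not expose shows up with its current value or `None`.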

I’m still getting the same error once in a while. Any advice?

cemzafer · May 10, 2023, 7:50am

Hi @mezobari,
Looking at your system values, net.core.bpf_jit_limit looks suspicious to me. If you took those values from the production values doc, that parameter is not mentioned there. I also get sysctl: setting key "net.core.bpf_jit_limit": Invalid argument when I apply your config.
Regards.

mezobari · May 10, 2023, 8:01am

Tbh, I don’t remember why that field is there at all. I have a bash script that sets these values along with other stuff, so maybe Copilot suggested it and I just approved (tabbed) it?!

tomp · May 12, 2023, 3:16pm

We recently modified the way the lxc exec websocket proxy logic works. This will be in LXD 5.14, so it will be interesting to see whether it helps once that is released.