lxc list is slow when getting IPv4 addresses

gqdc · April 7, 2022, 8:54am

Hi everybody,

First of all, I want to thank all of LXD’s developers for their great work. LXD is wonderful!

I have some speed problems on my cluster, which can sometimes cause serious issues.

Node configuration:

Debian 11
LXD 4.24 rev 22710
snapd 2.54.4
64 GB of RAM

I run an LXD cluster with 3 nodes, geographically distant, managed by Pacemaker. The storage is on a Ceph cluster (SSD).

The 3 nodes communicate over the public network (500 Mbps upload, 1 Gbps download for each one).

All 15 containers are running on the same node. If this node has a problem, all containers move to another node.

The main problem is that lxc list is very slow.
Since lxc list --fast is… fast, I experimented with lxc list -c, and I can see that retrieving the IPv4 addresses (I do not have IPv6) is particularly slow.

Here are some “time” tests:

# time lxc ls -c nsL
real    0m0.432s
user    0m0.030s
sys     0m0.065s

# time lxc ls -c nsL4
real    3m27.149s
user    0m0.104s
sys     0m0.088s

# time lxc ls
real    3m25.567s
user    0m0.070s
sys     0m0.046s
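
The comparison above can be repeated for any set of column strings with a small helper. This is only a sketch: `time_columns` is a hypothetical name, it assumes the column letters used in this thread (n = name, s = state, L = location, 4 = IPv4), and it measures whole seconds only, which is enough to tell a sub-second listing from a multi-minute one.

```shell
# Hypothetical helper: run `lxc list -c <columns>` once per argument and
# print how many whole seconds each call took.
time_columns() {
  for cols in "$@"; do
    start=$(date +%s)
    lxc list -c "$cols" >/dev/null
    end=$(date +%s)
    echo "$cols $((end - start))s"
  done
}
```

For example, `time_columns n ns nsL nsL4` adds one column at a time, so the first line with a large value points at the column that is slow.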

The other speed problem I see is that launching a new container with a simple lxc launch images:debian/11 new_container takes many minutes and causes a timeout in the LXD API, which makes Pacemaker think the resources are down.

Thank you for your time.

===== Update

I started a container that had been stopped and had no IPv4 address, and now lxc list is much faster:

# time lxc ls
real    0m4.372s
user    0m0.067s
sys     0m0.068s

By the way, if I split the containers over 2 or 3 nodes, lxc list becomes very slow again.
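
To see how the containers are spread over the cluster members before and after such a split, the location column can be tallied. This is a minimal sketch: `count_by_member` is a hypothetical helper, and it assumes two-column CSV input (name, location) such as `lxc list -c nL --format csv` produces.

```shell
# Hypothetical helper: read "name,location" CSV lines on stdin and print
# one "<member> <count>" line per cluster member, sorted by member name.
count_by_member() {
  awk -F, 'NF >= 2 { count[$2]++ } END { for (m in count) print m, count[m] }' | sort
}
```

Usage: `lxc list -c nL --format csv | count_by_member`. Since only the name and location columns are requested, this should stay on the fast path seen in the nsL timing above.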

I wonder whether setting up a virtual shared network across the nodes would reduce this latency?

tomp · April 7, 2022, 1:02pm

What’s the latency between members?

gqdc · April 7, 2022, 3:22pm

The 3 hostnames are: host2, host3 and host4.

Currently, all containers are running on host3, so I ran the test from there.
Here are the ping results from host3 to the others:

--- host2.domain.tld ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9004ms
rtt min/avg/max/mdev = 10.095/10.328/10.608/0.161 ms

--- host4.domain.tld ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9015ms
rtt min/avg/max/mdev = 1.696/1.877/2.175/0.158 ms
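
To compare members at a glance, the average can be pulled out of each summary. This is a sketch only: `avg_rtt` is a hypothetical helper, and it assumes the Linux iputils summary format shown above ("rtt min/avg/max/mdev = ..."); other ping implementations print a differently worded line.

```shell
# Hypothetical helper: read ping output on stdin and print the average
# round-trip time in ms, taken from the "rtt min/avg/max/mdev" line.
avg_rtt() {
  # Splitting on "/" puts the avg value in the fifth field of that line.
  awk -F/ '/^rtt / { print $5 }'
}
```

Usage: `ping -c 10 host2.domain.tld | avg_rtt`.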

It’s interesting because there is a big difference.

I also pinged host3 and host2 from host4, and the results are similar.

From host2, the latency is 10 ms to both host3 and host4.

That seems logical, as host3 and host4 are 100 km apart, and host2 is at least 500 km from the others.

These are dedicated servers in France, at Roubaix, Strasbourg and Gravelines.

tomp · April 7, 2022, 3:36pm

LXD isn’t really designed to work over a WAN. Although the latency between your members is relatively low, you’re likely to see performance issues for actions that require cross-cluster communication or require queries to traverse the WAN to get to the leader member.

See lxc move fail with websocket: bad handshake on cluster with network latency between nodes · Issue #9861 · lxc/lxd · GitHub for more info.

gqdc · April 7, 2022, 4:00pm

Thank you for your answers. I read the GitHub issue you linked, and I understand that an LXD cluster needs a very low latency network.

I’ll try moving to a LAN; that will be better for both LXD and Ceph.