Hello
I have an LXD cluster composed of 32 nodes.
When I issue commands like
lxc launch blah blah
or
lxc ls
they sometimes take two or three minutes to finish.
Can having many nodes in a cluster slow things down?
Hello,
yes, the number of nodes and, much more importantly, the number of containers might affect the speed of “lxc list”. How many containers do you have?
If you want us to take a look, you should probably enable debug logging on all nodes, invoke “lxc list”, and then send us all the logs from around the time “lxc list” was run. That will help us profile where the time is spent and whether there are improvements we can make.
It's also worth trying lxc list --fast, which will cut down on API calls significantly and make it easier to figure out where the time is being spent.
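The suggestion to grab the logs from around the time “lxc list” was run could be sketched roughly like this. It is only a sketch: the function name capture_lxc_list_logs and the LXD_UNIT variable are made up for illustration, and the unit name "lxd" is an assumption (it matches a deb install; a snap install uses a different unit name). It does not enable debug logging, which has to be done separately, and it only reads the journal of the node it runs on.

```shell
#!/usr/bin/env bash
# Assumption: LXD runs under the systemd unit "lxd"; override LXD_UNIT otherwise.
LXD_UNIT="${LXD_UNIT:-lxd}"

# Hypothetical helper: run "lxc list", then print the journal entries
# that the LXD unit logged between the start and the end of that run.
capture_lxc_list_logs() {
    local start end
    start=$(date '+%Y-%m-%d %H:%M:%S')   # timestamp format accepted by journalctl
    lxc list > /dev/null                 # the slow command being investigated
    end=$(date '+%Y-%m-%d %H:%M:%S')
    journalctl -u "$LXD_UNIT" --since "$start" --until "$end" --no-pager
}

# Example (not executed here):
#   capture_lxc_list_logs > lxc-list-$(hostname).log
```

On the other nodes, the same journalctl call with the same --since/--until window would collect the matching logs.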
The time spent fluctuates a lot.
I removed all journal logs and then started the test.
I had 5 containers.
It takes 4 ~ 5 mins to launch a new container, even with a cached image,
and around 1 ~ 2 mins to remove it.
Sometimes lxc ls
gives output in 2 seconds while lxc ls --fast
returns in 7 ~ 8 seconds;
the next time it’s the opposite…
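Since single measurements jump around this much, timing each command several times gives a better picture than one run. A minimal sketch, assuming GNU date (for %N) and bash; time_runs is a hypothetical helper name, not part of lxc:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command several times and print the
# wall-clock duration of each run in milliseconds.
# Usage: time_runs <runs> <command> [args...]
time_runs() {
    local runs=$1; shift
    local i start end
    for ((i = 1; i <= runs; i++)); do
        start=$(date +%s%N)              # nanoseconds since the epoch (GNU date)
        "$@" > /dev/null 2>&1            # discard output, only the duration matters
        end=$(date +%s%N)
        printf 'run %d: %d ms\n' "$i" $(( (end - start) / 1000000 ))
    done
}

# Example (not executed here):
#   time_runs 5 lxc ls
#   time_runs 5 lxc ls --fast
```

Comparing the spread of the two series shows whether --fast is really slower or whether both simply vary from run to run.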
To identify what’s happening, I cleared the journal before issuing the commands.
This is what I got from journalctl:
-- Logs begin at Mon 2018-05-14 20:01:55 UTC, end at Mon 2018-05-14 20:11:34 UTC. --
May 14 20:06:26 node00 lxd[4776]: 2018/05/14 20:06:26 http: multiple response.WriteHeader calls
How long does it take to launch a container normally?
@freeekanayaka 4-5min to launch and 1-2min to delete definitely don’t seem normal even for a large cluster
I have 32 nodes. The master node
had two physical NICs and 255 VLAN interfaces for DHCP and DNS.
I’ve now removed 249 of the VLAN interfaces, leaving only 6, and also
removed all the DHCP settings for those 249 VLAN network segments.
This time I made a cluster of only 5 nodes, including the master.
Now it takes about 1 min to create a container
and 20 seconds to start it.
That’s much better but still rather slow, though this cluster is old.
Could the problem be too many network segments,
or does simply having many hosts make it slow?
Now when I issue the command
lxc ls
I get
Error: disk I/O error
I’m starting some work to improve this area, see this comment on the same issue reported by another user.
It will take a bit, but I’ll follow up on this post when we think we have nailed things down.
Thanks for your attention.
I have the cluster which I mentioned in this post.
The bug reproduces reliably on this cluster, even when I reinstall LXD over and over.
If you want, I can give you shell access to the entire cluster for inspection…
I think there’s no need for now. Once we have done more profiling, testing and fixing I’ll get back to this post so you can try again. As said, it will likely take a while, so probably next week.
Yeah, I think this should be treated as a future improvement rather than a bug.
I’ve installed LXD on another cluster, composed of 11 computers with latest-generation hardware,
but that LXD cluster still takes ~ 2 mins to launch a container with a cached image,
~ 20 seconds for lxc ls
and ~ 5 seconds for lxc ls --fast
For this cluster, I could not find any suspicious logs in the journal.
I’ve installed the LXD cluster again on the previous 32-node cluster.
This time I cleaned up the previous LXD installation with apt purge and
installed ntp for time synchronization across the cluster.
Now lxc commands are much more responsive than before.
Could this be related?
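One rough way to see whether the clocks were actually far apart is to collect the current time from every node and look at the spread. The sketch below only does the arithmetic. The helper name max_clock_spread, the node names and the use of ssh are all assumptions for illustration. Because the nodes are queried one after another, a spread of a second or two means nothing; only large differences are interesting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lines of "<node> <epoch-seconds>" on stdin
# and print the difference between the largest and smallest timestamp.
max_clock_spread() {
    awk '
        NR == 1  { min = $2; max = $2 }
        $2 < min { min = $2 }
        $2 > max { max = $2 }
        END      { if (NR > 0) print max - min }
    '
}

# Example (not executed here; node names are placeholders):
#   for n in node00 node01 node02; do
#       echo "$n $(ssh "$n" date +%s)"
#   done | max_clock_spread
```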