LXD commands are kind of slow


(Park Kyung Won) #1

Hello

I have an LXD cluster composed of 32 nodes.

When I issue commands like

lxc launch blah blah
or
lxc ls

Sometimes it takes two or three minutes for a command to finish.

If I have many nodes in a cluster, could that slow things down?


(Free Ekanayaka) #2

Hello,

yes, the number of nodes (and, much more importantly, the number of containers) might affect the speed of “lxc list”. How many containers do you have?

If you wish us to take a look, you should probably enable debug logging on all nodes, invoke “lxc list”, and then send us all the logs from around the time “lxc list” was run. That will help us profile where the time is spent and whether there are possible improvements we can make.
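journalctl's own --since and --until options can narrow the export on each node. For logs already collected from many nodes, a small filter can keep only the entries near the moment “lxc list” was run. This is a minimal sketch, assuming lines in journalctl's default short layout (like the excerpt later in this thread), English month names, and a caller-supplied year, since that layout omits it; lines_near is a hypothetical helper name.

```python
from datetime import datetime, timedelta

def lines_near(log_lines, when, window_s=60, year=2018):
    """Return the log lines stamped within window_s seconds of `when`.

    Assumes journalctl's default "short" layout, e.g.
    "May 14 20:06:26 node00 lxd[4776]: ...", with English month names.
    That layout omits the year, so the caller passes it in.
    """
    lo = when - timedelta(seconds=window_s)
    hi = when + timedelta(seconds=window_s)
    kept = []
    for line in log_lines:
        stamp = line[:15]  # "May 14 20:06:26" is exactly 15 characters
        try:
            ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped entry, e.g. "-- Logs begin at ..."
        if lo <= ts <= hi:
            kept.append(line)
    return kept
```

Running it over each node's exported lines with the same `when` gives one aligned slice per node to send along.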


(Stéphane Graber) #3

It's also worth trying lxc list --fast, which cuts down on API calls significantly and makes it easier to figure out where time is being spent.
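Because single runs vary a lot, timing several runs of lxc list and lxc list --fast and comparing the spread is more informative than one measurement. A minimal sketch, assuming Python 3 and that lxc is on PATH; time_command and summarize are hypothetical helper names. It only measures end-to-end wall-clock time and does not show which API calls were made.

```python
import statistics
import subprocess
import time

def time_command(argv, runs=5):
    """Run argv `runs` times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only elapsed time is recorded.
        # check=False so a run that exits with an error is still timed.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
    return durations

def summarize(durations):
    """Return (min, median, max) so run-to-run fluctuation is visible."""
    return min(durations), statistics.median(durations), max(durations)
```

Usage might look like summarize(time_command(["lxc", "list", "--fast"], runs=10)), run next to the same call for ["lxc", "list"].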


(Park Kyung Won) #4

The time spent fluctuates a lot.

I removed all journal logs to identify what’s happening, then started testing.

I had 5 containers

It takes 4 to 5 minutes to launch a new container, even with a cached image, and around 1 to 2 minutes to remove it.

Sometimes lxc ls gives output in 2 seconds while lxc ls --fast returns in 7 to 8 seconds; the next time it’s the opposite…

This is what I got from journalctl:

-- Logs begin at Mon 2018-05-14 20:01:55 UTC, end at Mon 2018-05-14 20:11:34 UTC. --
May 14 20:06:26 node00 lxd[4776]: 2018/05/14 20:06:26 http: multiple response.WriteHeader calls

How long does it take to launch a container normally?


(Stéphane Graber) #5

@freeekanayaka 4-5 minutes to launch and 1-2 minutes to delete definitely doesn’t seem normal, even for a large cluster.


(Park Kyung Won) #6

I have 32 nodes, and the master node had two physical NICs and 255 VLAN interfaces for DHCP and DNS.

Now I’ve removed 249 VLAN interfaces, leaving only 6, and also removed all the DHCP settings for those 249 VLAN network segments.

This time I also made the cluster only 5 nodes, including the master.

Now it takes about 1 minute to create a container and 20 seconds to start it.

It’s much better now, but still kind of slow, though this cluster is old.

  1. Could the problem be too many network segments?

  2. Or is it just that many hosts make it slow?


(Park Kyung Won) #7

Sometimes lxc ls shows all containers in the ERROR state.


What causes "Error: disk I/O error"?
(Park Kyung Won) #8

Now when I issue the command
lxc ls
I get:
Error: disk I/O error


(Free Ekanayaka) #9

I’m starting some work to improve this area; see this comment on the same issue reported by another user.

It will take a bit, but I’ll follow up on this post when we think we have nailed things down.


(Park Kyung Won) #10

Thanks for your attention.

I have the cluster I mentioned in this post.

This cluster reproduces that bug consistently, even when I reinstall LXD over and over.

If you want, I can give you shell access to the entire cluster for inspection…


(Free Ekanayaka) #11

I think there’s no need for now. Once we have done more profiling, testing and fixing I’ll get back to this post so you can try again. As said, it will likely take a while, so probably next week.


(Park Kyung Won) #12

Yeah, I think this should be treated as a future improvement rather than a bug.

I’ve installed LXD on another cluster, composed of 11 computers with latest-generation hardware, but that LXD cluster still takes ~2 minutes to launch a container with a cached image, ~20 seconds for lxc ls, and ~5 seconds for lxc ls --fast.

For this cluster, I could not find any suspicious logs in the journal.


(Park Kyung Won) #13

I’ve reinstalled the LXD cluster on the previous 32-node cluster.

This time I cleaned up the previous LXD installation with apt purge and installed ntp for time synchronization across the cluster.

Now lxc commands are much more responsive than before.

Could this be related?
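Whether unsynchronized clocks explain the slowness isn't established in this thread, but checking the nodes' clocks against each other is cheap. With ntp installed, ntpq -p already prints each peer's offset in milliseconds. The arithmetic behind that offset is the classic NTP four-timestamp estimate, sketched below. This is a minimal sketch: how the timestamps are gathered from each node is left out, and ntp_offset, skewed_nodes, and the 0.5-second tolerance are hypothetical choices, not LXD settings.

```python
def ntp_offset(t0, t1, t2, t3):
    """Estimate a peer's clock offset and round-trip delay (NTP-style).

    t0: local send time,  t1: peer receive time,
    t2: peer send time,   t3: local receive time (all in seconds).
    A positive offset means the peer's clock is ahead of the local one.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay

def skewed_nodes(offsets, tolerance=0.5):
    """Return, sorted, the node names whose |offset| exceeds tolerance seconds."""
    return sorted(name for name, off in offsets.items() if abs(off) > tolerance)
```

Feeding each node's estimated offset into skewed_nodes gives a short list of machines to look at before and after enabling ntp.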