Lots of processes and LXD hangs (again)

This is the repetition of

Cluster setup

  • 31 nodes in total: node00 ~ node31, but without node08
  • node00 is the master node
  • node01 ~ node31 are slave nodes of LXD cluster
  • All nodes run Ubuntu 17.10, with LXD 3.0 installed from the snap
  • The LXD cluster was created with the LXD clustering feature
  • All nodes have the same configuration, with a local storage provider on ‘/dev/sda6’ and no remote storage

I hadn’t touched this cluster for a few days, and today I found that both the ‘lxc’ and ‘lxd’ commands are not responding

I also found that there are lots of lxd processes
This is the output of “ps -aux | grep lxd”, taken from node00; every other node shows the same thing

root      1948  1.3  0.5 2026472 67952 ?       Sl   Apr30  33:21 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      3816  0.0  0.1 328304 19244 ?        Sl   02:51   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      3911  0.0  0.1 255980 20388 ?        Sl   03:01   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      4007  0.0  0.1 255980 19056 ?        Sl   03:11   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      4104  0.0  0.1 255980 19712 ?        Sl   03:21   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      4200  0.0  0.1 256236 20384 ?        Sl   03:31   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      4294  0.0  0.1 253260 20212 ?        Sl   03:41   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      4389  0.0  0.1 187980 18932 ?        Sl   03:51   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      4485  0.0  0.1 255980 19636 ?        Sl   04:01   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      4580  0.0  0.1 255980 20108 ?        Sl   04:11   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      4676  0.0  0.1 255980 19712 ?        Sl   04:21   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      4770  0.0  0.1 187980 20084 ?        Sl   04:31   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      4865  0.0  0.1 253260 19420 ?        Sl   04:41   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      4960  0.0  0.1 190444 18792 ?        Sl   04:51   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      5055  0.0  0.1 253260 19940 ?        Sl   05:01   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      5150  0.0  0.1 189036 19652 ?        Sl   05:11   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      5248  0.0  0.1 254572 19132 ?        Sl   05:21   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      5344  0.0  0.1 190444 19668 ?        Sl   05:31   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      5439  0.0  0.1 190444 19252 ?        Sl   05:41   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      5535  0.0  0.1 254828 20228 ?        Sl   05:51   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      5629  0.0  0.1 253260 19020 ?        Sl   06:01   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      5723  0.0  0.1 254828 19976 ?        Sl   06:11   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      5865  0.0  0.1 255980 19440 ?        Sl   06:21   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      6037  0.0  0.1 253260 19388 ?        Sl   06:31   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      6132  0.0  0.1 255980 19116 ?        Sl   06:41   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      6225  0.0  0.1 190444 19640 ?        Sl   06:51   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      6321  0.0  0.1 253260 20180 ?        Sl   07:01   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      6416  0.0  0.1 187724 18668 ?        Sl   07:11   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      6514  0.0  0.1 254572 19304 ?        Sl   07:21   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      6609  0.0  0.1 255980 19136 ?        Sl   07:31   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      6704  0.0  0.1 253260 18692 ?        Sl   07:41   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      6799  0.0  0.1 255980 19264 ?        Sl   07:51   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      6893  0.0  0.1 190444 19488 ?        Sl   08:01   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      6986  0.0  0.1 190444 19564 ?        Sl   08:11   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      7082  0.0  0.1 187724 19508 ?        Sl   08:21   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      7177  0.0  0.1 253260 19068 ?        Sl   08:31   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      7272  0.0  0.1 255980 19240 ?        Sl   08:41   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      7366  0.0  0.1 326992 19168 ?        Sl   08:51   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      7462  0.0  0.1 256236 19888 ?        Sl   09:01   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      7556  0.0  0.1 187724 20180 ?        Sl   09:11   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      7653  0.0  0.1 255980 19200 ?        Sl   09:21   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      7747  0.0  0.1 255980 19288 ?        Sl   09:31   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      7841  0.0  0.1 255980 19028 ?        Sl   09:41   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      7935  0.0  0.1 253260 19448 ?        Sl   09:51   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      8035  0.0  0.1 190444 19968 ?        Sl   10:01   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      8129  0.0  0.1 253260 19244 ?        Sl   10:11   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      8226  0.0  0.1 254572 20448 ?        Sl   10:21   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      8320  0.0  0.1 253260 19244 ?        Sl   10:31   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      8413  0.0  0.1 253260 18900 ?        Sl   10:41   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      8507  0.0  0.1 255980 19364 ?        Sl   10:51   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      8601  0.0  0.1 255980 19224 ?        Sl   11:01   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      8696  0.0  0.1 256236 19836 ?        Sl   11:11   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      8794  0.0  0.1 255980 18984 ?        Sl   11:21   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      8888  0.0  0.1 255980 19512 ?        Sl   11:31   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      8991  0.0  0.1 255980 18968 ?        Sl   11:41   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      9086  0.0  0.1 187980 19616 ?        Sl   11:51   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      9180  0.0  0.1 256236 19664 ?        Sl   12:01   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      9275  0.0  0.1 255980 19808 ?        Sl   12:11   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      9374  0.0  0.1 255980 19788 ?        Sl   12:21   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      9469  0.0  0.1 254572 19480 ?        Sl   12:31   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      9565  0.0  0.1 189292 18576 ?        Sl   12:41   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      9661  0.0  0.1 190444 18996 ?        Sl   12:52   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      9757  0.0  0.1 256236 19584 ?        Sl   13:02   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      9853  0.0  0.1 190444 20016 ?        Sl   13:12   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      9951  0.0  0.1 255980 19528 ?        Sl   13:22   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     10046  0.0  0.1 189036 18912 ?        Sl   13:32   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     10142  0.0  0.1 253260 19052 ?        Sl   13:42   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     10236  0.0  0.1 190444 19352 ?        Sl   13:52   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     10340  0.0  0.1 255980 19504 ?        Sl   14:02   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     10435  0.0  0.1 254572 18924 ?        Sl   14:12   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     10533  0.0  0.1 190444 19032 ?        Sl   14:22   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     10629  0.0  0.1 254572 19420 ?        Sl   14:32   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     10724  0.0  0.1 253260 18952 ?        Sl   14:42   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     10873  0.0  0.1 256236 19448 ?        Sl   14:52   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     10969  0.0  0.1 189388 20268 ?        Sl   15:02   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     11064  0.0  0.1 253260 20164 ?        Sl   15:12   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     11162  0.0  0.1 254572 18760 ?        Sl   15:22   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     11257  0.0  0.1 187724 19572 ?        Sl   15:32   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     11365  0.0  0.1 255980 19500 ?        Sl   15:42   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     11460  0.0  0.1 253260 18936 ?        Sl   15:52   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     11554  0.0  0.1 254572 19064 ?        Sl   16:02   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     11649  0.0  0.1 255980 18952 ?        Sl   16:12   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     11748  0.0  0.1 253260 20044 ?        Sl   16:22   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     11843  0.0  0.1 255980 19496 ?        Sl   16:32   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     11939  0.0  0.1 190700 19400 ?        Sl   16:42   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     12033  0.0  0.1 256236 19024 ?        Sl   16:52   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     12133  0.0  0.1 254572 18768 ?        Sl   17:02   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root     12280  0.0  0.1 187724 19024 ?        Sl   17:12   0:00 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
user     12395  0.0  0.0  47040  5488 pts/0    S    17:15   0:00 ssh node02 sudo snap remove lxd && sudo apt remove -y snap
user     12635  0.0  0.0  15428  2116 pts/0    S+   17:25   0:00 grep --color=auto lxd

This happens cluster-wide
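
To check this on other nodes without reading the whole listing, a minimal sketch like the following counts the lxd daemon processes in ps output and prints their start times. In the listing above, a new process appears roughly every ten minutes from 02:51 onwards. The function names count_lxd_daemons and lxd_start_times are hypothetical helpers, not part of LXD or snapd, and the field positions assume the usual "ps aux" column layout (START in column 9, command in column 11).

```shell
# Count lxd daemon processes in `ps aux`-style output read from stdin.
# Matching on the command column skips the grep and ssh lines.
count_lxd_daemons() {
    awk '$11 == "lxd" && $12 == "--logfile" { n++ } END { print n + 0 }'
}

# Print the START column of each lxd daemon process, one per line.
lxd_start_times() {
    awk '$11 == "lxd" && $12 == "--logfile" { print $9 }'
}

# Usage on a live node:
#   ps aux | count_lxd_daemons
#   ps aux | lxd_start_times
```

A count greater than one on a node suggests the same duplicated-daemon state shown above.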

This is a tar archive of “/var/snap/lxd/common/lxd/logs” from cluster nodes node04 ~ node31
I accidentally removed LXD from node00 ~ node03, so those nodes are not included

Hope this helps

Hello,

Thanks for retrying and getting back. I didn’t spot any obvious problem in the logs, but what we have to understand here is how we ended up with several lxd processes running at the same time.

Even if an automatic snap refresh fails, it should not leave processes behind. @stgraber do you have an idea of how this can happen? It seems that 1) the checks in the LXD daemon that prevent two instances from running at the same time did not quite work, and 2) a snap refresh failure does not try hard enough to kill everything.

As for the cause of the snap refresh failure itself, @Park_Kyung_Won could you also send us the output of journalctl -u snap.lxd.daemon from all the nodes?

Can you show ps fauxww on this system? Having the entire process list as a tree would help.

The entire journalctl -u snap.lxd.daemon from one of the affected systems would help too.
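
A rough sketch of how both outputs could be gathered from every node in one pass. It assumes passwordless ssh and sudo on each node and uses the node names from the setup at the top of the thread (node00 ~ node31, without node08). The function names and the lxd-debug output directory are hypothetical.

```shell
# Print the cluster's node names: node00 .. node31, skipping node08.
cluster_nodes() {
    for i in $(seq 0 31); do
        if [ "$i" -eq 8 ]; then
            continue
        fi
        printf 'node%02d\n' "$i"
    done
}

# Save the process tree and the LXD daemon journal of each node locally.
# The journal is gzipped on the way in because it can get large.
collect_logs() {
    outdir=${1:-lxd-debug}   # hypothetical local output directory
    mkdir -p "$outdir"
    for node in $(cluster_nodes); do
        ssh "$node" ps fauxww > "$outdir/$node.ps.txt"
        ssh "$node" sudo journalctl -u snap.lxd.daemon --no-pager \
            | gzip > "$outdir/$node.journal.gz"
    done
}

# Usage: collect_logs            (writes into ./lxd-debug)
#        collect_logs /tmp/logs  (writes into /tmp/logs)
```

Nodes that are unreachable simply leave empty files behind, which at least makes the gaps easy to see.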

Here are the logs for all nodes

ps logs

https://www.dropbox.com/s/sgrik6tcnridgrj/ps.tar?dl=0

journalctl logs

https://www.dropbox.com/s/yvc6any5r3yyepv/journalctl.tar?dl=0

The journalctl logs were too big, so I had to compress them with gzip

I have attached the logs in the comment below

Thanks for your help