After LXD restart: some LXD containers restart automatically and some do not, none have boot.autostart


(Gaetanquentin) #1

lxd 3.13
snapd 2.39
Ubuntu 18.04.2 LTS
fs: btrfs

One day I noticed that some of my containers were stopped, so I looked into the LXD logs.

In the LXD logs, I can see this:
t=2019-06-04T09:46:25+0200 lvl=info msg="Asked to shutdown by API, shutting down containers"
t=2019-06-04T09:46:25+0200 lvl=info msg="Shutting down container" [ …]
Every container is shut down.

Question: does "Asked to shutdown by API" mean that LXD received an order to restart all containers?

And just after the shutdown, the restart:
t=2019-06-04T09:46:43+0200 lvl=info msg="LXD 3.13 is starting in normal mode" path=/var/snap/lxd/common/lxd
[…]
t=2019-06-04T09:46:43+0200 lvl=info msg="Starting container" action=start […]

Many containers are restarted, but not all of them.

Some are not, and I don't know why.

None of the restarted containers have boot.autostart set to yes.

So why are some restarted and not others?


#2

First of all, why did this restart happen? Normally a snap refresh restarts LXD but does not stop containers; containers are only stopped when the computer is shut down.
What does journalctl -xe -u snap.lxd.daemon say for that date? Search for 'Stop reason'.
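A minimal Python sketch of that search, assuming journalctl is on the PATH and using the unit name snap.lxd.daemon from the command above. The helper names are hypothetical, and -xe is left out because it is meant for interactive paging; the time window is passed explicitly instead:

```python
import subprocess


def lxd_journal(since, until):
    """Return the snap.lxd.daemon journal lines between two timestamps.

    Runs: journalctl -u snap.lxd.daemon --no-pager --since SINCE --until UNTIL
    """
    result = subprocess.run(
        ["journalctl", "-u", "snap.lxd.daemon", "--no-pager",
         "--since", since, "--until", until],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def stop_reasons(lines):
    """Keep only the lines that mention 'Stop reason' (case-insensitive)."""
    return [line for line in lines if "stop reason" in line.lower()]
```

For the event in this thread the call would be something like stop_reasons(lxd_journal("2019-06-04 09:40", "2019-06-04 09:50")); it only returns something if the journal for that day is still retained.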

Also, it's not clear from your message whether LXD attempts to restart all the containers that were running before and some fail to start, or whether LXD somehow only attempts to restart some of them.
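Which containers LXD attempts to start is decided per container when the daemon starts. The LXD documentation describes boot.autostart as always starting the container when LXD starts and, when the key is not set, restoring the last state, which LXD records in the volatile.last_state.power key. The Python sketch below models only that documented rule; it is not LXD's own code, the function name is hypothetical, and the set of accepted "true" spellings is an assumption:

```python
# Assumed set of values LXD treats as true for boolean config keys.
TRUE_VALUES = {"true", "1", "yes", "on"}


def should_autostart(config):
    """Model of the documented boot.autostart behaviour (a sketch, not LXD code).

    config maps LXD config keys to string values, e.g. the keys shown by
    `lxc config show <name>` collected into a dict.
    """
    autostart = config.get("boot.autostart", "")
    if autostart == "":
        # Key not set: restore the last state, i.e. start the container
        # only if it was recorded as running.
        return config.get("volatile.last_state.power") == "RUNNING"
    # Key set explicitly: start only for a true-ish value.
    return autostart.lower() in TRUE_VALUES
```

Under this rule, a container without boot.autostart comes back only if volatile.last_state.power was RUNNING, so comparing that key in lxc config show between a container that restarted and one that did not is a cheap first check.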


(Gaetanquentin) #3

I don't know why it restarted. It was on 4 June, but I no longer have those logs:

-rw-r-----+ 1 root systemd-journal 16777216 Jun 11 13:28 system.journal
-rw-r-----+ 1 root systemd-journal 109051904 Jun 5 18:34 system@442f2700731f46d3b6b684772c14598b-00000000009a84a9-00058a906d203458.journal
-rw-r-----+ 1 root systemd-journal 109051904 Jun 6 01:25 system@442f2700731f46d3b6b684772c14598b-00000000009d2c99-00058a962c821644.journal

I think LXD tried to shut down all of them and then restart only some. In the logs, the restart lines are missing for some containers.


#4

No logs after one week? Neither in journalctl nor in /var/snap/lxd/common/lxd/logs/?
What could have happened? Are you so low on disk space that you configured logging to keep almost no logs? If so, that could explain erratic behaviour in many apps, including LXD.


(Gaetanquentin) #5

I have 7-day log rotation for rsyslog and for the systemd journal.

For LXD, I have posted the log above.
As I said before, the log starts with msg="Asked to shutdown by API, shutting down containers",
followed by a "Shutting down container" line for each container:
t=2019-06-04T09:46:25+0200 lvl=info msg="Shutting down container" action=shutdown created=2019-06-03T16:03:38+0200 ephemeral=false name=XXX project=default timeout=30s used=2019-06-03T16:24:04+0200
t=2019-06-04T09:46:25+0200 lvl=info msg="Shutting down container" action=shutdown created=2019-05-17T12:29:22+0200 ephemeral=false name=XXX project=default timeout=30s used=2019-05-20T16:44:47+0200
t=2019-06-04T09:46:25+0200 lvl=info msg="Shutting down container" action=shutdown created=2019-05-17T12:29:31+0200 ephemeral=false name=XXX project=default timeout=30s used=2019-05-20T16:44:51+0200
t=2019-06-04T09:46:25+0200 lvl=info msg="Shutting down container" action=shutdown created=2019-05-10T17:07:15+0200 ephemeral=false name=XXX project=default timeout=30s used=2019-05-10T17:07:16+0200
t=2019-06-04T09:46:25+0200 lvl=info msg="Shutting down container" action=shutdown created=2019-03-05T17:12:43+0100 ephemeral=false name=XXX project=default timeout=30s used=2019-04-29T11:30:17+0200

And:
t=2019-06-04T09:46:40+0200 lvl=info msg="Starting shutdown sequence"
t=2019-06-04T09:46:40+0200 lvl=info msg="Stopping REST API handler:"
t=2019-06-04T09:46:40+0200 lvl=info msg=" - closing socket" socket=[::]:9443
t=2019-06-04T09:46:40+0200 lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/unix.socket
t=2019-06-04T09:46:40+0200 lvl=info msg="Stopping /dev/lxd handler:"
t=2019-06-04T09:46:40+0200 lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
t=2019-06-04T09:46:40+0200 lvl=info msg="Closing the database"
t=2019-06-04T09:46:40+0200 lvl=info msg="Unmounting temporary filesystems"
t=2019-06-04T09:46:40+0200 lvl=info msg="Done unmounting temporary filesystems"
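To list the containers that were shut down but never started again, the "Shutting down container" and "Starting container" lines can be compared by container name. A minimal Python sketch, assuming the key=value log format shown above and assuming the elided "Starting container" lines also carry a name= field; the function names are hypothetical:

```python
import re

# key=value pairs; a value is either quoted (straight quotes, or the
# typographic quotes the forum rendered) or runs up to the next whitespace.
FIELD = re.compile(r'(\w+)=("[^"]*"|“[^”]*”|\S+)')


def parse_line(line):
    """Turn one LXD log line into a dict of its key=value fields."""
    return {key: value.strip('"“”') for key, value in FIELD.findall(line)}


def not_restarted(lines):
    """Names with a 'Shutting down container' line but no 'Starting container' line."""
    stopped, started = set(), set()
    for line in lines:
        fields = parse_line(line)
        name = fields.get("name")
        if name is None:
            continue
        if fields.get("msg") == "Shutting down container":
            stopped.add(name)
        elif fields.get("msg") == "Starting container":
            started.add(name)
    return sorted(stopped - started)
```

The function accepts any iterable of lines, so an open file object for /var/snap/lxd/common/lxd/logs/lxd.log can be passed directly. It only tells you which containers LXD did not start, not why.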


#6

Maybe running a diff between the lxc.conf files in the containers' log subdirectories, for a container that restarted and one that did not, could shed some light?
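A minimal Python sketch of that comparison using the standard library's difflib, assuming each container's generated config sits at /var/snap/lxd/common/lxd/logs/<name>/lxc.conf (the snap log directory mentioned earlier in the thread). The function names and the container names in the usage line are hypothetical:

```python
import difflib
from pathlib import Path

# Assumed location of the per-container log directories for the LXD snap.
LOG_DIR = Path("/var/snap/lxd/common/lxd/logs")


def diff_text(old, new, old_name, new_name):
    """Unified diff of two config texts, returned as a list of lines."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=old_name, tofile=new_name, lineterm="",
    ))


def diff_lxc_conf(restarted, not_restarted, log_dir=LOG_DIR):
    """Compare the lxc.conf of a container that came back with one that did not."""
    old = (log_dir / restarted / "lxc.conf").read_text()
    new = (log_dir / not_restarted / "lxc.conf").read_text()
    return diff_text(old, new, restarted, not_restarted)
```

For example, print("\n".join(diff_lxc_conf("c1", "c2"))) with real container names. Lines that differ only because they embed the container name (paths, hostname) are expected and can be ignored; the interesting differences are in the remaining keys.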