@stgraber I’m out of ideas here. How could a “snap refresh lxd” run end up printing:
=> Preparing the system (14503)
if the latest version of the snap in the store is 14709?
@stgraber I’m out of ideas here. How could a “snap refresh lxd” run end up printing:
=> Preparing the system (14503)
if the latest version of the snap in the store is 14709?
snap info lxd
and snap changes
may be helpful
# snap info lxd
sname: lxd
summary: System container manager and API
publisher: Canonical✓
store-url: https://snapcraft.io/lxd
contact: https://github.com/lxc/lxd/issues
license: unset
description: |
....
commands:
- lxd.benchmark
- lxd.buginfo
- lxd.check-kernel
- lxd.lxc
- lxd
- lxd.migrate
services:
lxd.activate: oneshot, enabled, inactive
lxd.daemon: simple, enabled, inactive
snap-id: J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking: latest/stable
refresh-date: today at 13:32 UTC
channels:
latest/stable: 4.0.0 2020-04-17 (14709) 62MB -
latest/candidate: 4.0.0 2020-04-21 (14759) 62MB -
latest/beta: ↑
latest/edge: git-b65cc2d 2020-04-21 (14768) 62MB -
4.0/stable: 4.0.0 2020-04-17 (14709) 62MB -
4.0/candidate: 4.0.0 2020-04-21 (14759) 62MB -
4.0/beta: ↑
4.0/edge: ↑
...
installed: 4.0.0 (14503) 70MB -
# snap changes
ID Status Spawn Ready Summary
102 Done today at 07:24 UTC today at 07:24 UTC Running service command
103 Error today at 07:26 UTC today at 07:26 UTC Refresh snap "lxd"
104 Error today at 07:28 UTC today at 07:29 UTC Refresh snap "lxd"
105 Error today at 07:38 UTC today at 07:38 UTC Refresh "lxd" snap from "4.0/stable" channel
106 Done today at 07:59 UTC today at 07:59 UTC Running service command
107 Done today at 08:06 UTC today at 08:06 UTC Running service command
108 Done today at 08:06 UTC today at 08:06 UTC Running service command
109 Error today at 08:28 UTC today at 08:28 UTC Refresh "lxd" snap from "4.0/stable" channel
110 Error today at 08:41 UTC today at 08:42 UTC Refresh "lxd" snap from "4.0/stable" channel
111 Done today at 08:51 UTC today at 08:51 UTC Running service command
112 Error today at 09:18 UTC today at 09:19 UTC Refresh "lxd" snap from "4.0/stable" channel
113 Error today at 09:47 UTC today at 09:47 UTC Refresh "lxd" snap from "4.0/stable" channel
114 Error today at 09:52 UTC today at 09:52 UTC Refresh "lxd" snap from "4.0/stable" channel
115 Error today at 10:00 UTC today at 10:00 UTC Auto-refresh snap "lxd"
116 Error today at 10:28 UTC today at 10:28 UTC Refresh "lxd" snap from "4.0/stable" channel
117 Error today at 11:21 UTC today at 11:21 UTC Refresh "lxd" snap from "4.0/stable" channel
118 Error today at 11:26 UTC today at 11:27 UTC Refresh "lxd" snap
119 Error today at 12:21 UTC today at 12:21 UTC Refresh "lxd" snap
120 Error today at 12:42 UTC today at 12:42 UTC Refresh "lxd" snap
121 Error today at 12:43 UTC today at 12:43 UTC Refresh "lxd" snap from "4.0/stable" channel
122 Error today at 13:00 UTC today at 13:01 UTC Refresh "lxd" snap from "4.0/stable" channel
123 Error today at 13:01 UTC today at 13:01 UTC Refresh "lxd" snap from "4.0/stable" channel
124 Error today at 13:18 UTC today at 13:19 UTC Refresh "lxd" snap from "4.0/stable" channel
125 Error today at 13:20 UTC today at 13:20 UTC Refresh "lxd" snap
126 Error today at 13:22 UTC today at 13:22 UTC Refresh snap "lxd"
127 Error today at 13:27 UTC today at 13:27 UTC Install "lxd" snap from file "lxd_14709.snap"
128 Error today at 13:31 UTC today at 13:32 UTC Install "lxd" snap from file "./lxd_14709.snap"
# snap change 126
Status Spawn Ready Summary
Done today at 13:22 UTC today at 13:22 UTC Ensure prerequisites for "lxd" are available
Undone today at 13:22 UTC today at 13:22 UTC Download snap "lxd" (14709) from channel "latest/stable"
Done today at 13:22 UTC today at 13:22 UTC Fetch and check assertions for snap "lxd" (14709)
Undone today at 13:22 UTC today at 13:22 UTC Mount snap "lxd" (14709)
Undone today at 13:22 UTC today at 13:22 UTC Run pre-refresh hook of "lxd" snap if present
Undone today at 13:22 UTC today at 13:22 UTC Stop snap "lxd" services
Undone today at 13:22 UTC today at 13:22 UTC Remove aliases for snap "lxd"
Undone today at 13:22 UTC today at 13:22 UTC Make current revision for snap "lxd" unavailable
Undone today at 13:22 UTC today at 13:22 UTC Copy snap "lxd" data
Undone today at 13:22 UTC today at 13:22 UTC Setup snap "lxd" (14709) security profiles
Undone today at 13:22 UTC today at 13:22 UTC Make snap "lxd" (14709) available to the system
Undone today at 13:22 UTC today at 13:22 UTC Automatically connect eligible plugs and slots of snap "lxd"
Undone today at 13:22 UTC today at 13:22 UTC Set automatic aliases for snap "lxd"
Undone today at 13:22 UTC today at 13:22 UTC Setup snap "lxd" aliases
Undone today at 13:22 UTC today at 13:22 UTC Run post-refresh hook of "lxd" snap if present
Error today at 13:22 UTC today at 13:22 UTC Start snap "lxd" (14709) services
Hold today at 13:22 UTC today at 13:22 UTC Remove data for snap "lxd" (14353)
Hold today at 13:22 UTC today at 13:22 UTC Remove snap "lxd" (14353) from the system
Hold today at 13:22 UTC today at 13:22 UTC Clean up "lxd" (14709) install
Hold today at 13:22 UTC today at 13:22 UTC Run configure hook of "lxd" snap if present
Hold today at 13:22 UTC today at 13:22 UTC Run health check of "lxd" snap
Done today at 13:22 UTC today at 13:22 UTC Consider re-refresh of "lxd"
......................................................................
Start snap "lxd" (14709) services
2020-04-21T13:22:25Z ERROR [start snap.lxd.activate.service] failed with exit status 1: Job for snap.lxd.activate.service failed because the control process exited with error code.
See "systemctl status snap.lxd.activate.service" and "journalctl -xe" for details.
# snap change 128
Status Spawn Ready Summary
Done today at 13:31 UTC today at 13:32 UTC Ensure prerequisites for "lxd" are available
Undone today at 13:31 UTC today at 13:32 UTC Prepare snap "/var/lib/snapd/snaps/.local-install-341550792" (14709)
Undone today at 13:31 UTC today at 13:32 UTC Mount snap "lxd" (14709)
Undone today at 13:31 UTC today at 13:32 UTC Run pre-refresh hook of "lxd" snap if present
Undone today at 13:31 UTC today at 13:32 UTC Stop snap "lxd" services
Undone today at 13:31 UTC today at 13:32 UTC Remove aliases for snap "lxd"
Undone today at 13:31 UTC today at 13:32 UTC Make current revision for snap "lxd" unavailable
Undone today at 13:31 UTC today at 13:32 UTC Copy snap "lxd" data
Undone today at 13:31 UTC today at 13:32 UTC Setup snap "lxd" (14709) security profiles
Undone today at 13:31 UTC today at 13:32 UTC Make snap "lxd" (14709) available to the system
Undone today at 13:31 UTC today at 13:32 UTC Automatically connect eligible plugs and slots of snap "lxd"
Undone today at 13:31 UTC today at 13:32 UTC Set automatic aliases for snap "lxd"
Undone today at 13:31 UTC today at 13:32 UTC Setup snap "lxd" aliases
Undone today at 13:31 UTC today at 13:32 UTC Run post-refresh hook of "lxd" snap if present
Error today at 13:31 UTC today at 13:32 UTC Start snap "lxd" (14709) services
Hold today at 13:31 UTC today at 13:32 UTC Remove data for snap "lxd" (14353)
Hold today at 13:31 UTC today at 13:32 UTC Remove snap "lxd" (14353) from the system
Hold today at 13:31 UTC today at 13:32 UTC Clean up "lxd" (14709) install
Hold today at 13:31 UTC today at 13:32 UTC Run configure hook of "lxd" snap if present
Hold today at 13:31 UTC today at 13:32 UTC Run health check of "lxd" snap
......................................................................
Start snap "lxd" (14709) services
2020-04-21T13:32:01Z ERROR [start snap.lxd.activate.service] failed with exit status 1: Job for snap.lxd.activate.service failed because the control process exited with error code.
See "systemctl status snap.lxd.activate.service" and "journalctl -xe" for details.
Just to have the complete picture, can you also paste the journalctl logs that match the start and end timestamps of snap change 128
(or snap change 126
, should be the same). The timestamp of the journalctl log that you pasted earlier seem the ones from snap change 123
(i.e. around 13:01 UTC).
Apr 21 13:32:14 cloud10 systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Apr 21 13:32:14 cloud10 systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Apr 21 13:32:14 cloud10 systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
Apr 21 13:32:14 cloud10 systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 6.
Apr 21 13:32:14 cloud10 systemd[1]: Stopped Service for snap application lxd.daemon.
Apr 21 13:32:14 cloud10 systemd[1]: Started Service for snap application lxd.daemon.
Apr 21 13:32:14 cloud10 lxd.daemon[4113]: => Preparing the system (14503)
Apr 21 13:32:14 cloud10 lxd.daemon[4113]: ==> Loading snap configuration
Apr 21 13:32:14 cloud10 lxd.daemon[4113]: ==> Setting up mntns symlink (mnt:[4026533096])
Apr 21 13:32:14 cloud10 lxd.daemon[4113]: ==> Setting up kmod wrapper
Apr 21 13:32:14 cloud10 lxd.daemon[4113]: ==> Preparing /boot
Apr 21 13:32:14 cloud10 lxd.daemon[4113]: ==> Preparing a clean copy of /run
Apr 21 13:32:14 cloud10 lxd.daemon[4113]: ==> Preparing a clean copy of /etc
Apr 21 13:32:14 cloud10 lxd.daemon[4113]: ==> Setting up ceph configuration
Apr 21 13:32:14 cloud10 lxd.daemon[4113]: ==> Setting up LVM configuration
Apr 21 13:32:14 cloud10 lxd.daemon[4113]: ==> Rotating logs
Apr 21 13:32:14 cloud10 lxd.daemon[4113]: ==> Setting up ZFS (0.7)
Apr 21 13:32:14 cloud10 lxd.daemon[4113]: ==> Escaping the systemd cgroups
Apr 21 13:32:14 cloud10 lxd.daemon[4113]: ====> Detected cgroup V1
Apr 21 13:32:14 cloud10 lxd.daemon[4113]: ==> Escaping the systemd process resource limits
Apr 21 13:32:14 cloud10 lxd.daemon[4113]: ==> Disabling shiftfs on this kernel (auto)
Apr 21 13:32:14 cloud10 lxd.daemon[4113]: => Re-using existing LXCFS
Apr 21 13:32:14 cloud10 lxd.daemon[4113]: ==> Cleaning up existing LXCFS namespace
Apr 21 13:32:15 cloud10 lxd.daemon[4113]: => Starting LXD
Apr 21 13:32:15 cloud10 lxd.daemon[4113]: t=2020-04-21T13:32:15+0000 lvl=warn msg=" - Couldn't find the CGroup memory swap accounting, swa
Apr 21 13:32:16 cloud10 systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Apr 21 13:32:16 cloud10 systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Apr 21 13:32:16 cloud10 systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
Apr 21 13:32:16 cloud10 systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 7.
And what about /var/snap/lxd/common/lxd/logs/lxd.log
in the same time window?
@stgraber I’m a bit confused that during the install of 14709, snapd outputs:
Preparing the system (14503)
is that normal? The only explanation is that the version of the currently installed snap is printed (14503), but that seems odd.
t=2020-04-21T13:32:15+0000 lvl=info msg="LXD 4.0.0 is starting in normal mode" path=/var/snap/lxd/common/lxd
t=2020-04-21T13:32:15+0000 lvl=info msg="Kernel uid/gid map:"
t=2020-04-21T13:32:15+0000 lvl=info msg=" - u 0 0 4294967295"
t=2020-04-21T13:32:15+0000 lvl=info msg=" - g 0 0 4294967295"
t=2020-04-21T13:32:15+0000 lvl=info msg="Configured LXD uid/gid map:"
t=2020-04-21T13:32:15+0000 lvl=info msg=" - u 0 1000000 1000000000"
t=2020-04-21T13:32:15+0000 lvl=info msg=" - g 0 1000000 1000000000"
t=2020-04-21T13:32:15+0000 lvl=info msg="Kernel features:"
t=2020-04-21T13:32:15+0000 lvl=info msg=" - netnsid-based network retrieval: no"
t=2020-04-21T13:32:15+0000 lvl=info msg=" - uevent injection: no"
t=2020-04-21T13:32:15+0000 lvl=info msg=" - seccomp listener: no"
t=2020-04-21T13:32:15+0000 lvl=info msg=" - seccomp listener continue syscalls: no"
t=2020-04-21T13:32:15+0000 lvl=info msg=" - unprivileged file capabilities: yes"
t=2020-04-21T13:32:15+0000 lvl=info msg=" - cgroup layout: hybrid"
t=2020-04-21T13:32:15+0000 lvl=warn msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"
t=2020-04-21T13:32:15+0000 lvl=info msg=" - shiftfs support: disabled"
t=2020-04-21T13:32:15+0000 lvl=info msg="Initializing local database"
t=2020-04-21T13:32:15+0000 lvl=info msg="Starting /dev/lxd handler:"
t=2020-04-21T13:32:15+0000 lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
t=2020-04-21T13:32:15+0000 lvl=info msg="REST API daemon:"
t=2020-04-21T13:32:15+0000 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/snap/lxd/common/lxd/unix.socket
t=2020-04-21T13:32:15+0000 lvl=info msg=" - binding TCP socket" socket=172.16.0.10:8443
t=2020-04-21T13:32:15+0000 lvl=info msg="Initializing global database"
t=2020-04-21T13:32:15+0000 lvl=warn msg="Dqlite: server unavailable err=failed to establish network connection: Head \"https://172.16.11.1:8443/internal/database\": dial tcp 172.16.11.1:8443: connect: connection refused address=172.16.11.1:8443 attempt=0"
t=2020-04-21T13:32:15+0000 lvl=eror msg="Failed to start the daemon: failed to open cluster database: failed to ensure schema: this node's version is behind, please upgrade"
t=2020-04-21T13:32:15+0000 lvl=info msg="Starting shutdown sequence"
t=2020-04-21T13:32:15+0000 lvl=info msg="Stopping REST API handler:"
t=2020-04-21T13:32:15+0000 lvl=info msg=" - closing socket" socket=172.16.0.10:8443
t=2020-04-21T13:32:15+0000 lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/unix.socket
t=2020-04-21T13:32:15+0000 lvl=info msg="Stopping /dev/lxd handler:"
t=2020-04-21T13:32:15+0000 lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
I wonder if you can do something like:
So it stops being an idiot and reverting you to a broken revision…
@freeekanayaka snapd itself shows that it’s running the old rev I believe, so we need to have snapd stop trying to revert stuff and actually start the latest rev
@orose there’s a dirty workaround you can do though:
This will have LXD start the 14709 version no matter what snapd thinks.
Once that’s online, snap refresh
better behave, at which point you can rm /var/snap/lxd/common/lxd.debug
Think I’ll have to go with dirty at the moment - need to get things up and running as soon as possible now.
The binary isn’t sticking around long enough to copy, so is there another place I can get it from?
You already downloaded the .snap file I think, right? Just unsquashfs-it to extract the whole content.
I think I’m getting somewhere now - running in debug mode to see what is what, and we’re currently doing (presumably) DB update tasks.
It’s certainly further down the process at least.
I think this is an improvement, but I now have a different problem - I think I’ve got a corrupted certificate in there somewhere now. I do know who I need to be angry at if that’s the case though. It is now starting, at least.
2020/04/21 17:25:27 http: TLS handshake error from 172.16.11.1:53914: remote error: tls: bad certificate
I don’t have lxc info to pull the certificate from any node at present, so is there any way of installing it from another node?
Dirty solution seems to have triggered enough of a DB update to get things up and running, so thanks everyone.