Reverted from lxd 3.4 to 3.0 - won't start

Overnight my laptop upgraded to the latest lxd 3.4.

$ snap change 968 | grep lxd
Done    yesterday at 19:45 BST  yesterday at 19:46 BST  Ensure prerequisites for "lxd" are available
Done    yesterday at 19:45 BST  yesterday at 19:46 BST  Download snap "lxd" (8415) from channel "stable"
Done    yesterday at 19:45 BST  yesterday at 19:46 BST  Fetch and check assertions for snap "lxd" (8415)
Done    yesterday at 19:45 BST  yesterday at 19:46 BST  Mount snap "lxd" (8415)
Done    yesterday at 19:45 BST  yesterday at 19:46 BST  Run pre-refresh hook of "lxd" snap if present
Done    yesterday at 19:45 BST  yesterday at 19:47 BST  Stop snap "lxd" services
Done    yesterday at 19:45 BST  yesterday at 19:47 BST  Remove aliases for snap "lxd"
Done    yesterday at 19:45 BST  yesterday at 19:47 BST  Make current revision for snap "lxd" unavailable
Done    yesterday at 19:45 BST  yesterday at 19:47 BST  Copy snap "lxd" data
Done    yesterday at 19:45 BST  yesterday at 19:47 BST  Setup snap "lxd" (8415) security profiles
Done    yesterday at 19:45 BST  yesterday at 19:47 BST  Make snap "lxd" (8415) available to the system
Done    yesterday at 19:45 BST  yesterday at 19:47 BST  Automatically connect eligible plugs and slots of snap "lxd"
Done    yesterday at 19:45 BST  yesterday at 19:47 BST  Set automatic aliases for snap "lxd"
Done    yesterday at 19:45 BST  yesterday at 19:47 BST  Setup snap "lxd" aliases
Done    yesterday at 19:45 BST  yesterday at 19:47 BST  Run post-refresh hook of "lxd" snap if present
Done    yesterday at 19:45 BST  yesterday at 19:47 BST  Start snap "lxd" (8415) services
Done    yesterday at 19:45 BST  yesterday at 19:47 BST  Clean up "lxd" (8415) install
Done    yesterday at 19:45 BST  yesterday at 19:47 BST  Run configure hook of "lxd" snap if present

Unfortunately this broke snapcraft which uses lxd to build snaps.

I reverted back to 3.0 via snap refresh lxd --channel=3.0/stable. This broke lxd, which now, no longer starts.

$ systemctl status snap.lxd.daemon.service 
â—Ź snap.lxd.daemon.service - Service for snap application lxd.daemon
   Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; enabled; vendor preset: enabled)
   Active: inactive (dead) (Result: exit-code) since Tue 2018-08-21 10:24:47 BST; 42s ago
  Process: 31037 ExecStop=/usr/bin/snap run --command=stop lxd.daemon (code=exited, status=0/SUCCESS)
  Process: 30936 ExecStart=/usr/bin/snap run lxd.daemon (code=exited, status=137)
 Main PID: 30936 (code=exited, status=137)

Aug 21 10:24:47 KinkPad-K450 systemd[1]: snap.lxd.daemon.service: Unit entered failed state.
Aug 21 10:24:47 KinkPad-K450 systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Aug 21 10:24:47 KinkPad-K450 systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
Aug 21 10:24:47 KinkPad-K450 systemd[1]: Stopped Service for snap application lxd.daemon.
Aug 21 10:24:47 KinkPad-K450 systemd[1]: snap.lxd.daemon.service: Start request repeated too quickly.
Aug 21 10:24:47 KinkPad-K450 systemd[1]: Failed to start Service for snap application lxd.daemon.

Here’s the journal…

Aug 21 10:27:15 KinkPad-K450 lxd.daemon[31813]: => Preparing the system
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[31813]: ==> Loading snap configuration
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[31813]: ==> Setting up mntns symlink (mnt:[4026532423])
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[31813]: ==> Setting up kmod wrapper
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[31813]: ==> Preparing /boot
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[31813]: ==> Preparing a clean copy of /run
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[31813]: ==> Preparing a clean copy of /etc
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[31813]: ==> Setting up ceph configuration
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[31813]: ==> Setting up LVM configuration
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[31813]: ==> Rotating logs
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[31813]: ==> Setting up ZFS (0.7)
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[31813]: ==> Escaping the systemd cgroups
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[31813]: ==> Escaping the systemd process resource limits
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[4570]: mount namespace: 7
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[4570]: hierarchies:
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[4570]:   0: fd:   8: perf_event
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[4570]:   1: fd:   9: freezer
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[4570]:   2: fd:  10: memory
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[4570]:   3: fd:  11: pids
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[4570]:   4: fd:  12: hugetlb
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[4570]:   5: fd:  13: rdma
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[4570]:   6: fd:  14: net_cls,net_prio
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[4570]:   7: fd:  15: cpu,cpuacct
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[4570]:   8: fd:  16: blkio
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[4570]:   9: fd:  17: devices
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[4570]:  10: fd:  18: cpuset
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[4570]:  11: fd:  19: name=systemd
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[4570]: lxcfs.c: 105: do_reload: lxcfs: reloaded
Aug 21 10:27:15 KinkPad-K450 lxd.daemon[31813]: => Re-using existing LXCFS
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: => Starting LXD
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." t=2018-08-21T09:27:16+0000
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: panic: wal frames txn 13202 (1): out of memory
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: trace:
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: 2018-08-21 09:27:16.04016: fsm: restore=13196 start
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: 2018-08-21 09:27:16.04016: fsm: restore=13196 database size: 4096
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: 2018-08-21 09:27:16.04017: fsm: restore=13196 wal size: 304912
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: 2018-08-21 09:27:16.04033: fsm: restore=13196 filename: db.bin
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: 2018-08-21 09:27:16.04033: fsm: restore=13196 transaction ID:
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: 2018-08-21 09:27:16.04072: fsm: restore=13196 open follower: db.bin
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: 2018-08-21 09:27:16.04158: fsm: restore=13196 done
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: 2018-08-21 09:27:16.08639: fsm: term=1 index=13205 cmd=frames txn=13202 pages=0 commit=1 start
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: 2018-08-21 09:27:16.08655: fsm: term=1 index=13205 cmd=frames failed: wal frames txn 13202 (1): out of memory
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: goroutine 35 [running]:
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: github.com/CanonicalLtd/dqlite/internal/trace.(*Tracer).Panic(0xc42027a140, 0xfff699, 0x2, 0xc4203d3d20, 0x1, 0x1)
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]:         /build/lxd/parts/lxd/go/src/github.com/CanonicalLtd/dqlite/internal/trace/tracer.go:59 +0x12c
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: github.com/CanonicalLtd/dqlite/internal/replication.(*FSM).Apply(0xc4202a1c60, 0xc420524990, 0x0, 0x0)
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]:         /build/lxd/parts/lxd/go/src/github.com/CanonicalLtd/dqlite/internal/replication/fsm.go:84 +0x137
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc4201947b0)
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]:         /build/lxd/parts/lxd/go/src/github.com/hashicorp/raft/fsm.go:57 +0x15a
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: github.com/hashicorp/raft.(*Raft).runFSM(0xc420224580)
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]:         /build/lxd/parts/lxd/go/src/github.com/hashicorp/raft/fsm.go:120 +0x2fa
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: github.com/hashicorp/raft.(*Raft).(github.com/hashicorp/raft.runFSM)-fm()
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]:         /build/lxd/parts/lxd/go/src/github.com/hashicorp/raft/api.go:506 +0x2a
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc420224580, 0xc4201955e0)
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]:         /build/lxd/parts/lxd/go/src/github.com/hashicorp/raft/state.go:146 +0x53
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]: created by github.com/hashicorp/raft.(*raftState).goFunc
Aug 21 10:27:16 KinkPad-K450 lxd.daemon[31813]:         /build/lxd/parts/lxd/go/src/github.com/hashicorp/raft/state.go:144 +0x66
Aug 21 10:27:17 KinkPad-K450 lxd.daemon[31813]: => LXD failed to start
Aug 21 10:27:17 KinkPad-K450 systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=137/n/a
Aug 21 10:27:17 KinkPad-K450 lxd.daemon[31914]: => Stop reason is: crashed
Aug 21 10:27:17 KinkPad-K450 systemd[1]: snap.lxd.daemon.service: Unit entered failed state.
Aug 21 10:27:17 KinkPad-K450 systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Aug 21 10:27:17 KinkPad-K450 systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
Aug 21 10:27:17 KinkPad-K450 systemd[1]: Stopped Service for snap application lxd.daemon.
-- Subject: Unit snap.lxd.daemon.service has finished shutting down
-- Unit snap.lxd.daemon.service has finished shutting down.
Aug 21 10:27:17 KinkPad-K450 systemd[1]: snap.lxd.daemon.service: Start request repeated too quickly.
Aug 21 10:27:17 KinkPad-K450 systemd[1]: Failed to start Service for snap application lxd.daemon.
-- Subject: Unit snap.lxd.daemon.service has failed
-- Unit snap.lxd.daemon.service has failed.

Hi Alan,

I think it’s the first time we see this one. Would it be possible for you to make a tarball of your /var/snap/lxd/common/lxd/database directory and send it to me? free at canonical dot com.

I’ll look into your state and see what’s wrong.

It’s too big for me to mail it where I am, can I wormhole it to you? Giving you the code over irc or pm?

Okay, we chatted on IRC and the course of events was:

  1. at some point snapd upgraded lxd from 3.0 to 3.4, from what I gathered the upgrade was fine
  2. Alan noticed that snapcraft was not working anymore for him with 3.4 (for reasons being investigated elsewhere)
  3. he tried to downgrade to 3.0 using snap refresh
  4. lxd panic’ed as in the above traceback since downgrades are not supported and 3.4 has slight differences in the on-disk raft log format

Simply snap refresh’ing back to 3.4 from 3.0 fixed the problem this lxd instance is now working again (modulo snapcraft brekages).

1 Like