LXD in a broken state, can't even remove it

The LXD daemon won't start. It seems to be in a bad state: somehow there were containers still running from Juju even after I tried to remove Juju, and now something seems messed up.

I was actually just trying to stop and purge everything LXD and start again, but ‘snap remove lxd’ hangs.

Any thoughts or help would be greatly appreciated. Below is the output from ‘journalctl -u snap.lxd.daemon’:

Jan 24 22:21:44 odin lxd.daemon[6894]: 2019-01-24 22:21:44.54987: fsm: term=1 index=2193     cmd=checkpoint file=db.bin start
Jan 24 22:21:44 odin lxd.daemon[6894]: 2019-01-24 22:21:44.55018: fsm: term=1 index=2193 cmd=checkpoint failed: checkpoint: disk I/O error
Jan 24 22:21:44 odin lxd.daemon[6894]: goroutine 75 [running]:
Jan 24 22:21:44 odin lxd.daemon[6894]: github.com/CanonicalLtd/go-dqlite/internal/trace.(*Tracer).Panic(0xc000148000, 0x1197771, 0x2, 0xc0003f7d68, 0x1, 0
Jan 24 22:21:44 odin lxd.daemon[6894]:         /build/lxd/parts/lxd/go/src/github.com/CanonicalLtd/go-dqlite/internal/trace/tracer.go:59 +0x12a
Jan 24 22:21:44 odin lxd.daemon[6894]: github.com/CanonicalLtd/go-dqlite/internal/replication.(*FSM).Apply(0xc0002f6920, 0xc0002e03c0, 0x0, 0x0)
Jan 24 22:21:44 odin lxd.daemon[6894]:         /build/lxd/parts/lxd/go/src/github.com/CanonicalLtd/go-dqlite/internal/replication/fsm.go:84 +0x136
Jan 24 22:21:44 odin lxd.daemon[6894]: github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc00053c330)
Jan 24 22:21:44 odin lxd.daemon[6894]:         /build/lxd/parts/lxd/go/src/github.com/hashicorp/raft/fsm.go:57 +0x155
Jan 24 22:21:44 odin lxd.daemon[6894]: github.com/hashicorp/raft.(*Raft).runFSM(0xc00014a000)
Jan 24 22:21:44 odin lxd.daemon[6894]:         /build/lxd/parts/lxd/go/src/github.com/hashicorp/raft/fsm.go:120 +0x2ef
Jan 24 22:21:44 odin lxd.daemon[6894]: github.com/hashicorp/raft.(*Raft).runFSM-fm()
Jan 24 22:21:44 odin lxd.daemon[6894]:         /build/lxd/parts/lxd/go/src/github.com/hashicorp/raft/api.go:506 +0x2a
Jan 24 22:21:44 odin lxd.daemon[6894]: github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc00014a000, 0xc0001ba950)
Jan 24 22:21:44 odin lxd.daemon[6894]:         /build/lxd/parts/lxd/go/src/github.com/hashicorp/raft/state.go:146 +0x53
Jan 24 22:21:44 odin lxd.daemon[6894]: created by github.com/hashicorp/raft.(*raftState).goFunc
Jan 24 22:21:44 odin lxd.daemon[6894]:         /build/lxd/parts/lxd/go/src/github.com/hashicorp/raft/state.go:144 +0x66
Jan 24 22:21:45 odin systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Jan 24 22:21:45 odin systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Jan 24 22:21:45 odin systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
Jan 24 22:21:45 odin systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 10.
Jan 24 22:21:45 odin systemd[1]: Stopped Service for snap application lxd.daemon.
Jan 24 22:21:45 odin systemd[1]: snap.lxd.daemon.service: Start request repeated too quickly.
Jan 24 22:21:45 odin systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Jan 24 22:21:45 odin systemd[1]: Failed to start Service for snap application lxd.daemon.
Jan 25 17:43:32 odin systemd[1]: Started Service for snap application lxd.daemon.
Jan 25 17:43:32 odin lxd.daemon[17433]: => Preparing the system
Jan 25 17:43:32 odin lxd.daemon[17433]: ==> Loading snap configuration
Jan 25 17:43:32 odin lxd.daemon[17433]: ==> Setting up mntns symlink (mnt:[4026532375])
Jan 25 17:43:32 odin lxd.daemon[17433]: ==> Setting up kmod wrapper
Jan 25 17:43:32 odin lxd.daemon[17433]: ==> Preparing /boot
Jan 25 17:43:32 odin lxd.daemon[17433]: ==> Preparing a clean copy of /run
Jan 25 17:43:32 odin lxd.daemon[17433]: ==> Preparing a clean copy of /etc
Jan 25 17:43:32 odin lxd.daemon[17433]: ==> Setting up ceph configuration
Jan 25 17:43:32 odin lxd.daemon[17433]: ==> Setting up LVM configuration
Jan 25 17:43:32 odin lxd.daemon[17433]: ==> Rotating logs
Jan 25 17:43:32 odin lxd.daemon[17433]: ==> Setting up ZFS (0.7)
Jan 25 17:43:32 odin lxd.daemon[17433]: ==> Escaping the systemd cgroups
Jan 25 17:43:32 odin lxd.daemon[17433]: ==> Escaping the systemd process resource limits
Jan 25 17:43:32 odin lxd.daemon[17433]: => Re-using existing LXCFS
Jan 25 17:43:32 odin lxd.daemon[17433]: => Starting LXD
Jan 25 17:43:32 odin lxd.daemon[17433]: t=2019-01-25T17:43:32-0500 lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored."
Jan 25 18:28:47 odin systemd[1]: Stopping Service for snap application lxd.daemon...
Jan 25 18:28:48 odin lxd.daemon[20847]: => Stop reason is: snap removal
Jan 25 18:28:48 odin lxd.daemon[20847]: ==> Removing bash completion hook
Jan 25 18:28:48 odin lxd.daemon[20847]: => Stopping LXD (with container shutdown)

Additional info:

root@odin:/var/snap# snap remove lxd
error: cannot perform the following tasks:
- Remove data for snap "lxd" (9874) (remove /var/snap/lxd/common/ns/mntns: device or resource busy)

root@odin:/var/snap# lsof /var/snap/lxd/common/ns/mntns
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF       NODE NAME
lxcfs   3960 root    5r   REG    0,3        0 4026532374 mnt
dockerd 4035 root   11r   REG    0,3        0 4026531993 net

If you want to unblock the removal, lazily unmounting that path (and any others reported afterwards) should do the trick:

umount -l /var/snap/lxd/common/ns/mntns
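If several paths under that directory keep blocking the removal, the "unmount whatever is reported next" loop can be scripted. The following is a minimal sketch, assuming a POSIX shell run as root and that the leftover mounts show up in /proc/self/mountinfo. It lists every mount at or below /var/snap/lxd/common/ns, deepest first, and runs umount -l on each. The function names and the DRY_RUN, MOUNTINFO and LXD_NS_PREFIX variables are hypothetical, not part of LXD or snapd.

```shell
#!/bin/sh
# Hypothetical helper, not part of LXD or snapd: lazily unmount every
# mount point at or below LXD_NS_PREFIX so `snap remove lxd` can delete it.
# Assumptions: run as root; the mount table is read from MOUNTINFO
# (default /proc/self/mountinfo); DRY_RUN defaults to 1 (print only).

LXD_NS_PREFIX="${LXD_NS_PREFIX:-/var/snap/lxd/common/ns}"
MOUNTINFO="${MOUNTINFO:-/proc/self/mountinfo}"

# Print matching mount points, deepest first. Field 5 of a mountinfo
# line is the mount point (spaces in it appear as \040 and are not
# decoded here). In the C locale a parent path sorts before its
# children, so a reverse sort lists children before parents.
lxd_ns_mounts() {
    awk -v p="$LXD_NS_PREFIX" \
        '$5 == p || index($5, p "/") == 1 { print $5 }' "$MOUNTINFO" |
        LC_ALL=C sort -r
}

# Detach each mount with `umount -l`, or only print the command
# when DRY_RUN is 1 (the default).
lxd_ns_unmount_all() {
    lxd_ns_mounts | while IFS= read -r mnt; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            printf 'umount -l %s\n' "$mnt"
        else
            umount -l "$mnt" || printf 'failed: %s\n' "$mnt" >&2
        fi
    done
}

lxd_ns_unmount_all
```

Running it as is only prints what it would unmount; running it again with DRY_RUN=0 performs the lazy unmounts, after which ‘snap remove lxd’ can be retried. Unmounting children before their parent keeps each umount -l targeting a path that is still mounted.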

@stgraber – many thanks, it worked as expected after this.

root@odin:/var/snap# umount -l /var/snap/lxd/common/ns
root@odin:/var/snap# snap remove lxd
lxd removed