4.0.0 Bug 'cannot mount' filesystem [Solved]

I discovered that 4.0.0 had been released because some of my LXD hosts wouldn’t start correctly. Looking in /var/snap/lxd/common/lxd/logs/lxd.log a common error was

t=2020-04-03T08:47:04+1000 lvl=eror msg="Failed to start the daemon: Failed to mount images storage: Failed to mount storage volume \"default/images\": 
    Failed to run: zfs mount default/custom/default_images: cannot mount 'default/custom/default_images': filesystem already mounted"

Any way to fix this w/o rebooting the host?

The solution was:

  1. Find where that zfs volume is mounted inside the namespace:
    nsenter --mount=/run/snapd/ns/lxd.mnt cat /proc/self/mountinfo | grep default/custom/default_image
    
  2. Grab the location it’s reporting as mounted - in this case /var/snap/lxd/common/shmounts… etc.
    2236 2406 0:58 / /var/snap/lxd/common/shmounts/storage-pools/default/custom/default_images rw shared:298 - zfs default/custom/default_images rw,xattr,posixacl
    
  3. Manually unmount it
    nsenter --mount=/run/snapd/ns/lxd.mnt umount /var/snap/lxd/common/shmounts/storage-pools/default/custom/default_images
    

Then try snap start lxd and it should start working. If not, check for more errors in /var/snap/lxd/common/lxd/logs/lxd.log - on one of my LXD hosts, I had to umount both default_images and default_backups before lxd would start.

Try unmounting it.

nsenter --mount=/run/snapd/ns/lxd.mnt umount /var/snap/lxd/common/lxd/storage-pools/default/custom/default_images

You’ll probably want to do the same for default_backups if that’s a thing on your system.

Interestingly there is logic in place specifically to detect pre-existing mounts so it’s all a bit weird that this error happens in the first place.

Before unmounting it, can you get:

  • nsenter --mount=/run/snapd/ns/lxd.mnt cat /proc/self/mountinfo
  • nsenter --mount=/run/snapd/ns/lxd.mnt stat -f /var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
  • nsenter --mount=/run/snapd/ns/lxd.mnt stat -f /var/snap/lxd/common/lxd/storage-pools/default/custom
  • journalctl -u snap.lxd.daemon -n 300

Well, I’m getting that information from here:

root@lxc-07:/var/snap/lxd/common/lxd/logs# tail lxd.log
t=2020-04-03T08:47:04+1000 lvl=info msg="Initializing storage pools"
t=2020-04-03T08:47:04+1000 lvl=info msg="Initializing daemon storage mounts"
t=2020-04-03T08:47:04+1000 lvl=eror msg="Failed to start the daemon: Failed to mount images storage: Failed to mount storage volume \"default/images\": Failed to run: zfs mount default/custom/default_images: cannot mount 'default/custom/default_images': filesystem already mounted"
t=2020-04-03T08:47:04+1000 lvl=info msg="Starting shutdown sequence"
t=2020-04-03T08:47:04+1000 lvl=info msg="Stopping REST API handler:"
t=2020-04-03T08:47:04+1000 lvl=info msg=" - closing socket" socket=[::]:8443
t=2020-04-03T08:47:04+1000 lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/unix.socket
t=2020-04-03T08:47:04+1000 lvl=info msg="Stopping /dev/lxd handler:"
t=2020-04-03T08:47:04+1000 lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
t=2020-04-03T08:47:04+1000 lvl=info msg="Closing the database"
root@lxc-07:/var/snap/lxd/common/lxd/logs# date
Fri  3 Apr 10:53:50 AEST 2020
root@lxc-07:/var/snap/lxd/common/lxd/logs#

journalctl doesn’t seem to provide any useful information

Apr 03 08:46:56 lxc-07 systemd[1]: Started Service for snap application lxd.daemon.
Apr 03 08:46:56 lxc-07 lxd.daemon[123600]: => Preparing the system (14194)
Apr 03 08:46:56 lxc-07 lxd.daemon[123600]: ==> Loading snap configuration
Apr 03 08:46:56 lxc-07 lxd.daemon[123600]: ==> Setting up mntns symlink (mnt:[4026533153])
Apr 03 08:46:56 lxc-07 lxd.daemon[123600]: ==> Setting up kmod wrapper
Apr 03 08:46:56 lxc-07 lxd.daemon[123600]: ==> Preparing /boot
Apr 03 08:46:56 lxc-07 lxd.daemon[123600]: ==> Preparing a clean copy of /run
Apr 03 08:46:56 lxc-07 lxd.daemon[123600]: ==> Preparing a clean copy of /etc
Apr 03 08:46:56 lxc-07 lxd.daemon[123600]: ==> Setting up ceph configuration
Apr 03 08:46:56 lxc-07 lxd.daemon[123600]: ==> Setting up LVM configuration
Apr 03 08:46:56 lxc-07 lxd.daemon[123600]: ==> Rotating logs
Apr 03 08:46:56 lxc-07 lxd.daemon[123600]: ==> Setting up ZFS (0.7)
Apr 03 08:46:56 lxc-07 lxd.daemon[123600]: ==> Escaping the systemd cgroups
Apr 03 08:46:56 lxc-07 lxd.daemon[123600]: ====> Detected cgroup V1
Apr 03 08:46:56 lxc-07 lxd.daemon[123600]: ==> Escaping the systemd process resource limits
Apr 03 08:46:56 lxc-07 lxd.daemon[123600]: ==> Disabling shiftfs on this kernel (auto)
Apr 03 08:46:56 lxc-07 lxd.daemon[123600]: => Re-using existing LXCFS
Apr 03 08:46:56 lxc-07 lxd.daemon[123600]: => Starting LXD
Apr 03 08:46:57 lxc-07 lxd.daemon[123600]: t=2020-04-03T08:46:57+1000 lvl=warn msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"
Apr 03 08:46:58 lxc-07 systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Apr 03 08:46:58 lxc-07 systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Apr 03 08:46:59 lxc-07 systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
Apr 03 08:46:59 lxc-07 systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 1.
Apr 03 08:46:59 lxc-07 systemd[1]: Stopped Service for snap application lxd.daemon.

There are 209 entries in mountinfo

root@lxc-07:/var/snap/lxd/common/lxd/logs# nsenter --mount=/run/snapd/ns/lxd.mnt cat 
/proc/self/mountinfo | wc -l
209
root@lxc-07:/var/snap/lxd/common/lxd/logs#

Most of them are

2356 2355 7:0 /wrappers/kmod /bin/kmod ro,nodev,relatime master:31 - squashfs /dev/loop0 ro
2357 2356 7:0 /wrappers/kmod /bin/kmod ro,nodev,relatime master:31 - squashfs /dev/loop0 ro
2358 2357 7:0 /wrappers/kmod /bin/kmod ro,nodev,relatime master:31 - squashfs /dev/loop0 ro
2359 2358 7:0 /wrappers/kmod /bin/kmod ro,nodev,relatime master:31 - squashfs /dev/loop0 ro
2360 2359 7:0 /wrappers/kmod /bin/kmod ro,nodev,relatime master:31 - squashfs /dev/loop0 ro
2361 2360 7:0 /wrappers/kmod /bin/kmod ro,nodev,relatime master:31 - squashfs /dev/loop0 ro


root@lxc-07:/var/snap/lxd/common/lxd/logs# nsenter --mount=/run/snapd/ns/lxd.mnt stat -f /var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
  File: "/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images"
    ID: 4e213bd0db28aa77 Namelen: 255     Type: ext2/ext3
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 51085834   Free: 40580605   Available: 37968177
Inodes: Total: 13041664   Free: 12929954
root@lxc-07:/var/snap/lxd/common/lxd/logs#

root@lxc-07:/var/snap/lxd/common/lxd/logs# nsenter --mount=/run/snapd/ns/lxd.mnt stat -f /var/snap/lxd/common/lxd/storage-pools/default/custom
  File: "/var/snap/lxd/common/lxd/storage-pools/default/custom"
    ID: 4e213bd0db28aa77 Namelen: 255     Type: ext2/ext3
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 51085834   Free: 40580605   Available: 37968177
Inodes: Total: 13041664   Free: 12929954
root@lxc-07:/var/snap/lxd/common/lxd/logs#

Ok, so it is in fact not mounted…

Can you show those mountinfo entries please?

root@lxc-07:/var/snap/lxd/common/lxd/logs# nsenter --mount=/run/snapd/ns/lxd.mnt umount /var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
umount: /var/snap/lxd/common/lxd/storage-pools/default/custom/default_images: not mounted
root@lxc-07:/var/snap/lxd/common/lxd/logs# nsenter --mount=/run/snapd/ns/lxd.mnt cat /proc/self/mountinfo | grep default_ima
2236 2406 0:58 / /var/snap/lxd/common/shmounts/storage-pools/default/custom/default_images rw shared:298 - zfs default/custom/default_images rw,xattr,posixacl
root@lxc-07:/var/snap/lxd/common/lxd/logs#

So that command doesn’t seem to work, would you like me to try with shmounts?

Edit: I took ‘no answer’ as ‘I’m busy’ so I tried it anyway. Unmounted happily.

root@lxc-07:/var/snap/lxd/common/lxd/logs# nsenter --mount=/run/snapd/ns/lxd.mnt cat /proc/self/mountinfo | grep default_ima
2236 2406 0:58 / /var/snap/lxd/common/shmounts/storage-pools/default/custom/default_images rw shared:298 - zfs default/custom/default_images rw,xattr,posixacl
root@lxc-07:/var/snap/lxd/common/lxd/logs# nsenter --mount=/run/snapd/ns/lxd.mnt umount /var/snap/lxd/common/shmounts/storage-pools/default/custom/default_images
root@lxc-07:/var/snap/lxd/common/lxd/logs#

And now it’s starting!

root@lxc-07:/var/snap/lxd/common/lxd/logs# snap start lxd
Started.
root@lxc-07:/var/snap/lxd/common/lxd/logs# lxc list
+--------------+---------+----------------------+-----------------------------------------------+-----------+-----------+
|     NAME     |  STATE  |         IPV4         |                     IPV6                      |   TYPE    | SNAPSHOTS |
+--------------+---------+----------------------+-----------------------------------------------+-----------+-----------+
| ehsg         | RUNNING |                      |                                               | CONTAINER | 0         |
+--------------+---------+----------------------+-----------------------------------------------+-----------+-----------+
(etc etc)...

So, I guess the important question is, if you said that it SHOULD detect it, is it not looking for the correct shmounts path?

Final edit, I noticed that Discourse hasn’t picked up that I’m replying to you @stgraber so I’m going to ping you directly. Sorry 8)

So the fact that it landed in shmounts suggests that there has been an error moving some mounts in the past.

Can you show the output of journalctl -u snap.lxd.daemon -n 3000

Hopefully I can locate the start where that failed and see if we got lucky enough to have some recorded output.

I’m having a look now. The oldest entry is here, and I’ll go through and try to find anything different after that (edits to come).

-- Logs begin at Sat 2018-10-27 14:19:07 AEST, end at Fri 2020-04-03 11:32:56 AEST. --
Feb 18 06:27:40 lxc-10 systemd[1]: Stopping Service for snap application lxd.daemon...
Feb 18 06:27:42 lxc-10 lxd.daemon[111999]: => Stop reason is: snap refresh
Feb 18 06:27:42 lxc-10 lxd.daemon[111999]: => Stopping LXD
Feb 18 06:27:43 lxc-10 systemd[1]: Stopped Service for snap application lxd.daemon.
Feb 18 06:27:49 lxc-10 systemd[1]: Started Service for snap application lxd.daemon.
Feb 18 06:27:49 lxc-10 lxd.daemon[112469]: => Preparing the system (13390)
Feb 18 06:27:49 lxc-10 lxd.daemon[112469]: ==> Loading snap configuration
Feb 18 06:27:49 lxc-10 lxd.daemon[112469]: ==> Setting up mntns symlink (mnt:[4026532868])
Feb 18 06:27:49 lxc-10 lxd.daemon[112469]: ==> Setting up mount propagation on /var/snap/lxd/common/lxd/storage-pools
Feb 18 06:27:49 lxc-10 lxd.daemon[112469]: ==> Setting up mount propagation on /var/snap/lxd/common/lxd/devices
Feb 18 06:27:49 lxc-10 lxd.daemon[112469]: ==> Setting up persistent shmounts path
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: Failed to mount new mntns: Invalid argument
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: ====> Failed to setup shmounts, continuing without
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: ====> Making LXD shmounts use the persistent path
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: ====> Making LXCFS use the persistent path
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: ==> Setting up kmod wrapper
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: ==> Preparing /boot
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: ==> Preparing a clean copy of /run
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: ==> Preparing a clean copy of /etc
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: ==> Setting up ceph configuration
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: ==> Setting up LVM configuration
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: ==> Rotating logs
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: ==> Setting up ZFS (0.7)
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: ==> Escaping the systemd cgroups
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: ====> Detected cgroup V1
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: ==> Escaping the systemd process resource limits
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: ==> Disabling shiftfs on this kernel (auto)
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: => Starting LXCFS
Feb 18 06:27:50 lxc-10 lxd.daemon[112469]: => Starting LXD

It looked like I rebooted the machine after that, and that error wasn’t there after the reboot

Feb 20 20:48:09 lxc-10 lxd.daemon[66081]: => Stopping LXCFS
Feb 20 20:48:11 lxc-10 systemd[1]: Stopped Service for snap application lxd.daemon.
-- Reboot --
Feb 20 20:48:51 lxc-10 systemd[1]: Started Service for snap application lxd.daemon.
Feb 20 20:48:51 lxc-10 lxd.daemon[1733]: => Preparing the system (13439)
Feb 20 20:48:51 lxc-10 lxd.daemon[1733]: ==> Loading snap configuration
Feb 20 20:48:51 lxc-10 lxd.daemon[1733]: ==> Setting up mntns symlink (mnt:[4026532685])
Feb 20 20:48:51 lxc-10 lxd.daemon[1733]: ==> Setting up mount propagation on /var/snap/lxd/common/lxd/storage-pools
Feb 20 20:48:51 lxc-10 lxd.daemon[1733]: ==> Setting up mount propagation on /var/snap/lxd/common/lxd/devices
Feb 20 20:48:52 lxc-10 lxd.daemon[1733]: ==> Setting up persistent shmounts path
Feb 20 20:48:52 lxc-10 lxd.daemon[1733]: ====> Making LXD shmounts use the persistent path
Feb 20 20:48:52 lxc-10 lxd.daemon[1733]: ====> Making LXCFS use the persistent path
Feb 20 20:48:52 lxc-10 lxd.daemon[1733]: ==> Setting up kmod wrapper
Feb 20 20:48:52 lxc-10 lxd.daemon[1733]: ==> Preparing /boot
Feb 20 20:48:52 lxc-10 lxd.daemon[1733]: ==> Preparing a clean copy of /run

I’ve sent you a PM with the complete journalctl output, and I was hoping to find something in /var/snap/lxd/common/lxd/logs but it looks like they get rotated every time lxd starts, instead of when they’re over a certain size - you’re missing a ‘size’ in etc/logrotate.conf.

Sadly, because of that, all I have is 7 files from this morning that all are tiny and all say pretty much the same thing

t=2020-04-03T06:51:30+1000 lvl=info msg="LXD 4.0.0 is starting in normal mode" path=/var/snap/lxd/common/lxd
t=2020-04-03T06:51:30+1000 lvl=info msg="Kernel uid/gid map:"
t=2020-04-03T06:51:30+1000 lvl=info msg=" - u 0 0 4294967295"
t=2020-04-03T06:51:30+1000 lvl=info msg=" - g 0 0 4294967295"
t=2020-04-03T06:51:30+1000 lvl=info msg="Configured LXD uid/gid map:"
t=2020-04-03T06:51:30+1000 lvl=info msg=" - u 0 1000000 1000000000"
t=2020-04-03T06:51:30+1000 lvl=info msg=" - g 0 1000000 1000000000"
t=2020-04-03T06:51:30+1000 lvl=info msg="Kernel features:"
t=2020-04-03T06:51:30+1000 lvl=info msg=" - netnsid-based network retrieval: no"
t=2020-04-03T06:51:30+1000 lvl=info msg=" - uevent injection: no"
t=2020-04-03T06:51:30+1000 lvl=info msg=" - seccomp listener: no"
t=2020-04-03T06:51:30+1000 lvl=info msg=" - seccomp listener continue syscalls: no"
t=2020-04-03T06:51:30+1000 lvl=info msg=" - unprivileged file capabilities: yes"
t=2020-04-03T06:51:30+1000 lvl=info msg=" - cgroup layout: hybrid"
t=2020-04-03T06:51:30+1000 lvl=warn msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"
t=2020-04-03T06:51:30+1000 lvl=info msg=" - shiftfs support: disabled"
t=2020-04-03T06:51:30+1000 lvl=info msg="Initializing local database"
t=2020-04-03T06:51:31+1000 lvl=info msg="Starting /dev/lxd handler:"
t=2020-04-03T06:51:31+1000 lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
t=2020-04-03T06:51:31+1000 lvl=info msg="REST API daemon:"
t=2020-04-03T06:51:31+1000 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/snap/lxd/common/lxd/unix.socket
t=2020-04-03T06:51:31+1000 lvl=info msg=" - binding TCP socket" socket=[::]:8443
t=2020-04-03T06:51:31+1000 lvl=info msg="Initializing global database"
t=2020-04-03T06:51:31+1000 lvl=info msg="Firewall loaded driver \"xtables\""
t=2020-04-03T06:51:31+1000 lvl=info msg="Initializing storage pools"
t=2020-04-03T06:51:31+1000 lvl=info msg="Initializing daemon storage mounts"
t=2020-04-03T06:51:31+1000 lvl=eror msg="Failed to start the daemon: Failed to mount backups storage: Failed to mount storage volume \"default/backups\": Failed to run: zfs mount default/custom/default_backups: cannot mount 'default/custom/default_backups': filesystem already mounted"
t=2020-04-03T06:51:31+1000 lvl=info msg="Starting shutdown sequence"
t=2020-04-03T06:51:31+1000 lvl=info msg="Stopping REST API handler:"
t=2020-04-03T06:51:31+1000 lvl=info msg=" - closing socket" socket=[::]:8443
t=2020-04-03T06:51:31+1000 lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/unix.socket
t=2020-04-03T06:51:31+1000 lvl=info msg="Stopping /dev/lxd handler:"
t=2020-04-03T06:51:31+1000 lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
t=2020-04-03T06:51:31+1000 lvl=info msg="Closing the database"