LXC Snapshot and File pull are throwing permissions error in LXD 4.9

Hi,

We have been having problems with LXD 4.9 for the last 6-8 hours: all lxc file push/pull and snapshot operations fail with an error. For example:

root@test~#  /var/snap/lxd/common/lxd/storage-pools/zfs/containers# lxc file pull elenortown/var/www/html/wp-config.php .
Error: Failed to run: zfs mount zfs/containers/elenortown: cannot mount 'zfs/containers/elenortown': filesystem already mounted

lxd --version

4.9

lxc info --show-log elenortown

Name: elenortown
Location: none
Remote: unix://
Architecture: x86_64
Created: 2020/11/13 09:31 UTC
Status: Running
Type: container
Profiles: default
Pid: 7427
Ips:
  eth0: inet    10.172.193.234  vetha6c3fd21
  eth0: inet6   fd42:d05f:acb7:8f2b:216:3eff:fef4:3d06  vetha6c3fd21
  eth0: inet6   fe80::216:3eff:fef4:3d06        vetha6c3fd21
  lo:   inet    127.0.0.1
  lo:   inet6   ::1
Resources:
  Processes: 47
  Disk usage:
    root: 450.54MB
  CPU usage:
    CPU usage (in seconds): 1109
  Memory usage:
    Memory (current): 151.02MB
    Memory (peak): 296.74MB
  Network usage:
    eth0:
      Bytes received: 67.98MB
      Bytes sent: 1.07MB
      Packets received: 1160229
      Packets sent: 11649
    lo:
      Bytes received: 69.75kB
      Bytes sent: 69.75kB
      Packets received: 658
      Packets sent: 658
Snapshots:
  elenortown_2020-12-17_18:45:29 (taken at 2020/12/17 18:45 UTC) (expires at 2020/12/24 18:45 UTC) (stateless)
  elenortown_2020-12-17_19:08:51 (taken at 2020/12/17 19:08 UTC) (expires at 2020/12/24 19:08 UTC) (stateless)

Log:

lxc elenortown 20201121135814.497 WARN     cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1152 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.monitor.elenortown"
lxc elenortown 20201121135814.509 WARN     cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1152 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.payload.elenortown"
lxc elenortown 20201121135814.509 ERROR    utils - utils.c:lxc_can_use_pidfd:1846 - Kernel does not support pidfds
lxc elenortown 20201121135814.511 WARN     cgfsng - cgroups/cgfsng.c:fchowmodat:1573 - No such file or directory - Failed to fchownat(17, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc elenortown 20201122001102.894 WARN     utils - utils.c:fix_stdio_permissions:1906 - Operation not permitted - Failed to chown standard I/O file descriptor 0 to uid -1 and gid 65534
lxc elenortown 20201122001102.894 WARN     utils - utils.c:fix_stdio_permissions:1912 - Operation not permitted - Failed to chmod standard I/O file descriptor 0
lxc elenortown 20201122001102.894 WARN     utils - utils.c:fix_stdio_permissions:1906 - Operation not permitted - Failed to chown standard I/O file descriptor 1 to uid -1 and gid 65534
lxc elenortown 20201122001102.894 WARN     utils - utils.c:fix_stdio_permissions:1912 - Operation not permitted - Failed to chmod standard I/O file descriptor 1
lxc elenortown 20201122001102.894 WARN     utils - utils.c:fix_stdio_permissions:1906 - Operation not permitted - Failed to chown standard I/O file descriptor 2 to uid -1 and gid 65534
lxc elenortown 20201122001102.894 WARN     utils - utils.c:fix_stdio_permissions:1912 - Operation not permitted - Failed to chmod standard I/O file descriptor 2
lxc elenortown 20201122001102.894 WARN     attach - attach.c:attach_child_main:882 - Failed to adjust stdio permissions
lxc elenortown 20201122072935.643 WARN     utils - utils.c:fix_stdio_permissions:1906 - Operation not permitted - Failed to chown standard I/O file descriptor 0 to uid -1 and gid 65534
lxc elenortown 20201122072935.643 WARN     utils - utils.c:fix_stdio_permissions:1912 - Operation not permitted - Failed to chmod standard I/O file descriptor 0
lxc elenortown 20201122072935.643 WARN     utils - utils.c:fix_stdio_permissions:1906 - Operation not permitted - Failed to chown standard I/O file descriptor 1 to uid -1 and gid 65534
lxc elenortown 20201122072935.643 WARN     utils - utils.c:fix_stdio_permissions:1912 - Operation not permitted - Failed to chmod standard I/O file descriptor 1
lxc elenortown 20201122072935.643 WARN     utils - utils.c:fix_stdio_permissions:1906 - Operation not permitted - Failed to chown standard I/O file descriptor 2 to uid -1 and gid 65534
lxc elenortown 20201122072935.643 WARN     utils - utils.c:fix_stdio_permissions:1912 - Operation not permitted - Failed to chmod standard I/O file descriptor 2
lxc elenortown 20201122072935.643 WARN     attach - attach.c:attach_child_main:882 - Failed to adjust stdio permissions
lxc elenortown 20201123001102.541 WARN     utils - utils.c:fix_stdio_permissions:1906 - Operation not permitted - Failed to chown standard I/O file descriptor 0 to uid -1 and gid 65534
lxc elenortown 20201123001102.541 WARN     utils - utils.c:fix_stdio_permissions:1912 - Operation not permitted - Failed to chmod standard I/O file descriptor 0
lxc elenortown 20201123001102.541 WARN     utils - utils.c:fix_stdio_permissions:1906 - Operation not permitted - Failed to chown standard I/O file descriptor 1 to uid -1 and gid 65534
lxc elenortown 20201123001102.541 WARN     utils - utils.c:fix_stdio_permissions:1912 - Operation not permitted - Failed to chmod standard I/O file descriptor 1
lxc elenortown 20201123001102.541 WARN     utils - utils.c:fix_stdio_permissions:1906 - Operation not permitted - Failed to chown standard I/O file descriptor 2 to uid -1 and gid 65534
lxc elenortown 20201123001102.541 WARN     utils - utils.c:fix_stdio_permissions:1912 - Operation not permitted - Failed to chmod standard I/O file descriptor 2
lxc elenortown 20201123001102.541 WARN     attach - attach.c:attach_child_main:882 - Failed to adjust stdio permissions

I am seeing the same issue in more than 100 containers. However, some containers log the same permission warnings, yet lxc file push/pull, snapshot, and other commands still work on them.

Can anyone help identify a fix, please?

If you restart a container does it fix it?

The ZFS storage driver has a call to check whether the volume is already mounted before attempting to run zfs mount here:

So if that’s failing to detect that the volume is already mounted, it suggests that the OS has somehow lost track of its mounts. It may be an issue with the snap package’s mount namespace.
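To illustrate how a mounted check and zfs mount can disagree, here is a minimal sketch. It is not LXD's actual code (LXD is written in Go), and it assumes the check is path-based, which this thread does not confirm. The function names is_mountpoint_in and mounted_at are made up for illustration. If the dataset is mounted at a different path than the one being checked, the path check reports "not mounted" while ZFS still refuses to mount the dataset a second time.

```shell
# Return success if PATH appears as a mount point in a mounts table.
# The table defaults to the caller's own view, /proc/self/mounts.
is_mountpoint_in() (
    path=$1
    table=${2:-/proc/self/mounts}
    # Field 2 of each line is the mount point.
    awk -v p="$path" '$2 == p { found = 1 } END { exit !found }' "$table"
)

# Print every mount point where SOURCE (e.g. a ZFS dataset) is mounted.
mounted_at() (
    src=$1
    table=${2:-/proc/self/mounts}
    # Field 1 of each line is the mount source.
    awk -v s="$src" '$1 == s { print $2 }' "$table"
)
```

Comparing the two results for one container shows whether the dataset is mounted somewhere other than the expected path, for example mounted_at zfs/containers/elenortown followed by is_mountpoint_in on the expected storage-pool path.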

Can you enter into the snap package’s mount namespace and show what you can see:

On the LXD host:

ps aux | grep lxd # find the PID of the main LXD process
sudo nsenter -t <LXD PID> -m mount | grep <container name>
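As an alternative sketch, the same information can be read without nsenter: /proc/PID/mounts lists the mounts visible in that process's mount namespace. The function name mounts_seen_by and its optional third argument (a substitute /proc root, useful only for trying it out on sample data) are made up for illustration.

```shell
# List mounts matching NAME as seen from the mount namespace of PID.
# Reading another user's /proc entry normally requires root.
mounts_seen_by() (
    pid=$1                  # PID whose mount namespace to inspect
    name=$2                 # fixed string to look for, e.g. a container name
    proc_root=${3:-/proc}   # hypothetical override for testing
    grep -F -- "$name" "$proc_root/$pid/mounts"
)

# Example, run as root on the LXD host:
#   mounts_seen_by "$(lxc info | awk '/server_pid:/ {print $2}')" c1
```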

On my fresh system with a single running container on a ZFS pool I see:

sudo nsenter -t 4225 -m mount | grep c1
zfs/containers/c1 on /var/snap/lxd/common/lxd/storage-pools/zfs/containers/c1 type zfs (rw,xattr,posixacl)

Thanks @tomp. Restarting the container doesn’t solve the problem. We grabbed all the outputs you suggested; please check:

root@~:~# lxc snapshot north-dion
Error: Create instance snapshot (mount source): Failed to run: zfs mount default/containers/north-dion: cannot mount 'default/containers/north-dion': filesystem already mounted
root@~:~# ps aux | grep lxd | grep north-dion
root     30710  0.0  0.0 1232964 11888 ?       Ss   Dec07   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers north-dion
root@prod-us-west1-b-001:~# nsenter -t 30710 -m mount | grep north-dion
default/containers/north-dion on /var/snap/lxd/common/shmounts/storage-pools/default/containers/north-dion type zfs (rw,xattr,posixacl)
root@~:~# lxc restart north-dion
Error: Failed preparing container for start: Failed to run: zfs mount default/containers/north-dion: cannot mount 'default/containers/north-dion': filesystem already mounted
Try `lxc info --show-log north-dion` for more info

Can you nsenter into the lxd process rather than the lxc monitor process, please? (I think they are in the same mount namespace, but it’s worth being sure.)

Use the PID returned by lxc info | grep server_pid

Also do you see the mount of the container on the LXD host normally (outside of the LXD package’s mount namespace)?

Also, take a look at a similar issue:

If you stop the container, then run the suggested steps in that post it should show what is still keeping the mount open and allow you to unmount it.
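The steps from that post are not reproduced here. As a separate, hedged sketch, one common way to see which processes still have a mount in their mount namespace is to search every /proc/PID/mountinfo for it. The function name pids_with_mount and its optional second argument (a substitute /proc root for trying it on sample data) are made up for illustration.

```shell
# Print the PID of every process whose mount namespace contains a mount
# matching PATTERN (a fixed string such as a dataset name or mount path).
pids_with_mount() (
    pattern=$1
    proc_root=${2:-/proc}   # hypothetical override for testing
    for f in "$proc_root"/[0-9]*/mountinfo; do
        # Skip processes that exited or that we may not read.
        [ -r "$f" ] || continue
        if grep -qF -- "$pattern" "$f" 2>/dev/null; then
            pid=${f%/mountinfo}
            echo "${pid##*/}"
        fi
    done
)

# Example, run as root on the LXD host:
#   pids_with_mount default/containers/north-dion
```

Each PID printed can then be inspected with ps to see which process it belongs to.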

root@prod-us-west1-b-001:~# lxc info | grep server_pid
  server_pid: 18044
root@prod-us-west1-b-001:~# nsenter -t 18044 -m mount | grep north-dion
default/containers/north-dion on /var/snap/lxd/common/shmounts/storage-pools/default/containers/north-dion type zfs (rw,xattr,posixacl)

I ran the commands, but as the output shows, there are no folders under this path in my installation. It is empty:
/var/snap/lxd/common/shmounts/

All storage pools are defined here instead:

/var/snap/lxd/common/lxd/storage-pools/default/containers/

Have you tried unmounting (after stopping the container) and then trying to start it?

nsenter -t PID -m umount /var/snap/lxd/common/shmounts/storage-pools/default/containers/north-dion

Yes, it shows this:

nsenter -t 18044 -m umount /var/snap/lxd/common/shmounts/storage-pools/default/containers/north-dion
umount: /var/snap/lxd/common/shmounts/storage-pools/default/containers/north-dion: no mount point specified.

I think this is because my storage pools are actually defined here:

/var/snap/lxd/common/lxd/storage-pools/default/containers/north-dion

I tried this too, and it says the path is not mounted:
nsenter -t 18044 -m umount /var/snap/lxd/common/lxd/storage-pools/default/containers/north-dion
umount: /var/snap/lxd/common/lxd/storage-pools/default/containers/north-dion: not mounted.

That’s normal; that is where your storage pools live. But your /var/snap/lxd/common/shmounts directory also seems to have become out of sync.
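With more than 100 containers involved, a quick way to list candidates may help. The sketch below assumes, based only on the outputs in this thread, that an affected container is one whose dataset is mounted under the shmounts tree rather than under lxd/storage-pools. That assumption is not confirmed, and the function name mounts_under_shmounts is made up for illustration.

```shell
# Print "<source> <mountpoint>" for every entry in a mounts table whose
# mount point lies under the snap's shmounts storage-pools tree.
mounts_under_shmounts() (
    table=${1:-/proc/self/mounts}
    prefix=/var/snap/lxd/common/shmounts/storage-pools/
    # index() == 1 means the mount point starts with the prefix.
    awk -v p="$prefix" 'index($2, p) == 1 { print $1, $2 }' "$table"
)

# Example, run as root, using the LXD daemon's view of its mounts:
#   mounts_under_shmounts "/proc/$(lxc info | awk '/server_pid:/ {print $2}')/mounts"
```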

I’m not familiar with how the /var/snap/lxd/common/shmounts directory works.

However, a similar issue was mentioned in the past here:

Also see

@tomp I tried the umount commands, but no luck.

What about re-creating the directory and removing the symlinks I suggested?

Yes, I tried creating the directory /var/snap/lxd/common/shmounts/instances and then removing the symbolic link /var/snap/lxd/common/lxd/shmounts, but it is not working. I didn’t restart the LXD snap or run any other commands, as I am afraid my other running containers will stop.

OK, well, I think we will have to wait until @stgraber comes online to help with this, as I’m afraid I don’t know what else to suggest. It seems the snap package’s mount namespace has become confused.