Snapd can't remove old revisions when running inside LXD

I’ve got a bit of a weird issue, and I’m hoping someone can help me poke at it.

I have snapd running in an unprivileged LXD container (note that LXD is not a snap). LXD is version 2.12.

Once a given snap has three revisions installed, on the next update snapd removes the oldest revision so that only three remain installed. However, when running inside a container, snapd is unable to unmount that oldest revision, which prevents the update from ever taking place.
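
As an aside, the set of revisions snapd is tracking (including disabled ones) can be listed directly; the second command is only relevant on newer snapd releases, where the number of retained revisions is configurable:

snap list --all nextcloud                 # shows every installed revision, including disabled ones
sudo snap set system refresh.retain=3     # newer snapd only: how many revisions to keep around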

Let’s say I have three revisions of the Nextcloud snap installed:

$ ls -l /snap/nextcloud/
total 0
drwxr-xr-x 24 root root 0 Apr 25 17:44 1212
drwxr-xr-x 24 root root 0 Apr 30 08:10 1284
drwxr-xr-x 24 root root 0 May 16 20:27 1337
lrwxrwxrwx  1 root root 4 May 16 21:07 current -> 1337

Note that the oldest is 1212. Once I update, I expect that revision to be gone. Instead, this is what happens:

$ sudo snap refresh --candidate nextcloud
2017-05-25T22:30:53Z ERROR cannot remove snap file "nextcloud", will retry in 3 mins: [stop snap-nextcloud-1212.mount] failed with exit status 1: Job for snap-nextcloud-1212.mount failed. See "systemctl status snap-nextcloud-1212.mount" and "journalctl -xe" for details.

2017-05-25T22:33:53Z ERROR cannot remove snap file "nextcloud", will retry in 3 mins: umount: /snap/nextcloud/1212: not mounted
2017-05-25T22:36:54Z ERROR cannot remove snap file "nextcloud", will retry in 3 mins: umount: /snap/nextcloud/1212: not mounted
2017-05-25T22:39:54Z ERROR cannot remove snap file "nextcloud", will retry in 3 mins: umount: /snap/nextcloud/1212: not mounted
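
The error message itself points at the next diagnostic step: checking the failed unit and its journal should show why systemd couldn't stop the mount.

systemctl status snap-nextcloud-1212.mount
journalctl -u snap-nextcloud-1212.mount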

That retry loop goes on forever. The only way I've found to make it work is to go into /var/lib/snapd/snaps, manually move the old snap out of the way, and reboot, after which snapd finally notices it can proceed.
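
For the record, the workaround looks roughly like this; the file name follows snapd's <name>_<revision>.snap convention, so adjust it to match the stuck revision:

sudo mv /var/lib/snapd/snaps/nextcloud_1212.snap /root/   # move the blob out of snapd's way
sudo reboot                                               # after the reboot, snapd can finally proceed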

I have no idea why this happens, but I suspect it has something to do with squashfuse. Check this out:

$ mount | grep 1212
squashfuse on /snap/nextcloud/1212 type fuse.squashfuse (ro,nodev,relatime,user_id=0,group_id=0,allow_other)
$ sudo umount /snap/nextcloud/1212
umount: /snap/nextcloud/1212: not mounted

Huh? Or this:

$ fusermount -u /snap/nextcloud/1212 
fusermount: entry for /snap/nextcloud/1212 not found in /etc/mtab
$ grep 1212 /etc/mtab 
squashfuse /snap/nextcloud/1212 fuse.squashfuse ro,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
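
Since /etc/mtab is just a view of the mount table, a more telling place to look is /proc/self/mountinfo, which also records each mount's parent ID and propagation flags and can therefore reveal stacked mounts (this is what eventually exposes the problem further down the thread):

grep nextcloud /proc/self/mountinfo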

I’m at the end of my rope. Any advice would be appreciated!

I’m unable to reproduce this here with an up-to-date Ubuntu 16.04 container:

root@nextcloud:~# ls -lh /snap/nextcloud/
total 512
drwxr-xr-x 24 root root 0 Aug 23 22:36 2707
drwxr-xr-x 24 root root 0 Aug 24 18:05 2718
drwxr-xr-x 24 root root 0 Aug 24 18:13 2719
lrwxrwxrwx  1 root root 4 Aug 24 21:31 current -> 2718
root@nextcloud:~# snap refresh nextcloud --channel=12/edge
nextcloud (12/edge) 12-2017-08-24 from 'nextcloud' refreshed
root@nextcloud:~# ls -lh /snap/nextcloud/
total 512
drwxr-xr-x 24 root root 0 Aug 23 22:36 2707
drwxr-xr-x 24 root root 0 Aug 24 18:05 2718
drwxr-xr-x 24 root root 0 Aug 24 19:13 2729
lrwxrwxrwx  1 root root 4 Aug 24 21:34 current -> 2729

Ah, rebooting the container seems to cause the problem here.

Looks like a mount ordering issue in snapd/systemd.

root@nextcloud:~# cat /proc/self/mountinfo | grep fuse
285 282 0:6 /fuse /dev/fuse rw,nosuid,relatime master:2 - devtmpfs udev rw,size=8060160k,nr_inodes=2015040,mode=755
289 284 0:45 / /sys/fs/fuse/connections rw,relatime master:30 - fusectl fusectl rw
300 283 0:51 /proc/cpuinfo /proc/cpuinfo rw,nosuid,nodev,relatime master:104 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
301 283 0:51 /proc/diskstats /proc/diskstats rw,nosuid,nodev,relatime master:104 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
302 283 0:51 /proc/meminfo /proc/meminfo rw,nosuid,nodev,relatime master:104 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
303 283 0:51 /proc/stat /proc/stat rw,nosuid,nodev,relatime master:104 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
304 283 0:51 /proc/swaps /proc/swaps rw,nosuid,nodev,relatime master:104 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
305 283 0:51 /proc/uptime /proc/uptime rw,nosuid,nodev,relatime master:104 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
307 280 0:51 / /var/lib/lxcfs rw,nosuid,nodev,relatime master:104 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
246 280 0:82 / /snap/core/2462 ro,nodev,relatime - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other
247 280 0:83 / /snap/nextcloud/2718 ro,nodev,relatime - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other
248 280 0:84 / /snap/nextcloud/2729 ro,nodev,relatime - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other
250 280 0:85 / /snap/nextcloud/2719 ro,nodev,relatime - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other
251 280 0:86 / /snap/nextcloud/2707 ro,nodev,relatime - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other
253 252 0:82 / /snap/core/2462 ro,nodev,relatime shared:96 - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other
254 252 0:83 / /snap/nextcloud/2718 ro,nodev,relatime shared:97 - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other
256 252 0:84 / /snap/nextcloud/2729 ro,nodev,relatime shared:114 - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other
258 252 0:85 / /snap/nextcloud/2719 ro,nodev,relatime shared:117 - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other
259 252 0:86 / /snap/nextcloud/2707 ro,nodev,relatime shared:136 - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other

As you can see, every mount entry under /snap shows up twice, the only difference being that the second copy is tagged “shared:XYZ”. This appears to be caused by a bind-mount of /snap over /snap, as demonstrated below:

root@nextcloud:~# umount /snap/nextcloud/2707
umount: /snap/nextcloud/2707: not mounted
root@nextcloud:~# umount -l /snap/
root@nextcloud:~# umount /snap/nextcloud/2707
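
A slightly friendlier way to see the same duplication, including each mount's propagation mode, is findmnt from util-linux in list mode:

findmnt -l -o TARGET,SOURCE,FSTYPE,PROPAGATION | grep ^/snap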

The root of the problem is that the code added to snap-confine to remount /snap MS_SHARED runs AFTER systemd has set up all the .mount units, effectively masking the underlying mounts and preventing them from being unmounted unless you get rid of the /snap overmount first.
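
For reference, the operation being described can be approximated with plain util-linux commands; this is only a sketch of the kind of mount calls involved, not snap-confine's actual code, and the recursive flags are my assumption based on the duplicated entries above:

mount --rbind /snap /snap       # bind /snap over itself (illustrative only; on a live system this reproduces the masking)
mount --make-rshared /snap      # mark the new mount (and its submounts) shared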

You can see the bad mount ordering here:

root@nextcloud:~# cat /proc/self/mountinfo | grep \/snap
248 294 0:85 / /snap/nextcloud/2719 ro,nodev,relatime - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other
250 294 0:84 / /snap/core/2462 ro,nodev,relatime - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other
261 294 0:91 / /snap/nextcloud/2729 ro,nodev,relatime - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other
262 294 0:93 / /snap/nextcloud/2718 ro,nodev,relatime - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other
263 294 0:43 /rootfs/snap /snap rw,relatime shared:117 master:27 - zfs castiana/lxd/containers/nextcloud rw,xattr,posixacl
264 263 0:85 / /snap/nextcloud/2719 ro,nodev,relatime shared:136 - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other
265 263 0:84 / /snap/core/2462 ro,nodev,relatime shared:153 - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other
266 263 0:91 / /snap/nextcloud/2729 ro,nodev,relatime shared:154 - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other
267 263 0:93 / /snap/nextcloud/2718 ro,nodev,relatime shared:164 - fuse.squashfuse squashfuse ro,user_id=0,group_id=0,allow_other
269 214 0:74 /snapd/ns /run/snapd/ns rw,nosuid,nodev - tmpfs tmpfs rw,mode=755,uid=1000000,gid=1000000
419 269 0:3 mnt:[4026532805] /run/snapd/ns/nextcloud.mnt rw - nsfs nsfs rw

Note that this isn’t container-specific. The same thing will happen on any system where / isn’t MS_SHARED itself and where snap-confine therefore overmounts /snap.
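
To check whether a given system is affected, it's enough to look at the propagation flag on /; if it doesn't report shared, snap-confine will take the overmount path described above:

findmnt -o TARGET,PROPAGATION /     # "shared" here means / is MS_SHARED and /snap won't be overmounted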

The remount of /snap MUST happen BEFORE any mount is set up under it.

Please file a bug against snap-confine and point here for the explanation.

PS: This only happens after a restart because that’s when the mis-ordering occurs. Anything mounted after the /snap overmount is perfectly unmountable, which is why I couldn’t initially reproduce the issue.
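
To make the ordering effect concrete without involving snapd at all, here is a small self-contained reproduction using a scratch tmpfs and hypothetical paths (run as root): a mount created before the overmount ends up masked and unreachable by path, while one created afterwards unmounts normally.

mkdir -p /tmp/demo/before /tmp/demo/after
mount -t tmpfs tmpfs /tmp/demo/before    # mounted BEFORE the overmount
mount --bind /tmp/demo /tmp/demo         # the overmount, standing in for /snap over /snap
mount -t tmpfs tmpfs /tmp/demo/after     # mounted AFTER the overmount
umount /tmp/demo/after                   # works fine
umount /tmp/demo/before                  # fails with "not mounted": masked by the overmount
umount -l /tmp/demo                      # drop the overmount first...
umount /tmp/demo/before                  # ...and now the original mount can be removed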

Thank you @stgraber, I appreciate your time. I’ve been banging my head against this one for far too long! I logged LP: #1712930.