Stuck "Remapping container filesystem" for 30 hours

kpfa · November 11, 2021, 3:09pm

Two days ago, the lxd snap got stuck complaining that "schema version 'xx' is more recent than expected 'xx'".

`snap refresh` didn’t help, as the snap was already on the latest stable release.

I removed the lxd snap, backed up and then cleared the /var/snap/lxd/common/lxd/ folder, and ran `lxd recover`. LXD was able to import all containers, and all of them booted except one.

Running `lxc start mastodon` triggered a "Remapping container filesystem" operation that’s been running for the past day and a half.

```
lxc operation list
+--------------------------------------+------+-------------------+---------+------------+----------------------+
|                  ID                  | TYPE |    DESCRIPTION    | STATUS  | CANCELABLE |       CREATED        |
+--------------------------------------+------+-------------------+---------+------------+----------------------+
| 679ac120-901a-497e-9b9e-c7d9e383a516 | TASK | Starting instance | RUNNING | NO         | 2021/11/10 06:25 UTC |
+--------------------------------------+------+-------------------+---------+------------+----------------------+
```

There is no reference to the container name in /var/snap/lxd/common/lxd/logs/lxd.log, and the log files in /var/snap/lxd/common/lxd/logs/mastodon are 0 bytes long.

I also tried using `lxc copy` to move the container to another host and start it there. Same issue: stuck in "Remapping container filesystem". I gave it about 30 hours before posting.

Any advice?
Thanks!

stgraber · November 11, 2021, 3:49pm

Hmm, it would be interesting to run `strace -f -p PID`, where PID is the LXD daemon’s PID, to see whether it’s actually still doing the remapping.
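If it’s not obvious which PID to attach to, here is a minimal Python sketch (an illustration, not an LXD tool) that lists processes whose /proc/<pid>/comm is `lxd`. It assumes a Linux host with /proc mounted; a snap install may run more than one process with that name, so check the candidates before attaching. From a shell, `pgrep -x lxd` gives similar results.

```python
import os
from typing import List


def find_pids_by_comm(name: str, proc_root: str = "/proc") -> List[int]:
    """Return sorted PIDs whose /proc/<pid>/comm equals `name` (Linux only)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is unreadable; ignore it
    return sorted(pids)


if __name__ == "__main__":
    # Candidates to pass to `strace -f -p PID`; there may be more than one.
    print(find_pids_by_comm("lxd"))
```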

If you’re on slow-ish (non-SSD) storage and your container has a LOT of files, this can be a very slow process, though over a day would be a new record. The worst I’ve personally seen was a couple of hours when dealing with a few million files.
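To check whether a container is in "few million files" territory before starting it, a rough Python sketch like the one below counts filesystem entries under a rootfs. The default path is an assumption modeled on the strace paths later in this thread (pool `tank`, container `mastodon`), so pass your own. On a tree that large the count itself will take a while, but it only reads directory listings, whereas a remap also reads xattrs and rewrites ownership on every entry.

```python
import os
import sys


def count_entries(root: str) -> int:
    """Count files, directories and symlinks below `root` without following symlinks."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Symlinks to directories show up in dirnames but are not descended into.
        total += len(dirnames) + len(filenames)
    return total


if __name__ == "__main__":
    # Hypothetical default path; pass your own container rootfs as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else (
        "/var/snap/lxd/common/lxd/storage-pools/tank/containers/mastodon/rootfs"
    )
    print(f"{count_entries(root)} entries under {root}")
```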

kpfa · November 11, 2021, 4:06pm

The host has 8 SSDs in a mirrored ZFS pool, totaling 7 TB. The container holds 1.18 TB of files, and there are likely *a lot* of small files in it because it’s a Mastodon instance.

strace sure does show a lot... Here’s a snippet from just before I hit Ctrl-C:

```
[pid 24050] <... openat resumed> ) = 21
[pid 24050] epoll_ctl(5, EPOLL_CTL_ADD, 21, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=3156981104, u64=140379868083568}}) = -1 EPERM (Operation not permitted)
[pid 23616] <... nanosleep resumed> NULL) = 0
[pid 24050] getdents64(21 <unfinished ...>
[pid 23616] nanosleep({tv_sec=0, tv_nsec=20000}, NULL) = 0
[pid 23616] nanosleep({tv_sec=0, tv_nsec=40000}, NULL) = 0
[pid 24050] <... getdents64 resumed> , /* 3 entries */, 8192) = 80
[pid 23616] nanosleep({tv_sec=0, tv_nsec=80000}, <unfinished ...>
[pid 24050] getdents64(21 <unfinished ...>
[pid 23616] <... nanosleep resumed> NULL) = 0
[pid 23616] nanosleep({tv_sec=0, tv_nsec=160000}, <unfinished ...>
[pid 24050] <... getdents64 resumed> , /* 0 entries */, 8192) = 0
[pid 24050] close(21) = 0
[pid 24050] newfstatat(AT_FDCWD, "/var/snap/lxd/common/lxd/storage-pools/tank/containers/mastodon/rootfs/home/mastodon/live/public/system/cache/preview_cards/images/000/603/894", {st_mode=S_IFDIR|0755, st_size=3, ...}, AT_SYMLINK_NOFOLLOW) = 0
[pid 23616] <... nanosleep resumed> NULL) = 0
[pid 23616] getpid() = 23610
[pid 23616] tgkill(23610, 24050, SIGURG) = 0
[pid 24050] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=23610, si_uid=0} ---
[pid 23616] nanosleep({tv_sec=0, tv_nsec=320000}, <unfinished ...>
[pid 24050] rt_sigreturn({mask=}) = 1001
[pid 23653] llistxattr("/var/snap/lxd/common/lxd/storage-pools/tank/containers/mastodon/rootfs/home/mastodon/live/public/system/cache/media_attachments/files/105/663/810/122/982/241/original", NULL, 0) = 0
[pid 23653] openat(AT_FDCWD, "/var/snap/lxd/common/lxd/storage-pools/tank/containers/mastodon/rootfs/home/mastodon/live/public/system/cache/media_attachments/files/105/663/810/122/982/241/original", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH <unfinished ...>
[pid 23616] <... nanosleep resumed> NULL) = 0
[pid 23653] <... openat resumed> ) = 21
[pid 23616] nanosleep({tv_sec=0, tv_nsec=640000}, <unfinished ...>
[pid 23653] readlink("/proc/self/fd/21", "/var/snap/lxd/common/lxd/storage"..., 4096) = 166
[pid 23653] fstat(21, {st_mode=S_IFDIR|0755, st_size=3, ...}) = 0
[pid 23653] fchownat(21, "", 1001001, 1001001, AT_SYMLINK_NOFOLLOW|AT_EMPTY_PATH) = 0
[pid 23653] chmod("/proc/self/fd/21", 040755) = 0
[pid 23653] close(21) = 0
[pid 23653] getxattr("/var/snap/lxd/common/lxd/storage-pools/tank/containers/mastodon/rootfs/home/mastodon/live/public/system/cache/media_attachments/files/105/663/810/122/982/241/original", "system.posix_acl_access"^C, 0x7fac8bffeb20, 132) = -1 ENODATA (No data available)
[pid 23653] stat("/var/snap/lxd/common/lxd/storage-pools/tank/containers/mastodon/rootfs/home/mastodon/live/public/system/cache/media_attachments/files/105/663/810/122/982/241/original", strace: Process 23610 detached
```

stgraber · November 11, 2021, 4:48pm

Right, so that does show shifting in progress. It needs to look at every single file, check its extended attributes (for fscaps and ACLs), check its owner, and then rewrite those with the shifted values.
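To make the ID arithmetic concrete, here is a minimal dry-run sketch in Python, not LXD’s actual shifter. It assumes the common map where container ID n is stored on the host as 1000000 + n, which is consistent with the `fchownat(..., 1001001, 1001001, ...)` call above (presumably container uid/gid 1001), and it treats the IDs currently on disk as unshifted container IDs. It only reports what would change; a real shifter would also chown each entry and rewrite the fscaps and ACL xattrs.

```python
import os
from typing import Iterator, Tuple

# Assumption: container IDs 0..SIZE-1 map to host IDs BASE..BASE+SIZE-1.
BASE = 1_000_000
SIZE = 1_000_000_000


def shift_id(container_id: int, base: int = BASE, size: int = SIZE) -> int:
    """Translate a container uid/gid into the host uid/gid it should be stored as."""
    if not 0 <= container_id < size:
        raise ValueError(f"id {container_id} is outside the mapped range")
    return base + container_id


def _planned(path: str) -> Tuple[str, int, int]:
    st = os.lstat(path)  # like AT_SYMLINK_NOFOLLOW: inspect links, not their targets
    return path, shift_id(st.st_uid), shift_id(st.st_gid)


def plan_shift(root: str) -> Iterator[Tuple[str, int, int]]:
    """Dry run: yield (path, new_uid, new_gid) for root and everything below it.

    Nothing is modified. A real shifter would call os.lchown() on each entry,
    restore the mode afterwards (chown can clear setuid/setgid bits, which is
    presumably why the strace shows a chmod after each fchownat), and rewrite
    security.capability and POSIX ACL xattrs.
    """
    yield _planned(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield _planned(os.path.join(dirpath, name))
```

Even this dry run has to stat every entry once, which hints at why remapping a cache directory with millions of files takes so long.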

It currently seems to be shifting a very, very large cache directory.

The good news is that this entire shifting business is going away soon. We’ve had shiftfs available for a while, which you could use to avoid shifting entirely (requires `snap set lxd shiftfs.enable=true && systemctl reload snap.lxd.daemon`), but it has a bunch of bugs, which is why it’s not enabled by default.

We then contributed VFS idmapped mounts to the Linux kernel in 5.12, and @brauner has been busy extending this work to additional filesystems. Mainline currently supports ext4, xfs, vfat and btrfs. We’re working on cephfs and then zfs, with the hope that Ubuntu 22.04 can ship with all of those and with shiftfs removed.
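Since idmapped mounts depend on the running kernel, a quick Python sketch can check whether a host is on at least 5.12. This is only a version comparison: it says nothing about whether the filesystem behind a given storage pool supports idmapped mounts (per this thread, zfs support was still being worked on), and distribution kernels may carry backports, so treat the result as a first hint rather than an answer.

```python
import platform
import re
from typing import Tuple


def kernel_version(release: str) -> Tuple[int, int]:
    """Parse (major, minor) from a release string such as '5.13.0-21-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def has_idmapped_mount_kernel(release: str) -> bool:
    """True if the kernel is 5.12 or newer, the first release with VFS idmapped mounts."""
    return kernel_version(release) >= (5, 12)


if __name__ == "__main__":
    rel = platform.release()
    try:
        print(rel, "->", "5.12 or newer" if has_idmapped_mount_kernel(rel) else "older than 5.12")
    except ValueError as err:
        print(err)
```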

kpfa · November 11, 2021, 4:57pm

Appreciate it, and all the hard work that goes into the project. I guess we’ll just wait out the operation and report back if there are any errors or if the container doesn’t start.

Might be time for NVMe drives, or for a different design with a separate `dir` storage pool backed by NFS for the media directory.

kpfa · November 12, 2021, 3:47pm

Update: it’s still going (according to strace). We’d like to kill the operation, since it turned out to be faster to recreate the Mastodon install, copy the files and database over, and check permissions than to wait for this operation.

What’s the best way to interrupt it? `kill -9 PID`? We no longer care about the container either.