Systemctl status reports "degraded" inside lxd container

Looking inside an lxd container launched from image “ubuntu:20.04”, systemctl says it is “degraded”:

root@vault1:~# systemctl status | head -4
● vault1
    State: degraded
     Jobs: 0 queued
   Failed: 1 units

And the problem is this service:

root@vault1:~# systemctl | grep fail
● systemd-remount-fs.service            loaded failed     failed    Remount Root and Kernel File Systems
root@vault1:~# systemctl status systemd-remount-fs
● systemd-remount-fs.service - Remount Root and Kernel File Systems
     Loaded: loaded (/lib/systemd/system/systemd-remount-fs.service; enabled-runtime; vendor preset: enabled)
     Active: failed (Result: exit-code) since Thu 2022-02-03 07:01:46 UTC; 2 weeks 6 days ago
       Docs: man:systemd-remount-fs.service(8)
             https://www.freedesktop.org/wiki/Software/systemd/APIFileSystems
   Main PID: 65 (code=exited, status=1/FAILURE)

Feb 03 07:01:46 vault1 systemd-remount-fs[73]: mount: /: can't find LABEL=cloudimg-rootfs.
Warning: journal has been rotated since unit was started, output may be incomplete.
root@vault1:~# cat /etc/fstab
LABEL=cloudimg-rootfs	/	 ext4	defaults	0 0
root@vault1:~#

Clearly, a container shouldn’t be attempting to mount its own root filesystem in the first place, and ideally the container images would be set up so no user tweaking is required.

I can work around it by commenting out that line in /etc/fstab by hand and restarting the container.
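For reference, the manual workaround amounts to something like this, run from the host (the container name tst1 and the sed pattern are just illustrative):

# comment out the spurious root-filesystem entry, then restart the container
lxc exec tst1 -- sed -i 's|^LABEL=cloudimg-rootfs|#&|' /etc/fstab
lxc restart tst1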

However, does anybody know an “official” way to fix this at container creation time? I have tried the following in user.user-data:

#cloud-config
mounts: []

and

#cloud-config
mounts: [['LABEL=cloudimg-rootfs']]

and

#cloud-config
mounts: [['LABEL=cloudimg-rootfs',null]]

but none of these work. In the second and third cases, /var/log/cloud-init.log shows:

2022-02-23 09:00:43,217 - stages.py[DEBUG]: Running module mounts (<module 'cloudinit.config.cc_mounts' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_mounts.py'>) with frequency once-per-instance
2022-02-23 09:00:43,218 - handlers.py[DEBUG]: start: init-network/config-mounts: running config-mounts with frequency once-per-instance
2022-02-23 09:00:43,219 - util.py[DEBUG]: Writing to /var/lib/cloud/instances/tst1/sem/config_mounts - wb: [644] 24 bytes
2022-02-23 09:00:43,220 - helpers.py[DEBUG]: Running config-mounts using lock (<FileLock using file '/var/lib/cloud/instances/tst1/sem/config_mounts'>)
2022-02-23 09:00:43,221 - cc_mounts.py[DEBUG]: mounts configuration is [['LABEL=cloudimg-rootfs']]
2022-02-23 09:00:43,221 - util.py[DEBUG]: Reading from /etc/fstab (quiet=False)
2022-02-23 09:00:43,222 - util.py[DEBUG]: Read 43 bytes from /etc/fstab
2022-02-23 09:00:43,222 - cc_mounts.py[DEBUG]: Attempting to determine the real name of LABEL=cloudimg-rootfs
2022-02-23 09:00:43,222 - cc_mounts.py[DEBUG]: changed LABEL=cloudimg-rootfs => None
2022-02-23 09:00:43,223 - cc_mounts.py[DEBUG]: Ignoring nonexistent named mount LABEL=cloudimg-rootfs
2022-02-23 09:00:43,223 - cc_mounts.py[DEBUG]: Attempting to determine the real name of ephemeral0
2022-02-23 09:00:43,223 - cc_mounts.py[DEBUG]: changed default device ephemeral0 => None
2022-02-23 09:00:43,223 - cc_mounts.py[DEBUG]: Ignoring nonexistent default named mount ephemeral0
2022-02-23 09:00:43,223 - cc_mounts.py[DEBUG]: Attempting to determine the real name of swap
2022-02-23 09:00:43,223 - cc_mounts.py[DEBUG]: changed default device swap => None
2022-02-23 09:00:43,224 - cc_mounts.py[DEBUG]: Ignoring nonexistent default named mount swap
2022-02-23 09:00:43,224 - cc_mounts.py[DEBUG]: Skipping nonexistent device named LABEL=cloudimg-rootfs
2022-02-23 09:00:43,224 - cc_mounts.py[DEBUG]: no need to setup swap
2022-02-23 09:00:43,224 - cc_mounts.py[DEBUG]: No modifications to fstab needed
2022-02-23 09:00:43,224 - handlers.py[DEBUG]: finish: init-network/config-mounts: SUCCESS: config-mounts ran successfully

“Read 43 bytes from /etc/fstab” suggests that the line is already there:

root@tst1:~# cat /etc/fstab
LABEL=cloudimg-rootfs	/	 ext4	defaults	0 1
root@tst1:~# wc -c /etc/fstab
43 /etc/fstab

But it’s not matching my rule. It looks like sanitize_devname in cloudinit/config/cc_mounts.py throws it away because it can’t find a device on the system with this name (and it’s not an otherwise recognized format, like a network mount host:/path).

Any ideas?

I’m not sure whether to class this issue as:

  1. A problem with the Ubuntu lxd image itself (containing an fstab entry that’s not needed in a container)
  2. An issue with cloud-init’s cc_mounts not being able to manage LABEL=… entries
  3. Something else

Do you see this on a freshly launched container?

If so, please can you post a reproducer here (as I’m not sure yet which repo an issue should be filed in)?

Thanks

Yes indeed, it’s easily reproducible with a fresh container. Here’s the script I’ve been using:

#!/bin/bash -eu
lxc delete -f tst1 || true
lxc launch -c user.user-data="#cloud-config
mounts: [['LABEL=cloudimg-rootfs',null]]
runcmd:
  - touch /wombat
" ubuntu:20.04 tst1

The /wombat is just to give me 100% confirmation that cloud-init has run to completion.
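As an aside, cloud-init’s own status command can probably serve the same purpose as the marker file; a sketch, assuming the cloud-init version in the image supports it:

root@tst1:~# cloud-init status --wait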

# lxc exec tst1 bash
root@tst1:~# while [ ! -f /wombat ]; do sleep 1; done
root@tst1:~# cat /etc/fstab
LABEL=cloudimg-rootfs	/	 ext4	defaults	0 1
root@tst1:~# systemctl | grep fail
● systemd-remount-fs.service            loaded failed failed    Remount Root and Kernel File Systems

Host is Ubuntu 20.04 with lxd 4.23 from snap.

Does it occur without the cloud-init config?

What I’m trying to ascertain is whether it’s an image problem or a config problem.

Sure, you can happily reproduce it without providing any cloud-init configuration:

root@nuc2:~# lxc launch ubuntu:20.04 tst1
Creating tst1
Starting tst1
root@nuc2:~# lxc shell tst1
root@tst1:~# systemctl status | head -4
● tst1
    State: starting
     Jobs: 9 queued
   Failed: 1 units
root@tst1:~# systemctl status | head -4
● tst1
    State: degraded
     Jobs: 0 queued
   Failed: 1 units
root@tst1:~# systemctl | grep fail
● systemd-remount-fs.service            loaded failed failed    Remount Root and Kernel File Systems
root@tst1:~# cat /etc/fstab
LABEL=cloudimg-rootfs	/	 ext4	defaults	0 1
root@tst1:~#

Preferably, lxd images should work “out of the box” without tweaking. (The container does actually work, but it’s not great for health monitoring if systemd reports the system as “degraded” when nothing serious is wrong.)
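For what it’s worth, once the fstab entry is commented out, the “degraded” state can apparently be cleared in a running container without a reboot (a sketch, not verified on every systemd version):

root@tst1:~# systemctl reset-failed systemd-remount-fs.service
root@tst1:~# systemctl is-system-running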

The secondary problem is that tweaking fstab via cloud-init user-data doesn’t work when the fs_spec is of the form LABEL=..., which rules out automating the manual workaround at creation time.
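One possible way to automate it despite the cc_mounts limitation would be to edit fstab directly from user-data instead of going through the mounts: module. An untested sketch, assuming bootcmd (which cloud-init runs early on every boot) is acceptable here:

#cloud-config
bootcmd:
  - sed -i 's|^LABEL=cloudimg-rootfs|#&|' /etc/fstab

Note that systemd-remount-fs runs before cloud-init, so the very first boot would presumably still show the failure; only subsequent boots would be clean.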


Yep, that’s why I’m asking :slight_smile:

@monstermunchkin is this something you can look into please?

The ubuntu:20.04 image is not built by the LXD team. You could try images:ubuntu/20.04 in the meantime, which should work flawlessly :slight_smile:


Ah yeah, good spot. Do you think that’s something we should report upstream?

Thanks, images:ubuntu/20.04 indeed doesn’t have this issue. I will need to get to grips with the differences though:

root@tst1:~# dpkg-query -l | wc -l
248

root@tst2:~# dpkg-query -l | wc -l
568

It looks like images:ubuntu/20.04 is only ubuntu-minimal, while ubuntu:20.04 has ubuntu-server and ubuntu-standard.

This poses a major problem for me: images:ubuntu/20.04 doesn’t have cloud-init, which means that my scripted container creation doesn’t work (I’ve tested it). The other things which are missing, like openssh-server, I could add easily if cloud-init were installed.

You should use images:ubuntu/20.04/cloud for cloud-init.
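For reference, a launch along the same lines as the earlier script might look like this (the user-data content is just illustrative, and assumes the /cloud image picks up user.user-data as the ubuntu: images do):

lxc launch -c user.user-data="#cloud-config
packages:
  - openssh-server
runcmd:
  - touch /wombat
" images:ubuntu/20.04/cloud tst1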

That works nicely, thank you!

That “can't find LABEL=cloudimg” issue has been biting me on several containers since the latest snap refresh (lxd 5.8-bb9c9b1).
What changed in that lxd release to now fail on existing containers? As a workaround, I go through all existing containers and comment out the mount in fstab (and additionally I disable the ipv6 link-local default in netplan to avoid a failed systemd-networkd-wait-online service).

I have now learned that we should have created the containers from images:ubuntu/*/cloud, but we can’t just undo that.
The question I’m asking myself is: will this cause me more problems in the future, since ubuntu:x images may not be tested by the lxd team on future releases? This is not meant as bashing, but as an honest question.

Can you describe how it “bites you” differently since upgrading to 5.8?

This problem has been there for a long time, but the only impact I observed was systemctl reporting the container state as “degraded”. Are you now seeing another problem as well?

The problem does go away if you comment out the spurious entry from /etc/fstab, correct?

I don’t see any likely future impact to existing containers, and if you switch your process for creating new ones to use images:ubuntu/* or images:ubuntu/*/cloud then you should be good.

Can you describe how it “bites you” differently since upgrading to 5.8?

Unfortunately, I can’t say in which exact lxd version this started, only that it appeared with 5.8:
e.g. a jitsi container did not start its subsequent services via systemd, so the service was not available. Maybe it’s a side effect of other issues not yet identified.
I have no problem changing the fstab in all containers, but now I’m a little worried about future updates and what may go wrong with the existing containers, since they weren’t initialized from */cloud images…

I have noticed the same issue since at least lxd 4.23. The fact that you notice a problem now, but didn’t notice it before, might even be due to an update within the container (e.g. a newer version of systemd) rather than a change in lxd on the host.

I would not worry unduly about your existing containers. cloud-init is only run on the first container startup; after that it does nothing. If you fix the mis-provisioned /etc/fstab then you should be fine.
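If you have many containers to patch, something like this loop should do it (a rough sketch; it assumes every container has the same LABEL=cloudimg-rootfs line and that you’re happy running sed inside each one):

#!/bin/bash -eu
for c in $(lxc list -c n -f csv); do
  lxc exec "$c" -- sed -i 's|^LABEL=cloudimg-rootfs|#&|' /etc/fstab
done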

Thank you @candlerb for your assessment and feedback! :+1: