Cannot do-release-upgrade a container: permission denied on /var/log/dist-upgrade

Lox · September 30, 2024, 2:55am

Hello,

I am trying to upgrade a container on a host running Incus 6.5 on ZFS (Ubuntu 20.04.6 LTS). It is an ancient LXD install that was migrated a while back.

The do-release-upgrade script exits with this error:

PermissionError: [Errno 13] Permission denied: '/var/log/dist-upgrade/main.log'

Here is some debug information:

root@hass:~# ls -lah /var/log | grep dist
drwxr-xr-x   2 nobody nogroup    2 juil.  7  2020 dist-upgrade
root@hass:~# touch /var/log/dist-upgrade/test
touch: cannot touch '/var/log/dist-upgrade/test': Permission denied

stgraber · October 2, 2024, 12:22am

Do you have a lot of files showing that kind of broken ownership?

Lox · October 2, 2024, 2:55am

The whole of /var and /usr is affected. This seems to be the only container with the issue. /tmp has the same ownership but is writable:

drwxrwxrwt  11 nobody nogroup  23 oct.   2 13:40 tmp
drwxr-xr-x  14 nobody nogroup  14 févr.  3  2022 usr
drwxr-xr-x  13 nobody nogroup  15 juil. 21  2020 var

architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 20.04 LTS amd64 (release) (20200720)
  image.label: release
  image.os: ubuntu
  image.release: focal
  image.serial: "20200720"
  image.version: "20.04"
  security.nesting: "true"
  security.syscalls.intercept.mknod: "true"
  security.syscalls.intercept.setxattr: "true"
  user.network-config: "  \n  version: 2\n  ethernets:\n    eth0:\n      dhcp6: no\n
    \     dhcp4: yes"
  user.user-data: |-
    #cloud-config
    runcmd:
      - hostnamectl set-hostname hass.[redacted domain]
      - echo "postfix postfix/mailname string hass.[redacted domain]" | debconf-set-selections
      - echo "postfix postfix/main_mailer_type string 'Internet Site'" | debconf-set-selections
      - apt-get install --assume-yes mailutils
      - postconf -e "inet_interfaces = loopback-only"
      - echo root:maintenance@[redacted domain] >> /etc/aliases
      - newaliases
      - systemctl reload postfix
      - [sh, -c, "cat >> /root/.bashrc <<EOF\nif [ -f /etc/bash_completion ] && ! shopt -oq posix;\nthen\n        . /etc/bash_completion\nfi\n\nEOF" ]
  user.vendor-data: |
    #cloud-config
    locale: fr_FR.UTF-8
    timezone: Pacific/Noumea
    ## doing only update until package cloud-init is updated
    ## see: https://github.com/canonical/cloud-init/issues/5143
    package_update: true
    # package_upgrade: true
    ntp:
      enabled: true
      ntp_client: systemd-timesyncd
      servers:
        - 0.oceania.pool.ntp.org
        - 1.oceania.pool.ntp.org
        - 2.oceania.pool.ntp.org
        - 3.oceania.pool.ntp.org
        - pool.ntp.org
        - ntp.ubuntu.com
    apt:
      primary:
        - arches: [default]
          uri: http://nc.archive.ubuntu.com/ubuntu
      conf: | # APT config
        Unattended-Upgrade::Allowed-Origins {
          "${distro_id}:${distro_codename}";
          "${distro_id}:${distro_codename}-security";
          // Extended Security Maintenance; doesn't necessarily exist for
          // every release and this system may not have it installed, but if
          // available, the policy for updates is such that unattended-upgrades
          // should also install from here by default.
          "${distro_id}ESMApps:${distro_codename}-apps-security";
          "${distro_id}ESM:${distro_codename}-infra-security";
          "${distro_id}:${distro_codename}-updates";
          "${distro_id}:${distro_codename}-proposed";
          "${distro_id}:${distro_codename}-backports";
        };
        APT::Periodic::Update-Package-Lists "1";
        APT::Periodic::Download-Upgradeable-Packages "1";
        APT::Periodic::AutocleanInterval "7";
        APT::Periodic::Unattended-Upgrade "1";
    runcmd:
      - [sh, -c, "cat >> /root/.bashrc <<EOF\nif [ -f /etc/bash_completion ] && ! shopt -oq posix;\nthen\n        . /etc/bash_completion\nfi\n\nEOF" ]
  volatile.base_image: 0a4f3d88ed1c0e0d34c0f1e9be71b5dd73dc3de81a1e139b0ecd4e0faa958a30
  volatile.cloud-init.instance-id: 24b87d5a-d348-4a6d-8e19-8c6a1ad3ac3c
  volatile.eth0.host_name: veth7a4cf518
  volatile.eth0.hwaddr: 00:16:3e:21:4f:11
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
  volatile.uuid: 4b899469-e03b-4370-890e-a46f5c8147ef
  volatile.uuid.generation: 4b899469-e03b-4370-890e-a46f5c8147ef
devices:
  eth0:
    ipv4.address: 10.39.199.26
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
- ubuntu
stateful: false
description: ""
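
For readers following along, the volatile.idmap.current value above can be decoded to see why some files show up as nobody/nogroup inside the container. The sketch below is only an illustration: the helper name to_container_id is made up, and 65534 is assumed to be the kernel overflow ID (displayed as nobody/nogroup), which is the usual Linux default.

```python
import json

# Value copied from volatile.idmap.current in the config above.
IDMAP_JSON = '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'

OVERFLOW_ID = 65534  # assumption: default overflow uid/gid ("nobody"/"nogroup")


def to_container_id(host_id, idmap, want_uid=True):
    """Translate a host uid/gid to the ID seen inside the container.

    Hypothetical helper: returns OVERFLOW_ID when no map entry covers host_id.
    """
    for entry in idmap:
        if entry["Isuid" if want_uid else "Isgid"]:
            offset = host_id - entry["Hostid"]
            if 0 <= offset < entry["Maprange"]:
                return entry["Nsid"] + offset
    return OVERFLOW_ID


idmap = json.loads(IDMAP_JSON)
print(to_container_id(1000000, idmap))  # host 1000000 is root (0) in the container
print(to_container_id(0, idmap))        # host root is unmapped, so it appears as 65534
```

With this map, anything owned by host uid 0 falls outside the mapped range, which would be consistent with directories appearing as nobody/nogroup and being unwritable even for the container's root.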

stgraber · October 2, 2024, 4:14am

That looks like a partially shifted container… that may be a bit painful to fix up.
If easily doable, I’d recommend rebuilding that container.

If not, I’d need the output of ls -lh /var/lib/incus/containers/hass/rootfs/ to try to figure out what shifting is needed, but it being only partially wrong may make a fix a bit trickier.

Lox · October 2, 2024, 10:16pm

What do you mean by rebuild? Start from a brand new container and restore everything?

root@h2:~# ls -lh /var/lib/incus/containers/hass/rootfs/
total 198K
lrwxrwxrwx   1 1000000 1000000   7 juil. 21  2020 bin -> usr/bin
drwxr-xr-x   2 1000000 1000000   2 juil. 21  2020 boot
drwxr-xr-x   5 1000000 1000000  17 juil. 21  2020 dev
drwxr-xr-x 107 1000000 1000000 203 août  20 18:42 etc
drwxr-xr-x   4 1000000 1000000   4 juil. 30  2020 home
lrwxrwxrwx   1 1000000 1000000   7 juil. 21  2020 lib -> usr/lib
lrwxrwxrwx   1 1000000 1000000   9 juil. 21  2020 lib32 -> usr/lib32
lrwxrwxrwx   1 1000000 1000000   9 juil. 21  2020 lib64 -> usr/lib64
lrwxrwxrwx   1 1000000 1000000  10 juil. 21  2020 libx32 -> usr/libx32
drwxr-xr-x   3 1000000 1000000   3 mars  22  2021 media
drwxr-xr-x   2 1000000 1000000   2 juil. 21  2020 mnt
drwxr-xr-x   2 1000000 1000000   2 juil. 21  2020 opt
drwxr-xr-x   2 1000000 1000000   2 avril 15  2020 proc
drwx------  12 1000000 1000000  16 sept. 17  2023 root
drwxr-xr-x   2 1000000 1000000   2 juil. 21  2020 run
lrwxrwxrwx   1 1000000 1000000   8 juil. 21  2020 sbin -> usr/sbin
drwxr-xr-x   8 1000000 1000000   9 janv. 28  2024 snap
drwxr-xr-x   6 1000000 1000000   6 juin  14  2023 srv
drwxr-xr-x   2 root    root      2 avril 15  2020 sys
drwxrwxrwt  11 root    root     24 oct.   3 09:13 tmp
drwxr-xr-x  14 root    root     14 févr.  3  2022 usr
drwxr-xr-x  13 root    root     15 juil. 21  2020 var
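
A listing like the one above only covers the top level. As a rough, read-only sketch (it never calls chown), the following counts how many entries under a rootfs are owned by uids below the 1000000 base versus at or above it. The names classify and count_ownership are hypothetical, and the base and the 65536-wide range are assumptions taken from the numbers in this thread.

```python
import os
from collections import Counter

BASE = 1000000   # assumption: idmap base seen in this thread
RANGE = 65536    # assumption: only IDs 0..65535 are in use inside the container


def classify(owner_id):
    """Label a host-side uid or gid as shifted, unshifted, or other."""
    if BASE <= owner_id < BASE + RANGE:
        return "shifted"
    if 0 <= owner_id < RANGE:
        return "unshifted"
    return "other"


def count_ownership(rootfs):
    """Walk rootfs without following symlinks and tally uid ownership classes."""
    counts = Counter()
    for root, dirnames, filenames in os.walk(rootfs):
        for name in dirnames + filenames:
            st = os.lstat(os.path.join(root, name))
            counts[classify(st.st_uid)] += 1
    return counts
```

Run as root on the host, for example print(count_ownership('/var/lib/incus/containers/hass/rootfs')). A large "unshifted" count alongside a large "shifted" count would suggest the container is only partially shifted.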

stgraber · October 2, 2024, 10:35pm

Okay, given the above, one thing you can try, though it may very well fail, is:

incus config set hass security.privileged=true
incus start hass
incus stop hass
incus config unset hass security.privileged
incus start hass

That should cause all the ownership that is currently shifted (1000000+) to be brought back to unshifted; the last start would then shift everything back up to 1000000+.
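
If it helps, the sequence above could be scripted so that it stops at the first failing step. This is only a sketch: the helper names remap_commands and run_remap are made up, it assumes the incus client is on PATH, and it uses only the commands quoted above. By default it is a dry run that just prints the commands.

```python
import shlex
import subprocess


def remap_commands(name):
    """Return the command sequence from the post above for container `name`."""
    return [
        ["incus", "config", "set", name, "security.privileged=true"],
        ["incus", "start", name],
        ["incus", "stop", name],
        ["incus", "config", "unset", name, "security.privileged"],
        ["incus", "start", name],
    ]


def run_remap(name, dry_run=True):
    """Print each step; when dry_run is False, run it and abort on the first error."""
    for cmd in remap_commands(name):
        print("+", shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so later steps are not attempted after a failure.
            subprocess.run(cmd, check=True)


run_remap("hass")  # dry run: only prints the five commands
```

If a step fails partway through, the container may be left with security.privileged still set, so it is worth checking the config before retrying.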

Lox · October 2, 2024, 11:13pm

root@h2:~# incus stop hass
root@h2:~# incus config set hass security.privileged=true
root@h2:~# incus start hass
Error: Failed to handle idmapped storage: invalid argument - Failed to change ACLs on /var/lib/incus/storage-pools/default/containers/hass/rootfs/var/log/journal
Try `incus info --show-log hass` for more info
root@h2:~# incus info --show-log hass
Nom : hass
État : STOPPED
Type: container
Architecture : x86_64
Créé : 2020/07/30 00:02 +11
Last Used: 2024/09/19 10:31 +11

Journal :


root@h2:~# ls -lh /var/lib/incus/containers/hass/rootfs/
ls: cannot access '/var/lib/incus/containers/hass/rootfs/': No such file or directory
root@h2:~#

Lox · October 2, 2024, 11:53pm

This seems to be linked to a problem that suddenly appeared a while ago: All files in container have the 1000000 uid. I thought it had been completely solved, until I tried to upgrade the container.

Is /var/lib/incus/storage-pools/default/containers/hass/rootfs/var/log/journal a special directory?

# ls -lah /var/lib/incus/storage-pools/default/containers/hass/rootfs/var/log/journal
total 83K
drwxr-sr-x+ 3 root systemd-journal   3 juil. 30  2020 .
drwxrwxr-x  8 root uuidd            78 août  23 00:00 ..
drwxr-sr-x+ 2 root systemd-journal 102 août  23 12:31 e0fc56ca5143463c99a1061654b1a7ae

# ls -lah /var/lib/incus/storage-pools/default/containers/hass/rootfs/var/log/ | grep journal
drwxr-sr-x+  3 root      systemd-journal    3 juil. 30  2020 journal

candlerb · October 3, 2024, 8:55am

In the past I used python scripts to fix up partially-shifted containers (below). This mostly worked, but I found the permissions for systemd journal files still needed fixing up manually, as they use ACLs. I think this could be the same problem you’re seeing.

Check using:

getfacl -Rsp /mnt/var/log/journal

(replace /mnt as appropriate)

For me, the issue was around group ‘adm’, which needed changing from 1000004 to 4. I used a hairy script (don’t copy this blindly, use at your own risk!!):

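# List every path under the journal directory that has an extended ACL (-s),
# keeping absolute paths (-p), then rewrite gid 1000004 to 4 in each ACL.
getfacl -Rsp /mnt/var/log/journal | grep '^# file:' |
while read -r a b f; do getfacl "$f" | sed 's/:1000004:/:4:/g' | setfacl --set-file=- "$f"; done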
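
The sed substitution above handles a single ID. A slightly more general version of the same idea, sketched in Python, rewrites every numeric user or group qualifier that falls in the shifted range. It only transforms text in getfacl's output format; feeding the result back to setfacl --set-file is left out. The function name and the base/range defaults are assumptions based on the numbers in this thread.

```python
import re

# Matches ACL entries with a numeric qualifier, e.g. "group:1000004:r-x"
# or "default:user:1000100:rwx". Entries with names or empty qualifiers
# ("user::rwx", "group:adm:r-x") are left alone.
ACL_ENTRY = re.compile(r"^((?:default:)?(?:user|group)):(\d+):", re.MULTILINE)


def unshift_acl_text(acl_text, base=1000000, id_range=65536):
    """Subtract `base` from numeric ACL qualifiers in [base, base + id_range)."""
    def fix(match):
        value = int(match.group(2))
        if base <= value < base + id_range:
            value -= base
        return f"{match.group(1)}:{value}:"
    return ACL_ENTRY.sub(fix, acl_text)


sample = "user::rwx\ngroup::r-x\ngroup:1000004:r-x\nmask::r-x\nother::r-x\n"
print(unshift_acl_text(sample))
```

As with the shell version, treat this as something to review against your own getfacl output before applying anything.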

HTH,

Brian.

Here are the scripts I recorded in my notes. I used one for shifting upwards:

#!/usr/bin/python3
import os

# Shift ownership upward: add 1000000 to every uid/gid in the range 0..65535.
for root, dirnames, filenames in os.walk('/var/lib/incus/storage-pools/default/containers/nfsen/rootfs'):
    for name in dirnames + filenames:
        fullpath = os.path.join(root, name)
        # lstat, so that symlinks are examined themselves rather than their targets
        st = os.lstat(fullpath)
        uid = st.st_uid
        uid = (1000000 + uid) if 0 <= uid <= 65535 else -1
        gid = st.st_gid
        gid = (1000000 + gid) if 0 <= gid <= 65535 else -1
        # -1 tells os.chown to leave that ID unchanged
        if uid != -1 or gid != -1:
            os.chown(fullpath, uid, gid, follow_symlinks=False)

And one for shifting downward, although in this one I can’t remember how or why I mounted the container filesystem onto /mnt, or why I had to skip sys/proc/dev.

#!/usr/bin/python3
import os

# Shift ownership downward: subtract 1000000 from every uid/gid in 1000000..1065535.
for root, dirnames, filenames in os.walk('/mnt'):
    # Skip everything below sys and proc
    if root.startswith('/mnt/sys/') or root.startswith('/mnt/proc/'):
        continue
    # Skip the direct children of dev, sys and proc
    if root in ('/mnt/dev', '/mnt/sys', '/mnt/proc'):
        continue
    for name in dirnames + filenames:
        fullpath = os.path.join(root, name)
        # lstat, so that symlinks are examined themselves rather than their targets
        st = os.lstat(fullpath)
        uid = st.st_uid
        uid = (uid - 1000000) if 1000000 <= uid <= 1065535 else -1
        gid = st.st_gid
        gid = (gid - 1000000) if 1000000 <= gid <= 1065535 else -1
        # -1 tells os.chown to leave that ID unchanged
        if uid != -1 or gid != -1:
            os.chown(fullpath, uid, gid, follow_symlinks=False)
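
Both scripts above could be folded into one helper with a dry-run step. The following is a sketch rather than a drop-in replacement: the names shift_id, plan_shift and apply_shift are made up, and the 65536-wide range mirrors the bounds used above. One addition that is not in the original scripts: on Linux, chown can clear setuid/setgid bits on files, so the apply step re-applies the saved mode afterwards. Unlike the downward script, it does not skip sys/proc/dev.

```python
import os
import stat

BASE = 1000000
RANGE = 65536


def shift_id(value, up):
    """Return the shifted ID, or -1 (meaning "leave unchanged" for os.chown)."""
    if up:
        return value + BASE if 0 <= value < RANGE else -1
    return value - BASE if BASE <= value < BASE + RANGE else -1


def plan_shift(rootfs, up):
    """List (path, new_uid, new_gid, mode) for entries that need a chown."""
    plan = []
    for root, dirnames, filenames in os.walk(rootfs):
        for name in dirnames + filenames:
            path = os.path.join(root, name)
            st = os.lstat(path)
            uid, gid = shift_id(st.st_uid, up), shift_id(st.st_gid, up)
            if uid != -1 or gid != -1:
                plan.append((path, uid, gid, st.st_mode))
    return plan


def apply_shift(plan):
    """Apply a plan; needs root. Restores mode bits that chown may have cleared."""
    for path, uid, gid, mode in plan:
        os.chown(path, uid, gid, follow_symlinks=False)
        if not stat.S_ISLNK(mode):
            os.chmod(path, stat.S_IMODE(mode))
```

A cautious way to use it would be to print the result of plan_shift('/mnt', up=False) first, review it, and only then pass it to apply_shift.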

More detail here: Incus Container Wont stop or allow itself to be deleted - #10 by candlerb