Nobody nogroup after cancelled publish

I cancelled an lxc publish because the partition it was using was about to run out of space. After starting the container again, all uids have been lost (everything shows as nobody:nogroup) and services do not start:

[root@shl1 lxd]# lxc exec omeka-s bash
bash: /root/.bashrc: Permission denied

Any clues on how to fix the uid mappings?
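For reference, this is roughly how the current mapping state can be checked from the host (a sketch assuming the snap package and the container name from the session above):

lxc config show omeka-s | grep -i idmap   # shows the volatile.idmap.* and volatile.last_state.idmap entries
lxc exec omeka-s -- ls -ln /              # unmapped files show up as uid/gid 65534, i.e. nobody:nogroup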

Thanks in advance

For some reason:
/var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/default/containers/NAME/rootfs/
is empty.
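For what it's worth, with the snap package that path lives inside LXD's own mount namespace, so it can also be checked by entering the namespace directly, something along these lines (pool and container names assumed from this setup):

nsenter --mount=/run/snapd/ns/lxd.mnt -- ls -ln /var/snap/lxd/common/lxd/storage-pools/default/containers/omeka-s/rootfs/

Note that, as far as I know, the lvm backend only mounts the volume while the container is running, so a stopped container's rootfs will look empty there either way.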

config:
  image.architecture: amd64
  image.description: Debian stretch amd64 (20190228_05:24)
  image.os: Debian
  image.release: stretch
  image.serial: "20190228_05:24"
  security.idmap.isolated: "false"
  volatile.base_image: f8a458c2505a67f91437b4c994d853e07a33c1bfa30296e418d47577619fe819
  volatile.eth0.hwaddr: 00:16:3e:e9:83:35
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING

 * Distribution: CentOS
 * Distribution version: 7.9
 * The output of "lxc info" or if that fails:
   * Kernel version: 3.10.0-1160.59.1.el7.x86_64
   * LXC version: 3.0.2
   * LXD version: 3.6
   * Storage backend in use: lvm

It’s pretty odd that the rootfs looks empty for a running instance.
Can you try restarting the instance to see if it shows up then?

This is a very old LXD version which is very much unsupported so I have no idea if it’s something that’s since been fixed or has otherwise evolved. LXD 3.6 was a monthly feature release from October 2018 and went end of support at the end of November 2018.

Your best bet to fix this is to get access to the rootfs from the host, then use fuidshift to manually re-shift everything to the correct uid/gid ranges.
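For illustration, a fuidshift invocation for that kind of re-shift would look roughly like this (the path and range are taken from the config shown earlier, and it assumes the rootfs is reachable on the host and its files are currently unshifted, i.e. owned from uid/gid 0 upwards):

# shift both uids and gids: container ids starting at 0 map to host ids starting at 1000000
fuidshift /var/snap/lxd/common/lxd/storage-pools/default/containers/omeka-s/rootfs b:0:1000000:1000000000

It needs to run as root against the mounted filesystem.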

Once you get things behaving again, please update to a supported LXD release so we can help you more in the future.

Thanks for this Stéphane, it looks like none of the instances on this host have a rootfs linked… Also, all the other containers have the same “Hostid”:1000000 in their idmap config.

Also, all of the lvm commands report nothing, which is quite concerning.
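By lvm commands I mean the usual listing tools, things like:

lvs    # logical volumes
vgs    # volume groups
pvs    # physical volumes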

The full story is:

  • an attempt was made to expand the storage of the container with lxc config device set omeka-s root size 20GB
  • the container failed to restart (it never shut down fully, but services stopped)
  • the container was force stopped
  • on restart the resize failed
  • lxc config device unset omeka-s root size was run in an attempt to cancel the resize, which also failed
  • the volume for the container disappeared from the lvm commands
  • ln -s ../dm-14 containers_omeka--s was run to re-establish the link to the lvm device (see the sketch after this list)
  • the container booted normally
  • an attempt was made to publish the container to create a backup
  • root disk space became low, so the publish was cancelled
  • the container lost its uid/gid shift info
  • current state
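For context on the symlink step, the device-mapper side can still be inspected directly even when the LVM tools show nothing, roughly like this (names are from this setup, so adjust as needed):

dmsetup info -c        # lists the dm devices with their minor numbers (dm-14 etc.)
ls -l /dev/mapper/     # the container LV should appear here, pointing at the same dm-N
# the ln -s above just recreated the /dev/<vg>/containers_omeka--s symlink that normally points at ../dm-14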

How do I access the rootfs from the host? It doesn’t appear to be mounted anywhere.

Is this fuidshift? GitHub - Mic92/fuidshift: Move Filesystem ownership into other subordinated uid ranges

Output from lvmdump appears to have the relevant entries. Should I just restart the host to recover state?

I will certainly update LXD at the next opportunity; this is somewhat of an inherited situation. Edit: it appears to have been set up with guidance from this: LXD on Centos 7 is LXD now stable on centos 7?

Many thanks,

Simon

It appears the id offset got set to 0 on the container.

The id remapping routine failed on restart because the log/journal directory had ACLs set, so I had to copy the container to a dir storage pool (since the rootfs’s aren’t appearing on the host) in order to delete the journal. I could then set the Hostid to 0, start the container with a remap, and it’s running again.
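Roughly, the recovery looked like this (reconstructed from memory; the temporary pool name 'recovery' and the copy name are just placeholders, so treat it as a sketch rather than exact commands):

lxc stop omeka-s --force
lxc storage create recovery dir                # temporary dir-backed pool
lxc copy omeka-s omeka-s-fix -s recovery       # with the dir driver the files are reachable from the host
rm -rf /var/snap/lxd/common/lxd/storage-pools/recovery/containers/omeka-s-fix/rootfs/var/log/journal
# (stripping the ACLs instead, e.g. setfacl -bR on that directory, might have worked too)
# then set Hostid to 0 in volatile.last_state.idmap (lxc config edit) so LXD re-shifts the tree on start
lxc start omeka-s-fix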