One container with error

Hello everyone

Today I restarted my server and one of my containers didn’t start.
Should I run incus admin recover?

Server Ubuntu-22.04
uname -a : Linux pauloric 5.15.0-102-generic #112-Ubuntu SMP Tue Mar 5 16:50:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

container: Ubuntu 22.04
root@pauloric:/var/lib/incus/containers/squid# incus version
Client version: 6.0.0
Server version: 6.0.0

root@pauloric:/var/lib/incus/containers/squid# incus start squid
Error: Failed to run: /opt/incus/bin/incusd forkstart squid /var/lib/incus/containers /run/incus/squid/lxc.conf: exit status 1
Try incus info --show-log squid for more info
root@pauloric:/var/lib/incus/containers/squid# incus info --show-log squid
Name: squid
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2024/04/12 17:00 -03
Last Used: 2024/04/16 08:44 -03

Snapshots:
+-------+----------------------+----------------------+----------+
| NAME  | TAKEN AT             | EXPIRES AT           | STATEFUL |
+-------+----------------------+----------------------+----------+
| snap0 | 2024/04/16 03:00 -03 | 2024/04/17 03:00 -03 | NO       |
+-------+----------------------+----------------------+----------+

Log:

lxc squid 20240416114448.489 WARN idmap_utils - …/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc squid 20240416114448.489 WARN idmap_utils - …/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc squid 20240416114448.489 WARN idmap_utils - …/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc squid 20240416114448.489 WARN idmap_utils - …/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc squid 20240416114448.643 ERROR start - …/src/lxc/start.c:start:2204 - No such file or directory - Failed to exec "/sbin/init"
lxc squid 20240416114448.643 ERROR sync - …/src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 7)
lxc squid 20240416114448.646 WARN network - …/src/lxc/network.c:lxc_delete_network_priv:3671 - Failed to rename interface with index 0 from "eth0" to its initial name "veth496b4268"
lxc squid 20240416114448.646 ERROR lxccontainer - …/src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state "ABORTING" instead of "RUNNING"
lxc squid 20240416114448.646 ERROR start - …/src/lxc/start.c:__lxc_start:2114 - Failed to spawn container "squid"
lxc squid 20240416114448.646 WARN start - …/src/lxc/start.c:lxc_abort:1037 - No such process - Failed to send SIGKILL via pidfd 17 for process 33041
lxc 20240416114448.746 ERROR af_unix - …/src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20240416114448.746 ERROR commands - …/src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

That looks like your problem.

What kind of storage is this container on? (i.e. dir, zfs, btrfs etc)

What does ls -l /var/lib/incus/containers/squid/rootfs/sbin/init show? If nothing there, what about ls -l /var/lib/incus/containers/squid/rootfs/sbin ?

Is it a privileged or normal container?

(Aside: you can get rid of the warnings “new[ug]idmap binary is missing” by installing the uidmap package)
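On Ubuntu that should just be:

# apt install uidmap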

I don’t think incus admin recover is what you want: that’s for when an instance exists on storage but not in the Incus SQL database, and yours clearly does exist in the database.
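For example, you can confirm the instance is still registered in the database with:

# incus list squid
# incus config show squid

If those return the instance, recover has nothing to add.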

Hi Brian

incus storage list
+---------+------------+--------+-------------+---------+---------+
| NAME    | DRIVER     | SOURCE | DESCRIPTION | USED BY | STATE   |
+---------+------------+--------+-------------+---------+---------+
| default | lvmcluster | incus  |             | 11      | CREATED |
+---------+------------+--------+-------------+---------+---------+

root@pauloric:/var/lib/incus/containers/squid# ls -l /var/lib/incus/containers/squid/rootfs/sbin/init
ls: cannot access '/var/lib/incus/containers/squid/rootfs/sbin/init': No such file or directory
root@pauloric:/var/lib/incus/containers/squid# ls -l
total 0
:weary:

Normal container

Install uidmap… IMHO this should be part of the incus package (at least as a Suggests, as it is in the Debian package).

And thanks for the tip about incus admin recover.

I have no experience with lvmcluster storage. What’s the underlying block storage - some sort of SAN? (iSCSI, fibrechannel, nbd…) Presumably this also means you have multiple incus nodes which have been clustered?

In any case, lvmcluster is where I suggest you focus attention. Maybe one of the other nodes has locked the logical volume so it’s only available on that node - but that’s just a wild guess.
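A quick way to check whether the LVs are active on this node (lv_active is a standard lvs reporting field, if I recall correctly; "incus" here is the VG name shown in your storage list SOURCE column):

# lvs -o lv_name,lv_attr,lv_active incus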

Hi.

I don’t think that’s the case. I am using lvmcluster in a lab, to deploy it for real later.
There is only one machine, and I still haven’t configured lvmcluster.
Strange that all the other containers are working…
The only one that disappeared is this one.

FYI, testing with regular lvm (rather than lvmcluster), I find that the storage volume is not mounted when the container isn’t running:

# incus create images:ubuntu/22.04/cloud testlvm -s vg0
# ls /var/lib/incus/storage-pools/vg0/containers/testlvm
# incus start testlvm
# ls /var/lib/incus/storage-pools/vg0/containers/testlvm
backup.yaml  lost+found  metadata.yaml  rootfs  templates
# mount | grep testlvm
/dev/mapper/vg0-containers_testlvm on /var/lib/incus/storage-pools/vg0/containers/testlvm type ext4 (rw,relatime,discard,stripe=32)
# incus stop testlvm
# ls /var/lib/incus/storage-pools/vg0/containers/testlvm
# mount | grep testlvm
#

So, there must be a point at container start time when the volume is mounted. And if there were a problem when that happened, I would expect incus to fail at that point; that is, I don’t think it would have blindly gone on to try to run /sbin/init inside a non-existent mountpoint.

Therefore, I suspect your container volume is there, but you might have to mount it manually to inspect it and/or fix the problem with /sbin/init.
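If it comes to that, a rough sketch (assuming the pool’s VG is called incus and the LV follows Incus’s containers_NAME naming; adjust to whatever lvs shows on your system):

# lvchange -ay incus/containers_squid      # activate the logical volume
# mkdir -p /mnt/squid
# mount /dev/incus/containers_squid /mnt/squid
# ls -l /mnt/squid/rootfs/sbin             # inspect /sbin inside the container's rootfs
# umount /mnt/squid                        # undo everything before letting Incus manage it again
# lvchange -an incus/containers_squid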

If there’s a way to get debugging info from incus about LVM and volume mounting, I can’t find it. I tried:

# incus start testlvm --verbose --debug
# incus info --show-log testlvm

but this just shows the container as ‘RUNNING’ and an empty log. There’s:

# ls -l /var/log/incus/testlvm/
total 24
-rw------- 1 root root 23801 Apr 16 13:08 console.log
-rw-r--r-- 1 root root     0 Apr 16 13:08 forkstart.log
-rw-r----- 1 root root     0 Apr 16 13:08 lxc.log
-rw-r----- 1 root root     0 Apr 16 13:05 lxc.log.old

But console.log only shows the systemd/cloud-init boot messages.
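One more thing that might be worth trying, though the exact flags here are an assumption on my part: incus monitor can stream the daemon’s log messages, which may include the storage/mount operations if run while starting the container from another shell:

# incus monitor --pretty --loglevel debug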

Hmm…

I think the problem could be caused by the VG having no free space…

root@pauloric:/home/pauloric/Documentos/Debian/zeus8/pool# vgdisplay incus
--- Volume group ---
VG Name incus
System ID
Format lvm2
Metadata Areas 3
Metadata Sequence No 306
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 30
Open LV 7
Max PV 0
Cur PV 3
Act PV 3
VG Size <299,99 GiB
PE Size 4,00 MiB
Total PE 76797
Alloc PE / Size 76797 / <299,99 GiB
Free PE / Size 0 / 0
VG UUID sUsedN-viAg-0mYS-TX67-SV6x-ugGb-MXsEFJ

--- Logical volume ---
LV Path /dev/incus/containers_squid
LV Name containers_squid
VG Name incus
LV UUID ez39CM-NsSF-tNKL-Yeb0-ajOL-A2GL-2dVMeB
LV Write Access read/write
LV Creation host, time pauloric, 2024-04-12 17:00:40 -0300
LV snapshot status source of
containers_squid-snap0 [INACTIVE]
LV Status NOT available
LV Size 10,00 GiB
Current LE 2560
Segments 1
Allocation inherit
Read ahead sectors auto

--- Logical volume ---
LV Path /dev/incus/containers_squid-snap0
LV Name containers_squid-snap0
VG Name incus
LV UUID 0payyA-tFrC-cwJS-CjUM-0Qjc-643k-4TCauX
LV Write Access read only
LV Creation host, time pauloric, 2024-04-16 03:00:16 -0300
LV snapshot status INACTIVE destination for containers_squid
LV Status NOT available
LV Size 10,00 GiB
Current LE 2560
COW-table size 10,04 GiB
COW-table LE 2571
Snapshot chunk size 4,00 KiB
Segments 1
Allocation inherit
Read ahead sectors auto

I removed some snaps…

root@pauloric:/home/pauloric/Documentos/Debian/zeus8/pool# vgdisplay incus
--- Volume group ---
VG Name incus
System ID
Format lvm2
Metadata Areas 3
Metadata Sequence No 316
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 25
Open LV 7
Max PV 0
Cur PV 3
Act PV 3
VG Size <299,99 GiB
PE Size 4,00 MiB
Total PE 76797
Alloc PE / Size 63942 / 249,77 GiB
Free PE / Size 12855 / 50,21 GiB
VG UUID sUsedN-viAg-0mYS-TX67-SV6x-ugGb-MXsEFJ

Still the same problem…

root@pauloric:~# incus snapshot list squid
+-------+----------------------+----------------------+----------+
| NAME  | TAKEN AT             | EXPIRES AT           | STATEFUL |
+-------+----------------------+----------------------+----------+
| snap0 | 2024/04/16 03:00 -03 | 2024/04/17 03:00 -03 | NO       |
+-------+----------------------+----------------------+----------+

root@pauloric:~# incus snapshot restore squid snap0
Error: Error restoring LVM logical volume snapshot: Failed to run: rsync -a -HA --sparse --devices --delete --checksum --numeric-ids --xattrs --filter=-x security.selinux -q /var/lib/incus/storage-pools/default/containers-snapshots/squid/snap0/ /var/lib/incus/storage-pools/default/containers/squid: exit status 11 (rsync: [receiver] write failed on "/var/lib/incus/storage-pools/default/containers/squid/rootfs/var/log/squid/cache.log": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(380) [receiver=3.2.7]
rsync: [sender] write error: Broken pipe (32))

Question: how can I mount it to clear the container logs?

Ah yeah, that could well be a problem; LVM also REALLY doesn’t like running out of space…

So you seem to have two problems:

  • Your VG is out of space. LVM will be very unhappy about that. Your best bet is to delete some unused/unimportant snapshots: since clustered LVM is thick-provisioned, snapshots can take a good chunk of space, so that’s your easiest way to free some.
  • Your container itself appears to be full and so is having some trouble. For that I’d recommend trying incus file delete or incus file mount to access the container’s filesystem without having to start it. If that doesn’t work, then you’ll need to manually bring up the LV with lvchange and mount it outside of Incus to fix it.

Worth noting that the first point may also be affecting the second one: without any space in the VG, the snapshot restore may be hitting issues there too. I’d strongly recommend making some space in the VG first (see the sketch below).
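Something along these lines, as a sketch (SOME-SNAPSHOT is a placeholder, and the log path is just the one from your rsync error; adjust to your setup):

# incus snapshot delete squid SOME-SNAPSHOT    # free thick-provisioned VG space first
# incus file mount squid/ /media               # expose the container's filesystem without starting it
# rm /media/var/log/squid/cache.log            # clear whatever filled the container up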

Thanks Graber

Using incus file mount squid/ /media I could delete some logs… and gain some space inside the container.
vgdisplay shows that I have plenty of space now that I’ve removed some snaps…
But this particular container refuses to start… :sweat: Not a big problem, since I am just playing in a lab, but it concerns me that this could happen in production…

Any other ideas?

And yes, the problem happened because there was no space available in the VG… but the interesting part is that I have another container that is using 100% of its space, and it is running…

The error you showed indicates that /sbin/init in the container cannot be executed, which suggests the container either got badly corrupted by LVM running out of space, or that it ran out of space partway through a package update or something similar.

/sbin/init is most commonly a symlink to /lib/systemd/systemd, so you’d want to make sure that the symlink itself exists and so does its target. That said, Linux is a bit funny and will also say "No such file or directory" if a library used by the binary cannot be found, which in the case of systemd can be a reasonably long list, as seen here:

stgraber@dakara:~$ ldd /lib/systemd/systemd
	linux-vdso.so.1 (0x00007ffcbdfe7000)
	libsystemd-core-252.so => /usr/lib/x86_64-linux-gnu/systemd/libsystemd-core-252.so (0x00007f0583326000)
	libsystemd-shared-252.so => /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-252.so (0x00007f0582e00000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0583131000)
	libseccomp.so.2 => /lib/x86_64-linux-gnu/libseccomp.so.2 (0x00007f0582de0000)
	libpam.so.0 => /lib/x86_64-linux-gnu/libpam.so.0 (0x00007f0582dce000)
	libaudit.so.1 => /lib/x86_64-linux-gnu/libaudit.so.1 (0x00007f0582d9b000)
	libkmod.so.2 => /lib/x86_64-linux-gnu/libkmod.so.2 (0x00007f0582d7e000)
	libapparmor.so.1 => /lib/x86_64-linux-gnu/libapparmor.so.1 (0x00007f0582d69000)
	libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f0582d3b000)
	libmount.so.1 => /lib/x86_64-linux-gnu/libmount.so.1 (0x00007f0582cd8000)
	libacl.so.1 => /lib/x86_64-linux-gnu/libacl.so.1 (0x00007f0582ccd000)
	libblkid.so.1 => /lib/x86_64-linux-gnu/libblkid.so.1 (0x00007f0582c74000)
	libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x00007f0582c68000)
	libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f0582c2c000)
	libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007f0582ae5000)
	libip4tc.so.2 => /lib/x86_64-linux-gnu/libip4tc.so.2 (0x00007f0582adb000)
	liblz4.so.1 => /lib/x86_64-linux-gnu/liblz4.so.1 (0x00007f0582ab5000)
	libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007f0582600000)
	libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x00007f0582544000)
	liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f0582a84000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0582465000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f0583538000)
	libcap-ng.so.0 => /lib/x86_64-linux-gnu/libcap-ng.so.0 (0x00007f058245d000)
	libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f05823c3000)
	libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007f058239b000)

Any of those being missing or corrupted would prevent Incus from executing /sbin/init, giving you the error you’ve been running into.
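If you want to check that chain by hand, assuming you’ve exposed the container’s filesystem at /media with incus file mount, something like:

# ls -l /media/sbin/init                    # does the symlink itself exist?
# ls -l /media/usr/lib/systemd/systemd      # does its usual target exist?

should tell you whether the link or its target went missing.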

Solved

#) incus file mount squid/ /media

#) cd /media

Comparing with a container that is running, I discovered that this specific container lost all its top-level symlinks…

#) ls
drwxr-xr-x 2 root root 4096 Feb 26 09:58 bin.usr-is-merged
drwxr-xr-x 2 root root 4096 Mar 30 21:04 boot
drwxr-xr-x 8 root root 500 Apr 16 13:56 dev
drwxr-x--x 115 root root 12288 Apr 14 06:02 etc
drwxr-xr-x 5 root root 4096 Apr 12 10:55 home
drwxr-xr-x 2 root root 4096 Mar 31 04:30 lib.usr-is-merged
drwxr-xr-x 2 root root 4096 Apr 12 04:43 media
drwxr-xr-x 2 root root 4096 Apr 12 04:43 mnt
drwxr-xr-x 2 root root 4096 Apr 12 04:43 opt
dr-xr-xr-x 936 nobody nogroup 0 Apr 16 13:56 proc
drwx------ 5 root root 4096 Apr 12 17:46 root
drwxr-xr-x 23 root root 820 Apr 16 13:56 run
drwxr-xr-x 2 root root 4096 Mar 31 06:00 sbin.usr-is-merged
drwxr-xr-x 4 root root 4096 Apr 12 10:56 srv
dr-xr-xr-x 13 nobody nogroup 0 Apr 16 13:53 sys
drwxrwxrwt 10 root root 4096 Apr 16 13:56 tmp
drwxr-xr-x 12 root root 4096 Apr 12 04:43 usr
drwxr-xr-x 12 root root 4096 Apr 12 10:52 var

#) ln -s usr/bin bin
#) ln -s usr/sbin sbin
#) ln -s usr/lib lib
#) ln -s usr/lib64 lib64
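
A quick sanity check before starting (a sketch; chroot makes the container’s absolute symlinks resolve inside the rootfs rather than against the host):

#) chroot /media /sbin/init --version    # should print the systemd version if the chain now resolves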

After this, I could start the squid container again.

Thanks for all the support. Now, when I face a problem like this, I’ll know how to deal with it.
One important comment: I tried to turn the snapshot into a container, but I think that, with the lack of VG space, that particular snapshot didn’t work…