LXD issues on s390x

I’m running into some issues with a workflow I’m attempting on an LXD VM on the s390x architecture. At a high level, I’m trying to Livepatch a test kernel. This requires building a kernel in the LXD VM, installing it there, and then building a Livepatch with kpatch-build against a copy of the kernel source. It boils down to running the following commands on the linked LXD VM instance export:

  1. cd linux/
  2. fakeroot debian/rules clean
  3. fakeroot debian/rules binary-ibm-gt skipdbg=false

This generally yields some type of vsock error:

  • Error: write vsock vm(4294967295):1582350692->vm(6):8443: i/o timeout
  • Error: read vsock vm(4294967295):1582350684->vm(6):8443: connection reset by peer

Any help would be greatly appreciated!

Thanks, I’ll take a look.

Looks like you created the export without syncing or shutting down the VM:

lxc start jammy-ibm-gt --console
...
Begin: Will now check root file system ... fsck from util-linux 2.37.2
[/usr/sbin/fsck.ext4 (1) -- /dev/sda1] fsck.ext4 -a -C0 /dev/sda1 
cloudimg-rootfs: Superblock needs_recovery flag is clear, but journal has data.
cloudimg-rootfs: Run journal anyway

cloudimg-rootfs: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
	(i.e., without -a or -p options)
fsck exited with status code 4
done.
Failure: File system check of the root filesystem failed
The root filesystem on /dev/sda1 requires a manual fsck
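
For reference, an export taken from a cleanly stopped instance avoids this kind of inconsistency; roughly (instance name from above, backup file name just an example):

lxc stop jammy-ibm-gt
lxc export jammy-ibm-gt jammy-ibm-gt-backup.tar.gz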

I’ll see if I can run an fsck on it.

Fixed it by reimporting into a dir pool using lxc import <file> -d mydirpool and then doing the following:

parted /var/snap/lxd/common/lxd/storage-pools/dir/virtual-machines/jammy-ibm-gt/root.img
(parted) unit
Unit?  [B]? B                                                             
(parted) p
Model:  (file)
Disk /var/snap/lxd/common/lxd/storage-pools/dir/virtual-machines/jammy-ibm-gt/root.img: 50000003072B
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start     End           Size          Type     File system  Flags
 1      1048576B  50000003071B  49998954496B  primary  ext4

The partition starts at byte 1048576, so the filesystem can be attached as a loop device at that offset:

losetup -f -o 1048576 /var/snap/lxd/common/lxd/storage-pools/dir/virtual-machines/jammy-ibm-gt/root.img

e2fsck /dev/loop4
e2fsck 1.45.5 (07-Jan-2020)
Superblock needs_recovery flag is clear, but journal has data.
Run journal anyway<y>? yes
cloudimg-rootfs: recovering journal

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (8778275, counted=8778227).
Fix<y>? yes
Free inodes count wrong (5716656, counted=5716640).
Fix<y>? yes

cloudimg-rootfs: ***** FILE SYSTEM WAS MODIFIED *****
cloudimg-rootfs: 251360/5968000 files (0.1% non-contiguous), 3428549/12206776 blocks
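
With the filesystem repaired, the loop device can be detached again before starting the instance (assuming the same /dev/loop4 as above):

losetup -d /dev/loop4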

I’m into the VM now :slight_smile:

Build still running, over an hour now.

Is your host s390x as well?

Yes, otherwise I wouldn’t be able to start the VM (LXD only does virtualization, not emulation).

I’m running on a Canonical mainframe.

I’m SSHing into the server and then running lxc shell <instance> before running the commands you gave me.

Can you get me access to your system please?

I’ll get you the link to the host machine through chat.

It looks like the LXD instance was likely bumping up against the installed memory limit on the host machine. I re-imported the LXD image on another machine with more memory, and things are working much better.
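
For anyone hitting something similar, it may also be worth comparing the VM’s configured memory limit with what the host actually has available; a rough sketch, using the instance name from this thread and an example size:

lxc config show jammy-ibm-gt | grep limits.memory
lxc config set jammy-ibm-gt limits.memory 8GiB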

One way to confirm whether the problem was the OOM killer terminating the lxd-agent would be to look at dmesg inside the VM for evidence of the kill.
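
Something along these lines from inside the VM should surface it, if so (instance name from this thread; the exact kernel message wording varies):

lxc shell jammy-ibm-gt
dmesg | grep -i -E 'out of memory|oom'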