Couldn't boot Rocky Linux 9 VM

I'm deploying a Rocky Linux 9 VM on LXD, but the VM won't boot.

# lxc init images:rockylinux/9 rl91 --vm
# lxc start rl91

Somehow the VM never booted up. Looking at the console:

# lxc console rl91

[    3.518122] ata2: SATA link down (SStatus 0 SControl 300)
[    3.523275] ata4: SATA link down (SStatus 0 SControl 300)
[    3.527172] ata3: SATA link down (SStatus 0 SControl 300)
[    3.532115] ata5: SATA link down (SStatus 0 SControl 300)
[**    ] A start job is running for /dev/loop26p2 (2min 43s / no limit)

Even after waiting more than 10 minutes, it was still the same.

LXD 5.13 is installed from snap:
# snap list | grep lxd
lxd                        5.13-8e2d7eb      24846  latest/stable    canonical✓  -
# lxd --version 
5.13

Any ideas why the boot process gets stuck? Other VM images (rockylinux8, ubuntu, etc.) are all fine.

Is the root device now /dev/loopXX instead of /dev/sda?

rockylinux9:
[ 0.000000] Command line: BOOT_IMAGE=(hd0,gpt2)/boot/vmlinuz-5.14.0-162.23.1.el9_1.x86_64 root=/dev/loop26p2 ro console=tty1 console=ttyS0

rockylinux8:
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.18.0-425.19.2.el8_7.x86_64 root=/dev/sda2 ro console=tty1 console=ttyS0

rockylinux8 works fine and boots up properly, but rockylinux9 does not.

Any ideas? The 9-Stream, rockylinux9, and almalinux9 VM images are all in the same situation.
They are all somehow trying to boot from a loopback device (instead of /dev/sdX), and they all fail to boot.

This may be related. I've struggled over the last week to get VMs working on a new install of Rocky Linux 8 and snap LXD. I could create the VMs, but they would never start.

I had set up this same machine in late March using CentOS 7 as the physical OS, and didn't have any issues creating VMs.

I finally installed the 5.11/stable snap:
5.11/stable: 5.11-ad0b61e 2023-02-23 (24483) 149MB -

and the VMs started right up. There appears to be a QEMU bug in the latest version (or the 5.12 release) of LXD that doesn't allow VMs to start on Red Hat.
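For anyone wanting to try the same downgrade, switching the LXD snap channel should do it (channel name taken from the listing above):

sudo snap refresh lxd --channel=5.11/stable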

In my troubleshooting I installed Ubuntu 22.04 as the physical OS with snap LXD (current version) on the same hardware, and VMs started just fine.

Hey @monstermunchkin, I've confirmed this is an issue for me too.
And it's occurring with the Jammy version of QEMU (6.2.0) too, so it doesn't appear to be QEMU related.
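(For reference, the QEMU build the LXD server itself is using should show up in the server environment; a quick way to check, assuming the usual lxc info output layout:)

lxc info | grep -A1 'driver:'
# expected to print something like:
#   driver: lxc | qemu
#   driver_version: <lxc version> | <qemu version>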

Any ideas?

After more investigation, I think the difference between working and non-working VM images (mostly RHEL9-related images) is the root device configuration.
As far as I can see, a loopback device is configured as the root device in the non-working VM images (e.g. rockylinux9, almalinux9, centos9-stream).
However, the working VM images' root disk configuration is /dev/sdX, not /dev/loopX; see below.

Rockylinux8 VM

EFI stub: UEFI Secure Boot is enabled.
[    0.000000] Linux version 4.18.0-425.19.2.el8_7.x86_64 (mockbuild@dal1-prod-builder001.bld.equ.rockylinux.org) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-16) (GCC)) #1 SMP Tue Apr 4 22:38:11 UTC 2023
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.18.0-425.19.2.el8_7.x86_64 root=/dev/sda2 ro console=tty1 console=ttyS0

Rockylinux9 VM

EFI stub: UEFI Secure Boot is enabled.
[    0.000000] Linux version 5.14.0-284.11.1.el9_2.x86_64 (mockbuild@x64-builder01.almalinux.org) (gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4), GNU ld version 2.35.2-37.el9) #1 SMP PREEMPT_DYNAMIC Tue May 9 05:49:00 EDT 2023
[    0.000000] The list of certified hardware and cloud instances for Red Hat Enterprise Linux 9 can be viewed at the Red Hat Ecosystem Catalog, https://catalog.redhat.com.
[    0.000000] Command line: BOOT_IMAGE=(hd0,gpt2)/boot/vmlinuz-5.14.0-284.11.1.el9_2.x86_64 root=/dev/loop30p2 ro console=tty1 console=ttyS0

I think something in distrobuilder is causing problems, as it looks like the issue is in the image itself. In centos 9-stream, the loader entry has the following:

title CentOS Stream (5.14.0-319.el9.x86_64) 9
version 5.14.0-319.el9.x86_64
linux /boot/vmlinuz-5.14.0-319.el9.x86_64
initrd /boot/initramfs-5.14.0-319.el9.x86_64.img
options root=/dev/loop30p2 ro   console=tty1 console=ttyS0
grub_users $grub_users
grub_arg --unrestricted
grub_class centos

That entry is /boot/loader/entries/f73d769963d041ddbee2b12f3bd11d9c-5.14.0-319.el9.x86_64.conf.
Editing that loader file in the rootfs.img, setting the root option to root=/dev/sda2 ro, and reimporting the image makes it work fine.
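For anyone wanting to patch an affected image by hand before a fixed one is published, a rough sketch of that edit-and-reimport workflow (the export file names, nbd device, partition number, and alias below are my assumptions, not verified against every image):

# Export the image; for split VM images this should yield a metadata
# tarball plus a qcow2 root disk (exact output file names may vary):
lxc image export images:rockylinux/9 rl9

# Attach the qcow2 disk via qemu-nbd (from qemu-utils) and mount the
# root partition (gpt2, per the boot logs above):
sudo modprobe nbd max_part=8
sudo qemu-nbd -f qcow2 --connect=/dev/nbd0 rl9.root
sudo mount /dev/nbd0p2 /mnt

# Point the loader entries at /dev/sda2 instead of the baked-in loop device:
sudo sed -i 's|root=/dev/loop[0-9]*p2|root=/dev/sda2|' /mnt/boot/loader/entries/*.conf

# Clean up and reimport under a new alias:
sudo umount /mnt
sudo qemu-nbd --disconnect /dev/nbd0
lxc image import rl9 rl9.root --alias rockylinux9-fixed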


What was the fix in the end?

Setting root=/dev/sda2 ro instead of $kernelopts in the options line of the files in /boot/loader/entries/.
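A quick way to verify that a rebuilt image really boots from the virtual disk (the instance name here is just an example) is to check the kernel command line of a freshly launched VM:

lxc launch images:rockylinux/9 rl9-check --vm
lxc exec rl9-check -- cat /proc/cmdline
# expect root=/dev/sda2 rather than root=/dev/loopXXp2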


Thanks @monstermunchkin. I wonder if the same fix is needed not only for rockylinux9 but also for the other RHEL9-based distros, e.g. 9-Stream and almalinux9?

I will check that, and fix those if needed.


I see the same issue with all the RHEL9 clones, like Alma 9 and Oracle. They all hang on boot.

Alma and CentOS were also fixed. @monstermunchkin, does Oracle need a fix too?


Thanks @monstermunchkin and @tomp.
I've confirmed that the latest rockylinux9 image (at least as far as I checked) is fixed and the VM now boots properly.


Hey guys,
I cannot stop/delete a VM in this exact hung state (stuck at “A start job is running for /dev/loop30p2”).
How should I proceed?
Do I understand correctly that it was a matter of how the image was built?
This hung VM was created a while ago (~2-3 weeks → Created: 2023/05/19 15:18 UTC).
I remember facing this on several VM trials, but I thought it was an issue with ZFS.
If I were to do it again today, would it no longer happen?

No, we don’t publish Oracle Linux VMs, only containers.


@tomp Hi,
Sorry to insist; I just want to be sure you actually saw my question about the hung machine.
Any idea how I can delete this VM the clean way?

You can do lxc delete <instance> --force.
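Equivalently, if you prefer to stop it first (same placeholder for the instance name):

lxc stop <instance> --force
lxc delete <instance>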

As for the cause of the hang, that is more in @monstermunchkin's area.

@Tom_Jabber, what image are you using? The hung state should be fixed.