Wrong boot order for (maas generated) lxd config?

Env:
lxd latest/stable: 4.18 2021-09-06 (21468)
maas 3.0.0-10029-g.986ea3e45 (15003)

Using maas 3.0 stable, I’m creating LXD virtual machines with multiple disks.
For reference:
maas vm-host compose 11 storage='10(default),11(default),12(default)'

It appears LXD is generating the incorrect scsi-ids/BootIndex for the disks defined by maas.

With a maas generated LXD VM config as:

# lxc config show --expanded smooth-marlin
architecture: x86_64
config:
  limits.cpu: "1"
  limits.memory: "2147483648"
  limits.memory.hugepages: "false"
  security.secureboot: "false"
  volatile.eth0.host_name: tap353230a9
  volatile.eth0.hwaddr: 00:16:3e:8d:18:0e
  volatile.last_state.power: RUNNING
  volatile.uuid: 8e894963-17ee-4506-9f48-b857215f9c86
  volatile.vsock_id: "18"
devices:
  disk1:
    path: ""
    pool: default
    source: maas-04f1b9a2-03ce-4961-a31d-7d441f9f08b6
    type: disk
  disk2:
    path: ""
    pool: default
    source: maas-c708b294-41dd-4087-855c-2f9aa9d2bef4
    type: disk
  eth0:
    boot.priority: "1"
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
  root:
    boot.priority: "0"
    path: /
    pool: default
    size: "10000000000"
    type: disk
ephemeral: false
profiles: []
stateful: false
description: ""

In the resulting VM, the ‘root’ device that maas defined appears to be sdc (the 9.3G disk, which matches its size of 10000000000 bytes).

$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0 10.3G  0 disk
├─sda1   8:1    0  512M  0 part /boot/efi
└─sda2   8:2    0  9.8G  0 part /
sdb      8:16   0 11.2G  0 disk
sdc      8:32   0  9.3G  0 disk

I narrowed this down to the bootIndex/scsi-id in the qemu.conf being incorrect:

# disk1 drive
[drive "lxd_disk1"]
file = "/var/snap/lxd/common/lxd/storage-pools/default/custom/default_maas-04f1b9a2-03ce-4961-a31d-7d441f9f08b6/root.img"
format = "raw"
if = "none"
cache = "none"
aio = "native"
discard = "on"
media = "disk"
file.locking = "off"
readonly = "off"

[device "dev-lxd_disk1"]
driver = "scsi-hd"
bus = "qemu_scsi.0"
channel = "0"
scsi-id = "1"
lun = "1"
drive = "lxd_disk1"
bootindex = "1"


# disk2 drive
[drive "lxd_disk2"]
file = "/var/snap/lxd/common/lxd/storage-pools/default/custom/default_maas-c708b294-41dd-4087-855c-2f9aa9d2bef4/root.img"
format = "raw"
if = "none"
cache = "none"
aio = "native"
discard = "on"
media = "disk"
file.locking = "off"
readonly = "off"

[device "dev-lxd_disk2"]
driver = "scsi-hd"
bus = "qemu_scsi.0"
channel = "0"
scsi-id = "2"
lun = "1"
drive = "lxd_disk2"
bootindex = "2"


# root drive
[drive "lxd_root"]
file = "/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/maas_smooth-marlin/root.img"
format = "raw"
if = "none"
cache = "none"
aio = "native"
discard = "on"
media = "disk"
file.locking = "off"
readonly = "off"

[device "dev-lxd_root"]
driver = "scsi-hd"
bus = "qemu_scsi.0"
channel = "0"
scsi-id = "3"
lun = "1"
drive = "lxd_root"
bootindex = "3"

This seems to be a bug, unless the LXD configuration isn’t valid?

The boot priority only affects, well, the boot priority; it doesn’t affect the order in which we attach the device to the bus.

For bus ordering, LXD simply uses alphabetical ordering. As your root disk comes after disk1 and disk2 in alphabetical order, it’s correctly getting sdc.

In general, I’d very, very strongly recommend against making any assumptions about the /dev/sdX name. The probing order isn’t guaranteed by Linux, so even if the drives were in the expected order on the bus, a probing delay at boot can result in a different order.

Instead, you really should use one of the stable /dev/disk/by-* entries (for example /dev/disk/by-id or /dev/disk/by-uuid), which work regardless of what name the kernel gives the device at boot time.
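
As a rough illustration of that advice, here is a minimal Python sketch that lists which stable symlinks point at each kernel device name. It assumes a typical Linux guest where udev populates /dev/disk/by-id; the function name `stable_names` is made up for this example.

```python
import os


def stable_names(by_dir="/dev/disk/by-id"):
    """Map each resolved device path (e.g. /dev/sda) to the stable symlinks in by_dir."""
    names = {}
    for entry in sorted(os.listdir(by_dir)):
        link = os.path.join(by_dir, entry)
        if not os.path.islink(link):
            continue  # only symlinks are of interest here
        # Follow the symlink to the device node the kernel named at this boot.
        target = os.path.realpath(link)
        names.setdefault(target, []).append(link)
    return names
```

Calling `stable_names()` inside the VM returns a dictionary keyed by the current /dev/sdX path. The keys may change between boots, while the symlink names are the part that is meant to stay stable, so those are what should go into fstab or scripts.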

Hi @stgraber, really appreciate you getting back to me.

I think I’ve found the issue: the boot.priority is set wrong for the root/boot drive by maas. The LXD code at https://github.com/lxc/lxd/blame/master/lxd/instance/drivers/driver_qemu.go#L2324
generates a qemu bootindex (lowest first) from the LXD boot.priority (highest first). maas sets boot.priority: 0 for the root volume (in src/provisioningserver/drivers/pod/lxd.py in the maas source), which is “last”.

If I either drop the boot.priority or set it higher, I get the expected behaviour, so I’ll see if I can get that changed.
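
To make the inversion concrete, here is a minimal Python sketch of the mapping as described in this thread. It is not LXD's code. It assumes that devices with equal priority are ordered by name, and that a root disk with no boot.priority set ranks just above other devices with none; the function and variable names are invented for the example.

```python
def boot_indexes(devices):
    """Return {device_name: bootindex}; the highest boot.priority gets bootindex 0."""
    candidates = []
    for name in sorted(devices):  # assumption: ties are resolved by device name
        conf = devices[name]
        if conf.get("type") not in ("disk", "nic"):
            continue  # only disks and NICs are bootable here
        raw = conf.get("boot.priority", "")
        if raw != "":
            prio = int(raw)
        elif conf.get("path") == "/":
            prio = 1  # assumption: an unset root disk outranks other unset devices
        else:
            prio = 0
        candidates.append((name, prio))
    # list.sort is stable, also with reverse=True, so equal priorities keep name order.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return {name: index for index, (name, _) in enumerate(candidates)}


# Device config from the smooth-marlin instance in this thread (trimmed).
thread_devices = {
    "disk1": {"type": "disk", "path": ""},
    "disk2": {"type": "disk", "path": ""},
    "eth0": {"type": "nic", "boot.priority": "1"},
    "root": {"type": "disk", "path": "/", "boot.priority": "0"},
}
```

With `thread_devices`, the sketch gives eth0 bootindex 0 and then disk1, disk2 and root the values 1, 2 and 3, matching the qemu.conf above. Removing `boot.priority` from root, or raising it above 0, moves root ahead of disk1 and disk2.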

FWIW, the bus order appears to match the generated bootIndex (https://github.com/lxc/lxd/blob/master/lxd/instance/drivers/driver_qemu_templates.go#L450), which is good in my book as it’s easier to understand. To your note about not placing dependencies on the /dev/sdX name, I completely agree.

root:
    boot.priority: "0"
    path: /
    pool: default
    size: "10000000000"
    type: disk

To keep everyone in the loop, I’ve logged the maas bug:

FYI @tomp, there’s a note in there about the scsi-id for the disk devices. I understand why you have it match the bootindex, but it is slightly strange that when a NIC has bootindex=0, no disk ends up with scsi-id=0.

IMO it would be nice to decouple the two especially as a modification to the boot priority would currently renumber the devices. Not that we should be relying on such a thing (i.e. don’t mount filesystems by /dev/sdX), but I don’t think an end user would expect the two to be linked.

Also @tomp if you have the background on why boot.priority:0 was set on disk devices in maas (my guess is the NIC bootindex bug I’ve referenced), I’d appreciate your confirmation on the LP bug above.

Hi Sam,

Above you mentioned:

But then you say:

IMO it would be nice to decouple the two especially as a modification to the boot priority would currently renumber the devices

Do you mean that we should stop the NICs from influencing (causing a ‘hole’ in) the SCSI-ID sequence, as opposed to decoupling the bootindex from the SCSI-ID?

@tomp sorry, yes I’m contradicting myself a little there!

Yes, decoupling the NICs so that we don’t have a hole in the SCSI-ID sequence would be ideal. The hole just appears to be an artifact of the LXD implementation. Of course the scsi-id could still be based on the bootindex; it could just be renumbered so that the first disk’s id is always 0, regardless of how many other boot devices are added before it (e.g. ISOs, NICs).
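
The renumbering suggested here could look roughly like the following Python sketch. It is a hypothetical illustration of the proposal, not how LXD assigns scsi-id; the function name and both input dictionaries are invented for the example.

```python
def disk_scsi_ids(bootindex_by_device, type_by_device):
    """Number disks 0, 1, 2, ... in bootindex order, ignoring non-disk devices."""
    disks = [
        name for name in bootindex_by_device
        if type_by_device.get(name) == "disk"
    ]
    # Keep the boot order, but count only disks so NICs leave no gap.
    disks.sort(key=lambda name: bootindex_by_device[name])
    return {name: scsi_id for scsi_id, name in enumerate(disks)}
```

With the bootindex values from this thread (eth0 at 0, then disk1, disk2, root), the disks would get scsi-id 0, 1 and 2 instead of 1, 2 and 3, while still following the boot order.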

To speak to my first point, having the boot sequence follow scsi-ids 0, 1, 2, etc. in order is an elegant property that makes the boot sequence easier to understand. I think it is a reasonable limitation compared to raw qemu configuration, where you could have an out-of-order boot sequence.


I created an issue for this:
