KVM internal error when running Ubuntu 21.04 as lxd VM guest

This is on an Ubuntu 18.04 system with lxd 4.16 from snap, running the HWE kernel 5.4.0-77-generic #86~18.04.1-Ubuntu. The CPU is an Intel(R) Core™ i7-6770HQ @ 2.60GHz, and I’m running directly on the metal (i.e. no nested virtualization).

Problem:

# This works
lxc launch ubuntu:20.04 --vm focal-vm

# This fails
lxc launch ubuntu:21.04 --vm hirstute

After downloading the 21.04 image, lxc list shows state as “RUNNING” for a few seconds, then it changes to “ERROR”.
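For what it’s worth, I was watching the state flip with something along these lines (the -c column codes are per lxc list --help, so treat this as a sketch):

# poll the name and state columns once a second
watch -n 1 "lxc list hirstute -c ns --format csv"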

$ lxc info hirstute --show-log
Name: hirstute
Location: none
Remote: unix://
Architecture: x86_64
Created: 2021/07/15 11:00 UTC
Status: Error
Type: virtual-machine
Profiles: default
Pid: 32131
Resources:
  Processes: 0

Log:

warning: tap: open vhost char device failed: Permission denied
warning: tap: open vhost char device failed: Permission denied
KVM internal error. Suberror: 1
emulation failure
RAX=000000003ffb6400 RBX=000000003ff99290 RCX=000000003e90f398 RDX=00000000000013bd
RSI=000000002b86dad7 RDI=000000003df8d1c4 RBP=8000000000000001 RSP=000000003ff99178
R8 =0000000000000028 R9 =000000003f3f8567 R10=000000003f9ef000 R11=0000000000000000
R12=0000000000000000 R13=000000003e90f398 R14=000000002b86b01c R15=000000003ff99280
RIP=00000000000b0000 RFL=00210206 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     000000003f9de000 00000047
IDT=     000000003f3d2018 00000fff
CR0=80010033 CR2=0000000000000000 CR3=000000003fc01000 CR4=00000668
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d00
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <ff> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

Nothing interesting in the main lxd logs:

t=2021-07-15T11:38:05+0100 lvl=info msg="Downloading image" alias=21.04 fingerprint=ca7e30da2ac018cde9d47ceef9432b3c6a43f65f919b27bb78bc72c5affbe757 operation=5010ab60-d12b-4306-8e20-abdd3fc7278b server=https://cloud-images.ubuntu.com/releases trigger=/1.0/operations/5010ab60-d12b-4306-8e20-abdd3fc7278b
t=2021-07-15T11:38:24+0100 lvl=info msg="Image downloaded" alias=21.04 fingerprint=ca7e30da2ac018cde9d47ceef9432b3c6a43f65f919b27bb78bc72c5affbe757 operation=5010ab60-d12b-4306-8e20-abdd3fc7278b server=https://cloud-images.ubuntu.com/releases trigger=/1.0/operations/5010ab60-d12b-4306-8e20-abdd3fc7278b
t=2021-07-15T11:38:24+0100 lvl=info msg="Creating instance" ephemeral=false instance=hirstute instanceType=virtual-machine project=default
t=2021-07-15T11:38:24+0100 lvl=info msg="Created instance" ephemeral=false instance=hirstute instanceType=virtual-machine project=default
t=2021-07-15T11:42:27+0100 lvl=info msg="Deleting instance" created=2021-07-15T11:38:24+0100 ephemeral=false instance=hirstute instanceType=virtual-machine project=default used=2021-07-15T11:39:08+0100
t=2021-07-15T11:42:27+0100 lvl=info msg="Deleted instance" created=2021-07-15T11:38:24+0100 ephemeral=false instance=hirstute instanceType=virtual-machine project=default used=2021-07-15T11:39:08+0100

Any suggestions? I’m pretty sure my lxd config is OK, given that it runs a 20.04 virtual machine just fine, and the host system is fully up to date with its packages.

This is weird. If I do:

lxc stop --force hirstute
lxc start --console hirstute

then it comes up OK. And again:

lxc stop hirstute
lxc start hirstute

Again it comes up OK. So it’s something to do with first boot. Starting a fresh VM, this time with a console attached, the problem recurs:

$ lxc launch ubuntu:21.04 --vm --console hirstute2
Creating hirstute2
Retrieving image: Unpack: 100% (3.63GB/s)
Starting hirstute2
To detach from the console, press: <ctrl>+a q
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
error: can't find command `hwmatch'.
GRUB_FORCE_PARTUUID set, attempting initrdless boot.
EFI stub: UEFI Secure Boot is enabled.

And that’s as far as it gets before going into ERROR mode again, with lxc info hirstute2 --show-log showing what appears to be the same error as before.
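Judging by the -D flag on the qemu command lines below, the same text should also end up in the per-instance qemu log, so something like this ought to show it too:

tail -n 50 /var/snap/lxd/common/lxd/logs/hirstute2/qemu.log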

Comparing the ps command lines for the running VM and the errored VM, they are identical apart from the instance name, UUID and paths:

/snap/lxd/20987/bin/qemu-system-x86_64 -S -name hirstute -uuid 21b3c738-8915-444f-a327-1f5f20e4ac1b -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/hirstute/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/hirstute/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/hirstute/qemu.pid -D /var/snap/lxd/common/lxd/logs/hirstute/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
/snap/lxd/20987/bin/qemu-system-x86_64 -S -name hirstute2 -uuid dfafe5b0-cc7b-4ce9-aca1-f59365d75b82 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/hirstute2/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/hirstute2/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/hirstute2/qemu.pid -D /var/snap/lxd/common/lxd/logs/hirstute2/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
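(For the record, I captured those with something like the following; the bracket trick just stops grep matching itself.)

ps -eo args | grep '[q]emu-system-x86_64'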

Maybe the same as Launchpad Bug #1935880, “lxc c2-m2 focal VM causes KVM internal error durin...” (Ubuntu linux package)?

Yes, it could be; I can reproduce it that way too:

$ lxc launch ubuntu:20.04 dannf-test2 -t c2-m2 --vm
Creating dannf-test2
Starting dannf-test2
$ lxc list dannf-test2
+-------------+-------+------+------+-----------------+-----------+
|    NAME     | STATE | IPV4 | IPV6 |      TYPE       | SNAPSHOTS |
+-------------+-------+------+------+-----------------+-----------+
| dannf-test2 | ERROR |      |      | VIRTUAL-MACHINE | 0         |
+-------------+-------+------+------+-----------------+-----------+
$ lxc info dannf-test2 --show-log
Name: dannf-test2
Location: none
Remote: unix://
Architecture: x86_64
Created: 2021/07/15 15:09 UTC
Status: Error
Type: virtual-machine
Profiles: default
Pid: 3827
Resources:
  Processes: 0

Log:

warning: tap: open vhost char device failed: Permission denied
warning: tap: open vhost char device failed: Permission denied
KVM internal error. Suberror: 3
extra data[0]: 800000ec
extra data[1]: 31
extra data[2]: 83
extra data[3]: 30d10
RAX=0000000000000000 RBX=0000000000000001 RCX=0000000000000001 RDX=00000000000000f2
RSI=ffff9af6b851cba8 RDI=0000000000000001 RBP=ffffae6880077e90 RSP=ffffae6880077e78
R8 =0000000006de3213 R9 =0000000000000000 R10=0000000000001c00 R11=0000000000001c00
R12=0000000000000001 R13=ffff9af6401c8000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffffb4956e14 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 ffffffff 00c00000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 ffffffff 00c00000
DS =0000 0000000000000000 ffffffff 00c00000
FS =0000 0000000000000000 ffffffff 00c00000
GS =0000 ffff9af6b8500000 ffffffff 00c00000
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0040 fffffe0000036000 0000206f 00008b00 DPL=0 TSS64-busy
GDT=     fffffe0000034000 0000007f
IDT=     fffffe0000000000 00000fff
CR0=80050033 CR2=00000000ffffffff CR3=000000003b80a001 CR4=001606a0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000fffe0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=00 85 c0 7e 07 0f 00 2d b6 9f 4b 00 fb f4 8b 05 34 6d 78 00 <65> 44 8b 25 14 93 6b 4b 85 c0 0f 8f 85 00 00 00 5b 41 5c 41 5d 5d c3 65 8b 05 fe 92 6b 4b

Launching with --console is different, though: it boots the kernel, then immediately does a clean shutdown followed by a reboot:

To detach from the console, press: <ctrl>+a q
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
System BootOrder not found.  Initializing defaults.
Creating boot entry "Boot0007" with label "ubuntu" for file "\EFI\ubuntu\shimx64.efi"

error: can't find command `hwmatch'.
EFI stub: UEFI Secure Boot is enabled.
Linux version 5.4.0-1040-kvm (buildd@lgw01-amd64-047) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #41-Ubuntu SMP Fri May 14 20:43:17 UTC 2021 (Ubuntu 5.4.0-1040.41-kvm 5.4.114)
Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-1040-kvm root=PARTUUID=afdb79ec-8379-4b24-99fc-4136f04c295b ro console=tty1 console=ttyS0 panic=-1
...
[  OK  ] Finished File System Check on /dev/disk/by-label/UEFI.
         Mounting /boot/efi...
[  OK  ] Mounted /boot/efi.
[  OK  ] Mounted Mount unit for core18, revision 2066.
[  OK  ] Mounted Mount unit for lxd, revision 20326.
[  OK  ] Mounted Mount unit for snapd, revision 12159.
[  OK  ] Reached target Local File Systems.
         Starting Load AppArmor profiles...
         Starting Set console font and keymap...
         Starting Create final runt…dir for shutdown pivot root...
         Starting LXD - agent - 9p mount...
         Starting Tell Plymouth To Write Out Runtime Data...
         Starting Commit a transient machine-id on disk...
         Starting Create Volatile Files and Directories...
[  OK  ] Finished Create final runt…e dir for shutdown pivot root.
[  OK  ] Finished Set console font and keymap.
[  OK  ] Finished Tell Plymouth To Write Out Runtime Data.
[  OK  ] Finished Commit a transient machine-id on disk.
[  OK  ] Finished Create Volatile Files and Directories.
[  OK  ] Finished LXD - agent - 9p mount.
[  OK  ] Started LXD - agent.
         Starting Network Name Resolution...
         Starting Network Time Synchronization...
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Finished Update UTMP about System Boot/Shutdown.
[  OK  ] Removed slice system-modprobe.slice.
[  OK  ] Closed LVM2 poll daemon socket.
         Stopping Create final runt…dir for shutdown pivot root...
         Stopping Load/Save Random Seed...
[  OK  ] Removed slice system-serial\x2dgetty.slice.
[  OK  ] Stopped Wait for Network to be Configured.
[  OK  ] Stopped target User and Group Name Lookups.
[  OK  ] Stopped target Slices.
[  OK  ] Removed slice User and Session Slice.
[  OK  ] Closed Syslog Socket.
[  OK  ] Stopped target Local Encrypted Volumes.
[  OK  ] Stopped Dispatch Password …ts to Console Directory Watch.
[  OK  ] Stopped Forward Password R…uests to Wall Directory Watch.
[  OK  ] Stopped target Swap.
[  OK  ] Stopped Commit a transient machine-id on disk.
         Stopping Update UTMP about System Boot/Shutdown...
[  OK  ] Stopped Network Name Resolution.
[  OK  ] Stopped Load/Save Random Seed.
[  OK  ] Stopped Network Time Synchronization.
[  OK  ] Stopped Update UTMP about System Boot/Shutdown.
         Stopping Network Service...
[  OK  ] Stopped Create Volatile Files and Directories.
[  OK  ] Stopped Network Service.
[  OK  ] Stopped Apply Kernel Variables.
[  OK  ] Stopped Load Kernel Modules.
[  OK  ] Finished Load AppArmor profiles.
         Starting Load AppArmor pro…managed internally by snapd...
[  OK  ] Finished Load AppArmor pro…s managed internally by snapd.
[  OK  ] Stopped Create final runtime dir for shutdown pivot root.
[  OK  ] Stopped target Local File Systems.
         Unmounting /boot/efi...
         Unmounting /run/lxd_config/9p...
         Unmounting Mount unit for core18, revision 2066...
         Unmounting Mount unit for lxd, revision 20326...
         Unmounting Mount unit for snapd, revision 12159...
[  OK  ] Unmounted /boot/efi.
[FAILED] Failed unmounting /run/lxd_config/9p.
[  OK  ] Stopped File System Check on /dev/disk/by-label/UEFI.
[  OK  ] Removed slice system-systemd\x2dfsck.slice.
[  OK  ] Unmounted Mount unit for core18, revision 2066.
[  OK  ] Unmounted Mount unit for lxd, revision 20326.
[  OK  ] Unmounted Mount unit for snapd, revision 12159.
[  OK  ] Stopped target Local File Systems (Pre).
[  OK  ] Reached target Unmount All Filesystems.
         Stopping Monitoring of LVM…meventd or progress polling...
         Stopping Device-Mapper Multipath Device Controller...
[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Stopped Create System Users.
[  OK  ] Stopped Remount Root and Kernel File Systems.
[  OK  ] Stopped File System Check on Root Device.
[  OK  ] Stopped Device-Mapper Multipath Device Controller.
[  OK  ] Stopped Monitoring of LVM2… dmeventd or progress polling.
[  OK  ] Reached target Shutdown.
[  OK  ] Reached target Final Step.
[  OK  ] Finished Reboot.
[  OK  ] Reached target Reboot.
reboot: Restarting system

On reboot, it’s different:

$ lxc stop -f dannf-test2
$ lxc start dannf-test2 --console
To detach from the console, press: <ctrl>+a q
BdsDxe: loading Boot0007 "ubuntu" from HD(15,GPT,91C2E827-D1A2-410F-B5A5-24C1C3BE032C,0x2800,0x35000)/\EFI\ubuntu\shimx64.efi
BdsDxe: starting Boot0007 "ubuntu" from HD(15,GPT,91C2E827-D1A2-410F-B5A5-24C1C3BE032C,0x2800,0x35000)/\EFI\ubuntu\shimx64.efi
EFI stub: UEFI Secure Boot is enabled.
Linux version 5.4.0-1040-kvm (buildd@lgw01-amd64-047) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #41-Ubuntu SMP Fri May 14 20:43:17 UTC 2021 (Ubuntu 5.4.0-1040.41-kvm 5.4.114)
Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-1040-kvm root=PARTUUID=afdb79ec-8379-4b24-99fc-4136f04c295b ro console=tty1 console=ttyS0
...
ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
acpi PNP0A08:00: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
acpi PNP0A08:00: _OSC: not requesting OS control; OS requires [ExtendedConfig ASPM ClockPM MSI]
PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
pci_bus 0000:00: root bus resource [mem 0x7a100000-0xafffffff window]
pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfebfffff window]
pci_bus 0000:00: root bus resource [mem 0x800000000-0xfffffffff window]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:00.0: [8086:29c0] type 00 class 0x060000
pci 0000:00:01.0: [1b36:000c] type 01 class 0x060400
pci 0000:00:01.0: reg 0x10: [mem 0xc1a49000-0xc1a49fff]
pci 0000:00:01.1: [1b36:000c] type 01 class 0x060400
pci 0000:00:01.1: reg 0x10: [mem 0xc1a48000-0xc1a48fff]
pci 0000:00:01.2: [1b36:000c] type 01 class 0x060400
pci 0000:00:01.2: reg 0x10: [mem 0xc1a47000-0xc1a47fff]
pci 0000:00:01.3: [1b36:000c] type 01 class 0x060400
<< hangs here >>

And at this point it has gone into ERROR state again. It’s rather consistently broken 🙁

Aside: I was unaware of the -t instance types until now. I found this blog post about them.
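From that post, a type like c2-m2 appears to just expand to the corresponding limits.* config keys, so I’d guess this is roughly equivalent (untested):

# 2 vCPUs and 2GB of RAM, which is what c2-m2 should mean
lxc launch ubuntu:21.04 typetest --vm -c limits.cpu=2 -c limits.memory=2GB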

Does images:ubuntu/21.04/cloud work any better?

$ lxc launch images:ubuntu/21.04/cloud test2104 -t c2-m2 --vm

Yes that’s better: well, I tried it three times and it came up correctly each time. Also a shutdown and restart of the VM worked correctly.

Ok, so it may be the -kvm kernel that the official images use that’s problematic.

That’s interesting. images:ubuntu/21.04/cloud gives me 5.11.0-22-generic. So I could perhaps try the ubuntu:20.04 image, mounting it with nbd (say), chrooting and changing the kernel to generic, and see if that makes a difference.
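Roughly like this, I think; untested, and the image path is a guess that depends on the storage pool backend (this assumes a plain dir pool), as are the partition numbers:

# with the VM stopped, on the host
sudo modprobe nbd max_part=8
sudo qemu-nbd --connect=/dev/nbd0 --format=raw \
    /var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/hirstute/root.img
sudo mount /dev/nbd0p1 /mnt    # rootfs should be the first partition on these images
for d in dev proc sys; do sudo mount --bind /$d /mnt/$d; done
sudo chroot /mnt sh -c 'apt-get update && apt-get install -y linux-generic'
for d in dev proc sys; do sudo umount /mnt/$d; done
sudo umount /mnt
sudo qemu-nbd --disconnect /dev/nbd0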

If the -kvm kernel is the problem, then I see two possibilities.

  1. The -kvm kernel is generally broken. But then I’d expect this issue to be seen in other virtualization frameworks (e.g. libvirt, proxmox, openstack, etc.).
  2. There’s something about the way that lxd invokes qemu/kvm which is different from the others and tickles this bug. For example, I notice that lxd starts qemu with some sandboxing options I haven’t seen before, and there’s some virtiofsd stuff too:
root     21656  0.0  0.0  79884  3356 ?        Ssl  19:10   0:00 /snap/lxd/20987/bin/virtiofsd --socket-path=/var/snap/lxd/common/lxd/logs/test2104/virtio-fs.config.sock -o source=/var/snap/lxd/common/lxd/virtual-machines/test2104/config.mount
lxd      21711  9.3  0.7 2869232 470080 ?      Sl   19:10   0:11 /snap/lxd/20987/bin/qemu-system-x86_64 -S -name test2104 -uuid bc185073-c93f-4ef0-a8fa-05832d47f5cc -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/test2104/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/test2104/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/test2104/qemu.pid -D /var/snap/lxd/common/lxd/logs/test2104/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
root     21713  0.0  0.0 2376108 13548 ?       Sl   19:10   0:00 /snap/lxd/20987/bin/virtiofsd --socket-path=/var/snap/lxd/common/lxd/logs/test2104/virtio-fs.config.sock -o source=/var/snap/lxd/common/lxd/virtual-machines/test2104/config.mount

A kernel problem also doesn’t explain why the VM fails on first boot, but runs on second boot, as per this example:

lxc init ubuntu:21.04 --vm -t c2-m2 hirstute
lxc start --console hirstute

<<< goes into ERROR state >>>

lxc stop -f hirstute
lxc start --console hirstute

<<< works; no ssh.service, but then there's no network interface created >>>

The fact that there’s no network interface created suggests that cloud-init isn’t being run. The cloud-init packages are present:

root@ubuntu:~# dpkg-query -l | grep cloud
ii  cloud-guest-utils              0.32-18-g5b59de87-0ubuntu1                                           all          cloud guest utilities
ii  cloud-init                     21.2-3-g899bfaa9-0ubuntu2~21.04.1                                    all          initialization and customization tool for cloud instances
ii  cloud-initramfs-copymods       0.47ubuntu1                                                          all          copy initramfs modules into root filesystem for later use
ii  cloud-initramfs-dyn-netconf    0.47ubuntu1                                                          all          write a network interface file in /run for BOOTIF

but no /var/log/cloud* or /var/lib/cloud/ is created, and I see no cloud-init messages during system boot.
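A quick cross-check from inside the guest (assuming the status subcommand behaves the same in this cloud-init version):

cloud-init status --long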

The kernel bug I linked above does account for everything you listed 🙂

LXD is the only one of those which relies on UEFI with Secure Boot.
When booting with that firmware, the initial boot is different from subsequent boots because the NVRAM must be initialized, which produces a different firmware memory map on the first boot. The kernel makes a number of firmware calls early in the boot sequence, prior to calling ExitBootServices, and it’s in that code that the crash happens.

As this triggers a firmware-level crash, it crashes qemu outright rather than just panicking the guest.
As far as LXD is concerned, the VM was started already, so on your second boot the template files necessary to kick off cloud-init won’t be passed to the agent.

The UEFI firmware behavior explains why you only get this crash on first boot, and the LXD template-file behavior explains why cloud-init isn’t triggered on the second start, leaving you without network config in place.

As far as we can tell, there is a difference between the -kvm and -generic flavors of the kernel, which explains why our images work but the official ones don’t.

Thank you. By “kernel bug” do you mean #1935880? The last comment by the author says “the kernel may not be the correct component”, so I thought there was some uncertainty. However, your explanation of UEFI booting makes sense.

From previous work with cloud-init, I know that it has a default fallback datasource which is used if no other source is detected, and that’s why I was expecting it to kick in regardless. I’m still not sure why it doesn’t, but I’ll accept that there’s something about the boot environment that it expects to be there.

Actually, I can ask it:

root@ubuntu:~# DEBUG_LEVEL=2 DI_LOG=stderr /usr/lib/cloud-init/ds-identify --force
[up 19.12s] ds-identify --force
policy loaded: mode=search report=false found=all maybe=all notfound=disabled
/etc/cloud/cloud.cfg.d/90_dpkg.cfg set datasource_list: [ NoCloud, ConfigDrive, OpenNebula, DigitalOcean, Azure, AltCloud, OVF, MAAS, GCE, OpenStack, CloudSigma, SmartOS, Bigstep, Scaleway, AliYun, Ec2, CloudStack, Hetzner, IBMCloud, Oracle, Exoscale, RbxCloud, UpCloud, Vultr, None ]
DMI_PRODUCT_NAME=Standard PC (Q35 + ICH9, 2009)
DMI_SYS_VENDOR=QEMU
DMI_PRODUCT_SERIAL=
DMI_PRODUCT_UUID=f838ec59-bfad-4553-9855-9884485c4aef
PID_1_PRODUCT_NAME=unavailable
DMI_CHASSIS_ASSET_TAG=
FS_LABELS=cloudimg-rootfs,UEFI,UEFI
ISO9660_DEVS=
KERNEL_CMDLINE=BOOT_IMAGE=/boot/vmlinuz-5.11.0-1008-kvm root=PARTUUID=6aca70ea-bd07-4577-b348-d1f817dd88ae ro console=tty1 console=ttyS0
VIRT=kvm
UNAME_KERNEL_NAME=Linux
UNAME_KERNEL_RELEASE=5.11.0-1008-kvm
UNAME_KERNEL_VERSION=#8-Ubuntu SMP Fri May 14 13:05:32 UTC 2021
UNAME_MACHINE=x86_64
UNAME_NODENAME=ubuntu
UNAME_OPERATING_SYSTEM=GNU/Linux
DSNAME=
DSLIST=NoCloud ConfigDrive OpenNebula DigitalOcean Azure AltCloud OVF MAAS GCE OpenStack CloudSigma SmartOS Bigstep Scaleway AliYun Ec2 CloudStack Hetzner IBMCloud Oracle Exoscale RbxCloud UpCloud Vultr None
MODE=search
...
Checking for datasource 'UpCloud' via 'dscheck_UpCloud'
check for 'UpCloud' returned not-found[1]
Checking for datasource 'Vultr' via 'dscheck_Vultr'
check for 'Vultr' returned not-found[1]
Checking for datasource 'None' via 'dscheck_None'
check for 'None' returned not-found[1]
found= maybe=
No ds found [mode=search, notfound=disabled]. Disabled cloud-init [1]
[up 19.16s] returning 1
root@ubuntu:~#

Hmm, so even the None datasource is not accepted. And the ds-identify source says:

dscheck_None() {
    return ${DS_NOT_FOUND}
}

I’m sure I had a case before (non-lxd) where the ‘None’ datasource was being used. However, I can’t remember the details.
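If I remember right, ds-identify is skipped entirely when datasource_list is pinned to a single entry, so something like this might force the None datasource; untested here, and the file name is my own invention:

# pin the datasource list to just 'None', then clear state and retry
echo 'datasource_list: [ None ]' | sudo tee /etc/cloud/cloud.cfg.d/99-force-none.cfg
sudo cloud-init clean --logs
sudo reboot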

I’ve been hitting this too. Glad I found this discussion. 🙂

I guess I’ll switch to the linuxcontainers ubuntu images instead of the official ones. Thanks!

Interesting. I have the same bug on a new server after migrating the ubuntu VM to it; there were no issues on the old node.

It’s strange that I don’t have this error on all nodes. When I move the VM to an OVH dedicated server, I get this VM error.

I don’t have this error on my previous dedicated server. The error appears on two different dedicated OVH servers, and I have to stop and start the VM several times to get it running. Could it be a hardware/kernel combination?