VMs do not start on LXD 4.10/4.11 on aarch64 with kernel 5.10


Hi guys,
I have an issue starting VMs on LXD 4.10 (rev. 19168) and LXD 4.11 (rev. 19390) with kernel 5.10.11 on arm64, with Ubuntu 20.04 as the host. The guest has the same architecture and OS version but a different kernel version, 5.4.
I am getting the following error:

Error: Failed to run: forklimits limit=memlock:unlimited:unlimited fd=3 -- /snap/lxd/19168/bin/qemu-system-aarch64 -S -name ABCD4A -uuid 16b0b086-7d20-4489-8beb-64af07a6e2f1 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-reboot -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/ABCD4A/qemu.conf -pidfile /var/snap/lxd/common/lxd/logs/ABCD4A/qemu.pid -D /var/snap/lxd/common/lxd/logs/ABCD4A/qemu.log -chroot /var/snap/lxd/common/lxd/virtual-machines/ABCD4A -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd: char device redirected to /dev/pts/2 (label console)
: Process exited with a non-zero value
Try lxc info --show-log ABCD4A for more info

lxc info --show-log ABCD4A

Name: ABCD4A
Location: ABCD30
Remote: unix://
Architecture: aarch64
Created: 2021/02/03 17:47 UTC
Status: Stopped
Type: virtual-machine
Profiles: vm, p4A
Pid: -1
Resources:
Processes: 0
Disk usage:
root: 67.53MB

Log:

qemu-system-aarch64: …/target/arm/helper.c:1948: pmevcntr_rawwrite: Assertion `counter < pmu_num_counters(env)' failed.


Have you seen this error before? Any idea what it could be and how to solve it?
Thanks a lot in advance for your support.

Hmm, never saw that one before. Did it use to work with a previous kernel on the host?

Yes, on host nodes with kernel 5.8.5, the same VMs work and have been running stably for days.

Hmm, suggests a kernel regression of some kind…

@brauner any idea where to best report this?

Looks to be caused by this patch, backported to 5.10.8

https://patchwork.kernel.org/project/linux-arm-kernel/patch/20210107112101.2297944-2-maz@kernel.org/

So it probably should be a qemu thing

nice, good work tracking it down!

Some further info:

This does not happen on Linux 5.11. Searching through the 5.11 log, there’s another commit mentioning PMU registers:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.11&id=11663111cd49b4c6dd27479774e420f139e4c447

However, this one is not backported to the 5.10 tree.

Maybe this is where the problem lies, as QEMU will see the absent PMU registers while KVM doesn’t.
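For anyone wanting to check quickly whether a host falls into the range discussed here, a rough sketch follows. It assumes, based only on the observations in this thread, that mainline kernels from 5.10.8 up to (but not including) 5.11 may be affected. The function names are made up for illustration, and distribution kernels (e.g. Ubuntu's `5.4.0-65-generic` style) carry their own backports, so the result is only a hint.

```python
import platform
import re
from typing import Tuple


def parse_kernel_release(release: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a string such as '5.10.11' or '5.11.0-rc1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch level (e.g. '5.11') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def possibly_affected(release: str) -> bool:
    """Assumed range from this thread: 5.10.8 <= version < 5.11 (mainline numbering)."""
    version = parse_kernel_release(release)
    return (5, 10, 8) <= version < (5, 11, 0)


if __name__ == "__main__":
    host_release = platform.release()  # same string as `uname -r`
    print(host_release, "possibly affected:", possibly_affected(host_release))
```

On the hosts mentioned above, 5.10.11 would be flagged and 5.8.5 would not, which matches what was reported.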

Super, thanks a lot for this hint. :grinning:
I will recompile the kernel with 5.11 and test.