VMs frozen at boot after snap LXD update (4.24): io_uring problems

Hi,

My LXD VMs are frozen after apt update, apt upgrade & reboot.

+-------+---------+------+------+-----------------+-----------+
| NAME  |  STATE  | IPV4 | IPV6 |      TYPE       | SNAPSHOTS |
+-------+---------+------+------+-----------------+-----------+
| VM-01 | RUNNING |      |      | VIRTUAL-MACHINE | 0         |
+-------+---------+------+------+-----------------+-----------+
| VM-02 | RUNNING |      |      | VIRTUAL-MACHINE | 0         |
+-------+---------+------+------+-----------------+-----------+

  1. The VMs no longer receive an IP address.
  2. lxc list is very slow, and each running VM pins one CPU core at 100% (a rough way to check this is sketched just below this list).
  3. After lxc stop VM-01 --force, LXD no longer responds at all.
  4. I have to hard-reset the server to get through the reboot.
  5. This only affects virtual machines in LXD.
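
For reference, a rough sketch of how the 100% CPU and the hang can be observed (this assumes the snap install, where the LXD daemon runs as the snap.lxd.daemon systemd unit and each VM shows up as a qemu-system-x86_64 process):

# busiest processes first; each affected VM's qemu-system-x86_64 process sits at ~100% CPU
ps -eo pid,pcpu,etime,comm --sort=-pcpu | head -n 10

# processes stuck in uninterruptible sleep (D state) point at blocked I/O
ps -eo pid,stat,comm | awk '$2 ~ /D/'

# recent messages from the LXD daemon shipped in the snap
sudo journalctl -u snap.lxd.daemon -n 50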

Operating system on host and VMs: Ubuntu 20.04
Latest patches on the host and VMs:
libnetplan0/focal-updates 0.103-0ubuntu5~20.04.6 amd64 [upgradable from: 0.103-0ubuntu5~20.04.5]
libxml2/focal-updates,focal-security 2.9.10+dfsg-5ubuntu0.20.04.2 amd64 [upgradable from: 2.9.10+dfsg-5ubuntu0.20.04.1]
netplan.io/focal-updates 0.103-0ubuntu5~20.04.6 amd64 [upgradable from: 0.103-0ubuntu5~20.04.5]

Do you have any idea? I will try old backups now.

Stopping the VMs with --force no longer works either. The host hangs on reboot; only a hard reset gets it through shutdown. The containers are healthy and running.

Name: VM-01
Status: RUNNING
Type: virtual-machine
Architecture: x86_64
PID: 3834
Created: 2022/02/01 13:01 UTC
Last Used: 2022/03/15 17:36 UTC

Resources:
  Processes: -1
  Disk usage:
    root: 11.46GiB

Log:

warning: tap: open vhost char device failed: Permission denied

Name    Version   Rev    Tracking       Publisher   Notes
core20  20220304  1376   latest/stable  canonical✓  base
lxd     4.24      22662  latest/stable  canonical✓  -
snapd   2.54.4    15177  latest/stable  canonical✓  snapd

@stgraber @tomp It unfortunately looks like it is caused by version 4.24 in latest/stable.
Please provide some info about the changes and a workaround for this bug.

Is this related to io_uring? How can I disable it?

It may indeed be io_uring related.

Can you share:

  • lxc storage list
  • lxc info

And then for an affected instance, the content of /var/snap/lxd/common/lxd/logs/NAME/qemu.conf
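
Something like this should collect all of the above in one go (using VM-01 as the affected instance name here; adjust as needed):

lxc storage list
lxc info
# qemu.conf and qemu.log for the affected VM, at the path used by the snap package
sudo cat /var/snap/lxd/common/lxd/logs/VM-01/qemu.conf
sudo cat /var/snap/lxd/common/lxd/logs/VM-01/qemu.log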

lxc storage list:

+-------+--------+--------+-------------+---------+---------+
| NAME  | DRIVER | SOURCE | DESCRIPTION | USED BY |  STATE  |
+-------+--------+--------+-------------+---------+---------+
| local | zfs    | LXD    |             | 7       | CREATED |
+-------+--------+--------+-------------+---------+---------+

lxc info:

qemu.log:
warning: tap: open vhost char device failed: Permission denied
qemu.conf:

Please can you show the output of lxc config show <instance> --expanded, as I’m interested in the warning: tap: open vhost char device failed: Permission denied error.

Also please can you show the output of sudo dmesg | grep DENIED after trying to start one of the instances.

lxc config show VM-01 --expanded
architecture: x86_64
config:
  boot.autostart: "false"
  environment.TZ: Europe/Paris
  image.architecture: amd64
  image.description: ubuntu 20.04 LTS amd64 (release) (20210223)
  image.label: release
  image.os: ubuntu
  image.release: focal
  image.serial: "20210223"
  image.type: disk-kvm.img
  image.version: "20.04"
  limits.cpu: "4"
  limits.memory: 4096MB
  volatile.base_image: a548372a4ccb5fc4fb1243de4ba5e4b130f861bb73f40ad1b6ffb0f534f8d168
  volatile.last_state.power: STOPPED
  volatile.uuid: 19096fef-5d0d-4bb1-91b2-a27bd29f277e
  volatile.vsock_id: "13"
devices:
  eth0:
    hwaddr: 02:00:00:xx:xx:xx
    nictype: macvlan
    parent: eno3
    type: nic
  root:
    path: /
    pool: local
    size: 40GB
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

sudo dmesg | grep DENIED
no output

I see this message more often with VMs. I don’t think this is the problem.

Note: I did not run the grep after starting the VMs. I would have to hard-reset my node while the VMs are running, with a risk of corruption, so I prefer not to boot them right now.

What happens to your LXD server that causes you to need to reset it? Is it high CPU/disk I/O or a kernel crash?

What is the output of uname -a on the host?

The VMs use one core at 100% after starting. The LXD process hangs after a while and stops responding to commands. I didn't check the disk I/O after this behavior. I have 2 enterprise NVMe SSDs in the server. But I still lean towards io_uring. Can I turn this off on a VM to test it?

uname -a:
Linux 5.4.0-104-generic #118-Ubuntu SMP Wed Mar 2 19:02:41 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

I can’t disable the LXD snap either. The process hangs, so the server does not shut down cleanly: LXD gets in the way at shutdown because it is not possible to kill a VM with --force. LXD freezes after the lxc stop VM-01 --force command.

You cannot manually disable io_uring currently.

Can you show the output of lxc storage show local, please?

lxc storage show local
config:
  source: LXD
  volatile.initial_source: LXD
  zfs.pool_name: LXD
description: ""
name: local
driver: zfs
used_by:
- /1.0/images/06460ff79260729ba686608f11eb3d6eff26a72449dfd71e9d22a42f0038b897
- /1.0/instances/LC-03
- /1.0/instances/LC-02
- /1.0/instances/LC-01
- /1.0/instances/VM-02
- /1.0/instances/VM-01
- /1.0/profiles/default
status: Created
locations:
- none

And the LXD ZFS pool is backed directly onto physical disk(s) and not a loop file?

I don’t have any special configurations on my LXD servers at all.

Two NVMe SSDs, each with one ZFS partition, in a mirror. I use LUKS encryption on the SSDs.
Other than that it’s just a normal simple pool with no special configurations.

zpool status
  pool: LXD
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:07:58 with 0 errors on Tue Mar 15 20:56:40 2022
config:

        NAME          STATE     READ WRITE CKSUM
        LXD           ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            LXDNVME0  ONLINE       0     0     0
            LXDNVME1  ONLINE       0     0     0

errors: No known data errors

Where does LUKS come into the equation?

LUKS is enabled on the ZFS partitions, not on the OS partition.

This is standard LUKS encryption on the partitions. After a reboot I have to unlock both SSDs with a passphrase. Usually I disable LXD before the reboot and enable it again after unlocking.

Example with a new host:

cryptsetup luksOpen /dev/nvme0n1p5 LXDNVME0
cryptsetup luksOpen /dev/nvme1n1p5 LXDNVME1
zpool create -o ashift=12 LXD mirror /dev/mapper/LXDNVME0 /dev/mapper/LXDNVME1

and of course, before that, the partitions were formatted with:

cryptsetup luksFormat /dev/nvme0n1p5
cryptsetup luksFormat /dev/nvme1n1p5
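
For reference, a minimal sketch of the full sequence, under the assumption that the pool is imported manually after unlocking and that LXD is stopped/started via the snap (some hosts import the pool automatically once the mapper devices exist, so the exact steps may differ):

# one-time setup: format, unlock, create the mirrored pool, register it with LXD
cryptsetup luksFormat /dev/nvme0n1p5
cryptsetup luksFormat /dev/nvme1n1p5
cryptsetup luksOpen /dev/nvme0n1p5 LXDNVME0
cryptsetup luksOpen /dev/nvme1n1p5 LXDNVME1
zpool create -o ashift=12 LXD mirror /dev/mapper/LXDNVME0 /dev/mapper/LXDNVME1
lxc storage create local zfs source=LXD

# before a planned reboot: stop the LXD snap so it is not holding the pool
sudo snap stop lxd

# after the reboot: unlock both SSDs, import the pool, then start LXD again
cryptsetup luksOpen /dev/nvme0n1p5 LXDNVME0
cryptsetup luksOpen /dev/nvme1n1p5 LXDNVME1
sudo zpool import -d /dev/mapper LXD
sudo snap start lxd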

Thanks.

I’ve not reproduced the crash but I do see an issue with either starting VMs or networking. I’m looking into it.

lxc info --show-log v1
Name: v1
Status: RUNNING
Type: virtual-machine
Architecture: x86_64
PID: 11780
Created: 2022/03/16 09:56 UTC
Last Used: 2022/03/16 09:58 UTC

Resources:
  Processes: -1
  Disk usage:
    root: 1.00KiB
  Network usage:
    eth0:
      Type: broadcast
      State: UP
      Host interface: tap36ac250e
      MAC address: 00:16:3e:f6:06:a6
      MTU: 1500
      Bytes received: 506B
      Bytes sent: 0B
      Packets received: 3
      Packets sent: 0
      IP addresses:
        inet6: fd42:bef9:ad24:28f0:216:3eff:fef6:6a6/64 (global)

Log:

warning: tap: open vhost char device failed: Permission denied
warning: tap: open vhost char device failed: Permission denied

Which is weird, as we run daily tests of this on Focal using the snap for various network types (mine is failing on just the normal bridged NIC type).

https://jenkins.linuxcontainers.org/job/lxd-test-network/420/console

OK, so the warning: tap: open vhost char device failed: Permission denied message is a red herring; it also shows up when everything works OK.

Interestingly, I’ve recreated the problem when using ZFS on top of LVM.
I’ve not tried ZFS direct to disk yet. Is this something you can try (i.e. creating a ZFS pool without LUKS)?
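
If it helps, a sketch of that test, assuming a spare unencrypted partition such as /dev/nvme0n1p6 is available (the partition, pool, and instance names here are only placeholders):

# create a ZFS pool directly on the spare partition, without LUKS, and register it with LXD
zpool create -o ashift=12 LXDTEST /dev/nvme0n1p6
lxc storage create test zfs source=LXDTEST

# with VM-01 stopped, copy it onto the new pool and try starting the copy
lxc copy VM-01 VM-01-test -s test
lxc start VM-01-test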

I’ve also confirmed it’s working on a loop-backed ZFS pool (we already disable io_uring in that case).
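
That also suggests a possible interim test without a custom build: put a copy of an affected VM on a loop-backed ZFS pool, where io_uring is already disabled, then use the same copy/start steps as in the sketch above (the pool name and size here are just examples):

# size= makes LXD create a loop-file-backed ZFS pool, for which io_uring is disabled
lxc storage create looptest zfs size=30GiB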

Disabling io_uring fixes it (via a custom build), but I want to see if there is something else at play here.
Generally, though, io_uring seems to be very unreliable with QEMU on stacked storage layers.

Sure, I will test it in a Hyper-V VM.