VM won't start - empty console output

Hi everyone,

My virtual machine suddenly stopped working. I’ve been looking for similar problems, but most people see at least some output when they start their virtual machines with the --console parameter. When I start mine with --console, the output is completely empty. Any clues or things I could try would be helpful.

Output looks like this:

To detach from the console, press: <ctrl>+a q

I also followed the suggestion to install virt-viewer and view the VM’s display with lxc console yourvmhere --type=vga, but all I get is:

Guest has not initialized the display (yet):

(screenshot of the otherwise blank virt-viewer window)

lxc info vm --show-log

Status: RUNNING
Type: virtual-machine
Architecture: x86_64
PID: 30483
Created: 2022/05/02 16:35 CEST
Last Used: 2022/12/03 21:19 CET

Resources:
  Processes: -1
  Disk usage:
    root: 25.41GiB
  Network usage:
    eth0:
      Type: broadcast
      State: UP
      Host interface: tapfbe92983
      MAC address: 00:16:3e:1c:3e:23
      MTU: 1500
      Bytes received: 42B
      Bytes sent: 0B
      Packets received: 1
      Packets sent: 0
      IP addresses:

Log:

I’ve noticed that another virtual machine has now also gone down and suffers from the same symptoms…

lxd --version → 5.8
QEMU 7.1.0

I can create and start new virtual machines.
Could it be that LXD has been upgraded and now has trouble with virtual machines created by an earlier version?
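
On an RPM-based host, something like this would show when the lxd package itself was last updated:

rpm -qi lxd | grep -i "install date"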

How long have the VMs been running? There aren’t any known issues with VMs on 5.8.

Also using the snap?
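
If you’re not sure, something like this would show which one is in use:

snap list lxd
which lxd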

Hi Thomas,

No snap here.

I created the first VM on 2022/05/02 and the second one on 2022/06/02. No (at least visible) problems until now. It’s a transactional server, so it updates itself and reboots frequently; uptimes are rarely longer than a few days.

BTW, containers work as usual; only the VMs went down.

I forgot to mention: the first VM runs openSUSE (15.3), the second one is Win10. I first noticed the openSUSE VM was down because there was no IPv4 address listed as usual in the lxc list table. Win10 seemed to be OK, but maybe that was just an artifact and it was already broken too.

Do you see the qemu processes associated with the UUID shown by lxc config show (instance) --expanded?

Do you see anything in dmesg? Perhaps an apparmor denial?
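
For example, assuming the instance is called vm, something along these lines would show the UUID, the running qemu processes, and any AppArmor denials:

lxc config show vm --expanded | grep volatile.uuid
ps aux | grep -i qemu-system
dmesg -T | grep -i 'apparmor="DENIED"'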

QEMU 7.1.0 is what we use in the snap.
But we build our own for the snap, so there could be a difference if you’re using a system package.
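
If it is a system package, something like this would show exactly which QEMU build is in use (assuming an RPM-based host):

qemu-system-x86_64 --version
rpm -qf $(which qemu-system-x86_64)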

Do new VMs start?

Yeah, both processes are running with the same UUID as returned by lxc config show --expanded.

dmesg | grep apparmor

[    1.600181] evm: security.apparmor
[   15.704248] audit: type=1400 audit(1670038456.907:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lsb_release" pid=1114 comm="apparmor_parser"
[   15.746152] audit: type=1400 audit(1670038456.947:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1116 comm="apparmor_parser"
[   15.747713] audit: type=1400 audit(1670038456.947:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1116 comm="apparmor_parser"
[   15.749004] audit: type=1400 audit(1670038456.951:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="klogd" pid=1123 comm="apparmor_parser"
[   15.982751] audit: type=1400 audit(1670038457.183:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/lessopen.sh" pid=1126 comm="apparmor_parser"
[   15.984308] audit: type=1400 audit(1670038457.187:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/lxc-start" pid=1127 comm="apparmor_parser"
[   15.986717] audit: type=1400 audit(1670038457.187:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="dovecot-anvil" pid=1129 comm="apparmor_parser"
[   16.209665] audit: type=1400 audit(1670038457.411:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default" pid=1115 comm="apparmor_parser"
[   16.209674] audit: type=1400 audit(1670038457.411:10): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default-cgns" pid=1115 comm="apparmor_parser"
[   16.209678] audit: type=1400 audit(1670038457.411:11): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default-with-mounting" pid=1115 comm="apparmor_parser"

Nothing wrong there

Yes, new VMs start without any trouble. I can start, stop, and restart them… That boggles me.

Try stopping the VM (with --force if needed) and then starting it with the --console flag to see the boot output (non-VGA).
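
Something like this, with the instance from your earlier output:

lxc stop vm --force
lxc start vm --console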

Hmmm, I noticed something new. The newly created virtual machine is now also broken. I will play around with the VMs and get back to you after I have checked this out reliably. Thanks for your help, Thomas.

Please show the instance’s expanded config.

The configs are as follows:

architecture: x86_64
config:
  image.architecture: amd64
  image.description: Opensuse 15.3 amd64 (20220502_04:49)
  image.os: Opensuse
  image.release: "15.3"
  image.serial: "20220502_04:49"
  image.type: disk-kvm.img
  image.variant: default
  limits.cpu: "16"
  limits.memory: 8GiB
  volatile.base_image: cc04c4d48c73c56ec312a7a96071d0d9f576427d23e87fa07c57c5aecd525f62
  volatile.eth0.host_name: tapde687920
  volatile.eth0.hwaddr: 00:16:3e:1c:3e:23
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: 48260c83-2d9f-4075-8df9-ae4640c9542a
  volatile.vsock_id: "94"
devices:
  backup:
    path: /mnt/backup/
    source: /var/backup/gitlab/
    type: disk
  eth0:
    ipv4.address: 172.16.0.2
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    size: 32GiB
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

Win10

architecture: x86_64
config:
  limits.cpu: "4"
  limits.memory: 4GiB
  security.secureboot: "false"
  volatile.cloud-init.instance-id: 92b90ea0-11ea-48f6-a0ec-f8ad6bb87775
  volatile.eth0.host_name: tap9e8880e0
  volatile.eth0.hwaddr: 00:16:3e:e8:c6:ab
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: 40e74e31-7361-434c-a15d-fae1d422b2d6
  volatile.vsock_id: "102"
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    size: 100GiB
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

Try removing the limits.cpu setting and see if that helps. LXD 5.8 changed how multiple CPUs are added, to support CPU hotplugging. Just a hunch.
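
Roughly like this, with the instance stopped first:

lxc stop vm --force
lxc config unset vm limits.cpu
lxc start vm --console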

OK, so I typed:

lxc init images:opensuse/15.4 vmtest --vm
Creating vmtest
Retrieving image: Unpack: 100% (1.82GB/s)

lxc list

| vmtest          | STOPPED |                      |      | VIRTUAL-MACHINE | 0         |

lxc start vmtest --console

BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
Welcome to GRUB!
  Booting `openSUSE Leap 15.4'

Loading Linux 5.14.21-150400.24.33-default ...
Loading initial ramdisk ...
[    0.000000][    T0] Linux version 5.14.21-150400.24.33-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.37.20211103-150100.7.37) #1 SMP PREEMPT_DYNAMIC Fri Nov 4 13:55:06 UTC 2022 (76cfe60)

(...)
[FAILED] Failed to start LXD - agent.
See 'systemctl status lxd-agent.service' for details.
(...)

[  OK  ] Finished Record Runlevel Change in UTMP.

vmtest login:

But I then launched a shell, and by that point the LXD agent was OK:

● lxd-agent.service - LXD - agent
     Loaded: loaded (/usr/lib/systemd/system/lxd-agent.service; enabled; vendor preset: disabled)
     Active: active (running) since Sat 2022-12-03 21:50:58 UTC; 3min 30s ago
       Docs: https://linuxcontainers.org/lxd
    Process: 543 ExecStartPre=/usr/lib/systemd/lxd-agent-setup (code=exited, status=0/SUCCESS)
   Main PID: 553 (lxd-agent)
      Tasks: 11 (limit: 1103)
     CGroup: /system.slice/lxd-agent.service
             ├─  553 /run/lxd_agent/lxd-agent
             ├─ 1040 bash
             ├─ 1062 systemctl status lxd-agent.service
             └─ 1063 "(pager)"

Dec 03 21:50:58 localhost systemd[1]: Starting LXD - agent...
Dec 03 21:50:58 vmtest systemd[1]: Started LXD - agent.

Now I try to stop vmtest (lxc stop vmtest) and it stops cleanly.

| vmtest          | STOPPED |                      |      | VIRTUAL-MACHINE | 0         |

But when I start it again with lxc start vmtest --console, it won’t boot, so perhaps I was wrong that I can start and stop virtual machines.

To detach from the console, press: <ctrl>+a q

lxc list

| vmtest          | RUNNING | 172.16.0.200 (eth0)  |      | VIRTUAL-MACHINE | 0         |