M11
(Michał Policht)
December 2, 2022, 8:58pm
1
Hi everyone,
My virtual machine suddenly stopped working. I’ve been looking for similar problems, but most people see some output when they start their virtual machines with the --console parameter. When I start with --console, the output is completely empty. Any clues or things I could try would be helpful.
Output looks like this:
To detach from the console, press: <ctrl>+a q
WereCatf
(Were Catf)
December 2, 2022, 10:48pm
2
Install virt-viewer, then view the VM’s display with lxc console yourvmhere --type=vga
M11
(Michał Policht)
December 3, 2022, 8:22pm
3
Guest has not initialized the display (yet):
lxc info vm --show-log
Status: RUNNING
Type: virtual-machine
Architecture: x86_64
PID: 30483
Created: 2022/05/02 16:35 CEST
Last Used: 2022/12/03 21:19 CET

Resources:
  Processes: -1
  Disk usage:
    root: 25.41GiB
  Network usage:
    eth0:
      Type: broadcast
      State: UP
      Host interface: tapfbe92983
      MAC address: 00:16:3e:1c:3e:23
      MTU: 1500
      Bytes received: 42B
      Bytes sent: 0B
      Packets received: 1
      Packets sent: 0
  IP addresses:

Log:
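One detail worth flagging in the output above: for a VM, `Processes: -1` usually means LXD could not reach the lxd-agent inside the guest, which would also fit the empty log and missing IP addresses. A minimal sketch of pulling that field out, where the sample text stands in for real `lxc info vm` output (on a live host you would pipe `lxc info vm` itself):

```shell
# Sample lines standing in for `lxc info vm` output (values from this thread).
info='Status: RUNNING
Type: virtual-machine
Processes: -1'
# Extract the process count; -1 indicates the lxd-agent is unreachable,
# while a healthy VM reports a positive count.
procs=$(printf '%s\n' "$info" | awk '/Processes:/ {print $2}')
echo "$procs"
```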
I’ve noticed that another virtual machine has now also gone down and suffers from the same symptoms…
lxd --version
→ 5.8
QEMU 7.1.0
I can create and start new virtual machines.
Could it be that LXD has been upgraded and has trouble with virtual machines created by an earlier version?
tomp
(Thomas Parrott)
December 3, 2022, 8:55pm
4
How long have the VMs been running? There aren’t any known issues with VMs on 5.8.
M11
(Michał Policht)
December 3, 2022, 9:08pm
6
Hi Thomas,
No snap here.
I created the first VM on 2022/05/02 and the second one on 2022/06/02. No (at least visible) problems until now. It’s a transactional server, so it updates itself and reboots frequently; uptimes are rarely longer than a few days.
BTW containers work as usual, only the VMs went down.
I forgot to mention: the first VM runs openSUSE (15.3), the second one Win10. I first noticed openSUSE was down because there was no IPv4 address listed as usual in the lxc list table. Win10 seemed to be OK, but maybe that was only an artifact and it was already broken too.
tomp
(Thomas Parrott)
December 3, 2022, 9:15pm
7
Do you see the qemu processes associated with the uuid in lxc config show (instance) --expanded?
tomp
(Thomas Parrott)
December 3, 2022, 9:15pm
8
Do you see anything in dmesg? Perhaps an apparmor denial?
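For reference, actual AppArmor denials (as opposed to the harmless STATUS profile-load lines) show up in dmesg as apparmor="DENIED". A small sketch of filtering for them, using hypothetical sample lines in place of real `dmesg` output (the lxd_forkproxy profile name is made up for illustration):

```shell
# Hypothetical dmesg sample: one harmless STATUS line, one real denial.
dmesg_sample='audit: type=1400 apparmor="STATUS" operation="profile_load" profile="unconfined" name="klogd"
audit: type=1400 apparmor="DENIED" operation="open" profile="lxd_forkproxy" name="/dev/null"'
# On a real host the equivalent would be: dmesg | grep 'apparmor="DENIED"'
printf '%s\n' "$dmesg_sample" | grep 'apparmor="DENIED"'
```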
tomp
(Thomas Parrott)
December 3, 2022, 9:16pm
9
QEMU 7.1.0 is what we use in the snap.
But we build our own in the snap, so there could be a difference if you’re using a system package.
M11
(Michał Policht)
December 3, 2022, 9:23pm
11
Yeah, both processes are running with the same uuid as returned by lxc config show --expanded
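A sketch of that cross-check: the instance UUID from volatile.uuid appears on the QEMU command line, so the two can be matched. The uuid below is the one from the openSUSE VM’s config in this thread; the process line is a hypothetical stand-in for real `ps` output (on a live host: `ps -eo args | grep "[q]emu.*$uuid"`).

```shell
# volatile.uuid of the openSUSE VM, as shown in its config in this thread.
uuid='48260c83-2d9f-4075-8df9-ae4640c9542a'
# Hypothetical QEMU command line standing in for `ps -eo args` output.
ps_line='/usr/bin/qemu-system-x86_64 -name vm -uuid 48260c83-2d9f-4075-8df9-ae4640c9542a'
# A match (count 1) confirms the running process belongs to this instance.
printf '%s\n' "$ps_line" | grep -c "$uuid"
```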
M11
(Michał Policht)
December 3, 2022, 9:24pm
12
dmesg | grep apparmor
[ 1.600181] evm: security.apparmor
[ 15.704248] audit: type=1400 audit(1670038456.907:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lsb_release" pid=1114 comm="apparmor_parser"
[ 15.746152] audit: type=1400 audit(1670038456.947:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1116 comm="apparmor_parser"
[ 15.747713] audit: type=1400 audit(1670038456.947:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1116 comm="apparmor_parser"
[ 15.749004] audit: type=1400 audit(1670038456.951:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="klogd" pid=1123 comm="apparmor_parser"
[ 15.982751] audit: type=1400 audit(1670038457.183:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/lessopen.sh" pid=1126 comm="apparmor_parser"
[ 15.984308] audit: type=1400 audit(1670038457.187:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/lxc-start" pid=1127 comm="apparmor_parser"
[ 15.986717] audit: type=1400 audit(1670038457.187:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="dovecot-anvil" pid=1129 comm="apparmor_parser"
[ 16.209665] audit: type=1400 audit(1670038457.411:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default" pid=1115 comm="apparmor_parser"
[ 16.209674] audit: type=1400 audit(1670038457.411:10): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default-cgns" pid=1115 comm="apparmor_parser"
[ 16.209678] audit: type=1400 audit(1670038457.411:11): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default-with-mounting" pid=1115 comm="apparmor_parser"
M11
(Michał Policht)
December 3, 2022, 9:25pm
14
Yes, new VMs start without any trouble. I can start, stop, and restart them… That boggles me.
tomp
(Thomas Parrott)
December 3, 2022, 9:26pm
15
Try stopping the VM with --force if needed, and then starting with the --console flag to see the boot output (non-VGA).
M11
(Michał Policht)
December 3, 2022, 9:28pm
16
Hmmm, I noticed something new. Newly created virtual machines are now also broken. I will play around with VMs and get back to you after I have checked this out reliably. Thanks for your help, Thomas.
tomp
(Thomas Parrott)
December 3, 2022, 9:30pm
17
Please show the instance’s expanded config.
M11
(Michał Policht)
December 3, 2022, 9:38pm
18
The configs are as follows:
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Opensuse 15.3 amd64 (20220502_04:49)
  image.os: Opensuse
  image.release: "15.3"
  image.serial: "20220502_04:49"
  image.type: disk-kvm.img
  image.variant: default
  limits.cpu: "16"
  limits.memory: 8GiB
  volatile.base_image: cc04c4d48c73c56ec312a7a96071d0d9f576427d23e87fa07c57c5aecd525f62
  volatile.eth0.host_name: tapde687920
  volatile.eth0.hwaddr: 00:16:3e:1c:3e:23
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: 48260c83-2d9f-4075-8df9-ae4640c9542a
  volatile.vsock_id: "94"
devices:
  backup:
    path: /mnt/backup/
    source: /var/backup/gitlab/
    type: disk
  eth0:
    ipv4.address: 172.16.0.2
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    size: 32GiB
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
Win10
architecture: x86_64
config:
  limits.cpu: "4"
  limits.memory: 4GiB
  security.secureboot: "false"
  volatile.cloud-init.instance-id: 92b90ea0-11ea-48f6-a0ec-f8ad6bb87775
  volatile.eth0.host_name: tap9e8880e0
  volatile.eth0.hwaddr: 00:16:3e:e8:c6:ab
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: 40e74e31-7361-434c-a15d-fae1d422b2d6
  volatile.vsock_id: "102"
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    size: 100GiB
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
tomp
(Thomas Parrott)
December 3, 2022, 9:48pm
19
Try removing the limits.cpu setting and see if that helps. LXD 5.8 changed how multiple CPUs are added, to support CPU hotplugging. Just a hunch.
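For anyone following along, a sketch of testing that hunch. The unset/restart commands are standard LXD CLI; the sample line stands in for `lxc config show vm --expanded` output, using the value from the openSUSE config in this thread:

```shell
# Sample expanded-config line (real source: lxc config show vm --expanded).
config='limits.cpu: "16"'
# Extract the suspect setting to confirm it is actually set.
cpus=$(printf '%s\n' "$config" | sed -n 's/^limits\.cpu: "\(.*\)"$/\1/p')
echo "$cpus"
# On a real host, the actual test would then be:
#   lxc config unset vm limits.cpu
#   lxc restart vm --force
```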
M11
(Michał Policht)
December 3, 2022, 9:59pm
20
O.K. so I typed:
lxc init images:opensuse/15.4 vmtest --vm
Creating vmtest
Retrieving image: Unpack: 100% (1.82GB/s)
lxc list
| vmtest | STOPPED | | | VIRTUAL-MACHINE | 0 |
lxc start vmtest --console
BdsDxe: loading Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/Scsi(0x0,0x1)
Welcome to GRUB!
Booting `openSUSE Leap 15.4'
Loading Linux 5.14.21-150400.24.33-default ...
Loading initial ramdisk ...
[ 0.000000][ T0] Linux version 5.14.21-150400.24.33-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.37.20211103-150100.7.37) #1 SMP PREEMPT_DYNAMIC Fri Nov 4 13:55:06 UTC 2022 (76cfe60)
(...)
[FAILED] Failed to start LXD - agent.
See 'systemctl status lxd-agent.service' for details.
(...)
[ OK ] Finished Record Runlevel Change in UTMP.
vmtest login:
But when I launch a shell, the LXD agent is OK by then:
● lxd-agent.service - LXD - agent
Loaded: loaded (/usr/lib/systemd/system/lxd-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2022-12-03 21:50:58 UTC; 3min 30s ago
Docs: https://linuxcontainers.org/lxd
Process: 543 ExecStartPre=/usr/lib/systemd/lxd-agent-setup (code=exited, status=0/SUCCESS)
Main PID: 553 (lxd-agent)
Tasks: 11 (limit: 1103)
CGroup: /system.slice/lxd-agent.service
├─ 553 /run/lxd_agent/lxd-agent
├─ 1040 bash
├─ 1062 systemctl status lxd-agent.service
└─ 1063 "(pager)"
Dec 03 21:50:58 localhost systemd[1]: Starting LXD - agent...
Dec 03 21:50:58 vmtest systemd[1]: Started LXD - agent.
Now I try to stop vmtest (lxc stop vmtest) and it stops cleanly.
| vmtest | STOPPED | | | VIRTUAL-MACHINE | 0 |
But when I start it again with lxc start vmtest --console, it won’t boot, so perhaps I was wrong that I can start and stop virtual machines.
To detach from the console, press: <ctrl>+a q
lxc list
| vmtest | RUNNING | 172.16.0.200 (eth0) | | VIRTUAL-MACHINE | 0 |