Can't start VM: "Failed to run: forklimits limit=memlock:unlimited:unlimited" (LXD 4.0.9 LTS snap)

LXD version 4.0.9 (snap) on Ubuntu 18, clustered. This originally happened with a Windows VM; I have now created a test Ubuntu VM.

Basically, the VM was rebooted after half a year of uptime and can no longer start. I tried updating and rebooting one server, but that didn't help.

I had a hunch that this might be related to the bridged network, as in "Bridged networking on Ubuntu Server with systemd-networkd instead network-manager?" (Multipass, Ubuntu Community Hub):

Dec 01 20:21:37 blazar-linux.int.o4.lt systemd-networkd[270475]: tap429e8661: Link DOWN
Dec 01 20:21:37 blazar-linux.int.o4.lt networkd-dispatcher[298143]: ERROR:Unknown interface index 104 seen even after reload
Dec 01 20:21:37 blazar-linux.int.o4.lt networkd-dispatcher[298143]: WARNING:Unknown index 104 seen, reloading interface list
Dec 01 20:21:37 blazar-linux.int.o4.lt networkd-dispatcher[298143]: ERROR:Unknown interface index 104 seen even after reload
Dec 01 20:21:37 blazar-linux.int.o4.lt networkd-dispatcher[298143]: WARNING:Unknown index 104 seen, reloading interface list
Dec 01 20:21:37 blazar-linux.int.o4.lt networkd-dispatcher[298143]: ERROR:Unknown interface index 104 seen even after reload
Dec 01 20:21:37 blazar-linux.int.o4.lt networkd-dispatcher[298143]: WARNING:Unknown index 104 seen, reloading interface list
Dec 01 20:21:37 blazar-linux.int.o4.lt networkd-dispatcher[298143]: ERROR:Unknown interface index 104 seen even after reload

So I later removed the network profile, but the problem persists.

lxc start ubuntu
Error: Failed to run: forklimits limit=memlock:unlimited:unlimited -- /snap/lxd/23991/bin/qemu-system-x86_64 -S -name ubuntu -uuid 44c9b9b0-7bca-4ccd-962d-5979b377907c -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/ubuntu/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/ubuntu/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/ubuntu/qemu.pid -D /var/snap/lxd/common/lxd/logs/ubuntu/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd: : Process exited with non-zero value -1

lxc info --show-log ubuntu doesn't show any logs.
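
The QEMU command line in the error message passes -D /var/snap/lxd/common/lxd/logs/ubuntu/qemu.log, so that file is another place to look when lxc info --show-log comes back empty. A minimal sketch, assuming other instances on a snap install use the same per-instance log directory layout (the function name and the LOG_ROOT variable are made up for illustration):

```shell
#!/bin/sh
# Print the QEMU log for an instance, if it exists and is non-empty.
# LOG_ROOT matches the path in the error message; override it for other installs.
LOG_ROOT=${LOG_ROOT:-/var/snap/lxd/common/lxd/logs}

show_qemu_log() {
    log="$LOG_ROOT/$1/qemu.log"
    if [ -s "$log" ]; then
        cat "$log"
    else
        echo "no QEMU log output at $log"
    fi
}

show_qemu_log ubuntu
```

If the file is empty as well, QEMU was probably stopped before it could write anything, and the kernel log is the next place to check.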

[Thu Dec 1 19:57:25 2022] rbd10: p1 p14 p15
[Thu Dec 1 19:57:25 2022] rbd: rbd10: capacity 35000000000 features 0x1
[Thu Dec 1 19:57:25 2022] rbd: rbd11: capacity 104857600 features 0x1
[Thu Dec 1 19:57:25 2022] EXT4-fs (rbd11): mounted filesystem with ordered data mode. Opts: discard
[Thu Dec 1 19:57:25 2022] ext4 filesystem being mounted at /var/snap/lxd/common/lxd/storage-pools/ceph-lxd/virtual-machines/ubuntu supports timestamps until 2038 (0x7fffffff)
[Thu Dec 1 19:57:25 2022] audit: type=1400 audit(1669925522.472:3156): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxd-ubuntu_</var/snap/lxd/common/lxd>" pid=394022 comm="apparmor_parser"
[Thu Dec 1 19:57:25 2022] audit: type=1326 audit(1669925522.532:3157): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=394023 comm="qemu-system-x86" exe="/snap/lxd/23991/bin/qemu-system-x86_64" sig=31 arch=c000003e syscall=56 compat=0 ip=0x7f6ae64f7f3f code=0x80000000

Dec 01 20:18:03 blazar-linux.int.o4.lt systemd[3111092]: Started snap.lxd.lxc.236d5c2f-44bc-4b4f-b590-12720e906c2d.scope.
Dec 01 20:18:03 blazar-linux.int.o4.lt kernel:  rbd10: p1 p14 p15
Dec 01 20:18:03 blazar-linux.int.o4.lt kernel: rbd: rbd10: capacity 35000000000 features 0x1
Dec 01 20:18:03 blazar-linux.int.o4.lt kernel: rbd: rbd11: capacity 104857600 features 0x1
Dec 01 20:18:03 blazar-linux.int.o4.lt kernel: EXT4-fs (rbd11): mounted filesystem with ordered data mode. Opts: discard
Dec 01 20:18:03 blazar-linux.int.o4.lt kernel: ext4 filesystem being mounted at /var/snap/lxd/common/lxd/storage-pools/ceph-lxd/virtual-machines/ubuntu supports timestamps until 2038 (0x7fffffff)
Dec 01 20:18:04 blazar-linux.int.o4.lt audit[397773]: AVC apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxd-ubuntu_</var/snap/lxd/common/lxd>" pid=397773 comm="apparmor_parser"
Dec 01 20:18:04 blazar-linux.int.o4.lt kernel: audit: type=1400 audit(1669925884.044:3164): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxd-ubuntu_</var/snap/lxd/common/lxd>" pid=397773 comm="apparmor_parser"
Dec 01 20:18:04 blazar-linux.int.o4.lt audit[397774]: SECCOMP auid=4294967295 uid=0 gid=0 ses=4294967295 pid=397774 comm="qemu-system-x86" exe="/snap/lxd/23991/bin/qemu-system-x86_64" sig=31 arch=c000003e syscall=56 compat=0 ip=0x7f103bf32f3f code=0x80000000
Dec 01 20:18:04 blazar-linux.int.o4.lt kernel: audit: type=1326 audit(1669925884.108:3165): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=397774 comm="qemu-system-x86" exe="/snap/lxd/23991/bin/qemu-system-x86_64" sig=31 arch=c000003e syscall=56 compat=0 ip=0x7f103bf32f3f code=0x80000000

Do you get the same problem if you refresh to the 5.0/stable snap channel?

We can’t refresh LXD from 4.0.9-eb5e237 (23991, 4.0/stable) to 5.0/stable because this is a production server with lots of virtual machines and containers running, and we can’t allow downtime for the other VMs/containers.
Do you have any other suggestions for fixing this important issue without refreshing LXD to 5.0 or restarting the server?

What does snap info lxd show?

I don’t think that LXD version has been updated for a long time, which then suggests something else has changed on that server.

But let’s check.

Also just as a side note, the LXD 4.0 LTS series is only receiving security bug fixes now, so for continued general bug fix/environmental change support you need to be running the LXD 5.0 LTS series.

See Managing the LXD snap for more info about the different snap channels.

Do you know of anything that has changed on that server recently? Any updates applied?

Can new VMs be launched?

Looking at the snap change log it seems there were some cherry picks of dependency updates 10 days ago on the 22nd Nov, and the latest 4.0/stable package was built on 25th Nov so would include those changes.

I suspect the qemu change is the most likely candidate for the breakage.

I’ll have a look and see if I can recreate.

Please can you provide lxc config show <instance> --expanded and lxc storage show <pool> and lxc network show <network> for the relevant instance, pool and network.
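
To collect all three outputs in one go, something like the following could be used. This is only a sketch: the function name is hypothetical, and the LXC variable exists purely so the commands can be previewed (LXC=echo) before anything is run.

```shell
#!/bin/sh
# Gather the instance, storage pool and network configuration into one report.
# Set LXC=echo to print the lxc arguments instead of running lxc.
LXC=${LXC:-lxc}

collect_lxd_info() {
    instance=$1 pool=$2 network=$3
    echo "== instance: $instance =="
    $LXC config show "$instance" --expanded
    echo "== storage pool: $pool =="
    $LXC storage show "$pool"
    echo "== network: $network =="
    $LXC network show "$network"
}

# Example: collect_lxd_info <instance> <pool> <network> > lxd-report.txt
```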

Thanks

I just recreated the same issue on the LXD 4.0/stable channel:

snap install lxd --channel=4.0/stable
lxd init --auto
lxc launch images:ubuntu/focal v1 --vm
Creating v1
Starting v1                                   
Error: Failed to run: forklimits limit=memlock:unlimited:unlimited -- /snap/lxd/23991/bin/qemu-system-x86_64 -S -name v1 -uuid d44bab24-1a63-4f5a-b072-d5eef160a1aa -daemonize -cpu host,hv_passthrough -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/v1/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/v1/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/v1/qemu.pid -D /var/snap/lxd/common/lxd/logs/v1/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd: : Process exited with non-zero value -1
Try `lxc info --show-log local:v1` for more info

lxc info --show-log local:v1
Name: v1
Location: none
Remote: unix://
Architecture: x86_64
Created: 2022/12/02 11:17 UTC
Status: Stopped
Type: virtual-machine
Profiles: default

Log:

What does sudo snap changes show for LXD? I wonder if we can get a past revision.

New VMs can’t be launched either - the same error :frowning:
We have the latest 4.0.9 release from the 4.0/stable channel (updated about a week ago):
snap list |grep lxd
lxd 4.0.9-eb5e237 23991 4.0/stable canonical** in-cohort

lxc config show w10-terminal-ssd --expanded

architecture: x86_64
config:
  boot.autostart: "true"
  boot.autostart.priority: "195"
  limits.cpu: "8"
  limits.memory: 24GB
  security.secureboot: "false"
  volatile.eth0.hwaddr: 00:16:3e:8d:2a:6e
  volatile.last_state.power: STOPPED
  volatile.uuid: ccfe6235-1186-446c-9e15-a28ee1b2a21a
  volatile.vsock_id: "662"
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
  root:
    path: /
    pool: ssd
    size: 240GB
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

lxc storage show ssd

config: {}
description: ""
name: ssd
driver: btrfs
used_by:
- /1.0/instances/vartai2ssd
- /1.0/instances/w10-terminal-ssd
- /1.0/instances/w10-unsql-ssd
- /1.0/profiles/default
- /1.0/profiles/no_net
status: Created
locations:
- paralel-linux
- universe-linux
- cluster-linux
- blazar-linux

lxc network show br0

config: {}
description: ""
name: br0
type: bridge
used_by:
- /1.0/instances/gitlab
- /1.0/instances/nextcloud-dev
- /1.0/instances/w10-grybas
- /1.0/instances/w10-terminal-ssd
- /1.0/instances/wiki
- /1.0/instances/zabbix
- /1.0/profiles/ceph-hdd
- /1.0/profiles/ceph-ssd
- /1.0/profiles/default
- /1.0/profiles/default_hdd
managed: false
status: ""
locations: []

snap changes
no changes found

snap changes lxd
no changes found

:frowning:

What does this show?

snap list lxd  --all

I have a test server (not connected to the LXD cluster) with the latest LXD 4.0.9 from the 4.0/stable channel, and VMs don’t start there either. I then refreshed LXD on the test server to the 5.0/stable channel, and the issue is fixed in the latest LXD 5.0! Here is the snap changes output:
mantas@neutron-star:/# snap changes
ID   Status  Spawn                     Ready               Summary
109  Error   8 days ago, at 08:17 UTC  today at 07:21 UTC  Auto-refresh snap "lxd"
110  Done    today at 07:21 UTC        today at 09:16 UTC  Auto-refresh snaps "lxd", "snapd"
111  Done    today at 09:17 UTC        today at 09:18 UTC  Remove "lxd" snap
113  Done    today at 09:20 UTC        today at 09:20 UTC  Install "lxd" snap from "4.0/stable" channel
114  Done    today at 11:33 UTC        today at 11:33 UTC  Refresh "lxd" snap from "5.0/stable" channel

Yes, I expected 5.0 LTS would work, as it’s similar to (or the same as) this issue:

snap list lxd --all

Name  Version        Rev    Tracking    Publisher   Notes
lxd   4.0.9-8e2046b  22753  4.0/stable  canonical✓  disabled,in-cohort
lxd   4.0.9-eb5e237  23991  4.0/stable  canonical✓  in-cohort

Ah OK, so you could start by increasing the number of revisions kept so you don’t lose a working one:

sudo snap set system refresh.retain=n

Then trying:

sudo snap revert lxd --revision <revision number>

As you’re running the LTS series, reverting should be possible, as we don’t include DB/API schema changes that would prevent reverting.
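
Before reverting, it may be worth confirming that the target revision is still present on disk. The sketch below checks the Rev column of the snap list lxd --all output shown above; the helper name is made up, and the retain value of 3 in the usage comment is only an example (as far as I know snapd accepts values from 2 to 20).

```shell
#!/bin/sh
# Succeeds if the revision given as $1 appears in `snap list <snap> --all`
# output read from stdin. The revision is the third column and the first
# line is the header, so it is skipped.
has_revision() {
    awk -v rev="$1" 'NR > 1 && $3 == rev { found = 1 } END { exit !found }'
}

# Usage sketch (22753 is the older, disabled revision listed above):
#   sudo snap set system refresh.retain=3
#   snap list lxd --all | has_revision 22753 && sudo snap revert lxd --revision 22753
```

The check matters because snap revert can only go back to a revision that snapd has kept, which is what refresh.retain controls.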

Just checking: will the other virtual machines and containers running on the same server be restarted when I revert LXD? AFAIK I should run this command, right?

snap revert lxd --revision 22753

No, they shouldn’t be, as this is the same as the snap refresh that occurs automatically, only in the other direction.

However, as you’re running a cluster, you might need to do this on the other members. You’ll know if you need to, because the snap refresh will pause waiting for the other members to arrive at the same revision.

I think this shouldn’t be needed though, as it’s just a minor snap revision and not a schema or API change.

@stgraber is looking into this now, but we suspect that the more recent QEMU version in the 4.0 LTS snap is causing a seccomp violation.
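
For reference, the SECCOMP audit records quoted earlier in the thread can be decoded by hand. On x86_64, sig=31 is SIGSYS (the signal delivered when a seccomp filter kills a process), arch=c000003e is the audit identifier for x86_64, and syscall=56 is clone. That would be consistent with the spawn=deny part of the -sandbox option on the QEMU command line, although nothing in this thread confirms which rule actually fired. A minimal sketch of the decoding, with hypothetical helper names and a lookup table that covers only the values relevant here:

```shell
#!/bin/sh
# Decode sig/arch/syscall from a kernel SECCOMP (type=1326) audit record.
# Only a few x86_64 values are translated; anything else is printed unchanged.

audit_field() {  # audit_field <key> <record>: print the value of key=value
    printf '%s\n' "$2" | sed -n "s/.*[[:space:]]$1=\([^[:space:]]*\).*/\1/p"
}

decode_seccomp() {
    sig=$(audit_field sig "$1")
    arch=$(audit_field arch "$1")
    nr=$(audit_field syscall "$1")
    case $arch in c000003e) arch=x86_64 ;; esac
    if [ "$arch" = x86_64 ]; then
        # Signal and syscall numbers are architecture-specific.
        case $sig in 31) sig=SIGSYS ;; esac
        case $nr in
            56) nr=clone ;; 57) nr=fork ;; 58) nr=vfork ;; 59) nr=execve ;;
        esac
    fi
    echo "signal=$sig arch=$arch syscall=$nr"
}

decode_seccomp 'audit: type=1326 audit(1669925884.108:3165): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=397774 comm="qemu-system-x86" exe="/snap/lxd/23991/bin/qemu-system-x86_64" sig=31 arch=c000003e syscall=56 compat=0 ip=0x7f103bf32f3f code=0x80000000'
# prints: signal=SIGSYS arch=x86_64 syscall=clone
```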

We think this commit also needs to be backported into the LTS 4.0 series: