Unable to launch LXD VM

Hello,

I’m trying to create a VM using LXD. Here is the information about the server I’m working on:

OS: Linux cpu01 5.4.0-165-generic #182-Ubuntu SMP Mon Oct 2 19:43:28 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
CPU cores: 32
LXD version: 5.19
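
For reference, this information can be collected with standard commands (a sketch; these are presumably close to what was used):

user:~$ uname -a       # the OS/kernel line above
user:~$ nproc          # CPU count
user:~$ lxd --version  # LXD version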

When I issue the following lxc command, I get this result:

user:~$ lxc launch ubuntu:20.04 testVm -c limits.cpu=64 --vm
Creating testVm
Starting testVm
Error: Failed to run: forklimits limit=memlock:unlimited:unlimited fd=3 fd=4 -- /snap/lxd/26093/bin/qemu-system-x86_64 -S -name testVm -uuid a26d8db4-1ee8-4616-822c-1080fc818bf4 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/testVm/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/testVm/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/testVm/qemu.pid -D /var/snap/lxd/common/lxd/logs/testVm/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd: : exit status 1
Try `lxc info --show-log local:testVm` for more info

The log output shows the following:

user:~$ lxc info --show-log testVm
Log:
qemu-system-x86_64: warning: Number of hotpluggable cpus requested (383) exceeds the recommended cpus supported by KVM (240)
qemu-system-x86_64: warning: Number of hotpluggable cpus requested (383) exceeds the maximum cpus supported by KVM (288)

What should I do in this case?

Can you show lscpu from your host machine?

I’m sorry for the late reply. I also need to correct some of the information I gave earlier: I said the number of CPU cores was 32, but after running the command you mentioned, I confirmed it is actually 384. Below is the lscpu output:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 52 bits physical, 57 bits virtual
CPU(s): 384
On-line CPU(s) list: 0-191,193-383
Off-line CPU(s) list: 192
Thread(s) per core: 1
Core(s) per socket: 96
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 25
Model: 17
Model name: AMD EPYC 9654 96-Core Processor
Stepping: 1
Frequency boost: enabled
CPU MHz: 3699.939
CPU max MHz: 2400.0000
CPU min MHz: 1500.0000
BogoMIPS: 4800.04
Virtualization: AMD-V
L1d cache: 6 MiB
L1i cache: 6 MiB
L2 cache: 192 MiB
L3 cache: 384 MiB
NUMA node0 CPU(s): 0-95,193-287
NUMA node1 CPU(s): 96-191,288-383
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Very interesting. Based on the error, I think we should change our maximum number of hotpluggable CPUs to 240 in Incus, which will then fix this issue.
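
For context, the 383 hotpluggable CPUs in that warning matches the host's online CPU count (384 minus the one offline CPU 192), so QEMU's maxcpus is evidently derived from the host topology rather than from limits.cpu=64. That can be confirmed on the host with standard tools, e.g.:

user:~$ nproc                               # should report 383 here
user:~$ cat /sys/devices/system/cpu/online  # 0-191,193-383 per the lscpu above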

So I talked to a friend who works at AMD, and he confirmed this was a QEMU bug, but one that has since been resolved.

I’m going to test it now using the latest Incus packages to see what happens on current QEMU.

Can you show lxc info on the affected system?

I’ve confirmed that newer QEMU has bumped the limit to 1024, so it’s odd that this is still happening.

It looks like QEMU 8.1.x should be fine; QEMU 8.0.x, however, still has the 288 limit.
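
To double-check which QEMU build the LXD snap actually ships, you can ask the snap's own binary directly (path sketched from the one in your error message; "current" points at the active revision):

user:~$ /snap/lxd/current/bin/qemu-system-x86_64 --version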

Here is the lxc info output. The QEMU version is 8.1.1.

config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- storage_api_remote_volume_snapshot_copy
- zfs_delegate
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: user_name
auth_user_method: unix
environment:
  addresses:
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    {cert}
    -----END CERTIFICATE-----
  certificate_fingerprint: {fingerprint}
  driver: lxc | qemu
  driver_version: 5.0.3 | 8.1.1
  firewall: xtables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "false"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.4.0-165-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "20.04"
  project: default
  server: lxd
  server_clustered: false
  server_event_mode: full-mesh
  server_name: server_name
  server_pid: 796560
  server_version: "5.19"
  storage: zfs | dir
  storage_version: 0.8.3-1ubuntu12.15 | 1
  storage_supported_drivers:
  - name: lvm
    version: 2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.41.0
    remote: false
  - name: zfs
    version: 0.8.3-1ubuntu12.15
    remote: false
  - name: btrfs
    version: 5.16.2
    remote: false
  - name: ceph
    version: 17.2.6
    remote: true
  - name: cephfs
    version: 17.2.6
    remote: true
  - name: cephobject
    version: 17.2.6
    remote: true
  - name: dir
    version: "1"
    remote: false

That’s odd, QEMU 8.1.1 should have the much higher limit in place…
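
One way to inspect a QEMU build's per-machine-type vCPU ceiling is to ask it over QMP, since query-machines reports a cpu-max field for every machine type. A rough sketch (the grep pattern assumes QMP's usual "key": value output formatting):

root@ubuntu:~# printf '%s\n' '{"execute":"qmp_capabilities"}' '{"execute":"query-machines"}' '{"execute":"quit"}' | qemu-system-x86_64 -display none -machine none -qmp stdio | grep -o '"cpu-max": [0-9]*' | sort -u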

As I don’t own a dual-socket EPYC with an insane number of cores, I’ve had to simulate it using QEMU:

root@ubuntu:~# lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  384
  On-line CPU(s) list:   0-383
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 7 5700G with Radeon Graphics
    CPU family:          25
    Model:               80
    Thread(s) per core:  2
    Core(s) per socket:  96
    Socket(s):           2
    Stepping:            0
    BogoMIPS:            7599.99
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
                         a cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall n
                         x mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid
                          extd_apicid amd_dcm tsc_known_freq pni pclmulqdq ssse3
                          fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadlin
                         e_timer aes xsave avx f16c rdrand hypervisor lahf_lm cm
                         p_legacy svm cr8_legacy abm sse4a misalignsse 3dnowpref
                         etch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmm
                         call fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpc
                         id rdseed adx smap clflushopt clwb sha_ni xsaveopt xsav
                         ec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt l
                         brv nrip_save tsc_scale vmcb_clean pausefilter pfthresh
                         old v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq
                          rdpid fsrm arch_capabilities
Virtualization features: 
  Virtualization:        AMD-V
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   12 MiB (192 instances)
  L1i:                   12 MiB (192 instances)
  L2:                    96 MiB (192 instances)
  L3:                    256 MiB (16 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-383
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer
                          sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIB
                         P always-on, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
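
For reference, a 384-CPU host like the one above can be simulated with an -smp topology along these lines (a sketch only: disk and network devices are omitted, and with more than 255 vCPUs QEMU also needs a split irqchip plus a virtual IOMMU with EIM enabled for x2apic interrupt remapping):

root@ubuntu:~# qemu-system-x86_64 \
    -machine q35,kernel-irqchip=split -accel kvm -cpu host \
    -device intel-iommu,intremap=on,eim=on \
    -smp cpus=384,sockets=2,cores=96,threads=2 \
    -m 32G -display none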

I’ve then asked Incus to create me a VM similar to what you used for LXD:

root@ubuntu:~# incus launch images:ubuntu/22.04 v1 -c limits.cpu=64 -c limits.memory=4GiB --vm
Creating v1
Starting v1
root@ubuntu:~# incus exec v1 -- lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  64
  On-line CPU(s) list:   0
  Off-line CPU(s) list:  1-63
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 7 5700G with Radeon Graphics
    CPU family:          25
    Model:               80
    Thread(s) per core:  1
    Core(s) per socket:  1
    Socket(s):           1
    Stepping:            0
    BogoMIPS:            7599.99
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
                         a cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall n
                         x mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid
                          extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx1
                         6 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer 
                         aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy
                          svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osv
                         w perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase ts
                         c_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx sm
                         ap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsave
                         s clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save ts
                         c_scale vmcb_clean pausefilter pfthreshold v_vmsave_vml
                         oad vgif umip pku ospke vaes vpclmulqdq rdpid fsrm arch
                         _capabilities
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   64 KiB (1 instance)
  L1i:                   64 KiB (1 instance)
  L2:                    512 KiB (1 instance)
  L3:                    16 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec rstack overflow:  Mitigation; safe RET, no microcode
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
                          and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer
                          sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIB
                         P disabled, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
root@ubuntu:~# 
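
Note that the guest initially reports only CPU 0 online: the remaining vCPUs are present but offline, and get brought online from inside the guest (typically by udev rules shipped in the image). If it ever needs doing by hand, it's plain sysfs, e.g.:

root@ubuntu:~# incus exec v1 -- sh -c 'for c in /sys/devices/system/cpu/cpu[0-9]*/online; do echo 1 > "$c"; done'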

I’ve also confirmed that it’s possible to create a VM on Incus with more CPUs than the old limit:

root@ubuntu:~# incus launch images:ubuntu/22.04 v2 -c limits.cpu=256 -c limits.memory=4GiB --vm
Creating v2
Starting v2

So the newer version of QEMU we ship with Incus seems to be behaving just fine with those larger systems. I’m unsure why the one you’ve got on your system is somehow failing to handle this.