Unable to launch LXD VM

Hello,

I’m trying to create a VM using LXD. Here is the information about the server I’m working on:

OS: Linux cpu01 5.4.0-165-generic #182-Ubuntu SMP Mon Oct 2 19:43:28 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
CPU cores: 32
LXD version: 5.19
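
For reference, this information can be collected with standard commands (a sketch; these are presumably close to what was used):

user:~$ uname -a       # the OS/kernel line above
user:~$ nproc          # CPU count
user:~$ lxd --version  # LXD version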

When I issue the following lxc command, I get this result:

user:~$ lxc launch ubuntu:20.04 testVm -c limits.cpu=64 --vm
Creating testVm
Starting testVm
Error: Failed to run: forklimits limit=memlock:unlimited:unlimited fd=3 fd=4 -- /snap/lxd/26093/bin/qemu-system-x86_64 -S -name testVm -uuid a26d8db4-1ee8-4616-822c-1080fc818bf4 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/testVm/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/testVm/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/testVm/qemu.pid -D /var/snap/lxd/common/lxd/logs/testVm/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd: : exit status 1
Try `lxc info --show-log local:testVm` for more info

The log output shows the following:

user:~$ lxc info --show-log testVm
Log:
qemu-system-x86_64: warning: Number of hotpluggable cpus requested (383) exceeds the recommended cpus supported by KVM (240)
qemu-system-x86_64: warning: Number of hotpluggable cpus requested (383) exceeds the maximum cpus supported by KVM (288)

What should I do in this case?

Can you show lscpu from your host machine?

I’m sorry for the late reply. I also need to correct some of the information I gave earlier: I said the number of CPU cores was 32, but after running the command you mentioned, I confirmed it is actually 384. Below is the lscpu output:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 52 bits physical, 57 bits virtual
CPU(s): 384
On-line CPU(s) list: 0-191,193-383
Off-line CPU(s) list: 192
Thread(s) per core: 1
Core(s) per socket: 96
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 25
Model: 17
Model name: AMD EPYC 9654 96-Core Processor
Stepping: 1
Frequency boost: enabled
CPU MHz: 3699.939
CPU max MHz: 2400.0000
CPU min MHz: 1500.0000
BogoMIPS: 4800.04
Virtualization: AMD-V
L1d cache: 6 MiB
L1i cache: 6 MiB
L2 cache: 192 MiB
L3 cache: 384 MiB
NUMA node0 CPU(s): 0-95,193-287
NUMA node1 CPU(s): 96-191,288-383
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Very interesting. Based on the error, I think we should change our maximum number of hotpluggable CPUs to 240 in Incus, which will then fix this issue.
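
For context, the 383 hotpluggable CPUs in that warning matches the host's online CPU count (384 minus the one offline CPU 192), so QEMU's maxcpus is evidently derived from the host topology rather than from limits.cpu=64. That can be confirmed on the host with standard tools, e.g.:

user:~$ nproc                               # should report 383 here
user:~$ cat /sys/devices/system/cpu/online  # 0-191,193-383 per the lscpu above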

So I talked to a friend who works at AMD, and he confirmed this was a QEMU bug, but one that has since been resolved.

I’m going to test it now using the latest Incus packages to see what happens on current QEMU.

Can you show lxc info on the affected system?

I’ve confirmed that newer QEMU has bumped the limit to 1024, so it’s odd that this is still happening.

It looks like QEMU 8.1.x should be fine; QEMU 8.0.x, however, still has the 288 limit.
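
To double-check which QEMU build the LXD snap actually ships, you can ask the snap's own binary directly (path sketched from the one in your error message; "current" points at the active revision):

user:~$ /snap/lxd/current/bin/qemu-system-x86_64 --version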

Here is the lxc info output. The QEMU version is 8.1.1.

config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- storage_api_remote_volume_snapshot_copy
- zfs_delegate
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: user_name
auth_user_method: unix
environment:
  addresses:
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    {cert}
    -----END CERTIFICATE-----
  certificate_fingerprint: {fingerprint}
  driver: lxc | qemu
  driver_version: 5.0.3 | 8.1.1
  firewall: xtables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "false"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.4.0-165-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "20.04"
  project: default
  server: lxd
  server_clustered: false
  server_event_mode: full-mesh
  server_name: server_name
  server_pid: 796560
  server_version: "5.19"
  storage: zfs | dir
  storage_version: 0.8.3-1ubuntu12.15 | 1
  storage_supported_drivers:
  - name: lvm
    version: 2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.41.0
    remote: false
  - name: zfs
    version: 0.8.3-1ubuntu12.15
    remote: false
  - name: btrfs
    version: 5.16.2
    remote: false
  - name: ceph
    version: 17.2.6
    remote: true
  - name: cephfs
    version: 17.2.6
    remote: true
  - name: cephobject
    version: 17.2.6
    remote: true
  - name: dir
    version: "1"
    remote: false

That’s odd, QEMU 8.1.1 should have the much higher limit in place…
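
One way to inspect a QEMU build's per-machine-type vCPU ceiling is to ask it over QMP, since query-machines reports a cpu-max field for every machine type. A rough sketch (the grep pattern assumes QMP's usual "key": value output formatting):

root@ubuntu:~# printf '%s\n' '{"execute":"qmp_capabilities"}' '{"execute":"query-machines"}' '{"execute":"quit"}' | qemu-system-x86_64 -display none -machine none -qmp stdio | grep -o '"cpu-max": [0-9]*' | sort -u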

As I don’t own a dual-socket EPYC with an insane number of cores, I’ve had to simulate it using QEMU:

root@ubuntu:~# lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  384
  On-line CPU(s) list:   0-383
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 7 5700G with Radeon Graphics
    CPU family:          25
    Model:               80
    Thread(s) per core:  2
    Core(s) per socket:  96
    Socket(s):           2
    Stepping:            0
    BogoMIPS:            7599.99
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
                         a cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall n
                         x mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid
                          extd_apicid amd_dcm tsc_known_freq pni pclmulqdq ssse3
                          fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadlin
                         e_timer aes xsave avx f16c rdrand hypervisor lahf_lm cm
                         p_legacy svm cr8_legacy abm sse4a misalignsse 3dnowpref
                         etch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmm
                         call fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpc
                         id rdseed adx smap clflushopt clwb sha_ni xsaveopt xsav
                         ec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt l
                         brv nrip_save tsc_scale vmcb_clean pausefilter pfthresh
                         old v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq
                          rdpid fsrm arch_capabilities
Virtualization features: 
  Virtualization:        AMD-V
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   12 MiB (192 instances)
  L1i:                   12 MiB (192 instances)
  L2:                    96 MiB (192 instances)
  L3:                    256 MiB (16 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-383
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer
                          sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIB
                         P always-on, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
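
For reference, a 384-CPU host like the one above can be simulated with an -smp topology along these lines (a sketch only: disk and network devices are omitted, and with more than 255 vCPUs QEMU also needs a split irqchip plus a virtual IOMMU with EIM enabled for x2apic interrupt remapping):

root@ubuntu:~# qemu-system-x86_64 \
    -machine q35,kernel-irqchip=split -accel kvm -cpu host \
    -device intel-iommu,intremap=on,eim=on \
    -smp cpus=384,sockets=2,cores=96,threads=2 \
    -m 32G -display none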

I’ve then asked Incus to create me a VM similar to what you used for LXD:

root@ubuntu:~# incus launch images:ubuntu/22.04 v1 -c limits.cpu=64 -c limits.memory=4GiB --vm
Creating v1
Starting v1
root@ubuntu:~# incus exec v1 -- lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  64
  On-line CPU(s) list:   0
  Off-line CPU(s) list:  1-63
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 7 5700G with Radeon Graphics
    CPU family:          25
    Model:               80
    Thread(s) per core:  1
    Core(s) per socket:  1
    Socket(s):           1
    Stepping:            0
    BogoMIPS:            7599.99
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
                         a cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall n
                         x mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid
                          extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx1
                         6 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer 
                         aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy
                          svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osv
                         w perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase ts
                         c_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx sm
                         ap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsave
                         s clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save ts
                         c_scale vmcb_clean pausefilter pfthreshold v_vmsave_vml
                         oad vgif umip pku ospke vaes vpclmulqdq rdpid fsrm arch
                         _capabilities
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   64 KiB (1 instance)
  L1i:                   64 KiB (1 instance)
  L2:                    512 KiB (1 instance)
  L3:                    16 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec rstack overflow:  Mitigation; safe RET, no microcode
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
                          and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer
                          sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIB
                         P disabled, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
root@ubuntu:~# 
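
Note that the guest initially reports only CPU 0 online: the remaining vCPUs are present but offline, and get brought online from inside the guest (typically by udev rules shipped in the image). If it ever needs doing by hand, it's plain sysfs, e.g.:

root@ubuntu:~# incus exec v1 -- sh -c 'for c in /sys/devices/system/cpu/cpu[0-9]*/online; do echo 1 > "$c"; done'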

I’ve also confirmed that it’s possible to create a VM on Incus with more CPUs than the old limit:

root@ubuntu:~# incus launch images:ubuntu/22.04 v2 -c limits.cpu=256 -c limits.memory=4GiB --vm
Creating v2
Starting v2

So the newer version of QEMU we ship with Incus seems to be behaving just fine with those larger systems. I’m unsure why the one you’ve got on your system is somehow failing to handle this.