Incus VM guest shutdown does not complete

I’m encountering a strange problem with an Incus VM, and so far I haven’t been able to dig up any similar issues here or on GitHub. Sometimes, but not always, when a restart is initiated from within the VM, it doesn’t complete the shutdown until I “observe” it from Incus.

After a lot of searching and poking around, I haven’t made any progress, so I’m hoping to find some help here. I’m happy to open a ticket if that’s the preferred path.

Any guidance is much appreciated! :slight_smile:

Environment Summary

  • Host: Alpine 3.21.3 6.12.25-0-lts SMP PREEMPT_DYNAMIC 2025-04-25 12:52:49 x86_64
  • Host filesystem: ZFS
  • Incus 6.12
  • VM: MicroOS (Btrfs)

Steps to reproduce

  1. SSH into VM and run shutdown -h now or reboot now.
  2. When the problem occurs, the SSH session will just sit at the prompt and the shutdown will never complete.
  3. The instant I run incus info vm on the host, the shutdown completes and the VM reboots.

Additional information

  • This also occurs during unattended reboots for system updates, so it’s probably not related to open SSH sessions.
  • Once I’ve forced Incus to “pay attention” to the VM, subsequent shutdowns are successful for a while. I haven’t figured out exactly how long things need to sit idle before it gets distracted again, but 15 minutes usually seems to be enough.
  • On the VM, journalctl and dmesg output are identical between unsuccessful and successful shutdowns.
  • When Incus is in a bad state, incus monitor --pretty does not show any output during a shutdown until I run incus info vm.
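
For anyone who wants to try reproducing this, the observation in the last bullet can be captured with a small host-side helper. This is only a sketch: the function name capture_events, the log path, and the wait times are all made up for illustration. The only Incus commands it uses are the two mentioned above (incus monitor --pretty and incus info).

```shell
# Hypothetical helper (name and arguments are illustrative, not part of Incus).
# Usage: capture_events <instance> <wait_secs> <logfile> [settle_secs]
capture_events() {
  instance="$1"
  wait_secs="$2"
  log="$3"
  settle_secs="${4:-5}"

  # Record Incus events in the background while nothing else queries the VM.
  incus monitor --pretty >> "$log" 2>&1 &
  monitor_pid=$!

  # Give the guest time to reach the point where the shutdown hangs.
  sleep "$wait_secs"

  # Mark the moment the VM is "observed", then poke it.
  echo "--- poking $instance at $(date +%T) ---" >> "$log"
  incus info "$instance" > /dev/null

  # Let any resulting events arrive, then stop the monitor.
  sleep "$settle_secs"
  kill "$monitor_pid" 2>/dev/null || true
  wait "$monitor_pid" 2>/dev/null || true
}

# Example (start this on the host, then run "reboot now" inside the VM):
#   capture_events vm 60 /tmp/incus-events.log
```

If the behaviour matches what I described, everything in the log should appear after the "poking" marker line.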
incus info
config:
  acme.agree_tos: "true"
  acme.challenge: DNS-01
  acme.domain: <domain>
  acme.email: <email>
  acme.provider: <provider>
  acme.provider.environment: |-
    <config>
  core.https_address: :<port>
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_dev_incus
- migration_pre_copy
- infiniband
- dev_incus_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- dev_incus_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- images_all_projects
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- zfs_delegate
- storage_api_remote_volume_snapshot_copy
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- image_restriction_privileged
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- certificate_description
- disk_io_bus_virtio_blk
- loki_config_instance
- instance_create_start
- clustering_evacuation_stop_options
- boot_host_shutdown_action
- agent_config_drive
- network_state_ovn_lr
- image_template_permissions
- storage_bucket_backup
- storage_lvm_cluster
- shared_custom_block_volumes
- auth_tls_jwt
- oidc_claim
- device_usb_serial
- numa_cpu_balanced
- image_restriction_nesting
- network_integrations
- instance_memory_swap_bytes
- network_bridge_external_create
- network_zones_all_projects
- storage_zfs_vdev
- container_migration_stateful
- profiles_all_projects
- instances_scriptlet_get_instances
- instances_scriptlet_get_cluster_members
- instances_scriptlet_get_project
- network_acl_stateless
- instance_state_started_at
- networks_all_projects
- network_acls_all_projects
- storage_buckets_all_projects
- resources_load
- instance_access
- project_access
- projects_force_delete
- resources_cpu_flags
- disk_io_bus_cache_filesystem
- instance_oci
- clustering_groups_config
- instances_lxcfs_per_instance
- clustering_groups_vm_cpu_definition
- disk_volume_subpath
- projects_limits_disk_pool
- network_ovn_isolated
- qemu_raw_qmp
- network_load_balancer_health_check
- oidc_scopes
- network_integrations_peer_name
- qemu_scriptlet
- instance_auto_restart
- storage_lvm_metadatasize
- ovn_nic_promiscuous
- ovn_nic_ip_address_none
- instances_state_os_info
- network_load_balancer_state
- instance_nic_macvlan_mode
- storage_lvm_cluster_create
- network_ovn_external_interfaces
- instances_scriptlet_get_instances_count
- cluster_rebalance
- custom_volume_refresh_exclude_older_snapshots
- storage_initial_owner
- storage_live_migration
- instance_console_screenshot
- image_import_alias
- authorization_scriptlet
- console_force
- network_ovn_state_addresses
- network_bridge_acl_devices
- instance_debug_memory
- init_preseed_storage_volumes
- init_preseed_profile_project
- instance_nic_routed_host_address
- instance_smbios11
- api_filtering_extended
- acme_dns01
- security_iommu
- network_ipv4_dhcp_routes
- network_state_ovn_ls
- network_dns_nameservers
- acme_http01_port
- network_ovn_ipv4_dhcp_expiry
- instance_state_cpu_time
- network_io_bus
- disk_io_bus_usb
- storage_driver_linstor
- instance_oci_entrypoint
- network_address_set
- server_logging
- network_forward_snat
- memory_hotplug
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: <user>
auth_user_method: unix
environment:
  addresses:
  - <real nic ipv4>:<port>
  - '[<real nic ipv6>]:<port>'
  - '[<real nic ipv6>]:<port>'
  - <macvlan ipv4>:<port>
  - '[<macvlan ipv6>]:<port>'
  - '[<macvlan ipv6>]:<port>'
  - <bridge ipv4>:<port>
  - '[<bridge ipv6>]:<port>'
  architectures:
  - x86_64
  - i686
  certificate: |
    <cert>
  certificate_fingerprint: <fingerprint>
  driver: qemu | lxc
  driver_version: 9.1.2 | 6.0.2
  firewall: xtables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    uevent_injection: "true"
    unpriv_binfmt: "true"
    unpriv_fscaps: "true"
  kernel_version: 6.12.25-0-lts
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Alpine Linux
  os_version: 3.21.3
  project: default
  server: incus
  server_clustered: false
  server_event_mode: full-mesh
  server_name: <hostname>
  server_pid: 3497
  server_version: "6.12"
  storage: zfs
  storage_version: 2.2.7-1
  storage_supported_drivers:
  - name: dir
    version: "1"
    remote: false
  - name: zfs
    version: 2.2.7-1
    remote: false
incus config show vm
architecture: x86_64
config:
  boot.autostart.priority: "100"
  cloud-init.user-data: |+
    #cloud-config
    <snip>
  limits.memory: 10GiB
  raw.idmap: |-
    uid 1000 1000
    gid 1000 1001
  security.secureboot: "false"
  volatile.cloud-init.instance-id: <id>
  volatile.eth0.host_name: mac63ef0cae
  volatile.eth0.last_state.created: "false"
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: 08a8ac0c-93ec-4112-8265-321840d0c634
  volatile.uuid.generation: ee012678-c292-46b9-901d-00a20a4151af
  volatile.vm.definition: pc-q35-9.1
  volatile.vsock_id: "553373809"
devices:
  eth0:
    hwaddr: <macvlan MAC (to get static DHCP assignment from router)>
    nictype: macvlan
    parent: eth0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
- macvlan
- cloud-init-microos
stateful: false
description: ""
incus profile show default
config: {}
description: Default Incus profile
devices:
  root:
    path: /
    pool: default
    type: disk
name: default
used_by:
- /1.0/instances/vm
project: default
incus profile show macvlan
config: {}
description: ""
devices:
  eth0:
    nictype: macvlan
    parent: eth0
    type: nic
name: macvlan
used_by:
- /1.0/instances/vm
project: default
incus profile show cloud-init-microos
config:
  boot.autorestart: "true"
  boot.autostart: "true"
  cloud-init.vendor-data: |-
    #cloud-config
    timezone: <Time/Zone>
    resize_rootfs: false  # MicroOS doesn't like it when you try this.
description: ""
devices:
  cloud-init:
    source: cloud-init:config
    type: disk
name: cloud-init-microos
used_by:
- /1.0/instances/vm
project: default

As noted, journalctl and dmesg on the VM look the same for both unsuccessful and successful shutdowns (except for the gap between when the system thinks it should go down and when it actually does, of course). But here are the logs for completeness.

journalctl during shutdown on vm
<<<<< All the normal shutdown stuff >>>>>
May 05 09:58:21 vm systemd[1]: Reached target Late Shutdown Services.
May 05 09:58:21 vm systemd[1]: systemd-reboot.service: Deactivated successfully.
May 05 09:58:21 vm systemd[1]: Finished System Reboot.
May 05 09:58:21 vm systemd[1]: Reached target System Reboot.
May 05 09:58:21 vm systemd[1]: Shutting down.
<<<<< System should go down here but shutdown doesn’t actually complete until ~40s later when I poke Incus. >>>>>
May 05 09:58:31 vm kernel: rcu_tasks_wait_gp: rcu_tasks grace period number 33 (since boot) is 10082 jiffies old.
May 05 09:59:02 vm kernel: rcu_tasks_wait_gp: rcu_tasks grace period number 33 (since boot) is 40194 jiffies old.
dmesg during shutdown on vm
[ 1301.834571] [     T27] audit: type=1305 audit(1746453501.427:414): op=set audit_enabled=0 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditctl_t:s0 res=1
[ 1301.885302] [    T631] systemd-journald[631]: Received client request to relinquish /var/log/journal/08a8ac0c93ec41128265321840d0c634 access.
<<<<< System should go down here but shutdown doesn’t actually complete until ~40s later when I poke Incus. >>>>>
[ 1312.299916] [     T14] rcu_tasks_wait_gp: rcu_tasks grace period number 33 (since boot) is 10082 jiffies old.
[ 1342.411039] [     T14] rcu_tasks_wait_gp: rcu_tasks grace period number 33 (since boot) is 40194 jiffies old.
[ 1393.618660] [      T1] watchdog: watchdog0: watchdog did not stop!
[ 1393.621222] [    T631] systemd-journald[631]: Failed to send WATCHDOG=1 notification message: Connection refused
[ 1393.636090] [      T1] systemd-shutdown[1]: Using hardware watchdog 'iTCO_wdt', version 2, device /dev/watchdog0
[ 1393.636177] [      T1] systemd-shutdown[1]: Watchdog running with a hardware timeout of 10min.
[ 1393.641027] [      T1] systemd-shutdown[1]: Syncing filesystems and block devices.
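
A rough way to read the rcu_tasks_wait_gp lines above: each one reports how old the stuck grace period is in jiffies, so subtracting that age from the line's timestamp estimates when the stall began. The sketch below does that with awk. It assumes HZ=1000 for the guest kernel, which is a guess on my part, though it is consistent with the two lines above (30112 jiffies elapsed over about 30.11 s). The function name stall_start is just illustrative.

```shell
# Hypothetical helper: estimate when an rcu_tasks stall began.
# Reads dmesg-style lines ("[ seconds] ...") on stdin.
# Optional first argument: the kernel tick rate (HZ); 1000 is an assumption.
stall_start() {
  awk -v hz="${1:-1000}" '
    /rcu_tasks_wait_gp/ {
      # Strip the leading "[" and padding so the line starts with the timestamp,
      # then let awk convert the numeric prefix.
      line = $0
      sub(/^\[ */, "", line)
      ts = line + 0

      # The grace period age is the field just before the word "jiffies".
      age = -1
      for (i = 2; i <= NF; i++)
        if ($i == "jiffies") age = $(i - 1)

      # Stall start = message time minus age converted to seconds.
      if (age >= 0) printf "%.2f\n", ts - age / hz
    }'
}

# Example: dmesg | stall_start
```

Fed the two dmesg lines above, this gives roughly 1302.22 s for both, which would put the start of the stall right after the last journald message at 1301.885.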