I’m encountering a strange problem with an Incus VM, and so far I haven’t been able to dig up any similar issues here or on GitHub. Sometimes, but not always, when a restart is initiated from within the VM, it doesn’t complete the shutdown until I “observe” it from Incus.
After a lot of searching and poking around, I haven’t made any progress, so I was hoping I might find some assistance here. I’m happy to open a ticket if that’s the preferred path.
Any guidance is much appreciated!
Environment Summary
- Host: Alpine 3.21.3 6.12.25-0-lts SMP PREEMPT_DYNAMIC 2025-04-25 12:52:49 x86_64
- Host filesystem: ZFS
- Incus 6.12
- VM: MicroOS (Btrfs)
Steps to reproduce
- SSH into the VM and run `shutdown -h now` or `reboot now`.
- When the problem occurs, the SSH session will just sit at the prompt and the shutdown will never complete.
- The instant I run `incus info vm` on the host, the shutdown completes and the VM reboots.
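Since running `incus info vm` is what unsticks the shutdown, one stopgap would be to poll it periodically from the host. This is only a rough sketch of that idea, not a fix: the function name `poke_instance` and the `INCUS_BIN` override (there so the loop can be dry-run without Incus) are made-up names, and I haven't confirmed that polling reliably prevents the hang.

```shell
#!/bin/sh
# Workaround sketch: periodically "observe" an instance with `incus info`.
# poke_instance and INCUS_BIN are hypothetical names, not part of Incus.

# Usage: poke_instance <instance> <interval-seconds> <count>
# Prints the number of pokes performed.
poke_instance() {
    instance="$1"
    interval="$2"
    count="$3"
    done_count=0
    while [ "$done_count" -lt "$count" ]; do
        # The output is discarded; only the side effect of querying matters here.
        if ! "${INCUS_BIN:-incus}" info "$instance" >/dev/null 2>&1; then
            echo "incus info failed for $instance" >&2
        fi
        done_count=$((done_count + 1))
        # Skip the sleep after the final poke.
        if [ "$done_count" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    echo "$done_count"
}
```

For example, `poke_instance vm 300 12` would query the VM 12 times, five minutes apart. The five-minute interval is a guess chosen only because it is well under the roughly 15 minutes mentioned in this post.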
Additional information
- This also occurs during unattended reboots for system updates, so it’s probably not related to open SSH sessions.
- Once I’ve forced Incus to “pay attention” to the VM, subsequent shutdowns are successful for a while. I haven’t figured out the exact time things need to sit before it gets distracted again, but it seems that 15 minutes is usually enough.
- On the VM, `journalctl` and `dmesg` output are identical between unsuccessful and successful shutdowns.
- When Incus is in a bad state, `incus monitor --pretty` does not show any output during a shutdown until I run `incus info vm`.
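To pin down exactly when Incus starts reporting events relative to the `incus info vm` call, it may help to timestamp the `incus monitor --pretty` output. A minimal sketch, assuming a POSIX shell and `date` on the host; `stamp_lines` and the `monitor.log` file name are hypothetical names, not anything Incus provides:

```shell
#!/bin/sh
# stamp_lines: prefix every line read from stdin with a UTC timestamp.
# (Hypothetical helper; not part of Incus.)
# Example: incus monitor --pretty | stamp_lines >> monitor.log
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$line"
    done
}
```

The timestamps record when each line reaches the pipe, so if the monitor output is buffered when piped they may lag slightly behind the actual events.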
incus info
config:
acme.agree_tos: "true"
acme.challenge: DNS-01
acme.domain: <domain>
acme.email: <email>
acme.provider: <provider>
acme.provider.environment: |-
<config>
core.https_address: :<port>
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_dev_incus
- migration_pre_copy
- infiniband
- dev_incus_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- dev_incus_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- images_all_projects
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- zfs_delegate
- storage_api_remote_volume_snapshot_copy
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- image_restriction_privileged
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- certificate_description
- disk_io_bus_virtio_blk
- loki_config_instance
- instance_create_start
- clustering_evacuation_stop_options
- boot_host_shutdown_action
- agent_config_drive
- network_state_ovn_lr
- image_template_permissions
- storage_bucket_backup
- storage_lvm_cluster
- shared_custom_block_volumes
- auth_tls_jwt
- oidc_claim
- device_usb_serial
- numa_cpu_balanced
- image_restriction_nesting
- network_integrations
- instance_memory_swap_bytes
- network_bridge_external_create
- network_zones_all_projects
- storage_zfs_vdev
- container_migration_stateful
- profiles_all_projects
- instances_scriptlet_get_instances
- instances_scriptlet_get_cluster_members
- instances_scriptlet_get_project
- network_acl_stateless
- instance_state_started_at
- networks_all_projects
- network_acls_all_projects
- storage_buckets_all_projects
- resources_load
- instance_access
- project_access
- projects_force_delete
- resources_cpu_flags
- disk_io_bus_cache_filesystem
- instance_oci
- clustering_groups_config
- instances_lxcfs_per_instance
- clustering_groups_vm_cpu_definition
- disk_volume_subpath
- projects_limits_disk_pool
- network_ovn_isolated
- qemu_raw_qmp
- network_load_balancer_health_check
- oidc_scopes
- network_integrations_peer_name
- qemu_scriptlet
- instance_auto_restart
- storage_lvm_metadatasize
- ovn_nic_promiscuous
- ovn_nic_ip_address_none
- instances_state_os_info
- network_load_balancer_state
- instance_nic_macvlan_mode
- storage_lvm_cluster_create
- network_ovn_external_interfaces
- instances_scriptlet_get_instances_count
- cluster_rebalance
- custom_volume_refresh_exclude_older_snapshots
- storage_initial_owner
- storage_live_migration
- instance_console_screenshot
- image_import_alias
- authorization_scriptlet
- console_force
- network_ovn_state_addresses
- network_bridge_acl_devices
- instance_debug_memory
- init_preseed_storage_volumes
- init_preseed_profile_project
- instance_nic_routed_host_address
- instance_smbios11
- api_filtering_extended
- acme_dns01
- security_iommu
- network_ipv4_dhcp_routes
- network_state_ovn_ls
- network_dns_nameservers
- acme_http01_port
- network_ovn_ipv4_dhcp_expiry
- instance_state_cpu_time
- network_io_bus
- disk_io_bus_usb
- storage_driver_linstor
- instance_oci_entrypoint
- network_address_set
- server_logging
- network_forward_snat
- memory_hotplug
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: <user>
auth_user_method: unix
environment:
addresses:
- <real nic ipv4>:<port>
- '[<real nic ipv6>]:<port>'
- '[<real nic ipv6>]:<port>'
- <macvlan ipv4>:<port>
- '[<macvlan ipv6>]:<port>'
- '[<macvlan ipv6>]:<port>'
- <bridge ipv4>:<port>
- '[<bridge ipv6>]:<port>'
architectures:
- x86_64
- i686
certificate: |
<cert>
certificate_fingerprint: <fingerprint>
driver: qemu | lxc
driver_version: 9.1.2 | 6.0.2
firewall: xtables
kernel: Linux
kernel_architecture: x86_64
kernel_features:
idmapped_mounts: "true"
netnsid_getifaddrs: "true"
seccomp_listener: "true"
seccomp_listener_continue: "true"
uevent_injection: "true"
unpriv_binfmt: "true"
unpriv_fscaps: "true"
kernel_version: 6.12.25-0-lts
lxc_features:
cgroup2: "true"
core_scheduling: "true"
devpts_fd: "true"
idmapped_mounts_v2: "true"
mount_injection_file: "true"
network_gateway_device_route: "true"
network_ipvlan: "true"
network_l2proxy: "true"
network_phys_macvlan_mtu: "true"
network_veth_router: "true"
pidfd: "true"
seccomp_allow_deny_syntax: "true"
seccomp_notify: "true"
seccomp_proxy_send_notify_fd: "true"
os_name: Alpine Linux
os_version: 3.21.3
project: default
server: incus
server_clustered: false
server_event_mode: full-mesh
server_name: <hostname>
server_pid: 3497
server_version: "6.12"
storage: zfs
storage_version: 2.2.7-1
storage_supported_drivers:
- name: dir
version: "1"
remote: false
- name: zfs
version: 2.2.7-1
remote: false
incus config show vm
architecture: x86_64
config:
boot.autostart.priority: "100"
cloud-init.user-data: |+
#cloud-config
<snip>
limits.memory: 10GiB
raw.idmap: |-
uid 1000 1000
gid 1000 1001
security.secureboot: "false"
volatile.cloud-init.instance-id: <id>
volatile.eth0.host_name: mac63ef0cae
volatile.eth0.last_state.created: "false"
volatile.last_state.power: RUNNING
volatile.last_state.ready: "false"
volatile.uuid: 08a8ac0c-93ec-4112-8265-321840d0c634
volatile.uuid.generation: ee012678-c292-46b9-901d-00a20a4151af
volatile.vm.definition: pc-q35-9.1
volatile.vsock_id: "553373809"
devices:
eth0:
hwaddr: <macvlan MAC (to get static DHCP assignment from router)>
nictype: macvlan
parent: eth0
type: nic
root:
path: /
pool: default
type: disk
ephemeral: false
profiles:
- default
- macvlan
- cloud-init-microos
stateful: false
description: ""
incus profile show default
config: {}
description: Default Incus profile
devices:
root:
path: /
pool: default
type: disk
name: default
used_by:
- /1.0/instances/vm
project: default
incus profile show macvlan
config: {}
description: ""
devices:
eth0:
nictype: macvlan
parent: eth0
type: nic
name: macvlan
used_by:
- /1.0/instances/vm
project: default
incus profile show cloud-init-microos
config:
boot.autorestart: "true"
boot.autostart: "true"
cloud-init.vendor-data: |-
#cloud-config
timezone: <Time/Zone>
resize_rootfs: false # MicroOS doesn't like it when you try this.
description: ""
devices:
cloud-init:
source: cloud-init:config
type: disk
name: cloud-init-microos
used_by:
- /1.0/instances/vm
project: default
As noted, `journalctl` and `dmesg` on the VM look the same for both unsuccessful and successful shutdowns (except for the gap between when the system thinks it should go down and when it actually does, of course). But here are the logs for completeness.
`journalctl` during shutdown on vm
<<<<< All the normal shutdown stuff >>>>>
May 05 09:58:21 vm systemd[1]: Reached target Late Shutdown Services.
May 05 09:58:21 vm systemd[1]: systemd-reboot.service: Deactivated successfully.
May 05 09:58:21 vm systemd[1]: Finished System Reboot.
May 05 09:58:21 vm systemd[1]: Reached target System Reboot.
May 05 09:58:21 vm systemd[1]: Shutting down.
<<<<< System should go down here but shutdown doesn’t actually complete until ~40s later when I poke Incus. >>>>>
May 05 09:58:31 vm kernel: rcu_tasks_wait_gp: rcu_tasks grace period number 33 (since boot) is 10082 jiffies old.
May 05 09:59:02 vm kernel: rcu_tasks_wait_gp: rcu_tasks grace period number 33 (since boot) is 40194 jiffies old.
`dmesg` during shutdown on vm
[ 1301.834571] [ T27] audit: type=1305 audit(1746453501.427:414): op=set audit_enabled=0 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditctl_t:s0 res=1
[ 1301.885302] [ T631] systemd-journald[631]: Received client request to relinquish /var/log/journal/08a8ac0c93ec41128265321840d0c634 access.
<<<<< System should go down here but shutdown doesn’t actually complete until ~40s later when I poke Incus. >>>>>
[ 1312.299916] [ T14] rcu_tasks_wait_gp: rcu_tasks grace period number 33 (since boot) is 10082 jiffies old.
[ 1342.411039] [ T14] rcu_tasks_wait_gp: rcu_tasks grace period number 33 (since boot) is 40194 jiffies old.
[ 1393.618660] [ T1] watchdog: watchdog0: watchdog did not stop!
[ 1393.621222] [ T631] systemd-journald[631]: Failed to send WATCHDOG=1 notification message: Connection refused
[ 1393.636090] [ T1] systemd-shutdown[1]: Using hardware watchdog 'iTCO_wdt', version 2, device /dev/watchdog0
[ 1393.636177] [ T1] systemd-shutdown[1]: Watchdog running with a hardware timeout of 10min.
[ 1393.641027] [ T1] systemd-shutdown[1]: Syncing filesystems and block devices.
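To put numbers on the stall, the gaps between consecutive kernel messages can be pulled out of the `dmesg` output. A small sketch, assuming the bracketed-timestamp format shown above and a POSIX `awk`; `dmesg_gaps` is a made-up name, and a gap between two log lines only bounds the stall, it doesn't prove the guest was idle the whole time:

```shell
#!/bin/sh
# dmesg_gaps: read dmesg-style lines on stdin and print every gap (in seconds)
# between consecutive timestamped lines that is at least MIN seconds long.
# Lines without a leading "[ seconds.micros]" timestamp are ignored.
# (Hypothetical helper.) Usage: dmesg | dmesg_gaps 5
dmesg_gaps() {
    awk -v min="$1" '
        /^\[ *[0-9]+\.[0-9]+\]/ {
            # Extract the number inside the first pair of brackets.
            ts = $0
            sub(/^\[ */, "", ts)
            sub(/\].*/, "", ts)
            t = ts + 0
            if (seen && t - prev >= min)
                printf "%.1f s gap before: %s\n", t - prev, $0
            prev = t
            seen = 1
        }'
}
```

Running the same filter over the output of a successful shutdown would give a baseline to compare against.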