Cannot stop LXC containers with a systemd service on host reboot

Required information

  • Distribution: Ubuntu Hirsute

  • The output of “lxc info”:

config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    xxx
    -----END CERTIFICATE-----
  certificate_fingerprint: xxx
  driver: lxc | qemu
  driver_version: 4.0.10 | 6.1.0
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.11.0-34-generic
  lxc_features:
    cgroup2: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "21.04"
  project: default
  server: lxd
  server_clustered: false
  server_name: vmsrv3.local.rotek.at
  server_pid: 3174
  server_version: "4.18"
  storage: dir
  storage_version: "1"
  storage_supported_drivers:
  - name: btrfs
    version: 5.4.1
    remote: false
  - name: cephfs
    version: 15.2.13
    remote: true
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.43.0
    remote: false
  - name: zfs
    version: 2.0.2-1ubuntu5.1
    remote: false
  - name: ceph
    version: 15.2.13
    remote: true

  • Kernel version: 5.11.0-34-generic
  • LXC version: 4.0.10
  • LXD version: 4.18
  • snapd version: 2.51.4
  • Snap base: core20
  • Storage backend in use: dir

Issue description

I cannot stop LXC containers from a service; there are always errors like:

internal error, please report: running "lxd.lxc" failed: cannot create transient scope: DBus error "org.freedesktop.systemd1.TransactionIsDestructive": [Transaction for snap.lxd.lxc.8fc29d87-d0d3-478c-8594-50e2760a45da.scope/start is destructive (shutdown.target has 'start' job queued, but 'stop' is included in transaction).]

Steps to reproduce

No service definition I tried works. I tried many different targets and dependencies, and there always seems to be a conflict like the one described above.

[Unit]
Description=Rotek Servercontrol
########################################################################
# Manpage : https://www.freedesktop.org/software/systemd/man/index.html
########################################################################

########################################################################
# show units with: sudo systemctl list-units
# show targets with: sudo systemctl list-units --type target
# reload services with : sudo systemctl daemon-reload
########################################################################

After=NetworkManager-wait-online.service
After=network.target
After=networking.service
After=network-online.target
After=media-srv\x2dmain\x2dinstall.mount
After=media-srv\x2dbackupserver.mount
After=postfix.service
After=dnsmasq.service 
After=local-fs.target
After=snap.lxd.daemon.service
After=snap.lxd.daemon.unix.socket

########################################################################
# Wants = A weaker version of Requires=. Units listed in this option 
# will be started if the configuring unit is. However, if the listed 
# units fail to start or cannot be added to the transaction, this has no impact 
# on the validity of the transaction as a whole. 
# This is the recommended way to hook start-up of one unit to the start-up of another unit.
########################################################################
Wants=network.target
Wants=networking.service
Wants=network-online.target
Wants=postfix.service
Wants=dnsmasq.service 
Wants=local-fs.target
Wants=snap.lxd.daemon.service
Wants=snap.lxd.daemon.unix.socket
Wants=media-srv\x2dmain\x2dinstall.mount
Wants=media-srv\x2dbackupserver.mount

########################################################################
# Requires = Configures requirement dependencies on other units. 
# If this unit gets activated, the units listed here will be activated as well. 
# If one of the other units gets deactivated or its activation fails, 
# this unit will be deactivated. 
########################################################################

########################################################################
# Service Section 
########################################################################
[Service]
Type=oneshot
RemainAfterExit=yes

########################################################################
# TimeoutStopSec : Configures the time to wait for stop. If a service is asked to stop, 
# but does not terminate in the specified time, it will be terminated forcibly via SIGTERM
########################################################################
TimeoutStopSec=1200 

# This Python script basically runs: lxc info <container-name> and,
# depending on the state, lxc stop <container-name>.
# The error occurs on ExecStop, when lxc info <container-name> is issued.
ExecStart=-/opt/python3/bin/python3 /opt/rotek-apps/bin/servercontrol/servercontrol.py linux_startup_jobs
ExecStop=-/opt/python3/bin/python3 /opt/rotek-apps/bin/servercontrol/servercontrol.py linux_shutdown_jobs

######################################################################################
# INSTALL
######################################################################################
[Install]
WantedBy=multi-user.target

The question is: how do I configure such a service so that lxc info <name> and lxc stop <name> work on reboot/shutdown?

The systemd error seems to come from the fact that your script is being triggered through a stop action on the same systemd target that snapd uses for the transient units created whenever a lxc command is run. This causes systemd to fail with that error, telling you that you’re trying to run a start action as part of a stop operation.

You may have some luck moving dependencies or targets around to ensure your service triggers separately from whatever causes the LXD shutdown. Or you could have your Python script use pylxd to bypass the systemd wrapper on the lxc tool (though you may still hit issues, as systemd manages connections on the unix socket and may still block that).

I’m also a bit confused as to what your service is doing. You say it looks at lxc info and then does lxc stop on reboot and shutdown. That’s exactly what LXD itself does out of the box, so I’m not sure why any of this is needed in the first place :)

That’s exactly the problem: I did not find a way to get it working, no matter which dependencies I use …

I have some services running in the LXC containers which need some time to stop (longer than the standard 60 seconds), so I need to stop those containers gracefully (without a kill).

The script worked before, but after upgrading from the old snapd it stopped working …

The whole configuration works like this:

When a shutdown or reboot occurs, that service is triggered (ExecStop) and it stops a number of containers (VMware and LXC) gracefully, along with some other jobs that don’t matter here.

So a minimal working service to accomplish that would be really great.

pylxd is not really an option; I have had bad experiences with it (hangs etc.).
So I prefer to issue shell commands from my Python script.
A pure shell script or service would also help; I guess that makes no difference.
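
For reference, the shell-command approach could be sketched like this. This is a hypothetical minimal version, not the actual servercontrol.py: the container names and grace periods are placeholders, and `lxc stop --timeout` is used to give each container its own window for a clean shutdown.

```python
import subprocess

# Placeholder list: (container name, seconds to wait for a clean shutdown).
# The real names and ordering live in servercontrol.py, which is not shown here.
CONTAINERS = [("db1", 600), ("web1", 120)]

def stop_command(name: str, timeout: int) -> list[str]:
    # "lxc stop --timeout N" waits up to N seconds for a clean shutdown.
    return ["lxc", "stop", name, "--timeout", str(timeout)]

def linux_shutdown_jobs(runner=subprocess.run) -> None:
    # Stop the containers one by one, in list order, so the host CPU
    # is not overloaded by parallel shutdowns.
    for name, timeout in CONTAINERS:
        runner(stop_command(name, timeout), check=False)
```

The `runner` parameter only exists so the loop can be exercised without a running LXD; in production it defaults to `subprocess.run`.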

Did you try adjusting boot.host_shutdown_timeout on those instances?
It lets you increase the default time (30s) that LXD waits for an instance to shut down before forcing it.
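
For example (the instance name here is a placeholder):

```shell
# Wait up to 10 minutes for this instance before LXD forces the stop:
lxc config set mycontainer boot.host_shutdown_timeout 600

# Or set it once in a profile shared by several instances:
lxc profile set default boot.host_shutdown_timeout 600
```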

No, I did not. I am also not sure whether boot.host_shutdown_timeout will help when the server is rebooted, because LXD itself will be stopped after 60s (right? I did not test that).
My servercontrol.py starts (and stops) the LXC containers (and other VMs) one by one, so as not to overload the CPU, so I would prefer to do it manually.
Also, some services in the containers have dependencies, so I would like to influence the order in which the containers are started and stopped.

So it should be possible to start/stop LXC containers from a service, and that does not seem possible at the moment. It looks like a regression, because it worked before.

I know I can work with the boot.* properties, but that is somewhat limiting (thanks anyway for the hint).

The LXD service has a 10 min stop timeout, to allow boot.host_shutdown_timeout to be set far higher than 30s/60s.

boot.stop.priority can be used to influence shutdown ordering. All instances at the same priority level are shut down concurrently; once those are done, the set at the next priority level is processed.
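
A hypothetical example (instance names are placeholders): stop the web frontend first, and the database only once the frontends are done.

```shell
# Higher value = shut down earlier.
lxc config set web1 boot.stop.priority 20
lxc config set db1 boot.stop.priority 10
```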

The regression you noticed is most likely down to how snapd now interacts with systemd, so it’s outside of our control. You can avoid it by not going through the snap wrapper (which is what pylxd would get you), but you’re trying to have a service perform actions while it’s simultaneously being told to shut down by systemd, so you’re very likely racing LXD’s normal shutdown process here.

Maybe you could try two different units: one that starts after LXD (easy enough), and another that is part of shutdown.target and so starts when a shutdown is triggered. That may give you more control over ordering during shutdown.
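
An untested sketch of that second unit might look like this. DefaultDependencies=no stops systemd from adding the implicit Conflicts=shutdown.target, and WantedBy=shutdown.target pulls the unit in when a shutdown is requested; the unit name is made up and the ExecStart path is taken from the unit file above.

```ini
# /etc/systemd/system/rotek-shutdown.service (hypothetical sketch)

[Unit]
Description=Graceful LXD container stop on shutdown
DefaultDependencies=no
Before=shutdown.target

[Service]
Type=oneshot
ExecStart=/opt/python3/bin/python3 /opt/rotek-apps/bin/servercontrol/servercontrol.py linux_shutdown_jobs

[Install]
WantedBy=shutdown.target
```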

Stephane, thanks for pointing me in the right direction. I now use the various boot.* options in the container profiles to accomplish what I need.
But it should somehow be possible to get a grip on the shutdown target of snapd/lxd, because there can be situations that cannot be solved with the onboard configuration options.
That was possible on older versions of snap/lxd, so I would consider it a regression.
Maybe a solution would be a dedicated target one can hook into when such a configuration is needed, without creating race conditions with systemd/snap/lxd.

greetings and huge thanks
Robert / Vienna