Stateful snapshot not working: Can't dump nested uts namespace / Can't make utsns id

Hello,

I can’t get stateful snapshots to work. Live migration isn’t working either.

My error:

root@lxd-host-2:/home/lxd-host-user# lxc snapshot lxd-container-1 --stateful
Error: snapshot dump failed
(00.030588) Error (criu/namespaces.c:420): Can't dump nested uts namespace for 1220
(00.030591) Error (criu/namespaces.c:679): Can't make utsns id
(00.032533) Error (criu/util.c:618): exited, status=1
(00.033727) Error (criu/util.c:618): exited, status=1
(00.034166) Error (criu/cr-dump.c:1764): Dumping FAILED.

Does anyone have an idea?

More information:

lxc list
+-----------------+---------+----------------------+-----------------------------------------------+-----------+-----------+
|      NAME       |  STATE  |         IPV4         |                     IPV6                      |   TYPE    | SNAPSHOTS |
+-----------------+---------+----------------------+-----------------------------------------------+-----------+-----------+
| lxd-container-1 | RUNNING | 10.52.121.226 (eth0) | fd42:1b2d:6647:fc9b:216:3eff:feef:7b34 (eth0) | CONTAINER | 0         |
+-----------------+---------+----------------------+-----------------------------------------------+-----------+-----------+
lxc version
Client version: 4.3
Server version: 4.3
snap get lxd criu.enable
true
lxc info
config:
  cluster.https_address: 192.168.178.38:8443
  core.https_address: 192.168.178.38:8443
  core.trust_password: true
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses:
  - 192.168.178.38:8443
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    [censored]
    -----END CERTIFICATE-----
  certificate_fingerprint: [censored]
  driver: lxc
  driver_version: 4.0.3
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.4.0-42-generic
  lxc_features:
    cgroup2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_notify: "true"
  os_name: Ubuntu
  os_version: "20.04"
  project: default
  server: lxd
  server_clustered: false
  server_name: lxd-host-2
  server_pid: 930
  server_version: "4.3"
  storage: btrfs
  storage_version: 4.15.1
ps aux|grep 1220
1000000     1220  0.0  0.2  21588  4840 ?        Ss   13:19   0:00 /lib/systemd/systemd-udevd

Greetings

Hativ

The error is fairly clear: the stateful snapshot can’t be made because CRIU is unable to process nested containers of any kind.

The process with PID 1220 in your container is somehow using a nested UTS namespace (that’s what’s used to provide an alternate hostname to a process) and so cannot be serialized.
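
If you want to confirm that on the host, a rough sketch is to compare the UTS namespace links under /proc for the reported PID and for the container's init process. The helper names `uts_ns_of` and `same_uts_ns` are made up for illustration, and this assumes a Linux host with /proc mounted and root privileges to read other processes' namespace links:

```shell
# Print the UTS namespace identifier of a PID (the target of the ns/uts symlink).
uts_ns_of() {
    readlink "/proc/$1/ns/uts"
}

# Exit 0 when both PIDs share one UTS namespace, 1 when they differ,
# and 2 when either namespace link cannot be read.
same_uts_ns() {
    a=$(uts_ns_of "$1") || return 2
    b=$(uts_ns_of "$2") || return 2
    [ "$a" = "$b" ]
}
```

Calling `same_uts_ns <container-init-pid> 1220` (both PIDs as seen from the host) and getting a non-zero exit status would suggest that PID 1220 sits in its own UTS namespace inside the container, which is the situation CRIU is complaining about.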

Thank you!

The process with PID 1220 (on the host, but from the container) is /lib/systemd/systemd-udevd. It’s a completely fresh Ubuntu 20.04 in the container (and on the host too). If even something that simple isn’t working, the stateful snapshot feature is relatively useless in the real world. Is development not that far along yet?

Yeah, CRIU (the external tool that handles this) has been fighting a losing battle pretty much from the start. On the one hand, you have the Linux kernel with tens of thousands of developers adding new features every day; on the other, you have fewer than 10 people who need to be able to extract every last bit of kernel state for all of those features, write it to a file, and restore it on the target.

CRIU is actively used in production, most notably by Google, but the places where it’s used tend to be very simple containers (as in, notably, no systemd) and workloads that are designed with checkpoint/restore in mind and so do not make any use of features which CRIU is unable to handle.

For LXD, we do have a couple of annoying limitations in CRIU (other than the one you’ve hit) which we intend to resolve in the next 6 months or so, to make it easier for some folks to use it in production, but we don’t really expect CRIU to improve significantly in its ability to checkpoint arbitrary complex workloads.

Thank you for the explanation. I will probably use something simpler, without systemd, in production. What about common software that runs in the container? Do PHP, Apache, nginx, HAProxy and similar software cause problems with CRIU, do you know?