wrkilu
(wrkilu)
January 8, 2023, 8:36pm
1
Hi,
Host server: latest CentOS 7 (7.9.2009)
LXD container: also latest CentOS 7
LXD package version on the host: 5.9-9879096, installed via snap.
After some time the container partially breaks. I can still ping any Internet IP from inside it, so the network appears to work, but the df command (and others) returns errors like the following:
df: ‘/proc/cpuinfo’: Transport endpoint is not connected
df: ‘/proc/diskstats’: Transport endpoint is not connected
df: ‘/proc/loadavg’: Transport endpoint is not connected
df: ‘/proc/meminfo’: Transport endpoint is not connected
df: ‘/proc/slabinfo’: Transport endpoint is not connected
df: ‘/proc/stat’: Transport endpoint is not connected
df: ‘/proc/swaps’: Transport endpoint is not connected
df: ‘/proc/uptime’: Transport endpoint is not connected
df: ‘/sys/devices/system/cpu/online’: Transport endpoint is not connected
df: ‘/sys/fs/cgroup/blkio’: Transport endpoint is not connected
df: ‘/sys/fs/cgroup/cpu’: Transport endpoint is not connected
df: ‘/sys/fs/cgroup/cpuset’: Transport endpoint is not connected
df: ‘/sys/fs/cgroup/devices’: Transport endpoint is not connected
df: ‘/sys/fs/cgroup/freezer’: Transport endpoint is not connected
df: ‘/sys/fs/cgroup/hugetlb’: Transport endpoint is not connected
df: ‘/sys/fs/cgroup/memory’: Transport endpoint is not connected
df: ‘/sys/fs/cgroup/net_cls’: Transport endpoint is not connected
df: ‘/sys/fs/cgroup/perf_event’: Transport endpoint is not connected
df: ‘/sys/fs/cgroup/pids’: Transport endpoint is not connected
df: ‘/sys/fs/cgroup/systemd’: Transport endpoint is not connected
And on the host server there is this error in dmesg:
[309274.787698] lxcfs[29285]: segfault at 0 ip 00007f96c28d03ce sp 00007f96c15dcc38 error 4 in libc-2.31.so[7f96c2848000+178000]
Does anybody have an idea how to solve this issue?
Thank you.
stgraber
(Stéphane Graber)
January 8, 2023, 11:22pm
2
Hmm, yeah, that’s an LXCFS crash.
Does that happen repeatedly?
To recover from this you’d need to:
systemctl reload snap.lxd.daemon
lxc restart --all
That will reload LXD (and restart LXCFS), then restart all the containers on the system.
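A quick way to confirm the recovery worked is something like the sketch below; `<container>` is a placeholder for your container’s name, and the single lxcfs process is an assumption based on the snap packaging:
```
# A fresh lxcfs should be running again after the reload...
pgrep -a lxcfs

# ...and the lxcfs-backed /proc files inside a restarted container
# should be readable again instead of returning ENOTCONN.
lxc exec <container> -- cat /proc/uptime
```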
amikhalitsyn
(Aleksandr Mikhalitsyn)
January 9, 2023, 1:22pm
3
Looks related: lxcfs crash on lxd 5.9 rev 24164 · Issue #573 · lxc/lxcfs · GitHub
@wrkilu you need to set up your core_pattern the same way as I’ve described here:
opened 01:45PM - 21 Dec 22 UTC
Due to https://discuss.linuxcontainers.org/t/number-of-cpus-reported-by-proc-stat-fluctuates-causing-issues/15780 we are running LXD 5.9 revision 24164. After running for a few days, lxcfs crashed:
```
Dec 21 05:33:41 kernel: show_signal_msg: 14 callbacks suppressed
Dec 21 05:33:41 kernel: lxcfs[3219179]: segfault at 0 ip 00007f8084afdf81 sp 00007f8084a2e780 error 6 in libc-2.31.so[7f8084a94000+178000]
Dec 21 05:33:41 kernel: Code: 00 00 4c 89 ef 4c 89 4c 24 08 e8 3a 68 00 00 48 89 e9 4c 89 e2 48 89 ee 48 8d 05 2a d2 15 00 4c 89 ef 48 89 84 24 e8 00 00 00 <c6> 45 00 00 e8 06 7e 00 00 89 d9 4c 89 fa 4c 89 f6 4c 89 ef e8 c6
```
This is Ubuntu 22.04.1 running kernel 5.15.0-56-generic on an AMD EPYC 7702P (128-thread) system with 512 GB RAM.
```
lxc info
config:
  core.https_address: '[::]:8443'
  core.trust_password: true
  images.auto_update_interval: "0"
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses:
  - ...:8443
  architectures:
  - x86_64
  - i686
  certificate: ...
  certificate_fingerprint: ...
  driver: qemu | lxc
  driver_version: 7.1.0 | 5.0.1
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.15.0-56-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "22.04"
  project: default
  server: lxd
  server_clustered: false
  server_event_mode: full-mesh
  server_name: server.domain.com
  server_pid: 1657650
  server_version: "5.9"
  storage: zfs
  storage_version: 2.1.4-0ubuntu0.1
  storage_supported_drivers:
  - name: zfs
    version: 2.1.4-0ubuntu0.1
    remote: false
  - name: btrfs
    version: 5.4.1
    remote: false
  - name: ceph
    version: 15.2.17
    remote: true
  - name: cephfs
    version: 15.2.17
    remote: true
  - name: cephobject
    version: 15.2.17
    remote: true
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.45.0
    remote: false
```
As requested, further information:
```
service apport status
● apport.service - LSB: automatic crash report generation
     Loaded: loaded (/etc/init.d/apport; generated)
     Active: active (exited) since Sat 2022-12-17 08:45:41 CET; 4 days ago
       Docs: man:systemd-sysv-generator(8)
        CPU: 27ms

Dec 17 08:45:40 server.domain.com systemd[1]: Starting LSB: automatic crash report generation...
Dec 17 08:45:41 server.domain.com apport[3908]:  * Starting automatic crash report generation: apport
Dec 17 08:45:41 server.domain.com apport[3908]:    ...done.
Dec 17 08:45:41 server.domain.com systemd[1]: Started LSB: automatic crash report generation.
```
```
cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E
```
```
ls -la /var/crash
total 8
drwxrwsrwt 2 root whoopsie 4096 Nov 15 06:25 .
drwxr-xr-x 15 root root 4096 Nov 13 20:56 ..
```
```
ls -la /var/lib/apport/coredump/
total 8
drwxr-xr-x 2 root root 4096 Oct 27 2021 .
drwxr-xr-x 3 root root 4096 Oct 27 2021 ..
```
```
cat /var/log/apport.log
ERROR: apport (pid 3551728) Wed Dec 21 05:33:41 2022: host pid 5646 crashed in a separate mount namespace, ignoring
```
Unfortunately no dumps are available, and the LXD log shows nothing of interest around the time of the crash:
```
journalctl -u snap.lxd.daemon
Dec 17 08:49:16 server.domain.com lxd.daemon[5419]: => LXD is ready
Dec 17 09:09:32 server.domain.com lxd.daemon[5660]: time="2022-12-17T09:09:32+01:00" level=warning msg="Detected poll(POLLNVAL) event: exiting"
Dec 21 12:22:29 server.domain.com systemd[1]: Stopping Service for snap application lxd.daemon...
Dec 21 12:22:29 server.domain.com lxd.daemon[1626376]: => Stop reason is: host shutdown
```
Please tell me if I should modify any configuration to catch the next possible crash.
That is needed to catch the core dump. BTW, you can check your current /proc/sys/kernel/core_pattern; if we are lucky, you may already have a core dump collected in /var/crash/...
amikhalitsyn
(Aleksandr Mikhalitsyn)
January 9, 2023, 2:09pm
4
@wrkilu could you also check your kernel logs for a line with:
kernel: Code
It should follow the line with the “segfault” info. Please post it too.
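For example, something like this should print both lines together (a rough sketch):
```
# Print each lxcfs segfault line plus the "Code:" line that the
# kernel logs right after it.
dmesg -T | grep -A1 'lxcfs.*segfault'
```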
wrkilu
(wrkilu)
January 9, 2023, 2:33pm
5
@stgraber
I ran “lxc stop test1 -f”, then “systemctl reload snap.lxd.daemon”.
The container started again and worked for about 3 hours, and then the problem came back…
@amikhalitsyn
There are no other important lines in dmesg around this segfault:
[424483.934921] lxdbr0: port 1(veth4aec45e0) entered forwarding state
[431674.370712] lxcfs[1237]: segfault at 0 ip 00007f89b23d83ce sp 00007f89b09a1c38 error 4 in libc-2.31.so[7f89b2350000+178000]
[436810.186476] logflags DROP IN=enp4s0 OUT=enp4s0 MAC=54:04:a6:f1:77:83:30:b6:4f:d8:00:d2:08:00
I should also mention that this container (I don’t have others yet) has a 1 GB RAM limit, and after “systemctl reload snap.lxd.daemon” it started seeing the host’s full RAM (16 GB). Then I rebooted it from inside and it came up with 1 GB again. And then, as I wrote, after about 3 hours it got these lxcfs errors.
wrkilu
(wrkilu)
January 9, 2023, 2:38pm
6
On the host:
cat /proc/sys/kernel/core_pattern
core
/var/crash is empty
stgraber
(Stéphane Graber)
January 9, 2023, 2:57pm
7
The memory reporting behavior you described is expected for the crash you’re experiencing. You need to reload LXD to have LXCFS restored, at which point restarting a container will have it use the new LXCFS instance and so report memory consumption correctly. Restarting the container prior to restarting LXCFS will leave it seeing the memory information of the host system.
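A quick way to tell which state a container is in (a hedged sketch, using the test1 container mentioned above):
```
# With a healthy LXCFS the container sees its 1 GB limit; a container
# restarted before LXCFS was restored sees the host's 16 GB instead.
lxc exec test1 -- grep MemTotal /proc/meminfo   # inside the container
grep MemTotal /proc/meminfo                     # on the host, for comparison
```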
Now, it’d be nice if we could indeed grab a core out of this thing, since you seem to have it in a state that’s mostly reproducible…
Any idea what may be happening inside your container at the time of the LXCFS crash?
If we can figure that out, then we could probably grab both strace and gdb output of the running lxcfs just as it crashes, which should give us what we need to sort this out.
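Something along these lines should do it (a rough sketch; note that strace and gdb both rely on ptrace, so attach only one of them at a time):
```
# Find the PID of the running lxcfs (the snap's instance).
pgrep -a lxcfs

# Option 1: log all syscalls (across threads) until the crash.
sudo strace -f -tt -o /tmp/lxcfs.strace -p "$(pgrep -o lxcfs)"

# Option 2: wait inside gdb for the SIGSEGV, then run: bt full
sudo gdb -p "$(pgrep -o lxcfs)" -ex continue
```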
amikhalitsyn
(Aleksandr Mikhalitsyn)
January 9, 2023, 3:13pm
8
Please change it to:
echo '|/bin/sh -c $@ -- eval exec cat > /var/crash/core-%e.%p' > /proc/sys/kernel/core_pattern
and try to repeat the actions which led to the crash.
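To check that the new pattern actually produces dumps before the real crash happens, you can crash a throwaway process (a hedged sketch; depending on the kernel, a non-zero core size limit may also be needed):
```
# Crash a harmless background process; a file named roughly
# core-sleep.<pid> should then appear under /var/crash.
ulimit -c unlimited
sleep 60 &
kill -SEGV $!
ls -la /var/crash/
```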
wrkilu
(wrkilu)
January 9, 2023, 3:19pm
9
@stgraber
It’s not doing anything yet; it’s a clean OS.
@amikhalitsyn
OK, I’ve run that command on the host.
Let’s wait for the next crash; hopefully we’ll get a crash dump.
wrkilu
(wrkilu)
January 10, 2023, 7:31pm
10
Still no crash so far. I’ll write here when it occurs.
wrkilu
(wrkilu)
January 11, 2023, 12:39am
12
I should add that the kernel on the host server is 3.10. Isn’t that too old? Maybe that’s the reason?
amikhalitsyn
(Aleksandr Mikhalitsyn)
January 11, 2023, 9:07am
13
No, userspace should work without crashes on any supported kernel. 3.10 (RHEL 7) is not ideal, but it’s okay.
amikhalitsyn
(Aleksandr Mikhalitsyn)
January 11, 2023, 10:36am
14
The same issue as reported yesterday: Handle NULL in releasedir by deleriux · Pull Request #575 · lxc/lxcfs · GitHub
(gdb) bt
#0 __strcmp_sse2_unaligned ()
at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:31
#1 0x00005577569c4508 in lxcfs_releasedir (path=0x0, fi=0x7f9242d6ac80)
at ../src/src/lxcfs.c:774
#2 0x00007f92441122b7 in ?? ()
#3 0x0000000000000007 in ?? ()
#4 0x0000000000000000 in ?? ()
(gdb) p *(struct fuse_file_info*)0x7f9242d6ac80
$1 = {flags = 0, writepage = 0, direct_io = 0, keep_cache = 0, flush = 0,
nonseekable = 0, flock_release = 0, cache_readdir = 0, padding = 0, padding2 = 0,
fh = 140266048610592, lock_owner = 0, poll_events = 0}
(gdb) p/x ((struct fuse_file_info*)0x7f9242d6ac80)->fh
$4 = 0x7f923c006120
(gdb) x/8xg 0x7f923c006120
0x7f923c006120: 0x00007f923c005610 0x00007f923c004460
0x7f923c006130: 0x0000000000000000 0x6770757800000000
0x7f923c006140: 0x0000000000000000 0x00007f9200000000
0x7f923c006150: 0x0000000000000040 0x00000000000000a5
(gdb) x/s 0x00007f923c005610
0x7f923c005610: "systemd"
(gdb) x/s 0x00007f923c004460
0x7f923c004460: "lxc.payload.complexupgrade/system.slice/systemd-sysusers.service"
Thanks, @wrkilu, for providing us with the core dump! I think it makes sense to continue catching core dumps for the LXCFS process. I suspect we have two different bugs, because in lxcfs crash on lxd 5.9 rev 24164 · Issue #573 · lxc/lxcfs · GitHub we crashed on a write (!), but in your case we crashed on a read.
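For reference, a backtrace like the one above can be produced by loading the collected core against the matching lxcfs binary, roughly like this (the binary path under the snap and the core file name are assumptions and may differ per system):
```
# Open the core with the matching binary and print a full backtrace.
sudo gdb /snap/lxd/current/bin/lxcfs /var/crash/core-lxcfs.<pid> -ex 'bt full'
```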
wrkilu
(wrkilu)
January 11, 2023, 11:52am
15
No problem, man. It’s me who should be thanking you, for LXD, not the other way around!
I still think LXD is awesome; many thanks to all of you maintainers!
Should I attach the second crash when it occurs?
amikhalitsyn
(Aleksandr Mikhalitsyn)
January 11, 2023, 11:55am
16
Should I attach the second crash when it occurs?
Yep, every piece of information may be valuable for debugging. I think we will release a new hotfix version of LXCFS soon, just to address this particular crash that you’ve already caught. I’ll notify you.
wrkilu
(wrkilu)
January 15, 2023, 6:58pm
17
There still hasn’t been another crash on my server.
Another question: when will you release the hotfix? Or… is there a way to downgrade LXD in snap to an older, known-good version?
amikhalitsyn
(Aleksandr Mikhalitsyn)
January 15, 2023, 10:43pm
18
I think the fix will be released this week. I can say that it makes no sense to downgrade, because this is not a regression. It’s an interesting question why you’ve started facing this issue (it may be related to our recent fix enabling direct I/O mode for lxcfs, but that is the only correct behavior).
wrkilu
(wrkilu)
January 16, 2023, 8:19am
19
This is my first container on a dedicated server with CentOS 7. I need virtualization, so I installed LXD and the problems occurred. I haven’t even installed any services in it; it has only an SSH server, nothing else. And it crashes randomly. So I’m simply waiting for a good version so that I can go ahead and install services in it…
amikhalitsyn
(Aleksandr Mikhalitsyn)
January 17, 2023, 10:55am
20
@wrkilu you can try sudo snap refresh lxd --channel=latest/candidate
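Afterwards you can confirm which revision and channel you ended up on, e.g.:
```
# Shows the installed LXD revision and the channel being tracked.
snap list lxd
```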