Yeah, and my theory had been that this is due to close_range() not being available and LXD falling back to close_inherited(), and that there’s a bug in there. But it seems there isn’t.
At least I can reproduce the error on Ubuntu with a 5.3 kernel installed, LXD 4.14 and LXC 4.0.9. I’ve installed openSUSE Leap 15.3 now but I can’t seem to find snapd to install. Need to figure that out.
This is what I did:
2021-06-07 10:55:20 zypper ar --refresh https://download.opensuse.org/repositories/system:/snappy/openSUSE_Leap_15.3 snappy
2021-06-07 10:55:30 zypper --gpg-auto-import-keys ref
2021-06-07 10:56:06 zypper dup --from snappy
2021-06-07 10:56:18 zypper in snapd
2021-06-07 10:56:42 systemctl enable --now snapd
2021-06-07 10:56:57 reboot
2021-06-07 11:20:39 snap install lxd
FWIW the same message is in the separate openSUSE 15.3 box with lxd installed from the default repos:
surveyor:/var/log/lxd/opensuse # cat forkexec.log
Aborting attach to prevent leaking file descriptors into container
Hm, confused now
leap2:~ # uname -a
Linux leap2 5.3.18-lp152.78-default #1 SMP Tue Jun 1 14:53:21 UTC 2021 (556d823) x86_64 x86_64 x86_64 GNU/Linux
leap2:~ # lxc list
+------+-------+------+------+------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+-------+------+------+------+-----------+
leap2:~ # lxc ^C
leap2:~ # lxc launch images:alpine/edge alp1
Creating alp1
Starting alp1
leap2:~ # lxc shell alp1
alp1:~#
driver: qemu | lxc
driver_version: 5.2.0 | 4.0.9
firewall: nftables
kernel: Linux
kernel_architecture: x86_64
kernel_features:
netnsid_getifaddrs: "true"
seccomp_listener: "true"
seccomp_listener_continue: "false"
shiftfs: "false"
uevent_injection: "true"
unpriv_fscaps: "true"
kernel_version: 5.3.18-lp152.78-default
lxc_features:
cgroup2: "true"
devpts_fd: "true"
idmapped_mounts_v2: "false"
mount_injection_file: "true"
network_gateway_device_route: "true"
network_ipvlan: "true"
network_l2proxy: "true"
network_phys_macvlan_mtu: "true"
network_veth_router: "true"
pidfd: "true"
seccomp_allow_deny_syntax: "true"
seccomp_notify: "true"
seccomp_proxy_send_notify_fd: "true"
os_name: openSUSE Leap
os_version: "15.3"
project: default
server: lxd
server_clustered: false
server_name: leap2
server_pid: 4316
server_version: "4.15"
storage: btrfs
storage_version: 4.15.1
Does an upgrade to 4.15 fix the issue for you?
lxd has been updated to 4.15.
I did notice that your particular kernel is a different build than the default from the current leap download page, however.
hydra:~ # uname -a
Linux hydra 5.3.18-57-default #1 SMP Wed Apr 28 10:54:41 UTC 2021 (ba3c2e9) x86_64 x86_64 x86_64 GNU/Linux
hydra:~ # lxc launch images:alpine/edge alp1
Creating alp1
Starting alp1
hydra:~ # lxc shell alp1
Error: Failed to retrieve PID of executing child process
hydra:~ # lxc info | egrep 'version|^\s*os'
api_version: "1.0"
driver_version: 4.0.9 | 5.2.0
kernel_version: 5.3.18-57-default
os_name: openSUSE Leap
os_version: "15.3"
server_version: "4.15"
storage_version: 4.15.1
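To make the "same kernel version, different build" observation concrete, here is a minimal Python sketch that splits a kernel release string into its upstream version and its distro build suffix. The helper names `split_kernel_release` and `same_upstream_different_build` are made up for illustration, and the parsing assumes the `X.Y.Z-build` layout seen in the `uname -a` output above.

```python
import re


def split_kernel_release(release):
    """Split e.g. '5.3.18-57-default' into ('5.3.18', '57-default')."""
    match = re.match(r"^(\d+\.\d+(?:\.\d+)?)(?:-(.*))?$", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # group(2) is None when there is no build suffix at all
    return match.group(1), match.group(2) or ""


def same_upstream_different_build(a, b):
    """True when two releases share the upstream version but not the distro build."""
    version_a, build_a = split_kernel_release(a)
    version_b, build_b = split_kernel_release(b)
    return version_a == version_b and build_a != build_b
```

For the two hosts in this thread, `same_upstream_different_build("5.3.18-lp152.78-default", "5.3.18-57-default")` is true, so the upstream version number alone does not say which backports a kernel carries. `platform.release()` returns the string for the running kernel.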
I confirm there is a problem on Leap 15.3
[admin@naunas] ~
❯ lxc exec atlas -- /bin/bash
Error: Failed to retrieve PID of executing child process
[admin@naunas] ~
❯ uname -a
Linux naunas 5.3.18-57-default #1 SMP Wed Apr 28 10:54:41 UTC 2021 (ba3c2e9) x86_64 x86_64 x86_64 GNU/Linux
[admin@naunas] ~
❯ rpm -qa | grep lxd
lxd-4.15-lp153.88.1.x86_64
lxd-bash-completion-4.15-lp153.88.1.noarch
[admin@naunas] ~
❯ lxc info --show-log atlas
Name: atlas
Location: none
Remote: unix://
Architecture: x86_64
Created: 2020/12/13 06:58 UTC
Status: Running
Type: container
Profiles: default
Pid: 5056
Ips:
lo: inet 127.0.0.1
lo: inet6 ::1
Resources:
Processes: 18
CPU usage:
CPU usage (in seconds): 34
Memory usage:
Memory (current): 71.42MB
Memory (peak): 249.84MB
Network usage:
eth0:
Bytes received: 1.26kB
Bytes sent: 90B
Packets received: 21
Packets sent: 1
lo:
Bytes received: 152.00kB
Bytes sent: 152.00kB
Packets received: 3040
Packets sent: 3040
Log:
lxc atlas 20210609055540.561 ERROR utils - utils.c:lxc_can_use_pidfd:1793 - Invalid argument - Kernel does not support waiting on processes through pidfds
lxc atlas 20210609055540.584 WARN cgfsng - cgroups/cgfsng.c:fchowmodat:1296 - No such file or directory - Failed to fchownat(43, memory.oom.group, 500000001, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
[admin@naunas] ~
❯ lxc info
config:
core.https_address: '[::]:8443'
core.trust_password: true
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
addresses:
- XXX.XXX.XXX.XXX:8443
architectures:
- x86_64
- i686
certificate: | XXX
certificate_fingerprint: XXX
driver: qemu | lxc
driver_version: 5.2.0 | 4.0.9
firewall: nftables
kernel: Linux
kernel_architecture: x86_64
kernel_features:
netnsid_getifaddrs: "true"
seccomp_listener: "true"
seccomp_listener_continue: "false"
shiftfs: "false"
uevent_injection: "true"
unpriv_fscaps: "true"
kernel_version: 5.3.18-57-default
lxc_features:
cgroup2: "true"
devpts_fd: "true"
mount_injection_file: "true"
network_gateway_device_route: "true"
network_ipvlan: "true"
network_l2proxy: "true"
network_phys_macvlan_mtu: "true"
network_veth_router: "true"
pidfd: "true"
seccomp_allow_deny_syntax: "true"
seccomp_notify: "true"
seccomp_proxy_send_notify_fd: "true"
os_name: openSUSE Leap
os_version: "15.3"
project: default
server: lxd
server_clustered: false
server_name: naunas
server_pid: 1944
server_version: "4.15"
storage: dir
storage_version: "1"
But on Tumbleweed, LXD works normally:
[werwolf@power] ~
❯ lxc exec atlas -- bash
[root@atlas ~]# exit
exit
[werwolf@power] ~
❯ uname -a
Linux power 5.12.9-1-default #1 SMP Thu Jun 3 07:44:58 UTC 2021 (f17eb01) x86_64 x86_64 x86_64 GNU/Linux
[werwolf@power] ~
❯ rpm -qa | grep lxd
lxd-4.14-2.1.x86_64
lxd-bash-completion-4.14-2.1.noarch
Clarification: on Tumbleweed, exec works normally, but for some reason the automatic assignment of IP addresses to the container does not work.
Yeah you need to use the package in the distro – it’s in the default repos. I think you could in principle use snapd (there is a package for it IIRC) but I’ve never tried it.
I haven’t yet updated my server to Leap 15.3 so I haven’t run into this particular issue yet, but it works on Leap 15.2 and Tumbleweed (with the same package) so I agree this points towards a kernel version or some other system package version issue.
@Joshua_Newman Sorry for not responding to the BZ you opened – I did take a look at it when you first opened it, but I’ve been chasing down other LXD issues on openSUSE, so I didn’t get around to looking into this one deeply.
I’m having the same issue on a fresh install of openSUSE Leap 15.3. Looking forward to a fix or workaround!
Well ok. This is an unexpected twist… It seems that the openSUSE kernel has my close_range() syscall backported but not CLOSE_RANGE_UNSHARE (which was in the first version). So that’s why this is all messed up.
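A userspace probe can tell the two situations apart. The following is only a sketch in Python, not what LXC does internally. It assumes syscall number 436 (x86_64 and most other architectures) and assumes that a kernel lacking the flag rejects it with EINVAL. `classify`, `_try_close_range` and `probe_close_range` are hypothetical helper names.

```python
import ctypes
import errno
import os

SYS_CLOSE_RANGE = 436         # assumption: syscall number on x86_64 and most other architectures
CLOSE_RANGE_UNSHARE = 1 << 1  # flag value from <linux/close_range.h>


def classify(plain_errno, unshare_errno):
    """Map the errno of a flagless call and of a CLOSE_RANGE_UNSHARE call to a verdict."""
    if plain_errno == errno.ENOSYS:
        return "close_range() missing"
    if plain_errno != 0:
        return "unknown"
    if unshare_errno == 0:
        return "close_range() with CLOSE_RANGE_UNSHARE"
    if unshare_errno == errno.EINVAL:
        return "close_range() without CLOSE_RANGE_UNSHARE"
    return "unknown"


def _try_close_range(flags):
    """Call close_range() on one throwaway fd in a forked child; return errno (0 on success)."""
    pid = os.fork()
    if pid == 0:
        code = 255  # reported if anything unexpected happens in the child
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            fd = os.open(os.devnull, os.O_RDONLY)
            ret = libc.syscall(ctypes.c_long(SYS_CLOSE_RANGE), ctypes.c_uint(fd),
                               ctypes.c_uint(fd), ctypes.c_uint(flags))
            code = 0 if ret == 0 else (ctypes.get_errno() or 255)
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else 255


def probe_close_range():
    """Linux only: probe the running kernel and return a verdict string."""
    return classify(_try_close_range(0), _try_close_range(CLOSE_RANGE_UNSHARE))
```

The probe runs in a forked child so that unsharing the file descriptor table cannot affect the calling process. On a kernel like the one described here, `probe_close_range()` would be expected to return the "without CLOSE_RANGE_UNSHARE" verdict, though that has not been verified on an affected host.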
See
please.
lxc exec works with the fix now, thank you!!
Containers I create are not getting DHCP leases, but this might be an error on my part; not sure yet.
If you’re running the snap package it could well be related to one of these:
I’m using the packages in the openSUSE official repositories, except for the lxd package which I built myself to include the lxc exec fix, but I’ll read those just in case anything might apply.
Check dnsmasq is running, and if so then check your firewall isn’t preventing DHCP requests.
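For the first of those checks, a small Python sketch that looks for a dnsmasq process by scanning `/proc`. The helper name `process_running` is made up for illustration, and matching on the `comm` file is only one way to do it.

```python
import os


def process_running(name, proc_root="/proc"):
    """Return True if any process under proc_root reports `name` in its comm file."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                if handle.read().strip() == name:
                    return True
        except OSError:
            # The process exited between listdir() and open(); ignore it.
            continue
    return False
```

`process_running("dnsmasq")` covers only the first half of the advice. If it returns True, the firewall still has to let DHCP traffic (UDP ports 67 and 68) through on the bridge.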
Ahhh it was the firewall! Definitely did not expect that.
Thank you very much!
I’ve backported the patch to the openSUSE packages; it should appear in Leap in a little while (in the meantime, you can use the devel packages in Virtualization:containers).