Error: Failed to retrieve PID of executing child process

I am seeing this error quite frequently these days when doing ‘lxc exec container bash’. After a reboot this error goes away and the container shell is accessible.

I suspect this started with 4.12. Is there any fix for this?

Thanks

Which distro and kernel are you on?

I think that this:

fixes it.

I am on Arch Linux with kernel 5.10.22-2-lts and systemd 247.3-1.

The patch above should fix this!

Hi brauner,

I have the same issue: "Error: Failed to retrieve PID of executing child process" when I try to exec into the container.

Are there any steps or documentation for getting the LXD patch mentioned above?

Thank you

I think @Foxboron would need to backport "rexec: don't close stderr" (Pull Request #3715 in lxc/lxc, by brauner) to LXC on Arch Linux.

Hi,

Thank you for the reply. Sorry, I forgot to mention the distro earlier.

I’m using Red Hat 7.8 with kernel 4.18.0-80.1.2.el8_0.x86_64 and LXD version 4.15.

Thank you

Ah. Are you using the snap to run LXD on Red Hat?

Yes, I’m using the snap.

@stgraber any input on this?

@brauner I’m not seeing "rexec: don't close stderr" in stable-4.0.

You’re likely looking for this:

commit 9c75153c5c2b2c38a7461a2b35267f68e4471c4c
Author: Christian Brauner <christian.brauner@ubuntu.com>
Date:   Mon Apr 12 17:50:39 2021 +0200

    Revert "rexec: mark all fds as close-on-exec if possible"

    This reverts commit 531d36ad009325b74a105d9d6956e320f37b2937.

    Callers might want to explicilty inhert file descriptors so we can't
    close them behind their back when we exec.

    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

since it was a revert.

And

commit 4f9e3f46d0db063d775013065cbd792242681ba6
Author: Christian Brauner <christian.brauner@ubuntu.com>
Date:   Thu Mar 18 12:11:32 2021 +0100

    rexec: don't close stderr

    Otherwise we'll fail to attach to containers later on.

    Fixes: https://discuss.linuxcontainers.org/t/error-failed-to-retrieve-pid-of-executing-child-process
    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

Both are in stable-4.0.

@brauner

stgraber@castiana:~/data/code/lxc/lxc (lxc/stable-4.0)$ git log lxc-4.0.9.. | grep rexec
stgraber@castiana:~/data/code/lxc/lxc (lxc/stable-4.0)$ 

So this suggests that those changes are already in the snap then.

Hi @stgraber

So does that mean that, to get the latest patch, I must update LXD through the snap? Please correct me if I'm wrong.
Thank you

Please put your daemon in debug mode:

snap set lxd daemon.debug=true
snap set lxd daemon.verbose=true
systemctl restart snap.lxd.daemon

and then try to attach to a failing container and please get me the log file from:

/var/snap/lxd/common/lxd/logs/$CONTAINER/lxc.log

Thanks in advance. Here is the log:

lxc 20210708105733.751 DEBUG commands - commands.c:lxc_cmd_get_limit_cgroup2_fd:1800 - Function not implemented - Failed to receive file descriptor for “get_cgroup2_fd”
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:446 - Adding cgroup hierarchy mounted at memory and base cgroup (null)
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:449 - The hierarchy contains the memory controller
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:446 - Adding cgroup hierarchy mounted at hugetlb and base cgroup (null)
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:449 - The hierarchy contains the hugetlb controller
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:446 - Adding cgroup hierarchy mounted at pids and base cgroup (null)
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:449 - The hierarchy contains the pids controller
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:446 - Adding cgroup hierarchy mounted at cpuset and base cgroup (null)
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:449 - The hierarchy contains the cpuset controller
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:446 - Adding cgroup hierarchy mounted at blkio and base cgroup (null)
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:449 - The hierarchy contains the blkio controller
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:446 - Adding cgroup hierarchy mounted at freezer and base cgroup (null)
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:449 - The hierarchy contains the freezer controller
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:446 - Adding cgroup hierarchy mounted at devices and base cgroup (null)
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:449 - The hierarchy contains the devices controller
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:446 - Adding cgroup hierarchy mounted at rdma and base cgroup (null)
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:449 - The hierarchy contains the rdma controller
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:446 - Adding cgroup hierarchy mounted at net_cls,net_prio and base cgroup (null)
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:449 - The hierarchy contains the net_cls controller
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:449 - The hierarchy contains the net_prio controller
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:446 - Adding cgroup hierarchy mounted at perf_event and base cgroup (null)
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:449 - The hierarchy contains the perf_event controller
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:446 - Adding cgroup hierarchy mounted at cpu,cpuacct and base cgroup (null)
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:449 - The hierarchy contains the cpu controller
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:449 - The hierarchy contains the cpuacct controller
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:446 - Adding cgroup hierarchy mounted at systemd and base cgroup (null)
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:cgroup_hierarchy_add:449 - The hierarchy contains the name=systemd controller
lxc 20210708105733.751 TRACE cgfsng - cgroups/cgfsng.c:__initialize_cgroups:3139 - No such file or directory - Unified cgroup not mounted
lxc 20210708105733.751 TRACE cgroup - cgroups/cgroup.c:cgroup_init:42 - Initialized cgroup driver cgfsng
lxc 20210708105733.751 TRACE cgroup - cgroups/cgroup.c:cgroup_init:45 - Legacy cgroup layout
lxc 20210708105733.751 DEBUG commands - commands.c:lxc_cmd_rsp_recv_fds:149 - Command “get_limit_cgroup” received response
lxc 20210708105733.751 TRACE commands - commands.c:lxc_cmd:518 - Opened new command socket connection fd 60 for command “get_limit_cgroup”
lxc 20210708105733.751 DEBUG commands - commands.c:lxc_cmd_rsp_recv_fds:149 - Command “get_cgroup” received response
lxc 20210708105733.751 TRACE commands - commands.c:lxc_cmd:518 - Opened new command socket connection fd 60 for command “get_cgroup”
lxc 20210708110215.929 DEBUG commands - commands.c:lxc_cmd_rsp_recv_fds:149 - Command “get_state” received response
lxc 20210708110215.929 TRACE commands - commands.c:lxc_cmd:518 - Opened new command socket connection fd 31 for command “get_state”
lxc 20210708110215.929 DEBUG commands - commands.c:lxc_cmd_get_state:1055 - Container “edc-db” is in “RUNNING” state
lxc 20210708110215.929 DEBUG commands - commands.c:lxc_cmd_rsp_recv_fds:149 - Command “get_state” received response
lxc 20210708110215.929 TRACE commands - commands.c:lxc_cmd:518 - Opened new command socket connection fd 31 for command “get_state”
lxc 20210708110215.929 DEBUG commands - commands.c:lxc_cmd_get_state:1055 - Container “edc-db” is in “RUNNING” state
lxc 20210708110215.934 DEBUG commands - commands.c:lxc_cmd_rsp_recv_fds:149 - Command “get_init_pid” received response
lxc 20210708110215.934 TRACE commands - commands.c:lxc_cmd:518 - Opened new command socket connection fd 31 for command “get_init_pid”
lxc 20210708110215.942 DEBUG commands - commands.c:lxc_cmd_rsp_recv_fds:149 - Command “get_devpts_fd” received response
lxc 20210708110215.942 DEBUG commands - commands.c:lxc_cmd_rsp_recv:252 - Finished processing “get_devpts_fd” with file descriptor -9
lxc 20210708110215.942 TRACE commands - commands.c:lxc_cmd:518 - Opened new command socket connection fd 31 for command “get_devpts_fd”
lxc 20210708110215.942 DEBUG commands - commands.c:lxc_cmd_get_devpts_fd:679 - Function not implemented - Failed to receive file descriptor for “get_devpts_fd”
lxc edc-db 20210708110215.985 WARN attach - attach.c:get_attach_context:463 - No security context received
lxc edc-db 20210708110215.985 ERROR attach - attach.c:__prepare_namespaces_nsfd:587 - No such file or directory - Failed to preserve mnt namespace of 186596
lxc edc-db 20210708110215.985 ERROR attach - attach.c:lxc_attach:1444 - Failed to get namespace file descriptors
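
For anyone reading a log like this one: most of it is routine cgroup setup, and the failure only shows up in the WARN and ERROR entries at the end. Below is a minimal sketch of a filter, assuming the log lines keep the layout seen here (`lxc`, an optional container name, a timestamp, then the level). `lxc_log_problems` is a hypothetical helper name, not part of LXC or LXD.

```shell
# Print only WARN and ERROR entries from an LXC log file.
# Assumes the line layout seen in this thread:
#   lxc [container] YYYYMMDDhhmmss.mmm LEVEL subsystem - message
lxc_log_problems() {
    grep -E '^lxc( [^ ]+)? [0-9]{14}\.[0-9]{3} (WARN|ERROR) ' "$1"
}

# Example (log path as given earlier in the thread, container name from the log):
#   lxc_log_problems /var/snap/lxd/common/lxd/logs/edc-db/lxc.log
```

The full log is still what the developers asked for; this only helps to spot the failing step quickly.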

I sent a PR that fixes an underlying issue: we’re not surfacing the errno value correctly when falling back to the legacy openat(). That fallback should be what happens on a 4.18 kernel, since openat2() (our preferred system call) only came into existence with kernel 5.6, and I find it unlikely that RH backported openat2() to 4.18.
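
As a rough way to tell which side of that 5.6 boundary a host is on, the kernel release string can be compared against 5.6. This is only a sketch: `kernel_at_least` is a hypothetical helper, and a version comparison is just a hint, since vendor kernels can backport system calls without changing the upstream version number.

```shell
# Return 0 if kernel release $1 is at least version $2 (given as major.minor).
# Only the leading "major.minor" of the release string is compared.
kernel_at_least() {
    release="${1%%-*}"        # drop a suffix such as "-80.1.2.el8_0.x86_64"
    major="${release%%.*}"
    rest="${release#*.}"
    minor="${rest%%.*}"
    want_major="${2%%.*}"
    want_minor="${2#*.}"
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Example: check the running kernel against 5.6.
#   kernel_at_least "$(uname -r)" 5.6 && echo "5.6 or newer" || echo "older than 5.6"
```

With the two kernels mentioned in this thread, 5.10.22-2-lts counts as new enough and 4.18.0-80.1.2.el8_0.x86_64 does not.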

However, I’m not sure that’s the full story, since the error LXC is reporting is that the entry for the mount namespace doesn’t exist under the process’s /proc directory. That is weird, since mount namespaces can’t be turned off with a kernel config option or similar.
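
One way to look at this by hand is to check whether the `ns/mnt` entry exists for the PID named in the error (186596 in the log above; it will differ on every run). This is a minimal sketch: `check_mnt_ns` is a hypothetical helper, and the optional second argument, an alternative proc root, is an assumption added so the function can be tried against a test directory.

```shell
# Report whether the mount namespace entry exists for a given PID.
# Usage: check_mnt_ns PID [PROC_ROOT]   (PROC_ROOT defaults to /proc)
check_mnt_ns() {
    pid="$1"
    proc_root="${2:-/proc}"
    entry="$proc_root/$pid/ns/mnt"
    # -e follows the link, -L also accepts a link that cannot be followed
    if [ -e "$entry" ] || [ -L "$entry" ]; then
        echo "present: $entry"
    else
        echo "missing: $entry" >&2
        return 1
    fi
}

# Example, using the PID from the error message:
#   check_mnt_ns 186596
```

If the process has already exited, the whole `/proc/PID` directory is gone, so a "missing" result does not by itself say anything about namespace support.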

Can you show me the output of findmnt on your system, please?