Lxc exec gives Failed to retrieve PID

arif-ali · June 23, 2021, 1:55pm

Hi all,

I have been using LXD for a while now without any major issues. For the past few months, however, I have been getting the error below:

Error: Failed to retrieve PID of executing child process

but I haven’t spent time collecting logs or debug output until now (rebooting my host or running lxc stop/start has normally resolved it for me). I have looked through the forums, and a few people seem to have had similar issues, but none of the suggested resolutions help in my case.

lxc exec debug

https://paste.ubuntu.com/p/nsP7vGVsZy/

log from /var/snap/lxd/common/lxd/logs/ldap/lxd.log

https://paste.ubuntu.com/p/ndsgqGZ3Jb/

log from /var/snap/lxd/common/lxd/logs/lxd.log

https://paste.ubuntu.com/p/Zmv98kHCfP/

ubuntu@pi01:~$ lxc exec ldap bash
Error: Failed to retrieve PID of executing child process

Below are the versions of lxc/lxd for info

ubuntu@pi01:~$ lxc version
Client version: 4.15
Server version: 4.15
ubuntu@pi01:~$ lxd version
4.15
ubuntu@pi01:~$ sudo snap list | grep lxd
lxd     4.15                    20811  latest/stable  canonical*  -
ubuntu@pi01:~$ cat /etc/issue
Ubuntu 21.04 \n \l
ubuntu@pi01:~$ uname -a
Linux pi01 5.11.0-1009-raspi #10-Ubuntu SMP PREEMPT Fri May 14 14:49:24 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux

Any thoughts on this would be appreciated.

stgraber · June 23, 2021, 4:15pm

That’d be one for @brauner but it sounds like it may be cgroup related.

The Raspberry Pi kernel comes with the cpuset and memory cgroups disabled by default. I don’t know if it’s related, but maybe enabling them (done through a kernel boot parameter) would work around this issue?

arif-ali · June 23, 2021, 4:32pm

Interesting…

I do have the following on the kernel command line, which are typically on all my rpi4s.

cgroup_enable=memory cgroup_memory=1

Maybe I need to add cgroup_enable=cpuset as well. I’ll add that tonight and see whether it solves the problem.
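For anyone checking the same thing, here is a minimal sketch that reports which of the three cgroup parameters mentioned in this thread are on the booted kernel command line, and whether the kernel lists the cpuset and memory controllers as enabled. The helper names has_param and check_cgroup_params are made up for illustration. On Ubuntu for the Raspberry Pi the parameters are usually edited in /boot/firmware/cmdline.txt, but that path is an assumption and not something confirmed in this thread.

```shell
#!/bin/sh
# has_param CMDLINE PARAM: succeed if PARAM appears as a whole
# space-separated word in CMDLINE (hypothetical helper).
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report, for each cgroup parameter discussed in the thread, whether it is
# present on the given kernel command line (hypothetical helper).
check_cgroup_params() {
    for p in cgroup_enable=memory cgroup_memory=1 cgroup_enable=cpuset; do
        if has_param "$1" "$p"; then
            echo "present: $p"
        else
            echo "missing: $p"
        fi
    done
}

# Check the command line the running kernel was actually booted with.
if [ -r /proc/cmdline ]; then
    check_cgroup_params "$(cat /proc/cmdline)"
fi

# /proc/cgroups lists each controller; the fourth column is the enabled flag.
if [ -r /proc/cgroups ]; then
    awk '$1 == "cpuset" || $1 == "memory" { print $1, "enabled=" $4 }' /proc/cgroups
fi
```

A changed command line only takes effect after a reboot, so /proc/cmdline is the place to confirm that the new parameter was actually picked up.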

arif-ali · June 24, 2021, 5:13pm

No luck, I’m afraid (but that addition has fixed my other issue with my MAAS in LXC).

At 15:38 local time there was a termination signal; I’m not sure why or how. The first pastebin below shows that. This may be the point at which the problem starts (I was running lxc exec at random times during the day with success, until now).

https://paste.ubuntu.com/p/4t5Bqgkmy9/

Below is the log for the LXC container, but as you can see, it contains almost nothing from when the LXD daemon was terminated and then restarted.

https://paste.ubuntu.com/p/t4j44wVpZX/

After some checks in the journal log, I found that snap refresh was run, and hence lxd was stopped. More from the journal here

Maybe I’ll give it a reboot and stop the refresh, and see if that is the cause.
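A rough sketch of how the refresh could be inspected and postponed, assuming GNU date and a snapd that honours the system-wide refresh.hold option. The helper name hold_until and the seven-day window are illustrative choices, not something taken from this thread.

```shell
#!/bin/sh
# Print an RFC 3339 timestamp N days from now (hypothetical helper,
# relies on GNU date).
hold_until() {
    date --date="+$1 days" +%Y-%m-%dT%H:%M:%S%:z
}

# Read-only checks: the refresh schedule and the recent snapd changes.
if command -v snap >/dev/null 2>&1; then
    snap refresh --time || true
    snap changes || true
fi

# To postpone automatic refreshes, uncomment the next line (needs root):
# sudo snap set system refresh.hold="$(hold_until 7)"
echo "hold timestamp would be: $(hold_until 7)"
```

Holding refreshes only helps to confirm or rule out the refresh as the trigger; it does not explain why exec fails after the daemon restarts.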

brauner · June 29, 2021, 7:41am

Can you provide me with the trace log of the container?

snap set lxd daemon.debug=true
snap set lxd daemon.verbose=true
systemctl restart snap.lxd.daemon.service

and then paste me the container’s log, please?
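For reference, a small sketch of how the settings above could be confirmed and the container’s log collected. The log directory follows the paths quoted earlier in the thread, the container name ldap is the one from the original post, and container_log_dir is a made-up helper.

```shell
#!/bin/sh
# Build the per-container log directory used by the LXD snap, based on the
# paths quoted earlier in this thread (hypothetical helper).
container_log_dir() {
    echo "/var/snap/lxd/common/lxd/logs/$1"
}

name=ldap
dir=$(container_log_dir "$name")

# Confirm that the debug and verbose flags were actually stored.
if command -v snap >/dev/null 2>&1; then
    snap get lxd daemon.debug || true
    snap get lxd daemon.verbose || true
fi

# List the log files on disk, if the directory exists on this host.
if [ -d "$dir" ]; then
    ls -l "$dir"
fi

# Alternatively, ask the client to print the container's log.
if command -v lxc >/dev/null 2>&1; then
    lxc info "$name" --show-log || true
fi
```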

arif-ali · July 1, 2021, 4:43pm

I set those flags a few days ago, but since the last auto refresh, I’ve not seen the issue.

I think enabling cpuset may have gone a long way towards fixing this, but below are the logs from a specific container, as requested.

https://paste.ubuntu.com/p/hnTfgGGy2Z/

I tried manual refreshes, but didn’t get the same logs as before, as that just restarted all the containers. So it was weird to see and may have been a one-off; I’m probably going to ignore it until I see it again.