Operating system: Ubuntu 20.04 LTS
LXC version: 4.0.4
Lately I have run into many cases where containers get stuck, and the only reliable fix seems to be rebooting the instance. Of course this is not ideal, so I need to find a more solid fix that doesn't require a reboot.
In the latest incident, syslog shows:
Nov 26 03:50:32 lxd-bla-bla-bla lxd.daemon[5911]: t=2021-11-26T03:50:32+0000 lvl=eror msg="Failed to retrieve network information via netlink" container=xlN-blablacontainer pid=111846
Nov 26 03:50:32 lxd-bla-bla-bla lxd.daemon[5911]: t=2021-11-26T03:50:32+0000 lvl=eror msg="Error calling 'lxd forknet" container=xlN-blablacontainer err="Failed to run: /snap/lxd/current/bin/lxd forknet info -- 111846 3: Failed setns to container network namespace: No such file or directory" pid=111846
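One check I can think of is sketched below in Python. It is my own idea, not an LXD tool, and it assumes the "No such file or directory" error means the namespace entry /proc/PID/ns/net for the logged pid is gone. The helper names (extract_pid, netns_path, netns_exists) are made up for illustration. It pulls the pid out of the trailing pid= field of a daemon log line and tests whether that entry still exists on the host.

```python
import os
import re

# Matches the trailing "pid=<number>" field of an LXD daemon log line.
PID_RE = re.compile(r"\bpid=(\d+)\s*$")


def extract_pid(log_line):
    """Return the pid from the trailing pid= field, or None if there is none."""
    match = PID_RE.search(log_line)
    return int(match.group(1)) if match else None


def netns_path(pid):
    """Path of the network namespace entry for a process (Linux procfs)."""
    return f"/proc/{pid}/ns/net"


def netns_exists(pid):
    """True if the namespace entry is present, i.e. the process still exists.

    lexists() checks the entry itself without following it. Assumption:
    this runs on the LXD host, ideally as root.
    """
    return os.path.lexists(netns_path(pid))


if __name__ == "__main__":
    # Shortened copy of the log line above; only the pid= field matters here.
    line = ('lvl=eror msg="Failed to retrieve network information via netlink" '
            'container=xlN-blablacontainer pid=111846')
    pid = extract_pid(line)
    if pid is None:
        print("no pid= field found")
    else:
        state = "present" if netns_exists(pid) else "missing"
        print(f"{netns_path(pid)} is {state}")
```

If the entry is missing, the pid in the log no longer refers to a live process, which would at least narrow down where the daemon's view and the host's view diverge.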
Trying to restart the container just adds another stuck operation. From lxc operation show ID:
id: 5aa7064d-0069-428c-9be4-251b7e4c6203
class: websocket
description: Executing command
created_at: 2021-11-26T04:07:41.579498862Z
updated_at: 2021-11-26T04:07:41.579498862Z
status: Running
status_code: 103
resources:
  containers:
  - /1.0/containers/xlN-blablacontainer
  instances:
  - /1.0/instances/xlN-blablacontainer
metadata:
  command:
  - /bin/sh
  - -c
  - /bin/sh -c '/usr/bin/python3 /root/.ansible/tmp/ansible-tmp-1637899659.1603563-95885-117991861407417/AnsiballZ_command.py
    && sleep 0'
  environment:
    HOME: /root
    LANG: C.UTF-8
    PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    USER: root
  fds:
    "0": 607757cad07b578e2029838a394baa9a74fba6d82e74c519a9195a52450db1ed
    "1": b43ac4ad70c5fc0827ee708b6c358c3fb682d1e72072e4bf0073141a7389644e
    "2": 736d28d6f25ca25971995527e33db40caa299a6aa831f0d1b8efd7c19c9dbaab
    control: 36dae859adf69b51f9df614243bcb68b6cb42812638208c753a5276f0bdeb24b
  interactive: false
may_cancel: false
err: ""
location: none
The daemon still shows the container as running:
Name: xlN-blablacontainer
Location: none
Remote: unix://
Architecture: x86_64
Created: 2021/09/30 20:30 UTC
Status: Running
Type: container
Profiles: default
Pid: 16835
Ips:
  eth0: inet 10.0.12.61 vethecb9cc0a
  eth0: inet6 fe80::216:3eff:fec7:5350 vethecb9cc0a
  lo: inet 127.0.0.1
  lo: inet6 ::1
Any idea what might be causing this, or how this can get fixed?