Container Failed to retrieve network information

combo121 · November 26, 2021, 9:13am

Operating system: Ubuntu 20.04 LTS
LXC version: 4.0.4

Lately I have run into many cases where containers get stuck, and the only reliable fix seems to be rebooting the host. That is not ideal, so I need to find a more solid fix that doesn't require a reboot.

In the latest incident, syslog shows:

Nov 26 03:50:32 lxd-bla-bla-bla lxd.daemon[5911]: t=2021-11-26T03:50:32+0000 lvl=eror msg="Failed to retrieve network information via netlink" container=xlN-blablacontainer pid=111846
Nov 26 03:50:32 lxd-bla-bla-bla lxd.daemon[5911]: t=2021-11-26T03:50:32+0000 lvl=eror msg="Error calling 'lxd forknet" container=xlN-blablacontainer err="Failed to run: /snap/lxd/current/bin/lxd forknet info -- 111846 3: Failed setns to container network namespace: No such file or directory" pid=111846
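
One way to read the "No such file or directory" in the setns error is that the network namespace handle for that PID was already gone when LXD tried to enter it. The sketch below checks for that handle directly. It assumes the handle lives at the standard Linux location /proc/<pid>/ns/net and that this is the path behind the failure; the helper names are made up for illustration, and it needs to run as root on the host.

```python
import os


def netns_path(pid: int, proc_root: str = "/proc") -> str:
    """Return the path of the network namespace handle for a PID."""
    return os.path.join(proc_root, str(pid), "ns", "net")


def netns_exists(pid: int, proc_root: str = "/proc") -> bool:
    """True if the namespace handle is visible for this PID.

    lexists() is used because the handle is a symlink; it returns False
    both when the process is gone and when access is denied (non-root).
    """
    return os.path.lexists(netns_path(pid, proc_root))


if __name__ == "__main__":
    # 111846 is the pid from the log lines above; 16835 is the Pid that
    # `lxc info` reports for the same container later in this post.
    for pid in (111846, 16835):
        state = "present" if netns_exists(pid) else "missing"
        print(f"{netns_path(pid)}: {state}")
```

If the handle is missing for the PID in the log but present for the PID the daemon reports, that would suggest the two are not looking at the same process, which may be worth including in a bug report.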

Trying to restart the container just leaves the restart stuck as well. Output of lxc operation show <ID>:

id: 5aa7064d-0069-428c-9be4-251b7e4c6203
class: websocket
description: Executing command
created_at: 2021-11-26T04:07:41.579498862Z
updated_at: 2021-11-26T04:07:41.579498862Z
status: Running
status_code: 103
resources:
  containers:
  - /1.0/containers/xlN-blablacontainer
  instances:
  - /1.0/instances/xlN-blablacontainer
metadata:
  command:
  - /bin/sh
  - -c
  - /bin/sh -c '/usr/bin/python3 /root/.ansible/tmp/ansible-tmp-1637899659.1603563-95885-117991861407417/AnsiballZ_command.py
    && sleep 0'
  environment:
    HOME: /root
    LANG: C.UTF-8
    PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    USER: root
  fds:
    "0": 607757cad07b578e2029838a394baa9a74fba6d82e74c519a9195a52450db1ed
    "1": b43ac4ad70c5fc0827ee708b6c358c3fb682d1e72072e4bf0073141a7389644e
    "2": 736d28d6f25ca25971995527e33db40caa299a6aa831f0d1b8efd7c19c9dbaab
    control: 36dae859adf69b51f9df614243bcb68b6cb42812638208c753a5276f0bdeb24b
  interactive: false
may_cancel: false
err: ""
location: none

The container is still shown as running by the daemon:

Name: xlN-blablacontainer
Location: none
Remote: unix://
Architecture: x86_64
Created: 2021/09/30 20:30 UTC
Status: Running
Type: container
Profiles: default
Pid: 16835
Ips:
  eth0: inet 10.0.12.61 vethecb9cc0a
  eth0: inet6 fe80::216:3eff:fec7:5350 vethecb9cc0a
  lo: inet 127.0.0.1
  lo: inet6 ::1

Any idea what might be causing this, or how this can get fixed?

tomp · November 26, 2021, 11:39am

Dealing with the restart hanging first, are you using the -f flag on restart to perform a forceful restart?

combo121 · November 26, 2021, 12:17pm

@tomp Yep, I tested that thoroughly. The restart also hangs in the operation list and doesn't proceed, even with the -f flag.
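
To spot hung operations like the ones described above without inspecting them one by one, something along these lines could help. It is only a sketch: it assumes the operation list can be exported as a JSON array (for example with lxc operation list --format json, if the installed client supports that flag) whose entries carry the same id, description, status and created_at fields shown in the lxc operation show output earlier. The function names, the "now" value and the 10 minute threshold are hypothetical.

```python
import json
import re
from datetime import datetime, timedelta, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp such as 2021-11-26T04:07:41.579498862Z."""
    match = re.fullmatch(
        r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z", value
    )
    if match is None:
        raise ValueError(f"unsupported timestamp: {value!r}")
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    # datetime only stores microseconds, so keep the first six fractional digits.
    micros = int((match.group(2) or "0")[:6].ljust(6, "0"))
    return base.replace(microsecond=micros, tzinfo=timezone.utc)


def long_running(operations, now, max_age):
    """Return (id, description, age) for Running operations older than max_age."""
    stuck = []
    for op in operations:
        if op.get("status") != "Running":
            continue
        age = now - parse_timestamp(op["created_at"])
        if age > max_age:
            stuck.append((op["id"], op.get("description", ""), age))
    return stuck


# Sample entry built from the operation shown earlier in the thread.
sample = json.loads(
    '[{"id": "5aa7064d-0069-428c-9be4-251b7e4c6203",'
    ' "description": "Executing command",'
    ' "status": "Running",'
    ' "created_at": "2021-11-26T04:07:41.579498862Z"}]'
)
# Hypothetical "current time", half an hour after the operation was created.
now = datetime(2021, 11, 26, 4, 37, 41, tzinfo=timezone.utc)
for op_id, description, age in long_running(sample, now, timedelta(minutes=10)):
    print(op_id, description, age)
```

In real use, the sample list would be replaced by the parsed output of the operation list and "now" by datetime.now(timezone.utc). This only identifies the stuck operations; it does not unblock them.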

tomp · November 29, 2021, 9:10am

It could be a disk I/O issue. Do you see the problem with other containers?

combo121 · November 29, 2021, 12:20pm

No, it happens randomly on different servers, and none of them have any I/O issues.

The most important question for now is: can this issue be fixed without having to restart the machine?

libinkai · July 6, 2023, 3:01pm

Hi, have you solved the problem? I'm running into the same issue.