Containers and host machine drop the connection and then return 'Connection Timeout'

Dear Community,

I am currently experiencing the following issue when running long, compute-heavy experiments in Ubuntu 20.04 containers on LXD 5.2. At unpredictable points (I have not been able to reproduce it so far), the container terminates the connection, and from then on both the host machine and all running containers return ‘Connection Timeout’. Pings to the host machine also return ‘Request Timeout’. The only solution so far has been a physical restart of the host machine.

This issue has occurred irregularly over the last year and, frankly, did not bother me before. But today our system administrator unlocked a new achievement on his fitness app from going up and down to the server room to restart the machine. For now, all containers and the lxd.daemon are stopped until a fix arrives; with them down, the issue has disappeared and the connection to the host machine is stable.

All containers have port forwarding configured through a proxy device, added with: lxc config device add mycontainer proxy80 proxy listen=tcp:0.0.0.0:80 connect=tcp:127.0.0.1:80. The host machine runs Ubuntu. LXD was installed as a snap package following the instructions on the website. I’d gladly share any further details about the setup.

Would appreciate any input on this issue.

Best,
sergred

Could you please gather the output of ip a and ip r on the host and inside the container when this happens again? This will presumably need to be done from the server's console before rebooting it.

It would also be useful to capture the same information before the issue appears, so we can compare before and after.
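
To make the before and after comparison easier, the capture could be scripted along these lines. This is only a minimal sketch: the function name capture_net_state, the container name mycontainer, and the output directory are hypothetical placeholders, and it assumes the lxc client is usable from the shell you run it in. Each command is allowed to fail so that a partial capture is still written when networking or the daemon is already broken.

```shell
#!/bin/sh
# Hypothetical helper (not part of LXD): saves `ip a` and `ip r` from the
# host and from one container into a single timestamped file.
# Arguments: $1 = container name, $2 = output directory (both assumed names).
capture_net_state() {
    container="$1"
    outdir="$2"
    stamp="$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$outdir" || return 1
    out="$outdir/netstate-$stamp.txt"
    {
        echo "== host: ip a =="
        ip a 2>&1 || echo "(ip a failed on host)"
        echo "== host: ip r =="
        ip r 2>&1 || echo "(ip r failed on host)"
        echo "== $container: ip a =="
        # stdin is redirected so lxc exec does not consume the caller's input
        lxc exec "$container" -- ip a </dev/null 2>&1 || echo "(ip a failed in $container)"
        echo "== $container: ip r =="
        lxc exec "$container" -- ip r </dev/null 2>&1 || echo "(ip r failed in $container)"
    } > "$out"
    # Print the path of the file that was written
    echo "$out"
}
```

Running capture_net_state mycontainer /root/netstate once while everything is healthy, and again from the server console when the timeouts start, leaves two files that can be compared with diff.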