Lxc unable to connect to running container

CanuteTheGreat · January 11, 2023, 7:42pm

Hello!

I have a cluster of 4 identical machines running as an LXD cluster. Everything works fine for a day or so, and then lxc is unable to connect to the containers. Restarting the containers resolves the issue for another day or so, until it happens again. While lxc is unable to connect, the containers are still running (I can ssh to them).

Machine configuration (identical on all 4 machines):
Dell PowerEdge R6525
AMD EPYC 7282 16-Core Processor
128GB RAM
Ubuntu 22.04 LTS
lxd/lxc (snap) 5.0.1

cmd@ubuntu-test:~$ ping -c1 www.google.com
PING www.google.com (142.251.215.228) 56(84) bytes of data.
64 bytes from sea09s35-in-f4.1e100.net (142.251.215.228): icmp_seq=1 ttl=118 time=0.932 ms

--- www.google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.932/0.932/0.932/0.000 ms
cmd@ubuntu-test:~$
logout
Connection to 240.81.0.157 closed.
cmd@cluster01:~$ lxc restart ubuntu-test
cmd@cluster01:~$ lxc shell ubuntu-test
root@ubuntu-test:~#
logout

I am not sure how to debug this further and would greatly appreciate any help. Thank you!

tomp · January 12, 2023, 8:26am

It could be that something is cleaning up /tmp; see https://github.com/lxc/lxd/issues/10771#issuecomment-1212183389
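One way to check this hypothesis, offered as a sketch: on a stock systemd install, age-based cleanup of temporary directories is run by systemd-tmpfiles-clean.service, which systemd-tmpfiles-clean.timer triggers about 15 minutes after boot and then once a day. That schedule would fit the "works for a day or so" pattern. The helper below, has_lxd_exclusion, is hypothetical (not part of LXD or snapd). It reads tmpfiles.d configuration on stdin and reports whether an x (exclude) line for /tmp/snap-private-tmp/snap.lxd exists. It only does a literal match, so globs or an exclusion on a parent directory are not recognised.

```shell
#!/bin/sh
# Hypothetical helper (not part of LXD or snapd).
# Reads tmpfiles.d configuration on stdin and exits 0 if some line of type "x"
# names /tmp/snap-private-tmp/snap.lxd literally; exits 1 otherwise.
# Globs and exclusions on parent directories are NOT evaluated.
has_lxd_exclusion() {
    awk '
        # Field 1 is the line type, field 2 the path; comment lines start with "#".
        $1 == "x" && $2 == "/tmp/snap-private-tmp/snap.lxd" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, `systemctl list-timers systemd-tmpfiles-clean.timer` shows when the cleanup last ran and when it runs next, and `systemd-tmpfiles --cat-config | has_lxd_exclusion && echo excluded` checks the configuration systemd-tmpfiles actually loads.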

CanuteTheGreat · January 17, 2023, 4:36pm

It looks like this was indeed the case. Thanks!

Solution: at the top of /usr/lib/tmpfiles.d/snapd.conf I added:

x /tmp/snap-private-tmp/snap.lxd

This excludes the snap.lxd subdirectory from being cleaned. That cleanup was what broke lxc's ability to connect to the containers.
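A general systemd caveat, not something tested in this thread: files under /usr/lib/tmpfiles.d belong to packages and can be overwritten when the package is upgraded, while /etc/tmpfiles.d is meant for local configuration. A file in /etc/tmpfiles.d with the same name as one in /usr/lib/tmpfiles.d replaces it entirely, so the drop-in needs its own name. The minimal sketch below assumes that an x line in a separate drop-in is honoured the same way as one inside snapd.conf. The function name write_lxd_tmpfiles_exclusion and the file name lxd-snap-tmp.conf are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: writes the snap.lxd exclusion to its own drop-in file
# instead of editing the packaged /usr/lib/tmpfiles.d/snapd.conf.
# Default target is /etc/tmpfiles.d (needs root); pass another directory to test.
# The file name lxd-snap-tmp.conf is arbitrary; do NOT name it snapd.conf,
# or it would replace the packaged snapd.conf entirely.
write_lxd_tmpfiles_exclusion() {
    conf_dir="${1:-/etc/tmpfiles.d}"
    conf_file="$conf_dir/lxd-snap-tmp.conf"
    mkdir -p "$conf_dir" || return 1
    # Same rule as above: "x" skips the path (and its contents) during cleanup.
    printf '%s\n' 'x /tmp/snap-private-tmp/snap.lxd' > "$conf_file" || return 1
    # Print the path that was written so the caller can inspect it.
    printf '%s\n' "$conf_file"
}
```

As root, source the file and run `write_lxd_tmpfiles_exclusion` with no argument. `systemd-tmpfiles --cat-config` should then list /etc/tmpfiles.d/lxd-snap-tmp.conf alongside snapd.conf.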

tomp · January 27, 2023, 2:51pm

Would you be able to go to https://forum.snapcraft.io and report the issue with /usr/lib/tmpfiles.d/snapd.conf there? It would be great if the snapd team could change the default configuration to avoid these issues with lxc exec.

Thanks