I’ve had issues with “hanging” when connecting to my LXC container with SSH, and I’ve noticed docker compose takes almost 20 seconds to bring up a container. During that time it doesn’t appear to be doing anything.
My server boots off a Samsung SSD 970 PRO 512GB NVMe, which is formatted with a btrfs filesystem. The load on it is rather small; it has 64GB of RAM and an AMD EPYC 7262, and runs Arch Linux.
It’s also worth tracing your docker daemon process (using strace -f -p $pid_of_dockerd) during the docker-compose up operation (which is the slow one, if I’ve understood you correctly). You may see some slow syscalls.
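To make the slow calls easier to spot, something like the following could help. It is only a sketch: it assumes the trace was captured with strace's -T option (which appends the time spent in each call as <seconds>) and written to a file with -o. The function name slow_syscalls and the path /tmp/dockerd.strace are made up for illustration.

```shell
# Print lines from `strace -T` output whose syscall took at least MIN seconds,
# slowest first. Reads the trace on stdin.
slow_syscalls() {
    min="${1:-1.0}"   # threshold in seconds (defaults to 1.0)
    awk -v min="$min" '
        # strace -T ends each completed call with a duration like <0.000123>
        match($0, /<[0-9]+\.[0-9]+>$/) {
            dur = substr($0, RSTART + 1, RLENGTH - 2)
            if (dur + 0 >= min + 0) print dur, $0
        }
    ' | sort -rn
}

# Example (not run here; needs root and a running dockerd):
#   sudo strace -f -T -p "$(pidof dockerd)" -o /tmp/dockerd.strace
#   slow_syscalls 1.0 < /tmp/dockerd.strace
```

Start the strace first, run docker-compose up in another terminal, then stop strace with Ctrl-C and filter the file. Calls that were still pending show up as "unfinished" lines without a duration and are skipped by this filter.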
I would suggest running sudo tcpdump -nn -i lxdbr0 and seeing what is happening on the network when you’re making a DNS request. It could be something to do with IPv6.
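Alongside the capture, it may help to time a lookup per address family from inside the container. This is a rough sketch, not a definitive test: the helper name elapsed is made up, example.com is just a placeholder hostname, and the resolution is whole seconds only (enough to see a multi-second stall).

```shell
# Run a command, discard its output, and report roughly how long it took.
elapsed() {
    start=$(date +%s)
    # Capture the exit status without aborting if the command fails.
    "$@" >/dev/null 2>&1 && status=0 || status=$?
    end=$(date +%s)
    echo "$((end - start))s exit=$status: $*"
}

# Example (not run here; needs working name resolution):
#   elapsed getent ahostsv4 example.com
#   elapsed getent ahostsv6 example.com
```

If only the ahostsv6 lookup is slow, that would point towards IPv6. To cut the noise in the capture, tcpdump can also be limited to DNS traffic with a filter, e.g. sudo tcpdump -nn -i lxdbr0 port 53.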
What I have noticed is that once a network is sustained, it seems to work okay. I’m thinking that. When attempting to bring up the container I don’t see anything particularly unusual in the tcpdump log; in fact it’s pretty quiet. During the 20 seconds the container took to come up, there wasn’t anything but:
I disabled IPv6 on my host and on the guest “docker-container” with the net.ipv6.conf.all.disable_ipv6 sysctl and the LinkLocalAddressing=ipv4 networkd option. I verified in “lxc list” that IPv6 addresses are no longer being given to the container.
Containers still seem to take exactly 20.7s to come online.
What I have found is that while those containers are starting, if I try to do something like sudo ps aux | grep docker inside the container, it doesn’t return any output until the containers either start or time out.
If the compose file has too many containers in it, then it won’t bring them all up, and I get an error like:
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to apply cgroup configuration: unable to start unit “docker-eb0b359ce044155417ddeda78fbd628a32aac2327de2ec419a968128f02c3dd9.scope” (properties [{Name:Description Value:“libcontainer container eb0b359ce044155417ddeda78fbd628a32aac2327de2ec419a968128f02c3dd9”} {Name:Slice Value:“system.slice”} {Name:Delegate Value:true} {Name:PIDs Value:@au [3898]} {Name:MemoryAccounting Value:true} {Name:CPUAccounting Value:true} {Name:IOAccounting Value:true} {Name:TasksAccounting Value:true} {Name:DefaultDependencies Value:false}]): Timeout waiting for systemd to create docker-eb0b359ce044155417ddeda78fbd628a32aac2327de2ec419a968128f02c3dd9.scope: unknown
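Since the error says systemd timed out creating the scope, it might be worth looking at that unit from systemd's side inside the container. A small sketch for pulling the unit name out of the error text (the function name scope_units is made up, and it assumes the docker-<hex id>.scope naming seen in the message above):

```shell
# Extract the unique docker-<id>.scope unit names from error text on stdin.
scope_units() {
    grep -o 'docker-[0-9a-f][0-9a-f]*\.scope' | sort -u
}

# Example (not run here; paste the daemon error, then inspect each unit):
#   for unit in $(scope_units < error.txt); do
#       systemctl status "$unit"
#       journalctl -b -u "$unit"
#   done
```

Here error.txt is a hypothetical file holding the pasted error. If systemctl itself hangs for about the same 20 seconds, that would suggest systemd in the container is the part that is stalling rather than docker.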
Can you repeat with just steps 1, 2 and 3, and then confirm whether you are seeing the slow DNS resolution issue? I tend to find that removing variables from a problem helps identify where the issue is being introduced. We previously saw that it seems to be a DNS resolution issue (or at the very least we can’t rule it out atm), so let’s proceed with that and see where the problem is introduced.