Hanging/Slow, when running Docker inside of LXC

Hi,

Some time ago I set up Docker inside an LXC container (see “Running Docker inside of a LXD container” on YouTube).

I’ve had issues with “hanging” when connecting to my LXC container with SSH, and I’ve noticed docker compose takes almost 20 seconds to bring up a container. During that time it doesn’t appear to be doing anything.

My server boots off a Samsung SSD 970 PRO 512GB NVMe, which is formatted with a btrfs filesystem. The load is rather small; the machine has 64GB of RAM and an AMD EPYC 7262, and runs Arch Linux.

I started by configuring my system according to “Server settings for a LXD production setup” in the LXD documentation.

How I set this up:

  1. Initialize LXD

    lxd init
    
  2. Launch Archlinux container:

    lxc launch images:archlinux docker-container
    
  3. Join to VLAN 50

    lxc config device add docker-container eth0 nic nictype=bridged parent=bridge0 hwaddr=00:xx:xx:xx:xx:xx vlan=50
    
  4. Create btrfs subvolume

    lxc storage create docker btrfs source=/var/lib/lxd/storage-pools/docker
    
  5. Create new storage volume:

    lxc storage volume create docker docker-container
    
  6. Set the storage volume for /var/lib/docker

    lxc config device add docker-container docker disk pool=docker source=docker-container path=/var/lib/docker
    

    My storage looks like:

    ❯ sudo lxc storage show default
    config:
      source: /var/lib/lxd/storage-pools/default
      volatile.initial_source: /var/lib/lxd/storage-pools/default
    description: ""
    name: default
    driver: btrfs
    used_by:
    - /1.0/instances/docker-container
    - /1.0/profiles/default
    status: Created
    locations:
    - none
    
    ❯ sudo lxc storage show docker
    config:
      source: /var/lib/lxd/storage-pools/docker
      volatile.initial_source: /var/lib/lxd/storage-pools/docker
    description: ""
    name: docker
    driver: btrfs
    used_by:
    - /1.0/storage-pools/docker/volumes/custom/docker-container
    status: Created
    locations:
    - none
    
  7. Disable COW for some extra performance

    chattr +C /var/lib/lxd/storage-pools/docker
    
  8. Set security options for docker.

    lxc config set docker-container security.nesting=true \
                                    security.syscalls.intercept.mknod=true \
                                    security.syscalls.intercept.setxattr=true
    
  9. Add some bind mounts

Is lxc exec into your container quick?

And from inside the container, can you ping externally, e.g. ping linuxcontainers.org and does it resolve DNS quickly?
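To put numbers on "quickly", a small timing helper can be run inside the container. This is only a sketch: `elapsed_ms` is a name made up for this example, and it assumes GNU `date` (for `%N`) and that `getent` and `ping` are installed in the container.

```shell
# elapsed_ms CMD [ARGS...]: run a command, discard its output, and print
# how long it took in milliseconds. Relies on GNU date's %N (nanoseconds).
elapsed_ms() {
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Example use inside the container:
#   elapsed_ms getent hosts linuxcontainers.org   # name resolution only
#   elapsed_ms ping -c 1 8.8.8.8                  # raw IP, no DNS involved
```

If the first number is much larger than the second, the delay is in name resolution rather than in basic connectivity.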

It’s also worth tracing your Docker daemon process (using strace -f -p $pid_of_dockerd) during the docker-compose up operation (which is the slow part, if I’ve understood you correctly). You may see some slow syscalls.
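One way to spot the slow syscalls is to add `-T` to strace, which appends the time spent in each call as `<seconds>` at the end of the line, write the trace to a file with `-o`, and filter it afterwards. The filter below is a rough sketch; `slow_syscalls` and the trace file path are names invented for this example.

```shell
# slow_syscalls MIN_SECONDS: read `strace -T` output on stdin and print only
# the lines whose trailing <seconds> value is at least MIN_SECONDS.
# Lines without a trailing duration (e.g. "<unfinished ...>") are skipped.
slow_syscalls() {
    awk -v min="$1" '
        match($0, /<[0-9]+\.[0-9]+>$/) {
            secs = substr($0, RSTART + 1, RLENGTH - 2)
            if (secs + 0 >= min + 0) print
        }'
}

# Example use (run as root inside the container while docker-compose up hangs):
#   strace -f -T -o /tmp/dockerd.trace -p "$(pidof dockerd)"
#   slow_syscalls 1 < /tmp/dockerd.trace
```

A call that blocks for close to the full 20 seconds would stand out immediately in that output.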

No, it doesn’t.

OK so looks like a DNS issue then. Can you ping an IP OK (e.g. 8.8.8.8) though?

Yeah, seems that lookups take a while, longer than the host anyway… I’ll have to see what @amikhalitsyn suggested.

I think DNS was a good catch. So it’s better to concentrate on the reasons why it works so badly.

I would suggest running sudo tcpdump -nn -i lxdbr0 and seeing what is happening on the network when you’re making a DNS request. It could be something to do with IPv6.
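To cut the capture down to DNS traffic only, a `port 53` filter can be appended. A minimal sketch follows; `dns_capture` is a made-up helper name, and the interface is an assumption: lxdbr0 is the LXD default bridge, while the container in step 3 above is attached to bridge0, so that may be the one to watch.

```shell
# dns_capture [IFACE]: show only DNS packets (UDP/TCP port 53) on a bridge.
# IFACE defaults to lxdbr0; pass bridge0 if that is what the container uses.
dns_capture() {
    iface="${1:-lxdbr0}"
    sudo tcpdump -nn -i "$iface" port 53
}

# Example use on the host, while running a lookup inside the container:
#   dns_capture bridge0
```

Queries that go out without a matching reply, or AAAA queries that are retried before the A answer is used, would point at where the lookup time goes.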

What I have noticed is that once a network connection is sustained, it seems to work okay. When attempting to bring up the container I don’t see anything particularly unusual in the tcpdump log; in fact it’s pretty quiet. During the 20 seconds the container took to come up there wasn’t anything but:

STP 802.1s, Rapid STP, CIST Flags [Learn, Forward], length 102

I disabled IPv6 on my host and guest “docker-container” with the net.ipv6.conf.all.disable_ipv6 sysctls, and LinkLocalAddressing=ipv4 networkd options. Verified that IPv6 addresses weren’t being given to the container now in “lxc list”.
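For anyone repeating this, the sysctl side can be double-checked by reading the value back from /proc on both host and guest. A small sketch, where `ipv6_state` is a hypothetical helper and the optional argument exists only so the function can be pointed at a different root:

```shell
# ipv6_state [PROC_SYS_ROOT]: print "disabled" if
# net.ipv6.conf.all.disable_ipv6 reads 1, otherwise "enabled"
# (this includes the case where the file is missing or unreadable).
ipv6_state() {
    f="${1:-/proc/sys}/net/ipv6/conf/all/disable_ipv6"
    if [ -r "$f" ] && [ "$(cat "$f")" = "1" ]; then
        echo disabled
    else
        echo enabled
    fi
}

# Example use:
#   ipv6_state                                   # on the host
#   lxc exec docker-container -- cat /proc/sys/net/ipv6/conf/all/disable_ipv6
```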

Containers still seem to take exactly 20.7s to come online.

What I have found is that while those containers are starting, if I try to run something like sudo ps aux | grep docker inside the container, it doesn’t return any output until the containers either start or the operation times out.

If the compose file has too many containers in it, then it won’t bring them all up, and I get an error like:

Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to apply cgroup configuration: unable to start unit “docker-eb0b359ce044155417ddeda78fbd628a32aac2327de2ec419a968128f02c3dd9.scope” (properties [{Name:Description Value:“libcontainer container eb0b359ce044155417ddeda78fbd628a32aac2327de2ec419a968128f02c3dd9”} {Name:Slice Value:“system.slice”} {Name:Delegate Value:true} {Name:PIDs Value:@au [3898]} {Name:MemoryAccounting Value:true} {Name:CPUAccounting Value:true} {Name:IOAccounting Value:true} {Name:TasksAccounting Value:true} {Name:DefaultDependencies Value:false}]): Timeout waiting for systemd to create docker-eb0b359ce044155417ddeda78fbd628a32aac2327de2ec419a968128f02c3dd9.scope: unknown

For some reason I have this feeling that it has something to do with the btrfs storage volume /var/lib/docker.

I’ve never disabled COW, fwiw. Also, I set this config for nested docker: raw.lxc: keyctl=true

Can you repeat with just steps 1, 2 and 3, and then confirm whether you are seeing the slow DNS resolution issue? I tend to find that removing variables from a problem can help identify where the issue is being introduced. We previously saw that it seems to be a DNS resolution issue (or at the very least we can’t rule it out at the moment), so let’s proceed with that and see where the problem is introduced.
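A stripped-down repro along those lines might look like the sketch below. It reuses the commands from steps 2 and 3 and assumes `lxd init` has already been run; the container name dns-test and the `repro_dns` helper are made up for this example, and hwaddr is left out so that LXD generates one.

```shell
# repro_dns [NAME]: launch a bare container (no extra storage volume, no
# chattr, no security options), attach it to VLAN 50 as in step 3, and
# time a single DNS lookup from inside it.
repro_dns() {
    name="${1:-dns-test}"   # hypothetical container name
    lxc launch images:archlinux "$name"
    lxc config device add "$name" eth0 nic nictype=bridged parent=bridge0 vlan=50
    lxc exec "$name" -- bash -c 'time getent hosts linuxcontainers.org'
}

# Example use on the host:
#   repro_dns dns-test
```

The container may need a few seconds to get an address after launch, so it is worth repeating the last command once networking is up before drawing conclusions. If the lookup is already slow here, the storage and nesting settings from the later steps can be ruled out.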