Debian Jessie containers have extremely low performance

I’ve been struggling with a Debian Jessie container for a few days now, trying to figure out why standard commands like apt-get update or apt-key add some_key run so incredibly slowly.

Debian 7, 9 and 10 containers don’t suffer from this problem; it’s only Debian 8 I’m having trouble with. I even tried upgrading a Debian 7 container to Debian 8, and as soon as the upgrade finishes it starts performing just as slowly.

My test system is a freshly installed Ubuntu 16.04 (4.13.0-36-generic) running on VirtualBox. LXD is configured to use ZFS, but I also tested with folder-based storage and got the same bad results. The LXC version is 2.0.11. I didn’t do anything special: just installed Ubuntu, updated all packages, configured LXD and created one Debian 8 and one Debian 7 container to test with.

I also tried a nightly snapshot of Ubuntu 18.04 as the host, but still no luck. Privileged vs. unprivileged didn’t make any difference either, and upgrading the container’s systemd from backports brought no improvement.

This is what I get on Debian 7

root@deb7:~# time apt-key add some_key
OK

real	0m0.075s
user	0m0.031s
sys	0m0.043s

Compare it to a Debian 8 container

root@deb8:~# time apt-key add some_key
OK

real	0m30.640s
user	0m8.758s
sys	0m21.853s

Strace output from both commands
Debian 7: https://gist.github.com/anonymous/3e3f447206eda40860edb181ddacd30a
Debian 8: https://gist.github.com/anonymous/6402e15f59fe8a8791f42dc26821d82f

It seems to be stuck at read, but I’ve also seen it sitting in select for way too long, for example when doing apt-get update. By the way, strace doesn’t actually print the full read line all at once. It’s more like 23:54:08 read(4, and then, 3-4 seconds later, "APT_DIR='/'\n", 128) = 12.
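If it helps to pin down exactly where the time goes, strace can also timestamp every call and report how long the process spent inside each one; something like this should work (the output path is just an example):

strace -tt -T -o /tmp/apt-key.trace apt-key add some_key

Here -tt prefixes each line with a microsecond timestamp and -T appends the time spent inside each syscall, so slow read or select calls stand out immediately.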

Containers were created like this

lxc launch images:debian/7/amd64 deb7
lxc launch images:debian/8/amd64 deb8

Images used:

+-------+--------------+--------+--------------------------------------+--------+----------+-------------------------------+
| ALIAS | FINGERPRINT  | PUBLIC |             DESCRIPTION              |  ARCH  |   SIZE   |          UPLOAD DATE          |
+-------+--------------+--------+--------------------------------------+--------+----------+-------------------------------+
|       | a80feddb74c6 | no     | Debian jessie amd64 (20180223_03:44) | x86_64 | 102.77MB | Feb 23, 2018 at 11:09pm (UTC) |
+-------+--------------+--------+--------------------------------------+--------+----------+-------------------------------+
|       | 1793426a062e | no     | Debian wheezy amd64 (20180222_22:42) | x86_64 | 92.65MB  | Feb 23, 2018 at 11:08pm (UTC) |
+-------+--------------+--------+--------------------------------------+--------+----------+-------------------------------+

I have no idea how to resolve or even properly troubleshoot this problem. Any help is much appreciated. Has anyone else experienced this? I’m able to reproduce it consistently on a freshly installed system.

Can you re-run your straces using the “-f” flag so that we can see the subprocesses too?
It looks like apt-key is spawning gpg and it’s gpg that’s then stuck for a while, but we’d need the -f to figure out what that subprocess is doing.
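Something along these lines should do it; writing the output to a file is a good idea, since -f traces get large (the file name is just an example):

strace -f -tt -o /tmp/apt-key-f.trace apt-key add some_key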

Below are the traces from Debian 7 and 8, rerun with the -f option. On Debian 8, after waiting 8 minutes for the process to complete, I decided to terminate it. This generated 220+ MB of output, so I am only including the first 10000 lines. 99% of the messages look like

[pid   216] 21:33:43 getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid   216] 21:33:43 fcntl(373, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)

and these repeat thousands of times per second.

debian7 strace -f https://gist.github.com/anonymous/cb253494101dd80e2e7eb47bd8add986
debian8 strace -f (first 10000 lines): https://gist.github.com/anonymous/0da067d67673c98c59ea2678de381679
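To get a feel for how far that loop runs, the repeated lines can simply be counted; a rough check, assuming the trace was saved to a file with -o as suggested above:

grep -c 'F_SETFD, FD_CLOEXEC' /tmp/apt-key-f.trace

A count close to 1048576 per gpg invocation would mean the process is walking the entire file descriptor range allowed by its limit.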

I made some progress. This appears to be related to an apt bug; at least I’ve found something similar reported here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=764204. apt tries to close 65536 file descriptors.

I guess this affects the whole family of apt tools (apt-key, apt-cache, …)

What I did next was to strace apt-cache on a standard Debian 8 installation (not a container), and it indeed tries to close “only” 65536 file descriptors. However, on Debian 8 running inside a container, apt-cache tries to close 1048576 file descriptors, which appears to cause the slowdown. This seems to be linked to RLIMIT_NOFILE.
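A quick way to compare the limit in both environments without strace is to read it straight from the shell or from procfs:

ulimit -n
grep 'open files' /proc/self/limits

On the plain installation this reports 65536; inside the container (entered via lxc exec) it reports 1048576.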

Debian 8 (not in container)

[pid   817] getrlimit(RLIMIT_NOFILE, {rlim_cur=64*1024, rlim_max=64*1024}) = 0

Debian 8 in lxc container

[pid   216] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0

I then decreased the maximum open files limit with ulimit -n 65536, and the time it takes to run apt-key add some_key went down from 30 seconds to 2 seconds. The rest of the apt tools run much faster too. 2 seconds is still a lot, but I guess that’s because it’s still trying to close 65536 file descriptors.
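As a stopgap, the lowered limit can be scoped to just the apt invocation so nothing else is affected; a minimal sketch, run from the host against the same deb8 container as above:

lxc exec deb8 -- sh -c 'ulimit -n 65536 && apt-get update'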

Something interesting: the open files limit is set to 1048576 only when I use lxc exec deb8 bash from the host to access the deb8 container. If I instead use ssh to connect to the container, the limit is set to 65536 and the problem isn’t nearly as pronounced.
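The difference is easy to demonstrate by querying the limit through both paths (the container address is a placeholder):

lxc exec deb8 -- sh -c 'ulimit -n'
ssh root@<container-ip> 'ulimit -n'

The first command reports 1048576 here, the second 65536.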

Yeah, that part is expected: lxc exec doesn’t go through the PAM stack, so pam_limits isn’t triggered. Running something like su root through lxc exec should get you behavior similar to what you get over ssh.
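Concretely, something along these lines (a sketch; the exact su options may vary with the container’s PAM setup):

lxc exec deb8 -- su - root

Since su opens a PAM session, pam_limits applies, and ulimit -n inside that shell should report the same 65536 you see over ssh.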