Apparent overhead in IO using LXC 3.0.1

Hello,

First, I am not sure whether LXC is the right topic. I am using LXD, but since it’s related to performance, in my mind it’s an LXC question …

From what I have observed, using LXC adds some overhead in I/O, leading to higher CPU wait and load on the machine. I was wondering whether I am doing something wrong, or whether this is expected. A small plan for my post:

  1. My test case
  2. Control machine
  3. LXC configuration
  4. Results
  5. Weird configurations
  6. Questions

My test case
In my case, I test LXC with a NoSQL database (LSM tree under the hood, therefore lots of I/O). I wanted to see whether performance is equivalent when using 2 containers on 1 machine (each container running one node of the database) or when deploying the 2 nodes directly on the machine (running 2 nodes per machine makes better use of our hardware).

+--------------+-------------+
|    NODE2     |    NODE1    |
+--------------+-------------+
|       CONTROL MACHINE      |
+----------------------------+


+--------------+-------------+
|    NODE3     |    NODE4    |
+--------------+-------------+
|     LXC      |     LXC     |
+--------------+-------------+
|       TEST MACHINE         |
+----------------------------+

For my test, we have 2 identical machines. To compare the behaviour of each configuration, we start the 4 nodes on the 2 machines and make them form a single cluster.

To make sure that both machines handle exactly the same data, we have to:

  • define one replica for all collections;
  • prevent a replica set from being defined on the same machine as its primary set.
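
The second constraint can be checked mechanically. Here is a minimal sketch in Python, assuming the placement can be exported as a mapping from collection name to a (primary machine, replica machine) pair. That layout and the names used are hypothetical, not the database’s real metadata format.

```python
def colocated_replicas(placement):
    """Return the collections whose primary and replica share a machine.

    `placement` maps a collection name to a (primary_machine, replica_machine)
    pair. This structure is an assumption made for illustration only.
    """
    return sorted(
        name
        for name, (primary, replica) in placement.items()
        if primary == replica
    )


# Hypothetical placement: "logs" breaks the rule, the other two respect it.
example = {
    "orders": ("control", "test"),
    "users": ("test", "control"),
    "logs": ("test", "test"),
}
print(colocated_replicas(example))
```

An empty result means every replica lives on the other machine, so both machines end up holding the same data.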

Control machine
The control machine runs CentOS 7 (kernel 3.10.0). When pushing data to the cluster, we push everything to this machine (it then dispatches data to the second machine, the one with LXD). The filesystem is XFS.

LXC configuration
For the LXC configuration, we use the default parameters (therefore no limits on I/O, writes, etc.). LXC also writes to XFS.

+--------------+-------------+
|    NODE3     |    NODE4    |
+--------------+-------------+
|     LXC      |     LXC     |
+--------------+-------------+
|     XFS      |     XFS     |
+--------------+-------------+
|      LOGICAL VOLUME        |
+----------------------------+
|       TEST MACHINE         |
+----------------------------+

Results


The results are consistent through multiple tests:

  • CPU wait is much higher on the machine with LXC (median from 1.3% to 7.7%, and 90th percentile from 4.9% to 23.8%);
  • load average is 50% higher;
  • I/O reads are higher for the median and the percentile, but lower for the max values.
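
For anyone wanting to reproduce this kind of figure, here is one possible way to collect CPU wait samples and reduce them to a median and a 90th percentile. It is a sketch only: the post does not say which tool produced the numbers above, and the helper names are made up. It relies on the aggregate `cpu` line of /proc/stat, where the fifth counter is iowait.

```python
import statistics
import time


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two `cpu` lines of /proc/stat."""
    old = [int(value) for value in before.split()[1:]]
    new = [int(value) for value in after.split()[1:]]
    deltas = [n - o for o, n in zip(old, new)]
    # Sum user..steal only: guest time is already counted inside user/nice.
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total > 0 else 0.0


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat, which aggregates all CPUs."""
    with open(path) as stat:
        return stat.readline()


def sample_iowait(count, interval=1.0):
    """Collect `count` iowait percentages, one per `interval` seconds."""
    samples = []
    previous = read_cpu_line()
    for _ in range(count):
        time.sleep(interval)
        current = read_cpu_line()
        samples.append(iowait_percent(previous, current))
        previous = current
    return samples


def summarize(samples):
    """Return (median, 90th percentile); needs at least two samples."""
    return statistics.median(samples), statistics.quantiles(samples, n=10)[-1]
```

Calling `summarize(sample_iowait(600))` on each machine during a test run would give two comparable pairs of numbers (Python 3.8 or later, Linux only).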

Weird configurations
Seeing this, we ran 2 additional tests:

  1. Instead of running the 2 nodes inside 2 different containers, we ran them inside the same container.
    Results are pretty similar to the previous test (image in next post).
  2. We ran the 2 nodes outside of the LXC containers (but, for some reason, still writing to the XFS filesystem defined for the containers).
    I/O is equivalent to the test without LXD (image in next post).
    For me, this result shows that the filesystem configuration with the logical volume is not the issue.

Questions

  1. Is such I/O behaviour expected when running inside a container?
  2. Are there any parameters we could check/tweak to get better performance?

Results of the test with 2 nodes inside 1 container:

Results of the test with 2 nodes without container:

Were you using privileged or unprivileged containers?

If unprivileged, it could be that when running the test on the host, the daemon or its init script tweaked some process or kernel options that root inside an unprivileged container is not allowed to change.
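
One way to look for such differences is to compare the resource limits of the database process on the host with those of the same process inside the container. The sketch below parses the text of /proc/<pid>/limits from both sides and reports what differs. The function names are made up, and it covers per-process limits only, not sysctl values.

```python
import re


def parse_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns are separated by runs of at least two spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def diff_limits(host_text, container_text):
    """Return {name: (host_value, container_value)} for limits that differ."""
    host = parse_limits(host_text)
    container = parse_limits(container_text)
    return {
        name: (value, container.get(name))
        for name, value in host.items()
        if value != container.get(name)
    }
```

Feeding it the output of `cat /proc/<pid>/limits` taken on the host and inside the container would show, for example, a lower "Max open files" or "Max locked memory" in the container.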

As far as expected overhead goes, all you should really have in your way is cgroups, and since you’ve not configured any limits on those containers, they should have very little impact.
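
To confirm that no I/O limit is in place, the blkio throttle files of the container’s cgroup can be inspected. This is a rough sketch assuming cgroup v1; the directory to pass in is an assumption that depends on the host (something like /sys/fs/cgroup/blkio/lxc/<container>), so adjust it to wherever the container’s cgroup actually lives.

```python
from pathlib import Path

# cgroup v1 blkio throttle files; an empty file means no limit is set.
THROTTLE_FILES = (
    "blkio.throttle.read_bps_device",
    "blkio.throttle.write_bps_device",
    "blkio.throttle.read_iops_device",
    "blkio.throttle.write_iops_device",
)


def configured_io_limits(cgroup_dir):
    """Return {file_name: [lines]} for every throttle file that is not empty."""
    found = {}
    for name in THROTTLE_FILES:
        path = Path(cgroup_dir) / name
        if path.is_file():
            content = path.read_text().strip()
            if content:
                found[name] = content.splitlines()
    return found
```

An empty dictionary means that no blkio throttling is configured for that cgroup.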

That said, the CentOS 3.10 kernel is a weird beast, so it’s pretty hard to tell which features it has or doesn’t have, and whether any of this may impact performance for namespace and cgroup operations.

Thanks a lot for your quick answer!

I was using unprivileged containers during the tests. I will redo the tests with privileged containers.

I also forgot to mention some details about the machines:

  • Control machine: CentOS 7, kernel 3.10.0.
  • LXC machine: Ubuntu 16.04.4, kernel 4.4.0.

I will keep you informed of the results (maybe in 1 week or 2 :)).