LXD VM and container instance memory performance

I have been running performance benchmarks using sysbench and have noticed some odd results when comparing VMs, containers and the host.

Following is the output of lxc config show --expanded for my container instance:

architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 20.04 LTS amd64 (release) (20201210)
  image.label: release
  image.os: ubuntu
  image.release: focal
  image.serial: "20201210"
  image.type: squashfs
  image.version: "20.04"
  limits.cpu: "4"
  limits.memory: 8GB
  user.network-config: |2-

            #cloud-config
            version: 1
            config:
              - type: physical
                name: eth0
                subnets:
                  - type: dhcp
                    ipv4: true
  volatile.base_image: e0c3495ffd489748aa5151628fa56619e6143958f041223cb4970731ef939cb6
  volatile.eth0.host_name: vethff4c6c56
  volatile.eth0.hwaddr: 00:16:3e:ae:6e:1b
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
  volatile.uuid: ecc6db98-0ba6-491b-b7be-3aa42af65652
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- container-private
stateful: false
description: ""

And my VM instance:

architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 20.04 LTS amd64 (release) (20201210)
  image.label: release
  image.os: ubuntu
  image.release: focal
  image.serial: "20201210"
  image.type: disk-kvm.img
  image.version: "20.04"
  limits.cpu: "4"
  limits.memory: 8000MB
  user.network-config: |2-

            #cloud-config
            version: 1
            config:
              - type: physical
                name: enp5s0
                subnets:
                  - type: dhcp
                    ipv4: true
  volatile.base_image: 5f7cb0463720be7bf5c81018f5a1cbeace78e1c38bb2e8fb9a6a545aadc43fe3
  volatile.eth0.host_name: tapc22dfb36
  volatile.eth0.hwaddr: 00:16:3e:0f:56:cf
  volatile.last_state.power: RUNNING
  volatile.uuid: 2c8edac6-9241-48fc-b6ac-e7ec2ada6b25
devices:
  config:
    source: cloud-init:config
    type: disk
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    size: 10GB
    type: disk
ephemeral: false
profiles:
- vm-private
stateful: false
description: ""

As we can see, both the VM and the container instance are given 4 CPUs and 8 GB of RAM.

The command used for testing was the following:

sysbench memory --threads=4 --time=10 --memory-block-size=4K --memory-total-size=100G --memory-access-mode=seq --memory-oper=read run

Following are the container’s results:

Total operations: 26214400 (7543639.57 per second)

102400.00 MiB transferred (29467.34 MiB/sec)


General statistics:
    total time:                          3.4697s
    total number of events:              26214400

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                   20.02
         95th percentile:                        0.00
         sum:                                 5493.12

Threads fairness:
    events (avg/stddev):           6553600.0000/0.00
    execution time (avg/stddev):   1.3733/0.22

The VM’s results:

Total operations: 26214400 (8827942.22 per second)

102400.00 MiB transferred (34484.15 MiB/sec)


General statistics:
    total time:                          2.9669s
    total number of events:              26214400

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                    0.69
         95th percentile:                        0.00
         sum:                                 4458.62

Threads fairness:
    events (avg/stddev):           6553600.0000/0.00
    execution time (avg/stddev):   1.1147/0.01

And the host’s:

Total operations: 26214400 (9292470.05 per second)

102400.00 MiB transferred (36298.71 MiB/sec)


General statistics:
    total time:                          2.8158s
    total number of events:              26214400

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                    0.46
         95th percentile:                        0.00
         sum:                                 4280.78

Threads fairness:
    events (avg/stddev):           6553600.0000/0.00
    execution time (avg/stddev):   1.0702/0.01

I have run the tests multiple times and the reported values stay within the same range. The main problem here is the result reported by the container instance. I would have assumed that if there was any performance overhead, the VM would show it more than anything else. However, the container turns out to be considerably slower than both the VM and the host.

Does anyone else have a similar experience when it comes to memory performance?

@stgraber do you have any idea why this might be? Thanks

The most likely reason is that both the host and the VM aren’t pinned to specific CPUs, allowing the scheduler to move them around to less busy threads.

In comparison, the container is pinned to specific threads at startup, so if you have other processes fighting for those CPU threads, then you’ve got a problem.
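To double-check the pinning, one quick option is to look at the CPU affinity of a task inside the container (the instance name c1 below is just a placeholder):

# Shows which host CPU threads the container's tasks are allowed to run on
lxc exec c1 -- grep Cpus_allowed_list /proc/self/status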

It may be worth unsetting limits.cpu on the container and replacing it with limits.cpu.allowance=800%, which applies a scheduler time limit rather than strict pinning, making it behave much more like the VM.
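In command form, that swap would be something along these lines (instance name c1 is a placeholder, and the 800% value is the one suggested above):

# Drop the strict pinning and apply a scheduler time allowance instead
lxc config unset c1 limits.cpu
lxc config set c1 limits.cpu.allowance 800%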

If you still observe the container being slower, then the explanation may be the Spectre mitigation logic, which added additional cache flushes when switching to tasks protected by seccomp. I don’t know exactly what’s active there these days, but on affected Intel hardware, container performance certainly got hit…

Setting the CPU allowance did indeed solve the performance issue. So for production containers, would it be better to set the allowance rather than expose a certain number of CPU cores? Setting the CPU cores would be useful when monitoring with htop, for example, but it does hinder memory performance significantly.

Right, cpu.allowance is far better at spreading the load across your CPU cores.
The downside is that the container gets to see all the CPU cores and so may start too many threads.

Also as you’ve noticed, monitoring gets a bit harder as a container can absolutely use 50% of 16 cores now.

Also worth noting that limits.cpu.allowance=800% allows bursting on systems that aren’t busy. If you want a strict max limit, use limits.cpu.allowance=80ms/10ms instead.
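As a sketch, the two forms side by side (instance name c1 is a placeholder):

# Soft limit: up to 8 CPUs' worth of time, may burst when the host isn't busy
lxc config set c1 limits.cpu.allowance 800%

# Hard limit: at most 80ms of CPU time per 10ms period, no bursting
lxc config set c1 limits.cpu.allowance 80ms/10ms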

Thanks for clarifying. Another question regarding limits.cpu: why is it that when we set this value for VMs, the load is automatically distributed amongst the host’s cores, but for container instances the CPUs are pinned?

That’s just because of how Linux works: VMs have one process per virtual CPU, while containers have potentially thousands of processes.
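One rough way to see this from the host, assuming a running VM named vm1 and a container named c1 (both names are placeholders):

# The VM shows up as a single qemu process whose thread count roughly tracks its vCPU count
# (the Pid/PID label in lxc info varies between LXD versions, hence the case-insensitive match)
ps -o nlwp= -p "$(lxc info vm1 | awk 'tolower($1) == "pid:" {print $2}')"

# The container, by contrast, can run an arbitrary number of processes and threads
lxc exec c1 -- ps -eLf | wc -l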

Would the container’s performance be better than or the same as the VM’s if it was pinned to CPUs that it was not sharing with anything else?

Ex: Host with 16 cores (32 threads)

  • Container 1: limits.cpu 0,1
  • Container 2: limits.cpu 2,3

In the above example each container is pinned to its own set of cores, so they shouldn’t be competing with each other, correct?
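For reference, that setup would just be (container names here are placeholders):

lxc config set container1 limits.cpu 0,1
lxc config set container2 limits.cpu 2,3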

Would a VM set with two CPUs still outperform the containers on the same host?

Bonus question: is there less of a performance hit from Spectre mitigations for containers running on AMD processors?

Thanks for the clarification!

It’s likely that AMD would perform better; some mitigations aren’t necessary on that platform (not all, but some for sure).
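A quick way to see which mitigations are actually active on a given host (the vulnerabilities sysfs directory exists on recent kernels):

# Prints one line per known CPU vulnerability and its mitigation state
grep . /sys/devices/system/cpu/vulnerabilities/*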

As for pinning, well, that’s complicated :)
If your CPU has all cores and threads running at the same speed, then yes, but that’s not how modern CPUs function. When Linux does process scheduling, it can place tasks on cores that are running at a higher speed (boosting) and keep the ones clocking slower less busy. With direct pinning, you don’t benefit from this, so it can be a bit slower.

It also depends on the pinning being done properly, i.e. not spanning NUMA nodes and using equivalent threads. For example, I would expect a container given one thread on each of two different cores (where the other thread of those cores isn’t busy) to perform better than a container given both threads of a single core.

It gets even trickier with newer chips like the AMD EPYCs, as they use a chiplet architecture: even within the same socket, the memory access cost seen from different threads may not be the same, since there is one cost for core-to-core traffic within a chiplet, an additional cost for core-to-core traffic across chiplets, and obviously even more if you’re crossing sockets.
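If you do pin, it helps to check the host topology first so the chosen set stays within one NUMA node and uses sensible core/thread siblings, for example:

# Shows which logical CPUs share a core, socket and NUMA node (lscpu is part of util-linux)
lscpu --extended=CPU,CORE,SOCKET,NODE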

Did you try comparing a pinned VM (limits.cpu=0,1) to a pinned container (limits.cpu=2,3)?

I don’t know how scientific these are, but I ran some benchmarks, @stgraber :D

Hardware:

  • CPU: AMD Epyc 7551 32 Core, 64 Thread
  • ZFS (mirrored / striped, RAID10 basically) over 4 NVMe PCIe 3.0 drives
  • RAM: DDR4 2133 SDRAM, ECC

I ran each benchmark with and without pinning for a container and a VM. The full config for each is at the end of the post. Each command was run multiple times to confirm the values stayed within the ranges posted here.

Summary:

The container just barely outperforms the VM in every test except for disk, where the container gets over double the throughput. I assume this is because the VM goes through a virtio layer while the container accesses the ZFS dataset directly.

Things to note:

Memory speed for both the container and the VM drops significantly when pinning is not used.

The hardware was basically completely idle when these tests were run. I’m curious, though, how CPU pinning would perform on a system under load. My assumption is that it would be worse, since the Linux scheduler couldn’t spread the tasks to less loaded cores/threads. Maybe I can test this one day.
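One way to try it would be to keep the host busy with stress-ng (installed separately) while re-running the same sysbench command inside the pinned container; the two-minute duration is arbitrary:

# Start one CPU stressor per online host thread for two minutes
stress-ng --cpu 0 --timeout 120s &

# Re-run the memory benchmark inside the pinned container while the host is loaded
lxc exec cputest1 -- sysbench memory --threads=4 --time=10 --memory-block-size=4K \
  --memory-total-size=100G --memory-access-mode=seq --memory-oper=read run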

Details of benchmark results and container/vm config:

CONTAINER WITH PINNING (limits.cpu 0,1)

--------------
MEMORY:

CMD:
sysbench memory --threads=4 --time=10 --memory-block-size=4K --memory-total-size=100G --memory-access-mode=seq --memory-oper=read run

RESULT:
Total operations: 26214400 (8734450.01 per second)
102400.00 MiB transferred (34118.95 MiB/sec)
--------------
CPU:

CMD:
sysbench --test=cpu --cpu-max-prime=20000 run

RESULT:
events per second:   561.49
--------------
DISK:

CMD:
sysbench fileio prepare
sysbench fileio --file-test-mode=rndrw run

RESULT:
read, MiB/s:                  79.12
written, MiB/s:               52.75    
--------------

VM WITH PINNING (limits.cpu 2,3)

--------------
MEMORY:

CMD:
sysbench memory --threads=4 --time=10 --memory-block-size=4K --memory-total-size=100G --memory-access-mode=seq --memory-oper=read run

RESULT:
Total operations: 26214400 (8193482.13 per second)
102400.00 MiB transferred (32005.79 MiB/sec)
--------------
CPU:

CMD:
sysbench --test=cpu --cpu-max-prime=20000 run

RESULT:
events per second:   560.56
--------------
DISK:

CMD:
sysbench fileio prepare
sysbench fileio --file-test-mode=rndrw run

RESULT:
read, MiB/s:                  31.11
written, MiB/s:               20.74    
--------------

CONTAINER WITHOUT PINNING (limits.cpu 2)

--------------
MEMORY:

CMD:
sysbench memory --threads=4 --time=10 --memory-block-size=4K --memory-total-size=100G --memory-access-mode=seq --memory-oper=read run

RESULT:
Total operations: 26214400 (6433174.57 per second)
102400.00 MiB transferred (25129.59 MiB/sec)
--------------
CPU:

CMD:
sysbench --test=cpu --cpu-max-prime=20000 run

RESULT:
events per second:   559.32
--------------
DISK:

CMD:
sysbench fileio prepare
sysbench fileio --file-test-mode=rndrw run

RESULT:
read, MiB/s:                  83.17
written, MiB/s:               55.44  
--------------

VM WITHOUT PINNING (limits.cpu 2)

--------------
MEMORY:

CMD:
sysbench memory --threads=4 --time=10 --memory-block-size=4K --memory-total-size=100G --memory-access-mode=seq --memory-oper=read run

RESULT:
Total operations: 26214400 (6223839.82 per second)
102400.00 MiB transferred (24311.87 MiB/sec)
--------------
CPU:

CMD:
sysbench --test=cpu --cpu-max-prime=20000 run

RESULT:
events per second:   560.56
--------------
DISK:

CMD:
sysbench fileio prepare
sysbench fileio --file-test-mode=rndrw run

RESULT:
read, MiB/s:                  23.01
written, MiB/s:               15.34    
--------------

Container config:

lxc config show --expanded cputest1
architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 20.04 LTS amd64 (release) (20210223)
  image.label: release
  image.os: ubuntu
  image.release: focal
  image.serial: "20210223"
  image.type: squashfs
  image.version: "20.04"
  limits.cpu: "0,1"
  limits.memory: 16GB
  security.devlxd: "false"
  security.idmap.isolated: "true"
  security.nesting: "false"
  volatile.base_image: b9e93652ee67612114951d910acc4fd6fce0473f8dc0bf562c602e997fcb4857
  volatile.eth0.host_name: cputest1pub
  volatile.eth1.host_name: cputest1pri
  volatile.idmap.base: "1196608"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1196608,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1196608,"Nsid":0,"Maprange":65536}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1196608,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1196608,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1196608,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1196608,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: RUNNING
  volatile.uuid: 971ad95e-ffab-4c95-bd30-7894e09cd649
devices:
  eth0:
    host_name: cputest1pub
    hwaddr: 00:16:3e:40:20:d7
    name: eth0
    nictype: bridged
    parent: br0pub
    type: nic
  eth1:
    host_name: cputest1pri
    hwaddr: 00:16:3e:40:20:d8
    name: eth1
    nictype: bridged
    parent: br0pri
    type: nic
  root:
    path: /
    pool: default
    size: 10GB
    type: disk
ephemeral: false
profiles:
- cputest1
stateful: false
description: ""

VM config:

lxc config show --expanded cputest2
architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 20.04 LTS amd64 (release) (20210223)
  image.label: release
  image.os: ubuntu
  image.release: focal
  image.serial: "20210223"
  image.type: disk-kvm.img
  image.version: "20.04"
  limits.cpu: "2,3"
  limits.memory: 16GB
  security.devlxd: "false"
  security.idmap.isolated: "true"
  security.nesting: "false"
  volatile.base_image: a548372a4ccb5fc4fb1243de4ba5e4b130f861bb73f40ad1b6ffb0f534f8d168
  volatile.eth0.host_name: cputest2pub
  volatile.eth1.host_name: cputest2pri
  volatile.last_state.power: RUNNING
  volatile.uuid: ff386bf8-981e-4b06-916d-e41432b1204f
devices:
  eth0:
    host_name: cputest2pub
    hwaddr: 00:16:3e:40:20:d9
    name: eth0
    nictype: bridged
    parent: br0pub
    type: nic
  eth1:
    host_name: cputest2pri
    hwaddr: 00:16:3e:40:20:e8
    name: eth1
    nictype: bridged
    parent: br0pri
    type: nic
  root:
    path: /
    pool: default
    size: 10GB
    type: disk
ephemeral: false
profiles:
- cputest2
stateful: false
description: ""