Disk IO is so slow in LXC

Hello everyone,

I run LXC/LXD on Arch Linux (kernel 5.12.8-arch1-1), and disk IO is far slower in LXC containers than on the underlying drive.

Here is what I tested:

  • lvm storage backend:
    The problem arose with the lvm storage backend

    lxc storage create lvm_pool source=lvmVG lvm.vg.force_reuse=true
    lxc launch -p default --config=limits.memory=2GB ubuntu:20.04 test
    lxc exec test bash
    dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile
    200000+0 records in
    200000+0 records out
    819200000 bytes (819 MB, 781 MiB) copied, 5.68229 s, 144 MB/s 
    

Then I mounted the LV directly on the host and ran the same dd command; it performs well:

    sudo mount /dev/mapper/lvmVG-containers_test /mnt
    dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile
    200000+0 records in
    200000+0 records out
    819200000 bytes (819 MB, 781 MiB) copied, 0.527557 s, 1.6 GB/s
  • Then I used the dir backend. Nothing changed

      lxc storage create dir_pool source=/mnt/containers
      lxc launch -p default --config=limits.memory=2GB ubuntu:20.04 test
      dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile
      200000+0 records in
      200000+0 records out
      819200000 bytes (819 MB, 781 MiB) copied, 4.67927 s, 175 MB/s
    

And on /mnt/containers directly:

dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile
200000+0 records in
200000+0 records out
819200000 bytes (819 MB, 781 MiB) copied, 0.604141 s, 1.4 GB/s
  • On BTRFS, things are better!

      # In a container
      dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile
      200000+0 records in
      200000+0 records out
      819200000 bytes (819 MB, 781 MiB) copied, 0.690473 s, 1.2 GB/s
      # In the host
      dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile
      200000+0 records in
      200000+0 records out
      819200000 bytes (819 MB, 781 MiB) copied, 0.592722 s, 1.4 GB/s
    
  • ZFS is okay-ish too

      dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile
      200000+0 records in
      200000+0 records out
      819200000 bytes (819 MB, 781 MiB) copied, 1.0413 s, 787 MB/s
    

I can’t use ZFS or BTRFS in my use case, since they don’t play well with Kubernetes and/or nested containers.


My questions are:

  • Is the difference between my host and the containers using the dir and lvm backends expected? The containers are almost unusable and very slow. Is there a problem, and how can I debug it?
  • Is the ZFS performance normal compared to the btrfs backend (and to the host’s)?
  • What tools and methods do you use to debug storage problems in LXC, and in Linux in general?

LVM version

sudo lvm version
  LVM version:     2.03.12(2) (2021-05-07)
  Library version: 1.02.177 (2021-05-07)
  Driver version:  4.44.0
  Configuration:   ./configure CONFIG_SHELL=/bin/bash --prefix=/usr --sbindir=/usr/bin --sysconfdir=/etc --localstatedir=/var --enable-cmdlib --enable-dmeventd --enable-lvmpolld --enable-pkgconfig --enable-readline --enable-udev_rules --enable-udev_sync --with-cache=internal --with-default-dm-run-dir=/run --with-default-locking-dir=/run/lock/lvm --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-systemdsystemunitdir=/usr/lib/systemd/system --with-thin=internal --with-udev-prefix=/usr --enable-udev-systemd-background-jobs

BTRFS Version: btrfs-progs v5.12.1

ZFS Version: zfs-2.0.4-1

Instead of dd + sync, could you try dd with conv=fdatasync?

sync is always a global action and will flush the buffers not just of the container but of the entire host system too, so fdatasync should give more consistent results.

Same thing on an LVM-backed container!

dd if=/dev/zero of=benchfile  conv=fdatasync bs=4k count=200000; rm benchfile
200000+0 records in
200000+0 records out
819200000 bytes (819 MB, 781 MiB) copied, 10.6227 s, 77.1 MB/s
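
For a like-for-like comparison, the same conv=fdatasync run could also be repeated on the host-mounted LV. This is only a sketch that reuses the mount and dd commands already shown earlier in the thread:

    sudo mount /dev/mapper/lvmVG-containers_test /mnt
    cd /mnt
    dd if=/dev/zero of=benchfile conv=fdatasync bs=4k count=200000; rm benchfile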

Any limit other than memory applied on those containers?

No, I don’t think so. Here is the config for one of the containers:

architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 20.04 LTS amd64 (release) (20210510)
  image.label: release
  image.os: ubuntu
  image.release: focal
  image.serial: "20210510"
  image.type: squashfs
  image.version: "20.04"
  limits.memory: 2GB
  volatile.base_image: 52c9bf12cbd3b06d591c5f56f8d9a185aca4a9a7da4d6e9f26f0ba44f68867b7
  volatile.eth0.host_name: vethb53fa944
  volatile.eth0.hwaddr: 00:16:3e:c9:ea:08
  volatile.idmap.base: "0"
  volatile.idmap.current: '[]'
  volatile.idmap.next: '[]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
  volatile.uuid: bfca530e-d0a7-477e-9d92-6461d3835f58
devices: {}
ephemeral: false
profiles:
- lvm
stateful: false
description: ""

The profile looks something like this:

config:
  limits.cpu: "2"
  limits.memory: 2GB
  limits.memory.swap: "false"
  linux.kernel_modules: ip_vs,ip_vs_rr,ip_vs_wrr,ip_vs_sh,ip_tables,ip6_tables,netlink_diag,nf_nat,overlay,br_netfilter,zfs
  raw.lxc: |
    lxc.apparmor.profile=unconfined
    lxc.mount.auto=proc:rw sys:rw
    lxc.cgroup.devices.allow=a
    lxc.cap.drop=
  security.nesting: "true"
  security.privileged: "true"
  user.user-data: |
    #cloud-config
    ssh_authorized_keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCwKqq44ZZpKr4rPTq7rKyNBhuKJsdJxsNAtRSb3WA+1jdtb8FuU4Vhs0wtNR8yiW7ORAdXvlv46NOwH6GO5ua3snlAIsPVhr6QZ8uAZVbJbxyz8U1ytLsJVBOpi2q2x+d1sZUdAYQqCwC/vjlsVfMQx5+CnAVl2osoPudZl/udc8HNsuZD6MWtO3uxnRWgjnB/tB7zPTvXh9C3AtCXjXdxlp1/jnwmSn+E86/Im+gSLV2uSYJjOLggKYv6fyJuR0O5wAyIGFTbH/6K14C1MUCkBK1XkoeQsfY4+KKy8dpDxp3fEn7KCsgLvm+BZ1ja1qek45vn0rABVztRe0UgFqCNtYfVrjfZ2BiSP55Hdw82FSNF5FiMmeEHMqeKF2H8gj/4pQx8nYToh2JxZh+fPApiUA2D0mTJ8eh/0c/yjEdslWPtJriFNFhxGl9DtqnlGYJgUkn3/U0tJa/t6JyPyJMc+poU+hCx1y2kBl4/gd50TjdjtjTriMI0wgqhud85dviPQNcyoxC5nI0mkHgokeJSe6dlWSN3IkS++UwdKpPe1T+6FJRPh8olzjEZFL5BPWrRHKQRjRMYLmTLarTX1Ho6UqlEk55gobn1ZrdcoON3F3UmGOJpBoRMqEuvU0x2BOWKP7CNEn9psZ7Gq2GX0IyGUAGYAAbEykKDXJv6vWPavw== m.iduoad@gmail.com
description: Kubernetes LXD profile
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: lvmtest
    type: disk
name: lvm
used_by:
- /1.0/instances/test
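
Not something from the thread, but to double-check what limits actually apply after the profile merge, LXD can print the expanded (instance plus profile) configuration in one go:

    lxc config show test --expanded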

Does removing the CPU limit help?

No, it didn’t, unfortunately.

IO is still slow!

dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile

200000+0 records in
200000+0 records out
819200000 bytes (819 MB, 781 MiB) copied, 8.04137 s, 102 MB/s

@brauner any ideas on what may be going on here?

Did you try with fio?

I tried it myself on my personal server:

  • using random write (4k)
  • single SSD
  • LXD LVM-thin storage driver
  • Ubuntu 21.04 (kernel 5.11) + LXD 4.15

Containers:

 bw (  KiB/s): min=19128, max=567848, per=100.00%, avg=367725.63, stdev=83970.26, samples=81
 iops        : min= 4782, max=141962, avg=91931.41, stdev=20992.56, samples=81

Host:

bw (  KiB/s): min=13856, max=569456, per=100.00%, avg=376131.90, stdev=80127.53, samples=80
iops        : min= 3464, max=142364, avg=94033.03, stdev=20031.89, samples=80

I tried multiple times and I still get slightly lower IOPS and throughput in the container than on the host. I guess it’s related to the snapshot/copy-on-write layer that containers sit on.
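
The exact fio invocation isn’t included in that post. A 4k random-write test along these lines would produce comparable bw/iops summary lines; every parameter here is an assumption, not the command actually used:

    # hypothetical 4k random-write test; adjust size/iodepth to taste
    fio --name=randwrite --ioengine=libaio --rw=randwrite --bs=4k \
        --size=1G --iodepth=32 --direct=1 --runtime=30 --time_based \
        --group_reporting

Running the same command once on the host filesystem and once inside the container gives directly comparable numbers.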

Is there anything interesting in dmesg or journalctl?

What type of disk do you have?
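
Not part of the original reply, but the disk-type question can be answered with standard tools; "sda" below is just a placeholder device name:

    # model, size, and whether the device is rotational (0 = SSD/NVMe, 1 = HDD)
    lsblk -d -o NAME,MODEL,ROTA,SIZE
    cat /sys/block/sda/queue/rotational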

Seems fine here

root@test:~# dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile
200000+0 records in
200000+0 records out
819200000 bytes (819 MB, 781 MiB) copied, 1.1202 s, 731 MB/s