Disk IO is so slow in LXC

Hello everyone,

I run LXC/LXD on Arch Linux (kernel 5.12.8-arch1-1), and disk IO is far slower in LXC containers than on the underlying drive.

Here is what I tested:

  • lvm storage backend:
    The problem arose with the lvm storage backend

    lxc storage create lvm_pool source=lvmVG lvm.vg.force_reuse=true
    lxc launch -p default --config=limits.memory=2GB ubuntu:20.04 test
    lxc exec test bash
    dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile
    200000+0 records in
    200000+0 records out
    819200000 bytes (819 MB, 781 MiB) copied, 5.68229 s, 144 MB/s 
    

Then I mounted the LV directly on the host and ran the same dd command; it performs well:

    sudo mount /dev/mapper/lvmVG-containers_test /mnt
    dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile
    200000+0 records in
    200000+0 records out
    819200000 bytes (819 MB, 781 MiB) copied, 0.527557 s, 1.6 GB/s
  • Then I used the dir backend. Nothing changed

      lxc storage create dir_pool source=/mnt/containers
      lxc launch -p default --config=limits.memory=2GB ubuntu:20.04 test
      dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile
      200000+0 records in
      200000+0 records out
      819200000 bytes (819 MB, 781 MiB) copied, 4.67927 s, 175 MB/s
    

And on /mnt/containers directly:

dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile
200000+0 records in
200000+0 records out
819200000 bytes (819 MB, 781 MiB) copied, 0.604141 s, 1.4 GB/s
  • On BTRFS, things are better!

      # In a container
      dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile
      200000+0 records in
      200000+0 records out
      819200000 bytes (819 MB, 781 MiB) copied, 0.690473 s, 1.2 GB/s
      # In the host
      dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile
      200000+0 records in
      200000+0 records out
      819200000 bytes (819 MB, 781 MiB) copied, 0.592722 s, 1.4 GB/s
    
  • ZFS is okay-ish too

      dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile
      200000+0 records in
      200000+0 records out
      819200000 bytes (819 MB, 781 MiB) copied, 1.0413 s, 787 MB/s
    

I can’t use ZFS or BTRFS in my use case, since they don’t play well with Kubernetes and/or nested containers.


My questions are:

  • Is the difference between my host and the containers using the dir and lvm backends expected? The containers are almost unusable and very slow. Is there a problem, and how can I debug it?
  • Is the ZFS performance normal compared to the btrfs backend (and to the host’s)?
  • What tools and methods do you use to debug storage problems in LXC, and in Linux in general?

LVM version

sudo lvm version
  LVM version:     2.03.12(2) (2021-05-07)
  Library version: 1.02.177 (2021-05-07)
  Driver version:  4.44.0
  Configuration:   ./configure CONFIG_SHELL=/bin/bash --prefix=/usr --sbindir=/usr/bin --sysconfdir=/etc --localstatedir=/var --enable-cmdlib --enable-dmeventd --enable-lvmpolld --enable-pkgconfig --enable-readline --enable-udev_rules --enable-udev_sync --with-cache=internal --with-default-dm-run-dir=/run --with-default-locking-dir=/run/lock/lvm --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-systemdsystemunitdir=/usr/lib/systemd/system --with-thin=internal --with-udev-prefix=/usr --enable-udev-systemd-background-jobs

BTRFS Version: btrfs-progs v5.12.1

ZFS Version: zfs-2.0.4-1

Instead of dd + sync, could you try dd with conv=fdatasync?

sync is always a global action and will flush the buffers not just of the container but of the entire host system too, so fdatasync should give more consistent results.

Same thing on an LVM-backed container!

dd if=/dev/zero of=benchfile  conv=fdatasync bs=4k count=200000; rm benchfile
200000+0 records in
200000+0 records out
819200000 bytes (819 MB, 781 MiB) copied, 10.6227 s, 77.1 MB/s
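
For a like-for-like comparison, the same conv=fdatasync run could also be repeated on the host-mounted LV. This is only a sketch that reuses the mount and dd commands already shown earlier in the thread:

    sudo mount /dev/mapper/lvmVG-containers_test /mnt
    cd /mnt
    dd if=/dev/zero of=benchfile conv=fdatasync bs=4k count=200000; rm benchfile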

Any limit other than memory applied on those containers?

No, I don’t think so. Here is the config for one of the containers:

architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 20.04 LTS amd64 (release) (20210510)
  image.label: release
  image.os: ubuntu
  image.release: focal
  image.serial: "20210510"
  image.type: squashfs
  image.version: "20.04"
  limits.memory: 2GB
  volatile.base_image: 52c9bf12cbd3b06d591c5f56f8d9a185aca4a9a7da4d6e9f26f0ba44f68867b7
  volatile.eth0.host_name: vethb53fa944
  volatile.eth0.hwaddr: 00:16:3e:c9:ea:08
  volatile.idmap.base: "0"
  volatile.idmap.current: '[]'
  volatile.idmap.next: '[]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
  volatile.uuid: bfca530e-d0a7-477e-9d92-6461d3835f58
devices: {}
ephemeral: false
profiles:
- lvm
stateful: false
description: ""

The profile looks something like this:

config:
  limits.cpu: "2"
  limits.memory: 2GB
  limits.memory.swap: "false"
  linux.kernel_modules: ip_vs,ip_vs_rr,ip_vs_wrr,ip_vs_sh,ip_tables,ip6_tables,netlink_diag,nf_nat,overlay,br_netfilter,zfs
  raw.lxc: |
    lxc.apparmor.profile=unconfined
    lxc.mount.auto=proc:rw sys:rw
    lxc.cgroup.devices.allow=a
    lxc.cap.drop=
  security.nesting: "true"
  security.privileged: "true"
  user.user-data: |
    #cloud-config
    ssh_authorized_keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCwKqq44ZZpKr4rPTq7rKyNBhuKJsdJxsNAtRSb3WA+1jdtb8FuU4Vhs0wtNR8yiW7ORAdXvlv46NOwH6GO5ua3snlAIsPVhr6QZ8uAZVbJbxyz8U1ytLsJVBOpi2q2x+d1sZUdAYQqCwC/vjlsVfMQx5+CnAVl2osoPudZl/udc8HNsuZD6MWtO3uxnRWgjnB/tB7zPTvXh9C3AtCXjXdxlp1/jnwmSn+E86/Im+gSLV2uSYJjOLggKYv6fyJuR0O5wAyIGFTbH/6K14C1MUCkBK1XkoeQsfY4+KKy8dpDxp3fEn7KCsgLvm+BZ1ja1qek45vn0rABVztRe0UgFqCNtYfVrjfZ2BiSP55Hdw82FSNF5FiMmeEHMqeKF2H8gj/4pQx8nYToh2JxZh+fPApiUA2D0mTJ8eh/0c/yjEdslWPtJriFNFhxGl9DtqnlGYJgUkn3/U0tJa/t6JyPyJMc+poU+hCx1y2kBl4/gd50TjdjtjTriMI0wgqhud85dviPQNcyoxC5nI0mkHgokeJSe6dlWSN3IkS++UwdKpPe1T+6FJRPh8olzjEZFL5BPWrRHKQRjRMYLmTLarTX1Ho6UqlEk55gobn1ZrdcoON3F3UmGOJpBoRMqEuvU0x2BOWKP7CNEn9psZ7Gq2GX0IyGUAGYAAbEykKDXJv6vWPavw== m.iduoad@gmail.com
description: Kubernetes LXD profile
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: lvmtest
    type: disk
name: lvm
used_by:
- /1.0/instances/test
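
Not something from the thread, but to double-check what limits actually apply after the profile merge, LXD can print the expanded (instance plus profile) configuration in one go:

    lxc config show test --expanded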

Does removing the CPU limit help?

No, it didn’t, unfortunately.

IO is still slow!

dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile

200000+0 records in
200000+0 records out
819200000 bytes (819 MB, 781 MiB) copied, 8.04137 s, 102 MB/s

@brauner any ideas on what may be going on here?

Did you try with fio?

I tried it myself on my personal server:

  • using random write (4k)
  • single SSD
  • LXD LVM-thin storage driver
  • Ubuntu 21.04 (kernel 5.11) + LXD 4.15

Containers:

 bw (  KiB/s): min=19128, max=567848, per=100.00%, avg=367725.63, stdev=83970.26, samples=81
 iops        : min= 4782, max=141962, avg=91931.41, stdev=20992.56, samples=81

Host:

bw (  KiB/s): min=13856, max=569456, per=100.00%, avg=376131.90, stdev=80127.53, samples=80
iops        : min= 3464, max=142364, avg=94033.03, stdev=20031.89, samples=80

I tried multiple times and I still get slightly lower IOPS and throughput in the container than on the host. I guess it’s related to the snapshot/copy-on-write layer that containers sit on.
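
The exact fio invocation isn’t included in that post. A 4k random-write test along these lines would produce comparable bw/iops summary lines; every parameter here is an assumption, not the command actually used:

    # hypothetical 4k random-write test; adjust size/iodepth to taste
    fio --name=randwrite --ioengine=libaio --rw=randwrite --bs=4k \
        --size=1G --iodepth=32 --direct=1 --runtime=30 --time_based \
        --group_reporting

Running the same command once on the host filesystem and once inside the container gives directly comparable numbers.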

Is there anything interesting in dmesg or journalctl?

What type of disk do you have?
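
Not part of the original reply, but the disk-type question can be answered with standard tools; "sda" below is just a placeholder device name:

    # model, size, and whether the device is rotational (0 = SSD/NVMe, 1 = HDD)
    lsblk -d -o NAME,MODEL,ROTA,SIZE
    cat /sys/block/sda/queue/rotational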

Seems fine here

root@test:~# dd if=/dev/zero of=benchfile bs=4k count=200000 && sync; rm benchfile
200000+0 records in
200000+0 records out
819200000 bytes (819 MB, 781 MiB) copied, 1.1202 s, 731 MB/s