* Distribution: Ubuntu
* Distribution version: jammy-22.04
* The output of
  * `lxc --version`: 5.1
  * `lxc-checkconfig`: not included
  * `uname -a`: Linux ip-10-11-21-54 5.15.0-1005-aws #7-Ubuntu SMP Wed Apr 20 03:44:13 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  * `cat /proc/self/cgroup`:
13:devices:/user.slice
12:hugetlb:/
11:misc:/
10:cpuset:/
9:freezer:/
8:memory:/user.slice/user-1000.slice/session-23.scope
7:perf_event:/
6:cpu,cpuacct:/user.slice
5:rdma:/
4:net_cls,net_prio:/
3:pids:/user.slice/user-1000.slice/session-23.scope
2:blkio:/user.slice
1:name=systemd:/user.slice/user-1000.slice/session-23.scope
0::/user.slice/user-1000.slice/session-23.scope
  * `cat /proc/1/mounts`:
/dev/root / ext4 rw,relatime,discard,errors=remount-ro 0 0
devtmpfs /dev devtmpfs rw,relatime,size=1894916k,nr_inodes=473729,mode=755,inode64 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,size=760556k,nr_inodes=819200,mode=755,inode64 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,inode64 0 0
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755,inode64 0 0
cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,name=systemd 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
bpf /sys/fs/bpf bpf rw,nosuid,nodev,noexec,relatime,mode=700 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/rdma cgroup rw,nosuid,nodev,noexec,relatime,rdma 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset,clone_children 0 0
cgroup /sys/fs/cgroup/misc cgroup rw,nosuid,nodev,noexec,relatime,misc 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=14362 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime,pagesize=2M 0 0
mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,nosuid,nodev,noexec,relatime 0 0
tracefs /sys/kernel/tracing tracefs rw,nosuid,nodev,noexec,relatime 0 0
configfs /sys/kernel/config configfs rw,nosuid,nodev,noexec,relatime 0 0
fusectl /sys/fs/fuse/connections fusectl rw,nosuid,nodev,noexec,relatime 0 0
none /run/credentials/systemd-sysusers.service ramfs ro,nosuid,nodev,noexec,relatime,mode=700 0 0
/dev/loop1 /snap/core20/1434 squashfs ro,nodev,relatime,errors=continue 0 0
/dev/loop0 /snap/amazon-ssm-agent/5163 squashfs ro,nodev,relatime,errors=continue 0 0
/dev/loop2 /snap/core18/2344 squashfs ro,nodev,relatime,errors=continue 0 0
/dev/loop4 /snap/snapd/15534 squashfs ro,nodev,relatime,errors=continue 0 0
/dev/nvme0n1p15 /boot/efi vfat rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro 0 0
tmpfs /run/snapd/ns tmpfs rw,nosuid,nodev,size=760556k,nr_inodes=819200,mode=755,inode64 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,nosuid,nodev,noexec,relatime 0 0
/dev/loop5 /snap/snapd/15904 squashfs ro,nodev,relatime,errors=continue 0 0
/dev/loop6 /snap/core18/2409 squashfs ro,nodev,relatime,errors=continue 0 0
/dev/loop7 /snap/amazon-ssm-agent/5656 squashfs ro,nodev,relatime,errors=continue 0 0
tmpfs /run/user/1000 tmpfs rw,nosuid,nodev,relatime,size=380276k,nr_inodes=95069,mode=700,uid=1000,gid=1000,inode64 0 0
/dev/loop3 /snap/lxd/23037 squashfs ro,nodev,relatime,errors=continue 0 0
nsfs /run/snapd/ns/lxd.mnt nsfs rw 0 0
tmpfs /var/snap/lxd/common/ns tmpfs rw,relatime,size=1024k,mode=700,inode64 0 0
nsfs /var/snap/lxd/common/ns/shmounts nsfs rw 0 0
nsfs /var/snap/lxd/common/ns/mntns nsfs rw 0 0
# Issue description
In production we regularly see containers become unavailable when their memory limit is reached.
Most of the time this is preceded by an OOM kill, after which most, if not all, processes in the container start reading from disk at full capacity. As far as we can tell, the container's disk hits its maximum throughput and stays there until the container is stopped (we once left it in this state for hours).
I can imagine a read peak and higher read throughput when there is almost no memory left for the disk cache, but the behavior we see does not look expected or desirable.
Luckily we use dedicated disks for each container; otherwise it would probably take out the whole instance and all the containers on it.
An important side note: we also see this happen on containers that are neither under load nor publicly reachable, so there is nothing that would explain a constant need to read at maximum capacity. We also see processes reading at maximum capacity from which you would not expect many reads at all (init, nscd, crond, etc.).
In production we run LXD/LXC on Amazon Linux, so I decided to reproduce the issue on a dedicated, default Ubuntu instance on AWS to isolate the problem and rule out as many of our more exotic choices (OS/kernel/settings) as possible.
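For completeness, a minimal way to confirm both symptoms on the host once a container is in this state; the PID is just an example (here, the clamd PID from the iotop output further down):

```
# the OOM kill shows up in the host kernel log
dmesg -T | grep -i 'killed process'

# sample the read throughput of one process over 5 seconds via /proc/<pid>/io
pid=6649   # example PID, taken from the iotop output below
a=$(awk '/^read_bytes/ {print $2}' /proc/$pid/io)
sleep 5
b=$(awk '/^read_bytes/ {print $2}' /proc/$pid/io)
echo "$(( (b - a) / 5 / 1024 / 1024 )) MiB/s read by PID $pid"
```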
# Steps to reproduce
1. Start an Ubuntu instance based on Ubuntu jammy-22.04-amd64 (ami-07bd2fc45c8a8dd48 in eu-west-1)
As I could not start an Amazon Linux container otherwise
(Error: The image used by this instance requires a CGroupV1 host system)
I had to switch the host back to cgroup v1 and reboot:
In /etc/default/grub set:
GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=false"
then run:
update-grub2
reboot
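A quick sanity check that the host really came back up on the hybrid (cgroup v1) hierarchy:

```
grep -o 'systemd.unified_cgroup_hierarchy=false' /proc/cmdline
stat -fc %T /sys/fs/cgroup   # prints "tmpfs" on a hybrid/v1 host, "cgroup2fs" on a pure v2 host
```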
2. Run `lxd init`:
Would you like to use LXD clustering? (yes/no) [default=no]:
Do you want to configure a new storage pool? (yes/no) [default=yes]: no
Would you like to connect to a MAAS server? (yes/no) [default=no]:
Would you like to create a new local network bridge? (yes/no) [default=yes]:
What should the new bridge be called? [default=lxdbr0]:
What IPv4 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 10.11.21.121/24
Would you like LXD to NAT IPv4 traffic on your bridge? [default=yes]:
What IPv6 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: none
Would you like the LXD server to be available over the network? (yes/no) [default=no]: no
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]: no
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: yes
config:
  images.auto_update_interval: "0"
networks:
- config:
    ipv4.address: 10.11.21.121/24
    ipv4.nat: "true"
    ipv6.address: none
  description: ""
  name: lxdbr0
  type: ""
  project: default
storage_pools: []
profiles:
- config: {}
  description: ""
  devices:
    eth0:
      name: eth0
      network: lxdbr0
      type: nic
  name: default
projects: []
cluster: null
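(For anyone reproducing this: the preseed printed above can be replayed on a fresh host to skip the interactive questions, e.g. after saving it as lxd-preseed.yaml.)

```
lxd init --preseed < lxd-preseed.yaml
```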
3. Attach a dedicated swap disk and a dedicated container disk to the instance
4. Enable swap and create the storage pool:
mkswap /dev/nvme1n1 -L swap
echo "LABEL=swap none swap defaults,nofail 0 0" >> /etc/fstab
swapon -a
lxc storage create burst1 lvm source=/dev/nvme2n1
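Sanity checks for this step (the device names are specific to this instance):

```
swapon --show            # the dedicated swap volume should be listed
lxc storage show burst1  # LVM pool backed by /dev/nvme2n1
```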
5. Create an Amazon Linux container:
lxc launch -s burst1 images:amazonlinux burst1
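To confirm the container actually landed on the dedicated pool:

```
lxc list burst1
lxc storage volume list burst1   # the container volume should show up on the LVM pool
```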
6. Install clamd in the container:
lxc exec -t burst1 -- sh -c "yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm"
lxc exec -t burst1 -- sh -c "yum install -y iputils procps-ng clamd clamav-server clamav-data clamav-update clamav-filesystem clamav clamav-scanner-systemd clamav-devel clamav-lib clamav-server-systemd"
lxc exec -t burst1 -- sh -c "sed -i s/^#LocalSocket/LocalSocket/g /etc/clamd.d/scan.conf"
lxc exec -t burst1 -- sh -c "systemctl enable clamd@scan"
lxc exec -t burst1 -- sh -c "systemctl start clamd@scan"
7. Limit the memory to 1GB and restart the container:
lxc stop burst1
lxc config set burst1 limits.memory 1GB
lxc start burst1
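Assuming the v1 memory controller is mounted inside the container (it is with the grub change from step 1), the applied limit can be verified with:

```
lxc exec burst1 -- cat /sys/fs/cgroup/memory/memory.limit_in_bytes   # roughly 10^9 bytes (1GB)
lxc exec burst1 -- cat /sys/fs/cgroup/memory/memory.failcnt          # starts climbing once the limit is hit
```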
8. Use top, iotop and the AWS volume metrics to observe the high CPU and read I/O usage:
iotop:
Total DISK READ: 128.17 M/s | Total DISK WRITE: 0.00 B/s
Current DISK READ: 128.17 M/s | Current DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
6244 be/4 1000000 42.76 M/s 0.00 B/s ?unavailable? init
6648 be/4 1000000 42.65 M/s 0.00 B/s ?unavailable? systemd-hostnamed
6649 be/4 1000998 42.76 M/s 0.00 B/s ?unavailable? clamd -c /etc/clamd.d/scan.conf
Total DISK READ: 128.25 M/s | Total DISK WRITE: 0.00 B/s
Current DISK READ: 128.25 M/s | Current DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
6244 be/4 1000000 32.37 M/s 0.00 B/s ?unavailable? init
6639 be/4 1000081 32.33 M/s 0.00 B/s ?unavailable? dbus-daemon --system --add~idfile --systemd-activation
6642 be/4 1000000 32.61 M/s 0.00 B/s ?unavailable? crond -n
6649 be/4 1000998 30.94 M/s 0.00 B/s ?unavailable? clamd -c /etc/clamd.d/scan.conf
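The iotop samples above can be captured non-interactively with something like:

```
sudo iotop -o -d 5                    # interactive, only show processes doing I/O
sudo iotop -obtq -d 5 >> iotop.log    # batch mode with timestamps, handy for longer captures
```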
# Information to attach
Find dmesg, lxc.log, lxc.conf and the AWS volume read metric (read-io.png) attached:

[lxc.conf.txt](https://github.com/lxc/lxc/files/8769741/lxc.conf.txt)
[lxc.log](https://github.com/lxc/lxc/files/8769742/lxc.log)
[dmesg.txt](https://github.com/lxc/lxc/files/8769743/dmesg.txt)