Container sees buffer/cached memory as used

Hi,

I’m having some issues with containers seeing their buffered/cached memory as used. I thought it had something to do with memory limits, but even after I’ve disabled those, it still comes to a point where services get OOM-killed.

I run Docker inside the container, so it might be something related to that. At least it’s easy to reproduce by spinning up a bunch of Docker containers inside the LXC.

lxc config show compute-xlarge-1

architecture: x86_64
config:
  image.build_ver: 10f16957b38b3df1cb525889296835c4868c3d4661a7fcd1040d78da1c364379
  image.name: base_image
  image.os: ubuntu
  image.version: "20.04"
  limits.cpu: 26-33
  limits.memory: 16GB
  security.nesting: "true"
  user.access_interface: eth1
  volatile.base_image: 42aa515d369bf585f35c774fa10f8dae74087ac952aa7cce4c59c19e93c5a4ae
  volatile.eth0.host_name: veth65d4e87d
  volatile.eth0.hwaddr: 00:16:3e:b4:95:cd
  volatile.eth1.host_name: veth897966a1
  volatile.eth1.hwaddr: 00:16:3e:50:ab:87
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":65536}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
  volatile.uuid: b5626105-3bc9-4b57-9262-e5dd4a89bea8
ephemeral: false
profiles:
- default
- vlan
stateful: false
description: ""

Fresh start of compute-xlarge-1

free -m
              total        used        free      shared  buff/cache   available
Mem:          15258         115       14842           0         301       15143
Swap:             0           0           0

After I’ve created 100 nginx Docker containers with docker run -ti -d nginx.
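In practice that was just a simple loop, roughly:

for i in $(seq 1 100); do docker run -ti -d nginx; done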

957 MB used; this is still fair, as we have the 100 containers running.

free -m
              total        used        free      shared  buff/cache   available
Mem:          15258         957       12457           5        1843       14300
Swap:             0           0           0

After I stopped and deleted all 100 containers, the amount of memory used more than doubled. This seems odd.
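For reference, the cleanup was roughly the equivalent of:

docker stop $(docker ps -aq)
docker rm $(docker ps -aq)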

free -m
              total        used        free      shared  buff/cache   available
Mem:          15258        2414       11815           0        1028       12843
Swap:             0           0           0

I thought it might be the Docker service; stopping that lowered usage a little bit, but it’s still quite high for what’s running.
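That was a plain service stop, i.e. something like:

systemctl stop docker.service docker.socket

(stopping docker.socket as well, where it exists, so socket activation doesn’t immediately bring the daemon back)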

free -m
              total        used        free      shared  buff/cache   available
Mem:          15258        1513       12824           0         920       13744
Swap:             0           0           0


ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0 169640  8300 ?        Ss   17:25   0:01 /sbin/init
root          82  0.0  0.4 133580 64536 ?        Ss   17:25   0:01 /lib/systemd/systemd-journald
root         117  0.0  0.0  21608  3036 ?        Ss   17:25   0:00 /lib/systemd/systemd-udevd
systemd+     172  0.0  0.0  27056  5276 ?        Ss   17:25   0:00 /lib/systemd/systemd-networkd
root         200  0.0  0.0 237312  4164 ?        Ssl  17:25   0:00 /usr/lib/accountsservice/accounts-daemon
root         209  0.0  0.0   8536  1432 ?        Ss   17:25   0:00 /usr/sbin/cron -f
message+     211  0.0  0.0   7384  3040 ?        Ss   17:25   0:00 /usr/bin/dbus-daemon --system --address=systemd: --n
root         214  0.0  0.0  29528 14592 ?        Ss   17:25   0:01 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-
syslog       215  0.0  0.0 154864  2976 ?        Ssl  17:25   0:00 /usr/sbin/rsyslogd -n -iNONE
root         218  0.0  0.0  16804  3088 ?        Ss   17:25   0:00 /lib/systemd/systemd-logind
daemon       221  0.0  0.0   3792  1032 ?        Ss   17:25   0:00 /usr/sbin/atd -f
root         235  0.0  0.0   7352  1232 console  Ss+  17:25   0:00 /sbin/agetty -o -p -- \u --noclear --keep-baud conso
root         236  0.0  0.0 108084 14552 ?        Ssl  17:25   0:00 /usr/bin/python3 /usr/share/unattended-upgrades/unat
root         237  0.0  0.0  12176  4680 ?        Ss   17:25   0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 start
root         258  0.0  0.0 232716  3612 ?        Ssl  17:25   0:00 /usr/lib/policykit-1/polkitd --no-debug
root         509  0.0  0.0   8960  3256 ?        Ss   17:25   0:00 bash
root        2809  0.0  0.0   8960  3936 ?        Ss   17:31   0:00 bash
root        3045  0.0  0.0   9140  4888 ?        S+   17:31   0:04 htop
root       43901  0.0  0.0  10616  3304 ?        R+   17:37   0:00 ps aux

Memory consumption comes back down to a reasonable level after clearing the cache with the following command on the host:
sync && echo 3 > /proc/sys/vm/drop_caches

free -m
              total        used        free      shared  buff/cache   available
Mem:          15258         767       14360           0         130       14491
Swap:             0           0           0

Any help would be much appreciated!

The listed available memory in all of those looks reasonable.
The fact that the kernel doesn’t expose buffered/cached amounts properly for cgroups makes it hard for lxcfs to render them properly in meminfo, but the available values look correct throughout, so it’s a bit surprising to hear that the OOM killer would have triggered.
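If you want to compare what lxcfs renders against the raw cgroup accounting, something like this inside the instance (assuming cgroup v1 paths) shows both views:

grep -E 'MemTotal|MemFree|MemAvailable|Buffers|^Cached' /proc/meminfo
grep -E '^(cache|rss|mapped_file)' /sys/fs/cgroup/memory/memory.stat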

Can you show the kernel log of an OOM killer run?
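On the host, something along these lines should capture the relevant entry:

journalctl -k -b | grep -i -B 5 -A 60 'invoked oom-killer'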

Why do you find 12843 MB of available memory reasonable after I’ve stopped all the containers, and 13744 MB after I’ve stopped basically everything other than systemd within the container?

Here’s the log from yesterday

Apr 05 08:06:06 compute-lxc-host-1 kernel: runc:[2:INIT] invoked oom-killer: gfp_mask=0x42cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP), order=2, oom_score_adj=0
Apr 05 08:06:07 compute-lxc-host-1 kernel: CPU: 5 PID: 2838153 Comm: runc:[2:INIT] Tainted: P           O      5.4.0-65-generic #73-Ubuntu
Apr 05 08:06:07 compute-lxc-host-1 kernel: Hardware name: ASUSTeK COMPUTER INC. KRPA-U16 Series/KRPA-U16 Series, BIOS 4102 11/16/2020
Apr 05 08:06:07 compute-lxc-host-1 kernel: Call Trace:
Apr 05 08:06:07 compute-lxc-host-1 kernel:  dump_stack+0x6d/0x9a
Apr 05 08:06:07 compute-lxc-host-1 kernel:  dump_header+0x4f/0x1eb
Apr 05 08:06:07 compute-lxc-host-1 kernel:  oom_kill_process.cold+0xb/0x10
Apr 05 08:06:07 compute-lxc-host-1 kernel:  out_of_memory.part.0+0x1df/0x3d0
Apr 05 08:06:07 compute-lxc-host-1 kernel:  out_of_memory+0x6d/0xd0
Apr 05 08:06:07 compute-lxc-host-1 kernel:  __alloc_pages_slowpath+0xd5e/0xe50
Apr 05 08:06:07 compute-lxc-host-1 kernel:  __alloc_pages_nodemask+0x2d0/0x320
Apr 05 08:06:07 compute-lxc-host-1 kernel:  alloc_pages_current+0x87/0xe0
Apr 05 08:06:07 compute-lxc-host-1 kernel:  kmalloc_order+0x1f/0x80
Apr 05 08:06:07 compute-lxc-host-1 kernel:  kmalloc_order_trace+0x24/0xa0
Apr 05 08:06:07 compute-lxc-host-1 kernel:  ? cpumask_next+0x1b/0x20
Apr 05 08:06:07 compute-lxc-host-1 kernel:  __kmalloc_track_caller+0x222/0x280
Apr 05 08:06:07 compute-lxc-host-1 kernel:  ? bpf_prog_store_orig_filter.isra.0+0x5e/0x90
Apr 05 08:06:07 compute-lxc-host-1 kernel:  kmemdup+0x1c/0x40
Apr 05 08:06:07 compute-lxc-host-1 kernel:  bpf_prog_store_orig_filter.isra.0+0x5e/0x90
Apr 05 08:06:07 compute-lxc-host-1 kernel:  ? hardlockup_detector_perf_cleanup.cold+0x14/0x14
Apr 05 08:06:07 compute-lxc-host-1 kernel:  bpf_prog_create_from_user+0xb8/0x120
Apr 05 08:06:07 compute-lxc-host-1 kernel:  seccomp_set_mode_filter+0x11a/0x750
Apr 05 08:06:07 compute-lxc-host-1 kernel:  ? __secure_computing+0x42/0xe0
Apr 05 08:06:07 compute-lxc-host-1 kernel:  do_seccomp+0x39/0x200
Apr 05 08:06:07 compute-lxc-host-1 kernel:  __x64_sys_seccomp+0x1a/0x20
Apr 05 08:06:07 compute-lxc-host-1 kernel:  do_syscall_64+0x57/0x190
Apr 05 08:06:07 compute-lxc-host-1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 05 08:06:07 compute-lxc-host-1 kernel: RIP: 0033:0x7fc9edb2389d
Apr 05 08:06:07 compute-lxc-host-1 kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 f5 0c 00 f7 d8 64 89 0>
Apr 05 08:06:07 compute-lxc-host-1 kernel: RSP: 002b:00007fffbe755ac8 EFLAGS: 00000246 ORIG_RAX: 000000000000013d
Apr 05 08:06:07 compute-lxc-host-1 kernel: RAX: ffffffffffffffda RBX: 00005645ac3334d0 RCX: 00007fc9edb2389d
Apr 05 08:06:07 compute-lxc-host-1 kernel: RDX: 00005645ac360d30 RSI: 0000000000000001 RDI: 0000000000000001
Apr 05 08:06:07 compute-lxc-host-1 kernel: RBP: 00005645ac360d30 R08: 0000000000000001 R09: 000000c00018cf90
Apr 05 08:06:07 compute-lxc-host-1 kernel: R10: 0000000000000007 R11: 0000000000000246 R12: 000000c00018e000
Apr 05 08:06:07 compute-lxc-host-1 kernel: R13: 0000000000000181 R14: 0000000000000180 R15: 0000000000000200
Apr 05 08:06:07 compute-lxc-host-1 kernel: Mem-Info:
Apr 05 08:06:07 compute-lxc-host-1 kernel: active_anon:911792 inactive_anon:10701 isolated_anon:0
                                            active_file:163681 inactive_file:205589 isolated_file:0
                                            unevictable:0 dirty:731 writeback:13 unstable:0
                                            slab_reclaimable:30310549 slab_unreclaimable:851661
                                            mapped:115798 shmem:12769 pagetables:12593 bounce:0
                                            free:211050 free_pcp:559 free_cma:0
Apr 05 08:06:07 compute-lxc-host-1 kernel: Node 0 active_anon:3647168kB inactive_anon:42804kB active_file:654724kB inactive_file:822356kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:463192kB dirty:2924kB writeback:52kB sh>
Apr 05 08:06:07 compute-lxc-host-1 kernel: Node 0 DMA free:15900kB min:8kB low:20kB high:32kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15900kB mlocked:0kB>
Apr 05 08:06:07 compute-lxc-host-1 kernel: lowmem_reserve[]: 0 2564 128615 128615 128615
Apr 05 08:06:07 compute-lxc-host-1 kernel: Node 0 DMA32 free:504708kB min:1344kB low:3968kB high:6592kB active_anon:544kB inactive_anon:0kB active_file:84kB inactive_file:68kB unevictable:0kB writepending:0kB present:2745684kB managed:274>
Apr 05 08:06:07 compute-lxc-host-1 kernel: lowmem_reserve[]: 0 0 126051 126051 126051
Apr 05 08:06:07 compute-lxc-host-1 kernel: Node 0 Normal free:324096kB min:66228kB low:195304kB high:324380kB active_anon:3646624kB inactive_anon:42804kB active_file:654636kB inactive_file:822292kB unevictable:0kB writepending:2976kB pres>
Apr 05 08:06:07 compute-lxc-host-1 kernel: lowmem_reserve[]: 0 0 0 0 0
Apr 05 08:06:07 compute-lxc-host-1 kernel: Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
Apr 05 08:06:07 compute-lxc-host-1 kernel: Node 0 DMA32: 425*4kB (UME) 288*8kB (UME) 204*16kB (UME) 253*32kB (UME) 230*64kB (UME) 206*128kB (UM) 149*256kB (UM) 81*512kB (UM) 34*1024kB (UM) 13*2048kB (UM) 75*4096kB (UM) = 504708kB
Apr 05 08:06:07 compute-lxc-host-1 kernel: Node 0 Normal: 75305*4kB (UME) 3442*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 328756kB
Apr 05 08:06:07 compute-lxc-host-1 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Apr 05 08:06:07 compute-lxc-host-1 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Apr 05 08:06:07 compute-lxc-host-1 kernel: 381689 total pagecache pages
Apr 05 08:06:07 compute-lxc-host-1 kernel: 0 pages in swap cache
Apr 05 08:06:07 compute-lxc-host-1 kernel: Swap cache stats: add 0, delete 0, find 0/0
Apr 05 08:06:07 compute-lxc-host-1 kernel: Free swap  = 0kB
Apr 05 08:06:07 compute-lxc-host-1 kernel: Total swap = 0kB
Apr 05 08:06:07 compute-lxc-host-1 kernel: 33520369 pages RAM
Apr 05 08:06:07 compute-lxc-host-1 kernel: 0 pages HighMem/MovableOnly
Apr 05 08:06:07 compute-lxc-host-1 kernel: 558818 pages reserved
Apr 05 08:06:07 compute-lxc-host-1 kernel: 0 pages cma reserved
Apr 05 08:06:07 compute-lxc-host-1 kernel: 0 pages hwpoisoned
Apr 05 08:06:07 compute-lxc-host-1 kernel: Tasks state (memory values in pages):
Apr 05 08:06:07 compute-lxc-host-1 kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Apr 05 08:06:07 compute-lxc-host-1 kernel: [    791]     0   791    12924      669   118784        0          -250 systemd-journal
Apr 05 08:06:07 compute-lxc-host-1 kernel: [    835]     0   835     5418      840    77824        0         -1000 systemd-udevd
Apr 05 08:06:07 compute-lxc-host-1 kernel: [    847]   100   847     4681      453    77824        0             0 systemd-network
Apr 05 08:06:07 compute-lxc-host-1 kernel: [   6342] 1000000  6342   573794    95536  1437696        0             0 nomad
    (task list continues...)


Apr 05 08:06:07 compute-lxc-host-1 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=39f777ef9204956cc8f840e3e0b75a074e0bb76849f210356742641809196559,mems_allowed=0,global_oom,task_memcg=/lxc.payload.compute-server-1/syst>
Apr 05 08:06:07 compute-lxc-host-1 kernel: Out of memory: Killed process 6342 (nomad) total-vm:2295176kB, anon-rss:378596kB, file-rss:3548kB, shmem-rss:0kB, UID:1000000 pgtables:1404kB oom_score_adj:0
Apr 05 08:06:07 compute-lxc-host-1 kernel: oom_reaper: reaped process 6342 (nomad), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

What storage backend are you using?

As for the OOM kill, what’s the memory limit and usage in compute-server-1?
Sounds like that particular cgroup/instance hit its limit, causing the OOM killer to trigger.
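Something like this should show both the configured limit and the current usage (exact output format depends on the LXD version):

lxc config get compute-server-1 limits.memory
lxc info compute-server-1 | grep -A 3 'Memory usage'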

I’m using btrfs as the storage backend. Actually, just today I had a lot of trouble with the filesystem on this node. It’s a “fresh node”; the install is less than a week old.
However, the combination of LXC and Docker doesn’t seem to play well with btrfs; right now there are more than 650 subvolumes created.
Maybe this is related? Until today the root filesystem wasn’t mounted with “user_subvol_rm_allowed”.
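For the record, the count came from something like this on the host (the path being wherever the btrfs filesystem is mounted):

btrfs subvolume list / | wc -l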

compute-server-1 has “limits.memory” set to “8GB”. Just rebooted; 37 minutes of uptime and right now it’s using 514 MB, which is fair.

Okay, I don’t know what btrfs does in that regard. On ZFS, the filesystem uses some amount of kernel memory as a cache which doesn’t get counted as buffered/cached, so it’s not uncommon for such a system to show higher used memory as a result.
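If you suspect kernel-side memory (btrfs metadata and caches, for example), checking slab usage on the host is a quick way to see where it’s going; the slab_reclaimable figure in your OOM dump above looks quite large, for instance:

grep -E '^(Slab|SReclaimable|SUnreclaim)' /proc/meminfo
slabtop -o | head -n 20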

Another thing to keep in mind is that Linux memory management has changed a lot over the years. A common memory release pattern for processes these days is to tell the kernel that they have memory which can be reclaimed, but that memory will not actually be reclaimed until the kernel needs it, to avoid costly kernel operations.
The side effect of this is a general inaccuracy in reported memory usage on modern systems…
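That’s also why MemAvailable tends to be a more meaningful number to watch than “used”; it’s the kernel’s estimate of how much memory could actually be reclaimed and handed out:

grep -E '^(MemFree|MemAvailable)' /proc/meminfo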

One last thing to keep in mind when memory totals look weird:
Remember that tmpfs usage counts as memory usage within a cgroup, so a system with no obvious memory usage from processes but a high amount of used memory can sometimes be explained by tmpfs filesystems consuming the memory.
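A quick way to rule that out inside the instance:

df -h -t tmpfs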

It’s also always possible that something in the kernel is leaky, so when practical, testing a newer kernel (like Ubuntu’s HWE kernel) is often a useful datapoint when debugging this kind of thing.
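On Ubuntu 20.04 that would be something like:

apt install linux-generic-hwe-20.04

followed by a reboot into the new kernel.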