Scaling past 1350 Containers - seccomp errors

Hello,

I have been working on scaling further and have hit the following failure again when trying to start a container with 1351 containers already running:

lxc-start 438 20200615053953.628 ERROR seccomp - seccomp.c:lxc_seccomp_load:1239 - Unknown error 524 - Error loading the seccomp policy

This was originally raised & solved in this thread:

The solution there was to increase this sysctl:

net.core.bpf_jit_limit = 3000000000

I have tried raising it far beyond the advised value (net.core.bpf_jit_limit = 3000000000000), but I still hit the seccomp errors, so I believe another constraint is at play.

Here is my sysctl configuration:
kernel.keys.maxkeys = 100000000
kernel.keys.maxbytes = 200000000
kernel.dmesg_restrict = 1
vm.max_map_count = 262144
net.ipv6.conf.default.autoconf = 0
fs.inotify.max_queued_events = 167772160
fs.inotify.max_user_instances = 167772160 # def:128
fs.inotify.max_user_watches = 167772160 # def:8192
net.core.bpf_jit_limit = 3000000000000x
kernel.keys.root_maxbytes = 2000000000
kernel.keys.root_maxkeys = 1000000000
kernel.pid_max = 4194304
kernel.keys.gc_delay = 300
kernel.keys.persistent_keyring_expiry = 259200
fs.aio-max-nr = 524288
kernel.pty.max = 10000
net.core.somaxconn=10000
fs.file-max = 1048576
net.ipv4.ip_local_port_range = 12000 65535
kernel.pty.reserve = 2048
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_mem = 50576 64768 98152
net.core.netdev_max_backlog = 5000
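
A listing like the one above shows what was configured, not necessarily what the running kernel accepted. As a minimal sketch (the helper names `sysctl_path`, `read_sysctl` and `compare` are made up for illustration), the following compares expected values against the files under /proc/sys. Some entries may only be readable by root, in which case the helper reports the running value as None.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Return the running value of a sysctl, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        # Missing key, or not permitted to read it (some keys need root).
        return None


def compare(expected: Dict[str, str],
            root: str = "/proc/sys") -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (expected, running)} for every key whose value differs."""
    mismatches = {}
    for name, want in expected.items():
        got = read_sysctl(name, root)
        # Multi-value sysctls are tab-separated in /proc, so compare by tokens.
        if got is None or got.split() != want.split():
            mismatches[name] = (want, got)
    return mismatches


if __name__ == "__main__":
    # Values taken from the listing above; extend as needed.
    expected = {
        "net.core.bpf_jit_limit": "3000000000000",
        "vm.max_map_count": "262144",
        "kernel.pid_max": "4194304",
    }
    for name, (want, got) in compare(expected).items():
        print(f"{name}: expected {want}, running value {got}")
```

Any key printed by this script is one where the running kernel does not hold the configured value, which would be worth ruling out before looking for other limits.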

The containers are running Ubuntu 18.04.

Could you please advise where to look next in order to diagnose the issue?

Actually, this appears to be unrelated to LXC. I see the following in the syslog:

Jun 15 06:45:14 host kernel: vmap allocation for size 8192 failed: use vmalloc= to increase size
Jun 15 06:45:14 host kernel: lxc-start: vmalloc: allocation failure: 4096 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=ns,mems_allowed=0-1

Jun 15 06:45:14 host kernel: Call Trace:
Jun 15 06:45:14 host kernel: dump_stack+0x6d/0x9a
Jun 15 06:45:14 host kernel: warn_alloc.cold.119+0x7b/0xdd
Jun 15 06:45:14 host kernel: ? __get_vm_area_node+0x149/0x160
Jun 15 06:45:14 host kernel: ? bpf_jit_alloc_exec+0xe/0x10
Jun 15 06:45:14 host kernel: __vmalloc_node_range+0x1aa/0x270
Jun 15 06:45:14 host kernel: ? pcpu_block_refresh_hint+0xb0/0xf0
Jun 15 06:45:14 host kernel: ? bpf_jit_alloc_exec+0xe/0x10
Jun 15 06:45:14 host kernel: module_alloc+0x82/0xe0
Jun 15 06:45:14 host kernel: ? bpf_jit_alloc_exec+0xe/0x10
Jun 15 06:45:14 host kernel: bpf_jit_alloc_exec+0xe/0x10
Jun 15 06:45:14 host kernel: bpf_jit_binary_alloc+0x63/0xf0
Jun 15 06:45:14 host kernel: ? emit_mov_reg+0xf0/0xf0
Jun 15 06:45:14 host kernel: bpf_int_jit_compile+0x133/0x34d
Jun 15 06:45:14 host kernel: bpf_prog_select_runtime+0xcd/0x150
Jun 15 06:45:14 host kernel: bpf_prepare_filter+0x52e/0x5a0
Jun 15 06:45:14 host kernel: bpf_prog_create_from_user+0xc5/0x110
Jun 15 06:45:14 host kernel: ? hardlockup_detector_perf_cleanup.cold.9+0x1a/0x1a
Jun 15 06:45:14 host kernel: do_seccomp+0x2bf/0x8d0
Jun 15 06:45:14 host kernel: __x64_sys_seccomp+0x1a/0x20
Jun 15 06:45:14 host kernel: do_syscall_64+0x57/0x190
Jun 15 06:45:14 host kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 15 06:45:14 host kernel: RIP: 0033:0x7fbfc709bf59
Jun 15 06:45:14 host kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 07 6f 0c 00 f7 d8 64 89 01 48
Jun 15 06:45:14 host kernel: RSP: 002b:00007ffd36a591b8 EFLAGS: 00000246 ORIG_RAX: 000000000000013d
Jun 15 06:45:14 host kernel: RAX: ffffffffffffffda RBX: 00005597e2acf440 RCX: 00007fbfc709bf59
Jun 15 06:45:14 host kernel: RDX: 00005597e2ade6f0 RSI: 0000000000000000 RDI: 0000000000000001
Jun 15 06:45:14 host kernel: RBP: 00005597e2ade6f0 R08: 00005597e2acf440 R09: 00005597e2ac8cc0
Jun 15 06:45:14 host kernel: R10: 00005597e2ad34a0 R11: 0000000000000246 R12: 00007ffd36a5925c
Jun 15 06:45:14 host kernel: R13: 0000000000000000 R14: 00000000ffffffff R15: 00005597e2ac8cc0
Jun 15 06:45:14 host kernel: Mem-Info:
Jun 15 06:45:14 host kernel: active_anon:46934939 inactive_anon:84738556 isolated_anon:0
active_file:20479475 inactive_file:18648470 isolated_file:0
unevictable:223734 dirty:590 writeback:0 unstable:0
slab_reclaimable:6646485 slab_unreclaimable:25509665
mapped:5764741 shmem:53598 pagetables:2035581 bounce:0
free:35623875 free_pcp:138359 free_cma:0

Jun 15 06:45:14 host kernel: Node 0 active_anon:96891592kB inactive_anon:176347476kB active_file:42523196kB inactive_file:38214056kB unevictable:285892kB isolated(anon):0kB isolated(file):0kB mapped:11951496kB dirty:1392kB writeback:0kB shmem:78572kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Jun 15 06:45:14 host kernel: Node 1 active_anon:90848164kB inactive_anon:162606748kB active_file:39394704kB inactive_file:36379824kB unevictable:609044kB isolated(anon):0kB isolated(file):0kB mapped:11107468kB dirty:968kB writeback:0kB shmem:135820kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Jun 15 06:45:14 host kernel: Node 0 DMA free:15872kB min:0kB low:12kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15872kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jun 15 06:45:14 host kernel: lowmem_reserve[]: 0 2557 515793 515793 515793
Jun 15 06:45:14 host kernel: Node 0 DMA32 free:2626636kB min:220kB low:2836kB high:5452kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:2732964kB managed:2665112kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:1608kB local_pcp:0kB free_cma:0kB
Jun 15 06:45:14 host kernel: lowmem_reserve[]: 0 0 513236 513236 513236
Jun 15 06:45:14 host kernel: Node 0 Normal free:57810656kB min:44820kB low:570372kB high:1095924kB active_anon:96891592kB inactive_anon:176347476kB active_file:42523196kB inactive_file:38214056kB unevictable:285892kB writepending:1392kB present:533970944kB managed:525553736kB mlocked:285892kB kernel_stack:881128kB pagetables:4131924kB bounce:0kB free_pcp:280648kB local_pcp:1284kB free_cma:0kB
Jun 15 06:45:14 host kernel: lowmem_reserve[]: 0 0 0 0 0
Jun 15 06:45:14 host kernel: Node 1 Normal free:82042336kB min:45064kB low:573476kB high:1101888kB active_anon:90848164kB inactive_anon:162606748kB active_file:39394704kB inactive_file:36379824kB unevictable:609044kB writepending:968kB present:536866816kB managed:528422156kB mlocked:609044kB kernel_stack:973480kB pagetables:4010400kB bounce:0kB free_pcp:271176kB local_pcp:1472kB free_cma:0kB
Jun 15 06:45:14 host kernel: lowmem_reserve[]: 0 0 0 0 0
Jun 15 06:45:14 host kernel: Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 3*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15872kB
Jun 15 06:45:14 host kernel: Node 0 DMA32: 5*4kB (UM) 3*8kB (M) 6*16kB (M) 6*32kB (M) 4*64kB (M) 6*128kB (M) 5*256kB (UM) 7*512kB (UM) 9*1024kB (UM) 7*2048kB (UM) 634*4096kB (M) = 2626636kB
Jun 15 06:45:14 host kernel: Node 0 Normal: 16997*4kB (UME) 19286*8kB (UM) 6398*16kB (UME) 2009*32kB (UME) 37*64kB (UME) 192*128kB (UME) 173*256kB (UM) 53*512kB (UM) 752*1024kB (UME) 356*2048kB (U) 13629*4096kB (M) = 57810820kB
Jun 15 06:45:14 host kernel: Node 1 Normal: 52417*4kB (UME) 37834*8kB (UME) 17518*16kB (UME) 25703*32kB (UME) 10200*64kB (UME) 6514*128kB (UME) 794*256kB (UME) 755*512kB (UE) 685*1024kB (UE) 156*2048kB (U) 18879*4096kB (M) = 82040852kB
Jun 15 06:45:14 host kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jun 15 06:45:14 host kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 15 06:45:14 host kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jun 15 06:45:14 host kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 15 06:45:14 host kernel: 63074873 total pagecache pages
Jun 15 06:45:14 host kernel: 23889255 pages in swap cache
Jun 15 06:45:14 host kernel: Swap cache stats: add 662423405, delete 638583187, find 85025462/143435592
Jun 15 06:45:14 host kernel: Free swap = 246458880kB
Jun 15 06:45:14 host kernel: Total swap = 1953588640kB
Jun 15 06:45:14 host kernel: 268396680 pages RAM
Jun 15 06:45:14 host kernel: 0 pages HighMem/MovableOnly
Jun 15 06:45:14 host kernel: 4232461 pages reserved
Jun 15 06:45:14 host kernel: 0 pages cma reserved
Jun 15 06:45:14 host kernel: 0 pages hwpoisoned

So this is perhaps related to the space taken by bpf_jit, but possibly also to my almost full swap (1.52TB/1.82TB), which I imagine could be fragmented. The swap is being enlarged shortly, so I will retest after that. I still have plenty of free RAM.
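
To put a number on swap usage at the moment of the failure, the "Free swap" and "Total swap" lines from the Mem-Info dump can be parsed directly. This is a minimal sketch: `swap_usage` is a made-up helper name, and the two log lines are copied from the dump above.

```python
import re

# Matches the "Free swap = NkB" / "Total swap = NkB" lines of a Mem-Info dump.
SWAP_RE = re.compile(r"(Free|Total) swap\s*=\s*(\d+)kB")


def swap_usage(log_text: str) -> dict:
    """Compute used swap (in kB and as a fraction) from Mem-Info log lines."""
    values = {kind.lower(): int(kb) for kind, kb in SWAP_RE.findall(log_text)}
    used = values["total"] - values["free"]
    return {
        "total_kb": values["total"],
        "free_kb": values["free"],
        "used_kb": used,
        "used_fraction": used / values["total"],
    }


log = """\
Jun 15 06:45:14 host kernel: Free swap = 246458880kB
Jun 15 06:45:14 host kernel: Total swap = 1953588640kB
"""

usage = swap_usage(log)
# The kernel's "kB" here are units of 1024 bytes, so dividing by 2**30 gives TiB.
print(f"swap used: {usage['used_kb'] / 2**30:.2f} TiB "
      f"of {usage['total_kb'] / 2**30:.2f} TiB "
      f"({usage['used_fraction']:.1%})")
```

This only quantifies swap pressure. It does not by itself show whether swap has anything to do with the vmalloc failure.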

I have continued trying a few tweaks here and there to boot more than 1350 containers, but I am still stuck. I have a thread on the Proxmox forums looking for advice, but no progress yet:

As I mention there, I see very few similar reports of this error, other than old issues with vmalloc on 32-bit systems. I reduced the ZFS footprint on the server by cutting the number of zpools from 71 to 10, thinking that might help, but it made no difference. My last thoughts are around cgroups, but I am at a loss as to how to interpret the call trace and what direction it points in.
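
On reading the call trace: entries prefixed with "?" are addresses the kernel found on the stack but could not confirm as part of the active call chain, so they are usually skipped on a first read. The remaining frames, read from bottom to top, give the path from the syscall to the failing allocation. A minimal sketch of that filtering follows (`reliable_frames` is a made-up helper name; the sample lines are a subset of the trace posted above).

```python
import re
from typing import List

# "kernel: [? ]symbol+0xoff/0xsize"; symbols may contain dots (e.g. .cold.119).
FRAME_RE = re.compile(
    r"kernel:\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(trace: str) -> List[str]:
    """Return symbol names of frames not marked '?', innermost first."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group(1):  # group(1) is set only for "?" frames
            frames.append(m.group(2))
    return frames


trace = """\
Jun 15 06:45:14 host kernel: dump_stack+0x6d/0x9a
Jun 15 06:45:14 host kernel: warn_alloc.cold.119+0x7b/0xdd
Jun 15 06:45:14 host kernel: ? __get_vm_area_node+0x149/0x160
Jun 15 06:45:14 host kernel: __vmalloc_node_range+0x1aa/0x270
Jun 15 06:45:14 host kernel: module_alloc+0x82/0xe0
Jun 15 06:45:14 host kernel: ? bpf_jit_alloc_exec+0xe/0x10
Jun 15 06:45:14 host kernel: bpf_jit_alloc_exec+0xe/0x10
Jun 15 06:45:14 host kernel: bpf_jit_binary_alloc+0x63/0xf0
Jun 15 06:45:14 host kernel: bpf_int_jit_compile+0x133/0x34d
Jun 15 06:45:14 host kernel: do_seccomp+0x2bf/0x8d0
Jun 15 06:45:14 host kernel: __x64_sys_seccomp+0x1a/0x20
"""

frames = reliable_frames(trace)
# Print outermost caller first: syscall entry down to the allocation warning.
print(" -> ".join(reversed(frames)))
```

Read this way, the trace suggests that the seccomp syscall was JIT-compiling the container's filter and that the failing allocation was requested through bpf_jit_alloc_exec and module_alloc, rather than anything cgroup-specific. That is an interpretation of the frames shown, not a confirmed diagnosis.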