Limits.kernel.memlock cannot exceed 16777216

I’m trying to set limits.kernel.memlock to a value higher than the default. Obviously I’m doing something wrong, since I cannot get it to work. Lowering the value does work.

# lxc --version
3.13

Both host and container are bionic:
root@host:~# lsb_release -c
Codename: bionic
root@limits:~# lsb_release -c
Codename: bionic

# cat /etc/systemd/system/snap.lxd.daemon.service.d/override.conf 
[Service]
LimitMEMLOCK=infinity

Two examples:
lxc config set limits limits.kernel.memlock $[10 * 1024 * 1024]
lxc config set limits limits.kernel.memlock $[20 * 1024 * 1024]
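
For reference, those two expressions evaluate to the byte counts that show up in the prlimit output further down. A small sketch of the arithmetic (`mib_to_bytes` is just a helper name made up for this post; `$(( ))` is the POSIX spelling of the older bash `$[ ]` form):

```shell
# Convert a size in MiB to bytes using POSIX shell arithmetic.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 10   # prints 10485760
mib_to_bytes 16   # prints 16777216
mib_to_bytes 20   # prints 20971520
```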

It looks like the parent processes (lxd service and monitor) have the right limit for memlock.

# ps axf
...
21756 ?        Ss     0:00 /bin/sh /snap/lxd/10756/commands/daemon.start
21870 ?        Sl     0:14  \_ lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
...
19239 ?        Ss     0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers limits
19258 ?        Ss     0:00  \_ /sbin/init
...

lxd service seems OK:
# prlimit -l -p 21870
RESOURCE DESCRIPTION                             SOFT      HARD UNITS
MEMLOCK  max locked-in-memory address space unlimited unlimited bytes

monitor process seems OK:
# prlimit -l -p 19239
RESOURCE DESCRIPTION                             SOFT      HARD UNITS
MEMLOCK  max locked-in-memory address space unlimited unlimited bytes

Container with 10MB OK (/sbin/init):
# prlimit -l -p 19258
RESOURCE DESCRIPTION                            SOFT     HARD UNITS
MEMLOCK  max locked-in-memory address space 10485760 10485760 bytes

Container with 20MB only has 16MB (/sbin/init):
# prlimit -l -p 19217
RESOURCE DESCRIPTION                            SOFT     HARD UNITS
MEMLOCK  max locked-in-memory address space 16777216 16777216 bytes
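
The same numbers can also be read straight from procfs, which helps in containers where prlimit is not installed. A minimal sketch, assuming a Linux /proc and the usual `Max locked memory` row in /proc/PID/limits (`memlock_of` is a made-up helper name):

```shell
# Print the soft and hard RLIMIT_MEMLOCK of a process, as reported by procfs.
# Usage: memlock_of PID
memlock_of() {
    # Row layout: "Max locked memory  <soft>  <hard>  bytes"
    awk '/^Max locked memory/ { print $4, $5 }' "/proc/$1/limits"
}

# Example: limits of the current shell (soft, then hard).
memlock_of "$$"
```

Run against the PID of /sbin/init in the container, this should report the same soft and hard values as `prlimit -l -p`.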

Any idea what I’m doing wrong?

Did you try with an Alpine container to see if that behaves differently?

We’ve seen cases where systemd in the container reduces its prlimits on startup which could explain what you’re seeing here.

You’re right (as always :) ).

11417 ?        Ss     0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers alpine
11436 ?        Ss     0:00  \_ /sbin/init
# prlimit -l -p 11436
RESOURCE DESCRIPTION                             SOFT      HARD UNITS
MEMLOCK  max locked-in-memory address space unlimited unlimited bytes

So my quest is now to find out how to stop systemd from dropping resource limits back to its defaults…

It looks like this is an Ubuntu-specific issue.

I asked on the systemd-devel list. My question and the response:

It seems that systemd drops rlimit_memlock on startup. Correct? And if
so, is it configurable?

No. It actually raises it to 64M if it can:

https://github.com/systemd/systemd/blob/master/src/core/main.c#L1380

Before doing so it will save the original setting though and that’s
what it tries to pass to services invoked as default too. Thus,
RLIMIT_MEMLOCK should really just be bumped for PID 1 itself, and only
if privileges allow.

That’s correct for the GitHub code (I checked), so I compiled it (with minimal config/libs) and tested it:
10969 ? Ss 0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers limits
10987 ? Ss 0:00  \_ /sbin/init

And behold:
# prlimit -p 10987 -l
RESOURCE DESCRIPTION SOFT HARD UNITS
MEMLOCK max locked-in-memory address space 20971520 20971520 bytes

So it’s not Ubuntu-specific, just outdated code.
To get the source and compile it:
apt build-dep systemd ; apt source --compile systemd

The bionic code has a fixed setting for memlock in src/core/main.c @ bump_rlimit_memlock:

r = setrlimit_closest(RLIMIT_MEMLOCK, &RLIMIT_MAKE_CONST(1024ULL*1024ULL*16ULL));
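
That fixed 16 MiB target would explain both observations above. A rough model of the effect, not the systemd code itself: I’m assuming PID 1 in the container may lower its limit but lacks the privilege to raise it past the inherited hard limit, so the result is the smaller of the two (`bionic_memlock` is a hypothetical name):

```shell
# Model of the bionic-era bump as seen from inside the container.
# Assumption: the limit can be lowered to the 16 MiB target, but not raised
# above the inherited value. "unlimited" counts as larger than any number.
bionic_memlock() {
    inherited=$1
    target=$(( 16 * 1024 * 1024 ))
    if [ "$inherited" = "unlimited" ] || [ "$inherited" -gt "$target" ]; then
        echo "$target"
    else
        echo "$inherited"
    fi
}

bionic_memlock 10485760   # prints 10485760 (10 MiB is kept)
bionic_memlock 20971520   # prints 16777216 (20 MiB is cut to 16 MiB)
```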

This was changed to 64 MB in August 2018:

commit 91cfdd8d29b353dc1fd825673c9a23e00c92a341
Author: Roman Gushchin <guro@fb.com>
Date:   Thu Aug 23 10:46:20 2018 -0700

    core: bump mlock ulimit to 64Mb
    
    Bpf programs are charged against memlock ulimit, and the default value
    can be too tight on machines with many cgroups and attached bpf programs.
    
    Let's bump it to 64Mb.

    diff --git a/src/core/main.c b/src/core/main.c
    index ce45f2ded2..88656dcabf 100644
    --- a/src/core/main.c
    +++ b/src/core/main.c
    @@ -1201,7 +1201,7 @@ static int bump_rlimit_memlock(struct rlimit *saved_rlimit) {
             if (getrlimit(RLIMIT_MEMLOCK, saved_rlimit) < 0)
                     return log_warning_errno(errno, "Reading RLIMIT_MEMLOCK failed, ignoring: %m");
     
    -        r = setrlimit_closest(RLIMIT_MEMLOCK, &RLIMIT_MAKE_CONST(1024ULL*1024ULL*16ULL));
    +        r = setrlimit_closest(RLIMIT_MEMLOCK, &RLIMIT_MAKE_CONST(1024ULL*1024ULL*64ULL));
             if (r < 0)
                     return log_warning_errno(r, "Setting RLIMIT_MEMLOCK failed, ignoring: %m");

Nowadays it’s a bit more sophisticated. See:

commit cda7faa9a5ae2fa1ebc27b08e84d5ce62e46e37b
Author: Lennart Poettering <lennart@poettering.net>
Date:   Wed Jan 16 18:05:14 2019 +0100

    main: dont bump resource limits if they are higher than we need them anyway
    
    This matters in particular in the case of --user, since there we lack
    the privs to bump the limits up again later on when invoking children.
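
As I read that commit message, the wanted value becomes a floor rather than a fixed target: a limit that is already at or above it is left alone. A rough model of that reading, not the actual systemd code (`newer_memlock` and its `can_raise` argument are hypothetical; the 64 MiB floor is taken from the August 2018 commit above):

```shell
# Model of the newer behaviour: never lower the inherited limit, and raise it
# to the 64 MiB floor only if PID 1 is allowed to.
# Usage: newer_memlock INHERITED [can_raise: yes|no, default no]
newer_memlock() {
    inherited=$1
    can_raise=${2:-no}
    floor=$(( 64 * 1024 * 1024 ))
    if [ "$inherited" = "unlimited" ]; then
        echo unlimited
    elif [ "$inherited" -ge "$floor" ] || [ "$can_raise" != "yes" ]; then
        echo "$inherited"
    else
        echo "$floor"
    fi
}

newer_memlock 20971520       # prints 20971520 (kept, cannot be raised)
newer_memlock 20971520 yes   # prints 67108864 (raised to 64 MiB)
```

With PID 1 unable to raise its limit, the 20 MiB setting survives, which matches the prlimit output from the self-compiled build.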

If this is to be fixed in bionic, that change will need to be backported. Xenial does not have this problem.

Created a bug report with a patch:

https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1830746