Setting net.ipv4.ping_group_range inside an LXD container

Hi there,

I’ve installed Ubuntu 18.04 with LXD 3.0.1 from packages. I’ve created an unprivileged container and want to set net.ipv4.ping_group_range inside the container. This used to work on Ubuntu 16.04:

# uname -a
Linux kauz-hetz-srv14 4.4.0-127-generic #153-Ubuntu SMP Sat May 19 10:58:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
# sysctl net.ipv4.ping_group_range
net.ipv4.ping_group_range = 1   0
# lxc exec minion sysctl net.ipv4.ping_group_range
net.ipv4.ping_group_range = 0   429296728
# lxc exec minion -- sysctl -w "net.ipv4.ping_group_range=0 429296729"
net.ipv4.ping_group_range = 0 429296729
# lxc exec minion sysctl net.ipv4.ping_group_range
net.ipv4.ping_group_range = 0   429296729

And fails with Ubuntu 18.04:

# uname -a
Linux kauz-hetz-srv15 4.15.0-24-generic #26-Ubuntu SMP Wed Jun 13 08:44:47 UTC 2018 x86_64   x86_64 x86_64 GNU/Linux
# sysctl net.ipv4.ping_group_range
net.ipv4.ping_group_range = 1   0
# lxc exec minion sysctl net.ipv4.ping_group_range
net.ipv4.ping_group_range = 65534       65534
# lxc exec minion -- sysctl -w "net.ipv4.ping_group_range=0 2000000"
net.ipv4.ping_group_range = 0 2000000
# lxc exec minion sysctl net.ipv4.ping_group_range
net.ipv4.ping_group_range = 65534       65534

The entry exists in /proc and seems to be writeable:

root@minion:/proc/sys/net/ipv4# ls -l ping_group_range
-rw-r--r-- 1 root root 0 Jul  3 13:56 ping_group_range

Is there any way to change this parameter for the container? Is this a kernel issue, an AppArmor issue, or an LXD configuration problem?

If it were an AppArmor issue, I’d have expected an error from sysctl, so this may well be a kernel bug instead.

@tyhicks or @sforshee may be able to look into this for you.

Can you paste the output of cat /proc/self/gid_map from inside the container?

From net/ipv4/sysctl_net_ipv4.c:

static int ipv4_ping_group_range(struct ctl_table *table, int write,
                                 void __user *buffer,
                                 size_t *lenp, loff_t *ppos)
{
...
        if (write && ret == 0) {
                low = make_kgid(user_ns, urange[0]);
                high = make_kgid(user_ns, urange[1]);
                if (!gid_valid(low) || !gid_valid(high) ||
                    (urange[1] < urange[0]) || gid_lt(high, low)) {
                        low = make_kgid(&init_user_ns, 1);
                        high = make_kgid(&init_user_ns, 0);
                }
                set_ping_group_range(table, low, high);
        }

        return ret;
}

This means that if either the minimum or the maximum GID in the specified range is not valid inside the user namespace, the kernel silently sets the sysctl’s value to the range “1 0” from the init user namespace (IMO, it should return an error in this situation).

After the write has silently failed and you read back the sysctl value, the kernel does something silly by reporting that the min and max values of the GID range are the overflow gid (DEFAULT_OVERFLOWGID in the source code) since the actual sysctl value doesn’t map to a valid GID range inside the container. This is why you see 65534 65534 when reading the sysctl from inside the 18.04 container.

I suspect that in your 16.04 container, 429296729 is a valid GID, and that your 18.04 container is configured differently, such that 2000000 is not a valid GID inside the container.
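To make the write and read-back behaviour concrete, here is a rough Python model of it. This is only a sketch, not the kernel’s code: the helper names are made up for illustration, the read side assumes the kernel translates the stored range with from_kgid_munged() (which substitutes the overflow GID for unmapped values), and the two map strings use the usual gid_map layout of inside start, outside start, and count.

```python
OVERFLOW_GID = 65534  # DEFAULT_OVERFLOWGID in the kernel source


def parse_gid_map(text):
    """Parse gid_map text into (inside_start, outside_start, count) tuples."""
    return [tuple(int(field) for field in line.split())
            for line in text.splitlines() if line.strip()]


def to_kernel_gid(gid_map, gid):
    """Rough analogue of make_kgid(): returns None when the GID is unmapped."""
    for inside, outside, count in gid_map:
        if inside <= gid < inside + count:
            return outside + (gid - inside)
    return None


def from_kernel_gid(gid_map, kgid):
    """Rough analogue of from_kgid_munged(): unmapped values become the overflow GID."""
    for inside, outside, count in gid_map:
        if outside <= kgid < outside + count:
            return inside + (kgid - outside)
    return OVERFLOW_GID


def write_then_read(gid_map_text, low, high):
    """Model writing 'low high' to the sysctl and then reading it back."""
    gid_map = parse_gid_map(gid_map_text)
    klow = to_kernel_gid(gid_map, low)
    khigh = to_kernel_gid(gid_map, high)
    if klow is None or khigh is None or high < low or khigh < klow:
        klow, khigh = 1, 0  # silent fallback to "1 0" in the init namespace
    return from_kernel_gid(gid_map, klow), from_kernel_gid(gid_map, khigh)


# Hypothetical example maps: a very large mapping and a 65536-entry mapping.
LARGE_MAP = "0 1000000 1000000000"
SMALL_MAP = "0 100000 65536"

print(write_then_read(LARGE_MAP, 0, 429296729))  # both ends mapped, value sticks
print(write_then_read(SMALL_MAP, 0, 2000000))    # 2000000 unmapped, reads as overflow GID
print(write_then_read(SMALL_MAP, 0, 65535))      # fits inside the 65536-entry map
```

With the small map, the fallback range “1 0” lives in the init namespace, where neither value is mapped into the container, so both ends read back as 65534.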

@stgraber’s request for the gid_map contents will give us useful information. Please include the gid_map contents from both containers.

Thanks a lot, your hint saved me! I still don’t know which configuration key is at fault here, though.

Ubuntu 16.04:

# lxc exec minion bash
root@minion:~# cat /proc/self/gid_map 
         0    1000000 1000000000

Ubuntu 18.04:

# lxc exec minion bash
root@minion:~# cat /proc/self/gid_map 
         0     100000      65536

Choosing a smaller range works:

root@minion:~# echo "0 65535" > /proc/sys/net/ipv4/ping_group_range 
root@minion:~# cat /proc/sys/net/ipv4/ping_group_range 
0       65535
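For anyone wondering how to find the largest upper bound that will be accepted, a small sketch like the following derives it from the gid_map contents. The function name is hypothetical and it assumes the usual three-column layout (inside start, outside start, count); it only illustrates the arithmetic and is not an official tool.

```python
def highest_mapped_gid(gid_map_text):
    """Return the largest GID that is mapped inside the namespace, or None."""
    highest = None
    for line in gid_map_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        inside_start, _outside_start, count = (int(field) for field in fields)
        top = inside_start + count - 1  # last GID covered by this extent
        if highest is None or top > highest:
            highest = top
    return highest


# The 18.04 container's map from this thread:
print(highest_mapped_gid("         0     100000      65536"))  # 65535
```

Inside a container, the same function could be fed the real file, for example highest_mapped_gid(open("/proc/self/gid_map").read()).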

My guess is that your 16.04 system is using the snap and your 18.04 system is using the deb; that’d explain the difference in range size.

On your 18.04 system you could edit /etc/subuid and /etc/subgid and bump the range size from 65536 to 1000000000, which would then match the snap setup (this requires a restart of the LXD daemon).
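As a rough illustration of that edit: entries in /etc/subuid and /etc/subgid have the form name:start:count, so the change amounts to rewriting the third field. The sketch below works on a string rather than the real file. The function name is hypothetical, and the sample entries (lxd and root starting at 100000) are an assumption about what the deb creates, so check your own files before changing anything.

```python
def bump_subid_count(contents, owner, new_count):
    """Return subuid/subgid text with the count of `owner`'s entries replaced."""
    updated = []
    for line in contents.splitlines():
        parts = line.split(":")
        # Only touch well-formed "name:start:count" lines for the given owner.
        if len(parts) == 3 and parts[0] == owner:
            parts[2] = str(new_count)
        updated.append(":".join(parts))
    return "\n".join(updated) + "\n"


# Assumed sample contents; real files may differ.
sample = "lxd:100000:65536\nroot:100000:65536\n"

print(bump_subid_count(sample, "root", 1000000000), end="")
```

The same change would need to be applied to both files, followed by the LXD daemon restart mentioned above.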

I’ve submitted an upstream kernel fix for this issue that would have made the situation easier for @oms-kauz to debug by making it clear that the sysctl value being written was invalid:

https://lore.kernel.org/lkml/1530816563-4478-1-git-send-email-tyhicks@canonical.com/

Thanks for your help, I’ve adjusted the range inside my container and it is working fine for me.

Nice! Click the Solved button to mark this thread as completed.

FYI, the kernel patch has been accepted upstream. I’m glad we were able to chase this problem down. :)