Busybox free inside container

s4uliu5 · June 4, 2019, 10:17am

Hello,

Inside a container, the busybox “free” command reports the full host memory, even when “limits.memory” is set for the container.

c1:~# free
              total        used        free      shared  buff/cache   available
Mem:         250000       31312      199232           0       19456      218688
Swap:      16777212           0    16777212

c1:~# busybox free
             total       used       free     shared    buffers     cached
Mem:      16426616    4272120   12154496      34284      26352      19456
-/+ buffers/cache:    4226312   12200304
Swap:     16777212          0   16777212

Just wondering whether this issue is worth reporting/fixing.

Any opinions?

stgraber · June 4, 2019, 11:09pm

Busybox doesn’t use /proc/meminfo for that information and so bypasses lxcfs.
Depending on how it is getting that information, this may not be fixable by lxcfs.

s4uliu5 · June 5, 2019, 7:58am

Does it mean that it is not possible to hide host’s memory information from inside a container?

gpatel-fr · June 5, 2019, 8:13am

Until the sysinfo system call gets containerized, that would be true, it seems (and this is not a newly discovered problem).

In fact, ‘man sysinfo’ is not accurate these days: it states that sysinfo publishes information that is also available in /proc/meminfo, but that is no longer true, as /proc/meminfo is fixed for containers while sysinfo is not.
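To see the mismatch directly, here is a minimal sketch that reads total memory both ways: through the sysinfo system call and through /proc/meminfo. The names `SysInfo`, `sysinfo_total_kib` and `meminfo_total_kib` are made up for illustration, and the trailing padding in the struct is deliberately oversized rather than matched to the exact kernel layout, so treat it as a rough check rather than a reference implementation.

```python
import ctypes
import os


class SysInfo(ctypes.Structure):
    # Field order follows struct sysinfo as documented in sysinfo(2).
    # The trailing padding is oversized on purpose so the buffer is
    # large enough on both 32-bit and 64-bit builds (an assumption made
    # for simplicity, not the exact kernel definition).
    _fields_ = [
        ("uptime", ctypes.c_long),
        ("loads", ctypes.c_ulong * 3),
        ("totalram", ctypes.c_ulong),
        ("freeram", ctypes.c_ulong),
        ("sharedram", ctypes.c_ulong),
        ("bufferram", ctypes.c_ulong),
        ("totalswap", ctypes.c_ulong),
        ("freeswap", ctypes.c_ulong),
        ("procs", ctypes.c_ushort),
        ("pad", ctypes.c_ushort),
        ("totalhigh", ctypes.c_ulong),
        ("freehigh", ctypes.c_ulong),
        ("mem_unit", ctypes.c_uint),
        ("_f", ctypes.c_char * 64),
    ]


def sysinfo_total_kib():
    """Total RAM in KiB as reported by the sysinfo system call."""
    # CDLL(None) resolves symbols from the running process, which
    # includes the C library on Linux.
    libc = ctypes.CDLL(None, use_errno=True)
    info = SysInfo()
    if libc.sysinfo(ctypes.byref(info)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return info.totalram * info.mem_unit // 1024


def meminfo_total_kib(text):
    """Extract the MemTotal value (in kB) from /proc/meminfo content."""
    for line in text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        print("/proc/meminfo MemTotal (kB):", meminfo_total_kib(fh.read()))
    print("sysinfo totalram (KiB):     ", sysinfo_total_kib())
```

Run inside a container with lxcfs and a memory limit, the two numbers should differ in the same way as the two “free” outputs in the first post; on a plain host they should be about the same.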

s4uliu5 · June 5, 2019, 8:21am

Good to know. Thank you.

gpatel-fr · June 5, 2019, 8:59am

I should have added that, if your life depends on it, you can actually disable the sysinfo system call in a container, but that could have side effects, of course.

Skaperen · June 5, 2019, 6:16pm

Sounds like a kernel thing. Just how secure should we expect containers to be? If something really needs tight security, do we need to revert to emulation?

stgraber · June 5, 2019, 10:15pm

For containers, you need to trust that the kernel will properly isolate different processes.
In this instance, the memory limits are very much being enforced. The problem is that the mechanism applying the limits (cgroups) is separate from the mechanisms reporting resources (sysinfo syscall, /proc and /sys filesystems). lxcfs bridges the two to try to show the actual limits to the container user. When it fails, the limits are still enforced, but the reporting inside the container may be wrong.
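If an application needs the enforced limit rather than the reported total, it can ask the cgroup filesystem directly. A minimal sketch, assuming the memory controller is visible at one of the usual mount points (`memory.max` for cgroup v2, `memory/memory.limit_in_bytes` for cgroup v1); the helper names `parse_limit` and `read_memory_limit` are hypothetical, and the paths visible inside a given container depend on how cgroups are mounted there.

```python
from pathlib import Path

# Usual locations of the memory limit file. These are assumptions about
# the mount layout; a container may expose the controller elsewhere.
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
)


def parse_limit(raw):
    """Return the limit in bytes, or None when cgroup v2 reports 'max'."""
    raw = raw.strip()
    if raw == "max":
        return None
    # cgroup v1 reports "no limit" as a very large number instead of
    # 'max'; that value is returned unchanged here.
    return int(raw)


def read_memory_limit(paths=CGROUP_LIMIT_FILES):
    """Return the first readable limit, or None if no file can be read."""
    for path in paths:
        try:
            return parse_limit(path.read_text())
        except OSError:
            continue
    return None


if __name__ == "__main__":
    limit = read_memory_limit()
    if limit is None:
        print("no cgroup memory limit found")
    else:
        print("cgroup memory limit:", limit, "bytes")
```

This reads what the kernel enforces, so it does not depend on lxcfs or on how sysinfo reports memory.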

For VMs, depending on your workload, you may not have to trust the kernel in the VM so much, but now you have to trust your CPU quite a bit (Spectre isn’t making that any easier), trust the virtual firmware and emulated devices, and trust the hypervisor process in general to have set things up right for you.

Security issues have happened and will keep happening at all layers. At the end of the day, it’s mostly a matter of how easy it is for you to learn about and fix such issues when they arise, and of deciding whether to mix and match layers so that multiple things must be compromised before you are affected.

It’s not uncommon to see deployments where one VM is used per user/tenant and containers are then run inside that VM. A full compromise would then require a container escape plus a VM escape to reach other users/tenants on the system. But this obviously comes at a high management cost and, to some extent, a hardware/resource cost.