Busybox free inside container

stgraber · June 5, 2019, 10:15pm

For containers, you need to trust that the kernel will properly isolate different processes.
In this instance, the memory limits are very much being enforced. The problem is that the mechanism that applies the limits (cgroups) is separate from the interfaces that report resources (the sysinfo syscall and the /proc and /sys filesystems). lxcfs bridges the two to try to show the actual limits to the container user. When it fails, the limits are still enforced, but the reporting inside the container may be wrong.
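
If you want to see the enforcement/reporting split for yourself, here is a rough Python sketch that compares what /proc/meminfo reports with the limit recorded in the cgroup files. The two cgroup paths are the usual v2 and v1 locations, but whether either is visible inside your container depends on how it is set up, so treat them as assumptions. The helper names (`parse_meminfo_total`, `parse_cgroup_limit`, `describe`) are made up for this example.

```python
from pathlib import Path

# Usual locations of the memory limit. Assumption: which one exists (if any)
# depends on the cgroup version and on how the container mounts cgroups.
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
)


def parse_meminfo_total(text):
    """Return MemTotal in bytes from /proc/meminfo-style text, or None."""
    for line in text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024  # the file reports kB
    return None


def parse_cgroup_limit(text):
    """Return the limit in bytes, or None if cgroup v2 says 'max' (no limit)."""
    value = text.strip()
    return None if value == "max" else int(value)


def describe(reported, limit):
    """Summarise how the reported total relates to the enforced limit."""
    if reported is None or limit is None:
        return "nothing to compare (no MemTotal or no cgroup limit found)"
    if reported > limit:
        return "reported total exceeds the cgroup limit: reporting is off, the limit still applies"
    return "reported total is within the cgroup limit"


def read_or_none(path):
    """Read a file, returning None if it is missing or unreadable."""
    try:
        return path.read_text()
    except OSError:
        return None


if __name__ == "__main__":
    meminfo = read_or_none(Path("/proc/meminfo"))
    reported = parse_meminfo_total(meminfo) if meminfo else None

    limit = None
    for candidate in CGROUP_LIMIT_FILES:
        text = read_or_none(candidate)
        if text is not None:
            limit = parse_cgroup_limit(text)
            break

    print("MemTotal (bytes):", reported)
    print("cgroup limit (bytes):", limit)
    print(describe(reported, limit))
```

This only looks at /proc/meminfo. A tool that gets its numbers from the sysinfo syscall instead may show a different figure again.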

For VMs, depending on your workload, you may not have to trust the kernel in the VM so much, but now you have to trust your CPU quite a bit (Spectre isn’t making that any easier), trust the virtual firmware and emulated devices, and trust the hypervisor process in general to have set things up right for you.

Security issues have happened and will keep happening at all layers. At the end of the day, it’s mostly a matter of how easy it is for you to learn about and fix such issues when they arise, and of deciding whether to mix and match layers so that several things must be compromised before you are affected.

It’s not uncommon to see deployments where one VM is used per user/tenant and containers are then used inside that VM. A full compromise would then require a container escape plus a VM escape to reach other users/tenants on the system. But this obviously comes at a high management cost and, to some extent, a hardware/resource cost.