The containers micro-conference is a half-day event organized as part of the Linux Plumbers Conference.
This year’s edition takes place next week in Los Angeles, CA (13-15 September 2017).
+----------------------------------------------------+----------------------------------+--------+
| TITLE                                              | PRESENTER                        | LENGTH |
+----------------------------------------------------+----------------------------------+--------+
| Welcome to the containers micro-conference         | Stéphane Graber                  | 5min   |
| Exposing resource limits to containers with LXCFS  | Serge Hallyn                     | 15min  |
| AppSwitch: Application Level Network Namespacing   | Dinesh Subhraveti                | 15min  |
| Namespacing and Stacking the LSM                   | John Johansen & Casey Schaufler  | 15min  |
| Running OCI containers with LXC                    | Serge Hallyn                     | 15min  |
| CGroup v2 and its impact for containers            | Christian Brauner                | 20min  |
| Namespacing IMA                                    | Stefan Berger                    | 15min  |
| Defensively designed container runtimes            | Christian Brauner & Aleksa Sarai | 15min  |
| UID/GID shifting filesystem overlays               | James Bottomley                  | 15min  |
| Privileged actions in unprivileged containers      | Stéphane Graber                  | 20min  |
+----------------------------------------------------+----------------------------------+--------+
The usual introduction talk.
This will introduce the containers micro-conference, the format for this year’s edition, make sure someone is taking notes and then introduce the first speaker.
Introduction to the problem of cgroup limits visibility to userspace and the approach taken by LXCFS.
There are some difficult challenges ahead for LXCFS, if it chooses to accept them:
- implementing support for /sys/devices/system/cpu
- addressing loadavg
- whether and how to help address the sysinfo(2) system call (not in scope for a filesystem, but in scope for LXCFS’s mission).
In this talk we’d like to spend some time discussing possible approaches to each of these three and, ideally, to encourage future contributors to join in.
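To make the visibility problem concrete, here is a minimal sketch of the kind of translation involved: capping the `MemTotal` value a container sees at its cgroup memory limit. This is not LXCFS code. The function names are hypothetical, the inputs are plain strings in the style of `/proc/meminfo` and a cgroup memory limit file, and the "unlimited" sentinel assumes a 64-bit kernel with 4 KiB pages.

```python
# Sentinel cgroup v1 reports when no memory limit is set (assumption: 64-bit
# kernel with 4 KiB pages); cgroup v2 writes the string "max" instead.
V1_UNLIMITED = 9223372036854771712


def parse_limit(raw):
    """Return the cgroup memory limit in bytes, or None when unlimited."""
    raw = raw.strip()
    if raw == "max":
        return None
    value = int(raw)
    return None if value >= V1_UNLIMITED else value


def parse_mem_total_kb(meminfo_text):
    """Extract MemTotal (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def visible_mem_total_kb(meminfo_text, raw_limit):
    """MemTotal a container should see: the host value capped by its limit."""
    host_kb = parse_mem_total_kb(meminfo_text)
    limit = parse_limit(raw_limit)
    if limit is None:
        return host_kb
    return min(host_kb, limit // 1024)
```

A process that reads the host value directly would size itself for the whole machine; the capped value is what a limit-aware view would report instead. The CPU, loadavg and sysinfo(2) cases listed above are harder precisely because no such simple per-file substitution exists for them.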
This talk presents AppSwitch, a completely new TCP-layer network element, similar to a router at the IP layer or a bridge at the link layer, that addresses a number of issues in modern environments, especially ones based on microservices.
This talk presents AppSwitch, a completely new network element that operates at the TCP layer, similar to a router at the IP layer or a bridge at the link layer. The key idea is that it decouples applications from the underlying network at the system call layer rather than at the network device or packet level, as traditional overlay mechanisms do. That gives applications a distinct identity independent of the host and provides several advantages: significantly more efficient implementation of application-level network functions such as application firewalls and load balancers; reduced operational cost and complexity by minimizing unnecessary friction between applications and operations teams; the ability to seamlessly run applications across heterogeneous infrastructure backends including bare metal, VMs, containers and cloud; and improved performance by selecting the most suitable network medium. It would also effectively remove the performance penalty associated with unnecessary data path processing that is typical in microservice application environments.
In this session, we’d like to discuss the kernel support required to implement AppSwitch. There are a few candidate approaches that could be considered. It could be made to work with tracepoints but the implementation turns out somewhat hacky. Extending Netlink is cleaner but requires deeper changes.
Making Linux Security Modules available to containers.
Containers would like to be able to make use of Linux Security
Modules (LSMs), from providing more complete system virtualization
to improving container confinement. To date, containers’ access to
the LSM has been limited, but there has been work to change the situation.
This presentation will discuss the current state of LSM namespacing
and stacking: the work being done on various security modules to
support namespacing, the infrastructure work being done to improve the
LSM, and the problems that remain.
Can LXC be used to run OCI application container images?
For years, LXC has pushed the envelope of Linux containers support by focusing on system containers. In the meantime it has ignored its original use case, application containers. Thanks to work by Docker, CoreOS, and many others, we are now at a point where we can re-use existing standards, formats, and tools to enhance application container support in LXC. We will present here some small enhancements which are in progress:
- OCI container creation template
- Network support for application containers which come without network admin tools
The main point of this discussion, however, will be to get feedback on what else LXC needs to do to improve application container support.
CGroup V2 is pretty different from CGroup V1 and the two can’t fully operate in parallel, leading to problems running containers which only support one or the other.
With the release of kernel 4.5, the new cgroupfs v2 API was declared non-experimental. But the lack of feature parity between cgroupfs v2 and cgroupfs v1 makes it nearly impossible for container runtimes to use it. In particular, until the cpu controller is merged, no runtime is expected to switch to it by default. Nonetheless, cgroupfs v2 is slowly making its way into various distributions. This brings with it a new set of problems and challenges which container runtimes must tackle. For example, one of the core problems container runtimes will have to face is how to support running cgroupfs v1 hierarchies inside a container while the host is running a cgroupfs v2 hierarchy, and vice versa. This talk will try to outline some of these problems more clearly, suggest possible solutions, and hopefully inspire a fruitful discussion that leads to further solutions or at least helps to pin the problems down more precisely.
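As a small illustration of the first question a runtime has to answer, the sketch below classifies which cgroup hierarchies are mounted by looking at filesystem types in `/proc/mounts`-style text (`cgroup` for v1, `cgroup2` for v2). The function name and the returned labels are hypothetical, and real runtimes need to inspect far more than this (which controllers are bound to which hierarchy, for instance).

```python
def cgroup_layout(mounts_text):
    """Classify /proc/mounts-style text as 'v1', 'v2', 'hybrid' or 'none'."""
    fstypes = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, dump, pass
        if len(fields) >= 3:
            fstypes.add(fields[2])
    has_v1 = "cgroup" in fstypes
    has_v2 = "cgroup2" in fstypes
    if has_v1 and has_v2:
        return "hybrid"
    if has_v2:
        return "v2"
    if has_v1:
        return "v1"
    return "none"
```

On a live system this could be called as `cgroup_layout(open("/proc/mounts").read())`. A "hybrid" result is exactly the situation where a runtime written for only one of the two APIs runs into trouble.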
Current status of namespacing of the Linux Integrity Measurement Architecture.
In this talk we will give a high level overview of namespacing the Linux Integrity Measurement Architecture (IMA), present use cases for namespacing IMA, and discuss the current challenges. We will talk about the individual parts of this subsystem that we need to modify and show a demo of the current status of a container making use of namespaced IMA.
In this talk we will present current roadblocks to a more defensive design that affect all container runtimes.
In contrast to other operating systems, the Linux kernel does not attempt to define what a container is. Rather than implementing a first-class container object in the kernel itself, Linux exposes various interfaces to user space that can be combined in various ways to define a container. This gives user space the flexibility to work with different container concepts and is one explanation for the variety of container runtimes out there. This liberty is one of the strengths of Linux when implementing a container runtime, and has led to the usage of these interfaces in non-container contexts such as web browsers. However, a lot of the interfaces that are used in creating containers were not necessarily designed with containers in mind. Additionally, it is not always obvious how the various containerization interfaces are to be combined to guarantee a secure container runtime implementation. The consequence of this conceptual liberty is that there are multiple classes of theoretical vulnerabilities that affect container runtimes such as runC and LXC. In this talk we will present current roadblocks to a more defensive design that affect all container runtimes. Furthermore, we will look at specific roadblocks that are inherent to the design of LXC and runC, to open the floor to discussions on topics of runtime design as well as discussions of alternative solutions to some of the roadblocks identified in this talk. In addition, we hope to open the door to even more discussions between container runtimes and kernel containerization primitives.
An update on shiftfs and the next steps for remapping overlay filesystems.
Containers using the user namespace have a different view of uid/gid than processes in the initial namespace.
This makes it difficult to share data between the initial namespace and other user namespaces as any unmapped uid/gid will show as -1.
Shiftfs is an overlay filesystem which remaps UIDs and GIDs transparently for you.
This presentation will go over the current state of shiftfs and the next steps needed to get it or something like it merged in the upstream Linux kernel.
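The mapping problem described above can be sketched in a few lines. Each line of `/proc/<pid>/uid_map` holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. The helper names below are hypothetical, and returning -1 for an unmapped ID simply mirrors the wording above; this is an illustration of the lookup, not shiftfs or kernel code.

```python
def parse_uid_map(text):
    """Parse uid_map-style text into (inside_start, outside_start, count) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, count = (int(f) for f in line.split())
            entries.append((inside, outside, count))
    return entries


def to_namespace_uid(host_uid, entries):
    """Translate a host UID into the namespace, or -1 when it is unmapped."""
    for inside, outside, count in entries:
        if outside <= host_uid < outside + count:
            return inside + (host_uid - outside)
    return -1
```

With a typical unprivileged mapping of `0 100000 65536`, a file owned by host UID 1000 falls outside the mapped range and so has no valid owner from the container’s point of view, which is the gap a remapping filesystem aims to close.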
How to selectively allow privileged actions from otherwise unprivileged containers?
Unprivileged containers, that is, containers which use a user namespace to map their UIDs and GIDs to an unprivileged range, have very limited kernel privileges.
Those privileges are no higher than what a normal unprivileged user would have on the system and are restricted to the container’s namespaces.
Anything which isn’t owned by the container and isn’t part of a namespace will typically be rejected by the kernel.
Some common examples include:
- Creating device nodes
- Mounting filesystems
- Setting up loop devices
- Raising a process nice level
- Raising the OOM score of a process
In this presentation, we’ll go over the most common sources of frustration and look at a few approaches that could be used for the container runtime to decide whether those privileged actions should be allowed or not.
We hope that anyone planning to attend the Linux Plumbers Conference has already registered and booked their travel and accommodation by now.
But should someone be planning to attend last minute, there are still tickets available here: https://www.regonline.com/registration/Checkin.aspx?EventID=1919539
Note that registration to the Open Source Summit or Linux Security Summit doesn’t get you access to Linux Plumbers. While all those events are happening at the same venue during the same week, they each have their own separate registration process.
See you all in Los Angeles!