Why cgroups are set up as they are

With lxc v2.1, I’d like to understand why cgroups are set up the way they are.

For example, so far as I can tell, even if I provide lxc.cgroup.dir (not the root cgroup), lxc-start collects “root” cgroup information and uses it to set up the cgroup for the new container. E.g., it copies the cpuset.cpus from the root cgroup to the new container cgroup. Is there a reason for doing this as opposed to using/inheriting the settings from the cgroup specified in lxc.cgroup.dir?

So far as I can tell, the current behavior assumes that lxc has full control over the cgroups involved in supporting the new lxc container, including each cgroup/dir in the lxc.cgroup.dir path. This does not play well with systems/situations in which a dedicated part of the cgroup tree is set up for lxc containers. In fact, it breaks them.

Do I need to patch my builds of lxc to avoid this behavior?

It would be helpful if some of the lxc devs would comment on this.

Yes, so far LXC will assume that it has control over the cgroup tree it resides in. If you run fully unprivileged containers, it will stick to the cgroup it is currently located in. If LXC runs as root, it will escape to the root cgroup. lxc.cgroup.dir will just create the corresponding cgroup relative to the cgroups the user resides in (if unprivileged) or relative to the root of the cgroup tree.
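As a rough sketch of the resolution rule described above (this is not LXC's actual code): the helper name resolve_cgroup_path(), its parameters, and the example mountpoint are all assumptions made for illustration. It only shows how the base changes between the privileged and unprivileged cases, and it assumes the inputs carry no trailing slashes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of LXC.
 * Builds the directory under which the container cgroup would be created:
 *   privileged:   <mountpoint>/<cgroup_dir>
 *   unprivileged: <mountpoint>/<current_cgroup>/<cgroup_dir>
 * Returns 0 on success, -1 if the result does not fit into out.
 */
static int resolve_cgroup_path(char *out, size_t outsz,
                               const char *mountpoint,
                               const char *current_cgroup,
                               const char *cgroup_dir,
                               bool privileged)
{
    assert(out && mountpoint && current_cgroup && cgroup_dir);

    /* Root escapes to the root cgroup; an unprivileged user stays put. */
    const char *base = privileged ? "/" : current_cgroup;

    /* Skip leading slashes so the joins below never produce "//". */
    base += strspn(base, "/");
    cgroup_dir += strspn(cgroup_dir, "/");

    int n;
    if (*base == '\0')
        n = snprintf(out, outsz, "%s/%s", mountpoint, cgroup_dir);
    else
        n = snprintf(out, outsz, "%s/%s/%s", mountpoint, base, cgroup_dir);

    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

With mountpoint "/sys/fs/cgroup/cpuset", a current cgroup of "/user.slice" and lxc.cgroup.dir set to "jobs/123", the privileged case yields /sys/fs/cgroup/cpuset/jobs/123 and the unprivileged case yields /sys/fs/cgroup/cpuset/user.slice/jobs/123.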

Now, you’re saying that you want to be able to tell LXC under which existing cgroup it is supposed to create the cgroup for the container?

Yes, I want to tell lxc where and I want it to only touch the cgroup specified in lxc.cgroup.dir. It should not touch anything from above (i.e., from a parent cgroup/dir), nor should it copy/use anything except from the specified cgroup.

For example, take a cgroup at jobs/123 that is preconfigured (for cpus, mems, etc.) for all the controllers. In the config file, I set “lxc.cgroup.dir = jobs/123”. lxc is free to create its own cgroup under there (such as jobs/123/123; I don’t care about the exact name) and must use only the settings of jobs/123. So, if jobs/123/cpuset.cpus is 1-4, that should be used/copied to jobs/123/123/cpuset.cpus. To make this simple, it makes sense that jobs/123/cgroup.clone_children is set to 1 (and it is preconfigured to be so). Also, jobs/cgroup.clone_children is set to 0 and needs to remain 0. Currently, lxc sets jobs/cgroup.clone_children to 1, which is problematic.
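To make the request concrete, here is a minimal sketch of the behavior I am after, assuming the prepared cgroup already exists. The helper names copy_parent_value() and create_child_from_prepared() are made up for this example and are not LXC functions. The point is that only the immediate parent is read and only the new child is written; nothing above the prepared cgroup is touched. (With cgroup.clone_children set to 1 on the prepared cgroup, the kernel does this copy for the cpuset controller by itself, so the explicit copy is only for illustration.)

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy one control file from the parent to the child. */
static int copy_parent_value(const char *parent, const char *child,
                             const char *file)
{
    char src[4096], dst[4096], value[1024];
    int n;

    assert(parent && child && file);

    n = snprintf(src, sizeof(src), "%s/%s", parent, file);
    if (n < 0 || (size_t)n >= sizeof(src))
        return -1;
    n = snprintf(dst, sizeof(dst), "%s/%s", child, file);
    if (n < 0 || (size_t)n >= sizeof(dst))
        return -1;

    /* Read the value from the prepared (parent) cgroup only. */
    int fd = open(src, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t len = read(fd, value, sizeof(value));
    close(fd);
    if (len <= 0)
        return -1;

    /* Write it unchanged into the child; the parent is never modified. */
    fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    ssize_t written = write(fd, value, (size_t)len);
    close(fd);

    return written == len ? 0 : -1;
}

/* Hypothetical helper: create <prepared>/<name> and inherit cpuset values. */
static int create_child_from_prepared(const char *prepared, const char *name)
{
    char child[4096];
    int n = snprintf(child, sizeof(child), "%s/%s", prepared, name);
    if (n < 0 || (size_t)n >= sizeof(child))
        return -1;

    /* The child must not already exist; mkdir() fails with EEXIST if so. */
    if (mkdir(child, 0755) < 0)
        return -1;

    if (copy_parent_value(prepared, child, "cpuset.cpus") < 0)
        return -1;
    if (copy_parent_value(prepared, child, "cpuset.mems") < 0)
        return -1;
    return 0;
}
```

For the cpuset hierarchy, a call such as create_child_from_prepared("/sys/fs/cgroup/cpuset/jobs/123", "123") would then leave jobs/123/123/cpuset.cpus at 1-4 without reading or writing anything in jobs or in the root cgroup. The mount path here is an assumption about the host layout.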

Also, I’d like to avoid setting lxc.cgroup.<controller> in the config.

After looking through the code (for v2.1), I’ve narrowed my search down to create_path_for_hierarchy() in cgfsng.c.

For the sake of this example, assume that lxc.cgroup.prepared is a valid configuration item and takes a “1” to indicate that the cgroup pointed to by lxc.cgroup.dir has been prepared (i.e., it exists for all hierarchies and has the appropriate settings, ready to be cloned/inherited by a child cgroup). If I have an implementation like the following, I seem to get the behavior I am looking for:

static bool create_path_for_hierarchy(struct hierarchy *h, char *cgname)
{
    /* Hard-coded for this experiment; should be set in the config. */
    const char *cgroup_prepared = "1";

    h->fullcgpath = must_make_path(h->mountpoint, h->base_cgroup, cgname, NULL);
    if (dir_exists(h->fullcgpath)) { /* it must not already exist */
        ERROR("Path \"%s\" already existed.", h->fullcgpath);
        return false;
    }

    if (cgroup_prepared == NULL || strcmp(cgroup_prepared, "1") != 0) {
        /* Not prepared: keep the stock behavior for the cpuset controller. */
        DEBUG("lxc.cgroup.prepared is not 1");
        if (!handle_cpuset_hierarchy(h, cgname)) {
            ERROR("Failed to handle cgroupfs v1 cpuset controller.");
            return false;
        }
    } else {
        /* Prepared: leave the parent cgroups untouched. */
        DEBUG("lxc.cgroup.prepared is 1");
    }

    return mkdir_p(h->fullcgpath, 0755) == 0;
}

Does this seem correct to you or might I be missing something?
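In a real patch, the hard-coded string in the function above would presumably come from the parsed config. A minimal sketch of that parsing step, assuming the hypothetical lxc.cgroup.prepared key accepts only "0" and "1" and defaults to not prepared when unset (parse_cgroup_prepared() is a made-up name, not an existing LXC function):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical parser for the hypothetical lxc.cgroup.prepared key.
 * Unset or empty means "not prepared"; anything other than "0" or "1"
 * is rejected. Returns 0 on success, -1 on an invalid value.
 */
static int parse_cgroup_prepared(const char *value, bool *prepared)
{
    assert(prepared != NULL);

    if (value == NULL || *value == '\0') {
        *prepared = false; /* default: keep the stock behavior */
        return 0;
    }
    if (strcmp(value, "1") == 0) {
        *prepared = true;
        return 0;
    }
    if (strcmp(value, "0") == 0) {
        *prepared = false;
        return 0;
    }
    return -1; /* reject typos such as "yes" instead of guessing */
}
```

create_path_for_hierarchy() could then test a plain bool instead of comparing strings each time it runs.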

That amounts to special-casing the cpuset controller in a legacy cgroup hierarchy. If we’re going to do this we need a generic solution. I’ll come back with an idea or feel free to suggest one.

Thanks for the update/feedback. I will also take a look.

Any update on a generic solution? I.e., that a path to a cgroup could be used as the base for an LXC container to use?

I’m tracking this in https://github.com/lxc/lxc/issues/2501
I need to check whether I already have a patch. I thought I had one in one of my branches. But patches are welcome.

Thanks