Need help debugging lxc container on Rocky Linux 9 and Ubuntu 24

Hi all,

I was hoping for some thoughts/comments based on my recent experience with LXC (not LXD or Incus).

I worked with LXC/LXD/Incus on various Ubuntu versions and Proxmox, and then decided to apply some of that experience to a Rocky Linux 9 (RL) host. I created an LXC image from a VM (for a variety of reasons I need a RHEL5 image), but I couldn’t get it to run on RL. It keeps failing with this error:

lxc-start centos5-base 20240924185128.959 NOTICE   start - ../src/lxc/start.c:start:2194 - Exec'ing "/sbin/init"
lxc-start centos5-base 20240924185128.959 ERROR    start - ../src/lxc/start.c:start:2197 - No such file or directory - Failed to exec "/sbin/init"

I first thought this was due to the image and the way I got it, so I decided to move the image to one of my Ubuntu 24.04 hosts (where I have running/working LXC images). To my surprise, it worked fine there.

After lots of browsing I got no closer to explaining this phenomenon, so I decided to ditch RL and try the same image on a fresh Ubuntu 24.04 host, but (also somewhat to my surprise) it failed on this Ubuntu host with the exact same error.

I can share the logs if anybody is interested in reading lines and lines of output, but I think the fundamental difference between the “working” and “non-working” hosts is expressed in this output in the log:

On the “working” host:

lxc-start centos5-base 20240924010944.124 INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:unified_hierarchy_delegated:3467 - Permission denied - The cgroup.threads file is not writable, skipping unified hierarchy

On the “non-working” hosts:

lxc-start centos5-base 20240924183451.939 INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_monitor_create:1391 - The monitor process uses "lxc.monitor.centos5-base" as cgroup
lxc-start centos5-base 20240924183451.939 ERROR    cgfsng - ../src/lxc/cgroups/cgfsng.c:__cgfsng_delegate_controllers:3341 - Device or resource busy - Could not enable "+memory +pids" controllers in the unified cgroup 13
lxc-start centos5-base 20240924183451.947 INFO     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_payload_create:1499 - The container process uses "lxc.payload.centos5-base" as inner and "lxc.payload.centos5-base" as limit cgroup

My conclusion: for some reason it works on the host where the unified hierarchy is not enabled/detected/used (per the permission error). That makes some sense, since older guest distributions don’t work (right) on “pure” cgroup2 hosts (at least that’s what I gathered from various other/older posts). I have a hard time keeping all the terminology straight in my head, so that may not be correct in any way though.

What I want to ask about and have comments on is:

  • What can I look for on my “working” machine that would cause the “Permission denied” error above? My browsing to date hasn’t given me many clues to go by.
  • What tricks are there to ascertain whether a host is running cgroup1, cgroup2, or some combination of the two? I assume my RL and “fresh” Ubuntu 24 hosts are cgroup2, and my “working” host is either cgroup1 or some combination of the two, but I don’t know how to tell the difference.

Any thoughts on how to assess the state of these hosts with respect to the “cgroup-ness” would be appreciated.
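
One rough way to check the “cgroup-ness” of a host is to look at the filesystem types in its mount table: a cgroup2-only host mounts type cgroup2, a cgroup1 host mounts type cgroup (one mount per controller), and a hybrid host has both. The sketch below assumes the mount table is readable at /proc/self/mounts; the function name classify_cgroup_layout is made up for illustration.

```python
import os


def classify_cgroup_layout(mounts_text: str) -> str:
    """Classify a mount table as 'v1', 'v2', 'hybrid', or 'none'.

    mounts_text is the content of /proc/self/mounts, where the third
    whitespace-separated field of each line is the filesystem type.
    """
    fstypes = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            fstypes.add(fields[2])

    has_v1 = "cgroup" in fstypes    # legacy per-controller hierarchies
    has_v2 = "cgroup2" in fstypes   # unified hierarchy
    if has_v1 and has_v2:
        return "hybrid"
    if has_v2:
        return "v2"
    if has_v1:
        return "v1"
    return "none"


if __name__ == "__main__":
    # Assumption: a Linux host that exposes /proc/self/mounts.
    path = "/proc/self/mounts"
    if os.path.exists(path):
        with open(path) as fh:
            print(classify_cgroup_layout(fh.read()))
```

Running this on each host should show whether the “working” and “non-working” machines really differ in cgroup layout. It only looks at what is mounted, not at which controllers are delegated to the user session.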

Thanks!

I just figured out that, while curious, the “Permission denied” message does not appear to be related to the overarching problem. It is only logged when I SSH into the host and start the container. When I run lxc-start from the console, it doesn’t happen.

Either way, whether or not this permission error is logged, the LXC container won’t start, since it claims it can’t find /sbin/init.

Heavens! That was a tricky one for sure.

The error:

lxc-start centos5-base 20240924185128.959 NOTICE   start - ../src/lxc/start.c:start:2194 - Exec'ing "/sbin/init"
lxc-start centos5-base 20240924185128.959 ERROR    start - ../src/lxc/start.c:start:2197 - No such file or directory - Failed to exec "/sbin/init"

is misleading at best. I copied the same image to a variety of different hosts to try to get my head wrapped around why it works on one but not another.

Eventually I moved it to a machine where by accident I figured out what I was doing wrong all along… On this one host the image would run, but died immediately complaining about a missing shared library.

I then realized that the configuration I created for the image mounted /lib and /usr/lib from the host OS into the container. Once I fixed that, the image ran successfully.
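
For anyone wanting to check their own container config for the same mistake, here is a small sketch that scans the config text for lxc.mount.entry lines whose source is one of the host library directories. The list of directories and the function name find_host_lib_mounts are my own assumptions for illustration. It only handles the simple fstab-style form of the entry, where the first field is the source path.

```python
# Host directories that should normally not be bind-mounted into a
# container running a different (much older) userland. Assumed list.
HOST_LIB_DIRS = {"/lib", "/lib64", "/usr/lib", "/usr/lib64"}


def find_host_lib_mounts(config_text: str) -> list:
    """Return the lxc.mount.entry lines whose source is a host lib dir."""
    hits = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep or key.strip() != "lxc.mount.entry":
            continue
        fields = value.split()
        # fstab-style entry: the first field is the mount source
        if fields and fields[0].rstrip("/") in HOST_LIB_DIRS:
            hits.append(line)
    return hits
```

Feed it the content of the container’s config file. Any line it returns is a candidate for removal when the guest’s libraries differ from the host’s.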

So it wasn’t anything to do with /sbin/init being missing, as the “no such file or directory” error seemed to indicate, but with init in the image trying to execute against the host’s incompatible libraries.
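
One plausible explanation for the confusing message (my assumption, not something the logs above confirm): execve() returns “No such file or directory” not only when the binary itself is missing, but also when the ELF interpreter (the dynamic loader) named inside the binary does not exist. If the host’s /lib hid the loader that the old init expects, the exec would fail in exactly this way. The sketch below reads the interpreter path out of an ELF file’s PT_INTERP program header so it can be compared with what actually exists in the container; the function name elf_interpreter is made up for illustration.

```python
import struct

PT_INTERP = 3  # program header type that holds the loader path


def elf_interpreter(data: bytes):
    """Return the PT_INTERP path of an ELF image, or None.

    None means the data is not ELF, is truncated, or has no
    interpreter (for example a statically linked binary).
    """
    if len(data) < 6 or data[:4] != b"\x7fELF":
        return None
    is_64 = data[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"     # EI_DATA: 1 = little-endian
    try:
        if is_64:
            (e_phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
            e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x36)
        else:
            (e_phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
            e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x2A)
        for i in range(e_phnum):
            base = e_phoff + i * e_phentsize
            (p_type,) = struct.unpack_from(endian + "I", data, base)
            if p_type != PT_INTERP:
                continue
            if is_64:
                (p_offset,) = struct.unpack_from(endian + "Q", data, base + 8)
                (p_filesz,) = struct.unpack_from(endian + "Q", data, base + 32)
            else:
                (p_offset,) = struct.unpack_from(endian + "I", data, base + 4)
                (p_filesz,) = struct.unpack_from(endian + "I", data, base + 16)
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode("ascii", "replace")
    except struct.error:
        return None  # truncated header or program header table
    return None
```

To use it, read the container’s sbin/init from its rootfs in binary mode, pass the bytes to elf_interpreter, and then check whether the returned path exists inside the container once all configured mounts are applied.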
