Runc OCI bundle container or systemd-nspawn instead of Debootstrap

michacassola · August 21, 2022, 3:42pm

What I run (as UID 0):

# mkdir /focal1
# debootstrap --variant=buildd focal /focal1 http://archive.ubuntu.com/ubuntu/
mknod: /focal1/test-dev-null: Operation not permitted
E: Cannot install into target '/focal1' mounted with noexec or nodev

Unfortunately, I guess I need more privileges on the container.
Which ones do you think I need?

stgraber · August 22, 2022, 4:45am

Try setting security.syscalls.intercept.mknod=true and restarting the container.
That may be enough to let this go through.
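
For reference, a minimal sketch of applying that setting from the LXD host. The helper name enable_mknod_intercept and the container name c1 are placeholders made up for illustration:

```shell
# Sketch: enable mknod syscall interception for one container, then restart it.
# Run on the LXD host. "enable_mknod_intercept" is an illustrative helper name.
enable_mknod_intercept() {
    container="$1"
    lxc config set "$container" security.syscalls.intercept.mknod true &&
    lxc restart "$container"
}

# Usage (c1 is a placeholder container name):
#   enable_mknod_intercept c1
```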

michacassola · August 22, 2022, 6:20am

Thanks!
I ended up being too impatient and just downloaded and unpacked Ubuntu base into the folder.
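
Roughly, the unpacking step looks like the sketch below. It assumes the base tarball has already been downloaded; the helper name unpack_base and both paths are placeholders, and no particular file name or URL is implied:

```shell
# Sketch: unpack an already-downloaded Ubuntu base tarball into a target folder.
# "unpack_base" and both arguments are illustrative placeholders.
unpack_base() {
    tarball="$1"   # path to the downloaded .tar.gz
    target="$2"    # folder that will become the root filesystem
    mkdir -p "$target" &&
    tar -xzf "$tarball" -C "$target"
}

# Usage (hypothetical paths), as root so file ownership is preserved:
#   unpack_base ./ubuntu-base.tar.gz /focal1
```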

I am using runc (crun after testing) to get a container from an “OCI bundle” which can be just a folder with everything in it, no overlay hell on ZFS. I am doing this as I could not get DIR based LXD inside LXD to work and I need an application container inside my ZFS containers, or rather several of them.
And all of this because nobody thought to make it possible in ld to force a binary to search only one folder for its dynamic libraries. Trust me, I tried all the -Wl,-rpath= and similar options I could find. I guess this is why app containers were invented in the first place.

dontlaugh · August 25, 2022, 11:09pm

Would you mind sharing the important bits of your script to turn a directory into an OCI container?

michacassola · August 26, 2022, 9:23am

For sure, I am still writing the script, can share it here when it’s done. Will take me a while though.

The theory is quite simple:

1. Inside a folder with any name, let’s say “container”, make a folder “rootfs” (the name can be changed in the config to anything you’d like, but who cares).
2. In rootfs, put your minimal Linux system. (I also saw an article saying you don’t need any kind of system and it can be just a binary. I will have to try that out, though, and I guess dynamically linked binaries will want ld-linux-x86-64.so.2 and all the necessary libs.)
3. Of course, also put the app you want to run into the minimal system.
4. In the folder “container”, run “runc spec”, which generates the basic config.json.
5. In that JSON file you can configure bind mounts, grant additional capabilities (I needed CAP_SETUID and CAP_SETGID for PHP-FPM), and so on. There is also a setting to make rootfs writable.
6. Inside the container folder, where config.json and rootfs sit next to each other, just run “runc run container” and you should have a working container.
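
The steps above can be sketched as a small helper. This is an illustration under assumptions, not the finished script: make_bundle and all paths are made-up names. In the generated config.json, the capability lists live under process.capabilities, bind mounts under mounts, and the read-only switch is root.readonly.

```shell
# Sketch: turn a directory holding a root filesystem into a runc bundle.
# "make_bundle" and all paths are illustrative names, not from a finished script.
make_bundle() {
    bundle="$1"       # e.g. ./container
    rootfs_src="$2"   # folder holding the minimal system plus your app
    mkdir -p "$bundle/rootfs" &&
    cp -a "$rootfs_src/." "$bundle/rootfs/" &&
    # "runc spec" writes a default config.json into the current directory
    ( cd "$bundle" && runc spec )
}

# Usage, as root (paths are placeholders):
#   make_bundle ./container ./my-minimal-system
#   edit ./container/config.json (mounts, process.capabilities, root.readonly)
#   cd ./container && runc run container
```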

For the minimal system I would suggest these due to advanced cpu arch support: https://partner-images.canonical.com/oci/
They are similar to the Ubuntu base image.
I wouldn’t go with Alpine because of musl-libc, which I read is less performant. And who cares about 40MB extra these days.

I still have to figure out if one can run different binaries in different containers from the same base OCI bundle folder…

dontlaugh · August 26, 2022, 6:16pm

Thanks for providing some details. I’m always looking for alternatives, because I really don’t like Dockerfiles.

I like this more chroot-ish approach. I’ve used it with systemd-nspawn, and I’m pleased to learn you can do it with runc. I would like to explore using an ostree repository to have git-like management of the binaries in the chroot. If I ever figure out ostree, that is.

michacassola · August 26, 2022, 7:11pm

I read about systemd-nspawn, how do you use it? Is it simple and can you access a UNIX socket inside it from the host?

dontlaugh · August 26, 2022, 7:25pm

I’ve never used the bind mount options myself, but it’s possible with systemd-nspawn.

I wouldn’t call it simple, per se. But it’s not hard to use. Become root and run systemd-nspawn -D ./yourchroot.
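
An untested sketch of what that could look like with a bind mount, for example to expose a host directory that holds a UNIX socket. The helper name nspawn_with_bind and both paths are placeholders; --bind= mounts the given host path at the same path inside the container:

```shell
# Sketch: start a shell in a chroot-style directory with systemd-nspawn and
# bind-mount one host directory (for example one holding a UNIX socket).
# "nspawn_with_bind" and both paths are illustrative placeholders.
nspawn_with_bind() {
    rootdir="$1"   # e.g. ./yourchroot
    hostdir="$2"   # host directory to expose at the same path inside
    systemd-nspawn -D "$rootdir" --bind="$hostdir"
}

# Usage, as root (hypothetical socket directory):
#   nspawn_with_bind ./yourchroot /run/myapp
```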

I haven’t used it for a while, but I would consider it if I were running in an environment with an investment in systemd-networkd configs. nspawn has been around for a long time.

The security features are actually a lot easier to use than Docker, imo. They have a lot of the systemd-ish stuff built in: private tmpfs, restricted network, etc. Most of the security features are basically the same as what’s in regular unit files. See this talk on YouTube: Lennart Poettering, "Containers without a Container Manager, with systemd".

michacassola · August 26, 2022, 7:38pm

Thanks! Will have to give it a closer look and see which option is better for me. Watching the video now.

I guess the first advantage would be that you don’t have to download another binary, because it’s built into systemd?

dontlaugh · August 26, 2022, 8:01pm

It depends on the distro. Some have it built in, others require another package download.

michacassola · August 26, 2022, 8:09pm

Yeah, ok. Another thing will be performance, where I guess both should be similar. The other question is which one has better ARM device support, which I would like to add to my stackbuilder in the future.

michacassola · September 4, 2022, 11:34am

@dontlaugh In his talk 11 years ago Lennart Poettering does mention not to use nspawn in production.
Has that changed over the years?
https://www.youtube.com/watch?v=s7LlUs5D9p4&t

michacassola · September 4, 2022, 11:36am

Sorry, I was too quick to post. Later he did deem it ready for production; see "Why is systemd-nspawn not appropriate for production deployments?" on Unix & Linux Stack Exchange.

michacassola · September 4, 2022, 12:52pm

So one big disadvantage: You have to install systemd-container, but when testing in an 18.04 lxc for backwards compatibility I got the following error:

The following packages have unmet dependencies:
 systemd-container : Depends: systemd (= 237-3ubuntu10.53) but 237-3ubuntu10.54 is to be installed

So dependency issues with “managed” packages…
Wait, what problem did I want to solve again? Sorry, couldn’t resist.

So I will continue looking into runc, as it comes as a simple download from GitHub as a precompiled static binary. That seems like less trouble than downgrading systemd with aptitude. There must be a reason for that update to .54.
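
A rough sketch of that download. The URL layout is my assumption about how the runc release assets on GitHub are named, and the version and architecture are placeholders, so check the releases page before relying on it:

```shell
# Sketch: fetch a static runc release binary from GitHub.
# The URL pattern, version and architecture are assumptions; verify them
# against the opencontainers/runc releases page.
runc_release_url() {
    version="$1"   # e.g. v1.1.4 (example only)
    arch="$2"      # e.g. amd64 or arm64
    echo "https://github.com/opencontainers/runc/releases/download/${version}/runc.${arch}"
}

fetch_runc() {
    # $1 = version, $2 = arch, $3 = output path
    curl -fsSL -o "$3" "$(runc_release_url "$1" "$2")" &&
    chmod +x "$3"
}

# Usage:
#   fetch_runc v1.1.4 amd64 ./runc
```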