Setting capabilities on LXC containers

I am installing BigBlueButton inside an LXC container. However, some services fail to start without modifications (they work fine in a normal, non-virtualized installation).
The error in all cases is: status=214/SETSCHEDULER
The options that need to be commented out are:

CPUSchedulingPolicy=fifo

and

IOSchedulingClass=realtime
IOSchedulingPriority=2
CPUSchedulingPolicy=rr
CPUSchedulingPriority=89

When they are commented out the services start without problems.

I am not a kernel expert, but it seems that the LXC container needs the CAP_SYS_NICE capability in order to set a realtime scheduler.

I tried to set it on my container with a command like this:

lxc config set BBB raw.lxc=lxc.cap.keep=sys_nice

However, the container wouldn’t start afterwards. The command lxc console BBB --show-log displays:

Failed to mount tmpfs at /dev/shm: Operation not permitted
Failed to mount tmpfs at /run: Operation not permitted
Failed to mount tmpfs at /run/lock: Operation not permitted
[!!!!!!] Failed to mount API filesystems.
Exiting PID 1...

Maybe this is not the right way to do it.
Do you have any suggestions?

Another question (not necessarily related to LXD):

How can I check whether the capability CAP_SYS_NICE is enabled or not in a Linux system (which can be an LXC container, a VPS, or something else)?

I can make the container privileged, like this:

lxc config set BBB security.privileged=true

In this case the services start without modifications.
But I guess this has drastic security implications, so I would prefer to grant only the minimum necessary permissions (in this case the capability CAP_SYS_NICE), if possible.

Also, a way to find out whether this capability is already granted inside a container would be useful.
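
For what it's worth, here is a minimal sketch of two ways to check this from a shell. The first reads the effective capability mask (CapEff) from /proc and tests bit 23, which is the number assigned to CAP_SYS_NICE in <linux/capability.h>. The second simply tries to start a command under SCHED_FIFO with chrt (from util-linux). The helper names has_cap_bit, effective_caps and can_set_realtime are made up for this example. As far as I understand, in an unprivileged container the capability bit can be set relative to the container's own user namespace while the kernel still refuses the request, so the chrt probe is probably the more telling of the two.

```shell
# CAP_SYS_NICE is capability number 23 in <linux/capability.h>.
CAP_SYS_NICE_BIT=23

# has_cap_bit MASK BIT (hypothetical helper)
# Succeeds if bit BIT is set in the hexadecimal capability mask MASK,
# in the format used by the Cap* lines of /proc/<pid>/status.
has_cap_bit() {
  mask=${1:-0}
  [ $(( (0x$mask >> $2) & 1 )) -eq 1 ]
}

# effective_caps (hypothetical helper)
# Prints the effective capability mask (CapEff) of the current shell.
effective_caps() {
  awk '/^CapEff:/ { print $2 }' "/proc/$$/status" 2>/dev/null
}

# can_set_realtime (hypothetical helper)
# Empirical probe: try to run `true` under SCHED_FIFO with priority 1.
can_set_realtime() {
  chrt -f 1 true 2>/dev/null
}

if has_cap_bit "$(effective_caps)" "$CAP_SYS_NICE_BIT"; then
  echo "CAP_SYS_NICE is in the effective set"
else
  echo "CAP_SYS_NICE is NOT in the effective set"
fi

if can_set_realtime; then
  echo "realtime scheduling works"
else
  echo "realtime scheduling is not permitted (or chrt is missing)"
fi
```

If libcap's tools are installed, capsh --decode=<mask> should print the same mask as a list of capability names.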

As a workaround I am using something like this:

# If CAP_SYS_NICE is not available, the FreeSWITCH systemd service
# will fail to start with an error message like "status=214/SETSCHEDULER".
# In this case we need to override the service so that it does not require a realtime scheduler.
# The same override is needed for a few other services as well:
# bbb-html5-frontend@.service, bbb-html5-backend@.service and bbb-webrtc-sfu.service
check_cap_sys_nice() {
  # if we don't detect a SETSCHEDULER error message in the status of the service,
  # then there is nothing to be modified/customized
  { systemctl status freeswitch | grep -q SETSCHEDULER; } || return
  
  # override /lib/systemd/system/freeswitch.service so that it does not use realtime scheduler
  mkdir -p /etc/systemd/system/freeswitch.service.d
  cat <<HERE > /etc/systemd/system/freeswitch.service.d/override.conf
[Service]
IOSchedulingClass=
IOSchedulingPriority=
CPUSchedulingPolicy=
CPUSchedulingPriority=
HERE

  # override /usr/lib/systemd/system/bbb-html5-frontend@.service
  mkdir -p /etc/systemd/system/bbb-html5-frontend@.service.d
  cat <<HERE > /etc/systemd/system/bbb-html5-frontend@.service.d/override.conf
[Service]
CPUSchedulingPolicy=
HERE

  # override /usr/lib/systemd/system/bbb-html5-backend@.service
  mkdir -p /etc/systemd/system/bbb-html5-backend@.service.d
  cat <<HERE > /etc/systemd/system/bbb-html5-backend@.service.d/override.conf
[Service]
CPUSchedulingPolicy=
HERE

  # override /usr/lib/systemd/system/bbb-webrtc-sfu.service
  mkdir -p /etc/systemd/system/bbb-webrtc-sfu.service.d
  cat <<HERE > /etc/systemd/system/bbb-webrtc-sfu.service.d/override.conf
[Service]
CPUSchedulingPolicy=
HERE

  systemctl daemon-reload
}

Just in case someone else has a similar problem.

You may find the syscall interception feature helps here:

https://linuxcontainers.org/lxd/docs/master/syscall-interception/#sched-setscheduler

This is really useful. I tried something like this:

lxc config set BBB \
    security.syscalls.intercept.sched_setscheduler=true
lxc restart BBB

And the error message status=214/SETSCHEDULER was gone.

However, now I get another error message: status=211/IOPRIO.
I believe that it is related to CAP_SYS_ADMIN (granting it would be almost the same as running a privileged container) and it is caused by these systemd config lines:

IOSchedulingClass=realtime
IOSchedulingPriority=2

When I comment them out, the error is gone.
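
Instead of editing the packaged unit file, these two lines can be neutralized with a drop-in, the same way as in the workaround earlier in the thread, while leaving the CPU scheduling options to syscall interception. A minimal sketch, assuming the affected unit is freeswitch.service (the only unit shown above with IOSchedulingClass lines in its override); the helper name reset_io_scheduling and the file name io-override.conf are made up for this example:

```shell
# reset_io_scheduling UNIT [BASE_DIR] (hypothetical helper)
# Writes a drop-in that clears only the I/O scheduling settings of UNIT.
# BASE_DIR defaults to /etc/systemd/system; a different directory can be
# passed in order to try the function without touching the system.
reset_io_scheduling() {
  unit=$1
  base=${2:-/etc/systemd/system}
  mkdir -p "$base/$unit.d" || return 1
  cat > "$base/$unit.d/io-override.conf" <<'HERE'
[Service]
IOSchedulingClass=
IOSchedulingPriority=
HERE
}
```

As root, this would be called as reset_io_scheduling freeswitch.service, followed by systemctl daemon-reload and systemctl restart freeswitch.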