LXC 3.0.3 processes killed by nginx restart

Hello all,

I have recently been upgrading a set of config managed servers from Ubuntu 14 -> 16 -> 18

This required upgrades to nginx/passenger and lxc (from 2 to 3.0.3)

Having updated my lxc containers’ configs, most things are working nicely, but there is one unexpected problem: on all but 1 server, the containers are destroyed when nginx is restarted.

The difference between the 1 “working” and other “failing” servers appears to be an entry in the journalctl:

nginx.service: Killing process 13743 (lxc-start)

As far as I can tell, all the relevant configs (nginx, passenger, lxc) match and I have run out of ideas of where to look next.

Has anyone else run into (and hopefully solved) this issue? I am sure this must be a config issue somewhere, but I am not sure where else to look, so any advice will be greatly appreciated.

Relevant versions:
Nginx: 1.14
Phusion Passenger: 6.0.2
LXC: 3.0.3
OS: Ubuntu 18.04.2 Bionic Beaver

What’s starting your LXC container?

Hey, thanks for taking an interest.

It’s a Node app that just executes lxc commands as a child-process. The Node app is maintained using nginx+passenger

Ok, so I think your problem is systemd.

Back when you were on 14.04, you were on upstart which would manage the main process (nginx) but not care about subprocesses so much.

Systemd is different: it uses a cgroup per service and, on a restart, stops everything in that cgroup, which in your case includes your LXC containers.

You should be able to fix that by adding a systemd override for that unit (systemctl edit nginx) and setting KillMode to process (see the systemd.kill man page for details).
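For reference, a minimal sketch of what that override might contain. `systemctl edit nginx` opens an editor on a drop-in file for the unit (typically /etc/systemd/system/nginx.service.d/override.conf). Only the KillMode line comes from the suggestion above. With KillMode=process, systemd signals only the main nginx process on stop and leaves the other processes in the cgroup running.

```ini
# Drop-in for nginx.service, created via: systemctl edit nginx
[Service]
# Only kill the main process on stop/restart, not the whole cgroup
KillMode=process
```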

I think you are definitely onto something there

I don’t see any difference between the working and failing systemctl files for nginx but setting KillMode to process resolves the issue.

If I run systemctl status nginx I still see some references to lxc that aren’t present on the working server.

e.g:

   CGroup: /system.slice/nginx.service
           ├─ 8234 [lxc monitor] /var/lib/lxc 302323
           ├─ 8814 [lxc monitor] /var/lib/lxc 302085
           ├─ 9364 [lxc monitor] /var/lib/lxc 302089
           ├─ 9531 [lxc monitor] /var/lib/lxc 302321
           ├─ 9871 [lxc monitor] /var/lib/lxc 302115

I’ve no idea how I have managed to set up different cgroups but at least for now I have a workaround and plenty of reading. Thanks for the help Stéphane :+1:

I have tried switching the systemd KillMode to process, but this results in the service complaining about left-over processes for passenger and lxc, which doesn’t seem like the correct solution.

Having dug deeper into this, we have two different kernel versions running on Ubuntu 18. The server that still behaves as expected is on kernel 4.5; the servers that behave unexpectedly are on 4.15. The differences between them are not just the use of systemd, but also cgroup v1 vs cgroup v2.

Running systemctl status on the 2 servers shows what seems to be the problem: on the failing servers, the lxc monitor processes live under nginx’s cgroup instead of under the user cgroup.

I’ve not been able to find any way to determine what is controlling these processes or how to change this.
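One way to check this from the process side: the kernel lists each process's cgroup membership in /proc/<pid>/cgroup, one line per hierarchy, in the form hierarchy-id:controllers:path. Below is a minimal sketch that prints the distinct paths from such a file. The helper name cgroup_paths is made up for illustration and is not an existing tool.

```shell
# Print the distinct cgroup paths found in a /proc/<pid>/cgroup style file.
# Each line has the form "hierarchy-id:controllers:path"; the path is
# everything from the third colon-separated field onwards.
cgroup_paths() {
    # $1: file to read (defaults to the current shell's own cgroup file)
    cut -d: -f3- "${1:-/proc/$$/cgroup}" | sort -u
}

# Example (PID is a placeholder; substitute an lxc monitor PID):
# cgroup_paths /proc/30044/cgroup
```

A child process starts in the same cgroup as its parent, so if the path printed for an lxc monitor PID ends in nginx.service, the monitor was most likely started from inside nginx's cgroup (for example by the Passenger-managed Node app) and stayed there.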

Below is the output of systemctl status on 2 different servers. I’d really appreciate any guidance on where to look to understand these processes and how they are controlled.

Kernel 4.5 w CGroup v1 and upstart:

CGroup: /
           ├─lxc
           │ └─container_name
           │   ├─4846 sh -c dhclient eth0; sudo -u appuser HOME=/home/app /bin/sh -c 'cd /home/app; NODE_ENV=producti
           │   ├─5009 dhclient eth0
           │   ├─5010 sudo -u appuser HOME=/home/app /bin/sh -c cd /home/app; NODE_ENV=production node run.js
           │   ├─5011 /bin/sh -c cd /home/app; NODE_ENV=production node run.js >> stdout.log 2>> stderr.log
           │   └─5012 node index
           ├─user
           │ └─root
           │   └─0
           │     └─4838 [lxc monitor] /var/lib/lxc container_name
           ...
           │ ├─nginx.service
           │ │ ├─ 4659 Passenger watchdog
           │ │ ├─ 4662 Passenger core
           │ │ ├─ 4692 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
           │ │ ├─ 4693 nginx: worker process
           │ │ ├─ 4694 nginx: worker process
           │ │ ├─ 4695 nginx: worker process
           │ │ ├─ 4696 nginx: worker process
           │ │ ├─15786 Passenger NodeApp: /opt/dispatcherjs/current
           │ │ └─15836 Passenger NodeApp: /opt/dispatcherjs/current

Kernel 4.15

CGroup: /
           ├─user.slice
           ...
           ├─system.slice
           ...
           │ ├─nginx.service
           │ │ ├─11654 Passenger NodeApp: /opt/dispatcherjs/current
           │ │ ├─11704 Passenger NodeApp: /opt/dispatcherjs/current
           │ │ ├─14258 Passenger watchdog
           │ │ ├─14261 Passenger core
           │ │ ├─14287 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
           │ │ ├─14288 nginx: worker process
           │ │ ├─14289 nginx: worker process
           │ │ ├─14290 nginx: worker process
           │ │ ├─14291 nginx: worker process
           │ │ ├─30044 [lxc monitor] /var/lib/lxc container_name
           ...
           └─lxc
             ├─container_name-1
             │ ├─30049 sh -c dhclient eth0; sudo -u appuser HOME=/home/app /bin/sh -c 'cd /home/app; NODE_ENV=pro
             │ ├─30234 dhclient eth0
             │ ├─30235 sudo -u appuser HOME=/home/app /bin/sh -c cd /home/app; NODE_ENV=production node run.js
             │ ├─30237 /bin/sh -c cd /home/app; NODE_ENV=production node run.js >> stdout.log 2>> stderr.log
             │ └─30259 node index