Out of Memory error on a container: Incus can’t stop/restart container, even with --force

I was seeing a container use a lot of memory for what I thought it was doing, so I thought “let’s see what happens if I limit its memory”. I limited it to 16GB with

incus config set CONTAINER limits.memory=16GB

incus config set CONTAINER limits.memory.swap="false"

and rebooted it. But after the reboot, while I was doing an upgrade during which it backed up its PostgreSQL database, the container hung and now it can’t be restarted, even with --force.

I ran shutdown -r now in the container, but after it kicked me off (I assumed for the reboot) the container still showed RUNNING as its status. I can’t connect to it with incus exec CONTAINER -- bash; it returns

Error: Failed to retrieve PID of executing child process

I ran incus stop --force gitlab on the Incus server but it hangs.

/var/log/incus/incus.log shows

time="2026-03-16T22:30:08-05:00" level=warning msg="Failed to retrieve network information via netlink" instance=CONTAINER instanceType=container pid=2689744 project=default

dmesg shows an out-of-memory kernel error:

[17499284.206459] Memory cgroup out of memory: Killed process 2780484 (postgres) total-vm:134769084kB, anon-rss:2560kB, file-rss:7044kB, shmem-rss:1386560kB, UID:1000994 pgtables:2848kB oom_score_adj:0
[17499286.823556] postgres invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[17499286.823567] CPU: 3 UID: 1000994 PID: 2780494 Comm: postgres Tainted: P           OE      6.12.41+deb13-amd64 #1  Debian 6.12.41-1
[17499286.823572] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
...
[17499286.823703] memory: usage 15625004kB, limit 15625000kB, failcnt 25042971
[17499286.823709] swap: usage 0kB, limit 0kB, failcnt 0
...
[17499286.824207] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=lxc.payload.gitlab,mems_allowed=0-1,oom_memcg=/lxc.payload.gitlab,task_memcg=/lxc.payload.gitlab/gitlab.slice/gitlab-runsvdir.service,task=postgres,pid=2780494,uid=1000994
[17499286.824223] Memory cgroup out of memory: Killed process 2780494 (postgres) total-vm:134769084kB, anon-rss:2560kB, file-rss:7180kB, shmem-rss:2015040kB, UID:1000994 pgtables:4064kB oom_score_adj:0
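As a sanity check (my own arithmetic, not from the logs): the limit was set as 16GB, i.e. 16 decimal gigabytes, and the kernel’s memcg report is in kB, so the numbers line up exactly. A 16GiB limit would instead show as 16777216kB.

```shell
# 16 GB (decimal, as passed to `incus config set`) expressed in kB,
# the unit the kernel's memory-cgroup OOM report uses.
echo $((16 * 1000 * 1000 * 1000 / 1024))
```

So the cgroup really was capped at 15625000kB and usage hit that ceiling, which is why the kernel’s OOM killer fired inside the container.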

So I guess I limited the memory too much? But how can I kill that container now?

What kind of host is it running on?

On a non-IncusOS system you can usually search for the PID of the container and kill it forcefully.

ps -ef | grep gitlab

Take the PID of the [lxc monitor] process and perform a kill -9 <pid> on it.

That usually kills the whole container and after some time it can be restarted.
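The steps above can be sketched as a small helper. The function name and the awk filter are made up for this post; the only assumption is standard `ps -ef` (or `ps aux`) output, where the PID is the second column and the container name is the last field of the monitor’s command line.

```shell
# Extract the PID of the [lxc monitor] process for a given container
# from `ps -ef` (or `ps aux`) output read on stdin.
extract_monitor_pid() {
    # Match the "[lxc monitor] ... CONTAINER" line; PID is field 2.
    awk -v name="$1" '/\[lxc monitor\]/ && $NF == name { print $2; exit }'
}

# Usage (this hard-kills the monitor, so only on a non-IncusOS host):
#   kill -9 "$(ps -ef | extract_monitor_pid gitlab)"
```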

Not sure how you can do this on IncusOS, as there is no access to the command line.

Thanks!

This is on Debian 13 (trixie).

I found the PID of the container’s [lxc monitor] process

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2689715 0.0 0.0 6609156 22676 ? Ss Mar16 0:00 [lxc monitor] /var/lib/incus/containers gitlab

and did a kill -9 on it. The process disappeared, which changed the status of the container from RUNNING to ERROR.

There are still child processes running, though, and they seem to be taking up an enormous amount of CPU time:

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
1000996  2754193 95.2  0.0      0     0 ?        Zl   Mar16 735:55 [bundle] <defunct>
1000996  2754717  8.9  0.0      0     0 ?        Zl   Mar16  69:12 [bundle] <defunct>
1000996  2754722 11.4  0.0      0     0 ?        Zl   Mar16  88:21 [bundle] <defunct>
1000996  2754726  0.0  0.0      0     0 ?        Zl   Mar16   0:24 [bundle] <defunct>
1000996  2754747  8.9  0.0      0     0 ?        Zl   Mar16  68:52 [bundle] <defunct>
1000996  2754749  9.4  0.0      0     0 ?        Zl   Mar16  73:12 [bundle] <defunct>
1000996  2754751 13.8  0.0      0     0 ?        Zl   Mar16 106:41 [bundle] <defunct>
1000996  2754753  2.4  0.0      0     0 ?        Zl   Mar16  19:13 [bundle] <defunct>
1000996  2754755  9.3  0.0      0     0 ?        Zl   Mar16  71:58 [bundle] <defunct>
1000996  2754778  8.4  0.0      0     0 ?        Zl   Mar16  64:50 [bundle] <defunct>
1000996  2754797  9.8  0.0      0     0 ?        Zl   Mar16  76:15 [bundle] <defunct>
1000996  2754828  9.0  0.0      0     0 ?        Zl   Mar16  69:29 [bundle] <defunct>
1000996  2754853  0.2  0.0      0     0 ?        Zl   Mar16   2:09 [bundle] <defunct>
1000996  2754856 13.9  0.0      0     0 ?        Zl   Mar16 107:48 [bundle] <defunct>
1000996  2754871 10.0  0.0      0     0 ?        Zl   Mar16  77:20 [bundle] <defunct>
1000996  2754901  9.0  0.0      0     0 ?        Zl   Mar16  69:47 [bundle] <defunct>
1000996  2754916  5.2  0.0      0     0 ?        Zl   Mar16  40:43 [bundle] <defunct>
1000996  2754991  7.7  0.0      0     0 ?        Zl   Mar16  59:36 [bundle] <defunct>
1000996  2755011  9.8  0.0      0     0 ?        Zl   Mar16  76:01 [bundle] <defunct>
1000996  2755049  9.0  0.0      0     0 ?        Zl   Mar16  69:37 [bundle] <defunct>
1000996  2755051  9.4  0.0      0     0 ?        Zl   Mar16  72:36 [bundle] <defunct>
1000996  2755072 12.7  0.0      0     0 ?        Zl   Mar16  98:06 [bundle] <defunct>
1000996  2755112  1.4  0.0      0     0 ?        Zl   Mar16  11:06 [bundle] <defunct>
1000996  2763976  0.0  0.2 1606500 1252232 ?     D    Mar16   0:02 puma: cluster worker 10: 47548 [gitlab-puma-worker]
1000996  2763978  0.0  0.2 1606436 1252080 ?     D    Mar16   0:01 puma: cluster worker 17: 47548 [gitlab-puma-worker]
1000996  2764242  0.0  0.2 1606436 1251300 ?     D    Mar16   0:02 puma: cluster worker 8: 47548 [gitlab-puma-worker]
1000996  2764309  0.0  0.2 1606436 1251116 ?     D    Mar16   0:03 puma: cluster worker 36: 47548 [gitlab-puma-worker]
1000996  2764367  0.0  0.2 1606436 1250548 ?     D    Mar16   0:02 puma: cluster worker 6: 47548 [gitlab-puma-worker]
1000996  2764410  0.0  0.2 1606436 1251628 ?     D    Mar16   0:02 puma: cluster worker 3: 47548 [gitlab-puma-worker]
1000996  2767469  0.0  0.2 1606436 1250860 ?     D    Mar16   0:02 puma: cluster worker 29: 47548 [gitlab-puma-worker]
1000996  2767521  0.0  0.2 1606436 1250792 ?     D    Mar16   0:02 puma: cluster worker 7: 47548 [gitlab-puma-worker]
1000996  2769011  0.0  0.2 1606436 1251548 ?     D    Mar16   0:04 puma: cluster worker 9: 47548 [gitlab-puma-worker]
1000996  2769017  0.0  0.2 1606436 1251884 ?     D    Mar16   0:04 puma: cluster worker 27: 47548 [gitlab-puma-worker]
1000996  2769863  0.0  0.2 1606436 1251160 ?     D    Mar16   0:03 puma: cluster worker 32: 47548 [gitlab-puma-worker]
1000996  2770731  0.0  0.2 1606436 1251928 ?     D    Mar16   0:02 puma: cluster worker 1: 47548 [gitlab-puma-worker]
1000996  2770827  0.0  0.2 1606340 1248036 ?     D    Mar16   0:05 puma: cluster worker 23: 47548 [gitlab-puma-worker]
1000996  2771736  0.0  0.2 1606372 1251560 ?     D    Mar16   0:04 puma: cluster worker 26: 47548 [gitlab-puma-worker]
1000996  2772504  0.0  0.2 1606340 1251512 ?     D    Mar16   0:02 puma: cluster worker 28: 47548 [gitlab-puma-worker]
1000996  2773498  0.0  0.2 1606340 1251024 ?     D    Mar16   0:03 puma: cluster worker 39: 47548 [gitlab-puma-worker]
1000996  2773503  0.0  0.2 1606340 1251200 ?     D    Mar16   0:03 puma: cluster worker 5: 47548 [gitlab-puma-worker]
1000996  2773532  0.0  0.2 1606340 1249344 ?     D    Mar16   0:04 puma: cluster worker 37: 47548 [gitlab-puma-worker]

None of those processes respond to kill -9.
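That is what the STAT column predicts: `Z` processes are zombies (already dead, only waiting to be reaped by their parent) and `D` processes are in uninterruptible kernel sleep, and SIGKILL affects neither. A quick filter to pick such processes out of `ps` output (a sketch; the function name is made up for this post):

```shell
# Keep only lines whose second field (STAT) starts with D or Z, i.e.
# uninterruptible sleep or zombie - the states that ignore SIGKILL.
unkillable() {
    awk '$2 ~ /^[DZ]/'
}

# Usage: ps axo pid=,stat=,comm= | unkillable
```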

Edit: The parent process is the shutdown command in the container:

$ ps -fp 2689744
UID          PID    PPID  C STIME TTY          TIME CMD
1000000  2689744       1  0 Mar16 ?        00:00:27 [systemd-shutdow]

and

$ ps axwwwjf

PPID     PID    PGID     SID TTY        TPGID STAT   UID   TIME COMMAND
1 2672696 2672696 2672696 ?             -1 Ss       0   0:00 /usr/lib/systemd/systemd-udevd
1 2689744 2689744 2689744 ?             -1 Ss   1000000   0:27 [systemd-shutdow]
2689744 2754193 2754193 2753945 ?             -1 Zl   1000996 748:17  \_ [bundle] <defunct>
2689744 2754717 2754483 2754483 ?             -1 Zl   1000996  69:12  \_ [bundle] <defunct>
2689744 2754722 2754483 2754483 ?             -1 Zl   1000996  88:21  \_ [bundle] <defunct>
...
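The zombie rows fit that picture: the container’s init was replaced by systemd-shutdown, which is itself stuck, so nothing ever reaps its dead children. The state of any individual PID can be checked directly in /proc (a small sketch; the function name is made up for this post):

```shell
# Print the one-letter state of a PID from /proc:
# Z = zombie, D = uninterruptible sleep, S = sleeping, R = running.
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

# e.g. proc_state 2754193 would print "Z" for one of the defunct
# [bundle] processes in the listing above.
```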

Restarting the incus daemon didn’t work either.

So, unfortunately, the solution was to reboot the Incus server. An interesting learning experience in how kernel-level errors inside a container can impact the host.