LXC Container in Proxmox using 90% of memory with all processes killed

Hello everyone,

I’m currently running Plex inside a Docker container inside an LXC container. The cluster is backed by Ceph storage. The issue I’m encountering is that sometimes the LXC container runs out of memory; this is also happening with a few other LXC containers.

The container has a memory limit of 8 GB, but after a while it gets closer and closer to this limit until it eventually runs out. When this happens I can’t even connect to the container anymore and have to force-stop it.

Today, when I wanted to investigate the issue a bit further, I noticed my Plex LXC was filling up again. So I logged in and started stopping all processes.
I first manually stopped all Docker containers, and then also stopped docker and containerd using systemctl stop.
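
For reference, the stop sequence was along these lines (a rough sketch; the exact container names depend on what’s running):

# stop all running Docker containers
docker stop $(docker ps -q)

# then stop the Docker daemon and containerd themselves
systemctl stop docker
systemctl stop containerd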

The strange thing, though, is that after that the memory usage in both Proxmox and htop still showed about 7 GB out of 8 GB used:

[screenshot: Proxmox memory usage]

(I can only add one image, so I’ll add the other one later)

What is weird though is that none of the processes in htop show any significant memory usage.

The command free -m also shows that about 7 GB is used:

root@lxc-plex:~/dockercomposers/plexplox# free -m 
               total        used        free      shared  buff/cache   available
Mem:            8192        7129         769           0         292        1062
Swap:              0           0           0

Next I ran ps aux to find out the memory usage per process:

root@lxc-plex:~/dockercomposers/plexplox# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0 168800  8320 ?        Ss   Mar13   0:26 /sbin/init
root          43  0.0  0.2  74308 20608 ?        Ss   Mar13   0:02 /lib/systemd/systemd-journald
systemd+      82  0.0  0.0  17996  3840 ?        Ss   Mar13   0:00 /lib/systemd/systemd-networkd
root         114  0.0  0.0   3600   640 ?        Ss   Mar13   0:00 /usr/sbin/cron -f
message+     115  0.0  0.0   9296  2176 ?        Ss   Mar13   0:02 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root         119  0.0  0.0  17164  2688 ?        Ss   Mar13   0:00 /lib/systemd/systemd-logind
root         124  0.0  0.0   2516   640 pts/0    Ss+  Mar13   0:00 /sbin/agetty -o -p -- \u --noclear --keep-baud - 115200,38400,9600 linux
root         125  0.0  0.0   6120  1152 pts/1    Ss   Mar13   0:00 /bin/login -p --
root         126  0.0  0.0   2516   512 pts/2    Ss+  Mar13   0:00 /sbin/agetty -o -p -- \u --noclear - linux
root         132  0.0  0.0  15412  1920 ?        Ss   Mar13   0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
root         287  0.0  0.0  42652   788 ?        Ss   Mar13   0:00 /usr/lib/postfix/sbin/master -w
postfix      289  0.0  0.0  43088   896 ?        S    Mar13   0:00 qmgr -l -t unix -u
root        3367  0.0  0.0   6632  3712 pts/1    S    Mar13   0:00 -bash
postfix   519708  0.0  0.0  43052  6400 ?        S    18:09   0:00 pickup -l -t unix -u -c
root      519711  0.0  0.0   8088  4096 pts/1    R+   18:25   0:00 ps aux

ps aux again suggests that there’s barely any memory usage.
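
As a sanity check, the RSS column can be summed; if ordinary process memory were responsible, this should land somewhere near the ‘used’ figure from free (a rough sketch; RSS double-counts shared pages, so it’s only an approximation):

# sum the RSS column (KiB) of all processes and print the total in MB
ps aux --no-headers | awk '{sum+=$6} END {printf "%.0f MB\n", sum/1024}'

For the output above this adds up to well under 100 MB, nowhere near the ~7 GB that free reports.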

Another interesting one is cat /proc/meminfo:

root@lxc-plex:~/dockercomposers/plexplox# cat /proc/meminfo 
MemTotal:        8388608 kB
MemFree:          788216 kB
MemAvailable:    1087576 kB
Buffers:               0 kB
Cached:           299360 kB
SwapCached:            0 kB
Active:          6515140 kB
Inactive:         444628 kB
Active(anon):    6337184 kB
Inactive(anon):   323328 kB
Active(file):     177956 kB
Inactive(file):   121300 kB
Unevictable:           0 kB
Mlocked:          221280 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:       6660408 kB
Mapped:                0 kB
Shmem:               104 kB
KReclaimable:     787484 kB
Slab:                  0 kB
SReclaimable:          0 kB
SUnreclaim:            0 kB
KernelStack:       20336 kB
PageTables:        41200 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    16314548 kB
Committed_AS:   16080536 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      253936 kB
VmallocChunk:          0 kB
Percpu:             4448 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
Unaccepted:            0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      372040 kB
DirectMap2M:    15124480 kB
DirectMap1G:    17825792 kB

This lists about 6.5 GB for Active and 6.3 GB for Active(anon).

Even with all this information I have no clue what is using the memory inside this LXC container.
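
One thing I’d still like to check (corrections welcome if this is the wrong approach): the cgroup v2 memory accounting, since memory charged to the container’s cgroup by the kernel (slab, sockets, driver allocations) doesn’t belong to any process and therefore never shows up in ps or htop. Inside the container that should look roughly like this (paths assume cgroup v2; they may differ on other setups):

# current memory charge and limit for this cgroup
cat /sys/fs/cgroup/memory.current
cat /sys/fs/cgroup/memory.max

# breakdown by type: anon, file, kernel_stack, slab, sock, shmem, ...
cat /sys/fs/cgroup/memory.stat

On the Proxmox host the same files should be reachable under the container’s cgroup, somewhere like /sys/fs/cgroup/lxc/201/ (the exact path depends on the Proxmox version).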

One vague idea I have is that it might have something to do with me passing the Intel N100’s iGPU (/dev/dri) through to the container so that Plex can do hardware transcoding, but then again, this might be completely unrelated.

Here’s my LXC config /etc/pve/lxc/201.conf:

root@proxmox1:/etc/pve/lxc# cat 201.conf 
arch: amd64
cores: 4
features: nesting=1
hostname: lxc-plex
memory: 8192
mp0: /mnt/lxc_shares/Plex/,mp=/mnt/Plex,shared=1
net0: name=eth0,bridge=vmbr0,gw=10.88.20.254,hwaddr=8E:48:71:B7:12:98,ip=10.88.21.201/23,type=veth
onboot: 1
ostype: debian
rootfs: ReplicatedPool_2:vm-201-disk-0,size=100G
swap: 512
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
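
For context on the last three lines: 226 is the major number for the DRM devices, with minor 0 being card0 and minor 128 being renderD128. Inside the container that can be verified with something like this (a sketch; owners/groups will vary per system):

# the major/minor numbers should match the devices.allow lines above
ls -l /dev/dri
# expected along the lines of:
# crw-rw---- 1 root video  226,   0 ... card0
# crw-rw---- 1 root render 226, 128 ... renderD128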

I know I could just restart the container and have things resolved again, but I think I have a beautiful scenario for debugging right now. Let’s hope I can get some input here to figure out what’s going wrong.

Note: I also crossposted this issue here:
https://forum.proxmox.com/threads/lxc-container-in-proxmox-using-90-of-memory-with-all-processed-killed.143415/

And a screenshot of htop:

[screenshot: htop]

Got any tmpfs or devshm mounted with stuff on it?

Forgive me for not being that well versed in the core workings of Linux / memory management, so I’m trying to answer your question based on my own interpretation. (I’m kinda learning on the job here 🙂)

Reproducing the problem

As of making this post I have accidentally restarted the LXC container, which left it nice and empty again in terms of memory usage. I left it running for about 12 hours without Plex running inside of it, and it stayed at around 100 MB of memory (due to some other containers running).

What I noticed, though, is that once I start Plex again and start watching a movie with transcoding enabled, it seems to allocate some memory but not fully release it.

For example, when I started the Plex container the whole system was using around 1 GB of memory. When I then started a movie, everything stayed at around the same amount. When I then started transcoding the movie (using the passed-through /dev/dri hardware device), the memory usage increased by around 100-200 MB. When I stopped the transcoding session, this memory kept being used.

By doing this a few times (stopping / starting transcoding for a movie) I’m now sitting at 3 GB used:

root@lxc-plex:/run# free
               total        used        free      shared  buff/cache   available
Mem:         8388608     3178884     3468652      390760     1741072     5209724
Swap:              0           0           0

I then stopped the Plex container again and ran free again:

root@lxc-plex:~/dockercomposers/plexplox# docker compose down
[+] Running 3/3
 ✔ Container plex            Removed    7.8s
 ✔ Container tautulli        Removed    2.8s
 ✔ Network plexplox_default  Removed    0.5s
root@lxc-plex:~/dockercomposers/plexplox# free   
               total        used        free      shared  buff/cache   available
Mem:         8388608     2998296     4232132         112     1158180     5390312
Swap:              0           0           0

As you can see, we seem to be running into the same problem again as before (only 3 GB used instead of 8, because the container has only been running for an hour or so).
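
To quantify this a bit better, my plan is to log the memory figures while repeating the transcode start/stop cycle, with a simple loop along these lines (a sketch; the interval is arbitrary):

# log used memory (MB) once a minute while starting/stopping transcodes
while true; do
    echo "$(date '+%H:%M:%S')  $(free -m | awk 'NR==2 {print $3}') MB used"
    sleep 60
done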

Back to the question

Anyway, with the problem reproduced I’d now like to go back to your question.

If I run df -h inside the container I see that I have 3 tmpfs filesystems mounted:

root@lxc-plex:~/dockercomposers/plexplox# df -h
Filesystem             Size  Used Avail Use% Mounted on
/dev/rbd0               99G   44G   50G  47% /
//************/PlexPlox   63T   41T   23T  65% /mnt/PlexPlox
none                   492K  4.0K  488K   1% /dev
udev                    16G     0   16G   0% /dev/dri
tmpfs                   16G     0   16G   0% /dev/shm
tmpfs                  6.3G  108K  6.3G   1% /run
tmpfs                  5.0M     0  5.0M   0% /run/lock

I don’t see significant usage here on tmpfs or shm mounts. So I don’t expect that to be the problem.

Some thoughts about the issue seeming to happen when transcoding

Currently I’m using an N100 CPU with hardware acceleration for video transcoding. However, I have enabled SR-IOV for this CPU so that I could also map virtual GPUs to VMs (I’m not using this at the moment).
As you can see in the LXC config, I map all devices in /dev/dri to the LXC container. I then mount /dev/dri/renderD128 into the Plex container.
The guide I followed for that is here:

Could it be that the SR-IOV implementation somehow messes something up and causes this problem?
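
One way to at least confirm that the patched SR-IOV i915 module is the one actually loaded is something like this (a sketch; module and package names depend on how the DKMS guide was followed), run on the Proxmox host, not inside the container:

lsmod | grep i915                               # is the i915 driver loaded at all?
modinfo i915 | grep -i -E 'filename|version'    # which i915 build is in use?
dkms status                                     # is a DKMS-built i915 variant installed?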

I would like to try turning off SR-IOV for now, but I’m not sure how to “uninstall” / “disable” it. I’ve already asked on their GitHub to see if I can get some help there:

(So if by any chance you know how to do that, that would be helpful as well so I can continue my investigation.)
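
In case it helps anyone searching for the same thing, the generic way to remove a DKMS module seems to be roughly this (a sketch; the module name and version come from dkms status, and any kernel command-line options the guide added for enabling the VFs would need to be reverted separately):

# on the Proxmox host: find the module name/version, then remove it
dkms status
dkms remove <module>/<version> --all

# rebuild the initramfs so the stock i915 module is used again, then reboot
update-initramfs -u
reboot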

I did another test. My Plex LXC (without the Plex Docker container running) was using 3.7 GB.

When I monitored the memory usage on the host, it was sitting at around 17.3 GB.

I then completely shut down the LXC container.

After that, the memory usage on the host still remained at around 17.3 GB.
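
Since no process owns this memory on the host either, my next step is to look at the host’s kernel-side counters to see whether it’s sitting in slab or other unreclaimable kernel memory (a sketch, run on the Proxmox host):

# kernel memory counters on the host
grep -E 'Slab|SReclaimable|SUnreclaim' /proc/meminfo

# top slab caches, printed once
slabtop -o | head -n 20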

I found an update on the SR-IOV GitHub showing that more people are running into an issue with memory not being released:

I’m going to investigate if disabling the DKMS module solves the issue for me too.

My issue was solved by uninstalling the SR-IOV DKMS driver for the N100 CPU.
(It seems to have a memory leak on kernel 6.5; see the GitHub post above.)