LXD memory usage gradually increasing

I’m new to LXD and have been playing around with it to see what it can do and how it works. Lately I’ve been experimenting with ephemeral containers, and I’ve noticed that continually destroying and creating a lot of containers gradually increases LXD memory usage over time. Here’s exactly what I’m doing:

On a fresh boot I run the following on the host to spin up 25 Ubuntu 16.04 containers:
for i in {1..25}; do echo -n "btest$i: "; lxc launch -p default -p bridgeprofile ubuntu:16.04 btest$i; done

I have a cron job that runs once a minute and randomly stops one of the containers, then starts a new one with the same name. Here’s the script that does that:

#!/bin/bash
num=$((1 + RANDOM % 25))
echo -n "Containter btest$num ip address: "
lxc info btest$num | grep "eth0" | grep -w "inet" | cut -f 3
echo "Stopping btest$num"
lxc stop btest$num
lxc launch --ephemeral -p default -p bridgeprofile ubuntu:16.04 btest$num
sleep 10 #Wait to get an IP address
echo -n "Containter btest$num new ip address: "
lxc info btest$num | grep "eth0" | grep -w "inet" | cut -f 3
echo

The containers themselves aren’t doing anything other than running the OS at this point. I’ve noticed that when I first create all the containers, system memory usage is ~1 GB. After running for 24+ hours, memory usage on the system has doubled or more, hitting 2-3 GB used.

My expectation is that memory usage would stay relatively consistent and not grow like this, since I have the same number of containers running the whole time. Am I missing something that’s eating this memory, or is there a possible memory leak somewhere? Like I said, I’m new to this, so I’m assuming I’m missing something basic here.

Hi @Jeffers00n

Can you let me know what is using the additional memory?

Is it the LXD process itself or something like the OS disk cache?

What tool are you using to measure memory usage?
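For example, comparing the daemon’s own RSS against the page cache would show which of the two it is (just a rough sketch; it assumes the daemon shows up as lxd in ps):

ps -C lxd -o pid,rss,vsz,cmd    # RSS/VSZ of the LXD daemon itself, in KiB
free -m                         # "buff/cache" is reclaimable page cache, "used" is not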

Thanks
Tom

It looks like the LXD process to me and not disk caching. I started out just using free to check it, and now I’m using htop to watch it live. I also just noticed I have a bunch of what appear to be stuck lxc exec processes left over from periodically checking container uptimes. I’m going to try killing these and see if that fixes the issue.

jeff      1554  0.0  0.1 395192  9720 pts/5    Tl   May16   0:00 lxc exec btest17 uptime
jeff      2805  0.0  0.1 329656  9716 pts/5    Tl   May16   0:00 lxc exec btest3 uptime
jeff      1563  0.0  0.1 396504  9700 pts/5    Tl   May16   0:00 lxc exec btest19 uptime
jeff      1521  0.0  0.1 395192  9664 pts/5    Tl   May16   0:00 lxc exec btest4 uptime
jeff      1549  0.0  0.1 328248  9656 pts/5    Tl   May16   0:00 lxc exec btest16 uptime
jeff      1527  0.0  0.1 328248  9652 pts/5    Tl   May16   0:00 lxc exec btest8 uptime
jeff      1520  0.0  0.1 395192  9644 pts/5    Tl   May16   0:00 lxc exec btest3 uptime
jeff      1547  0.0  0.1 470236  9632 pts/5    Tl   May16   0:00 lxc exec btest15 uptime
jeff      2803  0.0  0.1 329656  9632 pts/5    Tl   May16   0:00 lxc exec btest1 uptime
jeff      1534  0.0  0.1 396248  9624 pts/5    Tl   May16   0:00 lxc exec btest11 uptime
jeff      2815  0.0  0.1 329656  9624 pts/5    Tl   May16   0:00 lxc exec btest7 uptime
jeff      2839  0.0  0.1 396248  9624 pts/5    Tl   May16   0:00 lxc exec btest17 uptime
jeff      2804  0.0  0.1 396504  9620 pts/5    Tl   May16   0:00 lxc exec btest2 uptime
jeff      1576  0.0  0.1 328248  9604 pts/5    Tl   May16   0:00 lxc exec btest24 uptime
jeff      2813  0.0  0.1 396248  9604 pts/5    Tl   May16   0:00 lxc exec btest6 uptime
jeff      1518  0.0  0.1 395192  9600 pts/5    Tl   May16   0:00 lxc exec btest1 uptime
jeff      1532  0.0  0.1 328248  9592 pts/5    Tl   May16   0:00 lxc exec btest10 uptime
jeff      2832  0.0  0.1 329656  9580 pts/5    Tl   May16   0:00 lxc exec btest16 uptime
jeff      2829  0.0  0.1 393784  9572 pts/5    Tl   May16   0:00 lxc exec btest13 uptime
jeff      2822  0.0  0.1 272316  9568 pts/5    Tl   May16   0:00 lxc exec btest9 uptime
jeff      1565  0.0  0.1 468828  9536 pts/5    Tl   May16   0:00 lxc exec btest21 uptime
jeff      2850  0.0  0.1 396504  9536 pts/5    Tl   May16   0:00 lxc exec btest22 uptime
jeff      1560  0.0  0.1 328248  9524 pts/5    Tl   May16   0:00 lxc exec btest18 uptime
jeff      2849  0.0  0.1 469180  9524 pts/5    Tl   May16   0:00 lxc exec btest21 uptime
jeff      2806  0.0  0.1 328248  9508 pts/5    Tl   May16   0:00 lxc exec btest4 uptime
jeff      2846  0.0  0.1 396248  9504 pts/5    Tl   May16   0:00 lxc exec btest20 uptime
jeff      2856  0.0  0.1 396248  9488 pts/5    Tl   May16   0:00 lxc exec btest23 uptime
jeff      2807  0.0  0.1 262456  9460 pts/5    Tl   May16   0:00 lxc exec btest5 uptime
jeff      1522  0.0  0.1 396504  9444 pts/5    Tl   May16   0:00 lxc exec btest5 uptime
jeff      1578  0.0  0.1 320052  9440 pts/5    Tl   May16   0:00 lxc exec btest25 uptime
jeff      1525  0.0  0.1 395192  9420 pts/5    Tl   May16   0:00 lxc exec btest7 uptime
jeff      2824  0.0  0.1 469980  9404 pts/5    Tl   May16   0:00 lxc exec btest11 uptime
jeff      1543  0.0  0.1 396248  9400 pts/5    Tl   May16   0:00 lxc exec btest14 uptime
jeff      2830  0.0  0.1 319796  9348 pts/5    Tl   May16   0:00 lxc exec btest14 uptime
jeff      2823  0.0  0.1 468572  9344 pts/5    Tl   May16   0:00 lxc exec btest10 uptime
jeff      1524  0.0  0.1 396248  9328 pts/5    Tl   May16   0:00 lxc exec btest6 uptime
jeff      1528  0.0  0.1 395096  9280 pts/5    Tl   May16   0:00 lxc exec btest9 uptime
jeff      2831  0.0  0.1 396248  9280 pts/5    Tl   May16   0:00 lxc exec btest15 uptime
jeff      2840  0.0  0.1 395192  9280 pts/5    Tl   May16   0:00 lxc exec btest18 uptime
jeff      1570  0.0  0.1 395192  9272 pts/5    Tl   May16   0:00 lxc exec btest22 uptime
jeff      1535  0.0  0.1 394936  9268 pts/5    Tl   May16   0:00 lxc exec btest12 uptime
jeff      2865  0.0  0.1 395192  9256 pts/5    Tl   May16   0:00 lxc exec btest24 uptime
jeff      2869  0.0  0.1 396248  9256 pts/5    Tl   May16   0:00 lxc exec btest25 uptime
jeff      1572  0.0  0.1 396504  9248 pts/5    Tl   May16   0:00 lxc exec btest23 uptime
jeff      1537  0.0  0.1 396504  9244 pts/5    Tl   May16   0:00 lxc exec btest13 uptime
jeff      1519  0.0  0.1 327992  9240 pts/5    Tl   May16   0:00 lxc exec btest2 uptime
jeff      2825  0.0  0.1 468924  9212 pts/5    Tl   May16   0:00 lxc exec btest12 uptime
jeff      2841  0.0  0.1 396504  9212 pts/5    Tl   May16   0:00 lxc exec btest19 uptime
jeff      2820  0.0  0.1 403644  9180 pts/5    Tl   May16   0:00 lxc exec btest8 uptime
jeff      1564  0.0  0.1 394840  9108 pts/5    Tl   May16   0:00 lxc exec btest20 uptime
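Since they’re all in a stopped state (the Tl in the STAT column), a plain SIGTERM won’t be delivered until they’re continued, so something like this should clear them out (a sketch; it assumes they were all launched as lxc exec btestN uptime):

pgrep -af 'lxc exec btest.* uptime'      # list the stuck clients first
pkill -CONT -f 'lxc exec btest.* uptime' # resume them so pending signals are delivered
pkill -TERM -f 'lxc exec btest.* uptime' # then terminate them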

I was able to kill all the lxc exec processes and memory usage is still the same. Here is the output of sudo ps aux --sort rss: https://pastebin.com/raw/ppvf1ZLu

Edit: Also here’s what free is showing:

jeff@ubuntu-lxd:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7976        3474         257         171        4244        3969
Swap:          4095         257        3838

Thanks @Jeffers00n

Which memory figure have you been tracking and seeing increasing?

Was it the “used” column in the output of free -m, or one of the columns in ps aux related to LXD?

At first I was only tracking the “used” column, so I’m not entirely sure what the growth over time has been for any particular process. This evening I’ll try shutting everything down, rebooting, and kludging together some tracking scripts to try to replicate this and get more info.
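Something like this is what I have in mind for the tracking (a rough sketch; the interval and log path are arbitrary, and it assumes the daemon shows up as lxd in ps):

#!/bin/bash
# Log the LXD daemon's RSS and the host's "used" memory every 5 minutes
while true; do
    ts=$(date +%FT%T)
    lxd_rss=$(ps -C lxd -o rss= | paste -sd+ | bc)   # KiB, summed over all lxd processes
    used=$(free -m | awk '/^Mem:/ {print $3}')       # MiB "used" as free reports it
    echo "$ts lxd_rss_kib=${lxd_rss:-0} used_mib=$used" >> "$HOME/lxd-mem.log"
    sleep 300
done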

From your pastebin: saving it, removing the first 2 lines, and running

cat yourpaste.txt | tr -s ' ' | cut -d' ' -f 6 | paste -sd+ | bc

returns 1.7 GB of RSS. That’s quite a lot, but not 2-3 GB. You should probably track this figure rather than free, which is a notoriously unreliable tool. And LXD itself and the containers are not taking much.
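If you want to see where that total goes, summing per command name works too (a sketch; it assumes the saved file still has ps aux’s usual columns, with RSS in column 6 and the command starting at column 11):

awk 'NR>1 {rss[$11] += $6} END {for (c in rss) printf "%10d KiB  %s\n", rss[c], c}' yourpaste.txt | sort -rn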
A nitpick: when you say that the containers are doing nothing, don’t confuse a complex OS like Ubuntu 16.04 with something like Alpine. When you start a new Ubuntu 16.04 container, it does a lot of things (cloud-init, automatic updates).

Is htop as unreliable as free? I ask because it shows the same number for memory usage.

To be candid, I have never looked at htop’s global memory stats. Once I realized how much effort the kernel puts into optimizing memory usage (all sorts of deduplication and sharing) at the expense of naive accounting, I lost interest in global counters. The main problem with htop’s global counters is that they are unreadable by default with their coloring; I prefer free. Both show the one counter that unequivocally means the kernel is struggling with memory allocation: swap. If swap usage goes over 25% and keeps growing, the system is in trouble. If your test leads to swapping, something is wrong; if not, the memory stats don’t prove anything.
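If you want to watch for that, vmstat is enough; the si/so columns are pages swapped in and out per second (sampling every 60 seconds here):

vmstat 60                                                    # watch the si/so columns
free -m | awk '/^Swap:/ {print $3 " MiB of swap in use"}'    # current swap usage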