Container size keeps growing and eating disk space every day

I use ZFS as the storage backend and have 4 Linux containers running on it. The OS is Ubuntu 20.04 and the LXD version is 4.06.
My biggest concern is that the 4 running containers are growing in size every day, and the largest one is growing by 4GB per day. I will soon run out of disk space. Here are some outputs from zfs list for the last three days.

The output format is
NAME USED AVAIL REFER MOUNTPOINT
Day 1:
default/containers/repo 49.0G 165G 27.1G /var/snap/lxd/common/lxd/storage-pools/default/containers/repo

Day 2:
default/containers/repo 53.0G 161G 26.7G /var/snap/lxd/common/lxd/storage-pools/default/containers/repo

Day 3:
default/containers/repo 57.1G 157G 26.5G /var/snap/lxd/common/lxd/storage-pools/default/containers/repo

I have no clue what's going on here. Please give some advice and instructions on how I can resolve this issue.
BTW, two more things I noticed:

  1. The stopped container doesn't grow in size; only the running ones do.
  2. I noticed size differences between the USED and REFER columns in the zfs list outputs above. When I copied a snapshot of the running container to the backup server, the size of the backup container matched the REFER column, not the USED column. I am not clear on what that means.
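(If relevant, I can also post the output of zfs list -o space default/containers/repo; I believe that view breaks USED down into space held by snapshots versus space referenced by the dataset itself.)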

//////////////////////////////////////////
Some background, if it helps:
I started using LXD several years ago, with version 2.21 on Ubuntu 16.04. It has been running with no big problems, so I was never tempted to upgrade to a newer version.

Starting about 1 to 2 weeks ago, all the containers suddenly became unresponsive. Then I noticed that the AVAIL disk space was running out; it literally hit 0. I deleted some snapshots to free up some disk space, several hundred MB. However, all the free space was gone the next day. In fact, I noticed the free space was shrinking by several MB every few minutes.

So I built a new server with a much larger SSD and the latest Ubuntu and LXD. I copied the containers from the old server to the new one and started them again. Then I noticed the issue mentioned at the beginning of this post. Please advise, and thank you!

hmm, do you maybe have snapshots accumulating on those instances?

On my old server, the size kept growing even after I stopped the auto-snapshot and backup process. On the new server, I am currently doing the process manually. I think maybe I panicked and drew the conclusion too quickly. Below is the possible scenario:
A snapshot doesn't increase the size immediately,
but it will increase the size eventually.
That time lag tricked me into thinking the size increase was not related to the snapshots.
I panicked because of my bad experience on the old server, and I may need to cool down a little bit. Please give me a couple of days to do further observation. Thank you!

Yes, by definition a snapshot is free at the time it’s taken but will then hold on to its state as the instance starts diverging, causing a size increase as that happens.
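If it helps, something like the following should show how much of that USED space each snapshot is pinning (assuming the dataset name from your zfs list output; note that a snapshot's USED only counts space unique to that snapshot, so space shared between several snapshots isn't attributed to any single one):
zfs list -r -t snapshot -o name,used,refer default/containers/repo
zfs get usedbysnapshots,usedbydataset default/containers/repo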

Hello @stgraber, it seems my worries were justified. Without taking any snapshot yesterday, the container size grew again; it's still 4GB per day. It does not seem to be related to the snapshots.
Day 4:
default/containers/repo 61.1G 153G 26.3G /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
Do you have any advice on how I can tackle this issue?

Does restarting the container clear up the space?

No, stopping/starting the containers doesn't clean up space.

Ok, so it’s not a deleted inode taking up the space.
Is du -sch --one-file-system / in the container getting you something close to what zfs reports?
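(For completeness, a quick check for deleted-but-still-open files inside the container, assuming lsof is installed there: lsof +L1 lists open files whose link count has dropped below one.)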

I removed the sc options to get a report on each directory, as follows:
du -h --one-file-system /
The most relevant information I can get from this command is the total size of LXD:
314G /var/snap/lxd/common
It is somewhat larger than what zfs list reports, shown below:
NAME USED AVAIL REFER MOUNTPOINT
default 297G 153G 24K none

Your container has nested containers?

No, just a simple container.

So I'm a bit confused why you'd have a /var/snap/lxd/common directory inside your container then.

I have ZFS as the storage backend, and I kept the storage pool's name as default during lxd init, as shown below:
/var/snap/lxd/common/lxd/storage-pools/default/
All the containers are under
/var/snap/lxd/common/lxd/storage-pools/default/containers/
For example, the container repo has its path in ZFS as shown below
/var/snap/lxd/common/lxd/storage-pools/default/containers/repo

Right, but I asked for du -sch --one-file-system / run INSIDE the container, not on the host.
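For example, from the host, something like this runs it inside the container (assuming the container is named repo as in your zfs output):
lxc exec repo -- du -sch --one-file-system /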

Actually, I did run the command du -sch --one-file-system / inside the container as well. Below is the result; it is the same as the REFER size of the container.
27G /
27G total

Can you show zfs list -t all?

Below are the results from zfs list -t all

NAME USED AVAIL REFER MOUNTPOINT
default/containers/repo 61.1G 153G 26.2G /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210507021418 4.35G - 35.5G -
default/containers/repo@snapshot-repo-bck-20210508021416 219M - 35.5G -
default/containers/repo@snapshot-repo-bck-20210510140352 4.15G - 39.4G -
default/containers/repo@snapshot-repo-bck-20210512114000 265M - 27.1G -
default/containers/repo@snapshot-repo-bck-20210514012200 260M - 26.7G -

And I suspect you care about those snapshots, so you can't just blow them away to see if they're the ones holding on to the state that's slowly diverging, causing the increase you're seeing?

I did remove snapshots when I experienced the issue on the old server; it didn't stop the size from increasing. On the other hand, I didn't have enough time to observe every detail, since the service crashed every night and my focus was on moving it to a new server first.

So yes, I can do it again, but this experiment will take some days and I will need to do it carefully.

BTW, I am not sure when or how this 4GB size increase happens, whether slowly or all of a sudden. Every morning when I check the size, the container has become 4GB bigger than it was the night before, at some point as late as around 2am. It stays the same size for the entire day until I go to bed at night. Then it repeats…

Yeah, what makes me think the snapshots could have something to do with holding onto that 4G of data is that you have a couple of snapshots listed above which suspiciously report a USED size just around 4G larger than the other, smaller snapshots.

So whatever that 4G change is in your container, it looks like it sometimes gets caught in the snapshots.
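If you want to test that theory, I'd remove the snapshots through LXD rather than with zfs destroy directly, so LXD's database stays consistent. A rough sketch, assuming the LXD snapshot names are the part after snapshot- in your zfs output:
lxc info repo   (to confirm the snapshot names LXD knows about)
lxc delete repo/repo-bck-20210507021418
Then keep an eye on zfs list over the next day or two to see whether USED stops climbing.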