Container size keeps growing and eating disk space every day

The increase is consistent, so maybe it’s some backup. Also make sure Jira is uninstalled properly. I don’t have any more ideas, sorry. There are some Linux commands that show you which files changed over the last 24 hours.
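One such command is find /path -type f -mtime -1. The Python snippet below is a minimal sketch of the same idea: it walks a directory tree and lists regular files modified within the last 24 hours, largest first. The function name recently_modified is hypothetical, and the sizes are apparent file sizes, not the space ZFS actually allocates after compression.

```python
import os
import stat
import time


def recently_modified(root, hours=24.0):
    """List regular files under `root` whose mtime falls within the last `hours`.

    Returns (path, size_in_bytes) pairs, largest first. Symlinks are not
    followed, and files that vanish or cannot be read are skipped.
    """
    cutoff = time.time() - hours * 3600
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # removed or unreadable since os.walk listed it
            if stat.S_ISREG(st.st_mode) and st.st_mtime >= cutoff:
                found.append((path, st.st_size))
    # Biggest recent files first, so a multi-GB file written today tops the list.
    found.sort(key=lambda item: item[1], reverse=True)
    return found
```

Run it inside the container (for example through lxc exec) against / or a narrower data directory; a file of about 4GB written in the last day should appear near the top of the returned list.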

Hello @Jimbo, I am already grateful for your suggestions and help! Thank you so much!

Day 11:

  1. The container size increased by ~4GB from yesterday.

NAME                                                       USED  AVAIL  REFER  MOUNTPOINT
default/containers/repo                                   67.3G   145G  33.6G  /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210512114000  4.61G      -  27.1G  -
default/containers/repo@snapshot-repo-bck-20210514012200   263M      -  26.7G  -
default/containers/repo@snapshot-repo-bck-20210518081400   271M      -  33.4G  -
default/containers/repo@snapshot-repo-bck-20210520162800   259M      -  33.5G  -

  2. Created a new snapshot.

NAME                                                       USED  AVAIL  REFER  MOUNTPOINT
default/containers/repo                                   67.3G   145G  33.6G  /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210512114000  4.61G      -  27.1G  -
default/containers/repo@snapshot-repo-bck-20210514012200   263M      -  26.7G  -
default/containers/repo@snapshot-repo-bck-20210518081400   271M      -  33.4G  -
default/containers/repo@snapshot-repo-bck-20210520162800   259M      -  33.5G  -
default/containers/repo@snapshot-repo-bck-20210522210400  1.51M      -  33.6G  -

  3. Removed the oldest existing snapshot, 0512. The container size immediately decreased by ~4.5GB, which is about the size of the deleted snapshot. The remaining oldest snapshot, 0514, saw its size jump from 263M to 13.1G. The latest snapshot, 0522, is slowly growing in size.

NAME                                                       USED  AVAIL  REFER  MOUNTPOINT
default/containers/repo                                   62.8G   149G  33.6G  /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210514012200  13.1G      -  26.7G  -
default/containers/repo@snapshot-repo-bck-20210518081400   271M      -  33.4G  -
default/containers/repo@snapshot-repo-bck-20210520162800   259M      -  33.5G  -
default/containers/repo@snapshot-repo-bck-20210522210400  28.2M      -  33.6G  -
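One way to put a number on the "hidden" space in a listing like the one above is to compare the container's USED - REFER with the sum of the per-snapshot USED values. The Python sketch below parses a pasted listing and does that subtraction; the function names are hypothetical. It assumes the container dataset has no child datasets or refreservation and is not a clone. Otherwise USED - REFER is only a rough stand-in, and zfs list -o space reports the snapshot share directly in its USEDSNAP column.

```python
# Binary (1024-based) suffixes, as used in zfs list's human-readable sizes.
UNITS = {"B": 1, "K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def to_bytes(text):
    """Convert a zfs size string such as '4.61G' or '263M' to bytes."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)


def snapshot_accounting(listing):
    """Compare USED - REFER with the sum of per-snapshot USED in a pasted listing.

    Expects the columns NAME USED AVAIL REFER MOUNTPOINT: one dataset row
    (no '@' in the name) plus that dataset's snapshot rows.
    """
    dataset_used = dataset_refer = None
    snapshots_used = 0.0
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        name, used, _avail, refer = line.split()[:4]
        if "@" in name:
            snapshots_used += to_bytes(used)
        else:
            dataset_used, dataset_refer = to_bytes(used), to_bytes(refer)
    held_by_snapshots = dataset_used - dataset_refer
    return {
        "used_minus_refer": held_by_snapshots,
        "sum_of_snapshot_used": snapshots_used,
        # Space kept alive by snapshots that no single snapshot reports as its own.
        "shared_by_several_snapshots": held_by_snapshots - snapshots_used,
    }


DAY_11_AFTER_DELETE = """\
NAME USED AVAIL REFER MOUNTPOINT
default/containers/repo 62.8G 149G 33.6G /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210514012200 13.1G - 26.7G -
default/containers/repo@snapshot-repo-bck-20210518081400 271M - 33.4G -
default/containers/repo@snapshot-repo-bck-20210520162800 259M - 33.5G -
default/containers/repo@snapshot-repo-bck-20210522210400 28.2M - 33.6G -
"""

report = snapshot_accounting(DAY_11_AFTER_DELETE)
for key, value in report.items():
    print(f"{key:>28}: {value / 2**30:6.2f} GiB")
```

On the Day 11 numbers above, USED - REFER is about 29.2G, while the snapshots' own USED values add up to only about 13.6G. That leaves roughly 15.5G held by snapshots without being attributed to any single one of them. Because zfs list rounds to three significant digits, treat the result as approximate.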

Day 12:

  1. The container USED size increased by 4GB from yesterday, and the container REFER size suddenly increased by ~4GB too. Looking back at the history, the REFER size of the container increased by ~4GB almost every time on the second day after I deleted a snapshot. By ‘almost’, I mean it happened 3 out of the 4 times I deleted a snapshot. To me, this 4GB REFER increase is becoming more concerning than the daily 4GB increase in USED size.

NAME                                                       USED  AVAIL  REFER  MOUNTPOINT
default/containers/repo                                   66.8G   145G  37.5G  /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210514012200  13.1G      -  26.7G  -
default/containers/repo@snapshot-repo-bck-20210518081400   271M      -  33.4G  -
default/containers/repo@snapshot-repo-bck-20210520162800   259M      -  33.5G  -
default/containers/repo@snapshot-repo-bck-20210522210400   191M      -  33.6G  -

  2. Created a new snapshot.

NAME                                                       USED  AVAIL  REFER  MOUNTPOINT
default/containers/repo                                   66.8G   145G  37.5G  /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210514012200  13.1G      -  26.7G  -
default/containers/repo@snapshot-repo-bck-20210518081400   271M      -  33.4G  -
default/containers/repo@snapshot-repo-bck-20210520162800   259M      -  33.5G  -
default/containers/repo@snapshot-repo-bck-20210522210400   195M      -  33.6G  -
default/containers/repo@snapshot-repo-bck-20210523193900  12.6M      -  37.5G  -

Day 13:

  1. The container USED size increased by ~4GB from yesterday. But surprisingly, the container REFER size suddenly decreased by ~4GB.

NAME                                                       USED  AVAIL  REFER  MOUNTPOINT
default/containers/repo                                   71.0G   141G  33.7G  /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210514012200  13.1G      -  26.7G  -
default/containers/repo@snapshot-repo-bck-20210518081400   271M      -  33.4G  -
default/containers/repo@snapshot-repo-bck-20210520162800   259M      -  33.5G  -
default/containers/repo@snapshot-repo-bck-20210522210400   195M      -  33.6G  -
default/containers/repo@snapshot-repo-bck-20210523193900   159M      -  37.5G  -

  2. Created a new snapshot.

NAME                                                       USED  AVAIL  REFER  MOUNTPOINT
default/containers/repo                                   71.1G   141G  33.7G  /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210514012200  13.1G      -  26.7G  -
default/containers/repo@snapshot-repo-bck-20210518081400   271M      -  33.4G  -
default/containers/repo@snapshot-repo-bck-20210520162800   259M      -  33.5G  -
default/containers/repo@snapshot-repo-bck-20210522210400   195M      -  33.6G  -
default/containers/repo@snapshot-repo-bck-20210523193900   159M      -  37.5G  -
default/containers/repo@snapshot-repo-bck-20210524203000  25.5M      -  33.7G  -

  3. Removed the oldest existing snapshot, 0514. The container size immediately decreased by 13.1GB, which is exactly the size of the deleted snapshot. The remaining oldest snapshot, 0518, saw its size jump from 271M to 8.04G. The latest snapshot, 0524, is slowly growing in size.

NAME                                                       USED  AVAIL  REFER  MOUNTPOINT
default/containers/repo                                   58.0G   154G  33.7G  /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210518081400  8.04G      -  33.4G  -
default/containers/repo@snapshot-repo-bck-20210520162800   259M      -  33.5G  -
default/containers/repo@snapshot-repo-bck-20210522210400   195M      -  33.6G  -
default/containers/repo@snapshot-repo-bck-20210523193900   159M      -  37.5G  -
default/containers/repo@snapshot-repo-bck-20210524203000  26.2M      -  33.7G  -

Day 14:

  1. The container USED size increased by ~4GB from yesterday.

NAME                                                       USED  AVAIL  REFER  MOUNTPOINT
default/containers/repo                                   62.1G   150G  33.7G  /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210518081400  8.04G      -  33.4G  -
default/containers/repo@snapshot-repo-bck-20210520162800   259M      -  33.5G  -
default/containers/repo@snapshot-repo-bck-20210522210400   195M      -  33.6G  -
default/containers/repo@snapshot-repo-bck-20210523193900   159M      -  37.5G  -
default/containers/repo@snapshot-repo-bck-20210524203000   166M      -  33.7G  -

  2. Created a new snapshot.

NAME                                                       USED  AVAIL  REFER  MOUNTPOINT
default/containers/repo                                   62.1G   151G  33.7G  /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210518081400  8.04G      -  33.4G  -
default/containers/repo@snapshot-repo-bck-20210520162800   259M      -  33.5G  -
default/containers/repo@snapshot-repo-bck-20210522210400   195M      -  33.6G  -
default/containers/repo@snapshot-repo-bck-20210523193900   159M      -  37.5G  -
default/containers/repo@snapshot-repo-bck-20210524203000   166M      -  33.7G  -
default/containers/repo@snapshot-repo-bck-20210525220500  19.3M      -  33.7G  -

Day 15:

  1. The container USED size increased by ~4GB from yesterday.

NAME                                                       USED  AVAIL  REFER  MOUNTPOINT
default/containers/repo                                   66.2G   147G  33.8G  /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210518081400  8.04G      -  33.4G  -
default/containers/repo@snapshot-repo-bck-20210520162800   259M      -  33.5G  -
default/containers/repo@snapshot-repo-bck-20210522210400   195M      -  33.6G  -
default/containers/repo@snapshot-repo-bck-20210523193900   159M      -  37.5G  -
default/containers/repo@snapshot-repo-bck-20210524203000   166M      -  33.7G  -
default/containers/repo@snapshot-repo-bck-20210525220500   172M      -  33.7G  -

  2. Removed the oldest snapshot, 0518, and created a new snapshot. The container size immediately decreased by ~8GB, which is about the size of the deleted snapshot. The remaining oldest snapshot, 0520, saw its size jump from 259M to 8.04G. The latest snapshot, 0526, is slowly growing in size.

NAME                                                       USED  AVAIL  REFER  MOUNTPOINT
default/containers/repo                                   58.2G   155G  33.8G  /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210520162800  8.04G      -  33.5G  -
default/containers/repo@snapshot-repo-bck-20210522210400   195M      -  33.6G  -
default/containers/repo@snapshot-repo-bck-20210523193900   159M      -  37.5G  -
default/containers/repo@snapshot-repo-bck-20210524203000   166M      -  33.7G  -
default/containers/repo@snapshot-repo-bck-20210525220500   172M      -  33.7G  -
default/containers/repo@snapshot-repo-bck-20210526220200  14.0M      -  33.8G  -

You say the container grew by 4GB today, but the snapshot size difference between today and yesterday is only 14MB. That does not make sense unless you have a growing file in a directory that ZFS does not snapshot.

Forget the snapshots, forget everything else, and break the problem down. Go inside the container, find out the disk usage and track the files. Then check the size again tomorrow: if it differs by 4GB, compare the two states (see where files are being added or what is growing). Otherwise, I think we will reach Day 365…
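The "track the files, then compare tomorrow" approach can be sketched in Python: record an inventory of large files once, save it, and diff it against a fresh inventory the next day. This is a hypothetical helper, not an existing tool; the 1 MiB threshold and the JSON file name are arbitrary choices. Comparing per-file lists rather than directory totals matters: a job that writes one large file and deletes another of the same size leaves every total unchanged, but still shows up here as one added and one removed file.

```python
import json
import os
import stat


def inventory(root, min_bytes=1 << 20):
    """Map path (relative to root) -> size for regular files of at least min_bytes."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_size >= min_bytes:
                files[os.path.relpath(path, root)] = st.st_size
    return files


def save(files, path):
    with open(path, "w") as fh:
        json.dump(files, fh, indent=1, sort_keys=True)


def load(path):
    with open(path) as fh:
        return json.load(fh)


def _largest_first(item):
    path, nbytes = item
    return (-abs(nbytes), path)


def compare(old, new):
    """Return (added, removed, resized) lists of (path, bytes), largest change first."""
    added = sorted(((p, new[p]) for p in new.keys() - old.keys()), key=_largest_first)
    removed = sorted(((p, old[p]) for p in old.keys() - new.keys()), key=_largest_first)
    resized = sorted(
        ((p, new[p] - old[p]) for p in new.keys() & old.keys() if new[p] != old[p]),
        key=_largest_first,
    )
    return added, removed, resized
```

For example, call save(inventory("/"), "inventory-day1.json") on the first day, then compare(load("inventory-day1.json"), inventory("/")) on the next; pointing it at a narrower directory keeps the walk fast.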

Sorry, another issue (which I will mention later) popped up, so I missed updating and replying to your post yesterday. Yes, I will follow your suggestion to go inside and take a look at the file changes. But this 4GB container size increase is most likely related to the snapshots.

I had actually planned to observe for a few more days before concluding and closing this thread, but let me summarize all the findings here: @stgraber @Jimbo

  1. zfs list reports two sizes for a container: USED and REFER. REFER is the size of the current container. USED is the size of the current container plus the size of all its snapshots. In short, USED = REFER + SNAPSHOT. You can’t see this holding in my previous posts, and I will explain why.
  2. Every morning, the container USED size increases by 4GB, while the REFER size mostly does not change (with a few exceptions I will explain later), and no other size changes are reported. The 4GB increase is actually added to the latest snapshot and stays hidden. If I don’t create a newer snapshot, the 4GB increment is added to the same latest snapshot every day. ZFS just doesn’t report this size increase, for an unknown reason.
  3. ZFS eventually reflects the true size, but only for the oldest snapshot. That’s why, whenever I deleted the oldest snapshot, the next oldest one saw its size suddenly jump to a large number that includes all its 4GB increments (depending on how many days that snapshot was the latest one).
  4. For example, look at Day 15 in my previous post: the oldest snapshot was created on 05/18 and the next one on 05/20. So snapshot 0518 remained the latest one for 2 days, which is why its size is 4GB x 2 = 8GB. For the same reason, snapshot 0520 is also 8GB, and the next 4 snapshots, from 0522 to 0525, are 4GB each. Now we can see that USED = REFER + SNAPSHOT also holds approximately: SNAPSHOT = 8 + 8 + 4 x 4 = 32GB and REFER is 33.8GB, which adds up to 65.8GB, close to the reported USED of 66.2GB.
  5. I went back over all of the previous 15 days and verified that this observation and conclusion hold.
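The arithmetic in point 4 can be reproduced mechanically. The Python sketch below encodes the model described above (one 4GB increment for every daily run that happens while a snapshot is the newest one) using the snapshot names from the Day 15 listing. The 02:02 run time is taken from the backup files found later in this thread, the observation time of the Day 15 listing (shortly before the 05/26 22:02 snapshot) is an assumption, and the function names are hypothetical. It checks the estimate in point 4, not the per-snapshot values zfs list actually printed.

```python
from datetime import datetime, time, timedelta

PREFIX = "snapshot-repo-bck-"


def snapshot_time(name):
    """'snapshot-repo-bck-20210518081400' -> datetime(2021, 5, 18, 8, 14)."""
    return datetime.strptime(name[len(PREFIX):], "%Y%m%d%H%M%S")


def daily_runs(start, end, at=time(2, 2)):
    """Number of times a job scheduled every day at `at` fires in (start, end]."""
    run = datetime.combine(start.date(), at)
    if run <= start:
        run += timedelta(days=1)
    count = 0
    while run <= end:
        count += 1
        run += timedelta(days=1)
    return count


def estimated_sizes(names, observed_at, gb_per_run=4.0):
    """GB per snapshot if each daily increment lands in the newest snapshot."""
    ordered = sorted(names)  # fixed-width timestamps sort chronologically
    starts = [snapshot_time(n) for n in ordered]
    ends = starts[1:] + [observed_at]  # a snapshot is newest until the next one
    return {n: gb_per_run * daily_runs(s, e) for n, s, e in zip(ordered, starts, ends)}


day15 = [PREFIX + ts for ts in ("20210518081400", "20210520162800", "20210522210400",
                                "20210523193900", "20210524203000", "20210525220500")]
# Assumption: the Day 15 listing was taken shortly before the 0526 22:02 snapshot.
sizes = estimated_sizes(day15, observed_at=datetime(2021, 5, 26, 22, 2))
snapshot_total = sum(sizes.values())
print(sizes, snapshot_total, 33.8 + snapshot_total)
```

It gives 8, 8, 4, 4, 4 and 4GB for snapshots 0518 through 0525, i.e. 32GB in total, and 33.8 + 32 = 65.8GB, within about 0.4GB of the reported 66.2GB.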

I think I will stop here unless anything contradicts my conclusion. Just a few more thoughts:

  1. The issue is real, but it seems I am the only one having it.
  2. The REFER size also occasionally changes suddenly by 4GB (3 increases and 1 decrease), which is why I will look into the container itself.
  3. I can reach a dynamic balance as long as I create and delete a snapshot every day. This is not a solution, but it helps contain the problem.
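The create-one, delete-one routine in point 3 could be scripted so it runs unattended. Below is a minimal Python sketch that only builds the lxc commands (lxc snapshot to create, lxc delete container/snapshot to remove) while keeping the naming pattern used in this thread; the keep count and function names are hypothetical. As point 3 says, this only contains the growth and does not address its cause.

```python
from datetime import datetime

PREFIX = "snapshot-repo-bck-"  # naming pattern used in this thread


def snapshot_name(now):
    return PREFIX + now.strftime("%Y%m%d%H%M%S")


def rotation_commands(container, existing, now, keep=5):
    """Commands that add a new snapshot and drop the oldest ones beyond `keep`.

    `existing` holds the container's current snapshot names; only names with
    PREFIX are considered, so unrelated snapshots are never touched.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ours = sorted(n for n in existing if n.startswith(PREFIX))  # oldest first
    excess = max(len(ours) + 1 - keep, 0)  # +1 for the snapshot created below
    commands = [["lxc", "snapshot", container, snapshot_name(now)]]
    commands += [["lxc", "delete", f"{container}/{name}"] for name in ours[:excess]]
    return commands
```

Each returned command could be passed to subprocess.run(cmd, check=True) from a daily cron job or timer. Creating the new snapshot before deleting old ones means the container is never left without a recent snapshot.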

However, I just ran into another issue: I cannot delete a snapshot because of the following error:
Error: Failed to run: zfs destroy default/containers/repo@snapshot-repo-bck-20210520162800: cannot destroy snapshot default/containers/repo@snapshot-repo-bck-20210520162800: dataset is busy
Any suggestions? @stgraber @Jimbo Thank you!

I keep suggesting that you look at what is going on inside the container, as that is what causes the snapshots to change in size. Also, you keep saying the container grows in size, so again, that is why I say look inside. You can say it’s the snapshots’ fault, but they only record differences. Unless, as @stgraber suggested previously, you have some nested setup going on.

As for "dataset is busy": did you create the snapshot and then try to delete it before ZFS had finished? If it’s still stuck, restarting LXD has sometimes gotten me out of that situation.
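If the busy state is transient, as the question above suggests, retrying with a growing delay may be enough. This Python sketch is an assumption-based example, not an LXD feature. It wraps lxc delete container/snapshot, retries only when the error output contains "dataset is busy" (the text in the error quoted above, assumed to arrive on stderr), and gives up after a few attempts. At that point, restarting LXD as suggested remains the fallback. The run and sleep parameters exist only so the logic can be tested without a real LXD install.

```python
import subprocess
import time


def delete_snapshot_with_retry(container, snapshot, attempts=5, first_delay=2.0,
                               run=subprocess.run, sleep=time.sleep):
    """Run `lxc delete CONTAINER/SNAPSHOT`, backing off while the dataset is busy.

    Returns True on success, False if the dataset stayed busy on every attempt.
    Any other error is raised immediately instead of being retried.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        result = run(["lxc", "delete", f"{container}/{snapshot}"],
                     capture_output=True, text=True)
        if result.returncode == 0:
            return True
        err = result.stderr or ""
        if "dataset is busy" not in err:
            raise RuntimeError(err.strip() or f"lxc exited with {result.returncode}")
        if attempt < attempts:
            sleep(delay)  # give ZFS time to release the dataset
            delay *= 2
    return False
```

For example, delete_snapshot_with_retry("repo", "snapshot-repo-bck-20210520162800") makes up to five attempts, waiting 2, 4, 8 and 16 seconds in between.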


Hello @Jimbo, no, I don’t have nested LXD.

And you are right. I may have found the reason when I finally looked inside the container.

I found a folder, gitlab/backup, that contains 7 tar files, one created at 02:02 AM on each of the last 7 days. Each file is around 4GB.

It appears there is a cron job that creates a new backup and deletes an old one every day. This explains why the REFER size is not changing. It also explains why it is the latest snapshot that absorbs the 4GB increase every day. The only tricky part is that ZFS doesn’t tell us until the snapshot becomes the oldest one.
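This behaviour is consistent with how the ZFS documentation defines a snapshot’s USED property. It counts only space referenced exclusively by that snapshot, and when a snapshot is destroyed, space it shared can become unique to the neighbouring snapshots. The Python sketch below is a deliberately simplified model of that rule, not ZFS’s implementation: it works on whole files instead of blocks, ignores compression, and uses made-up sizes and day numbers. It shows how a rotating backup that is deleted inside the container stays out of every snapshot’s USED until only one snapshot still references it.

```python
import math


def present(born, freed, t):
    """Data written at `born` and deleted at `freed` exists at time t."""
    return born <= t < freed


def space_report(files, snapshots, now):
    """Toy version of ZFS space accounting, per file instead of per block.

    files: (size_gb, born, freed) tuples; freed=math.inf means never deleted.
    snapshots: snapshot times. now: time of the live dataset.
    A snapshot's USED counts only data that it alone still references.
    """
    refer = sum(size for size, born, freed in files if present(born, freed, now))
    usedsnap = 0.0
    snapshot_used = {s: 0.0 for s in snapshots}
    for size, born, freed in files:
        if present(born, freed, now):
            continue  # still in the live dataset, so part of REFER
        holders = [s for s in snapshots if present(born, freed, s)]
        if holders:
            usedsnap += size  # kept alive only by snapshots
        if len(holders) == 1:
            snapshot_used[holders[0]] += size  # unique to one snapshot
    return {"REFER": refer, "USEDSNAP": usedsnap, "USED": refer + usedsnap,
            "snapshot_USED": snapshot_used}


# Hypothetical scenario: 30 GB of ordinary data plus one 4 GB backup per day
# (written on day d, deleted on day d + 7, so 7 backups are always present),
# with snapshots taken on the evenings of days 0, 2 and 4.
files = [(30.0, -100, math.inf)] + [(4.0, d, d + 7) for d in range(-10, 6)]
before = space_report(files, [0.9, 2.9, 4.9], now=5.5)
after = space_report(files, [2.9, 4.9], now=5.5)  # oldest snapshot destroyed
```

Before the deletion, the model gives REFER 58, USEDSNAP 20 and USED 78 (GB), yet the three snapshots report USED of 8, 0 and 0: 12GB of deleted backups is shared by two or more snapshots and shows up in none of them. After destroying the oldest snapshot, USED drops by exactly that snapshot’s 8GB, and the next snapshot’s USED jumps from 0 to 8.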

Anyway, thank you @Jimbo for pushing me to get to the bottom of this. Really appreciate it!
Also, thank you @stgraber for all your help with every one of my questions.