Container size keeps growing and eating up disk space every day

Day 5:

  1. Without doing anything special, the container size increased by 4GB again compared with yesterday.
    NAME USED AVAIL REFER MOUNTPOINT
    default/containers/repo 65.0G 149G 26.0G /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
    default/containers/repo@snapshot-repo-bck-20210507021418 4.35G - 35.5G -
    default/containers/repo@snapshot-repo-bck-20210508021416 219M - 35.5G -
    default/containers/repo@snapshot-repo-bck-20210510140352 4.15G - 39.4G -
    default/containers/repo@snapshot-repo-bck-20210512114000 265M - 27.1G -
    default/containers/repo@snapshot-repo-bck-20210514012200 263M - 26.7G -
  2. After deleting the oldest snapshot 0507, the container size immediately reduced by around 4.3G, which looks like just the USED size of that snapshot. Interestingly, the USED size of snapshot 0508 increased from 219M to 721M.
    NAME USED AVAIL REFER MOUNTPOINT
    default/containers/repo 60.7G 153G 26.0G /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
    default/containers/repo@snapshot-repo-bck-20210508021416 721M - 35.5G -
    default/containers/repo@snapshot-repo-bck-20210510140352 4.15G - 39.4G -
    default/containers/repo@snapshot-repo-bck-20210512114000 265M - 27.1G -
    default/containers/repo@snapshot-repo-bck-20210514012200 263M - 26.7G -

Yeah, that’s always the weird thing with snapshots: they make calculating disk usage very tricky because they hold on to the past, so anything that changes after a snapshot causes data duplication. When you have multiple snapshots at play, two snapshots taken in short succession may show up as one large one and one tiny one, but as soon as you delete the large one, the tiny one will grow, as it is now the one holding onto that older data.
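
If it helps, the same breakdown can be read directly from ZFS with the space view of zfs list, which splits a dataset's usage into what the live filesystem holds versus what is only pinned by snapshots (a quick sketch, using the dataset name from your output):

zfs list -o space default/containers/repo
# columns: NAME AVAIL USED USEDSNAP USEDDS USEDREFRESERV USEDCHILD
# USEDSNAP is the space that would be freed if all snapshots were destroyed;
# it is usually larger than the sum of the individual snapshots' USED values.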

Day 6:

  1. As usual, the container size increased by 4GB again, and so the available space in the ZFS pool dropped by 4GB compared with yesterday. Another noticeable change is the REFER size of the container: it had stayed around ~27GB for the last 5 days and jumped to ~30G after the oldest snapshot 0507 was deleted yesterday. There are no other size changes.

NAME USED AVAIL REFER MOUNTPOINT
default/containers/repo 64.6G 149G 29.9G /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210508021416 721M - 35.5G -
default/containers/repo@snapshot-repo-bck-20210510140352 4.15G - 39.4G -
default/containers/repo@snapshot-repo-bck-20210512114000 265M - 27.1G -
default/containers/repo@snapshot-repo-bck-20210514012200 263M - 26.7G -

  2. Then I removed the existing oldest snapshot 0508. The container size immediately reduced by around 800MB, which looks like just the USED size of the deleted snapshot. Very interestingly, the remaining oldest snapshot 0510 had its USED size jump from 4.15G to 16.6G (a dry-run destroy, sketched after the listing below, can preview this kind of reclaim). Hmm~~~!

NAME USED AVAIL REFER MOUNTPOINT
default/containers/repo 63.8G 150G 29.9G /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210510140352 16.6G - 39.4G -
default/containers/repo@snapshot-repo-bck-20210512114000 265M - 27.1G -
default/containers/repo@snapshot-repo-bck-20210514012200 263M - 26.7G -
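
As a side note, one way to see ahead of time how much space deleting a given snapshot would actually free is ZFS's dry-run destroy (a sketch against the snapshots above; nothing is removed when -n is given):

# -n = dry run, -v = print what would be destroyed and how much space would be reclaimed
zfs destroy -nv default/containers/repo@snapshot-repo-bck-20210510140352
# a range oldest%newest previews reclaiming several snapshots at once
zfs destroy -nv default/containers/repo@snapshot-repo-bck-20210510140352%snapshot-repo-bck-20210512114000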

Day 7:

  1. The container size increased by 3.2GB instead of 4GB compared with yesterday. The REFER size of the container increased by 3.5G. There are no other size changes.

NAME USED AVAIL REFER MOUNTPOINT
default/containers/repo 67.8G 146G 33.4G /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210510140352 16.6G - 39.4G -
default/containers/repo@snapshot-repo-bck-20210512114000 265M - 27.1G -
default/containers/repo@snapshot-repo-bck-20210514012200 263M - 26.7G -

  2. Created a new snapshot.

NAME USED AVAIL REFER MOUNTPOINT
default/containers/repo 67.8G 146G 33.4G /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210510140352 16.6G - 39.4G -
default/containers/repo@snapshot-repo-bck-20210512114000 265M - 27.1G -
default/containers/repo@snapshot-repo-bck-20210514012200 263M - 26.7G -
default/containers/repo@snapshot-repo-bck-20210518081400 10.6M - 33.4G -

  3. Removed the existing oldest snapshot 0510. The container size immediately reduced by 16.6GB, which is exactly the USED size of the deleted snapshot. The remaining oldest snapshot 0512 had its USED size jump from 265M to 4.61G. The latest snapshot 0518 is slowly growing in size.

NAME USED AVAIL REFER MOUNTPOINT
default/containers/repo 51.2G 162G 33.4G /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210512114000 4.61G - 27.1G -
default/containers/repo@snapshot-repo-bck-20210514012200 263M - 26.7G -
default/containers/repo@snapshot-repo-bck-20210518081400 11.3M - 33.4G -

Day 8:

  1. The container size increased by 4GB compared with yesterday, again. This 4GB increase, just like on the previous 7 days, is not reflected in any snapshot or anywhere else. I think the issue is real.

NAME USED AVAIL REFER MOUNTPOINT
default/containers/repo 55.3G 157G 33.5G /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210512114000 4.61G - 27.1G -
default/containers/repo@snapshot-repo-bck-20210514012200 263M - 26.7G -
default/containers/repo@snapshot-repo-bck-20210518081400 214M - 33.4G -

What are you running in the container? What cron jobs do you have running inside it?

Only GitLab is running in the container. And I am not running any cron jobs, either in the container or on the host machine.

  1. What about Docker images? Look at using docker image prune to remove unused images.
  2. Check your log level; maybe it is too high.
  3. Are you storing artifacts from jobs (https://docs.gitlab.com/ee/ci/pipelines/job_artifacts.html)? This can eat a lot of disk space. (A quick check for the last two points is sketched after this exchange.)
  1. I didn’t download any Docker images. The only thing I installed is LXD, and then I copied the containers over from the old server.
  2. How do I check the log level? In any case, I am pretty sure the log level is at its default, since I never changed it.
  3. I am not storing any artifacts from any jobs.
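
A quick way to sanity-check the log and artifact usage from inside the container, assuming a default Omnibus GitLab layout (adjust the paths if your install is customized), is:

# CI job artifacts (default Omnibus path)
du -sh /var/opt/gitlab/gitlab-rails/shared/artifacts
# GitLab's own logs, in case a log level is set too high
du -sh /var/log/gitlab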

Day 9:

  1. The container size increased by ~4GB from yesterday.

NAME USED AVAIL REFER MOUNTPOINT
default/containers/repo 59.2G 153G 33.5G /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210512114000 4.61G - 27.1G -
default/containers/repo@snapshot-repo-bck-20210514012200 263M - 26.7G -
default/containers/repo@snapshot-repo-bck-20210518081400 271M - 33.4G -

  2. Created a new snapshot.

NAME USED AVAIL REFER MOUNTPOINT
default/containers/repo 59.3G 153G 33.5G /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210512114000 4.61G - 27.1G -
default/containers/repo@snapshot-repo-bck-20210514012200 263M - 26.7G -
default/containers/repo@snapshot-repo-bck-20210518081400 271M - 33.4G -
default/containers/repo@snapshot-repo-bck-20210520162800 96.5M - 33.5G -

What is the usage like inside the container itself?

What about large files that are on there?

du -ah / | sort -n -r | head -n 50

What about the directory /var/opt/gitlab/prometheus/data?

The command du -ah / | sort -n -r | head -n 50 took forever to complete.
However, I tried the other command du -sch --one-file-system / inside the container several days ago and again just now. The output always matches the REFER size of the container, which is 34G today. FYI, the REFER size of the container in my last post today is 33.5G.
The directory /var/opt/gitlab/prometheus/data looks normal too; its entire size is 273M.

The REFER size of the container is quite stable, relatively speaking, and it doesn’t account for the 4GB increase every day. Instead, the 4GB increase is counted towards the USED size of the container, where USED includes the sizes of the snapshots. That’s the reason I think the issue is not related to the application running inside the container; rather, it seems to be an issue with how LXD manages the container’s data and its snapshots.
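
That split can also be confirmed from the ZFS side with the usedbysnapshots and written properties (a sketch; written@<snapshot> reports data written to the live filesystem since that snapshot was taken):

# portion of USED that is held only by snapshots
zfs get usedbysnapshots default/containers/repo
# new data written since the most recent snapshot from the listing above
zfs get written@snapshot-repo-bck-20210520162800 default/containers/repo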

How many snapshots does that container have? When were they created?

I have been posting the sizes of the container and its snapshots every day for the last week.
In my latest post, I just created a snapshot, and there are 4 snapshots now. Every snapshot has its creation date/time embedded in its name. It should not be hard or time-consuming to read.
I also put down my observations. Please let me know what you think. Thanks a lot!

Your snapshots are increasing because your container is increasing in size.

I did see what you posted above, but I wanted to double-check. It’s like when you said there are no cron jobs running: GitLab does run cron jobs, I’m pretty sure of it. When posting for support, I myself often shorten output that I think is irrelevant or repeated.

You said you don’t use Docker, but if you are using the GitLab Runner and it is using Docker containers, then an image prune might solve the issue.

It’s a little bit hard to help without asking questions.

I think you need to find out what is consuming the disk space in your container to solve this problem; maybe break it down by the directories with the largest disk usage.
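
For example, something along these lines stays on one filesystem and lists the largest directories first (a rough sketch):

# summarize two directory levels, largest first; errors from unreadable dirs are discarded
du -xh --max-depth=2 / 2>/dev/null | sort -rh | head -n 30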

@Jimbo, I am sorry if anything I said came across as offensive; I didn’t mean it that way. As your question regarding the snapshots could not be answered with a short summary, and I didn’t realize that you were double-checking, I wanted you to read the snapshot history that I had posted. On the other hand, I felt very sorry that I had to ask you to read long posts in order to help me. So let me rephrase the sentence you quoted; here is what I really meant. I hope it won’t be hard or time-consuming for you to read. I hope I have explained myself well, and please accept my apologies for any misunderstanding I introduced.

No problem. We have to find out what is increasing the space inside the container. Check what the devs are doing and uploading, and ask whether they have done anything different over the last few weeks, such as uploading large binaries to repos or using Docker images. Just run the docker command to see what’s dangling.
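
For example (only relevant if Docker is actually installed inside the container):

# list dangling (untagged) images
docker images -f dangling=true
# overall Docker disk usage breakdown
docker system df
# remove dangling images
docker image prune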

  1. Cron jobs: I checked the cron jobs using cat /etc/crontab both on the host machine and inside the container. The outputs are the same, as shown below. As for GitLab cron jobs, if there are any, they must be the defaults; we didn’t set up any special cron jobs for GitLab. (A fuller cron check is sketched after this list.)
    17 * * * * root cd / && run-parts --report /etc/cron.hourly
    25 6 * * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
    47 6 * * 7 root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
    52 6 1 * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )

  2. Docker / GitLab Runner: we don’t use either.

  3. Devs: could you please show me how I can check what the devs are doing and uploading? Sorry, I am not familiar with this. Thanks.
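
On the cron point: /etc/crontab by itself does not show per-user crontabs or the /etc/cron.d drop-ins, so a fuller check inside the container might look like this (run as root):

# per-user crontabs
for u in $(cut -d: -f1 /etc/passwd); do crontab -l -u "$u" 2>/dev/null; done
# system-wide drop-in cron jobs
ls /etc/cron.d/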

I meant ask them; I have not used GitLab locally.

Other than that, since there are only 3-4 snapshots and the container size is increasing, it’s really about seeing why the container itself is growing. I can’t think of anything else.

Oh, I thought you were talking about some background process called devs.
I checked the size of every repo in GitLab and didn’t notice anything large, let alone anything that would account for the 4GB consumed every day.
There was one thing they did differently: they integrated Jira with GitLab, and on that very same day my old server consumed all of its remaining free space and crashed. It could be a coincidence, because the free space on the old server could have been used up earlier than that. Anyway, I found this out a couple of days after the crash and then disabled the integration. It didn’t help with my problem here.

Day 10:

  1. The container size increased by ~4GB from yesterday.

NAME USED AVAIL REFER MOUNTPOINT
default/containers/repo 63.4G 149G 33.5G /var/snap/lxd/common/lxd/storage-pools/default/containers/repo
default/containers/repo@snapshot-repo-bck-20210512114000 4.61G - 27.1G -
default/containers/repo@snapshot-repo-bck-20210514012200 263M - 26.7G -
default/containers/repo@snapshot-repo-bck-20210518081400 271M - 33.4G -
default/containers/repo@snapshot-repo-bck-20210520162800 197M - 33.5G -