Disk space on LXD containers with a ZFS pool keeps increasing and does not decrease

I have a problem with disk space on an LXD host server. Two containers are running on this server: one is a proxy server, the other hosts sites. Both containers use the default ZFS pool, which has a total capacity of 15GB. Each container has its own disk limit: proxy 3GB, sites 12GB. Snapshots and tar archives are taken of these two containers every day, and each snapshot has an expiration of 7 days. In the evening, copies of the containers are transferred from the production server to a backup server with lxc copy.
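Roughly, the kind of daily job described above can be expressed with LXD's built-in snapshot settings plus a copy to the remote (a sketch only; the actual script is not shown in this thread, and the office: remote name is taken from later posts):

lxc config set sites snapshots.expiry 7d
lxc config set proxy snapshots.expiry 7d
lxc snapshot sites
lxc copy --mode=push --refresh sites office:sites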

My problem is that the containers on the production server are running out of disk space.

Default storage pool space:
space used: 11.93GiB
total space: 14.41GiB

sites root disk usage: 10.23GiB
proxy root disk usage: 1.68GiB

Name: sites
Status: RUNNING
Type: container
Architecture: x86_64
PID: 15278
Created: 2022/09/05 13:59 UTC
Last Used: 2022/12/01 19:04 UTC

Resources:
Processes: 120
Disk usage:
root: 10.24GiB
CPU usage:
CPU usage (in seconds): 100563
Memory usage:
Memory (current): 384.51MiB
Memory (peak): 1.40GiB
Network usage:
eth0:
Type: broadcast
State: UP
Host interface: veth08317273
MAC address: 00:16:3e:79:76:39
MTU: 1500
Bytes received: 5.56GB
Bytes sent: 14.49GB
Packets received: 34152548
Packets sent: 29411480
IP addresses:
inet: 10.148.154.110/24 (global)
inet6: fd42:dc6b:5124:630c:216:3eff:fe79:7639/64 (global)
inet6: fe80::216:3eff:fe79:7639/64 (link)
lo:
Type: loopback
State: UP
MTU: 65536
Bytes received: 158.58MB
Bytes sent: 158.58MB
Packets received: 1527285
Packets sent: 1527285
IP addresses:
inet: 127.0.0.1/8 (local)
inet6: ::1/128 (local)

Snapshots:
+-----------------------+----------------------+----------------------+----------+
|         NAME          |       TAKEN AT       |      EXPIRES AT      | STATEFUL |
+-----------------------+----------------------+----------------------+----------+
| snapshot-1111111023-0 | 2023/01/11 01:19 UTC | 2023/01/18 01:19 UTC | NO       |
+-----------------------+----------------------+----------------------+----------+
| snapshot-1212121023-0 | 2023/01/12 01:19 UTC | 2023/01/19 01:19 UTC | NO       |
+-----------------------+----------------------+----------------------+----------+
| snapshot-1313131023-0 | 2023/01/13 01:19 UTC | 2023/01/20 01:19 UTC | NO       |
+-----------------------+----------------------+----------------------+----------+
| snapshot-1414141023-0 | 2023/01/14 01:19 UTC | 2023/01/21 01:19 UTC | NO       |
+-----------------------+----------------------+----------------------+----------+
| snapshot-1515151023-0 | 2023/01/15 01:19 UTC | 2023/01/22 01:19 UTC | NO       |
+-----------------------+----------------------+----------------------+----------+
| snapshot-1616161023-0 | 2023/01/16 01:19 UTC | 2023/01/23 01:19 UTC | NO       |
+-----------------------+----------------------+----------------------+----------+

NAME USED AVAIL REFER MOUNTPOINT
default/containers/sites@migration-93f59a63-f099-422d-9b5d-a3c207729411 296M - 2.50G -
default/containers/sites@migration-e2d3e821-8dde-45ad-bd01-4b01699598af 82.6M - 2.64G -
default/containers/sites@migration-61aa56db-39d2-4e03-81b4-3c4eea95ee83 7.88M - 2.38G -
default/containers/sites@migration-27946a23-f10b-4e96-966b-710a5df1108b 7.96M - 2.51G -
default/containers/sites@migration-16d62f87-8483-42cd-8d97-4b63fede8a1a 77.5M - 2.64G -
default/containers/sites@migration-e5edfa48-1429-4f01-af09-7e0ffad6db08 76.6M - 2.77G -
default/containers/sites@migration-fcae50bb-cbe1-43b4-9592-aa0cc46a7dae 73.9M - 2.90G -
default/containers/sites@migration-10ac5b99-39e8-4a1e-9894-d24cbf9a6a55 74.8M - 3.03G -
default/containers/sites@migration-6ad928f3-05dc-4bf0-8513-4699585be238 73.9M - 3.16G -
default/containers/sites@migration-f96c2d63-8bbc-4e24-9b15-02a4ca7e8b1b 68.2M - 3.16G -
default/containers/sites@migration-ff54f0b7-63e6-4f8d-8719-9e2b05bc8ed2 67.6M - 3.29G -
default/containers/sites@migration-95a473e1-b3ac-433b-af39-716d8f5ea567 205M - 3.42G -
default/containers/sites@snapshot-snapshot-1111111023-0 79.6M - 5.27G -
default/containers/sites@snapshot-snapshot-1212121023-0 77.4M - 5.40G -
default/containers/sites@snapshot-snapshot-1313131023-0 82.1M - 5.53G -
default/containers/sites@snapshot-snapshot-1414141023-0 72.1M - 5.67G -
default/containers/sites@snapshot-snapshot-1515151023-0 13.0M - 4.99G -
default/containers/sites@snapshot-snapshot-1616161023-0 11.2M - 5.12G -
default/containers/proxy@migration-b52819ad-22db-4a3f-880e-40df55f79ab9 62.6M - 528M -
default/containers/proxy@migration-21bb0807-3c9c-45b4-b692-d93969fffcf2 62.0M - 528M -
default/containers/proxy@migration-34d47244-ee34-469f-a59a-b51d49a5108c 63.9M - 528M -
default/containers/proxy@migration-9f4d3ff8-f55d-4d15-aa11-d84dea659bb3 58.3M - 528M -
default/containers/proxy@migration-d48226e2-3c87-4dcd-99ae-5b5dbd561430 58.7M - 528M -
default/containers/proxy@migration-9f5ac97f-b523-4b51-80de-1bec3ad7f702 2.62M - 529M -
default/containers/proxy@migration-c1f77c64-988b-43ee-8186-61694c0ba571 2.65M - 529M -
default/containers/proxy@migration-90c4bf05-a086-4db2-af10-63572afd9386 73.0M - 530M -
default/containers/proxy@migration-c4124144-b354-4cee-a2f2-05ab97e86109 70.9M - 536M -
default/containers/proxy@migration-7c8f440b-a58a-4850-8657-ac9fded3f126 66.3M - 531M -
default/containers/proxy@migration-55dc2f94-f770-4fbe-a1d0-f9d65deb7b5a 56.1M - 529M -
default/containers/proxy@migration-4ff8cebb-6ed2-4183-9a6c-d7c8b27d6e92 56.6M - 529M -
default/containers/proxy@snapshot-snapshot-1111111023-0 74.6M - 530M -
default/containers/proxy@snapshot-snapshot-1212121023-0 2.14M - 531M -
default/containers/proxy@snapshot-snapshot-1313131023-0 2.51M - 533M -
default/containers/proxy@snapshot-snapshot-1414141023-0 75.0M - 537M -
default/containers/proxy@snapshot-snapshot-1515151023-0 57.2M - 539M -
default/containers/proxy@snapshot-snapshot-1616161023-0 57.1M - 537M -

When I deleted all the snapshots, the disk space did not decrease. The problem is not inside the container: I have of course limited the log levels and have repeatedly searched for problems inside, but I do not find anything that is consuming the disk space.

The only thing I noticed is these entries that are created in the ZFS file system:

default/containers/proxy@migration-55dc2f94-f770-4fbe-a1d0-f9d65deb7b5a

default/containers/sites@migration-93f59a63-f099-422d-9b5d-a3c207729411 296M - 2.50G -
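A listing limited to these leftover entries can presumably be produced with something along these lines (assuming the pool is named default, as in the output above):

zfs list -t snapshot -o name,used,refer -r default | grep '@migration-'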

Please help and advise: what are these entries, and how should I proceed? Can I delete them with zfs destroy, and how do I know which ones to delete?

There was a bug in LXD that left behind those optimized volume @migration- snapshots when doing a migration.

So they can be manually deleted.

What LXD version are you using?


Thanks for the fast answer, @tomp. I want to ask how to fix this bug, or if that is not possible, how to determine which @migration files to delete. I guess I can do it with zfs destroy.

$ sudo lxd --version
[sudo] password for sites:
5.10

You can delete all of the @migration ones. The current LXD shouldn't be leaving any more of them behind.


How do I do that?

The ZFS and BTRFS storage drivers were already not doing optimised transfers in the final stage of multi-sync mode (just returning nil), which caused the ZFS temporary snapshots to be left behind. So make this invocation type an error, and detect the use of non-optimised transfer mode earlier to avoid using MultiSync=true in the first place.

This then resolves the issue of the temporary snapshots not being cleaned up on the source.
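For the manual cleanup itself, something along these lines should work (a sketch; list the snapshots first, check the names, then destroy them, and adjust the pool name if yours is not default):

zfs list -H -t snapshot -o name -r default | grep '@migration-'
zfs list -H -t snapshot -o name -r default | grep '@migration-' | xargs -n1 sudo zfs destroy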

When I deleted all the @migration files, I could not copy the container to the backend server:

$ lxc copy --mode=push --refresh --stateless easyliving office:easyliving --verbose
Error: Failed creating instance record: Unknown configuration key: volatile.last_state.ready
What can I do now?

I think that is an unrelated error; it suggests the target server is older than the source, see:

It really was an older version of LXD, 5.0.2. The front-end server is LXD 5.10 on Ubuntu 18.04.6 LTS and the backup server is LXD 5.0.2 on Ubuntu 22.04.1 LTS.

I successfully transferred one container, but the other is giving an error and I no longer know what the problem is.

lxc copy proxy office:proxy-t

Error: Failed instance creation: Error transferring instance data: migration dump failed
(00.210335) Error (criu/sk-netlink.c:77): netlink: The socket has data to read
(00.210371) Error (criu/cr-dump.c:1635): Dump files (pid: 3372) failed with -1
(00.233758) Error (criu/cr-dump.c:2053): Dumping FAILED.

or

$ lxc copy --mode=push --refresh --stateless proxy office:proxy --verbose

Error: User signaled us three times, exiting. The remote operation will keep running
$

Hi, it's doing nothing.

Live migration for containers doesn’t work currently.

You should disable CRIU on both systems using:

sudo snap unset lxd criu.enable
sudo systemctl reload snap.lxd.daemon

Separately it looks like --stateless and --refresh aren’t working when combined.
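To confirm the setting afterwards, querying the snap option should help (once the key has been unset, snap get will report that the option is not set):

sudo snap get lxd criu.enable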

I executed the command lxc copy --mode=push --refresh --stateless easyliving office:easyliving --verbose but it does not copy the container to the remote server. It still does nothing, only a blank screen, so I stopped it and executed

lxc copy --mode=push proxy office:proxy --verbose

This time I see it creates a container with the same name on the remote server (the previous command did not even create the name), but apparently the command does not finish for some reason. I stopped the command after 30 minutes of waiting. I logged into the remote server where the container should be copied and tried to start the container, but it would not start. I deleted the newly created container from the lxc copy --mode=push proxy office:proxy --verbose attempt.

Then I ran the command
lxc copy --mode=push --refresh --stateless proxy office:proxy --verbose
which completed successfully.

Does it work from an LXD 5.0.2 server to an LXD 5.0.2 backup server?

We don’t generally support migrating to an older server.

Copying from a newer to an older server worked perfectly (from 5.10 to 5.0.2), but the problem was those @migration files.
Now I have updated the backup server and brought the versions in line, but after deleting the migration files I could not copy from the primary to the secondary server. When I typed the command
lxc copy --mode=push --refresh --stateless proxy office:proxy --verbose
it did not do anything; it did not even create the name on the backend server. I removed --refresh --stateless from the command:
lxc copy --mode=push proxy office:proxy --verbose
That created the container name on the backend server, but it did not copy the container. I waited about 30 minutes to see whether the command would complete. After seeing that there was no result, I stopped the command, deleted the name of the container from the backend server, and restarted the copy with
lxc copy --mode=push --refresh --stateless proxy office:proxy --verbose
The container was then copied successfully.
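For reference, the sequence that ended up working here was roughly the following (the removal of the half-created target is shown as a hypothetical remote-side command; adjust names to your own setup):

lxc delete office:proxy
lxc copy --mode=push --refresh --stateless proxy office:proxy --verbose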