lxc export is painfully slow on large containers and, worse, it eats up storage on both the local disk and the zpool and never releases it. This needs to be fixed! Is there a better way?

First, the problem. I have 4 servers with hundreds of containers that are in production. I am not playing around here. I need to be able to back up my containers without it becoming an issue or causing problems.

Well, using lxc export does the following. On small containers under 10 GB it seems to work OK, though it takes way too long, over 15 minutes in some cases. I have all the small containers on one server and they seem to work best.

The problem is the bigger containers over 50 GB. Not only do they take a couple of hours, but many never finish. The export command seems to eat up local storage in massive amounts, and zpool storage too. The main problem is local storage: whether the export finishes or crashes, it leaves temp files behind somewhere and does not remove them.

If I start out with 200 GB free on local and 300 GB free on the zpool, after a backup attempt I am left with 50 GB free on local and 150 GB free on the zpool. Basically I am in a critical situation, because the only container on this server is 50 GB and it needs room to grow, and I have found that containers crash without free space.

On one server I was able to free up 100 GB by going into /var/snap/lxd/common/lxd/backups and deleting a file from an aborted backup. It seems lxc export unpacks the container into a local directory and then compresses it into a file, which is a very time- and storage-intensive approach. There has to be a better way. Perhaps lxc export should compress the container inside the zpool before it exports.
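For anyone hunting for the same leftovers, here is a minimal sketch that only lists what is sitting in the backups directory, largest first. It assumes the snap install path mentioned above; the function name `list_leftovers` is made up for illustration, and nothing is deleted.

```shell
#!/bin/bash
# list_leftovers [DIR]
# Print the 20 largest entries (sizes in KiB) under an LXD backups directory.
# The default path is the snap location from this thread; pass another
# directory if LXD is installed differently. Read-only: nothing is removed.
list_leftovers() {
    local dir="${1:-/var/snap/lxd/common/lxd/backups}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # du -a lists files as well as directories, -k reports KiB;
    # sort -rn puts the biggest entries on top.
    du -ak "$dir" | sort -rn | head -n 20
}
```

Run `list_leftovers` as root, and check `lxc operation list` first so you do not remove a file that a running export is still writing.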

BTW, I am running these on massive dual-CPU servers with plenty of RAM and SSD drives, so performance is not a problem for any operation besides lxc export.

In a production environment, a system must not only run; you also have to back it up regularly. And yes, you can do RAID, which I have. But you have to have physical backups that work. It is unacceptable that you have to take down a container (a client) to back it up, and worse that it takes hours. This is something you have to be able to do every night, reliably and automatically, without fear.

So far I have tried three methods.

  1. Stopping the container and doing an lxc export to a hard drive that is neither the zpool drive nor the local SSD. Does not work well with large containers.
  2. Stopping the container and doing an lxc copy to another server. It takes a while but works; the problem is that if the cluster crashes, I still don’t have anything, and the container is still unusable.
  3. Mounting the container’s dataset from the zpool directly and copying the files to a hard drive via rsync. This is what I was doing before lxc export, but I am afraid it might damage the container or cause other stability issues. Right now it seems the best way.
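One way to make method 3 less scary is to rsync from a read-only ZFS snapshot instead of the live dataset. Below is a rough sketch, not a tested procedure. It assumes the LXD dataset layout `<pool>/containers/<name>` and that a snapshot can be mounted with `mount -t zfs` (ZFS on Linux). The mount point `/mnt/lxd-snap`, the snapshot name `rsync-backup`, and the function name are all hypothetical. A snapshot of a running container is only crash-consistent. By default the script only prints the commands.

```shell
#!/bin/bash
# RUN=echo (the default) prints each command instead of executing it.
# Set RUN to an empty string and run as root to really execute them.
RUN="${RUN-echo}"

# snapshot_rsync <pool> <container> <destination dir>
snapshot_rsync() {
    local pool="$1" container="$2" dest="$3"
    local snap="${pool}/containers/${container}@rsync-backup"
    local mnt="/mnt/lxd-snap/${container}"      # hypothetical mount point

    $RUN zfs snapshot "$snap" || return 1
    $RUN mkdir -p "$mnt" "$dest/$container" || return 1
    $RUN mount -t zfs -o ro "$snap" "$mnt" || return 1

    # -a archive mode, -H hard links, -A ACLs, -X extended attributes;
    # --numeric-ids keeps the uid/gid numbers exactly as stored.
    $RUN rsync -aHAX --numeric-ids --delete "$mnt/" "$dest/$container/"
    local rc=$?

    # Always clean up the mount and the temporary snapshot.
    $RUN umount "$mnt"
    $RUN zfs destroy "$snap"
    return "$rc"
}
```

For example, `snapshot_rsync zfs1 mycontainer /backup` prints the six commands it would run; the container never sees the mount, so the live dataset is not touched by rsync.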

Backing up a container (no matter its size) should not be a big deal, and it is something that should be done regularly.

Is there a better way?

What are these files all about? Can I delete them?

root@Q3:/var/snap/lxd/common/lxd/images# ls -l
total 183112
-rw------- 1 root root 784 Mar 3 19:49 8c4e87e53c024e0449003350f0b0626b124b68060b73c0a7ad9547670e00d4b3
-rw------- 1 root root 187502592 Mar 3 19:49 8c4e87e53c024e0449003350f0b0626b124b68060b73c0a7ad9547670e00d4b3.rootfs
root@Q3:/var/snap/lxd/common/lxd/images#

They are image files, they should have a matching entry in lxc image list.
If you don’t want the image anymore, use lxc image delete to delete it.

Are these some kind of orphan images? They don’t show up in lxc list images.

How do I recover my lost disk space? I had 150 GB before the backup attempt; now there is 67 GB. There are no temp files or image files hoarding space. Where could it be hiding?
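A quick way to see where the space went is to compare what the local filesystem and ZFS each report. This is only a sketch; it assumes the snap backups path from earlier in the thread, and `space_report` is a made-up name. Leftover snapshots are one possible culprit, not a confirmed one, and `zfs list -o space` breaks usage down per dataset into snapshots, the dataset itself, and children.

```shell
#!/bin/bash
# space_report [BACKUPS_DIR]
# Print local and ZFS space usage side by side. Read-only.
space_report() {
    local backups="${1:-/var/snap/lxd/common/lxd/backups}"

    echo "== local filesystem =="
    df -h /
    if [ -d "$backups" ]; then
        du -sh "$backups"          # total size of leftover export files
    fi

    echo "== ZFS =="
    if command -v zfs >/dev/null 2>&1; then
        # Per-dataset breakdown: AVAIL, USED, USEDSNAP, USEDDS, USEDCHILD ...
        zfs list -o space
        # Snapshots sorted by the space only they hold (largest last).
        zfs list -t snapshot -o name,used,creation -s used
    else
        echo "zfs command not found"
    fi
}
```

A large USEDSNAP value on a container dataset would suggest that a snapshot created during the export attempt is still holding the space.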

Can you show the output of lxd sql global "SELECT * FROM images;"?

I guess it shows Ubuntu 18.04 image, which is ok

lxd sql global "SELECT * FROM images;"
+----+------------------------------------------------------------------+-----------------------------------------------+----------------+--------+--------------+---------------------------+---------------------------+-------------------------------------+--------+-------------------------------------+-------------+------------+------+
| id | fingerprint                                                      | filename                                      | size           | public | architecture | creation_date             | expiry_date               | upload_date                         | cached | last_use_date                       | auto_update | project_id | type |
+----+------------------------------------------------------------------+-----------------------------------------------+----------------+--------+--------------+---------------------------+---------------------------+-------------------------------------+--------+-------------------------------------+-------------+------------+------+
| 1  | 8c4e87e53c024e0449003350f0b0626b124b68060b73c0a7ad9547670e00d4b3 | ubuntu-18.04-server-cloudimg-amd64-lxd.tar.xz | 1.87503376e+08 | 0      | 2            | 2020-02-17T19:00:00-05:00 | 2023-04-25T20:00:00-04:00 | 2020-03-03T19:48:27.039132839-05:00 | 1      | 2020-03-14T14:49:51.122221263-04:00 | 1           | 1          | 0    |
+----+------------------------------------------------------------------+-----------------------------------------------+----------------+--------+--------------+---------------------------+---------------------------+-------------------------------------+--------+-------------------------------------+-------------+------------+------+

Yeah but then it should also show up in lxc image list unless you’re using projects.

Yes, it shows up in lxc image list.

Could the problem be that the zpool needs some kind of packing?

Any luck with this? lxd sounds great, but I’m starting to get concerned that it does not run reliably in its preferred filesystem (ZFS).

It’s solid with ZFS, don’t be scared.

However, don’t expect the backups to be point-and-click like Proxmox; it takes a bit more creativity.

I find the easiest way is to use syncoid to do ZFS replication to another DR host.

I also find that a humble bash script is the best way to back up directly to a file without the pain of filling up the local disks with lxc export (btw, I think you can change the temp folder/disk). lxc export is fine for small containers, but for large ones I go with my bash script, something like this:

sudo /sbin/zfs send zfs1/containers/eve-ng@snapshot-snap13 | /usr/bin/mbuffer | /usr/bin/pigz -9 | /usr/bin/mbuffer > eve-ng.zfs.gz
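A dump like that is only useful if it can be fed back into a pool, so here is a hedged sketch of the round trip. The helper names (`dump_path`, `backup_snapshot`, `restore_snapshot`) and the `.zfs.gz` naming are my own choices for illustration; mbuffer is left out to keep it short. The dump is written to a `.partial` file first so that an aborted run never looks like a finished backup.

```shell
#!/bin/bash
# dump_path <dataset@snapshot> <target dir>
# Build the dump file name, e.g. eve-ng@snapshot-snap13 -> eve-ng_snapshot-snap13.zfs.gz
dump_path() {
    local snap="$1" dir="$2"
    local name="${snap##*/}"            # strip pool and parent datasets
    echo "${dir}/${name/@/_}.zfs.gz"    # replace the @ with an underscore
}

# backup_snapshot <dataset@snapshot> <target dir>
# Stream the snapshot through pigz into a gzip-compressed dump file.
backup_snapshot() {
    local snap="$1" out
    out="$(dump_path "$snap" "$2")"
    # pipefail makes a failing "zfs send" fail the whole pipeline,
    # so the rename only happens after a complete dump.
    ( set -o pipefail
      zfs send "$snap" | pigz -9 > "${out}.partial" ) && mv "${out}.partial" "$out"
}

# restore_snapshot <dump file> <target dataset>
# Decompress the dump and receive it as a new dataset.
restore_snapshot() {
    local file="$1" target="$2"
    ( set -o pipefail
      pigz -dc "$file" | zfs receive "$target" )
}
```

Usage would look like `backup_snapshot zfs1/containers/eve-ng@snapshot-snap13 /mnt/backup` (with `/mnt/backup` standing in for your backup disk) and later `restore_snapshot /mnt/backup/eve-ng_snapshot-snap13.zfs.gz backups/containers/eve-ng`. Note that receiving the dataset only restores the data in ZFS; LXD still has to be told about the container, which this sketch does not cover.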

and for replication to another zfs host, something like this:

#!/bin/bash

LOCALSTORAGEPOOL=zfs1
REMOTESTORAGEPOOL=backups
LOCALHOST=p68
REMOTEHOST=p67
CIPHER=aes128-cbc
SSHPORT=22000
USERNAME=admin

containers=()
while IFS= read -r line; do
        containers+=( "$line" )
done < <( lxc list --format json | jq -r   ' .[] | select(.status =="Running") | .name' )

function replicate() {
    for C in "${containers[@]}"
    do
      syncoid --sshcipher="$CIPHER" --sshport="$SSHPORT" "$LOCALSTORAGEPOOL/containers/$C" "$USERNAME@$REMOTEHOST:$REMOTESTORAGEPOOL/containers/$C"

    done

}