Object storage: two directories created with the same path in a storage volume

There are many moving parts here, so I'm reaching out for more minds to help identify and resolve the cause of this issue. Apologies for not yet clearly understanding where the problem resides.
Feel free to ask for any further information you think is relevant.
I haven't created an issue for this yet, because I need to understand it better first.

Problem Illustration:

# min = the minio mc client, lxd = mc alias for the minio host bucket
min tree lxd
lxd
└─ backups
   ├─ velero
   │  └─ backups
   │     └─ nginx2-backup
   └─ velero 
      └─ backups
         └─ nginx-backup-2

sudo nsenter --mount=/run/snapd/ns/lxd.mnt
ls -lA /var/snap/lxd/common/lxd/storage-pools/kube/buckets/backups/minio/backups
total 0
drwxr-xr-x 1 lxd lxd 14 May 25 08:28 velero
drwxr-xr-x 1 lxd lxd 30 May 21 09:34 velero

Expected:

min tree lxd
lxd
└─ backups
   └─ velero
      └─ backups
         ├─ nginx-backup-2
         └─ nginx2-backup

sudo nsenter --mount=/run/snapd/ns/lxd.mnt
ls -lA /var/snap/lxd/common/lxd/storage-pools/kube/buckets/backups/minio/backups
total 0
drwxr-xr-x 1 lxd lxd 14 May 25 08:28 velero

Questions:

  • What could be the cause?
  • How can this be prevented?
  • Any ideas for resolving it?
  • Any ideas for how to inspect this issue further?
  • Where does the problem lie: velero, minio, lxd, btrfs, linux, …other?
  • Can lxd protect integrity and prevent this from happening and being allowed?
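On the inspection question: note that in the problem tree above the second velero entry appears to carry a trailing space, and POSIX filesystems (btrfs included) treat `velero` and `velero ` as two distinct, legal names. A throwaway sketch (scratch paths only, not the real bucket directories) of how to make such invisible characters visible:

```shell
# Throwaway demonstration: names differing only by trailing whitespace
# are distinct directories on any POSIX filesystem.
demo=$(mktemp -d)
mkdir "$demo/velero" "$demo/velero "

# Plain ls makes them look like duplicates:
ls -1 "$demo"

# Quoting/escaping reveals the difference:
ls -1 --quoting-style=shell-escape "$demo"   # GNU ls
ls -1 "$demo" | cat -A                       # cat -A marks each line end with '$'

rm -rf "$demo"
```

If the second directory really does have trailing whitespace in its name, that would also explain why `ls` and `mc tree` show two seemingly identical entries side by side.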

Environment:

  • host: ubuntu 22.04, kernel 5.15.0-72-generic #79-Ubuntu SMP Wed Apr 19 08:22:18 UTC 2023 x86_64 GNU/Linux, M.2 SATA SSD; the OS is also on a btrfs partition
  • lxd v5.13
  • lxd storage pool driver: btrfs on a partition
  • lxd storage bucket: backups
  • lxd object storage using shim/wrapped minio
  • lxd container instance running ubuntu22.04/cloud with k8s(kubeadm install), velero, nginx
  • velero v1.11.0

Context:

  • lxd-minio configured with core.storage_buckets_address

  1. lxd container created
     • k8s installed with kubeadm
     • nginx installed with kubectl
     • velero installed with velero install using bucket: backups, prefix: velero
     • velero backup create nginx-backup2 --selector app=nginx
  2. lxd container deleted
  3. lxd container created
     • k8s installed with kubeadm
     • nginx installed with kubectl
     • velero installed with helm using bucket: backups, prefix: velero
     • velero backup create nginx2-backup --selector app=nginx

Please can you show, from a fresh installation of LXD, the reproducer commands (using lxc) that end up in this scenario?

Thanks

Also, I am not clear on the problem you are describing, please can you explain it further?

No reproducer.
I will let this rest for now. When I tried it again, with both s3cmd and velero, it did not create directories with the same path on the partition, nor in the minio mc or s3cmd listings.

This time I took the following approach to recover (listed here for reference):

  1. lxc config set … && lxc network create|edit … && lxc profile create|edit …
  2. lxd recover pools, volumes, … (NB: if there are no volumes, only buckets, recovery will not start)
  3. sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- mv …buckets/backups …buckets/backups.b4
  4. lxc storage bucket create …
  5. sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- mv …buckets/backups.b4/minio/backups/velero …buckets/backups/minio/backups/velero
  6. sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- rm -rf …buckets/backups.b4
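The move-aside / recreate / restore swap in steps 3–5 follows this general pattern, sketched here on scratch paths (the real commands run inside the LXD mount namespace, via nsenter, against the bucket directories elided above):

```shell
# Illustrative only: the same swap pattern on throwaway paths, not the
# real …buckets/ directories (those need nsenter into lxd.mnt).
base=$(mktemp -d)
mkdir -p "$base/backups/minio/backups/velero"    # stand-in for the old bucket

mv "$base/backups" "$base/backups.b4"            # step 3: set the old bucket aside
mkdir -p "$base/backups/minio/backups"           # step 4: fresh bucket (lxc storage bucket create in reality)
mv "$base/backups.b4/minio/backups/velero" \
   "$base/backups/minio/backups/"                # step 5: move the old data into the new bucket
rm -rf "$base/backups.b4"                        # step 6: discard the set-aside copy

ls "$base/backups/minio/backups"
rm -rf "$base"
```

The point of moving the whole bucket aside first (rather than deleting it) is that the velero data stays intact on disk while the fresh bucket is created.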

I could then start the restored container instances with the k8s nodes, upgrade velero in the k8s cluster with the new storage bucket key credentials, and the system was restored and performed as expected.

I suspect the culprit lies in my having used a different approach earlier, as I got flustered by uncertainties when the lxd recovery of buckets left some unknowns I first needed to get my head around. In that process I may have tried to delete the new backups bucket and replace it entirely with the backups.b4 bucket, and to delete and recreate bucket keys to match those that had been applied to the cluster and workloads before. I am not sure, but I will let it rest for now and see what lxd recover has in store for buckets going forward. Thanks for the tip to double-check with s3cmd too.
