LXD container filesystem read only

No-no, that’s a problem at the btrfs level. Something is wrong with your loop device. Let me think about possible ways to recover from that.

Sure, thanks

But inside the container it still shows that 45GB of disk space is left.

Where did you see that? df -h / lsblk? Could you show the output from those?

Inside the container:

df -h:

Filesystem      Size  Used Avail Use% Mounted on
/dev/loop27     1.1T  1.1T   45G  96% /
none            492K  4.0K  488K   1% /dev
udev             63G     0   63G   0% /dev/fuse
tmpfs           100K     0  100K   0% /dev/lxd
/dev/nvme0n1p2  1.9T  1.5T  272G  85% /dev/nvidia2
tmpfs           100K     0  100K   0% /dev/.lxd-mounts
tmpfs            63G     0   63G   0% /dev/shm
tmpfs            13G  132K   13G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            63G     0   63G   0% /sys/fs/cgroup
lsblk:
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0         7:0    0   2.5M  1 loop 
loop1         7:1    0  49.7M  1 loop 
loop2         7:2    0  70.4M  1 loop 
loop3         7:3    0     4K  1 loop 
loop4         7:4    0 414.4M  1 loop 
loop5         7:5    0   704K  1 loop 
loop6         7:6    0  55.6M  1 loop 
loop7         7:7    0   476K  1 loop 
loop8         7:8    0   696K  1 loop 
loop9         7:9    0  72.8M  1 loop 
loop10        7:10   0 346.3M  1 loop 
loop11        7:11   0   2.6M  1 loop 
loop12        7:12   0   219M  1 loop 
loop13        7:13   0  81.3M  1 loop 
loop14        7:14   0  55.6M  1 loop 
loop15        7:15   0 446.3M  1 loop 
loop16        7:16   0  63.2M  1 loop 
loop17        7:17   0   219M  1 loop 
loop18        7:18   0  63.2M  1 loop 
loop19        7:19   0   2.6M  1 loop 
loop20        7:20   0    48M  1 loop 
loop21        7:21   0 136.7M  1 loop 
loop22        7:22   0 346.3M  1 loop 
loop23        7:23   0   1.5M  1 loop 
loop24        7:24   0 136.5M  1 loop 
loop25        7:25   0  91.7M  1 loop 
loop26        7:26   0   556K  1 loop 
loop27        7:27   0   1.1T  0 loop /snap
loop28        7:28   0 139.7G  0 loop 
sda           8:0    0 931.5G  0 disk 
├─sda1        8:1    0   512M  0 part 
└─sda2        8:2    0   931G  0 part 
nvme0n1     259:0    0   1.9T  0 disk 
├─nvme0n1p1 259:1    0   512M  0 part 
└─nvme0n1p2 259:2    0   1.9T  0 part /dev/nvidia3

That’s about right: 1.05TiB - 1.00TiB is approx 45GB.

Perhaps it’s not full and this has nothing to do with it, but it’s certainly very close, and that may not be a coincidence.
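
It might also be worth checking what btrfs itself thinks is allocated, since df can be misleading on btrfs (space is handed out in data/metadata chunks). Something like the following, run inside the container, should still work even though the filesystem is read-only, as these are read-only queries:

# overall allocation plus data/metadata/system breakdown
btrfs filesystem usage /
# older btrfs-progs: similar information from
btrfs filesystem df /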

I also thought that could be an issue and tried increasing the volume size further, but couldn’t because it said it is a read-only filesystem… Although I feel 45GB is a decent amount of free space.
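
For reference, on a healthy pool growing a loop-backed image would look roughly like the sketch below. The pool name is guessed from the image filename, the +100G figure and the mount path are just examples, and none of this will work while the filesystem is stuck read-only:

# recent LXD can grow a loop-backed pool directly (if your version supports it)
lxc storage set vol-ce9e2a76-8b0c-437b-81d4-4c4f47f147e7 size=1200GiB
# rough manual equivalent on the host: grow the image file, refresh the loop device, grow btrfs
truncate -s +100G /var/snap/lxd/common/lxd/disks/vol-ce9e2a76-8b0c-437b-81d4-4c4f47f147e7.img
losetup -c /dev/loop27
btrfs filesystem resize max /path/to/pool/mountpoint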

Well, now that it’s corrupted, you won’t be able to do anything with it unless the BTRFS tooling allows it to be fixed. Otherwise it will need to be wiped and the instances re-imported from backup.

As it’s read-only, you could try stopping the instances, doing lxc export <instance> <file>, and then re-importing them into a new pool.

I would suggest doing this anyway, in case any repair step you take makes things worse.
A search for btrfs_free_extent shows people in similar situations.
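
A rough sketch of that export/re-import route, with instance and pool names as placeholders (note that lxc export needs somewhere with enough free space to hold the backup tarball):

# stop and export the instance to a tarball
lxc stop <instance>
lxc export <instance> /some/path/instance-backup.tar.gz
# create a fresh pool and import the instance into it
lxc storage create newpool btrfs size=500GiB
# -s selects the target storage pool (recent LXD versions)
lxc import /some/path/instance-backup.tar.gz -s newpool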


For the future: IMHO, a loop device is not the best way to store 1TB volumes. It’s better to use btrfs shared with the host or on a separate device. It’s really hard to manage such big raw images.
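
For example (device path and pool names here are made up):

# pool backed by a dedicated block device; LXD formats it with btrfs
lxc storage create pool-ssd btrfs source=/dev/sdb2
# or reuse an existing btrfs filesystem already mounted on the host
lxc storage create pool-host btrfs source=/mnt/host-btrfs/lxd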

You need to stop all containers that are using this loop device. Then run the following on the host:

# IMPORTANT!!!
# Stop all containers which are using the loop device.
# Make a BACKUP of your image or valuable data, because the recovery attempt may end in data loss.
# Create the loop device on the host:
losetup -L -f --show /var/snap/lxd/common/lxd/disks/vol-ce9e2a76-8b0c-437b-81d4-4c4f47f147e7.img
# Then use /dev/loopN from the output of the previous command.
# This only checks the filesystem, without modifying it:
btrfs check /dev/loopN
# This tries to repair. REALLY dangerous! Data loss is possible!
btrfs check --repair -p /dev/loopN
# Detach the device and then try to start the container:
losetup -d /dev/loopN

Ohk. Is there any other way to recover from the current state? I would not be able to export, as there isn’t enough storage on the system to keep both the container data and the exported file.

Also, is there any way to avoid this in the future? This is not the first time I have run into this… Does ZFS offer a superior solution compared to BTRFS for LXD containers?

And again, just to be on the safe side: @genesis96839, make a backup of all valuable data before trying to recover the btrfs. And… good luck (-:

Thanks. Let me try this.

Yes, I’ve transferred most of the data to a secondary disk for now. Will try these commands now.

BTRFS isn’t well known for its reliability sadly.

I tried the commands after stopping the container but received errors:

losetup -L -f --show /var/snap/lxd/common/lxd/disks/vol-ce9e2a76-8b0c-437b-81d4-4c4f47f147e7.img

Response:

losetup: /var/snap/lxd/common/lxd/disks/vol-ce9e2a76-8b0c-437b-81d4-4c4f47f147e7.img: failed to re-use loop device

sudo btrfs check /dev/loop27

Response:

ERROR: cannot open device '/dev/loop27': Device or resource busy
ERROR: cannot open file system

sudo btrfs check --repair -p /dev/loop27

Response:

ERROR: cannot open device '/dev/loop27': Device or resource busy
ERROR: cannot open file system

So would you recommend ZFS for container deployments next time?

As an update:
This worked!! Wow, I was not expecting this to get resolved. Thank you so much 🙂

I had stopped the container but the loop device was still in use. So to make it work, I stopped the snap LXD daemon. Then it got detached and I was able to run these commands.
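
For anyone else hitting the same “Device or resource busy” error, the whole sequence boils down to roughly this (the loop device number comes from my earlier output, and <instance> is a placeholder):

# stop the snap-packaged LXD daemon so it no longer holds the pool's loop device
snap stop lxd
# detach the stale loop device if it is still attached, then re-run the recovery steps
losetup -d /dev/loop27
losetup -L -f --show /var/snap/lxd/common/lxd/disks/vol-ce9e2a76-8b0c-437b-81d4-4c4f47f147e7.img
btrfs check --repair -p /dev/loopN
losetup -d /dev/loopN
# bring LXD back up and start the container again
snap start lxd
lxc start <instance>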

It did delete 2GB of data, and I am not sure which folder it came from. But anyway, it recovered 99% of the container.

Thanks again!

@stgraber has made a whole series of videos covering the pros and cons of each of the storage drivers that LXD supports:
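
If you do go with ZFS next time, creating the pool is analogous to the btrfs case, and the same loop-file versus dedicated-device trade-off applies (the names and device path below are placeholders):

# loop-backed ZFS pool (simple, but the same large-raw-image caveat applies)
lxc storage create zpool1 zfs size=1TiB
# ZFS pool on a dedicated block device
lxc storage create zpool2 zfs source=/dev/nvme1n1p2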
