Cluster storage failed without hardware failure (need help)

Hello. I made a mistake and now I need your help, guys :slight_smile:

Basically I'm hosting two servers with LXD workloads for semi-professional purposes. Everything seemed fine until yesterday. I dropped one of my servers from about 5 cm above the shelf, but it was enough: something happened to the SSD. I saw some filesystem errors in the journal. Reseating the SSD in its socket helped. Somehow…
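For reference, this is roughly how I was digging those errors out of the journal (the grep pattern is just my guess at what's relevant):

$ # kernel messages from the current boot, filtered for btrfs / I/O errors
$ journalctl -k -b | grep -iE 'btrfs|i/o error'
$ # everything at priority err or worse, in case the grep misses something
$ journalctl -p err -b --no-pager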

None of my LXD containers/VMs want to start.

$ lxc start tank
Error: saving config file for the container failed
Try `lxc info --show-log tank` for more info
$ lxc info --show-log tank
Name: tank
Status: STOPPED
Type: container
Architecture: x86_64
Location: hammer
Created: 2023/09/23 14:11 UTC
Last Used: 2024/04/09 14:23 UTC

Log:

Yep. No logs at all.
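If it would help someone spot something, I can also pull the daemon-side log; for the snap package I assume that would be something like:

$ # last lines of the LXD daemon log (snap install)
$ snap logs lxd -n=50
$ # or straight from the systemd journal unit
$ journalctl -u snap.lxd.daemon -b --no-pager | tail -n 50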

Here is what I found. These are logs from the journal; I don't think they are important:

Apr 09 17:41:20 hammer systemd-networkd[827]: veth986feada: Link UP
Apr 09 17:41:20 hammer networkd-dispatcher[859]: WARNING:Unknown index 13 seen, reloading interface list
Apr 09 17:41:20 hammer systemd-udevd[3107]: Using default interface naming scheme 'v249'.
Apr 09 17:41:20 hammer NetworkManager[853]: <info>  [1712684480.7943] manager: (vethc310e3df): new Veth device (/org/freedesktop/NetworkManager/Devices/13)
Apr 09 17:41:20 hammer systemd-udevd[3109]: Using default interface naming scheme 'v249'.
Apr 09 17:41:20 hammer NetworkManager[853]: <info>  [1712684480.7958] manager: (veth986feada): new Veth device (/org/freedesktop/NetworkManager/Devices/14)
Apr 09 17:41:20 hammer kernel: br0: port 2(veth986feada) entered blocking state
Apr 09 17:41:20 hammer kernel: br0: port 2(veth986feada) entered disabled state
Apr 09 17:41:20 hammer kernel: device veth986feada entered promiscuous mode
Apr 09 17:41:20 hammer kernel: device veth986feada left promiscuous mode
Apr 09 17:41:20 hammer kernel: br0: port 2(veth986feada) entered disabled state
Apr 09 17:41:20 hammer systemd-networkd[827]: veth986feada: Link UP
Apr 09 17:41:20 hammer networkd-dispatcher[859]: WARNING:Unknown index 14 seen, reloading interface list
Apr 09 17:41:20 hammer systemd-networkd[827]: veth986feada: Link DOWN
Apr 09 17:41:20 hammer NetworkManager[853]: <info>  [1712684480.9004] device (veth986feada): released from master device br0
Apr 09 17:41:20 hammer networkd-dispatcher[859]: ERROR:Unknown interface index 14 seen even after reload
Apr 09 17:41:20 hammer networkd-dispatcher[859]: WARNING:Unknown index 14 seen, reloading interface list
Apr 09 17:41:20 hammer networkd-dispatcher[859]: ERROR:Unknown interface index 14 seen even after reload
Apr 09 17:41:20 hammer networkd-dispatcher[859]: WARNING:Unknown index 14 seen, reloading interface list
Apr 09 17:41:20 hammer networkd-dispatcher[859]: ERROR:Unknown interface index 14 seen even after reload
Apr 09 17:41:20 hammer networkd-dispatcher[859]: WARNING:Unknown index 14 seen, reloading interface list
Apr 09 17:41:20 hammer networkd-dispatcher[859]: ERROR:Unknown interface index 14 seen even after reload
Apr 09 17:41:20 hammer networkd-dispatcher[859]: WARNING:Unknown index 14 seen, reloading interface list
Apr 09 17:41:20 hammer networkd-dispatcher[859]: ERROR:Unknown interface index 14 seen even after reload
Apr 09 17:42:22 hammer systemd[1856]: Started snap.lxd.lxc-9e3af830-1c94-4088-88b5-58f65d499cfc.scope.

And here is something more interesting, which I think is the root cause. When I try to edit any profile or configuration, anything in LXD at all, here is what I get:

 - Project: default, Instance: smith-1: Failed to write backup file: Failed to create file "/var/snap/lxd/common/lxd/virtual-machines/smith-1/backup.yaml": open /var/snap/lxd/common/lxd/virtual-machines/smith-1/backup.yaml: read-only file system
 - Project: default, Instance: agent-1: Failed to write backup file: Failed to create file "/var/snap/lxd/common/lxd/virtual-machines/agent-1/backup.yaml": open /var/snap/lxd/common/lxd/virtual-machines/agent-1/backup.yaml: read-only file system
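My guess is that the pool mount itself got flipped to read-only after the btrfs errors. I suppose checking that would look something like this (the pool name "default" and the snap path are assumptions on my part):

$ # is the storage pool mounted read-only?
$ findmnt -o TARGET,SOURCE,FSTYPE,OPTIONS /var/snap/lxd/common/lxd/storage-pools/default
$ # did the kernel force the filesystem read-only at some point?
$ sudo dmesg | grep -iE 'btrfs.*(error|read.?only)'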

The SSD is visible in lsblk. sudo smartctl -H /dev/sda shows it passes the health check. It's formatted as btrfs and btrfs check finishes successfully:

sda                         8:0    0   1.8T  0 disk
└─sda1                      8:1    0   1.8T  0 part
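To spell out exactly what I ran (btrfs check was done with the partition unmounted, and as far as I know it defaults to a read-only check; /dev/sda1 is the partition from the lsblk output above):

$ sudo smartctl -H /dev/sda
$ sudo umount /dev/sda1        # only if it was still mounted
$ sudo btrfs check /dev/sda1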

I would be really happy if there is anything I can do to restore that data.

Some information about my setup:
Intel NUC + nvme + ssd (ssd was used for lxd storage exclusively, nvme for boot)
Ubuntu server 22.04 with LXD 5.20
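In case the pool configuration matters, I can post that too; I assume it would be:

$ lxc storage list
$ lxc storage show default     # pool name assumed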

I would love to restore the data from at least two containers. I can set up the host from scratch; I just want to restore that database… Any ideas, or have I already lost it?
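My rough plan, unless someone tells me it's a bad idea, is to mount the pool partition read-only somewhere else and copy the data out by hand. The paths are guesses based on the default LXD btrfs layout, and the rescue mount option is only a fallback if a plain read-only mount refuses:

$ sudo mkdir -p /mnt/rescue
$ sudo mount -o ro /dev/sda1 /mnt/rescue \
    || sudo mount -o ro,rescue=usebackuproot /dev/sda1 /mnt/rescue
$ # container root filesystems should live under containers/<name>/rootfs
$ sudo rsync -aHAX /mnt/rescue/containers/tank/rootfs/ /safe/place/tank-rootfs/
$ # VM disks should be image files under virtual-machines/<name>, I think
$ # if even a read-only mount fails, btrfs restore can pull files off the raw partition
$ sudo btrfs restore -v /dev/sda1 /safe/place/restore/

Does that sound sane, or is there a less risky way?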

It turns out the btrfs volume is actually broken. The problem belongs to another domain, so this can be closed.
