ZFS Pool destroyed

Hello,

My ZFS pool seems to be broken because of my own stupidity.

Background:
I was copying an LXC container when the system slowed down dramatically. The copy did not finish, so I aborted it with Ctrl-C. I suspected that the pool was running out of space, so I increased its size (truncate +20G, autoexpand, ...). Then I tried to copy the container again, with the same result (the system slowed down and the cp process did not finish). I then had a bad feeling that all ZFS pools together exceeded the capacity of the SSD, so I tried to reduce the size of the ZFS pool again (truncate -20G, ...). But the pool size was not reduced. Then I rebooted the system, and the real problem began.
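For anyone checking a similar setup: a file-backed pool image is usually sparse, so its apparent size can be far larger than the space it actually occupies on the SSD. The following is a minimal sketch of how the two numbers could be compared. It assumes Linux, where `st_blocks` is counted in 512-byte units, and the file name `pool.img` is purely hypothetical (it creates a throwaway file in a temporary directory and does not touch any real pool).

```python
import os
import tempfile


def sparse_usage(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    # Assumption: st_blocks is reported in 512-byte units (true on Linux).
    return st.st_size, st.st_blocks * 512


def demo():
    """Create a 20 MiB sparse file and report its two sizes."""
    with tempfile.TemporaryDirectory() as tmp:
        img = os.path.join(tmp, "pool.img")  # hypothetical image name
        with open(img, "wb") as f:
            # Roughly what `truncate -s 20M pool.img` does: set the size
            # without writing any data.
            f.truncate(20 * 1024**2)
        return sparse_usage(img)


if __name__ == "__main__":
    apparent, allocated = demo()
    print(f"apparent:  {apparent} bytes")
    print(f"allocated: {allocated} bytes")
```

If the sum of the apparent sizes of all images exceeds the free space of the underlying filesystem, the pools can run into trouble once they fill up, even though each image looked fine when it was created.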

Status quo:
The system boots, but everything connected to LXD seems to be dead. When I type lxc list, I get this error after a few minutes: Get "http://unix.socket/1.0": net/http: timeout awaiting response headers. I get the same result when I type zpool status -vg or zfs list, …

The pool I resized is named secondpool.

Mar 14 20:16:16 server-zotac lxd.daemon[1838]: - loadavg_daemon
Mar 14 20:16:16 server-zotac lxd.daemon[1838]: - pidfds
Mar 14 20:16:17 server-zotac lxd.daemon[1578]: => Starting LXD
Mar 14 20:16:17 server-zotac kernel: NET: Registered PF_VSOCK protocol family
Mar 14 20:16:17 server-zotac lxd.daemon[1852]: time="2023-03-14T20:16:17+01:00" level=warning msg=" - Couldn't find the CGroup network priority controller, network priority will be ignored"
Mar 14 20:16:19 server-zotac kernel: ata1.00: exception Emask 0x50 SAct 0x7800 SErr 0x4c0900 action 0x6 frozen
Mar 14 20:16:19 server-zotac kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Mar 14 20:16:19 server-zotac kernel: ata1: SError: { UnrecovData HostInt CommWake 10B8B Handshk }
Mar 14 20:16:19 server-zotac kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Mar 14 20:16:19 server-zotac kernel: ata1.00: cmd 61/40:58:00:c9:10/05:00:25:00:00/40 tag 11 ncq dma 688128 out
res 40/00:5c:00:c9:10/00:00:25:00:00/40 Emask 0x50 (ATA bus error)
Mar 14 20:16:19 server-zotac kernel: ata1.00: status: { DRDY }
Mar 14 20:16:19 server-zotac kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Mar 14 20:16:19 server-zotac kernel: ata1.00: cmd 61/c0:60:40:ce:10/02:00:25:00:00/40 tag 12 ncq dma 360448 out
res 40/00:5c:00:c9:10/00:00:25:00:00/40 Emask 0x50 (ATA bus error)
Mar 14 20:16:19 server-zotac kernel: ata1.00: status: { DRDY }
Mar 14 20:16:19 server-zotac kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Mar 14 20:16:19 server-zotac kernel: ata1.00: cmd 61/40:68:00:d1:10/05:00:25:00:00/40 tag 13 ncq dma 688128 out
res 40/00:5c:00:c9:10/00:00:25:00:00/40 Emask 0x50 (ATA bus error)
Mar 14 20:16:19 server-zotac kernel: ata1.00: status: { DRDY }
Mar 14 20:16:19 server-zotac kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Mar 14 20:16:19 server-zotac kernel: ata1.00: cmd 61/c0:70:40:d6:10/02:00:25:00:00/40 tag 14 ncq dma 360448 out
res 40/00:5c:00:c9:10/00:00:25:00:00/40 Emask 0x50 (ATA bus error)
Mar 14 20:16:19 server-zotac kernel: ata1.00: status: { DRDY }
Mar 14 20:16:19 server-zotac kernel: ata1: hard resetting link
Mar 14 20:16:20 server-zotac kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Mar 14 20:16:20 server-zotac kernel: ACPI BIOS Error (bug): Could not resolve symbol [_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND (20210730/psargs-330)
Mar 14 20:16:20 server-zotac kernel:
Mar 14 20:16:20 server-zotac kernel: No Local Variables are initialized for Method [_GTF]
Mar 14 20:16:20 server-zotac kernel:
Mar 14 20:16:20 server-zotac kernel: No Arguments are initialized for method [_GTF]
Mar 14 20:16:20 server-zotac kernel:
Mar 14 20:16:20 server-zotac kernel: ACPI Error: Aborting method _SB.PCI0.SAT0.PRT0._GTF due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Mar 14 20:16:20 server-zotac kernel: ata1.00: supports DRM functions and may not be fully accessible
Mar 14 20:16:20 server-zotac kernel: ACPI BIOS Error (bug): Could not resolve symbol [_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND (20210730/psargs-330)
Mar 14 20:16:20 server-zotac kernel:
Mar 14 20:16:20 server-zotac kernel: No Local Variables are initialized for Method [_GTF]
Mar 14 20:16:20 server-zotac kernel:
Mar 14 20:16:20 server-zotac kernel: No Arguments are initialized for method [_GTF]
Mar 14 20:16:20 server-zotac kernel:
Mar 14 20:16:20 server-zotac kernel: ACPI Error: Aborting method _SB.PCI0.SAT0.PRT0._GTF due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Mar 14 20:16:20 server-zotac kernel: ata1.00: supports DRM functions and may not be fully accessible
Mar 14 20:16:20 server-zotac kernel: ata1.00: configured for UDMA/133
Mar 14 20:16:20 server-zotac kernel: ahci 0000:00:17.0: port does not support device sleep
Mar 14 20:16:20 server-zotac kernel: ata1: EH complete
Mar 14 20:16:20 server-zotac lxd.daemon[1852]: time="2023-03-14T20:16:20+01:00" level=error msg="Cannot currently listen on https socket, re-trying once in 30s..." err="Bind network address: listen tcp 192.>
Mar 14 20:16:20 server-zotac kernel: bpfilter: Loaded bpfilter_umh pid 1904
Mar 14 20:16:20 server-zotac unknown: Started bpfilter
Mar 14 20:16:22 server-zotac zed[2718]: eid=27 class=checksum pool='secondpool' vdev=secondpool.img algorithm=fletcher4 size=3072 offset=107901570560 priority=0 err=52 flags=0x180880 bookmark=0:0:0:0
Mar 14 20:16:22 server-zotac zed[2720]: eid=28 class=checksum pool='secondpool' vdev=secondpool.img algorithm=fletcher4 size=512 offset=108947770880 priority=0 err=52 flags=0x180880 bookmark=0:1:0:0
Mar 14 20:16:22 server-zotac zed[2723]: eid=29 class=checksum pool='secondpool' vdev=secondpool.img algorithm=fletcher4 size=4096 offset=108946208256 priority=0 err=52 flags=0x180880 bookmark=0:1:0:1
Mar 14 20:16:22 server-zotac kernel: WARNING: Pool 'secondpool' has encountered an uncorrectable I/O failure and has been suspended.
Mar 14 20:16:22 server-zotac zed[2813]: eid=30 class=data pool='secondpool' priority=0 err=52 flags=0x808801 bookmark=0:31:0:1
Mar 14 20:16:22 server-zotac zed[2814]: eid=31 class=io_failure pool='secondpool'
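The zed checksum events above each carry an `offset=` field, which is the byte position inside the vdev (here the file secondpool.img) where the bad read happened. As a rough sketch, the offsets could be pulled out of the journal and converted to GiB so they can be compared by hand against the current size of the backing file. The helper names and the shortened `SAMPLE` text are my own; the log alone does not tell whether these offsets lie in the part of the file that was cut off.

```python
import re

# Matches the byte offset field that zed prints for checksum events.
OFFSET_RE = re.compile(r"\boffset=(\d+)")

# Shortened copies of the three checksum lines from the log above.
SAMPLE = """\
zed[2718]: eid=27 class=checksum pool='secondpool' vdev=secondpool.img size=3072 offset=107901570560 err=52
zed[2720]: eid=28 class=checksum pool='secondpool' vdev=secondpool.img size=512 offset=108947770880 err=52
zed[2723]: eid=29 class=checksum pool='secondpool' vdev=secondpool.img size=4096 offset=108946208256 err=52
zed[2814]: eid=31 class=io_failure pool='secondpool'
"""


def checksum_offsets(log_text):
    """Return the byte offsets of all class=checksum events, in log order."""
    offsets = []
    for line in log_text.splitlines():
        if "class=checksum" not in line:
            continue
        match = OFFSET_RE.search(line)
        if match:
            offsets.append(int(match.group(1)))
    return offsets


def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 1024**3


if __name__ == "__main__":
    for off in checksum_offsets(SAMPLE):
        print(f"offset {off} bytes = {to_gib(off):.2f} GiB")
```

All three offsets sit a little above 100 GiB into the image file.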

In the meantime I cloned the SSD to a larger one and increased the partition size, but the issue is still the same.

Does anyone have a clue how to fix this issue?

BR,
Olli

Hmm, you seem to have two problems. One is an apparently dying SSD (those ATA errors are not normal), and the other is that you shrank the backing file of the zpool. Maybe you can add those 20G back and hope they end up on the same blocks, but that sounds unlikely and highly dependent on the underlying filesystem.
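To illustrate why growing the file again is unlikely to help: at the file level, a region that was cut off with truncate and then added back reads as zeros, whatever may still be sitting in the old disk blocks. Here is a small sketch of that behaviour on a throwaway file. The function name and the `backing.img` file name are made up for the demonstration, and it says nothing about whether a filesystem-specific recovery tool could still find the old blocks.

```python
import os
import tempfile


def shrink_and_regrow(payload, keep):
    """Write payload to a temp file, cut it to `keep` bytes, grow it back
    to the original length, and return what the file contains afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "backing.img")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(payload)
        os.truncate(path, keep)          # like `truncate -s -N backing.img`
        os.truncate(path, len(payload))  # like `truncate -s +N backing.img`
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    original = b"A" * 4096 + b"B" * 4096
    after = shrink_and_regrow(original, keep=4096)
    print("first half intact: ", after[:4096] == original[:4096])
    print("second half zeroed:", after[4096:] == b"\0" * 4096)
```

The file has its old size again, but the part ZFS had written beyond the cut is gone from the file's point of view, which would fit the checksum errors and the suspended pool.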

If you have backups, I guess it’s time to test them, sorry :confused:

I found no way to recover my ZFS pool, so I started from scratch :frowning: