Hi, I rebooted my server yesterday, and now I can’t connect to the local LXD socket.
sudo lxc list
Error: LXD unix socket not accessible: Get "http://unix.socket/1.0": EOF
I thought the ZFS storage pool might be broken:
sudo zpool status -v
pool: lxd
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://zfsonlinux.org/msg/ZFS-8000-8A
scan: scrub repaired 0B in 9h3m with 1 errors on Sun Apr 14 09:27:53 2024
config:
NAME                                      STATE     READ WRITE CKSUM
lxd                                       ONLINE       0     0     0
  /var/snap/lxd/common/lxd/disks/lxd.img  ONLINE       0     0     0
errors: Permanent errors have been detected in the following files:
lxd/containers/amr:/rootfs/home/amr/Data/manipulated_sequences/Deepfakes/c40/videos/DF_C40/41756.png
I ran sudo zpool scrub twice, but the error was still there, so I mounted it and deleted the file by hand.
After that, I ran sudo zpool scrub and sudo zpool clear, and the zpool error disappeared:
sudo zpool status -v
pool: lxd
state: ONLINE
scan: scrub repaired 0B in 9h17m with 0 errors on Sun Apr 21 05:41:04 2024
config:
NAME                                      STATE     READ WRITE CKSUM
lxd                                       ONLINE       0     0     0
  /var/snap/lxd/common/lxd/disks/lxd.img  ONLINE       0     0     0
errors: No known data errors
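For anyone repeating this check, the "errors:" part of the status output can be tested from a script instead of by eye. This is only a minimal sketch: the helper names pool_is_clean and list_damaged_files are made up for illustration, and both simply read zpool status -v text on standard input.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ZFS or LXD); they only parse text.

# Succeeds only if the status output on stdin reports no data errors.
pool_is_clean() {
    grep -q '^[[:space:]]*errors: No known data errors'
}

# Prints the paths listed after the "Permanent errors" line, one per line.
list_damaged_files() {
    sed -n '/errors: Permanent errors/,$p' |
        sed -e '1d' -e 's/^[[:space:]]*//' -e '/^$/d'
}

# Usage on the real host (pool name "lxd" taken from the output above):
#   sudo zpool status -v lxd | pool_is_clean && echo "pool is clean"
#   sudo zpool status -v lxd | list_damaged_files
```

Reading from standard input keeps the helpers usable on saved output as well as on a live pool.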
But lxc list still returns an error:
Error: LXD unix socket "/var/snap/lxd/common/lxd/unix.socket" not accessible: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: resource temporarily unavailable
Here is the debug output:
sudo lxd --debug --group lxd
DEBUG [2024-04-21T08:08:14+08:00] Connecting to a local LXD over a Unix socket
DEBUG [2024-04-21T08:08:14+08:00] Sending request to LXD etag= method=GET url="http://unix.socket/1.0"
INFO [2024-04-21T08:08:14+08:00] LXD is starting mode=normal path=/var/snap/lxd/common/lxd version=5.21.1
INFO [2024-04-21T08:08:14+08:00] Kernel uid/gid map:
INFO [2024-04-21T08:08:14+08:00] - u 0 0 4294967295
INFO [2024-04-21T08:08:14+08:00] - g 0 0 4294967295
INFO [2024-04-21T08:08:14+08:00] Configured LXD uid/gid map:
INFO [2024-04-21T08:08:14+08:00] - u 0 1000000 1000000000
INFO [2024-04-21T08:08:14+08:00] - g 0 1000000 1000000000
INFO [2024-04-21T08:08:14+08:00] Kernel features:
INFO [2024-04-21T08:08:14+08:00] - closing multiple file descriptors efficiently: no
INFO [2024-04-21T08:08:14+08:00] - netnsid-based network retrieval: yes
INFO [2024-04-21T08:08:14+08:00] - pidfds: no
INFO [2024-04-21T08:08:14+08:00] - core scheduling: no
INFO [2024-04-21T08:08:14+08:00] - uevent injection: yes
INFO [2024-04-21T08:08:14+08:00] - seccomp listener: yes
INFO [2024-04-21T08:08:14+08:00] - seccomp listener continue syscalls: yes
INFO [2024-04-21T08:08:14+08:00] - seccomp listener add file descriptors: no
INFO [2024-04-21T08:08:14+08:00] - attach to namespaces via pidfds: no
INFO [2024-04-21T08:08:14+08:00] - safe native terminal allocation : yes
INFO [2024-04-21T08:08:14+08:00] - unprivileged file capabilities: yes
INFO [2024-04-21T08:08:14+08:00] - cgroup layout: hybrid
WARNING[2024-04-21T08:08:14+08:00] - Couldn't find the CGroup blkio.weight, disk priority will be ignored
WARNING[2024-04-21T08:08:14+08:00] - Couldn't find the CGroup memory swap accounting, swap limits will be ignored
INFO [2024-04-21T08:08:14+08:00] - idmapped mounts kernel support: no
INFO [2024-04-21T08:08:14+08:00] Instance type operational driver=lxc features="map[]" type=container
ERROR [2024-04-21T08:08:14+08:00] Unable to run feature checks during QEMU initialization: Unable to locate the file for firmware "OVMF_CODE.4MB.fd"
WARNING[2024-04-21T08:08:14+08:00] Instance type not operational driver=qemu err="QEMU failed to run feature checks" type=virtual-machine
INFO [2024-04-21T08:08:14+08:00] Initializing local database
DEBUG [2024-04-21T08:08:14+08:00] Refreshing identity cache with local trusted certificates
INFO [2024-04-21T08:08:14+08:00] Set client certificate to server certificate fingerprint=7bfa6d5710e943f5f23524bcca9f0a51bb5f58f819d1b9fb3e1d843facc0a20b
DEBUG [2024-04-21T08:08:14+08:00] Initializing database gateway
INFO [2024-04-21T08:08:14+08:00] Starting database node id=1 local=1 role=voter
ERROR [2024-04-21T08:08:14+08:00] Failed to start the daemon err="Failed to start dqlite server: raft_start(): io: load closed segment 0000000000185550-0000000000185550: entries batch 45 starting at byte 487448: entries count in preamble is zero"
INFO [2024-04-21T08:08:14+08:00] Starting shutdown sequence signal=interrupt
INFO [2024-04-21T08:08:14+08:00] Not unmounting temporary filesystems (instances are still running)
INFO [2024-04-21T08:08:14+08:00] Daemon stopped
Error: Failed to start dqlite server: raft_start(): io: load closed segment 0000000000185550-0000000000185550: entries batch 45 starting at byte 487448: entries count in preamble is zero
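In case it helps whoever looks at this, the segment named in the error can be pulled out of the message so the matching file can be found and inspected. This is a rough sketch, not a fix. The function name segment_from_error is made up, and the database path is an assumption (the usual location for the LXD snap) that the output above does not confirm.

```shell
#!/bin/sh
# Hypothetical helper: prints the raft segment name found in a dqlite
# "load closed segment" error read from stdin (first match only).
segment_from_error() {
    sed -n 's/.*load closed segment \([0-9][0-9]*-[0-9][0-9]*\):.*/\1/p' |
        head -n 1
}

# ASSUMPTION: default global database directory of the LXD snap.
DB_DIR=${DB_DIR:-/var/snap/lxd/common/lxd/database/global}

# Usage on the real host, with the daemon stopped:
#   sudo cp -a "$DB_DIR" "$DB_DIR.bak"   # keep a copy before touching anything
#   seg=$(sudo lxd --debug --group lxd 2>&1 | segment_from_error)
#   sudo ls -l "$DB_DIR/$seg"
```

This only identifies and backs up the files. It does not decide what should happen to the damaged segment.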
What can I do to fix this permanently?
Thanks in advance!