Okay, then you probably want to use the big hammer: snap disable lxd, then reboot the system and run btrfsck at that point, while you have a clean kernel that has never tried mounting or using the btrfs volume on that boot.
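For reference, the whole sequence from a clean boot looks roughly like this (just a sketch; the image path assumes the default snap LXD loop-backed pool, adjust it if your pool image lives elsewhere):

snap disable lxd                                     # stop LXD and keep it from coming back up on boot
reboot                                               # come back up with a kernel that has never touched the pool
btrfsck /var/snap/lxd/common/lxd/disks/default.img   # check the unmounted pool image directly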
I ran btrfsck /dev/loop15:
Opening filesystem to check...
ERROR: could not check mount status: No such device or address
Yep, that’s normal; the loop device isn’t set up yet.
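If you did want to go through a loop device, you’d have to attach the image yourself first, roughly like this (a sketch; the device name it prints won’t necessarily be /dev/loop15):

losetup -f --show /var/snap/lxd/common/lxd/disks/default.img   # attach the image and print the loop device it got
losetup -j /var/snap/lxd/common/lxd/disks/default.img          # list which loop device(s) are backing the image

But checking the image file directly is simpler.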
Does btrfsck /var/snap/lxd/common/lxd/disks/default.img work?
That worked!
btrfsck /var/snap/lxd/common/lxd/disks/default.img
Opening filesystem to check...
Checking filesystem on /var/snap/lxd/common/lxd/disks/default.img
UUID: 37ab66ba-6522-43ce-adcc-024792370708
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
btrfs: csum mismatch on free space cache
failed to load free space cache for block group 55864983552
btrfs: csum mismatch on free space cache
failed to load free space cache for block group 59086209024
btrfs: csum mismatch on free space cache
failed to load free space cache for block group 80561045504
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 98413187072 bytes used, no error found
total csum bytes: 95433292
total tree bytes: 607059968
total fs tree bytes: 475201536
total extent tree bytes: 23707648
btree space waste bytes: 92449963
file data blocks allocated: 128041459712
referenced 114545954816
Ok, so you’ll want to run it again as btrfsck --repair /var/snap/lxd/common/lxd/disks/default.img
btrfsck --repair /var/snap/lxd/common/lxd/disks/default.img
enabling repair mode
WARNING:
Do not use --repair unless you are advised to do so by a developer
or an experienced user, and then only after having accepted that no
fsck can successfully repair all types of filesystem corruption. Eg.
some software or hardware bugs can fatally damage a volume.
The operation will start in 10 seconds.
Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting repair.
Opening filesystem to check...
Checking filesystem on /var/snap/lxd/common/lxd/disks/default.img
UUID: 37ab66ba-6522-43ce-adcc-024792370708
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
No device size related problem found
[3/7] checking free space cache
cache and super generation don't match, space cache will be invalidated
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 98413187072 bytes used, no error found
total csum bytes: 95433292
total tree bytes: 607059968
total fs tree bytes: 475201536
total extent tree bytes: 23707648
btree space waste bytes: 92449963
file data blocks allocated: 128041459712
referenced 114545954816
Ok, that’s encouraging.
Try snap enable lxd and then lxc info to see if LXD comes back up.
Successfully enabled!
lxc info
config:
core.https_address: '[::]:8443'
core.trust_password: true
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
addresses:
- 192.168.0.126:8443
- 10.0.3.1:8443
- 10.112.7.1:8443
- '[fd42:c2bb:440f:2b97::1]:8443'
architectures:
- x86_64
- i686
certificate: |
-----BEGIN CERTIFICATE-----
MIICADCCAYagAwIBAgIQDvQAvOYhJ4lS3iW/1Mga3zAKBggqhkjOPQQDAzAzMRww
GgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMRMwEQYDVQQDDApyb290QGNoaWNv
MB4XDTIwMTExMTE2NDUzMFoXDTMwMTEwOTE2NDUzMFowMzEcMBoGA1UEChMTbGlu
dXhjb250YWluZXJzLm9yZzETMBEGA1UEAwwKcm9vdEBjaGljbzB2MBAGByqGSM49
AgEGBSuBBAAiA2IABG2XBRUONBDaXlhGzoA7802xlZEY2z8hzx/XeRyOywxbaItb
f8iKu3Ixvfx0TS0t/6BcaivfQOzcwumZYkX796yp5AopRQtUVuxfjlYyYCOxayud
Qc+WPp4YqIgxVpeY9qNfMF0wDgYDVR0PAQH/BAQDAgWgMBMGA1UdJQQMMAoGCCsG
AQUFBwMBMAwGA1UdEwEB/wQCMAAwKAYDVR0RBCEwH4IFY2hpY2+HBH8AAAGHEAAA
AAAAAAAAAAAAAAAAAAEwCgYIKoZIzj0EAwMDaAAwZQIxAJFr209OzEGrzfYCuafw
veeWUpfx8pn+sfLBB4+tfA/b25hKctTbtEfMaaWznHlagQIwe4uMLDbxz4Ll2Cet
s9WwjyXaISptN1ryD54IaZBMihgZQVaNvAePj5+YkTnYuvtk
-----END CERTIFICATE-----
certificate_fingerprint: 04090c253d2e71917cfdbfdaf9a36d2702276d4fe2c38362d5f841d2f267e626
driver: lxc
driver_version: 4.0.5
firewall: xtables
kernel: Linux
kernel_architecture: x86_64
kernel_features:
netnsid_getifaddrs: "true"
seccomp_listener: "true"
seccomp_listener_continue: "true"
shiftfs: "false"
uevent_injection: "true"
unpriv_fscaps: "true"
kernel_version: 5.4.0-54-generic
lxc_features:
cgroup2: "true"
devpts_fd: "true"
mount_injection_file: "true"
network_gateway_device_route: "true"
network_ipvlan: "true"
network_l2proxy: "true"
network_phys_macvlan_mtu: "true"
network_veth_router: "true"
pidfd: "true"
seccomp_allow_deny_syntax: "true"
seccomp_notify: "true"
seccomp_proxy_send_notify_fd: "true"
os_name: Ubuntu
os_version: "20.04"
project: default
server: lxd
server_clustered: false
server_name: chico
server_pid: 7689
server_version: "4.8"
storage: btrfs
storage_version: 4.15.1
OK, so far so good. You can see if your container feels like starting now.
If this blows up again, then we’ll need to disable+reboot again, do the btrfsck repair again, manually mount the pool and run a full scrub this time.
It did not work. I’m trying to mount the storage pool manually with:
mount /var/snap/lxd/common/lxd/storage-pools/default /mnt
mount: /mnt: /var/snap/lxd/common/lxd/storage-pools/default is not a block device
mount /var/snap/lxd/common/lxd/disks/default.img /mnt
That will mount the pool on /mnt, at which point you can run btrfs scrub start /mnt
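Put together, the manual pass looks roughly like this (a sketch; /mnt is just a convenient empty mountpoint):

mount /var/snap/lxd/common/lxd/disks/default.img /mnt   # loop-mounts the pool image
btrfs scrub start /mnt                                   # reads every block, verifies its checksum and repairs what it can
btrfs scrub status /mnt                                  # check progress until the scrub finishes
umount /mnt                                              # unmount again before re-enabling LXD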
btrfs scrub start /mnt
scrub started on /mnt, fsid 37ab66ba-6522-43ce-adcc-024792370708 (pid=7570).
That worked. Should I enable LXD now?
Now, you need to monitor it with btrfs scrub status /mnt
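The scrub runs in the background, so just re-run the status command until it reports finished, for example (a sketch):

watch -n 10 btrfs scrub status /mnt   # refresh the scrub status every 10 seconds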
The output:
btrfs scrub status /mnt
UUID: 37ab66ba-6522-43ce-adcc-024792370708
Scrub started: Mon Nov 30 17:45:15 2020
Status: running
Duration: 0:01:40
Time left: 0:00:32
ETA: Mon Nov 30 17:47:28 2020
Total to scrub: 92.22GiB
Bytes scrubbed: 69.49GiB
Rate: 711.58MiB/s
Error summary: csum=15787872
Corrected: 0
Uncorrectable: 15787872
Unverified: 0
Yeah, so it’s about halfway through and so far it’s looking very much not good…
What exactly did you run the first time you attempted to resize that pool?
Because it seems like you somehow erased a big chunk in the middle of it…
btrfs is functional enough to detect the damage, but there’s really not much it can do about the data that got wiped…
Yeah! I followed this topic: (Snap) LXD Resize default BTRFS storage pool. I tried your suggestions and then bluis8’s as well. So, what would be the best approach for this?
Please tell me that when following my suggestion you updated the seek= to point to the end of your existing disk?
If you didn’t, then that command would have wiped everything after the 10GB mark on your drive.
bluis8’s suggestion is better and is what is in our official documentation these days:
https://linuxcontainers.org/lxd/docs/master/storage#growing-a-loop-backed-btrfs-pool
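Roughly, that approach grows the backing file with truncate instead of dd, along these lines (a sketch only; see the linked page for the exact, current steps, the loop device and mountpoint below are placeholders for whatever your setup uses, and it’s safest to stop LXD while touching the image):

truncate -s +5G /var/snap/lxd/common/lxd/disks/default.img   # grow the backing file; growing with truncate never overwrites existing data
losetup -c /dev/loopX                                        # tell the kernel the loop device's capacity changed
btrfs filesystem resize max /path/to/pool/mountpoint         # grow the filesystem to fill the new space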
It looks like that’s what I did by mistake.
But I really appreciate your help and time.
Thanks a lot for all this walkthrough.