(Snap) LXD cannot Resize default BTRFS storage pool

Okay, then you probably want to use the big hammer: run snap disable lxd, reboot the system, and then use btrfsck while you have a clean kernel that has never tried to mount or use the btrfs volume on that boot.

I ran btrfsck /dev/loop15:
Opening filesystem to check…
ERROR: could not check mount status: No such device or address

Yep, that’s normal, the loop device isn’t set up yet.

Does btrfsck /var/snap/lxd/common/lxd/disks/default.img work?

That worked!

btrfsck /var/snap/lxd/common/lxd/disks/default.img
Opening filesystem to check…
Checking filesystem on /var/snap/lxd/common/lxd/disks/default.img
UUID: 37ab66ba-6522-43ce-adcc-024792370708
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
btrfs: csum mismatch on free space cache
failed to load free space cache for block group 55864983552
btrfs: csum mismatch on free space cache
failed to load free space cache for block group 59086209024
btrfs: csum mismatch on free space cache
failed to load free space cache for block group 80561045504
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 98413187072 bytes used, no error found
total csum bytes: 95433292
total tree bytes: 607059968
total fs tree bytes: 475201536
total extent tree bytes: 23707648
btree space waste bytes: 92449963
file data blocks allocated: 128041459712
referenced 114545954816

Ok, so you’ll want to run it again as btrfsck --repair /var/snap/lxd/common/lxd/disks/default.img

btrfsck --repair /var/snap/lxd/common/lxd/disks/default.img

enabling repair mode
WARNING:

Do not use --repair unless you are advised to do so by a developer
or an experienced user, and then only after having accepted that no
fsck can successfully repair all types of filesystem corruption. Eg.
some software or hardware bugs can fatally damage a volume.
The operation will start in 10 seconds.
Use Ctrl-C to stop it.

10 9 8 7 6 5 4 3 2 1
Starting repair.
Opening filesystem to check…
Checking filesystem on /var/snap/lxd/common/lxd/disks/default.img
UUID: 37ab66ba-6522-43ce-adcc-024792370708
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
No device size related problem found
[3/7] checking free space cache
cache and super generation don't match, space cache will be invalidated
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 98413187072 bytes used, no error found
total csum bytes: 95433292
total tree bytes: 607059968
total fs tree bytes: 475201536
total extent tree bytes: 23707648
btree space waste bytes: 92449963
file data blocks allocated: 128041459712
referenced 114545954816

Ok, that’s encouraging.

Try snap enable lxd and lxc info to see if LXD comes back up.

Successfully enabled!

lxc info
config:
  core.https_address: '[::]:8443'
  core.trust_password: true
api_extensions:

  • storage_zfs_remove_snapshots
  • container_host_shutdown_timeout
  • container_stop_priority
  • container_syscall_filtering
  • auth_pki
  • container_last_used_at
  • etag
  • patch
  • usb_devices
  • https_allowed_credentials
  • image_compression_algorithm
  • directory_manipulation
  • container_cpu_time
  • storage_zfs_use_refquota
  • storage_lvm_mount_options
  • network
  • profile_usedby
  • container_push
  • container_exec_recording
  • certificate_update
  • container_exec_signal_handling
  • gpu_devices
  • container_image_properties
  • migration_progress
  • id_map
  • network_firewall_filtering
  • network_routes
  • storage
  • file_delete
  • file_append
  • network_dhcp_expiry
  • storage_lvm_vg_rename
  • storage_lvm_thinpool_rename
  • network_vlan
  • image_create_aliases
  • container_stateless_copy
  • container_only_migration
  • storage_zfs_clone_copy
  • unix_device_rename
  • storage_lvm_use_thinpool
  • storage_rsync_bwlimit
  • network_vxlan_interface
  • storage_btrfs_mount_options
  • entity_description
  • image_force_refresh
  • storage_lvm_lv_resizing
  • id_map_base
  • file_symlinks
  • container_push_target
  • network_vlan_physical
  • storage_images_delete
  • container_edit_metadata
  • container_snapshot_stateful_migration
  • storage_driver_ceph
  • storage_ceph_user_name
  • resource_limits
  • storage_volatile_initial_source
  • storage_ceph_force_osd_reuse
  • storage_block_filesystem_btrfs
  • resources
  • kernel_limits
  • storage_api_volume_rename
  • macaroon_authentication
  • network_sriov
  • console
  • restrict_devlxd
  • migration_pre_copy
  • infiniband
  • maas_network
  • devlxd_events
  • proxy
  • network_dhcp_gateway
  • file_get_symlink
  • network_leases
  • unix_device_hotplug
  • storage_api_local_volume_handling
  • operation_description
  • clustering
  • event_lifecycle
  • storage_api_remote_volume_handling
  • nvidia_runtime
  • container_mount_propagation
  • container_backup
  • devlxd_images
  • container_local_cross_pool_handling
  • proxy_unix
  • proxy_udp
  • clustering_join
  • proxy_tcp_udp_multi_port_handling
  • network_state
  • proxy_unix_dac_properties
  • container_protection_delete
  • unix_priv_drop
  • pprof_http
  • proxy_haproxy_protocol
  • network_hwaddr
  • proxy_nat
  • network_nat_order
  • container_full
  • candid_authentication
  • backup_compression
  • candid_config
  • nvidia_runtime_config
  • storage_api_volume_snapshots
  • storage_unmapped
  • projects
  • candid_config_key
  • network_vxlan_ttl
  • container_incremental_copy
  • usb_optional_vendorid
  • snapshot_scheduling
  • container_copy_project
  • clustering_server_address
  • clustering_image_replication
  • container_protection_shift
  • snapshot_expiry
  • container_backup_override_pool
  • snapshot_expiry_creation
  • network_leases_location
  • resources_cpu_socket
  • resources_gpu
  • resources_numa
  • kernel_features
  • id_map_current
  • event_location
  • storage_api_remote_volume_snapshots
  • network_nat_address
  • container_nic_routes
  • rbac
  • cluster_internal_copy
  • seccomp_notify
  • lxc_features
  • container_nic_ipvlan
  • network_vlan_sriov
  • storage_cephfs
  • container_nic_ipfilter
  • resources_v2
  • container_exec_user_group_cwd
  • container_syscall_intercept
  • container_disk_shift
  • storage_shifted
  • resources_infiniband
  • daemon_storage
  • instances
  • image_types
  • resources_disk_sata
  • clustering_roles
  • images_expiry
  • resources_network_firmware
  • backup_compression_algorithm
  • ceph_data_pool_name
  • container_syscall_intercept_mount
  • compression_squashfs
  • container_raw_mount
  • container_nic_routed
  • container_syscall_intercept_mount_fuse
  • container_disk_ceph
  • virtual-machines
  • image_profiles
  • clustering_architecture
  • resources_disk_id
  • storage_lvm_stripes
  • vm_boot_priority
  • unix_hotplug_devices
  • api_filtering
  • instance_nic_network
  • clustering_sizing
  • firewall_driver
  • projects_limits
  • container_syscall_intercept_hugetlbfs
  • limits_hugepages
  • container_nic_routed_gateway
  • projects_restrictions
  • custom_volume_snapshot_expiry
  • volume_snapshot_scheduling
  • trust_ca_certificates
  • snapshot_disk_usage
  • clustering_edit_roles
  • container_nic_routed_host_address
  • container_nic_ipvlan_gateway
  • resources_usb_pci
  • resources_cpu_threads_numa
  • resources_cpu_core_die
  • api_os
  • container_nic_routed_host_table
  • container_nic_ipvlan_host_table
  • container_nic_ipvlan_mode
  • resources_system
  • images_push_relay
  • network_dns_search
  • container_nic_routed_limits
  • instance_nic_bridged_vlan
  • network_state_bond_bridge
  • usedby_consistency
  • custom_block_volumes
  • clustering_failure_domains
  • resources_gpu_mdev
  • console_vga_type
  • projects_limits_disk
  • network_type_macvlan
  • network_type_sriov
  • container_syscall_intercept_bpf_devices
  • network_type_ovn
  • projects_networks
  • projects_networks_restricted_uplinks
  • custom_volume_backup
  • backup_override_name
  • storage_rsync_compression
  • network_type_physical
  • network_ovn_external_subnets
  • network_ovn_nat
  • network_ovn_external_routes_remove
  • tpm_device_type
  • storage_zfs_clone_copy_rebase
  • gpu_mdev
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses:
  - 192.168.0.126:8443
  - 10.0.3.1:8443
  - 10.112.7.1:8443
  - '[fd42:c2bb:440f:2b97::1]:8443'
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIICADCCAYagAwIBAgIQDvQAvOYhJ4lS3iW/1Mga3zAKBggqhkjOPQQDAzAzMRww
    GgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMRMwEQYDVQQDDApyb290QGNoaWNv
    MB4XDTIwMTExMTE2NDUzMFoXDTMwMTEwOTE2NDUzMFowMzEcMBoGA1UEChMTbGlu
    dXhjb250YWluZXJzLm9yZzETMBEGA1UEAwwKcm9vdEBjaGljbzB2MBAGByqGSM49
    AgEGBSuBBAAiA2IABG2XBRUONBDaXlhGzoA7802xlZEY2z8hzx/XeRyOywxbaItb
    f8iKu3Ixvfx0TS0t/6BcaivfQOzcwumZYkX796yp5AopRQtUVuxfjlYyYCOxayud
    Qc+WPp4YqIgxVpeY9qNfMF0wDgYDVR0PAQH/BAQDAgWgMBMGA1UdJQQMMAoGCCsG
    AQUFBwMBMAwGA1UdEwEB/wQCMAAwKAYDVR0RBCEwH4IFY2hpY2+HBH8AAAGHEAAA
    AAAAAAAAAAAAAAAAAAEwCgYIKoZIzj0EAwMDaAAwZQIxAJFr209OzEGrzfYCuafw
    veeWUpfx8pn+sfLBB4+tfA/b25hKctTbtEfMaaWznHlagQIwe4uMLDbxz4Ll2Cet
    s9WwjyXaISptN1ryD54IaZBMihgZQVaNvAePj5+YkTnYuvtk
    -----END CERTIFICATE-----
  certificate_fingerprint: 04090c253d2e71917cfdbfdaf9a36d2702276d4fe2c38362d5f841d2f267e626
  driver: lxc
  driver_version: 4.0.5
  firewall: xtables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.4.0-54-generic
  lxc_features:
    cgroup2: "true"
    devpts_fd: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "20.04"
  project: default
  server: lxd
  server_clustered: false
  server_name: chico
  server_pid: 7689
  server_version: "4.8"
  storage: btrfs
  storage_version: 4.15.1

Ok, so far so good, you can see if your container feels like starting now.

If this blows up again, then we’ll need to disable and reboot again, run the btrfsck repair again, then manually mount the pool and run a full scrub this time.

It did not work. I’m trying to mount the storage pool manually with:

mount /var/snap/lxd/common/lxd/storage-pools/default /mnt
mount: /mnt: /var/snap/lxd/common/lxd/storage-pools/default is not a block device

mount /var/snap/lxd/common/lxd/disks/default.img /mnt

That will mount the pool on /mnt, at which point you can run btrfs scrub start /mnt

btrfs scrub start /mnt
scrub started on /mnt, fsid 37ab66ba-6522-43ce-adcc-024792370708 (pid=7570).

That worked. Should I enable LXD now?

Now, you need to monitor it with btrfs scrub status /mnt
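If you'd rather not re-run the status command by hand, a small polling loop works. This is a hypothetical helper (the wait_for_scrub function and the fake_status stand-in are not from the thread); a stand-in status command is used here so the sketch runs anywhere without a btrfs mount:

```shell
# Hypothetical polling helper: re-run a status command until its
# output no longer contains "running".
wait_for_scrub() {
  while "$@" | grep -q 'running'; do
    sleep 1
  done
}

# Stand-in for `btrfs scrub status /mnt` so the sketch runs anywhere;
# on the real system you would call: wait_for_scrub btrfs scrub status /mnt
fake_status() { echo "Status:           finished"; }

wait_for_scrub fake_status && echo "scrub done"
```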

The output:

btrfs scrub status /mnt
UUID: 37ab66ba-6522-43ce-adcc-024792370708
Scrub started: Mon Nov 30 17:45:15 2020
Status: running
Duration: 0:01:40
Time left: 0:00:32
ETA: Mon Nov 30 17:47:28 2020
Total to scrub: 92.22GiB
Bytes scrubbed: 69.49GiB
Rate: 711.58MiB/s
Error summary: csum=15787872
Corrected: 0
Uncorrectable: 15787872
Unverified: 0

Yeah, so about halfway through and so far looking very much not good…

What exactly did you run the first time you attempted to resize that pool?
Because it seems like you somehow erased a big chunk in the middle of it…

btrfs is functional enough to detect the damage but there’s really not much it can do about the data that got wiped…

Yeah! I followed this topic: (Snap) LXD Resize default BTRFS storage pool. I tried your suggestions and then bluis8’s as well. So, what would be the best approach for this?

Please tell me that when following my suggestion you updated the seek= to point to the end of your existing disk?

If you didn’t, then that command would have wiped everything after the 10GB mark on your drive.
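To make the failure mode concrete: with GNU dd, writing with count=0 and a seek= offset truncates the output file at that offset (unless conv=notrunc is given), so a seek= value smaller than the current image size discards everything beyond it. A demonstration on a scratch file (the 20MiB/10MiB sizes are illustrative, not the poster’s actual values):

```shell
# Demonstration on a scratch file of why a too-small seek= destroys data.
img=$(mktemp)

# Create a 20MiB "image" full of 0xFF bytes, standing in for real pool data.
dd if=/dev/zero bs=1M count=20 2>/dev/null | tr '\0' '\377' > "$img"
stat -c%s "$img"    # 20971520 bytes

# "Grow" it with seek=10 instead of seek=20: dd truncates the file at the
# seek offset, so everything past the 10MiB mark is gone.
dd if=/dev/zero of="$img" bs=1M count=0 seek=10 2>/dev/null
stat -c%s "$img"    # 10485760 bytes: half the image was wiped

rm -f "$img"
```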

bluis8’s suggestion is better and is what is in our official documentation these days:

https://linuxcontainers.org/lxd/docs/master/storage#growing-a-loop-backed-btrfs-pool
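The key property of the truncate -s +N approach is that it only ever extends the file, so it cannot clobber existing data the way a mis-aimed dd seek= can. A sketch on a scratch file (sizes are illustrative, and the commented follow-up commands with a hypothetical /dev/loopN are what the live system would need; there the image is /var/snap/lxd/common/lxd/disks/default.img):

```shell
# Demonstration on a scratch file: truncate -s +N appends space without
# touching existing contents.
img=$(mktemp)
truncate -s 10M "$img"       # pretend this is the existing pool image
truncate -s +5M "$img"       # grow by 5MiB; old data is untouched
stat -c%s "$img"             # 15728640 bytes

# On the live system you would then (as root, hypothetical loop device):
#   losetup -c /dev/loopN                 # re-read the new image size
#   btrfs filesystem resize max /mnt      # grow the mounted filesystem
rm -f "$img"
```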

It looks like that’s what I did by mistake.
But I really appreciate your help and time.

Thanks a lot for all this walkthrough.