Lost all my container configuration due to snap reinstall

Hello everyone,

This evening I woke up to begin work on my website, only to find that LXD would fail to start with the following error:

internal error, please report: running "lxd.lxc" failed: cannot find installed snap "lxd" at revision 11437: missing file /snap/lxd/11437/meta/snap.yaml

Unbeknownst to me, snapd did not save the container configuration on re-install. Is there a way to recover from this? The container data itself is fine. Or do I have to recreate 20 containers by hand and risk losing the container configuration again?

Distro: openSUSE Tumbleweed

lxc info

config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIICCDCCAY+gAwIBAgIQeggTq6Bdddyrs5fCiQYYjzAKBggqhkjOPQQDAzA4MRww
    GgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMRgwFgYDVQQDDA9yb290QGxpbnV4
    LWlxdTkwHhcNMTkwODA5MTkyNDExWhcNMjkwODA2MTkyNDExWjA4MRwwGgYDVQQK
    ExNsaW51eGNvbnRhaW5lcnMub3JnMRgwFgYDVQQDDA9yb290QGxpbnV4LWlxdTkw
    djAQBgcqhkjOPQIBBgUrgQQAIgNiAASNO6sziVrUwRLSG2ByEB4uQNUFI6qlfZGt
    Lkqs1j41qtFt5y6YHBmtKJgYW+IsycoIeN78SgfvWaH11V9pJTBiZCHJgO6geQl5
    zfWpz6fWtsQq6ZaEWa9ln+UYpgNyCtijXjBcMA4GA1UdDwEB/wQEAwIFoDATBgNV
    HSUEDDAKBggrBgEFBQcDATAMBgNVHRMBAf8EAjAAMCcGA1UdEQQgMB6CCmxpbnV4
    LWlxdTmHBAoAAS+HBKwQAgSHBAoAAzwwCgYIKoZIzj0EAwMDZwAwZAIwftret5Nr
    /7D0jGPFvBtmkVxOM7iB3sYgwQpiUb60PwZrrxAU8AD03CLzc7qzThL7AjAhCctX
    3fBvqS+A3CaLwVpTXOug/ZsRcPV9XoAfm1q7LDJeE7UgfOUxI00n4YPwdP0=
    -----END CERTIFICATE-----
  certificate_fingerprint: 4db3d927daff9c512945b7753c00ef989c0eec37396b4b54d1dcff059fdf923e
  driver: lxc
  driver_version: 3.2.1
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.2.2-1-default
  lxc_features:
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    seccomp_notify: "true"
  project: default
  server: lxd
  server_clustered: false
  server_name: linux-iqu9
  server_pid: 21798
  server_version: "3.15"
  storage: ""
  storage_version: ""

EDIT: I recovered from an earlier snapshot by following this guide: https://snapcraft.io/docs/snapshots.
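For anyone else who ends up here, the sequence from that guide boils down to roughly this (the set ID 8 is just what snapd happened to assign on my machine; yours will differ):

    snap saved                # list the snapshot sets snapd has kept
    snap check-snapshot 8     # optionally verify the set before restoring
    snap restore 8 lxd        # restore the lxd snap's data from set 8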

If all your configuration in LXD is lost but the storage device is intact, you can reimport the containers. See, for example, https://blog.simos.info/reconnecting-your-lxd-installation-to-the-zfs-storage-pool/
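Roughly, that approach comes down to making the old storage visible again and then using lxd import, LXD's disaster-recovery command, to rebuild the database entries from the backup.yaml stored alongside each container. A minimal sketch, assuming a ZFS pool called lxd and a container called web1 (both names are illustrative):

    sudo zpool import lxd     # bring the old ZFS pool back online
    # mount the container's dataset where LXD expects it, e.g. under
    # /var/snap/lxd/common/lxd/storage-pools/default/containers/web1, then:
    sudo lxd import web1      # recreate the DB record from the on-disk backup.yaml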

The error you mentioned seems to indicate that the snap wasn’t properly mounted under /snap; that’s a problem that can be tracked down and resolved.
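A few quick checks usually narrow it down (the revision number below is taken from your error message, and the mount unit name is an assumption based on how snapd normally names these units):

    snap list lxd                                  # which revision snapd thinks is installed
    ls /snap/lxd/11437/meta/snap.yaml              # the file the error says is missing
    systemctl --failed | grep snap-lxd             # look for a failed snap-lxd-<rev>.mount unit
    sudo systemctl restart snap-lxd-11437.mount    # try re-mounting that revision's squashfs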

The data at /var/snap/lxd is unlikely to have similarly disappeared at that stage, so once the snap is made accessible again, everything should be accessible again.
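You can confirm that for yourself with something like this (assuming the usual snap data layout):

    sudo ls /var/snap/lxd/common/lxd    # should show containers, storage-pools, the database, etc.
    sudo du -sh /var/snap/lxd           # rough size of what's still on disk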

One thing to NEVER do is snap remove lxd, as that will wipe /var/snap/lxd and may cause data loss.

As you mentioned, recent snapd versions attempt to make a backup on removal, which may still save you in such cases, though because of how huge the LXD data can be, that backup has been known to fail on many occasions.

I bit the bullet this time: snap restore 8 lxd restored 1.5 GB of lost data, so I recovered nicely and am back to work, thankfully. As for the error, I opened a post here https://forum.snapcraft.io/t/internal-error-please-report-running-lxd-lxc-failed-lost-data/12720/5 and the issue is tracked back to a bug in Linux kernel 5.2. AFAIK, there are a couple of fixes: one for snapd (in beta, if I’m interpreting this correctly: https://bugs.launchpad.net/ubuntu/+source/snapd/+bug/1756793) and one in 5.2-stable.

I actually saw this blog post in my forum searches after I posted. It’s bookmarked now, thanks.

Oh, yeah, the 5.2 and 5.3 kernels have been giving us a LOT of problems…
Changes to the mount API caused regressions all over the place with sometimes grave consequences.