lxd.migrate: Failed to update the storage pools

Hi there,

I recently installed the snap version of LXD, having previously used the packaged version included with Ubuntu 17.10 (just before this, I also upgraded to 18.04). I ran lxd.migrate and got the following error:

=> Connecting to source server
=> Connecting to destination server
=> Running sanity checks

=== Source server
LXD version: 3.0.0
LXD PID: 1824
  Containers: 6
  Images: 0
  Networks: 0
  Storage pools: 2

=== Destination server
LXD version: 3.0.0
LXD PID: 4517
  Containers: 0
  Images: 0
  Networks: 0
  Storage pools: 0

The migration process will shut down all your containers then move your data to the destination LXD.
Once the data is moved, the destination LXD will start and apply any needed updates.
And finally your containers will be brought back to their previous state, completing the migration.

Are you ready to proceed (yes/no) [default=no]? yes
=> Shutting down the source LXD
=> Stopping the source LXD units
=> Stopping the destination LXD unit
=> Unmounting source LXD paths
=> Unmounting destination LXD paths
=> Wiping destination LXD clean
=> Moving the data
=> Moving the database
=> Backing up the database
=> Opening the database
=> Updating the storage backends
error: Failed to update the storage pools: no such table: storage_pools

The old installation is still there, but it’s empty. The new installation lists my containers but won’t start any of them, failing with Error: no such file or directory. My storage pools are btrfs loop-backed images, and I see they’ve been moved to /var/snap/lxd/common/lxd/disks. I thought I might just be able to point the pools’ source at the new path, but that fails with Error: The [source] properties cannot be changed for "btrfs" storage pools.

I also don’t know what other migration steps were left after configuring the storage pools.

I’ve still got what I presume to be the original database at /var/lib/lxd/lxd.db, but in /var/snap/lxd/common/lxd there is only lxd.db.bak; there is no lxd.db. There is a local.db database in the database directory, but it only has a few tables.

At this point, I think I need to make sure the database is in the correct path and possibly manually update the database entries that contain the source paths for the storage pools. I’m going to hold off on making changes for now, as I’m very unfamiliar with all of this.

Oh crap, we really need to fix that ASAP.

As you’ve noticed, the database works differently in LXD 3.0. That’s not a problem for LXD 2.x to LXD 3.x upgrades, but moving from the LXD 3.0 deb to the LXD 3.0 snap is a bit of a problem…

Right now, the safest option would be to revert to the deb, remove the snap, and wait until we sort this out.
To do so, this should work:

  • mv /var/snap/lxd/common/lxd /var/lib/lxd
  • systemctl start lxd
  • /usr/bin/lxc list

If you see all your containers and everything looks right, then you may remove the LXD snap with snap remove lxd. DO NOT remove the LXD snap until you’ve confirmed that your data is accessible at the deb’s location, as removing the snap will wipe everything under /var/snap/lxd.
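The first step only works as intended if /var/lib/lxd does not already exist: when the destination is an existing directory, mv moves the source inside it (ending up at /var/lib/lxd/lxd) instead of renaming it. A minimal guarded sketch, assuming a POSIX shell run as root; move_lxd_dir is a hypothetical helper name, not an LXD tool:

```shell
#!/bin/sh
# move_lxd_dir is a hypothetical helper, not an LXD tool.
# Rename SRC to DST, but refuse if DST already exists: with an existing
# DST directory, plain `mv SRC DST` would create DST/<basename of SRC>.
move_lxd_dir() {
    src=$1
    dst=$2
    if [ ! -d "$src" ]; then
        echo "move_lxd_dir: $src is not a directory" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "move_lxd_dir: $dst already exists; move it aside first" >&2
        return 1
    fi
    mv "$src" "$dst"
}
```

For example, move_lxd_dir /var/snap/lxd/common/lxd /var/lib/lxd, followed by systemctl start lxd and /usr/bin/lxc list as before.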

So I ran ‘mv /var/snap/lxd/common/lxd /var/lib/lxd’, and it looks like that created an lxd subdirectory inside /var/lib/lxd. That’s probably not intended; maybe I should have run ‘mv /var/snap/lxd/common/lxd /var/lib’ instead? Right now, I can start the deb LXD, but it still looks empty.

Oh yeah, I was expecting /var/lib/lxd to be gone by that point.
Okay, so you should be able to do:

  • systemctl stop lxd lxd.socket
  • mv /var/lib/lxd /var/lib/lxd.bak
  • mv /var/lib/lxd.bak/lxd /var/lib/lxd
  • systemctl start lxd
  • lxc list
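Those moves can be wrapped with a couple of sanity checks so that rerunning them cannot clobber an existing backup. A minimal sketch under the same assumptions (POSIX shell, run as root with lxd and lxd.socket already stopped); unnest_lxd_dir is a hypothetical helper name:

```shell
#!/bin/sh
# unnest_lxd_dir is a hypothetical helper, not an LXD tool.
# Undo a move that landed the data at DIR/lxd: park DIR as DIR.bak,
# then move DIR.bak/lxd back into place as DIR.
unnest_lxd_dir() {
    dir=$1
    if [ ! -d "$dir/lxd" ]; then
        echo "unnest_lxd_dir: no nested $dir/lxd; nothing to do" >&2
        return 1
    fi
    if [ -e "$dir.bak" ]; then
        echo "unnest_lxd_dir: $dir.bak already exists; refusing to overwrite" >&2
        return 1
    fi
    mv "$dir" "$dir.bak" || return 1
    mv "$dir.bak/lxd" "$dir"
}
```

Here it would be run as unnest_lxd_dir /var/lib/lxd between the systemctl stop and systemctl start steps.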

So, I did what you suggested, then started LXD again with ‘systemctl start lxd’, and everything still looked empty. Remember that after the failed migration, I did not have an lxd.db in the snap location. /var/lib/lxd/lxd.db now looked empty, so I moved it to lxd.db.bak2, copied lxd.db.bak to lxd.db, and started LXD again. Now my container list looks correct, but when I try to start a container I get Error: saving config file for the container failed.

Ok, what does lxc info show you?
The output of lxc profile show default may also be useful.

Output of lxc info:

    config: {}
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
api_status: stable
api_version: "1.0"
auth: trusted
public: false
- tls
  addresses: []
  - x86_64
  - i686
  certificate: |
    -----END CERTIFICATE-----
  certificate_fingerprint: 7cf21feeb47acfbe9c7cfd9ff83c93e417a804b7144fa923d5560eefae78e9ff
  driver: lxc
  driver_version: 3.0.0
  kernel: Linux
  kernel_architecture: x86_64
  kernel_version: 4.15.0-20-generic
  server: lxd
  server_pid: 3200
  server_version: 3.0.0
  storage: btrfs
  storage_version: 4.15.1
  server_clustered: false
  server_name: frontier

Output of lxc profile show default (redacted a bit since it included my SSH keys):

  user.user-data: |
      - name: joey
        sudo: ['ALL=(ALL) NOPASSWD:ALL']
        shell: /bin/bash
description: Default profile
    nictype: bridged
    parent: br0
    type: nic
    path: /
    pool: default
    type: disk
name: default
- /1.0/containers/apps
- /1.0/containers/apps/base
- /1.0/containers/bots
- /1.0/containers/bots/base
- /1.0/containers/bots/prod0
- /1.0/containers/guac
- /1.0/containers/guac/base
- /1.0/containers/nginx
- /1.0/containers/nginx/base
- /1.0/containers/plex
- /1.0/containers/plex/base
- /1.0/containers/steam

Does /var/log/lxd exist on your system?
If it doesn’t, I wonder if fixing this could be as simple as running mv /var/lib/lxd/logs /var/log/lxd.
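That check and the conditional move can be combined into a short sketch, assuming a POSIX shell run as root with LXD stopped; ensure_log_dir is a hypothetical helper, and the paths are passed in rather than hard-coded:

```shell
#!/bin/sh
# ensure_log_dir is a hypothetical helper, not an LXD tool.
# If LOGDIR (e.g. /var/log/lxd) is missing, move OLDLOGS
# (e.g. /var/lib/lxd/logs) into its place; otherwise change nothing.
ensure_log_dir() {
    logdir=$1
    oldlogs=$2
    if [ -d "$logdir" ]; then
        echo "ensure_log_dir: $logdir already exists; leaving it alone"
        return 0
    fi
    if [ ! -d "$oldlogs" ]; then
        echo "ensure_log_dir: $oldlogs not found either" >&2
        return 1
    fi
    mv "$oldlogs" "$logdir"
}
```

Usage would be ensure_log_dir /var/log/lxd /var/lib/lxd/logs.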

Apologies, but I won’t be able to touch this again until after work, in about 8 hours. Hopefully you’re still available around then! Unfortunately, my remote access was provided through a container, and I haven’t set up an alternative yet.

Out of curiosity, was it correct to move lxd.db like I did? I’m not aware of exactly how the database setup has changed in LXD 3.0, though I have tried reading through the docs.

Not quite. What you actually want to do is:

  • move database/local.db to /var/lib/lxd/lxd.db
  • move database/global to /var/lib/lxd/raft
  • remove the now empty database directory

Sorry, I forgot about this part. Although the snap and the deb have the same version, the snap includes a number of cherry-picked improvements from upstream, and moving those database files is one of them…

I’m currently in Europe for work meetings so I’ll be sleeping by the time you get back home but hopefully you have enough information above to move things back.

The long version, I think, is:

  • Move /var/snap/lxd/common/lxd to become /var/lib/lxd (which you’ve kinda done, just needs fixing per above)
  • Move /var/lib/lxd/database/global to /var/lib/lxd/raft
  • Move /var/lib/lxd/database/local.db to /var/lib/lxd/lxd.db
  • Remove empty /var/lib/lxd/database directory
  • systemctl start lxd
  • Check that it works with /usr/bin/lxc list
  • If it’s all back, remove the snap

Until the snap is removed, you’ll need to type /usr/bin/lxc to access the client tool, otherwise you’ll use the snap version which will attempt to talk to the wrong LXD server.
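The database part of those steps can be sketched as a single function, assuming a POSIX shell run as root with LXD stopped; restore_db_layout is a hypothetical helper name. It moves database/global as a whole rather than file by file, copies the database directory aside before touching it, and refuses to overwrite an existing raft, lxd.db or database.bak:

```shell
#!/bin/sh
# restore_db_layout is a hypothetical helper, not an LXD tool.
# Convert the snap-style layout under DIR (DIR/database/global and
# DIR/database/local.db) back to the deb-style DIR/raft and DIR/lxd.db.
restore_db_layout() {
    dir=$1
    for target in "$dir/raft" "$dir/lxd.db" "$dir/database.bak"; do
        if [ -e "$target" ]; then
            echo "restore_db_layout: $target already exists; move it aside first" >&2
            return 1
        fi
    done
    # Keep an untouched copy of the whole database directory before moving anything.
    cp -a "$dir/database" "$dir/database.bak" || return 1
    mv "$dir/database/global" "$dir/raft" || return 1
    mv "$dir/database/local.db" "$dir/lxd.db" || return 1
    # rmdir only removes an empty directory; if it fails, something else was
    # left in database/ and should be inspected rather than deleted.
    rmdir "$dir/database"
}
```

It would be run as restore_db_layout /var/lib/lxd before systemctl start lxd. Because it stops if /var/lib/lxd/lxd.db already exists, any previously restored copy has to be moved aside (and kept) first.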

I just want to clarify: after the failed migration, /var/snap/lxd/common/lxd did not contain an lxd.db, only an lxd.db.bak. Additionally, /var/snap/lxd/common/lxd/database/local.db didn’t appear to have most of the data from my old database; checking tables with sqlite3 showed only a few tables (I’m assuming this is because of the failed migration step noted originally).

So I think the only correct copy of my previous database is the lxd.db.bak that was created, which is why I moved that one back to /var/lib/lxd/lxd.db. Checking the tables in that file shows everything you’d expect. Let me know your thoughts on this.

Do you think, after everything, that the contents of /var/lib/lxd/database/global will be correct at this point for moving back to /var/lib/lxd/raft? If so, I will move it when I get home, remove the empty database directory, then see how it goes. If I get the same error, I’ll also check for the presence of /var/log/lxd.

That’s normal. Your data would be in /var/snap/lxd/common/lxd/database/global; local.db should be pretty much empty on a non-clustered LXD deployment.

Is there a way to verify that the data in database/global is correct in that case? I looked through that folder and didn’t see any databases besides logs.db. Of course, I will keep a backup copy of the lxd.db I restored; I’m just worried that if I move local.db to lxd.db and global doesn’t have the right data, I’ll hit another roadblock.

logs.db is where the data would be, but it’s not in a particularly easy-to-read format: it’s a raft transaction log that’s replayed on startup.

Thanks for the info - in that case I will stop worrying so much about lxd.db and just make sure I’ve got my backup, then perform the steps you suggested. Do you still think /var/log/lxd might be an issue, or should I start with the other steps first?

I was able to run home and try a few things out. Here’s what I did:

  • moved database/local.db to /var/lib/lxd/lxd.db
  • I moved the files in database/global one by one. Note that besides the global directory and local.db, the database directory also contained db.bin and logs.db. I initially moved database/db.bin to /var/lib/lxd/raft/db.bin by mistake, realized it wasn’t from the global directory, then moved the correct file from database/global/db.bin instead. Unfortunately, I neglected to back it up first, so I no longer have the original db.bin that was located in /var/lib/lxd/database.
  • After finishing moving the rest of the files in /var/lib/lxd/database/global to /var/lib/lxd/raft, I moved /var/lib/lxd/database to /var/lib/lxd/database.bak.
  • When I tried to start LXD with ‘systemctl start lxd’ after this, it hung and never started. I rebooted the host, and LXD seems to be running now. I can still list my containers and configs, and everything looks correct, but I get the same Error: saving config file for the container failed when trying to start containers.

For what it’s worth, I do indeed have /var/log/lxd, as well as /var/lib/lxd/logs. Let me know if you want output from either of those. I’ve got a PC running at home now, so I can try to work on this remotely.

At this point, it might be easiest to just purge and uninstall both LXD instances completely, then start from scratch with the snap package. As long as I can extract the data I need from my btrfs images (which I would also need help to do), it wouldn’t take me too long to reconfigure LXD and the containers. Is that something you can help me out with?

Re-installing and re-importing things by hand is possible but kinda painful as it’s effectively a disaster recovery procedure.

It looks like your database is behaving fine if you can list your containers, so it’s something else that’s causing issues here.

Pick one of your containers, then get the following (replace NAME):

  • lxc config show --expanded NAME
  • ls -lh /var/lib/lxd/containers
  • ls -lh /var/lib/lxd/containers/NAME
  • ls -lh /var/lib/lxd/storage-pools/default/containers
  • ls -lh /var/lib/lxd/storage-pools/default/containers/NAME
  • ls -lh /var/log/lxd/
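To gather all of that in one go, with a label on each section so it can be pasted into a single reply, something like the following sketch could work. It assumes a POSIX shell; lxd_diag and the LXC, LXD_ROOT and LOG_ROOT overrides are hypothetical (they are not LXD settings), and the defaults are the deb paths and the /usr/bin/lxc client used in this thread:

```shell
#!/bin/sh
# lxd_diag is a hypothetical helper; LXC, LXD_ROOT and LOG_ROOT are
# illustrative overrides, not LXD environment variables.
lxd_diag() {
    name=$1
    lxc=${LXC:-/usr/bin/lxc}
    root=${LXD_ROOT:-/var/lib/lxd}
    logs=${LOG_ROOT:-/var/log/lxd}
    echo "=== lxc config show --expanded $name"
    "$lxc" config show --expanded "$name"
    # Same listings as requested above, one labelled section per directory;
    # a missing directory shows up as an ls error in the output.
    for d in "$root/containers" "$root/containers/$name" \
             "$root/storage-pools/default/containers" \
             "$root/storage-pools/default/containers/$name" "$logs/"; do
        echo "=== ls -lh $d"
        ls -lh "$d"
    done
}
```

For example, lxd_diag NAME > diag.txt 2>&1, replacing NAME as above.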