How to recover an LXD node from disaster using btrbk

Introduction

In the following, an example is shared of how I think one can insure an LXD infrastructure against a disaster. Since I am not yet an expert in either LXD or btrbk, comments on what could be done better are very welcome. In particular, I'd be interested to know how the configuration ("lxc config", "lxc profile" and "lxc network") could be recovered other than by typing single commands.

Setup

The setup consists of two hosts: one is called debian, the other appenzeller. Backups of snapshots from host debian are stored on host appenzeller. LXD runs on host debian:

root@debian:~# cat /etc/os-release | grep -i pretty
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"

LXD is installed by snap:

root@debian:~# snap list 
Name    Version      Rev    Tracking       Publisher   Notes
core20  20220826     1623   latest/stable  canonical✓  base
lxd     5.5-37534be  23537  latest/stable  canonical✓  -
snapd   2.57.1       16778  latest/stable  canonical✓  snapd

btrfs is used as filesystem and the LXD installation resides on its own subvolume:

root@debian:~# btrfs subvolume list /
ID 256 gen 15182 top level 5 path @rootfs
ID 260 gen 15180 top level 5 path @lxd
ID 538 gen 14292 top level 260 path @lxd/common/lxd/storage-pools/lxdstorage1
ID 541 gen 15177 top level 538 path @lxd/common/lxd/storage-pools/lxdstorage1/containers/hww

LXD was initialized as shown in the listing below:

root@debian:~# lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: 
Do you want to configure a new storage pool? (yes/no) [default=yes]: 
Name of the new storage pool [default=default]: lxdstorage1 
Name of the storage backend to use (btrfs, dir, lvm, ceph) [default=btrfs]: 
Would you like to create a new btrfs subvolume under /var/snap/lxd/common/lxd? (yes/no) [default=yes]: 
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to create a new local network bridge? (yes/no) [default=yes]: 
What should the new bridge be called? [default=lxdbr0]: 
What IPv4 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
What IPv6 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
Would you like the LXD server to be available over the network? (yes/no) [default=no]: no
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]: 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:
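
As an aside, the preseed YAML that the last question offers to print can also be saved and replayed on a fresh installation, which might cover part of the configuration recovery question from the introduction. A sketch I have not used in this walkthrough:

root@debian:~# lxd init --dump > /root/lxd-preseed.yaml
root@debian:~# cat /root/lxd-preseed.yaml | lxd init --preseed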

There is one container called hww (Hello-World-Webserver):

root@debian:~# lxc list
+------+---------+---------------------+-----------+-----------+-----------+
| NAME |  STATE  |        IPV4         |  IPV6     |   TYPE    | SNAPSHOTS |
+------+---------+---------------------+-----------+-----------+-----------+
| hww  | RUNNING | 10.64.187.39 (eth0) | fd42:Blah | CONTAINER | 0         |
+------+---------+---------------------+-----------+-----------+-----------+

btrbk is used as the backup tool. Its configuration is:

root@debian:~# cat /root/btrbk.1.config
timestamp_format        long
preserve_day_of_week    thursday
preserve_hour_of_day    16

snapshot_preserve_min   latest

# Keep
# - hourly: 24 hours
# - daily: 30 days
# - weekly: 4 weeks
# - monthly: 12 months
target_preserve       24h 30d 4w 12m

ssh_identity            /root/appenzeller

volume /var/snap/lxd/
  subvolume common/lxd/storage-pools/lxdstorage1
    target ssh://appenzeller/data/btrbk
  subvolume common/lxd/storage-pools/lxdstorage1/containers/hww
    target ssh://appenzeller/data/btrbk

A comment on the btrbk configuration: snapshots are stored as backups on a remote host. (Appenzeller is a Swiss alcoholic drink, which doesn't matter here.) For further documentation on btrbk see the btrbk README. I find it quite a cool piece of software! btrbk can be run as listed below:

root@debian:~# btrbk -c btrbk.1.config run

This creates snapshots of lxdstorage1 and hww on host debian:

root@debian:~# btrbk ls / | grep 20220909T0923
/dev/sda1     268  readonly  /var/snap/lxd/hww.20220909T0923
/dev/sda1     267  readonly  /var/snap/lxd/lxdstorage1.20220909T0923

And backups of the snapshots on host appenzeller:

root@appenzeller:~# btrfs subvolume list / | grep 20220909T0923
ID 266 gen 5282 top level 256 path data/btrbk/hww.20220909T0923
ID 268 gen 5279 top level 256 path data/btrbk/lxdstorage1.20220909T0923
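
Since the retention policy keeps hourly snapshots, btrbk is meant to be run regularly, for example from cron. A sketch (the path to btrbk is an assumption and may differ):

root@debian:~# cat /etc/cron.d/btrbk
0 * * * *  root  /usr/bin/btrbk -q -c /root/btrbk.1.config run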

Creating a Disaster

To simulate a disaster, LXD is removed from host debian. First, LXD is shut down:

root@debian:~# lxd shutdown

To the best of my knowledge, LXD is removed in two steps:

root@debian:~# snap remove lxd --purge
error: cannot perform the following tasks: ...snip...
- Remove data for snap "lxd" (23537) (failed to remove snap\
 "lxd" base directory: remove /var/snap/lxd: device or resource busy)
root@debian:~# umount /var/snap/lxd
root@debian:~# snap remove lxd --purge
lxd removed

Now, LXD is removed from host debian:

root@debian:~# snap list
Name    Version   Rev    Tracking       Publisher   Notes
core20  20220805  1611   latest/stable  canonical✓  base
snapd   2.56.2    16292  latest/stable  canonical✓  snapd

To keep this showcase simple, what is left in /var/snap/lxd are only the two snapshots hww.20220909T0923 and lxdstorage1.20220909T0923. In a real disaster they would first have to be sent back from host appenzeller to host debian (see the btrbk README and the sketch after the listing):

root@debian:~# ls /var/snap/lxd/
hww.20220909T0923  lxdstorage1.20220909T0923
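
For completeness, a sketch of how the backups could be sent back from host appenzeller to host debian (not performed in this showcase; it assumes root SSH access from appenzeller to debian):

root@appenzeller:~# btrfs send /data/btrbk/lxdstorage1.20220909T0923 \
 | ssh root@debian btrfs receive /var/snap/lxd/
root@appenzeller:~# btrfs send /data/btrbk/hww.20220909T0923 \
 | ssh root@debian btrfs receive /var/snap/lxd/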

Now the disaster is created. The next steps show how to recover from the snapshots hww.20220909T0923 and lxdstorage1.20220909T0923.

Recover from Disaster

Well, first LXD is installed again on host debian:

root@debian:~# snap install lxd
lxd 5.5-37534be from Canonical✓ installed
root@debian:~# ls /var/snap/lxd/
23537  common  current	hww.20220909T0923  lxdstorage1.20220909T0923

Next, subvolume lxdstorage1 is restored:

root@debian:~# mkdir /var/snap/lxd/common/lxd/storage-pools
root@debian:~# btrfs send /var/snap/lxd/lxdstorage1.20220909T0923\
 | btrfs receive\
 /var/snap/lxd/common/lxd/storage-pools
At subvol /var/snap/lxd/lxdstorage1.20220909T0923
At subvol lxdstorage1.20220909T0923
root@debian:/var/snap/lxd/common/lxd/storage-pools# btrfs\
 subvolume snapshot lxdstorage1.20220909T0923 lxdstorage1
Create a snapshot of 'lxdstorage1.20220909T0923' in './lxdstorage1'

And the same has to be done for hww:

root@debian:~# btrfs send /var/snap/lxd/hww.20220909T0923 \
 | btrfs receive\
 /var/snap/lxd/common/lxd/storage-pools/lxdstorage1/containers
At subvol /var/snap/lxd/hww.20220909T0923
At subvol hww.20220909T0923
root@debian:/var/snap/lxd/common/lxd/storage-pools/lxdstorage1/containers# btrfs\
 subvolume snapshot hww.20220909T0923 hww
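
Before touching LXD, the restored subvolumes can be double-checked (a sketch, output omitted):

root@debian:~# btrfs subvolume list /var/snap/lxd | grep -E 'lxdstorage1|hww'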

Then, delete the snapshots in the storage directory:

root@debian:/var/snap/lxd/common/lxd/storage-pools/lxdstorage1/containers# btrfs\
 subvolume delete hww.20220909T0923
Delete subvolume (no-commit):\
 '/var/snap/lxd/common/lxd/storage-pools/lxdstorage1/containers/hww.20220909T0923'

root@debian:/var/snap/lxd/common/lxd/storage-pools# btrfs\
 subvolume delete lxdstorage1.20220909T0923
Delete subvolume (no-commit):\
 '/var/snap/lxd/common/lxd/storage-pools/lxdstorage1.20220909T0923'

Next, “lxd recover” does its good work:

root@debian:~# lxd recover
This LXD server currently has the following storage pools:
Would you like to recover another storage pool? (yes/no) [default=no]: yes
Name of the storage pool: lxdstorage1
Name of the storage backend (btrfs, ceph, cephfs, cephobject, dir, lvm): btrfs
Source of the storage pool (block device, volume group, dataset, path, ... as applicable): /var/snap/lxd/common/lxd/storage-pools/lxdstorage1
Additional storage pool configuration property (KEY=VALUE, empty when done): 
Would you like to recover another storage pool? (yes/no) [default=no]: 
The recovery process will be scanning the following storage pools:
 - NEW: "lxdstorage1" (backend="btrfs", source="/var/snap/lxd/common/lxd/storage-pools/lxdstorage1")
Would you like to continue with scanning for lost volumes? (yes/no) [default=yes]: 
Scanning for unknown volumes...
The following unknown volumes have been found:
 - Container "hww" on pool "lxdstorage1" in project "default" (includes 0 snapshots)
You are currently missing the following:
 - Network "lxdbr0" in project "default"

This tells us that lxdbr0 has to be created:

root@debian:~# lxc network create lxdbr0
If this is your first time running LXD on this machine, you should also run: lxd init
To start your first container, try: lxc launch ubuntu:22.04
Or for a virtual machine: lxc launch ubuntu:22.04 --vm

Network lxdbr0 created
root@debian:~# lxc network info lxdbr0
Name: lxdbr0
MAC address: 00:16:3e:b4:62:3c
MTU: 1500
State: up
Type: broadcast

IP addresses:
  inet	10.64.187.1/24 (global)
  inet6	fd42:8dd4:9192:f461::1/64 (global)

Network usage:
  Bytes received: 0B
  Bytes sent: 0B
  Packets received: 0
  Packets sent: 0

Bridge:
  ID: 8000.00163eb4623c
  STP: false
  Forward delay: 1500
  Default VLAN ID: 1
  VLAN filtering: true
  Upper devices:

After that, one can continue with the lxd recover process:

Please create those missing entries and then hit ENTER: 
The following unknown volumes have been found:
 - Container "hww" on pool "lxdstorage1" in project "default" (includes 0 snapshots)
Would you like those to be recovered? (yes/no) [default=no]: yes
Starting recovery...

One also has to configure the default profile:

root@debian:~# lxc profile device add default root disk path=/ pool=lxdstorage1
root@debian:~# lxc profile device add default eth0 nic name=eth0 network=lxdbr0
root@debian:~# lxc profile show default
config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: lxdstorage1
    type: disk
name: default
used_by:
- /1.0/instances/hww

Container hww Up and Running Again
Now I could start hww again and everything seemed to be back to normal:

root@debian:~# lxc start hww 
root@debian:~# lxc list
+------+---------+---------------------+-------------------+-----------+-----------+
| NAME |  STATE  |        IPV4         |     IPV6          |   TYPE    | SNAPSHOTS |
+------+---------+---------------------+-------------------+-----------+-----------+
| hww  | RUNNING | 10.64.187.39 (eth0) | ...snip... (eth0) | CONTAINER | 0         |
+------+---------+---------------------+-------------------+-----------+-----------+
root@debian:~# wget -q 10.64.187.39 -O - | grep Hello
<title>Hello World Webserver</title>
<h1>Hello World Webserver</h1>

You haven’t really simulated a disaster, you just removed the LXD application and left the data folder intact. In a disaster you’ll have nothing left and will need to reinstall LXD on a new machine.

In my opinion, the only secure way of backing up your guests is to export each one and store it on another server or offsite storage.

Then import them into a newly installed LXD host in the case of a failure.

You don’t need any third party software, LXD can export a running guest without interruption.

Thank you for the feedback, johnrm.

You haven’t really simulated a disaster, you just removed the LXD application and left the data folder intact. In a disaster you’ll have nothing left and will need to reinstall LXD on a new machine.

After purging LXD, there are only two snapshots left: one for the storage pool and one for the hww container:

root@debian:~# ls /var/snap/lxd/
hww.20220909T0923  lxdstorage1.20220909T0923

What do you mean by ‘[you] left the data folder intact’? What folder are you referring to?

In my opinion, the only secure way of backing up your guests is to export each one and store it on another server or offsite storage.

In the example this other server is appenzeller.

Then import them into a newly installed LXD host in the case of a failure.

I was too lazy there in my example and just used the two snapshots on host debian. But it would be perfectly possible to send the snapshots backed up on host appenzeller back to host debian.

You don’t need any third party software, LXD can export a running guest without interruption.

Nice. Can you provide a link that shows that approach in more detail?

I think, to simulate a disaster, you should reinstall the OS and start again, so that all your previous files are wiped. When you have a catastrophic drive failure you won't have any files left on the server, and you will need to start again.

Exporting a container is quite straightforward:

Assuming ct1 is the name of the container and you send the dump file to /dumps/:

lxc export ct1 /dumps/ct1-bak1.tar.gz

So, I think, it’s best to export the full container to a local folder, then rsync or scp it to a backup server.
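
For example, copying the dumps off to a backup server could look like this (the server name and target path here are just placeholders):

rsync -av /dumps/ backupserver:/backups/lxd-dumps/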

Then to restore a dump on a new host you just import the dump file.

lxc import ct1-bak1.tar.gz

Here's a simple bash script I use on my hosts to export all containers; I separate running from stopped so I know which ones need to be restored and started.

#!/bin/bash

hostname=$(hostname)
timestamp=$(date +"%Y-%m-%d-%H%M%S")
foldername=$(date +"%Y-%m-%d")

# make the dump folder for today if it does not exist yet
if [ ! -d "/dumps/$foldername" ]
then
    /bin/mkdir "/dumps/$foldername"
fi

# export all running containers
for name in $(/snap/bin/lxc list -c n status=running --format=csv) ;
do
    echo "backing up $name"
    /snap/bin/lxc export "$name" "/dumps/$foldername/$hostname-$name-$timestamp.tar.gz"
done

# export all stopped containers, marked with "stopped" in the file name
for name in $(/snap/bin/lxc list -c n status=stopped --format=csv) ;
do
    echo "backing up $name"
    /snap/bin/lxc export "$name" "/dumps/$foldername/$hostname-$name-stopped-$timestamp.tar.gz"
done

# clean up dumps older than 7500 minutes (about 5 days)
/usr/bin/find /dumps/* -mmin +7500 -exec /bin/rm -rf {} \;
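
The script could then be scheduled, for example nightly from /etc/cron.d (the script path here is just a placeholder):

0 2 * * * root /usr/local/bin/lxd-export-all.sh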