LXD cluster on Raspberry Pi 4

Introduction

Would you like a very compact, silent, yet surprisingly powerful home lab capable of running containers or virtual machines, accessible from any system on your network?

That’s what’s possible these days using very cheap ARM boards like the Raspberry Pi.
Those boards have become more and more powerful over the years and are now even capable of running full virtual machines. Combined with LXD’s ability to cluster systems together, it’s now easier than ever to set up a lab that can easily be grown in the future.

Ubuntu has now released a dedicated LXD appliance targeting both the Raspberry Pi 4 and traditional Intel systems.

Hardware

In this setup, I’ll be using 3 of the newest Raspberry Pi 4 boards in their 8GB configuration.
The same will work fine on the 2GB or 4GB models, but if you intend to run virtual machines, try to stick to the 4GB or 8GB models.

All 3 boards are connected to the same network, and you’ll need an HDMI-capable display and a USB keyboard for the initial setup. Once that’s done, everything can be done remotely over SSH and the LXD API.

It’s certainly possible to get this going on a single board or on far more than 3 boards, but 3 is the minimum number for the LXD database to be highly available.

Also worth noting: in my setup, I didn’t have fast USB 3.1 external drives to use with this cluster, so I’m just using a loop file on the microSD card.
This makes for rather slow and small storage. If you have access to fast external storage, plug it in and specify it as an “existing empty disk or partition” below.

Installation

First, you’ll need to download the LXD appliance image from https://ubuntu.com/appliance/lxd/raspberry-pi and follow the instructions to write it to a microSD card and load it on your Raspberry Pi boards.

Once booted, each board will ask for your Ubuntu account; it will then create your user and import your SSH key. At the end, you’re presented with the IP address of the board and can SSH into it.

Configuration

The appliance image is set up for standalone use, but since we want to cluster the boards together, we first need to undo some of the configuration that was automatically applied.

SSH into each of your boards and run:

sudo lxc profile device remove default root
sudo lxc profile device remove default eth0
sudo lxc storage delete local
sudo lxc config unset core.https_address
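If you want to double-check the reset before proceeding, these purely informational commands should show a default profile with no devices and an empty storage pool list:

```shell
sudo lxc profile show default   # the "devices" section should now be empty
sudo lxc storage list           # should list no storage pools
```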

Then on the first board, run sudo lxd init and go through the steps as shown below:

Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=localhost]: rpi01
What IP address or DNS name should be used to reach this node? [default=10.166.11.235]: 
Are you joining an existing cluster? (yes/no) [default=no]: 
Setup password authentication on the cluster? (yes/no) [default=yes]: 
Trust password for new clients: 
Again: 
Do you want to configure a new local storage pool? (yes/no) [default=yes]: 
Name of the storage backend to use (btrfs, dir, lvm) [default=btrfs]:
Create a new BTRFS pool? (yes/no) [default=yes]:
Would you like to use an existing empty disk or partition? (yes/no) [default=no]: 
Size in GB of the new loop device (1GB minimum) [default=5GB]: 20GB
Do you want to configure a new remote storage pool? (yes/no) [default=no]: 
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: yes
Name of the existing bridge or host interface: eth0
Would you like stale cached images to be updated automatically? (yes/no) [default=yes] 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 

That’s it: you have a cluster (of one system) with networking and storage configured. Now, let’s join the other two boards by running sudo lxd init on them too:

stgraber@localhost:~$ sudo lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=localhost]: rpi02
What IP address or DNS name should be used to reach this node? [default=10.166.11.92]: 
Are you joining an existing cluster? (yes/no) [default=no]: yes
IP address or FQDN of an existing cluster node: 10.166.11.235
Cluster fingerprint: b9d2523a4935474c4a52f16ceb8a44e80907143e219a3248fbb9f5ac5d53d926
You can validate this fingerprint by running "lxc info" locally on an existing node.
Is this the correct fingerprint? (yes/no) [default=no]: yes
Cluster trust password: 
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Choose "source" property for storage pool "local": 
Choose "size" property for storage pool "local": 20GB
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 
stgraber@localhost:~$ 
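As the last prompt hints, those same answers can also be captured as a preseed and fed to lxd init non-interactively. Here’s a rough sketch of what a join preseed could look like for a third board, reusing values from the transcripts above — the certificate placeholder and exact key names should be checked against the clustering documentation for your LXD version:

```yaml
cluster:
  enabled: true
  server_name: rpi03
  server_address: 10.166.11.200:8443
  cluster_address: 10.166.11.235:8443
  cluster_certificate: |
    -----BEGIN CERTIFICATE-----
    (paste the cluster certificate here)
    -----END CERTIFICATE-----
  cluster_password: (your trust password)
  member_config:
  - entity: storage-pool
    name: local
    key: size
    value: 20GB
```

It would then be applied with cat preseed.yaml | sudo lxd init --preseed.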

And validate that everything looks good by running sudo lxc cluster list on any of them:

stgraber@localhost:~$ sudo lxc cluster list
+-------+----------------------------+----------+--------+-------------------+--------------+
| NAME  |            URL             | DATABASE | STATE  |      MESSAGE      | ARCHITECTURE |
+-------+----------------------------+----------+--------+-------------------+--------------+
| rpi01 | https://10.166.11.235:8443 | YES      | ONLINE | fully operational | aarch64      |
+-------+----------------------------+----------+--------+-------------------+--------------+
| rpi02 | https://10.166.11.92:8443  | YES      | ONLINE | fully operational | aarch64      |
+-------+----------------------------+----------+--------+-------------------+--------------+
| rpi03 | https://10.166.11.200:8443 | YES      | ONLINE | fully operational | aarch64      |
+-------+----------------------------+----------+--------+-------------------+--------------+
stgraber@localhost:~$ 

And you’re done. Chances are you won’t need to SSH into those boards ever again; everything else can now be done remotely through the LXD command line client or API from any Linux, Windows or macOS system.

Operation

Now, from any system on which you want to use this newly set up cluster, run the following (updating the IP to match that of any of your boards):

stgraber@castiana:~$ lxc remote add my-cluster 10.166.11.235
Certificate fingerprint: b9d2523a4935474c4a52f16ceb8a44e80907143e219a3248fbb9f5ac5d53d926
ok (y/n)? y
Admin password for my-cluster: 
Client certificate stored at server:  my-cluster
stgraber@castiana:~$ lxc remote switch my-cluster
stgraber@castiana:~$ lxc cluster list
+-------+----------------------------+----------+--------+-------------------+--------------+----------------+
| NAME  |            URL             | DATABASE | STATE  |      MESSAGE      | ARCHITECTURE | FAILURE DOMAIN |
+-------+----------------------------+----------+--------+-------------------+--------------+----------------+
| rpi01 | https://10.166.11.235:8443 | YES      | ONLINE | fully operational | aarch64      |                |
+-------+----------------------------+----------+--------+-------------------+--------------+----------------+
| rpi02 | https://10.166.11.92:8443  | YES      | ONLINE | fully operational | aarch64      |                |
+-------+----------------------------+----------+--------+-------------------+--------------+----------------+
| rpi03 | https://10.166.11.200:8443 | YES      | ONLINE | fully operational | aarch64      |                |
+-------+----------------------------+----------+--------+-------------------+--------------+----------------+

Whenever you want to interact with your local system rather than the remote cluster, just run:

lxc remote switch local

Because no 64-bit Arm VM images are currently set up for Secure Boot, let’s disable it altogether with:

lxc profile set default security.secureboot false
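If you’d rather keep the default profile untouched, the same key can also be set per instance, either at creation time with -c or later with lxc config set (v1 here is just an example instance name):

```shell
lxc launch images:ubuntu/20.04/cloud v1 --vm -c security.secureboot=false
lxc config set v1 security.secureboot false
```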

And now let’s start some containers and virtual machines:

lxc launch images:alpine/edge c1
lxc launch images:archlinux c2
lxc launch images:ubuntu/18.04 c3
lxc launch images:ubuntu/20.04/cloud v1 --vm
lxc launch images:fedora/32/cloud v2 --vm
lxc launch images:debian/11/cloud v3 --vm

The initial launch operations will be rather slow, especially when using the main microSD card for storage. Creating more of those instances should then be quite quick, as the image will already be cached locally.
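By default, the cluster places each new instance on the least loaded member, which is how the instances below ended up spread across the three boards. If you want to control placement yourself, the --target flag pins an instance to a specific member, for example:

```shell
lxc launch images:alpine/edge c4 --target rpi02
```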

stgraber@castiana:~$ lxc list
+------+---------+------------------------+---------------------------------------------------+-----------------+-----------+----------+
| NAME |  STATE  |          IPV4          |                       IPV6                        |      TYPE       | SNAPSHOTS | LOCATION |
+------+---------+------------------------+---------------------------------------------------+-----------------+-----------+----------+
| c1   | RUNNING | 10.166.11.113 (eth0)   | fd42:4c81:5770:1eaf:216:3eff:fed7:35c0 (eth0)     | CONTAINER       | 0         | rpi01    |
+------+---------+------------------------+---------------------------------------------------+-----------------+-----------+----------+
| c2   | RUNNING | 10.166.11.88 (eth0)    | fd42:4c81:5770:1eaf:216:3eff:feaa:81ac (eth0)     | CONTAINER       | 0         | rpi02    |
+------+---------+------------------------+---------------------------------------------------+-----------------+-----------+----------+
| c3   | RUNNING | 10.166.11.146 (eth0)   | fd42:4c81:5770:1eaf:216:3eff:fe79:75a5 (eth0)     | CONTAINER       | 0         | rpi03    |
+------+---------+------------------------+---------------------------------------------------+-----------------+-----------+----------+
| v1   | RUNNING | 10.166.11.200 (enp5s0) | fd42:4c81:5770:1eaf:216:3eff:fe61:1ad6 (enp5s0)   | VIRTUAL-MACHINE | 0         | rpi01    |
+------+---------+------------------------+---------------------------------------------------+-----------------+-----------+----------+
| v2   | RUNNING | 10.166.11.238 (enp5s0) | fd42:4c81:5770:1eaf:216:3eff:fe8d:e6ae (enp5s0)   | VIRTUAL-MACHINE | 0         | rpi02    |
+------+---------+------------------------+---------------------------------------------------+-----------------+-----------+----------+
| v3   | RUNNING | 10.166.11.33 (enp5s0)  | fd42:4c81:5770:1eaf:216:3eff:fe86:526a (enp5s0)   | VIRTUAL-MACHINE | 0         | rpi03    |
+------+---------+------------------------+---------------------------------------------------+-----------------+-----------+----------+

From there on, everything should feel pretty normal. You can use lxc exec to directly run commands inside those instances, lxc console to access the text or VGA console, …
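For example, using the instance names from the listing above (the Alpine package command is just an illustration):

```shell
lxc exec c1 -- apk update        # run a command inside the Alpine container
lxc console v1                   # attach to the VM's text console (detach with ctrl+a q)
lxc console v1 --type=vga        # or open its VGA console (needs a SPICE client)
lxc file pull c3/etc/hostname .  # copy a file out of an instance
```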

Conclusion

LXD makes for a very easy and flexible solution for setting up a lab, be it at home with a few Raspberry Pi boards, in the cloud using public cloud instances, or on any spare hardware you may have around.

Adding additional servers to a cluster is fast and simple and LXD clusters even support mixed architectures, allowing you to mix Raspberry Pi 4 and Intel NUCs into a single cluster capable of running both Intel and Arm workloads.

Everything behaves in much the same way as running on your laptop or desktop computer, but it can now be accessed by multiple users from any system with network access.

From this you could grow to a much larger cluster setup, using projects to handle multiple distinct uses of the cluster, attaching remote storage, using virtual networking, integrating with MAAS or with Canonical RBAC, … There are a lot of options which can be progressively added to a setup like this as you feel the need for it.


This is a great concept. The only area of concern for me was the Ubuntu cloud account. Is that needed, or are there ways to do this without the Ubuntu cloud account tie-in?

I wanted to do something like this inside the network, but the idea of the cloud connection seems counter to the “inside” concept.

Yeah, that’s a limitation of Ubuntu Core I believe.

You can do the exact same as above using a traditional Ubuntu Server image instead:

This will not require any account and should come with the LXD snap preinstalled but unconfigured (so you won’t need the initial remove/delete/unset commands from this post); the rest will behave the same.

I was looking at the concept – I had been playing around with different methods of clustering 4 Raspberry Pi 4 4GB boards together for “a home lab” and a friend pointed me here. GREAT write-up.

I will probably try both methods, but if I go with the Ubuntu side, I will likely drop it on the IoT LAN so it’s not on the internal LAN (likely a good place to play anyway)!

Update: I downloaded the Ubuntu Server image you pointed me to – I will let you know how it went :)

Was looking into doing this with Proxmox instead. No external account required. They have a write-up on how as well. Like having multiple options though. Great write-up.

So I had a few issues with “Error: Failed to join cluster: Failed request to add member: The joining server version doesn’t (expected 4.0.2 with API count 189)”
But I didn’t find an awful lot on that specifically. So, I reloaded. With one of the two nodes, a reload worked on the first try; the second took two tries.

The only anomaly is that when I run lxc cluster list, I see that only three of the four nodes have the database set to YES:
ubuntu@ubuntu:~$ sudo lxc cluster list
+-------+-----------------------+----------+--------+-------------------+--------------+
| NAME  |          URL          | DATABASE | STATE  |      MESSAGE      | ARCHITECTURE |
+-------+-----------------------+----------+--------+-------------------+--------------+
| rpi01 | https://x.y.z.20:8443 | YES      | ONLINE | fully operational | aarch64      |
+-------+-----------------------+----------+--------+-------------------+--------------+
| rpi02 | https://x.y.z.21:8443 | NO       | ONLINE | fully operational | aarch64      |
+-------+-----------------------+----------+--------+-------------------+--------------+
| rpi03 | https://x.y.z.22:8443 | YES      | ONLINE | fully operational | aarch64      |
+-------+-----------------------+----------+--------+-------------------+--------------+
| rpi04 | https://x.y.z.23:8443 | YES      | ONLINE | fully operational | aarch64      |
+-------+-----------------------+----------+--------+-------------------+--------------+

BTW - this was with the UBUNTU server

An LXD cluster has either one database server (non-HA mode) or three database servers, providing HA for the database.

So your output is quite normal for a 4-node cluster.

The error you got about the version mismatch is what I would expect if either the existing or the joining servers were running a different version. For clustering to work, all servers must be on the same version of LXD, which includes the bugfix release too.

One way to make sure of that would be to run snap refresh lxd prior to attempting a join.
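For example, running these on every board before attempting a join should line their versions up:

```shell
sudo snap refresh lxd   # update to the latest revision in the snap channel
snap list lxd           # version/revision shown here should match on every member
```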

Thanks! So being a TOTAL n00b (if not already abundantly clear), what is the fastest learning path??

I started down the clustering path aiming to learn if there was a faster way to build a password cracking rig in Raspberry Pi world :) Now that I have a working cluster, it’s time to expand into that arena.

Well, now you can run anything you want on there, in containers or virtual machines.
For something like password cracking, the most efficient setup would probably be a single container per cluster member; you’d then split the dictionary or brute-force space so that each one gets a quarter of the total. But that’s really up to you and what you want to run on it :)
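As a tiny sketch of that splitting idea (the file names here are made up for illustration), the standard split tool can shard a wordlist into one piece per cluster member without cutting any line in half:

```shell
# Create a small example wordlist (stand-in for a real dictionary)
printf '%s\n' alpha bravo charlie delta echo foxtrot golf hotel > wordlist.txt

# Shard into 4 roughly equal pieces (shard-00 .. shard-03), one per member,
# keeping whole lines together (-n l/4) and using numeric suffixes (-d)
split -n l/4 -d wordlist.txt shard-

wc -l shard-0*
```

Each shard could then be copied into its container with lxc file push before starting the cracking job.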