We use 4 machines as an LXD cluster providing LXD VMs as Jenkins build slaves.
When I use a Jenkins job to delete and re-create an LXD VM, an error occurs when running “lxc config device override oem-iot-focal-1 root size=40GB”:
21:43:40 + lxc config device override oem-iot-focal-1 root size=40GB
21:43:41 Error: Failed to update device “root”: Failed to run: /snap/lxd/19389/bin/sgdisk --move-second-header /var/snap/lxd/common/lxd/storage-pools/local/virtual-machines/oem-iot-focal-1/root.img: Unable to save backup partition table! Perhaps the ‘e’ option on the experts’
Meanwhile, it works fine when I shell into the machine and run the same command manually.
I use the LXD snap, version 4.11, on Ubuntu 20.04.2 LTS.
Storage driver: btrfs.
I found the error always occurs on a specific node of the cluster.
After I ran “lxc cluster remove <the_node>”, it no longer seems to happen.
I still need to observe it further.
I use the images:ubuntu/bionic/cloud and images:ubuntu/focal/cloud images, and I use cloud-init provisioning to customize our build environment.
These are the steps we follow:
So when an LXD VM is assigned 40GB, does that mean only 60GB is left to assign to another LXD VM? I remember that in VirtualBox the assigned size is only an upper bound, and the space actually used is decided at run time. Is that not the case here?
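The VirtualBox-style “only an upper bound” behaviour can be sketched with a plain sparse file (the file name here is just an example, not one of LXD’s files) — the apparent size and the space actually consumed on the host are two different numbers:

```shell
# Create a "40GB" sparse file; disk blocks are only allocated when written.
truncate -s 40G disk.img

# Apparent size: what a VM would see as its disk capacity.
ls -lh disk.img    # shows 40G

# Actual space consumed on the host filesystem: near zero until data is written.
du -h disk.img
```

Whether an LXD volume behaves this way depends on the storage driver and how the pool was created.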
@tomp May I know more details on how you use a “sparse raw block file” to solve the issue?
This also raises a question in my mind: if 4 machines are added to a single LXD cluster, is the total storage limited by the smallest disk among the 4? Is that also the case for RAM?
For non-clustered storage pools (everything except ceph and cephfs), the storage pool is created on each node.
When creating a BTRFS storage pool you have two options: either use an existing partition via the source option, or let LXD create a loop-back file of the specified size on your main filesystem.
In the latter case it is expected that each node has the space available to accommodate the storage pool’s loop-back file.
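As a sketch of those two options (the pool name, device path, and size here are illustrative, not taken from this thread):

```shell
# Option 1: back the pool with an existing empty partition.
lxc storage create local btrfs source=/dev/sdb1

# Option 2: let LXD create a loop-back file of the given size
# on the main filesystem and format it with BTRFS.
lxc storage create local btrfs size=100GB
```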
Please can you show the output of lxc storage show local so I can see how your BTRFS pool is set up?
Did you try creating a VM with a smaller disk as I asked? Did it work?
@tomp
To your question: yes, it works when I use a smaller disk.
However, the image build requirements for some of our projects need more disk space than the default 10GB.
Where is the BTRFS storage located for the snap version of LXD? How can I extend it? Do I need to extend it on every node in the cluster separately?
When an LXD VM is created, the cluster picks one of its nodes to host it. From my observation, it just takes turns on node1, node2, and node3. Is there any rule it follows to balance the load, or do we need to do that manually according to the resources on each node? Say we have 3 machines with different RAM/storage in the same LXD cluster.
For question 2: LXD will distribute instances across the cluster at create time, or you can specify a particular node at create time using the --target=<cluster member> flag. It uses some fairly basic metrics based on the number of instances on each node.
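For example (the instance and member names here are hypothetical):

```shell
# Let LXD pick a cluster member using its placement metrics:
lxc launch images:ubuntu/focal/cloud build-vm-1 --vm

# Or pin the instance to a specific member:
lxc launch images:ubuntu/focal/cloud build-vm-2 --vm --target=node2
```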
OK, so now we see the problem. The size property on two of the nodes is just 30GB, not 100GB.
This means that on 2 of the nodes the BTRFS pool size is only 30GB. This was probably generated automatically on storage pool creation, based on the free space on the host’s disk, if you didn’t specify it.
You can also see that in your setup you have chosen to create a BTRFS storage pool on an LXD-generated loop-back file (/var/snap/lxd/common/lxd/disks/local.img), so this emulates a physical disk and is then formatted with BTRFS.
The mount path for the loop-back file is /var/snap/lxd/common/lxd/storage-pools/local; however, when using the LXD snap package, the mount is hidden inside the snap package’s mount namespace (so looking in that directory directly on the host will show it as empty).
Instead you can access it using:
sudo nsenter --mount=/run/snapd/ns/lxd.mnt
E.g.
sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- mount | grep /var/snap/lxd/common/lxd/storage-pools | grep btrfs
/var/snap/lxd/common/lxd/disks/local.img on /var/snap/lxd/common/lxd/storage-pools/local type btrfs (rw,relatime,space_cache,user_subvol_rm_allowed,subvolid=5,subvol=/)
You are not able to resize a storage pool from LXD. If there is nothing you need to keep on the storage pool, then I would suggest deleting it and creating a new one of the size you want.
E.g. assuming you already have a BTRFS directory at /some/btrfs/, you can create a storage pool using it with:
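On a standalone (non-clustered) server, that delete-and-recreate step might look like this (the pool name and size are examples; for a cluster, each member needs its own create step first, as in the example below):

```shell
# WARNING: this destroys all instances and volumes on the pool.
lxc storage delete local

# Recreate it with a larger loop-back file:
lxc storage create local btrfs size=100GB
```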
# Specify member-specific configs (e.g. `source` could be different for each member):
lxc storage create local btrfs source=/some/btrfs --target=<cluster member 1>
lxc storage create local btrfs source=/some/btrfs --target=<cluster member 2>
lxc storage create local btrfs source=/some/btrfs --target=<cluster member 3>
# Then run once more without --target to finalize the pool on all members:
lxc storage create local btrfs
Alternatively, you can use a dedicated empty partition/block device for your BTRFS pool by specifying the block device path with the source property when you create it, and LXD will then create a BTRFS filesystem on the block device for you.
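For instance (the device path is hypothetical — substitute your own empty disk):

```shell
# LXD formats the whole device with BTRFS and uses it as the pool:
lxc storage create local btrfs source=/dev/nvme1n1
```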
For an LXD cluster, wouldn’t it be better for all nodes in the cluster to share the same BTRFS/ZFS pool? Then we could manage the total capacity without worrying about how much space is left on each specific node.
Is there a way to use all the disk space on, say, 3 nodes in a cluster as a joint pool? Or can we only share one pool on one specific node in the cluster?
ZFS is not a networked filesystem (as far as I know), so it is not possible for LXD running on other machines to access it and treat the combined storage as a single pool.
An LXD instance can only exist on one cluster member at a time, so it will use disk space on that cluster member only, which is why we don’t treat it as consumed on the other cluster members. The pool name really is just a way of referring to the same sort of storage pool on each cluster member.
If you want to have a truly shared pool across the cluster members then we support using ceph storage as a networked filesystem.