LXD with Linstor storage?

I was wondering, has anybody worked on adding Linstor as a storage backend for LXD?

I’ve written up a brief overview of Linstor here:


In that article I used LXD VMs to create a demo storage cluster, but LXD itself can't consume that cluster.

I think Linstor conceptually maps quite closely to LXD's "LVM" storage backend, except that a volume can be made available on multiple nodes (and volumes can be replicated with DRBD). It would need only a small amount of configuration: basically the address of the controller node, plus credentials (key and certificate) to authenticate to the API.
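To make that concrete: if a native driver existed, creating a pool might look something like the following. To be clear, this is purely hypothetical - there is no "linstor" driver in LXD today and these config keys are made up for illustration.

# hypothetical: LXD has no "linstor" driver and none of these keys exist
lxc storage create my-linstor linstor \
    linstor.controller=linstor-controller.example.com \
    linstor.client.key=/etc/linstor/client.key \
    linstor.client.certificate=/etc/linstor/client.crt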

In order to support Linstor in LXD, I can see two ways of approaching it:

  • add a native Linstor driver to LXD, alongside LVM, Ceph, etc. There is a Go client available.
  • add a generic storage plugin mechanism which invokes an external script. I expect it would only need a few general operations such as create volume, delete volume, attach volume and detach volume - perhaps along the lines of Ganeti's extstorage interface (rough sketch below).
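For the second option, the external script could be quite small. The following is only a sketch of a hypothetical interface (LXD has no such hook, and the calling convention here is invented); the exact LINSTOR CLI options also vary a bit between versions:

#!/bin/sh
# hypothetical plugin script - the action/volume/size calling convention
# is made up, loosely modelled on Ganeti's extstorage scripts
set -e
action="$1"; volume="$2"; size="$3"

case "$action" in
    create)
        linstor resource-definition create "$volume"
        linstor volume-definition create "$volume" "$size"
        linstor resource create "$volume" --auto-place 2
        ;;
    attach)
        # report the device node to use (udev symlink provided by drbd-utils)
        echo "/dev/drbd/by-res/$volume/0"
        ;;
    detach)
        # nothing to do: the DRBD device stays present on the node
        ;;
    delete)
        linstor resource-definition delete "$volume"
        ;;
esac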

Any thoughts or comments?

Cheers,

Brian.

We’ve talked to the folks behind Linstor before as it’s certainly an interesting storage option.

LXD tries to stay away from plugin interfaces as those tend to cause very fragile setups for our users. Instead we'd much prefer a first-party integration as an LXD storage driver.

We’ve recently reworked all of our storage drivers, so it should really just be a matter of someone doing the needed work to add a driver to lxd/storage/drivers/.

ceph would be a good starting point for such a driver. Our current ceph integration clocks in at 3.4k lines so it’s not too bad.
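If anyone wants to gauge the scope, the drivers all live side by side in the source tree; a quick look (file names and layout from a recent checkout, so they may shift over time):

git clone https://github.com/lxc/lxd
ls lxd/lxd/storage/drivers/
# rough size of the ceph/cephfs driver files mentioned above
wc -l lxd/lxd/storage/drivers/driver_ceph*.go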

Thank you. It sounds like it would be a good Summer of Code project for someone :slight_smile:

I did look at the documentation for LXD with Ceph, but I got rather confused. There are settings both for Ceph (presumably meaning RBD) and also for CephFS.

I understand that CephFS is a network shared filesystem, analogous to NFS.

Does the main storage volume for an LXD container use RBD - with some filesystem like ext4 inside? Then if you add a second storage volume, do you get a choice of RBD or CephFS?

Ah, it appears there are separate pool types for “ceph” and “cephfs”:

root@nuc1:~# lxc storage create foo bar
Error: Invalid value "bar" (not one of [dir lvm zfs ceph btrfs cephfs])
root@nuc1:~# lxc storage create foo cephfs cephfs.cluster_name=bar source=/baz
Error: The requested '' CEPHFS doesn't exist

(OK given that I don’t have ceph/cephfs available here)

Indeed, the ceph driver uses RBD volumes.
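To make the distinction concrete, the two are separate pool drivers and are configured independently; with a reachable Ceph cluster, something along these lines should work (pool and filesystem names are placeholders):

# "ceph" pool: each instance root is an RBD image, with a filesystem
# (ext4 by default) created inside it
lxc storage create my-rbd ceph ceph.osd.pool_name=lxd-rbd

# "cephfs" pool: backed by a CephFS filesystem; it can only hold custom
# storage volumes, not instance roots
lxc storage create my-fs cephfs source=my-cephfs-filesystem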

I note that Linstor always puts a given volume at a fixed device node, e.g. /dev/drbd1000. This is a property of the volume-definition itself, so I believe it is persistent and the same across all nodes.

root@node1:~# linstor volume-definition list -p
+---------------------------------------------------------------+
| ResourceName | VolumeNr | VolumeMinor | Size  | Gross | State |
|===============================================================|
| my_ssd_res   | 0        | 1000        | 1 GiB |       | ok    |
| res00        | 0        | 1001        | 1 GiB |       | ok    |
| res00        | 1        | 1002        | 3 GiB |       | ok    |
+---------------------------------------------------------------+

Given this, I was wondering about using a generic “host block device” driver instead of, or as well as, Linstor integration - so you’d just attach to /dev/drbd1000, say.

Thinks: is this possible already??

root@node1:~# lxc config device add foo test-linstor unix-block source=/dev/drbd1000 path=/dev/sdb
Device test-linstor added to foo
root@node1:~# lxc exec foo bash
root@foo:~# ls -l /dev/sdb
brw-rw---- 1 root root 147, 1000 Feb 13 16:43 /dev/sdb
root@foo:~# blockdev --getsize64 /dev/sdb
1077665792

Well, whaddya know :slight_smile: I was also able to create a filesystem on /dev/sdb - although in order to mount it I had to set security.privileged=true on the container.
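Roughly the following, with ext4 as an arbitrary choice of filesystem:

lxc config set foo security.privileged true
lxc restart foo
lxc exec foo -- mkfs.ext4 /dev/sdb
lxc exec foo -- mount /dev/sdb /mnt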

I have not yet tested migration in an lxd cluster.

It would still be the responsibility of the user to ensure that the given block device exists on the target node before migration (by using the CLI to create a Linstor "resource" on that node), and I don't know whether migration is blocked if the remote node doesn't have that block device. A proper Linstor plugin could take care of all that via the API.

Cheers,

Brian.


And it appears to work! In this test, container “foo” exists on node1, and the /dev/drbd1000 resource initially exists on node1/3/4 but not node2.

root@node1:~# lxc stop foo
root@node1:~# lxc move --target node3 foo
root@node1:~# lxc start foo
root@node1:~# lxc exec foo bash
root@foo:~# mount /dev/sdb /mnt
root@foo:~# ls /mnt
hello  lost+found
root@foo:~# exit
root@node1:~# lxc stop foo
root@node1:~# lxc move --target node2 foo
root@node1:~# lxc start foo
Error: Failed preparing container for start: Failed to start device "test-linstor": The required device path doesn't exist and the major and minor settings are not specified
Try `lxc info --show-log foo` for more info
root@node1:~# linstor resource create node2 my_ssd_res --drbd-diskless
...
root@node1:~# lxc start foo
root@node1:~# lxc exec foo bash
root@foo:~# mount /dev/sdb /mnt
root@foo:~# ls /mnt
hello  lost+found
root@foo:~#

So I can move the container to node2, but I can't start it there until I've made /dev/drbd1000 available on that node (here by creating a diskless Linstor resource).

This is actually pretty awesome even as it stands. The main thing I can't see how to do is to get LXD to put the root filesystem on such a device (i.e. unpack the initial container or VM image into it) - I'm not even sure whether that's possible. I had a go, but couldn't get the incantation right:

root@node1:~# cat bar.yaml
devices:
  root:
    path: /
    type: unix-block
    source: /dev/drbd1001
root@node1:~# ls -l /dev/drbd1001
brw-rw---- 1 root disk 147, 1001 Feb 13 17:18 /dev/drbd1001
root@node1:~# lxc launch --target node1 ubuntu:18.04 -p diskless bar <bar.yaml
Creating bar
Error: Failed instance creation: Failed creating instance record: Failed initialising instance: Invalid devices: Failed detecting root disk device: No root device could be found
root@node1:~#

However, this is where making a proper Linstor integration starts to make sense.


You can attach block devices as external disks for VM type instances using:

lxc config device add v1 mydisk disk source=/some/block/device

For containers, without needing to run privileged (which isn't recommended), you would mount the block device on the host, and can then share the directory as an external mount into the container using:

lxc config device add c1 mydisk disk source=/some/host/mount path=/mount/path/inside/container
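For a Linstor-backed device, the host-side preparation might look like this (device path, mount point and filesystem are just examples):

# on the host: put a filesystem on the DRBD device and mount it
mkfs.ext4 /dev/drbd1000
mkdir -p /srv/linstor-vol
mount /dev/drbd1000 /srv/linstor-vol

# then share the mounted directory into the (unprivileged) container
lxc config device add c1 mydisk disk source=/srv/linstor-vol path=/mnt/data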

For root disks, if you don't mind having a single Linstor volume for multiple instances, you could mount it on the host before LXD starts, and then create a custom dir storage pool using that volume, e.g.

lxc storage create linstor dir source=/host/path/to/linstor/volume
lxc launch images:ubuntu/focal c1 -s linstor

Alternatively you could look at doing something similar, but using a zfs or lvm LXD storage pool created directly on the Linstor block device.
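That last option might look something like this, assuming /dev/drbd1003 is an otherwise unused Linstor volume (LXD will create a zpool or LVM volume group directly on the device):

lxc storage create linstor-zfs zfs source=/dev/drbd1003
# or
lxc storage create linstor-lvm lvm source=/dev/drbd1003

lxc launch images:ubuntu/focal c2 -s linstor-zfs

Unlike the single dir pool, that gives you per-instance volumes (and efficient snapshots) on top of the one DRBD device.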