Same ceph storage with multiple LXD hosts

I’m a little bit confused now. For example, I have 4 different LXD hosts located in different geographical places, and I could connect the same ceph storage to all of them. However, as I understand it, every LXD installation uses its own separate database, which means I have no access to containers from a different LXD host, even though all containers are stored on the same storage.

Moreover, let’s assume I was able to add a database entry for a container that lives on the ceph storage used by another LXD host. I read about this in a storage transfer ticket, so I think it’s not hard to add a container that already exists on storage to the LXD database. The same container could then be launched from different LXD hosts, but does that mean I could break the container this way?
Basically, what I’m interested in is the possibility of running mirrored containers across LXD hosts without taking up extra ceph storage space.

Correct, what you’re describing above isn’t currently supported.

LXD keeps track of a lot of things in its database and since those database records can’t be kept in sync with multiple hosts yet, there’s no way to have a container be visible on multiple hosts, even if the container storage comes from the same place.

As you say, you could force LXD to see the same container on two hosts by using “lxd import”, but changes in configuration on one host would not be replicated to the other and there’d also be nothing preventing you from starting that container in two places at once, leading to even more problems.
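
For reference, the forced setup described above would look roughly like this (the container name `c1` is just a placeholder), which is exactly why it’s easy to end up with the container started in two places:

```
# On the second host, with the same ceph pool already configured as a storage pool,
# recreate the database records for a container whose volume already exists on ceph.
lxd import c1

# Nothing prevents both hosts from then starting it, which is the dangerous part.
lxc start c1
```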

Our upcoming solution for this is LXD database clustering where you’ll be able to have multiple hosts appear as one big LXD server. In this setup, the database will be shared with multiple hosts and so you will then be able to safely have them all use the same ceph pool.

That will be very cool! But it leads to another question.
Are container clones (containers based on another one that take up less space, like zfs/btrfs snapshots) possible under ceph? If that’s possible, I don’t think messing around with running the same container is even worthwhile.
And with database clustering, will running the same container on multiple hosts be locked by LXD, or will the container actually be represented by multiple container snapshots?
Also, is btrfs worthwhile inside rbd storage volumes?

Unless you have the database clustering in place, LXD doesn’t know whether the target host is talking to the same ceph cluster, nor is there a way to make a quick clone across two rbd storage pools. So while we do use optimized ceph tooling for migration today, it doesn’t save you nearly as much space as it could if the pool was truly shared.
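
To illustrate, a migration between two standalone hosts today looks roughly like this (the remote name `host2` is just an example), and it still ends up writing a second copy of the container’s data into the target host’s rbd pool:

```
# Copy a container to another standalone LXD host (or use "lxc move" instead).
# Even with both hosts backed by ceph, the data is transferred over the network
# and stored again in the target host's rbd pool.
lxc copy c1 host2:c1
```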

With the LXD clustering in place, your “lxc list” will show you containers running on all your hosts. The containers will still be tied to a particular host but can be moved within the cluster as you want. When those containers are entirely based on ceph for their storage, such a container move would be instant as no data would actually need to be moved, it’d effectively just be a DB field update to point to the new host.
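
If clustering ends up working the way the current branch suggests, I’d expect a move within the cluster to look roughly like this (node names are placeholders and the exact flags may still change before release):

```
# List containers across the whole cluster; each entry shows which node it runs on.
lxc list

# Relocate a ceph-backed container to another cluster node; since the rbd data
# doesn't move, only the database record pointing at the host changes.
lxc move c1 --target node2
```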

As for using btrfs on top of an rbd volume, it really depends on whether you need any of the btrfs features inside the container. If you don’t plan on using btrfs subvolumes inside the container, then btrfs doesn’t really get you anything in this case and going with ext4 is probably a safer, more reliable choice.
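
If you want to control which filesystem LXD puts on those rbd volumes, that’s driven by a storage pool key; a quick sketch (the pool name `ceph-pool` is just an example, and I believe the relevant key is `volume.block.filesystem`):

```
# Make new container volumes on this ceph pool use ext4 rather than the pool default.
lxc storage set ceph-pool volume.block.filesystem ext4

# Existing volumes keep whatever filesystem they were created with.
lxc storage show ceph-pool
```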

Now I’m dreaming about it. I will definitely try to test it as a second production setup once clustering is ready. Thanks for the answers, you’re awesome as always!

So I see ceph.osd.force_reuse now. Does that mean LXD clustering is also in place, and that using the same ceph storage with multiple LXD hosts is ready to test?
Also, it seems the lxd-stable PPA is no more; it was relatively close to master but stable enough (and I’m still using it in production).
I also see a lot of issues from people who switched to snap. What should I switch to now?
I read https://blog.simos.info/how-to-migrate-lxd-from-deb-ppa-package-to-snap-package/ but an answer straight from the source would be better.
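
For context, what I mean is the new storage pool option, which as far as I can tell would be used something like this (the pool names here are made up):

```
# Create an LXD storage pool on a ceph OSD pool that is already in use elsewhere,
# which is what ceph.osd.force_reuse appears to allow.
lxc storage create my-ceph ceph source=lxd-pool ceph.osd.force_reuse=true
```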

No, LXD clustering isn’t here yet. You shouldn’t use the same OSD on multiple LXD nodes as the different daemons won’t know what’s on the other machines and so you risk getting clashes.

LXD clustering is there now. I just saw the presentation at FOSDEM. Does it handle ceph OSD sharing, or is that coming soon? Live migration without copying would be very nice for getting HA for guests in an LXD cluster.

LXD clustering, as @freeekanayaka mentioned during the talk at FOSDEM, isn’t actually there yet. We have a working branch which should be merged in the next week or so, and it will be considered stable with the release of LXD 3.0 about a month from now (hopefully).

At this point, we don’t have shared OSD support in the clustering branch yet. It’s on our roadmap for after the initial clustering work is merged.

Note that there will be one limitation though that we won’t be able to do away with. When sharing an OSD with multiple hosts, all containers using a particular volume on that OSD will have to be on the same machine. So there won’t be any sharing of a custom storage volume between multiple hosts. That’s a kernel/filesystem limitation, allowing a given volume to be mounted on more than one host would just lead to corruption and data loss.
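
As a concrete illustration of that limitation, a custom volume can only be attached to containers that end up on the same machine (the names below are placeholders):

```
# Create a custom volume on the shared ceph pool and attach it to a container.
lxc storage volume create my-ceph shared-data
lxc storage volume attach my-ceph shared-data c1 data /srv/data

# Attaching the same volume to a container on another host would mean mounting
# the same rbd volume on two machines at once, which is what leads to corruption.
```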