Launching a large container image with Ceph storage

So, I’ve got my Ceph-backed storage all set up and went to launch my first containers into it, trying to move away from the previous storage/systems. However, when I try to launch one from a 13GB image it tells me that I’m out of space. Given that Ceph has 2TB available, it clearly isn’t Ceph that’s running out, and the system doing the launching isn’t running out either. So where, and of what, am I actually running out?

If I try to do it again, I get an error from Ceph complaining that it failed to set the image snapshot: no such file or directory. I can’t retry without deleting the image, deleting it in Ceph and then recopying it over from the source server. Thoughts?

So assuming that you’re talking about an image which is about 13GB compressed on disk, you’ll need volume.size on the pool to be larger than the expected uncompressed size so that newly created volumes are large enough to contain it.

lxc storage set default volume.size 50GB should fix this.
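
If you want to confirm the current setting afterwards, something like this should show it (assuming the pool really is called default):

lxc storage get default volume.size
lxc storage show default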

It’s still odd that it would fail in a way that doesn’t let you try again.

What LXD version is this?

A very strange failure mode, yes. It’s LXD 4.0.0 with Ceph Nautilus on Ubuntu 18.04.

Oh, while trying to track down what was going on, I found that far more space was being used in /var/snap/lxd/common/lxd/images on my cluster members than I would have expected. It doesn’t appear to be the source of this issue, but I didn’t expect them to be using anywhere near what they are. It appears that the entire image is copied there first and then, presumably, sent back up to Ceph.

Okay, the larger volume size did fix the issue. However, having to set such a large default seems rather wasteful, no?

Yeah, for containers we’re working on changes to the migration logic so we can tell in advance how much space we’re expected to need on the receiving side.

For images, it’s trickier. All our images fit in the default volume size for Ceph and LVM (10GB), but your own images can be significantly larger and we can’t tell how much space they’ll actually need.

It’s pretty easy to make an image that only uses 2KB on disk but will require 20GB once written onto an ext4 filesystem on an RBD volume, so we have effectively no idea what’s going to happen when we tell tar to unpack the image.
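
As a rough illustration (the path here is just an example): a sparse file has a large apparent size but uses almost no blocks, so it packs down to nearly nothing, yet writing it back out in full needs the whole apparent size.

truncate -s 20G /tmp/sparse.img        # apparent size 20GB, almost no blocks allocated
du -h --apparent-size /tmp/sparse.img  # reports ~20G
du -h /tmp/sparse.img                  # reports ~0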

As the image is only unpacked on first use, you can work around things a bit by setting volume.size to a suitable size, creating a container from the image, and then reducing volume.size back to a lower value or simply unsetting it (to get back to the default).
That lower value will then apply to all other volumes, but your one image will still be in Ceph and new containers created from it will get clones of it as usual.
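
In practice that would look something like this (the pool name default, the alias big-image and the 50GB figure are all just examples):

lxc storage set default volume.size 50GB
lxc launch big-image c1
lxc storage unset default volume.size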

Maybe I’ve missed it, but is there an option at launch time to just set it then? I suppose you could do it via profiles but I could see that being a little fiddly, depending on the situation.

An option something like lxc launch --volsize=foo image target or such. I imagine there is some option buried in the --config section as well, but something more explicitly called out might be nice, no? :slight_smile:

If you want a different root device size, then a profile is pretty much the way to go with the default CLI client.
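
Roughly along these lines (the profile name bigroot and the sizes are only examples):

lxc profile create bigroot
lxc profile device add bigroot root disk path=/ pool=default size=50GB
lxc launch big-image c1 -p default -p bigroot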

Note that this will not help you in this case though because the container size is only taken into consideration when the image volume is cloned into the new container volume.

For that you first need an image volume, so you still need the volume.size trick for that one to be large enough.

Ah, fair enough. That makes sense :slight_smile:

Along these lines, is there any better method to get containers from 3.0.3 into a 4.0.0 cluster aside from taking them down, turning them into an image and then copying the image?

Just had the same error happen again. However, this time it said this instead of the out-of-space error:
Error: Create instance from image: Failed to run: mkfs.ext4 /dev/rbd0 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0:
mke2fs 1.42.13 (17-May-2015)
mkfs.ext4: Device size reported to be zero. Invalid partition specified, or partition table wasn't reread after running fdisk, due to a modified partition being busy and in use. You may need to reboot to re-read your partition table.

Unfortunately, this was a 30GB container and took forever to copy. I tried to do it again, and got the librbd: error opening parent image: (2) No such file or directory mentioned before :frowning:
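
If it happens again, it may be worth checking whether the image volume and its snapshot actually exist in Ceph before retrying; something along these lines (the pool name lxd and the image_<fingerprint> naming are placeholders, the exact names LXD uses may differ):

rbd ls lxd
rbd info lxd/image_<fingerprint>
rbd snap ls lxd/image_<fingerprint>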

Tried to launch again with a different image, this one built on the target system, and got the same “mkfs” error as before.

I was thinking about it earlier actually. Is there some reason that one couldn’t store an expected volume size in the image based on what size was in use at creation time?

You’re assuming that various filesystems store the data the same way :slight_smile:

btrfs with compression enabled could have the data take a fraction of the space it would on ext4, and that’s before you start messing with block sizes.

As for the original question on moving between 3.0 and 4.0: other than just upgrading a 3.0 system to 4.0, which would be the easiest, you could also use cross-server migration to move the container either directly to the 4.0 setup, or to a temporary machine and then onto the final 4.0 target.
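
A minimal sketch of the cross-server route, assuming the 4.0 target has been added as a remote called new-cluster and the container is called c1 (both placeholders):

lxc remote add new-cluster <address-of-4.0-server>
lxc stop c1
lxc move c1 new-cluster:c1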

That’s what I’m doing at the moment. lxc move is doing the job from 3.0.3 to 4.0.0. The earlier issue appears to have been the result of something I did haha

As to the other point, yeah, there is that. Perhaps just an informational message about an estimate or something? Mainly to prevent someone from trying to launch a 30GB container into 10GB of space or something :smiley:

I could see that being a potentially confusing situation though, and why one might not want to do it haha