How to set root device size when creating new instance?

This is all extremely subtle…

However, I think you keep not answering the question that is being asked? So if a storage pool has a small “volume.size=xxMB” config set against it, then it’s absolutely impossible to ever launch an instance on that pool using a source image larger than xxMB?

Sure, if the image is smaller than xxMB, then it will get instantiated and then inflated to the new requested size. But if the image starts out larger than xxMB, then the initial unpack of the image fails because it’s not possible to specify its initial volume size? Correct?

So if I understand your suggestions, you are saying not to set a “volume.size=xx” key on the storage pool, but instead to set it via, say, the default profile? In this way, the initial unpack of the image isn’t constrained, but the resize of the root fs is?
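
I.e. leave volume.size unset on the pool and instead do something like the following (assuming the default profile already has a root disk device, as it normally does after lxd init; the size is just illustrative):

lxc profile device set default root size=20GiB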

Can I suggest that this seems quite limiting? Perhaps most users aren’t starting massive instances, but it feels like it would be helpful to have some way to affect the initial volume size used for unpacking the image?

Thanks

That is correct.

Correct, it is not possible to specify the image volume’s size at instance launch time (aside from using the pool’s volume.size setting); only the instance volume size itself can be overridden, using the -d flag.
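
For example, something like this overrides the instance’s root disk size at creation time (the image alias and size are illustrative):

lxc launch ubuntu:22.04 c1 -d root,size=20GiB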

It’s really important to understand that they are different volumes though.
It’s not an “initial” size, but the final size of two different volumes.

You can do that on most storage pool driver types, yes.
But with the lvm driver type this isn’t going to do what you want, because you are looking to have as small an image volume as possible, and if you remove the volume.size setting then LXD will create an image volume sized at the default 10GiB.

Normally this wouldn’t be an issue, as it defaults to the EXT4 filesystem, which can then be shrunk when the instance volume is created. But as you’re using the XFS filesystem, which doesn’t support shrinking, this means your instances have to be 10GiB or larger.
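
As a rough sketch of the configuration being discussed (the pool name and size are illustrative), an LVM pool using XFS with a small default volume size would look something like:

lxc storage create mylvm lvm volume.block.filesystem=xfs volume.size=2GiB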

Perhaps. It’s not come up before to my knowledge, so it isn’t obviously causing many people issues.

The problem scenario is quite specific too, as it only occurs when using:

  • LVM storage pool.
  • XFS filesystem.
  • Image volumes of varying sizes, as close as possible to the unpacked image size (to allow both small and larger instances to be created from them).

How would you want it to work?

Massive instances aren’t the issue here. It’s the requirement of having small image volumes to allow creation of small instances in situations where you cannot shrink the filesystem.

If you unset volume.size, then the smallest image volume (and by extension instance, due to the use of XFS) you could create would be 10GiB, but larger ones would be fine.
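
For reference, clearing the pool-level default looks something like this (the pool name is illustrative):

lxc storage unset mylvm volume.size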

My source images will vary from 1GB to 200GB…

So it seems like my use case is a bad fit for lxd/xfs/lvm? To literally have it work without further config, I would need to set the storage pool’s default volume size to the largest image size, i.e. 200GB… Not ideal!

So it seems like the actual answer to this thread is:

  • Configure the pool volume.size to be the desired size of the instance (ensure it’s larger than the image)
  • Create the instance with “lxc launch/init” etc.

Not ideal, but at least it’s now clear.
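
In command form I take that to be roughly the following, with volume.size at least as large as the unpacked image (the pool name, image alias and size are illustrative):

lxc storage set mylvm volume.size=20GiB
lxc launch ubuntu:22.04 c1 -s mylvm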

I hesitate to prolong this thread, but I’m confused about why there is the intermediate step? From your description, my understanding of the moving parts is:

  • image is stored in a volume (could be anywhere, eg separate VG/pool?)
  • image is first unpacked to an intermediary volume (why? Can this be in a separate VG/pool?)
  • root fs is prepared in another volume, initial unpacked image is copied from intermediary volume to final rootfs

I’m happy to believe that this process is there for a good reason. However, I’m left wondering if the intermediary step could be eliminated in some cases? Or could we find a way to flow the device size key down to be used as part of that intermediary step?

The compressed images are stored by default in /var/snap/lxd/common/images, or can be set to be stored in a custom volume if there is more space there. See the global config option storage.images_volume in the documentation (Linux Containers - LXD - Has been moved to Canonical).
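
If a custom volume is wanted for this, the setup looks something like the following (the pool and volume names are illustrative):

lxc storage volume create default images
lxc config set storage.images_volume default/images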

But because these files are compressed, if we were to unpack them every time an instance was created from them it would take a (potentially) long time because it would require them to be decompressed and written to the instance volume.

So where possible, if the storage pool supports lightweight snapshots, we unpack the compressed image file into an “image” volume the first time it is used on that storage pool, and any instances launched from the image are created as just snapshots from that image volume.

In this way only the first instance launch from an image on a particular pool incurs the cost of unpacking the image to the storage pool.

This is the so-called Optimized image storage; see the documentation (Linux Containers - LXD - Has been moved to Canonical).

What could be useful in your case is to have an option for LVM storage pools that disables optimized images, and just unpacks the image directly into the new instance volume every time.

We do actually have something similar for ZFS pools called zfs.clone_copy:

Whether to use ZFS lightweight clones rather than full dataset copies (Boolean), or rebase to copy based on the initial image

Setting this to false will cause the new volume to be created as a full copy of the image volume.
Although it’s still quicker than a full unpack, because it uses ZFS send/receive to copy the already unpacked image volume.
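
On a ZFS pool that would look something like this (the pool name is illustrative):

lxc storage set myzfs zfs.clone_copy=false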

I have one other idea, although like everything it comes with its own tradeoffs.

If you use a non-thin LVM pool, i.e.:

lxc storage create mylvm lvm lvm.use_thinpool=false

Then because non-thin LVM storage pools don’t support lightweight snapshots (LVM requires us to allocate the full possible size that a snapshot can grow to at create time) LXD doesn’t create image volumes on non-thin LVM pools.

This then means that each instance created will have the image unpacked into its volume directly.

But non-thin LVM pools are less flexible than thin LVM pools, primarily around space reservation and snapshots.
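
Putting that together with the XFS requirement, a minimal sketch of this workaround would be something like the following (the pool name, image alias and sizes are illustrative):

lxc storage create mylvm lvm lvm.use_thinpool=false volume.block.filesystem=xfs
lxc launch ubuntu:22.04 small1 -s mylvm -d root,size=2GiB

Each instance then gets the image unpacked directly into its own volume, at the cost of losing thin provisioning and lightweight snapshots.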

I wouldn’t refuse such a feature… (in my case I will unpack images infrequently)

However, I feel that a simple option to pass through to the intermediary image volume creation would be more general purpose? My first thought would be to flow any “-d root,size=xxGB” option down to the intermediary image size as well? I can see corner cases with this of course, but it seems like most storage backends have somewhat optimised storage, and in any case it’s a temporary volume, so it can be pruned afterwards if space is an issue?

The main corner case would be a user unpacking a tiny image with a $massive rootfs size specified. My proposal would lead to the unpacked optimised image partition nominally being $massive in size. However, I presume for lvm-thin/btrfs/zfs this won’t consume real disk space in any case?

What do you think? Is this a simple change?

I think using the instance’s root disk size for creating the image volume (only if it doesn’t already exist) would help in some scenarios and cause problems in others. This is because, especially in the case of using XFS that cannot be shrunk, the minimum size of an instance would then be governed by the root disk size used on the first create of the instance volume. Which makes it rather unpredictable.

We also have to take into account multi-user environments and LXD projects, and we would not want something happening inside one project to impact what another project can do. At least with the current situation the image volume is created based on the global pool volume.size setting.

However, having an option that temporarily disables the use of an image volume at instance creation time, and instead directly unpacks the image into the instance volume, may be a good way to solve this generically in a predictable way. Interestingly, there is also an issue open for just this feature (but within the context of ZFS):

I’m not disagreeing, but just adding to that chain of reasoning:

  • I think this middle volume is only used as an optimisation, so in the case that it’s too large, it can be immediately deleted?
  • In most supported options, if I understand correctly, even creating a multi-TB volume, to unpack a few KBs of image in, will only use a small amount of space, so there shouldn’t be a material increase in storage actually consumed, even if the user sets a foolishly large size=xx option
  • As near as I can see, the intermediate volume is always created in the same pool as the final rootfs? So if the user/multi-user/project has the ability to create rootfs partitions large enough to be disruptive, then arguably they can already do that by just taking a small image and making a large rootfs instance?
  • The size change can cut both ways, e.g. right now, if there is no default volume.size set and I ask to launch an image with the rootfs sized to 2GB, then the intermediate image will be created at 10GiB and the final rootfs at 2GB… So passing across the rootfs size to the intermediate image would be better in that case

Overall, I see and agree with all your corner cases, but short of introducing an actual option to set the size of the intermediate volume, I see only mild downsides in having the intermediate default to the same size as the final rootfs.

My thought would be that the combination of the two would be most useful:

  • Default to making the intermediate image the same size as the final rootfs (this seems to cover the majority of use cases?)
  • Offer an option to skip creating the middle image - the use of this would now flip around, ie you would use it when you wanted to create a $massive rootfs and didn’t want the intermediate image to be $massive

I think passing through the rootfs size to the intermediate is the most flexible solution, eg consider:

  • image is 5GB
  • pool has volume.size=2.5GB
  • I want the final rootfs to be 10GB

If I want to optimise this, then I can do:

lxc launch images:large-5GB-image v1 -s lvm -d root,size=5GB
lxc config device override v1 root size=10GB

This would flow the rootfs size down to the intermediate and create both intermediate and rootfs with 5GB size. Then we are free to resize the rootfs up to the final size.

However, if you don’t care about the intermediate size, then just pass “-d root,size=10GB” in the first call (and my understanding is that the disk space used is the same in both cases anyway on btrfs/zfs/lvm-thin, as only the 5GB touched will be allocated on disk?)

My 2p…

No, absolutely not; it is the basis for all volumes that are created from it.

Nope, in LVM non-thin mode (which needs to be supported), the full size is reserved.
We also need to take into account VM block volumes that also cannot be shrunk.

Nope, see the limits.disk setting in the documentation (Linux Containers - LXD - Has been moved to Canonical).

I think we agree your use case is not supported well currently.

I disagree; it introduces unpredictability in the storage layer, and would need to take into account the various behaviors of all the storage drivers and volume types. Changing something like that is no small task (as it needs extensive testing), nor is it without risks that need to be balanced.

The projects argument alone is reason enough not to do it, in my view.

I just want to check we are on the same page.
You mention “middle image” a few times.

There is a single compressed image file on disk (not in the storage pool).
Then the first time an image is used on a storage pool an image volume is created and the compressed image is decompressed onto it.
For LVM it has to have a fixed size. This is currently either 10GiB or volume.size.

The Instance volumes are created by taking a snapshot of the image volume.
Thus the image volume is shared between all of the instances that are derived from it (across all projects). This reduces instance creation time and duplication of storage.

Once the image volume has been created it cannot be changed.

Therefore, deriving its size from the first instance’s root disk is not desirable, because it influences (in the case of XFS and VMs, which cannot be shrunk) the minimum size of any further instances that use it.
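
For anyone following along, the image volumes and the instance volumes derived from them can be inspected per pool with something like (the pool name is illustrative):

lxc storage volume list mylvm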

OK, so that wasn’t the understanding I had from your previous replies. Thank you for clarifying

I think from what you say then that this initial image is actually very important to size correctly? At present there is almost no control at all over its size (only by changing volume.size on the pool itself?)

However, you also illustrate that my instance creation process to just start any old image, delete the whole initial filesystem and then copy in my desired root filesystem is therefore wasting a bunch of space (if I can’t delete the original volume used to bootstrap the instance?)

This is starting to feel all staggeringly complicated just to throw up a couple of containers? Bear in mind I’m trying to migrate from linux-vservers. It feels like there are a ton of constraints on all the features that lxd offers? I’m running servers with a single operator, no need to prevent rogue use across projects, and I just need to find a simple way to spin up 10+ containers, each substantially similar and each roughly running a few simple processes (but I want the security of vservers/unpriv containers). Is there a shortcut to getting this all working??

Because the underlying OS is a rolling distribution, I don’t see a lot of value in keeping around the original rootfs image? Within a few months all the sub-instances will be 50%+ different to the original image. They will be 90% similar to each other, but vastly different to the original starting point. So I would prefer not to waste disk space keeping around a base image?

(I’m separately struggling with networking. At least with linux-vservers I could throw up a simple iptables script!)

Is there a simple path to success here? I see tons of forest ahead of me and I’m struggling to see the path…

It’s your specific requirements of thin LVM and XFS that are complicating matters. If you can use a different filesystem or a non-thin LVM pool (or even something like ZFS) then things get easier, as they can be shrunk.

I’ve explained why things are done the way they are done.
I’ve proposed future work that would provide more flexibility.

And I’ve proposed various workarounds and alternatives you can use today.

Beyond that, I’m not sure what else I can say, I’m afraid.
There will be a learning curve as you get familiar with LXD concepts, as it is likely more opinionated than vservers was.
