Is there a way to specify which profile a restored backup will use at restore time? I ask because I’m running into an issue where the default profile has sane defaults for 90% of my containers, but for this one container it is too small. I’m seemingly faced with modifying the default profile, doing the restore, and then undoing the modification, when all I really want to do is apply the profile that already exists for just this type of container. Is there no way to do this presently?
Update: So, I bit the bullet, edited the default profile, and set the space to 100GB. No difference: it still fails, complaining about being out of space. The problem is that, as far as I can tell, there’s no way to tell what is out of space. I’m presuming the Ceph RBD that’s being created isn’t large enough, but if 100GB isn’t, I don’t know what to do, since the original container was a fraction of that and I didn’t include the snapshots. I would bump the size to something absurd, but since the only way to do that appears to be editing the default profile, I’d rather not needlessly grow a bunch of containers just to have it fail again.
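For reference, the profile edit was equivalent to this one-liner (a sketch, assuming the root disk device in the default profile is named root):

lxc profile device set default root size 100GB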
Update 2: I finally resorted to exporting/importing an image from the standalone system into the cluster. That finally worked for the image portion. However, trying to launch the container, using a profile that specifies a 100GB root volume, still results in “No space left on device”. I’m kind of out of ideas here, as I can’t find anything on the host that’s running out of space, and the container and all of its snapshots are less than 50GB, let alone 100GB just for the container.
config: {}
description: Profile for 100GB Storage, standard networking
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
  root:
    path: /
    pool: ceph_pool
    size: 100GB
    type: disk
name: 100GBStorage
That’s the 100GBStorage profile (the default profile keeps the stock 10GB root size). To launch the image with the 100GBStorage profile, I first tried the following, but it still resulted in the storage size from the default profile: lxc launch -p default -p 100GBStorage system-export target
When that failed, I then did: lxc launch -p 100GBStorage system-export target
However, that too ignored the size definition in the 100GBStorage profile and used the default 10GB setting. The only error given is that it runs out of space.
Here you go. I did filter out some repeating messages about “updated metadata for task”, as they made up the bulk of the output and never said anything except that it was updating.
The exact error message from the launch operation was:
tar: rootfs/etc/ssl/certs/Actalis_Authentication_Root_CA.pem: Cannot create symlink to '/usr/share/ca-certificates/mozilla/Actalis_Authentication_Root_CA.crt': No space left on device
tar: Exiting with failure status due to previous errors.
Prior to that line, there were thousands of lines saying much the same, until the end. While it was in progress, I verified that the Ceph volume being created (/dev/rbd4 in this case) was only 10GB, which is the root default. Previous testing shows that if I grow that volume to 100GB fast enough (after it’s created, but before the container can do anything with it), things proceed normally. Additionally, other containers launched from images do not seem to have this issue.
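The manual grow I raced the unpack with was along these lines (a sketch; <volume-name> is a placeholder since the RBD name varies per launch, and it assumes the OSD pool name matches the LXD pool name):

# Grow the freshly created RBD volume to 100GB before the unpack fills it.
# rbd resize takes the size in megabytes by default.
rbd resize --size 102400 ceph_pool/<volume-name>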
Can you clarify “Additionally, other containers launched from images do not seem to have this issue”? I thought the issue was that you couldn’t create containers from this large image. I’m still struggling to understand the specific issue here. Perhaps if you can show what works and what doesn’t, that will help me understand. In the logs above, the creation is failing while trying to unpack the image files into an image volume, ready to create a snapshot from for the container. This would explain why your profile’s disk config isn’t taking effect, as they don’t apply to images.
I suspect you need to set the ‘volume.size’ setting temporarily on the storage pool config, to allow the image volume to be created large enough to accommodate the image unpack.
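A minimal sketch of that workaround, using the pool name from the profile above:

# Temporarily raise the pool-wide default volume size, launch, then revert:
lxc storage set ceph_pool volume.size 100GB
lxc launch -p 100GBStorage system-export target
lxc storage unset ceph_pool volume.size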
So, from what I understand, the container that they are trying to restore into has a 10GB size (the 100GB is not being applied), but the backup is much larger than 10GB.
OK, that makes sense. This is because LXD first creates a so-called ‘image volume’ from the image being used, and from there creates the container volume as a cheap snapshot of the image volume.
Because Ceph is a block-oriented driver, all volumes must have a fixed size and cannot be unlimited. As such, the initial image volume is created at volume.size, or 10GB if that is not set.
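You can see this in action by comparing the pool default with the RBD images LXD actually created (a sketch; image_<fingerprint> is illustrative, as the exact RBD names vary):

# Pool-wide default that governs the image volume's size:
lxc storage get ceph_pool volume.size
# The image volume shows up in the pool alongside the container volume:
rbd ls ceph_pool
rbd info ceph_pool/image_<fingerprint>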
I am also wondering why you created an image via publish/export rather than an instance backup export (‘lxc export …’), which would have allowed you to import the backup file without creating an intermediate image volume.
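For reference, the backup route looks roughly like this (a sketch; source-container and the file name are placeholders):

# On the standalone source system:
lxc export source-container backup.tar.gz
# (depending on LXD version, --instance-only or the older --container-only
# skips snapshots)
# Copy backup.tar.gz to the cluster, then on a cluster member:
lxc import backup.tar.gz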
Makes sense as the core fault, yeah. Is there no way to determine this default a bit more dynamically, or at least to allow the profile specification to override it, to prevent this scenario?
As to why I didn’t do the backup export: I did try the backup export/import first, and that also failed in the same way. I was trying this method out of desperation to get something to work.
OK, let’s go back to your original problem. Please, in all cases, provide full command and error examples, along with debug log output like you provided earlier.
Well, not exactly the same: it’s failing while unpacking the tarball into the container’s volume rather than the image volume, so now I have a scenario I can look to reproduce and confirm as a bug.
Interestingly, there’s also a mention of ZFS in those logs; is either of your pools ZFS?
The source system uses ZFS, so yeah. The target system also has a local pool that’s ZFS. What path on the system would it be using for that unpacking process, and on which system would it be doing so? The last part is important, as it’s a cluster and not all of the members have as much space outside of Ceph as the one I’m actually running the import command on.
Let me test that, though I’m reasonably sure it will fix it.