Is there a way to specify which profile a restored backup will use at restore time? I ask because I’m running into an issue where the default profile has sane defaults for 90% of my containers, but for this one container it is too small. I’m seemingly faced with modifying the default profile, doing the restore, and then undoing the modification, when all I really want to do is apply the profile that already exists for just this type of container. Is there no way to do this presently?
Update: So, I bit the bullet, edited the default profile, and set the space to 100GB. No difference: it still fails, complaining about being out of space. The problem is that, as far as I can tell, there’s no way to tell what is out of space. I’m presuming the Ceph RBD that’s being created isn’t large enough, but if 100GB isn’t, I don’t know what to do, since the original container was a fraction of that and I didn’t include the snapshots. I would bump the size to something absurd, but since the only way to do that appears to be editing the default profile, I’d rather not needlessly grow a bunch of containers just to have it fail again.
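For reference, the profile edit was equivalent to this one-liner (a sketch, assuming the root disk device in the default profile is named root):

lxc profile device set default root size 100GB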
Update 2: I finally resorted to exporting/importing an image from the standalone system into the cluster. That finally worked for the image portion. However, trying to launch the container, using a profile that specifies a 100GB root volume, still results in “No space left on device”. I’m kind of out of ideas here, as I can’t find anything on the host that’s running out of space, and the container and all of its snapshots are less than 50GB, let alone 100GB just for the container.
config: {}
description: Profile for 100GB Storage, standard networking
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
  root:
    path: /
    pool: ceph_pool
    size: 100GB
    type: disk
name: 100GBStorage
That’s the 100GBStorage profile (the default profile keeps the stock 10GB root size). To launch the image with the 100GBStorage profile, I first tried the following, but it still resulted in the storage size from the default profile: lxc launch -p default -p 100GBStorage system-export target
When that failed, I then did: lxc launch -p 100GBStorage system-export target
However, that too ignored the size definition in the 100GBStorage profile and used the default 10GB setting. The only error given is that it runs out of space.
Here you go. I did filter out some repeating messages about “updated metadata for task”, as they made up the bulk of the output and never said anything except that it was updating.
The exact error message from the launch operation was:
tar: rootfs/etc/ssl/certs/Actalis_Authentication_Root_CA.pem: Cannot create symlink to '/usr/share/ca-certificates/mozilla/Actalis_Authentication_Root_CA.crt': No space left on device
tar: Exiting with failure status due to previous errors.
Prior to that line, there were thousands of lines saying much the same, until the end. While it was in progress, I verified that the Ceph volume being created (/dev/rbd4 in this case) was only 10GB, which is the root default. Previous testing shows that if I grow that volume to 100GB fast enough (after it’s created, but before the container can do anything with it), things proceed normally. Additionally, other containers launched from images do not seem to have this issue.
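The manual grow I raced the unpack with was along these lines (a sketch; <volume-name> is a placeholder since the RBD name varies per launch, and it assumes the OSD pool name matches the LXD pool name):

# Grow the freshly created RBD volume to 100GB before the unpack fills it.
# rbd resize takes the size in megabytes by default.
rbd resize --size 102400 ceph_pool/<volume-name>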
Can you clarify “Additionally, other containers launched from images do not seem to have this issue”? I thought the issue was that you couldn’t create containers from this large image. I’m still struggling to understand the specific issue here. Perhaps if you can show what works and what doesn’t, that will help me understand. In the logs above, the creation is failing while trying to unpack the image files into an image volume, ready to create a snapshot from for the container. This would explain why your profile’s disk config isn’t taking effect, as they don’t apply to images.
I suspect you need to set the ‘volume.size’ setting temporarily on the storage pool config, to allow the image volume to be created large enough to accommodate the image unpack.
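A minimal sketch of that workaround, using the pool name from the profile above:

# Temporarily raise the pool-wide default volume size, launch, then revert:
lxc storage set ceph_pool volume.size 100GB
lxc launch -p 100GBStorage system-export target
lxc storage unset ceph_pool volume.size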
So, from what I understand, the container that they are trying to restore into has a 10GB size (the 100GB is not being applied), but the backup is much larger than 10GB.
OK, that makes sense. This is because LXD first creates a so-called ‘image volume’ from the image being used, and from there creates the container volume as a cheap snapshot of the image volume.
Because Ceph is a block-oriented driver, all volumes must have a fixed size and cannot be unlimited. As such, the initial image volume is created at volume.size, or 10GB if that is not set.
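You can see this in action by comparing the pool default with the RBD images LXD actually created (a sketch; image_<fingerprint> is illustrative, as the exact RBD names vary):

# Pool-wide default that governs the image volume's size:
lxc storage get ceph_pool volume.size
# The image volume shows up in the pool alongside the container volume:
rbd ls ceph_pool
rbd info ceph_pool/image_<fingerprint>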
I am also wondering why you created an image via publish/export rather than an instance backup export (‘lxc export …’), which would have allowed you to import the backup file without creating an intermediate image volume.
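For reference, the backup route looks roughly like this (a sketch; source-container and the file name are placeholders):

# On the standalone source system:
lxc export source-container backup.tar.gz
# (depending on LXD version, --instance-only or the older --container-only
# skips snapshots)
# Copy backup.tar.gz to the cluster, then on a cluster member:
lxc import backup.tar.gz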
Makes sense as the core fault, yeah. Is there no way to determine this default a bit more dynamically, or at least to allow the profile specification to override it, to prevent this scenario?
As to why I didn’t do the backup export: I did try the backup export/import first, and that also failed in the same way. I was trying this method out of desperation to get something to work.
OK, let’s go back to your original problem. Please, in all cases, provide full command and error examples, along with debug log output like you provided earlier.
Well, not exactly the same: it’s failing while unpacking the tarball into the container’s volume rather than the image volume, so now I have a scenario I can look to reproduce and confirm as a bug.
Interestingly, there’s also a mention of ZFS in those logs; is either of your pools ZFS?
The source system uses ZFS, so yeah. The target system also has a local pool that’s ZFS. What path on the system would it be using for that unpacking process, and on which system would it be doing so? The last part is important, as it’s a cluster and not all of the members have as much space outside of Ceph as the one I’m actually running the import command on.
Let me test that, though I’m reasonably sure it will fix it.