LXC/LXD v5.3 VolumeType error causes 'lxc copy' command to fail

In LXC/LXD v5.3, lxc copy fails with this error:

Error: Create instance from copy: Instance disk effective override field "size" should not be stored in volume config

I see the code commit "lxd/storage/utils: Check that instanceDiskVolumeEffectiveFields are not used for instance volumes DB records".

The code comment reads:
If the volumeType represents an instance type then check that the volumeConfig doesn't contain any of the instance disk effective override fields (which should not be stored in the database).

And the related feature is "Storage: Simplify instance root disk volume config" (#10115).

From a user's point of view, what worked in v5.2 no longer works in v5.3. I am not sure what problem was being solved and I do not know what I need to change in my environment (storage, profile, container) for the lxc copy command to work.

It would be very helpful to get more background on the problem being solved and the steps needed to remediate it.

I rolled LXD back to v5.2 because I have Ansible scripts which failed.

Sounds like you have some leftover invalid config in your storage volume DB record.

Can you please show the output of lxc storage volume show <pool> container/<instance> and lxc config show <instance>?

I went back to v5.3 and ran the commands.

Pool Info:

[ken@big-lab ~]$ lxc storage volume list 3tb 
+----------------------+----------------------------+-------------+--------------+---------+
|         TYPE         |            NAME            | DESCRIPTION | CONTENT-TYPE | USED BY |
+----------------------+----------------------------+-------------+--------------+---------+
| container            | centos7base                |             | filesystem   | 1       |
+----------------------+----------------------------+-------------+--------------+---------+
| container            | rockylinux8                |             | filesystem   | 1       |
+----------------------+----------------------------+-------------+--------------+---------+
| container (snapshot) | rockylinux8/dse-cass-base  |             | filesystem   | 1       |
+----------------------+----------------------------+-------------+--------------+---------+
| container            | ubuntu-fossa-base          |             | filesystem   | 1       |
+----------------------+----------------------------+-------------+--------------+---------+

Container Info:

[ken@big-lab ~]$ lxc storage volume show 3tb container/rockylinux8
config: {}
description: ""
name: rockylinux8
type: container
used_by:
- /1.0/instances/rockylinux8
location: none
content_type: filesystem

[ken@big-lab ~]$ lxc storage volume show 3tb container/ubuntu-fossa-base
config: {}
description: ""
name: ubuntu-fossa-base
type: container
used_by:
- /1.0/instances/ubuntu-fossa-base
location: none
content_type: filesystem

Snapshot Info:

[ken@big-lab ~]$ lxc storage volume show 3tb container/rockylinux8/dse-cass-base
description: ""
expires_at: 0001-01-01T00:00:00Z
name: dse-cass-base
config: {}
content_type: filesystem

Thanks again for the prompt replies.

And what is the full command you are running that gives the error?

Can you also show the output of "lxc storage show 3tb" and "lxc config show <instance>"? Thanks.

I'm running into the same issue trying to create an lxc copy on one of my LXD backup servers.

The commands I am running are:

lxc snapshot c1 --reuse for-staging
lxc copy c1/for-staging c1-staging

It doesn't create the copy and errors with:

Create instance from copy: Instance disk effective override field "size" should not be stored in volume config

The output for the container in question:

lxc config show c1

So I assume it's complaining about the size: 500GB?

architecture: x86_64
config:
  image.architecture: amd64
  image.description: Debian bullseye amd64 (20220315_05:24)
  image.os: Debian
  image.release: bullseye
  image.serial: "20220315_05:24"
  image.type: squashfs
  image.variant: default
  limits.cpu: "8"
  limits.cpu.allowance: 100%
  limits.memory: 8GB
  snapshots.expiry: 30d
  snapshots.schedule: 30 19 * * *
  volatile.apply_template: copy
  volatile.base_image: ca9d8388c9d3f83fc6da2517443af27da1b6588dff320a1281a4b7f1f9235815
  volatile.cloud-init.instance-id: f14efa2c-eeef-454a-b4f0-cecd88daf7d9
  volatile.eth0.hwaddr: 00:16:3e:24:48:80
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.uuid: 1533c10e-79dd-43c6-8c96-f4cdda4fd995
devices:
  root:
    path: /
    pool: default
    size: 500GB
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

If I edit the config to take out the size, it works. But this is the way I've limited the size of containers in the past. Should that be done some other way?

I put storage limits in profiles.
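For example, something like this (just a sketch; "default" and 500GiB are placeholders for your own profile name and quota, it assumes that profile already has a root disk device, and the key=value syntax can vary slightly between client versions):

lxc profile device set default root size=500GiB
lxc profile show default

Instances using that profile then inherit the limit without carrying a size key in their own config.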

Copy Container

[ken@big-lab ~]$ lxc copy rockylinux8 test
Error: Create instance from copy: Instance disk effective override field "size" should not be stored in volume config

Copy Snapshot

[ken@big-lab ~]$ lxc copy rockylinux8/dse-cass-base test
Error: Create instance from copy: Instance disk effective override field "size" should not be stored in volume config

lxc storage show 3tb

[ken@big-lab ~]$ lxc storage show 3tb
config:
  source: /data/lxd-storage
description: 3TB Drive
name: 3tb
driver: dir
used_by:
- /1.0/instances/centos7base
- /1.0/instances/rockylinux8
- /1.0/instances/ubuntu-fossa-base
- /1.0/profiles/default
- /1.0/profiles/dse-cass-default
- /1.0/profiles/kubes
- /1.0/profiles/small-instance-macvlan
status: Created
locations:
- none

lxc config show rockylinux8

[ken@big-lab ~]$ lxc config show rockylinux8
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Rockylinux 8 amd64 (20220602_05:04)
  image.os: Rockylinux
  image.release: "8"
  image.serial: "20220602_05:04"
  image.type: squashfs
  image.variant: default
  volatile.base_image: 0a5b0cfb2410cd12c35a928668db8ea160dcd50495e8fcfadb35ac391e04cfa4
  volatile.cloud-init.instance-id: f50638b0-0560-4697-8a8c-8b871095a600
  volatile.ens0.hwaddr: 00:16:3e:55:58:e5
  volatile.ens0.name: eth0
  volatile.idmap.base: "0"
  volatile.idmap.current: '[]'
  volatile.idmap.next: '[]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: STOPPED
  volatile.uuid: 2a0b0b3a-f251-43f4-8cf4-f8599779aaf4
devices: {}
ephemeral: false
profiles:
- dse-cass-default
stateful: false
description: ""

dse-cass-default Profile:

[ken@big-lab ~]$ lxc profile show dse-cass-default
config:
  limits.cpu: "6"
  limits.memory: 51GB
  limits.memory.enforce: hard
  security.privileged: "true"
description: Disk, CPU, and RAM configs for DSE/C* nodes
devices:
  ens0:
    nictype: macvlan
    parent: eno3
    type: nic
  root:
    path: /
    pool: 3tb
    size: 200GiB
    type: disk
name: dse-cass-default
used_by:
- /1.0/instances/rockylinux8

What storage pool type do you use?

Thanks, so you are using Dir pool type with ext4 project quotas?

Confirmed issue, have reproduced, looking into it now.

Thanks, so you are using Dir pool type with ext4 project quotas?

No, I am using Dir w/ xfs.

Got a fix here:

Excellent! When do you anticipate it being pushed out to the snap repository?

Once merged, @stgraber can cherry-pick it into the latest/stable snap channel after 48 hours or so.

Thanks... I've been in software development a long time; I understand fixes are usually not released in an ad-hoc manner. If the release ETA is 30+ days, I won't hit F5 on my browser every day until then. :wink:

zfs

We see the same error message in our logs when trying to start LXD after upgrading from 5.2.1 to 5.3.1. When downgrading LXD back to 5.2.1, everything works fine. None of our volume configs contain the keyword 'size'.

Failed to start the daemon" err="Failed applying patch \"storage_missing_snapshot_records\": Failed applying patch to pool \"default\": Instance disk effective override field \"size\" should not be stored in volume config

Is this a different issue or another symptom of the broken commit discussed here?

Interesting. I suspect this is a related but different issue, and in this case the check is actually doing the right thing.

Now that you're back on LXD 5.2 (by the way, there is no such thing as LXD 5.2.1 or LXD 5.3.1), can you run:

sudo lxd sql global "select * from storage_volumes left join storage_volumes_config on storage_volumes_config.storage_volume_id = storage_volumes.id where type = 0 order by storage_volumes.name"

And provide the output, as that should show any problem volumes.

Apparently I am a victim of something similar. My guess is that an automatic snap refresh left me with a perpetual

time="2022-08-05T20:02:56-07:00" level=error msg="Failed to start the daemon" err="Failed applying patch \"storage_missing_snapshot_records\": Failed applying patch to pool \"vg0\": Instance disk effective override field \"size\" should not be stored in volume config"

in /var/snap/lxd/common/lxd/logs/lxd.log for I don't know how long. I had to reboot, and now none of my containers are started and I cannot connect to LXD because it is not running.

I'm tracking latest/stable: 5.4-82d05d6 2022-07-27 (23339), but my familiarity with the LXD project and Git branching/tagging is limited, so I can't tell whether this fix has been pulled in, or whether it should have fixed this issue. I've tried switching to latest/candidate and latest/edge, but the problem persisted. I attempted to downgrade to 5.3/stable, but it would not let me because of a revision mismatch in my local database (I think) - I did not capture the message. If this is a separate issue, I will gladly start a new topic, but it seemed relevant as the message is identical to rdratlos's.

-brmiller

Hi, sorry to hear you're having trouble.

I think you've been affected by two old bugs whose effects have lingered in your LXD database and have now been flagged up by the recent tightening of validation checks.

Firstly, LXD has detected that some of your snapshot instance DB records are missing their associated storage volume DB records. It is trying to recreate the missing volume DB records using the main instance volume DB record as a basis. Unfortunately, you are then hit by a second issue: the main instance volume DB record contains a size config setting (which is invalid for instance volume DB records, as the size should come from the instance's root disk device), and that is preventing the replacement snapshot volume DB records from being inserted.

To fix this, we need to identify which instance volume DB records have an invalid size config setting and remove that setting.

To do this, first identify the problematic storage volume records using:

sudo apt install sqlite3
sudo sqlite3 -table /var/snap/lxd/common/lxd/database/global/db.bin 'select storage_volumes_config.id as configID, storage_volumes.name, key, value from storage_volumes left join storage_volumes_config on storage_volumes.id = storage_volumes_config.storage_volume_id where storage_volumes.type = 0 and storage_volumes_config.key = "size";'
+----------+------+------+-------+
| configID | name | key  | value |
+----------+------+------+-------+
| x        | c1   | size | 20GiB |
+----------+------+------+-------+

This should get you one or more rows identifying the instance name(s) and the problematic configID row ID(s) in the storage_volumes_config table.

Next you need to prepare a /var/snap/lxd/common/lxd/database/patch.global.sql file to tell LXD to remove the problematic rows on startup.

E.g.

delete from storage_volumes_config where id in (x,y,z...);

Then reload/restart LXD:

sudo systemctl reload snap.lxd.daemon
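
Once LXD starts cleanly again, you can confirm the offending rows are gone by re-running the earlier query against the live database (the same query as before, just via lxd sql global now that the daemon is up; it should return no rows):

sudo lxd sql global 'select storage_volumes_config.id, storage_volumes.name, key, value from storage_volumes left join storage_volumes_config on storage_volumes.id = storage_volumes_config.storage_volume_id where storage_volumes.type = 0 and storage_volumes_config.key = "size";'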