LXC/LXD v5.3 VolumeType error causes 'lxc copy' command to fail

In LXC/LXD v5.3, lxc copy fails with this error:

Error: Create instance from copy: Instance disk effective override field "size" should not be stored in volume config

I see the code commit "lxd/storage/utils: Check that instanceDiskVolumeEffectiveFields are not used for instance volumes DB records".

The code comment reads:
If the volumeType represents an instance type then check that the volumeConfig doesn't contain any of the instance disk effective override fields (which should not be stored in the database).

And the related feature is "Storage: Simplify instance root disk volume config" (#10115).

From a user's point of view, what worked in v5.2 no longer works in v5.3. I am not sure what problem was being solved and I do not know what I need to change in my environment (storage, profile, container) for the lxc copy command to work.

It would be very helpful to get more background on the problem being solved and the steps needed to remediate it.

I rolled LXD back to v5.2 because I have Ansible scripts which failed.

Sounds like you have some leftover invalid config in your storage volume DB record.

Can you please show the output of lxc storage volume show <pool> container/<instance> and lxc config show <instance>?

I went back to v5.3 and ran the commands.

Pool Info:

[ken@big-lab ~]$ lxc storage volume list 3tb 
+----------------------+----------------------------+-------------+--------------+---------+
|         TYPE         |            NAME            | DESCRIPTION | CONTENT-TYPE | USED BY |
+----------------------+----------------------------+-------------+--------------+---------+
| container            | centos7base                |             | filesystem   | 1       |
+----------------------+----------------------------+-------------+--------------+---------+
| container            | rockylinux8                |             | filesystem   | 1       |
+----------------------+----------------------------+-------------+--------------+---------+
| container (snapshot) | rockylinux8/dse-cass-base  |             | filesystem   | 1       |
+----------------------+----------------------------+-------------+--------------+---------+
| container            | ubuntu-fossa-base          |             | filesystem   | 1       |
+----------------------+----------------------------+-------------+--------------+---------+

Container Info:

[ken@big-lab ~]$ lxc storage volume show 3tb container/rockylinux8
config: {}
description: ""
name: rockylinux8
type: container
used_by:
- /1.0/instances/rockylinux8
location: none
content_type: filesystem

[ken@big-lab ~]$ lxc storage volume show 3tb container/ubuntu-fossa-base
config: {}
description: ""
name: ubuntu-fossa-base
type: container
used_by:
- /1.0/instances/ubuntu-fossa-base
location: none
content_type: filesystem

Snapshot Info:

[ken@big-lab ~]$ lxc storage volume show 3tb container/rockylinux8/dse-cass-base
description: ""
expires_at: 0001-01-01T00:00:00Z
name: dse-cass-base
config: {}
content_type: filesystem

Thanks again for the prompt replies.

And what is the full command you are running that gives the error?

Can you also show the output of "lxc storage show 3tb" and "lxc config show <instance>"? Thanks.

I'm running into the same issue trying to create an lxc copy on one of my LXD backup servers.

The commands I am running are:

lxc snapshot c1 --reuse for-staging
lxc copy c1/for-staging c1-staging

It doesn't create the copy and errors with:

Create instance from copy: Instance disk effective override field "size" should not be stored in volume config

The output for the container in question:

lxc config show c1

So I assume it's complaining about the size: 500GB?

architecture: x86_64
config:
  image.architecture: amd64
  image.description: Debian bullseye amd64 (20220315_05:24)
  image.os: Debian
  image.release: bullseye
  image.serial: "20220315_05:24"
  image.type: squashfs
  image.variant: default
  limits.cpu: "8"
  limits.cpu.allowance: 100%
  limits.memory: 8GB
  snapshots.expiry: 30d
  snapshots.schedule: 30 19 * * *
  volatile.apply_template: copy
  volatile.base_image: ca9d8388c9d3f83fc6da2517443af27da1b6588dff320a1281a4b7f1f9235815
  volatile.cloud-init.instance-id: f14efa2c-eeef-454a-b4f0-cecd88daf7d9
  volatile.eth0.hwaddr: 00:16:3e:24:48:80
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.uuid: 1533c10e-79dd-43c6-8c96-f4cdda4fd995
devices:
  root:
    path: /
    pool: default
    size: 500GB
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

If I edit the config to take out the size, it works. But this is the way I've limited the size of containers in the past. Should that be done some other way?

I put storage limits in profiles.
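For example, something like this (just a sketch; "default" and 500GiB are placeholders for your own profile name and quota, it assumes that profile already has a root disk device, and the key=value syntax can vary slightly between client versions):

lxc profile device set default root size=500GiB
lxc profile show default

Instances using that profile then inherit the limit without carrying a size key in their own config.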

Copy Container

[ken@big-lab ~]$ lxc copy rockylinux8 test
Error: Create instance from copy: Instance disk effective override field "size" should not be stored in volume config

Copy Snapshot

[ken@big-lab ~]$ lxc copy rockylinux8/dse-cass-base test
Error: Create instance from copy: Instance disk effective override field "size" should not be stored in volume config

lxc storage show 3tb

[ken@big-lab ~]$ lxc storage show 3tb
config:
  source: /data/lxd-storage
description: 3TB Drive
name: 3tb
driver: dir
used_by:
- /1.0/instances/centos7base
- /1.0/instances/rockylinux8
- /1.0/instances/ubuntu-fossa-base
- /1.0/profiles/default
- /1.0/profiles/dse-cass-default
- /1.0/profiles/kubes
- /1.0/profiles/small-instance-macvlan
status: Created
locations:
- none

lxc config show rockylinux8

[ken@big-lab ~]$ lxc config show rockylinux8
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Rockylinux 8 amd64 (20220602_05:04)
  image.os: Rockylinux
  image.release: "8"
  image.serial: "20220602_05:04"
  image.type: squashfs
  image.variant: default
  volatile.base_image: 0a5b0cfb2410cd12c35a928668db8ea160dcd50495e8fcfadb35ac391e04cfa4
  volatile.cloud-init.instance-id: f50638b0-0560-4697-8a8c-8b871095a600
  volatile.ens0.hwaddr: 00:16:3e:55:58:e5
  volatile.ens0.name: eth0
  volatile.idmap.base: "0"
  volatile.idmap.current: '[]'
  volatile.idmap.next: '[]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: STOPPED
  volatile.uuid: 2a0b0b3a-f251-43f4-8cf4-f8599779aaf4
devices: {}
ephemeral: false
profiles:
- dse-cass-default
stateful: false
description: ""

dse-cass-default Profile:

[ken@big-lab ~]$ lxc profile show dse-cass-default
config:
  limits.cpu: "6"
  limits.memory: 51GB
  limits.memory.enforce: hard
  security.privileged: "true"
description: Disk, CPU, and RAM configs for DSE/C* nodes
devices:
  ens0:
    nictype: macvlan
    parent: eno3
    type: nic
  root:
    path: /
    pool: 3tb
    size: 200GiB
    type: disk
name: dse-cass-default
used_by:
- /1.0/instances/rockylinux8

What storage pool type do you use?

Thanks, so you are using Dir pool type with ext4 project quotas?

Confirmed issue, have reproduced, looking into it now.

Thanks, so you are using Dir pool type with ext4 project quotas?

No, I am using Dir w/ xfs.

Got a fix here:

Excellent! When do you anticipate it being pushed out to the snap repository?

Once merged, @stgraber can cherry-pick it into the latest/stable snap channel after 48 hours or so.

Thanks... I've been in software development a long time; I understand fixes are usually not released in an ad-hoc manner. If the release ETA is 30+ days, I won't hit F5 on my browser every day until then. :wink:

zfs

We see the same error message in our logs when trying to start LXD after upgrading from 5.2.1 to 5.3.1. When downgrading LXD back to 5.2.1, everything works fine. None of our volume configs contain the keyword 'size'.

Failed to start the daemon" err="Failed applying patch \"storage_missing_snapshot_records\": Failed applying patch to pool \"default\": Instance disk effective override field \"size\" should not be stored in volume config

Is this a different issue or another symptom of the broken commit discussed here?

Interesting. I suspect this is a related but different issue, and in this case the check is actually doing the right thing.

Now that you're back on LXD 5.2 (by the way, there is no such thing as LXD 5.2.1 or LXD 5.3.1), can you run:

sudo lxd sql global "select * from storage_volumes left join storage_volumes_config on storage_volumes_config.storage_volume_id = storage_volumes.id where type = 0 order by storage_volumes.name"

And provide the output, as that should show any problem volumes.

Apparently I am a victim of something similar. My guess is that an automatic snap refresh left me with a perpetual

time="2022-08-05T20:02:56-07:00" level=error msg="Failed to start the daemon" err="Failed applying patch \"storage_missing_snapshot_records\": Failed applying patch to pool \"vg0\": Instance disk effective override field \"size\" should not be stored in volume config"

in /var/snap/lxd/common/lxd/logs/lxd.log for I don't know how long. I had to reboot, and now none of my containers are started and I cannot connect to LXD because it is not running.

I'm tracking latest/stable: 5.4-82d05d6 2022-07-27 (23339), but my familiarity with the LXD project and Git branching/tagging is limited, so I can't tell whether this fix has been pulled in, or whether it should have fixed this issue. I've tried switching to latest/candidate and latest/edge, but the problem persisted. I attempted to downgrade to 5.3/stable, but it would not let me because of a revision mismatch in my local database (I think) - I did not capture the message. If this is a separate issue, I will gladly start a new topic, but it seemed relevant as the message is identical to rdratlos's.

-brmiller

Hi, sorry to hear you're having trouble.

I think you've been affected by two old bugs whose effects have lingered in your LXD database and have now been flagged up by the recent tightening of validation checks.

Firstly, LXD has detected that some of your snapshot instance DB records are missing their associated storage volume DB records. It is trying to recreate the missing volume DB records using the main instance volume DB record as a basis. Unfortunately, you are then hit by a second issue: the main instance volume DB record contains a size config setting (which is invalid for instance volume DB records, as the size should come from the instance's root disk device), and that is preventing the replacement snapshot volume DB records from being inserted.

To fix this, we need to identify which instance volume DB records have an invalid size config setting and remove that setting.

To do this, first identify the problematic storage volume records using:

sudo apt install sqlite3
sudo sqlite3 -table /var/snap/lxd/common/lxd/database/global/db.bin 'select storage_volumes_config.id as configID, storage_volumes.name, key, value from storage_volumes left join storage_volumes_config on storage_volumes.id = storage_volumes_config.storage_volume_id where storage_volumes.type = 0 and storage_volumes_config.key = "size";'
+----------+------+------+-------+
| configID | name | key  | value |
+----------+------+------+-------+
| x        | c1   | size | 20GiB |
+----------+------+------+-------+

This should get you one or more rows identifying the instance name(s) and the problematic configID row ID(s) in the storage_volumes_config table.

Next you need to prepare a /var/snap/lxd/common/lxd/database/patch.global.sql file to tell LXD to remove the problematic rows on startup.

E.g.

delete from storage_volumes_config where id in (x,y,z...);

Then reload/restart LXD:

sudo systemctl reload snap.lxd.daemon
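
Once LXD starts cleanly again, you can confirm the offending rows are gone by re-running the earlier query against the live database (the same query as before, just via lxd sql global now that the daemon is up; it should return no rows):

sudo lxd sql global 'select storage_volumes_config.id, storage_volumes.name, key, value from storage_volumes left join storage_volumes_config on storage_volumes.id = storage_volumes_config.storage_volume_id where storage_volumes.type = 0 and storage_volumes_config.key = "size";'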