LXC VM init error on a specific target in an LXD cluster

When initializing an LXC VM on a certain target:

lxc init -p oem-iot-focal oem-focal oem-iot-focal-2 --vm --target core-taipei2

There is always this error:

Creating oem-iot-focal-2
Error: Failed creating instance from image: Failed to run: zfs clone local/images/d6df416450ee8dc51290df03162b11035adda24608627b5dc318fbafe7e259d3@readonly local/virtual-machines/oem-iot-focal-2: cannot create ‘local/virtual-machines/oem-iot-focal-2’: dataset already exists

The error occurs only on this target, not on any other target in the cluster.

Looks like you have an old ZFS dataset left over on that server.

Can you show the output of sudo zfs list?

sudo zfs list

local 846M 720G 24K none
local/containers 24K 720G 24K none
local/custom 24K 720G 24K none
local/deleted 120K 720G 24K none
local/deleted/containers 24K 720G 24K none
local/deleted/custom 24K 720G 24K none
local/deleted/images 24K 720G 24K none
local/deleted/virtual-machines 24K 720G 24K none
local/images 804M 720G 24K none
local/images/d6df416450ee8dc51290df03162b11035adda24608627b5dc318fbafe7e259d3 28.5K 95.3M 27.5K /var/snap/lxd/common/lxd/storage-pools/local/images/d6df416450ee8dc51290df03162b11035adda24608627b5dc318fbafe7e259d3
local/images/d6df416450ee8dc51290df03162b11035adda24608627b5dc318fbafe7e259d3.block 417M 720G 417M -
local/images/e7e1b3eea2eb912b6376c083e5ad5b4265d9642041ebc2c2e5cc3a43c054af10 28.5K 95.3M 27.5K /var/snap/lxd/common/lxd/storage-pools/local/images/e7e1b3eea2eb912b6376c083e5ad5b4265d9642041ebc2c2e5cc3a43c054af10
local/images/e7e1b3eea2eb912b6376c083e5ad5b4265d9642041ebc2c2e5cc3a43c054af10.block 387M 720G 387M -
local/virtual-machines 17.0M 720G 24K none
local/virtual-machines/oem-iot-bionic-1 5.66M 89.7M 5.67M /var/snap/lxd/common/lxd/storage-pools/local/virtual-machines/oem-iot-bionic-1
local/virtual-machines/oem-iot-focal-1 5.66M 89.7M 5.67M /var/snap/lxd/common/lxd/storage-pools/local/virtual-machines/oem-iot-focal-1
local/virtual-machines/oem-iot-focal-2 5.66M 89.7M 5.67M /var/snap/lxd/common/lxd/storage-pools/local/virtual-machines/oem-iot-focal-2

So if you’re confident that you don’t need that existing dataset, you can delete it using zfs destroy.

Yes, it works after I destroy the dataset.
My concern is that it stopped working only after a while. My Jenkins job builds an image and renews the VM every day. Is there a root cause for this occasional failure?

It suggests that when the VM was deleted, the ZFS dataset could not be deleted. If you can capture the scenario when that happens (perhaps because the delete command fails or you get some errors in the log), then we can see if we can find the cause of it.
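
One way to capture that scenario is to wrap the delete step of the Jenkins job so that it saves extra context whenever lxc delete fails. The sketch below is only an illustration under assumptions: the function name delete_with_diagnostics and the LOG_DIR variable are made up, the pool is assumed to be called local as in the zfs list output above, and the script is assumed to run as root on the affected cluster member so that it can read the mountinfo of other processes. It records the lxc delete error, the state of the dataset, and which processes still have the dataset mounted, which is one possible (not confirmed) reason for a "dataset is busy" error.

```shell
# Sketch only: delete_with_diagnostics and LOG_DIR are hypothetical names.
# Usage: delete_with_diagnostics <instance> [<pool>]
delete_with_diagnostics() {
    instance="$1"
    pool="${2:-local}"   # assumption: pool name taken from the zfs list output
    dataset="${pool}/virtual-machines/${instance}"
    log="${LOG_DIR:-/tmp}/lxc-delete-${instance}.log"

    # Normal path: nothing extra to capture if the delete succeeds.
    if lxc delete "$instance" >"$log" 2>&1; then
        return 0
    fi

    {
        echo "== zfs list for ${dataset}"
        zfs list -r "$dataset"
        echo "== processes whose mount table still references ${instance}"
        # Prints /proc/<pid>/mountinfo for every process that still has the
        # dataset mounted. The trailing space avoids matching longer names.
        grep -l "virtual-machines/${instance} " /proc/[0-9]*/mountinfo
    } >>"$log" 2>&1

    echo "lxc delete failed, diagnostics saved to ${log}" >&2
    return 1
}
```

In the job, the line lxc delete oem-iot-focal-1 would then become delete_with_diagnostics oem-iot-focal-1, and the saved log could be attached to a follow-up post here.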


The error log follows:

18:47:55 + lxc stop oem-iot-focal-1
18:47:59 + lxc delete oem-iot-focal-1
18:48:01 Error: Error deleting storage volume: Failed to run: zfs destroy -r local/virtual-machines/oem-iot-focal-1: cannot destroy ‘local/virtual-machines/oem-iot-focal-1’: dataset is busy

So do you mean that when I catch this error, I can run a post-job step that does:

zfs destroy local/virtual-machines/oem-iot-focal-1
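
A minimal sketch of what such a post-job step could look like, assuming the busy state is transient and the dataset really is a leftover. The function name destroy_leftover_dataset and the RETRIES and DELAY variables are hypothetical, and the pool name local is taken from the zfs list output above. As a precaution it refuses to touch the dataset while LXD still lists an instance with that name, and it retries zfs destroy -r (the same command shown in the error) a few times before giving up.

```shell
# Sketch only: destroy_leftover_dataset, RETRIES and DELAY are hypothetical names.
# Usage: destroy_leftover_dataset <instance> [<pool>]
destroy_leftover_dataset() {
    instance="$1"
    pool="${2:-local}"   # assumption: pool name taken from the zfs list output
    dataset="${pool}/virtual-machines/${instance}"

    # Safety check: if LXD still knows the instance, the dataset may be in use.
    if lxc info "$instance" >/dev/null 2>&1; then
        echo "instance ${instance} still exists in LXD, not destroying" >&2
        return 2
    fi

    attempt=1
    while [ "$attempt" -le "${RETRIES:-5}" ]; do
        # Nothing to do once the dataset no longer exists.
        if ! zfs list -H -o name "$dataset" >/dev/null 2>&1; then
            return 0
        fi
        if zfs destroy -r "$dataset"; then
            return 0
        fi
        echo "attempt ${attempt}: could not destroy ${dataset}, retrying" >&2
        sleep "${DELAY:-10}"
        attempt=$((attempt + 1))
    done
    return 1
}
```

Calling destroy_leftover_dataset oem-iot-focal-1 after a failed lxc delete would then clean up the dataset if it becomes free within the retry window. This only works around the symptom and does not explain why the dataset was busy in the first place.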

@stgraber I believe you’ve seen this issue before, is that likely to be the snap issue?

@Jason_Lo does this occur every time or intermittently?

It has happened only once, after my Jenkins job had been renewing the VM for more than 2 months (at least once per day).
But once it happens, I can no longer renew the VM with the same name on that specific target in the cluster.