Error "Volume path already exists" when creating multiple VMs in parallel

mezobari · March 4, 2023, 4:06pm

Hello everyone,

I’m using LXD to create multiple virtual machines in parallel, or within a very short time of each other. To do this, I launch them with lxc launch, stop them with lxc stop -f, and then delete them with lxc delete -f. I repeat this process to create new VMs.

This works most of the time, but occasionally I get an error message that says:

Error: Failed creating instance from image: Volume path "/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/vm-x" already exists

This error occurs even though I have successfully stopped and deleted the VM.

What should I do to resolve this error? I appreciate any help or suggestions you can provide.

Thank you!

tomp · March 5, 2023, 12:30pm

What lxd version?

mezobari · March 5, 2023, 1:12pm

LXD version: 5.0.2
System: Ubuntu 22.04.2 LTS

I think I installed it via the snap 5.0/stable channel:

sudo snap install lxd --channel=5.0/stable

mezobari · March 5, 2023, 1:13pm

I think the issue comes from parallel creation: if I create them one by one, I never get this error, no matter how many VMs I create.

tomp · March 5, 2023, 2:27pm

On a separate machine, do you get the same issue with LXD 5.11? (If you upgrade your 5.0.2 install to 5.11, you won’t be able to downgrade again.)

mezobari · March 5, 2023, 2:35pm

It’s a testing machine anyway. I will try the same with the latest version and get back to you.

mezobari · March 5, 2023, 3:32pm

sudo snap install lxd
lxd init --auto

lxd version: 5.11

Now I’m randomly getting a different error during lxc launch ... (3 out of 20 machines):

Stderr: Error: Failed creating instance from image: Failed reading image info 
"/var/snap/lxd/common/lxd/images/102c0fdafc87c8be84a604f1cf4fdc2414f90bb31b9301fae1bba4d8201095a8.rootfs":
Failed to run: prlimit --cpu=2 --as=1000000000 qemu-img info -f qcow2 --output=json /var/snap/lxd/common/lxd/images/102c0fdafc87c8be84a604f1cf4fdc2414f90bb31b9301fae1bba4d8201095a8.rootfs:
Process exited with non-zero value 1 (aa-exec: ERROR: profile 'lxd_qemu-img-var-snap-lxd-common-lxd-images-102c0fdafc87c8be84a604f1cf4fdc2414f90bb31b9301fae1bba4d8201095a8.rootfs' does not exist)

and

Error: Failed to begin transaction: context deadline exceeded

I think all these errors are due to the fact that I’m launching them potentially in parallel (20 of them?); otherwise the launches seem to work.
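
If the failures are transient, one possible stopgap is to retry the launch a few times with a growing delay. This is only a sketch of a workaround, not a fix for the underlying issue. The `retry` helper and its parameters are hypothetical names, and the image alias and instance name in the usage comment are assumptions.

```shell
# Hypothetical helper: retry a command up to max_tries times,
# doubling the delay (in seconds) between attempts.
retry() {
    local max_tries="$1" delay="$2" attempt=1
    shift 2
    while ! "$@"; do
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
}

# Example (image alias and instance name are assumptions):
# retry 3 2 lxc launch ubuntu:22.04 vm-1 --vm
```

A failed launch may leave a partially created instance behind, so the name may need to be cleaned up (for example with lxc delete -f) before the next attempt.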

tomp · March 17, 2023, 9:48am

Thanks, I’ll see if I can recreate the issue.

Do you have a simple script that causes it on your system?

tomp · March 17, 2023, 9:49am

Also, can you please show the output of lxc storage show <pool> for the storage pool you’re using with the instances?

mezobari · March 19, 2023, 11:31am

config:
  source: /var/snap/lxd/common/lxd/storage-pools/default
description: ""
name: default
driver: dir
used_by:
- /1.0/instances/worker-init
- /1.0/instances/worker-init/snapshots/snap
- /1.0/profiles/default
status: Created
locations:
- none

mezobari · March 19, 2023, 11:33am

Unfortunately, it’s part of a big project.
We are simultaneously creating, stopping, and deleting VMs.

Someone on GitHub had the same or a similar issue, and they provided a simple bash script to run VMs in parallel. I think the same script can be used here as well.
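
For reference, a minimal sketch of such a reproducer might look like the following. It is not the script from the GitHub issue. The function names, the `LXC` and `IMAGE` override variables, the `ubuntu:22.04` image alias, and the `vm-` name prefix are all assumptions for illustration.

```shell
# Hypothetical reproducer sketch: launch, force-stop, and force-delete
# N VMs in parallel. LXC, IMAGE, and the "vm-" prefix are assumptions.
LXC="${LXC:-lxc}"
IMAGE="${IMAGE:-ubuntu:22.04}"

# One full lifecycle for a single VM; stops at the first failing step.
vm_cycle() {
    "$LXC" launch "$IMAGE" "$1" --vm &&
        "$LXC" stop -f "$1" &&
        "$LXC" delete -f "$1"
}

# Start `count` lifecycles concurrently, then report how many failed.
run_parallel() {
    local count="$1" failed=0 pids="" i=1 pid
    while [ "$i" -le "$count" ]; do
        vm_cycle "vm-$i" &
        pids="$pids $!"
        i=$((i + 1))
    done
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "failed: $failed of $count"
}

# Example: run_parallel 20
```

Calling run_parallel repeatedly reuses the same instance names, which is the pattern that would exercise the "already exists" path if a previous delete had not fully completed.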

tomp · March 21, 2023, 4:58pm

This error looks like LXD may be refreshing the base image to a different hash ID at the same time that the image is being used to create an instance from it. There should be a lock that prevents this, but perhaps there’s an edge case here. I’ve assigned this to myself, so I will try to take a look at reproducing it.

mezobari · March 21, 2023, 8:37pm

The problem seems to be having more VMs than my CPU can handle.

I was running 20 VMs with 4 vCPUs each on a machine with 24 cores in total, and each of those VMs was running at full CPU capacity.
With a more realistic load, like 6 VMs, everything seems to work.
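
As a rough illustration of that sizing: with 24 cores and 4 vCPUs per VM, 24 / 4 = 6 VMs fit without oversubscribing the CPU, whereas 20 VMs ask for 80 vCPUs. Below is a sketch that derives such a cap and applies it to a batch of commands. `max_parallel_vms` and `run_throttled` are hypothetical helper names, and the image alias, name pattern, and limits.cpu setting in the usage comment are assumptions.

```shell
# Hypothetical helper: how many VMs fit if each gets `vcpus` whole cores.
max_parallel_vms() {
    local cores="$1" vcpus="$2" n
    n=$((cores / vcpus))
    [ "$n" -ge 1 ] || n=1   # always allow at least one VM
    echo "$n"
}

# Run "<command...> <name>" for each name read from stdin,
# with at most `jobs` commands running at the same time.
run_throttled() {
    local jobs="$1"
    shift
    xargs -P "$jobs" -I{} "$@" {}
}

# Example: 24 cores, 4 vCPUs per VM -> at most 6 concurrent commands.
# jobs="$(max_parallel_vms "$(nproc)" 4)"
# seq -f 'vm-%g' 1 20 | run_throttled "$jobs" lxc launch ubuntu:22.04 --vm -c limits.cpu=4
```

Note that this only caps how many commands run at once. To cap the number of VMs running at the same time, the throttled command would need to cover the whole launch, work, stop, and delete cycle rather than just lxc launch.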