Error: Error transferring instance data: Failed getting instance storage pool name: Instance storage pool not found

robberer · June 7, 2022, 10:01am

No error when I stop the source container.

tomp · June 7, 2022, 10:09am

Out of interest, have you run snap set lxd criu.enable=true? (It defaults to off.)

robberer · June 7, 2022, 10:18am

It is enabled on source and target.

tomp · June 7, 2022, 10:18am

Do you see the LXD PID on the target change after the migration fails (indicating LXD crashed and was restarted)?

tomp · June 7, 2022, 10:18am

Interesting.

tomp · June 7, 2022, 10:19am

Can you disable it and try with it running again?

tomp · June 7, 2022, 10:20am

Ah yes, that's the issue:

sudo snap set lxd criu.enable=true; sudo systemctl reload snap.lxd.daemon
lxc copy --mode push --refresh --stateless --config boot.autostart=false c1 v2:c2
Error: Failed instance migration: Failed reading migration header: websocket: close 1006 (abnormal closure): unexpected EOF

robberer · June 7, 2022, 10:24am

snap set lxd criu.enable=false;
error: cannot perform the following tasks:
- Run configure hook of "lxd" snap (snap "lxd" option "criu" is not a map)

I can run snap set lxd criu=false, but it has no effect on the websocket: close 1006 error.

tomp · June 7, 2022, 10:25am

On both sides

robberer · June 7, 2022, 10:27am

I did it on both sides with a reload, but the command you posted does not work; see my last answer.

tomp · June 7, 2022, 10:27am

Oh, it seems to break persistently once CRIU has been enabled, even after it is disabled again.

robberer · June 7, 2022, 10:33am

I’m not able to disable it; the command throws an error. After searching for this error I found my own report from last year: Run configure hook of "lxd" snap (snap "lxd" option "criu" is not a map)

tomp · June 7, 2022, 10:33am

OK, so the first thing I need to figure out is whether

sudo snap set lxd criu.enable=false; sudo systemctl reload snap.lxd.daemon

or

sudo snap unset lxd criu.enable; sudo systemctl reload snap.lxd.daemon

is actually working, and if not, why not.
If it is, then I need to figure out why it’s not working with CRIU enabled.

BTW, I suspect CRIU was never working properly before, as AFAIK it doesn’t work with containers that have networking.

tomp · June 7, 2022, 10:36am

I wonder if your snapd version is out of date (as you’re running Debian on the host?).

Anyway, I would suggest steering clear of CRIU, as it’s not well supported.
There’s still a clear bug here but in general even without the bugs I wouldn’t expect CRIU to work in your situation.

robberer · June 7, 2022, 10:42am

OK, after running snap unset lxd criu I was able to run snap set lxd criu.enable=false.

I tried both snap set lxd criu.enable=false and snap unset lxd criu.enable on source and target, each followed by a reload. The websocket error persists.
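For reference, the disable sequence that got past the "is not a map" error can be captured as a small script. Note that disabling CRIU did not fix the websocket error in this thread. This is a minimal sketch under assumptions: the function name disable_criu and the SNAP_CMD/SYSTEMCTL_CMD overrides are hypothetical, added only so the steps can be dry-run; only the snap and systemctl commands themselves come from the posts above.

```shell
#!/bin/sh
# Hypothetical helper (not part of LXD or snapd): disable CRIU for the LXD
# snap using the sequence that worked in this thread, then reload LXD.
# Run as root (e.g. via sudo) on BOTH the source and the target host.
# SNAP_CMD and SYSTEMCTL_CMD are assumptions added only for dry runs
# (e.g. SNAP_CMD=echo); they default to the real tools.
disable_criu() {
  # Clear any scalar "criu" value (e.g. left by 'snap set lxd criu=false'),
  # which makes 'criu.enable' fail with 'option "criu" is not a map'.
  ${SNAP_CMD:-snap} unset lxd criu || return 1
  # Now the nested key can be set explicitly.
  ${SNAP_CMD:-snap} set lxd criu.enable=false || return 1
  # Reload the daemon so the setting takes effect.
  ${SYSTEMCTL_CMD:-systemctl} reload snap.lxd.daemon
}
```

For a dry run, SNAP_CMD=echo SYSTEMCTL_CMD=echo disable_criu prints the three commands instead of executing them.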

robberer · June 7, 2022, 10:44am

I never used CRIU. I tried it last year, decided it was not stable enough for our environment, and then forgot about it.

Debian 11.3 / Snap 2.55.5

tomp · June 7, 2022, 11:12am

I’ve reproduced it without CRIU and have logged this issue for you:

GitHub issue (github.com/lxc/lxd): Optimized refresh broken with snapshots and push mode
Opened by tomponline at 11:11AM, 07 Jun 22 UTC · Label: Bug

Related to https://github.com/lxc/lxd/issues/10186

Host 2:
```
lxd init --auto
lxc storage create zfs zfs
lxc config set core.https_address [::]:8443
lxc config set core.trust_password pw
```

Host 1:
```
lxd init --auto
lxc storage create zfs zfs
lxc config set core.https_address [::]:8443
lxc config set core.trust_password pw
lxc remote add v2 v2
lxc launch images:alpine/3.16 c1 -s zfs
lxc copy --mode=push --refresh --stateless c1 v2: # Works OK
lxc copy --mode=push --refresh --stateless c1 v2: # Works OK
lxc snapshot c1
lxc copy --mode=push --refresh --stateless c1 v2: # Works OK
lxc copy --mode=push --refresh --stateless c1 v2: # Fails
Error: Failed instance migration: Failed reading migration header: websocket: close 1006 (abnormal closure): unexpected EOF
lxc copy --refresh --stateless c1 v2 # Works OK
lxc copy --refresh --stateless c1 v2 # Works OK
lxc delete c1/snap
lxc copy --mode=push --refresh --stateless c1 v2: # Works OK
lxc copy --mode=push --refresh --stateless c1 v2: # Works OK
lxc snapshot c1
lxc copy --refresh --stateless c1 v2: # Works OK
lxc copy --refresh --stateless c1 v2: # Works OK
lxc copy --mode=push --refresh --stateless c1 v2: # Fails
Error: Failed instance migration: Failed reading migration header: websocket: close 1006 (abnormal closure): unexpected EOF
```

tomp · June 7, 2022, 11:29am

Remember to remove the lxd.debug file and reload; otherwise you won’t get automatic updates to LXD in the future.

robberer · June 7, 2022, 12:14pm

Very nice, the error is gone without --mode push. I experimented with push/pull before 5.x because --refresh had really bad performance. Now with 5.x, --refresh migrations are lightning fast. Thank you @tomp for solving this issue.

tomp · June 7, 2022, 12:23pm

In theory, the actual transfers are exactly the same with --mode pull (the default) or --mode push.
The only difference is how the connection is established. With pull mode the target connects back to the source, and with push mode it’s the other way round.

What you may have observed as slowness earlier was the delay in establishing a connection in pull mode, as LXD would try various IP combinations to reach the source server from the target, whereas push mode may have connected more quickly.
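As a sketch of the workaround, a refresh can be wrapped so that it uses the default pull mode unless told otherwise. This is illustrative only: the function name refresh_copy and the MODE and LXC_CMD variables are hypothetical, added so the command can be dry-run; the lxc copy flags are the ones used in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of LXD): refresh a stateless copy of an
# instance on a remote. MODE defaults to "pull" (LXD's default transfer mode),
# which avoided the push-mode refresh failure reported in this thread.
# LXC_CMD is an assumption added only so the command can be dry-run.
refresh_copy() {
  src="$1"              # source instance, e.g. c1
  dst="$2"              # target, e.g. v2:c2
  mode="${MODE:-pull}"  # "pull" or "push"
  # In pull mode the target connects back to the source;
  # in push mode the source connects to the target.
  ${LXC_CMD:-lxc} copy --mode "$mode" --refresh --stateless \
    --config boot.autostart=false "$src" "$dst"
}
```

Running LXC_CMD=echo refresh_copy c1 v2:c2 prints the command instead of executing it; setting MODE=push reproduces the mode that triggered the bug.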

However there is clearly some issue that is making --push behave differently now, which shouldn’t be there.

Glad the default pull mode is working for you, though.