I am unable to move a container from one host node to another. I want to live migrate my containers without stopping them.
The following is the error I am facing:
Error: Failed instance creation:
https://10.128.0.21:8443: Error transferring instance data: migration dump failed
(01.356329) Warn (compel/arch/x86/src/lib/infect.c:281): Will restore 11345 with interrupted system call
(01.939712) Error (criu/page-xfer.c:401): page-xfer: No parent image found, though parent directory is set: No such file or directory
(01.951230) Error (criu/page-xfer.c:401): page-xfer: No parent image found, though parent directory is set: No such file or directory
(03.075407) Warn (criu/files-reg.c:1510): Couldn't find the build-id note for file with fd 16
(03.076880) Error (criu/files-ext.c:96): Can't dump file 10 of that type [20666] (chr 10:229)
(03.076893) Error (criu/cr-dump.c:1351): Dump files (pid: 11689) failed with -1
(03.084126) Error (criu/cr-dump.c:1768): Dumping FAILED.
https://10.191.26.1:8443: Error transferring instance data: Unable to connect to: 10.191.26.1:8443
https://[fd42:5e36:39c4:e4b::1]:8443: Error transferring instance data: Unable to connect to: [fd42:5e36:39c4:e4b::1]:8443
That’s CRIU failing to serialize your source container.
It’s very rare for random containers to be live migratable. In most cases, it requires conscious thought about the services being run in them and their setup to make this possible.
There are a couple of showstopper issues with CRIU right now which @brauner is working on, but in your case you seem to be hitting a different issue.
The error above says that CRIU failed to dump a file descriptor from process 11689. Specifically, fd slot 10 holds a descriptor of a type which CRIU does not support (the log reports it as a character device, chr 10:229).
The only options out of that are: 1) configure whatever process that is to not use that kind of file; 2) do not run whatever process that is; 3) add support to the Linux kernel and CRIU to serialize that particular file type.
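To work out which process and file are involved, the relevant fields can be pulled out of the dump log. The following is only a rough sketch: the function names are made up for illustration, the regular expressions are based solely on the two error lines quoted above, and it assumes the PID in the log is the host-side PID and that the process is still running on the source host.

```python
import os
import re
from typing import Optional

# Patterns modelled on the two log lines quoted in this thread (an assumption,
# not a documented CRIU log format). The apostrophe is matched loosely because
# forum software tends to turn it into a curly quote.
FD_ERROR = re.compile(r"Can.t dump file (\d+) of that type \[\d+\] \((\w+) (\d+):(\d+)\)")
PID_ERROR = re.compile(r"Dump files \(pid: (\d+)\) failed")


def find_unsupported_fd(log_text: str) -> Optional[dict]:
    """Return the fd, device kind, major/minor and pid of the first failed dump."""
    fd_match = FD_ERROR.search(log_text)
    if fd_match is None:
        return None
    # The pid is reported on a later line, so search from the end of the fd match.
    pid_match = PID_ERROR.search(log_text, fd_match.end())
    return {
        "fd": int(fd_match.group(1)),
        "kind": fd_match.group(2),
        "major": int(fd_match.group(3)),
        "minor": int(fd_match.group(4)),
        "pid": int(pid_match.group(1)) if pid_match else None,
    }


def describe_fd(pid: int, fd: int) -> str:
    """Resolve what the descriptor points at; needs root and a live process."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")
```

For the first log above this yields fd 10, pid 11689 and device 10:229. Running describe_fd(11689, 10) as root on the source host, while the container is still up, should then show the path behind that descriptor. On most Linux systems 10:229 is /dev/fuse, but it is worth confirming with ls -l /dev on your own host.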
https://10.128.0.21:8443: Error transferring instance data: migration dump failed
(00.834725) Warn (compel/arch/x86/src/lib/infect.c:281): Will restore 9476 with interrupted system call
(02.097016) Error (criu/page-xfer.c:401): page-xfer: No parent image found, though parent directory is set: No such file or directory
(02.124676) Error (criu/page-xfer.c:401): page-xfer: No parent image found, though parent directory is set: No such file or directory
(02.124781) Error (criu/page-xfer.c:401): page-xfer: No parent image found, though parent directory is set: No such file or directory
(04.304474) Warn (compel/arch/x86/src/lib/infect.c:281): Will restore 9951 with interrupted system call
(05.221252) Warn (criu/files-reg.c:1510): Couldn't find the build-id note for file with fd 16
(05.223459) Error (criu/files-ext.c:96): Can't dump file 10 of that type [20666] (chr 10:229)
(05.223479) Error (criu/cr-dump.c:1351): Dump files (pid: 9515) failed with -1
(05.234506) Error (criu/cr-dump.c:1768): Dumping FAILED.
https://10.191.26.1:8443: Error transferring instance data: Unable to connect to: 10.191.26.1:8443
https://[fd42:5e36:39c4:e4b::1]:8443: Error transferring instance data: Unable to connect to: [fd42:5e36:39c4:e4b::1]:8443
Ah, you got further with that one. You may have now hit one of the issues that @brauner is working on.
Can you take a look at /var/snap/lxd/common/lxd/logs/migration/ on the target and look for restore log files? The content of those files may indicate what happened.
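If there are many files in that directory, something like the following can pull out just the error lines. This is a sketch under assumptions: the helper name is hypothetical, I am not assuming any particular file naming (it simply scans every regular file under the directory), and it treats any line containing "Error (" as interesting because that is how CRIU errors appear in the logs above. It will likely need to be run as root on the target.

```python
from pathlib import Path
from typing import Iterator, Tuple

# Directory mentioned above; adjust if your LXD is not the snap package.
MIGRATION_LOG_DIR = Path("/var/snap/lxd/common/lxd/logs/migration")


def iter_error_lines(log_dir: Path) -> Iterator[Tuple[str, str]]:
    """Yield (file name, line) for every CRIU-style error line under log_dir."""
    for path in sorted(log_dir.rglob("*")):
        if not path.is_file():
            continue
        # Replace undecodable bytes rather than failing on a partly binary log.
        text = path.read_text(errors="replace")
        for line in text.splitlines():
            if "Error (" in line:
                yield path.name, line.strip()


if __name__ == "__main__":
    if MIGRATION_LOG_DIR.is_dir():
        for name, line in iter_error_lines(MIGRATION_LOG_DIR):
            print(f"{name}: {line}")
    else:
        print(f"{MIGRATION_LOG_DIR} not found on this host")
```

Posting the lines it prints, along with a few lines of context around the first error, is usually enough to tell which restore step failed.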