Issue during live migration LXD

lxd

(Ibrahim mohammedameen) #1

I am trying to live migrate a container between two machines using LXD:
lxc move lxd:container lxd2:container
but whenever I run the command it gives me this error:
error: migration dump failed
(00.094811) Error (criu/sk-netlink.c:73): The socket has data to read
(00.094823) Error (criu/cr-dump.c:1313): Dump files (pid: 7431) failed with -1
(00.117718) Error (criu/cr-dump.c:1628): Dumping FAILED.
However, I can migrate the containers when they are offline, that is, if I stop the container first and then send it to the destination.

I was told it might be a CRIU problem and that I should check the container's migration log. This is what it shows:

(00.096352) Searching for socket 25c4e (family 16.0)
(00.096357) Error (criu/sk-netlink.c:73): The socket has data to read
(00.096360) ----------------------------------------
(00.096372) Error (criu/cr-dump.c:1313): Dump files (pid: 14964) failed with -1
(00.096428) Waiting for 14964 to trap
(00.096439) Daemon 14964 exited trapping
(00.096452) Sent msg to daemon 5 0 0
pie: 1: __fetched msg: 5 0 0
pie: 1: 1: new_sp=0x7ff9a1664008 ip 0x7ff99fc649b3
(00.096551) 14964 was trapped
(00.096556) `- Expecting exit
(00.096565) 14964 was trapped
(00.096569) 14964 is going to execute the syscall 15
(00.096579) 14964 was stopped
(00.096730) 14964 was trapped
(00.096738) 14964 is going to execute the syscall 186
(00.096747) 14964 was trapped
(00.096750) `- Expecting exit
(00.096757) 14964 was trapped
(00.096761) 14964 is going to execute the syscall 1
(00.096768) 14964 was trapped
(00.096771) `- Expecting exit
(00.096778) 14964 was trapped
(00.096782) 14964 is going to execute the syscall 11
(00.096802) 14964 was stopped
(00.096941) Unlock network
(00.096946) Running network-unlock scripts
(00.096952) /tmp/lxd_checkpoint_320574389/action.sh Unfreezing tasks into 1
(00.117611) Unseizing 14964 into 1
(00.117653) Unseizing 15064 into 1
(00.117858) Unseizing 15065 into 1
(00.117882) Unseizing 15464 into 1
(00.117912) Unseizing 15468 into 1
(00.117932) Unseizing 15490 into 1
(00.117944) Unseizing 15491 into 1
(00.117964) Unseizing 15492 into 1
(00.117977) Unseizing 15496 into 1
(00.117991) Unseizing 15515 into 1
(00.118041) Unseizing 15517 into 1
(00.118066) Unseizing 15663 into 1
(00.118090) Unseizing 15679 into 1
(00.118130) Error (criu/cr-dump.c:1628): Dumping FAILED.

I hope you can help me with this issue.

regards


(Stéphane Graber) #2

(00.096357) Error (criu/sk-netlink.c:73): The socket has data to read

This is the problem. It indicates that one of your processes had a netlink socket open with data buffered in it. CRIU cannot dump and restore a process in that state, so the migration fails.

So your problem is very much a CRIU/kernel issue. Until CRIU has a way to dump such a process you can either retry the migration hoping that the socket will eventually not have any buffered data, or track down the problematic process and make sure it's not running before attempting live migration.
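
To track down the problematic process, something along these lines might help. It is only a rough sketch in Python, not anything LXD or CRIU ships. It assumes that the kernel's netlink table (`/proc/<pid>/net/netlink`, read through a process inside the container so that the right network namespace is used) has `Rmem` and `Inode` columns, that a non-zero `Rmem` means data is still queued on the socket, and that the socket id in the CRIU log line ("Searching for socket 25c4e (family 16.0)") is the inode in hexadecimal, family 16 being AF_NETLINK. All function names are made up for this example.

```python
import os
import re


def parse_netlink_table(text):
    """Parse the text of a /proc/<pid>/net/netlink file into a list of dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip lines that do not match the header layout
        rows.append(dict(zip(header, fields)))
    return rows


def sockets_with_pending_data(rows):
    """Return inodes of netlink sockets whose Rmem column is non-zero.

    Assumption: non-zero Rmem means unread data is buffered on the socket.
    """
    return [int(r["Inode"]) for r in rows if int(r["Rmem"]) > 0]


def criu_socket_inode(log_line):
    """Extract the socket id from a CRIU 'Searching for socket' log line.

    Assumption: the id is the socket inode printed in hexadecimal.
    """
    m = re.search(r"Searching for socket ([0-9a-f]+) \(family 16\.", log_line)
    return int(m.group(1), 16) if m else None


def pids_holding_inode(inode, proc="/proc"):
    """Return the PIDs that have a file descriptor open on the given socket inode."""
    target = "socket:[%d]" % inode
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join(proc, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    pids.append(int(entry))
                    break
            except OSError:
                continue  # descriptor was closed while scanning
    return pids


def report(container_pid):
    """Print the PIDs that own netlink sockets with buffered data.

    container_pid is the host PID of any process inside the container.
    """
    with open("/proc/%s/net/netlink" % container_pid) as f:
        rows = parse_netlink_table(f.read())
    for inode in sockets_with_pending_data(rows):
        print(inode, pids_holding_inode(inode))
```

Run as root on the source host, calling `report()` with the host PID of a process in the container (14964 in the log above, while it was still running) should list the candidate processes. Since the buffered data can come and go, the result may differ between runs, which is also why simply retrying the migration sometimes works.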