Infinite `lxc stop`

Nick_Knutov · August 17, 2020, 2:22am

I have strange behavior of lxd stop on one of ours server.

ostemplate="ubuntu/20.04"
ct="c11"
lxc launch images:${ostemplate} ${ct}
lxc stop ${ct}

lxc stop here takes always infinite time.

This happens only on one server (Ubuntu 18.04.5 LTS with zfs, latest lxd from snap).
It works as expected on all others server.
I’ve tried to reboot, nothing changed.
This server has server-grade hardware with rotational disks, with almost no loads (cpu and disk io is 99% free)
I can run in another shell lxc exec c11 -- shutdown -h now and it works.

How to understand what happens and how to fix it?

Thanks

stgraber · August 17, 2020, 3:56am

lxc stop only signals the container’s init system to shutdown, if you do it too early, there’s a good chance the init system in the container hasn’t setup a signal handler yet.

Wait a bit and try lxc stop again. If that still doesn’t work, refer to the logs inside your container to see why it’s not shutting down.

Nick_Knutov · August 17, 2020, 9:51am

Thanks,

ostemplate="ubuntu/20.04"
ct="c11"
time lxc launch images:${ostemplate} ${ct}
sleep 1 # solution
lxc stop ${ct}

works.

Nick_Knutov · August 17, 2020, 3:15pm

It looks like a bigger problem than I think. Possible it is all because go is very asynchronous.

I have my own orchestration around lxd using (if simple) bash scripts with multiple calls lxc ....
Now I have to do sleep 1 between almost all of them, so it looks like most lxc ... commands do some call to the lxd api and do exit before action is actually executed.

As an example - if I have two containers with same public ipv4 and routed network I can not sometimes do lxc stop ct1 ; lxc start ct2 - it causes error message like:

Error: Common start logic: Failed to start device “eth0”: Failed to run: ip -4 route add 1.2.3.4/32 dev veth0c2 proto boot: RTNETLINK answers: File exists

and any sleep does not helps here.

Is it possible to make all calls lxc ... synchronous or any other way to solve the problem?