Stateful start/stop

I notice that incus start has a flag

      --stateless             Ignore the instance state

and incus stop has a flag

      --stateful              Store the instance state

However, I can’t find any documentation for what those flags actually mean. The incus manual just duplicates the above flag definitions.

As far as I can tell from the source code, it looks like a stop --stateful will store some additional state (what kind of state? is it like CRIU?) and start will by default restore that state unless told not to:

                        if shutdownAction == "stateful-stop" {
                                // Attempt to restore state.
                                err = inst.Start(true)
                        } else {
                                // Normal startup.
                                err = inst.Start(false)
                        }

Therefore, this suggests to me that it’s like a suspend/resume operation, i.e. stop --stateful doesn’t shut down the container but freezes its state. Is that correct?

Does it work for containers and VMs?

Thanks,

Brian.

That’s CRIU or QEMU runtime state, yeah.
The CRIU side of things almost never works so can be mostly ignored :)

On the VM side of things, if your VM has migration.stateful=true set and you have a sufficiently large size.state set on the root device, you can then do incus stop --stateful foo which will not really stop the VM so much as do an on-disk hibernate of it.

Then incus start will restore it the way it was, unless you pass incus start --stateless in which case that state will be ignored and the VM will go through a normal full boot.
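For anyone who wants to try this, the sequence described above can be sketched as a few shell helpers. This is only a sketch under assumptions: the function names and the "cold" keyword are made up for illustration, the root disk is assumed to be inherited from a profile (hence device override), and the size you need for size.state depends on the VM's memory.

```shell
# Sketch only: function names and arguments are placeholders, not part of incus.

# Prepare a VM for stateful stop.
# Assumption: the root disk comes from a profile, so "device override" is
# used; if the root device is defined on the instance itself,
# "incus config device set" would be the command to use instead.
# Assumption: this is run while the VM is stopped.
prepare_stateful() {
    vm="$1"
    state_size="$2"
    incus config set "$vm" migration.stateful=true &&
    incus config device override "$vm" root size.state="$state_size"
}

# Hibernate the VM to disk instead of shutting it down.
hibernate() {
    incus stop --stateful "$1"
}

# Resume from the saved state (the default), or pass "cold" as the second
# argument to ignore the saved state and do a normal full boot.
resume() {
    if [ "${2:-}" = "cold" ]; then
        incus start --stateless "$1"
    else
        incus start "$1"
    fi
}
```

With these defined, something like prepare_stateful foo 4GiB followed by hibernate foo and later resume foo would go through the whole cycle; resume foo cold skips the restore. The 4GiB value is just an example, not a recommendation.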

Is it possible to relax that requirement, or to introduce a separate config option for stateful shutdown (hibernate)?
Currently, with migration.stateful set to true, a lot of features become unusable, because the option is designed for migration to a server that may have different characteristics. A stateful shutdown, however, is performed on the local machine, so that concern does not apply.

I don’t see why a new config would be required; incus could allow incus stop --stateful even where migration.stateful is not set.

I suspect the reason this is not allowed is because of the possibility someone might try to restart a hibernated VM on a different cluster node with different CPU capabilities. IMO it would be reasonable to give an error in that case; the user can then either choose to restart the VM on a different node which is compatible, or to incus start --stateless to reboot from scratch.

Currently with migration.stateful set to true, then a lot of features will become unusable

It would be good if those were documented. All I can find is this:

Enabling this option prevents the use of some features that are incompatible with it.

Stateful shutdown at the QEMU level is still a live migration, just one to a local file.
Devices that cannot have their state saved and which prevent live migration will also prevent stateful stop.

What kinds of devices conflict with migration.stateful? (Until now, I thought it might be different CPUs with different feature flags.)

Off the top of my head, at least:

  • NVMe
  • VirtioFS
  • GPU
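A rough way to check an instance for two of the device kinds in this list is to grep its expanded config. This is a sketch, not an official check: blocking_devices is a made-up helper name, it only looks for GPU devices and disks with io.bus set to nvme, and it does not detect VirtioFS shares (as far as I know those appear as ordinary disk devices whose source is a host directory, which a plain grep cannot reliably pick out).

```shell
# Sketch only: blocking_devices is a hypothetical helper, not part of incus.
# Prints matching lines from the expanded instance config and returns 0 if
# any were found; returns non-zero when nothing matched.
# Only GPU devices ("type: gpu") and NVMe-attached disks ("io.bus: nvme")
# are covered; VirtioFS shares are NOT detected by this check.
blocking_devices() {
    incus config show "$1" --expanded |
        grep -E '^[[:space:]]+(type:[[:space:]]*gpu|io\.bus:[[:space:]]*nvme)[[:space:]]*$'
}
```

It could be used as a guard before a stateful stop, for example: if blocking_devices foo; then echo "stateful stop will probably fail"; fi. An empty result does not prove the stop will succeed, since other device types may also block it.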

I removed the limit in my own fork. The instance saves and restores fine. The only case where it fails is if I have accessed the VirtioFS share even once. I don’t use NVMe or GPU devices, and I can replace VirtioFS with a Samba share.
This is possibly not a very practical approach, but at least it works for me.