Dealing with time drift after VM resume

Hello there!

I’m starting to play with VM pausing and/or suspension in my homelab so I can stop my instances overnight on my server. I plan to mostly use the suspend feature (a.k.a. stateful stop), but I’m facing an issue with time drift after the VM resumes:

root@ck8s01:~# timedatectl
               Local time: Mon 2023-07-03 21:02:51 UTC
           Universal time: Mon 2023-07-03 21:02:51 UTC
                 RTC time: Mon 2023-07-03 21:03:42
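
For reference, here is roughly the suspend/resume cycle I’m doing (just a sketch, assuming the ck8s01 instance shown above):

# Stateful stop needs to be enabled on the instance
lxc config set ck8s01 migration.stateful=true

# Suspend the VM with its state saved to disk (stateful stop)
lxc stop ck8s01 --stateful

# Resume later; this is when the guest clock comes back lagging behind real time
lxc start ck8s01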

Is there any clean solution to deal with this issue with LXD VMs? I would like to avoid having a cron job trigger hwclock to synchronize the local time with the RTC time if possible. Moreover, I’m using Chrony in my VM instances; I don’t know if it’s related (a misconfiguration?). Here is my Chrony configuration:

pool 0.fr.pool.ntp.org iburst
pool 1.fr.pool.ntp.org iburst
pool 2.fr.pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
keyfile /etc/chrony.keys
logdir /var/log/chrony

I’m using Ubuntu 22.04.2 with the 5.15 virtual kernel.

Thanks!

I dug a bit further and found that I probably need to tweak my chrony configuration:
Is chronyd allowed to step the system clock?
Bug 1780165 - VM clocks do not resync quickly after being suspended

By default, chronyd adjusts the clock gradually by slowing it down or speeding it up. If the clock is too far from the true time, it will take a long time to correct the error. The System time value printed by the chronyc's tracking command is the remaining correction that needs to be applied to the system clock.

The makestep directive can be used to allow chronyd to step the clock. For example, if chrony.conf had

makestep 1 3

the clock would be stepped in the first three updates if its offset was larger than one second. Normally, it is recommended to allow the step only in the first few updates, but in some cases (e.g. a computer without an RTC or virtual machine which can be suspended and resumed with an incorrect time) it might be necessary to allow the step on any clock update. The example above would change to

makestep 1 -1

I will update my configuration this evening and check if it helps solve the issue (but be careful with applications like Kubernetes when making a direct clock change like this) :slight_smile:
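
For reference, here is what I intend to try (the same configuration as above, with only the makestep line changed so that steps are allowed on any update):

pool 0.fr.pool.ntp.org iburst
pool 1.fr.pool.ntp.org iburst
pool 2.fr.pool.ntp.org iburst
driftfile /var/lib/chrony/drift
# Allow chronyd to step the clock on any update when the offset is larger than 1 second
makestep 1 -1
rtcsync
keyfile /etc/chrony.keys
logdir /var/log/chrony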

FYI, running an NTP daemon in a VM isn’t strictly needed if your host’s time is good. By default, VMs will use kvm-clock as their clocksource:

root@v1:~# dmesg | grep kvm-clock
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: cpu 0, msr e401001, primary cpu clock
[    0.000002] kvm-clock: using sched offset of 5160329154 cycles
[    0.000007] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.393160] clocksource: Switched to clocksource kvm-clock
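
You can also check which clocksource is active directly through sysfs instead of grepping dmesg:

# Should print "kvm-clock" in a typical KVM guest
cat /sys/devices/system/clocksource/clocksource0/current_clocksource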

For more info on that topic:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/chap-kvm_guest_timing_management

Yep, I’m aware of kvm-clock, but I’ve heard multiple stories of clock sync issues with virtual machines, especially mostly idle ones spread across multiple hypervisors (due to how kvm-clock does its sync). I’ve used chrony like this for years without issues, but I hadn’t played much with migration or stateful stop until now.

I quickly checked my server, and the clock drift was eventually corrected, but only around 10 minutes later.

I continued to play a bit with chrony and its settings, but the clock still takes a while to synchronize (more or less 5 minutes). I even removed chrony and checked what happens with only the kvm-clock source, and the VM doesn’t sync the local time at all (only the RTC time is synced):

root@ck8s01:~# timedatectl
               Local time: Tue 2023-07-04 18:25:01 UTC
           Universal time: Tue 2023-07-04 18:25:01 UTC
                 RTC time: Tue 2023-07-04 18:27:42
                Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: no
              NTP service: n/a
          RTC in local TZ: no
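
If I put chrony back, one thing I could still try right after a resume is forcing an immediate step instead of waiting for the gradual correction (just a sketch, assuming chrony is reinstalled in the guest):

# Show the remaining correction chronyd still has to apply
chronyc tracking

# Step the clock immediately instead of slewing it
chronyc makestep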

So in the end, I think it’s a common problem and not much can be done at this level; I think I will just shut down the VMs to avoid any issues.