Btrfs csum warnings when using virtual machines

I'm not sure about this one, and it definitely needs more testing. It is barely LXD related, but it is easy to test…

On my side it seems to correlate with running an LXD VM on a btrfs backend with the 5.4 kernel. I have an LXD VM running Windows 10, which runs MSSQL (does that even matter?). I noticed that I now get several btrfs csum warnings a day. I checked the SSDs, and at least SMART doesn't report anything; before the VM was there I didn't have such errors. After I moved the VM away to another host running Linux 5.3, the errors stopped, and there are no errors on that machine either.

This might be a kernel-specific issue, or possibly hardware. I'm using Ubuntu; this server has the latest kernel, the others have the slightly older 5.3, and now I kind of don't want to upgrade them :slight_smile:
Linux universe-linux 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

This is easy to test if someone has the same kernel, Ubuntu 18.04, and a write-intensive VM: just run dmesg | grep -i btrfs and look for funny stuff. Interestingly, btrfs scrub doesn't show anything.
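If you want something a bit more structured than grep, here is a rough Python sketch that counts csum warnings per device and inode. The regex is my own guess, modelled on the single warning line quoted further down in this post; other kernel versions may format the message differently, and the function name `count_csum_warnings` is just something I made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on one sample warning line; treat it as an assumption,
# not as the definitive kernel message format.
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<device>[^)]+)\): csum failed "
    r".*?ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<csum>\S+) expected csum (?P<expected>\S+)"
)


def count_csum_warnings(log_text):
    """Return a Counter mapping (device, inode) to the number of csum warnings."""
    hits = Counter()
    for line in log_text.splitlines():
        match = CSUM_RE.search(line)
        if match:
            hits[(match.group("device"), int(match.group("ino")))] += 1
    return hits


# Small self-contained demo using the warning line from this thread.
sample = (
    "BTRFS warning (device dm-1): csum failed ino 252784 off 62910578688 "
    "csum 802263983 expected csum 4110970844\n"
    "some unrelated kernel line\n"
)
for (device, ino), n in count_csum_warnings(sample).most_common():
    print(f"{device} inode {ino}: {n} csum warning(s)")
```

To run it against a real machine, pass in the text output of dmesg instead of `sample`. If all hits land on one inode, that inode is probably the VM disk image.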

I tried googling, and the closest thing I found is this:

Direct IO and CRCs

Direct IO writes to Btrfs files can result in checksum warnings. This can happen with other filesystems, but most don’t have checksums, so a mismatch between (updated) data and (out-of-date) checksum cannot arise.

This is the issue described in this email: “where the application will modify the page while it’s inflight” (see also this article on stable writes). This results in checksum verification messages that are warnings instead of errors, as in for example:

BTRFS warning (device dm-1): csum failed ino 252784 off 62910578688 csum 802263983 expected csum 4110970844

Details of affected versions TBD.
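To make the quoted explanation more concrete, here is a toy Python sketch of the ordering it describes: the checksum is taken from the buffer as submitted, the application then changes the buffer while the write is in flight, and a later read sees data that no longer matches the stored checksum. This is only an illustration under my own assumptions, not how the kernel implements it. `zlib.crc32` is a stand-in (Btrfs defaults to CRC-32C, which uses a different polynomial), and the buffer and variable names are hypothetical.

```python
import zlib


def checksum(block: bytes) -> int:
    # Stand-in checksum; Btrfs itself defaults to CRC-32C, not zlib's CRC-32.
    return zlib.crc32(block)


# Hypothetical 4 KiB buffer that a guest submits for a direct I/O write.
buf = bytearray(b"A" * 4096)

# 1. The checksum is computed over the buffer as it looks at submission time.
stored_csum = checksum(bytes(buf))

# 2. The application modifies the same buffer while the write is in flight.
buf[0:4] = b"BBBB"

# 3. The modified bytes are what actually reach the disk.
on_disk = bytes(buf)

# 4. A later read recomputes the checksum and compares it with the stored one.
mismatch = checksum(on_disk) != stored_csum
print("csum mismatch on read:", mismatch)
```

If this really is what happens with the VM, the data on disk is simply newer than the checksum, which would fit with SMART reporting nothing wrong on the SSDs.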

Okay, and now I get errors on the new machine that the VM is currently on, while on the old one there are no new errors. The old one runs the 5.3 kernel, so at least it's not a 5.4 kernel problem.