btrfs corruption

I recently experienced my first file corruption on btrfs. This was on Kubuntu 24.04 with the 6.8.0-39-generic kernel.

First off, I realize what I’m about to describe is a bit stupid.

For local virtual machines, I mostly use the qcow2 image format. This is QEMU’s copy-on-write file format. Btrfs is also a copy-on-write file system. This is where the stupid part comes in. Why do copy-on-write on top of copy-on-write? ¯_(ツ)_/¯ Laziness? Habit? Ignorance?

I often use the -snapshot option in QEMU to make virtual machines immutable. Every so often, I will update the VM, fill the free space on its drive with zeros, then use qemu-img to shrink the qcow2.
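A rough sketch of that routine follows. It is not necessarily the exact set of commands used here: it assumes "shrink" means rewriting the image with qemu-img convert, which leaves zeroed regions unallocated in the output. The helper name shrink_qcow2, the QEMU_IMG variable, and all paths are made up for illustration.

```shell
#!/bin/sh
# Sketch only. Inside the guest, free space is typically zeroed first, e.g.:
#   dd if=/dev/zero of=/zero.fill bs=1M; sync; rm /zero.fill
# (dd stops with "No space left on device" once the disk is full; that is expected.)
# Then, with the VM shut down, the image is rewritten on the host.

# QEMU_IMG is a hypothetical override so the helper can be dry-run.
QEMU_IMG="${QEMU_IMG:-qemu-img}"

# shrink_qcow2 SRC: write a compacted copy next to SRC and print its path.
shrink_qcow2() {
    src="$1"
    dst="${src%.qcow2}.shrunk.qcow2"
    # convert reads SRC and writes a fresh qcow2; zeroed clusters are not
    # allocated in the new file, which is where the space saving comes from.
    "$QEMU_IMG" convert -O qcow2 "$src" "$dst" || return 1
    printf '%s\n' "$dst"
}

# Example (hypothetical path):
#   new=$(shrink_qcow2 /home/user/VMs/vm.qcow2) && mv "$new" /home/user/VMs/vm.qcow2
# Booting immutably afterwards (writes go to a temporary overlay):
#   qemu-system-x86_64 -snapshot -hda /home/user/VMs/vm.qcow2
```

Writing to a new file and renaming it afterwards means a failed convert leaves the original image untouched.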

The last time I did this, qemu-img failed with an error. Unfortunately, I didn’t capture the exact message. But it was something helpful like “error while writing at byte 1234”. I was also unable to reproduce this on another machine at the time of this writing.

The kernel log showed messages like:

[ 1234.123456] BTRFS warning (device dm-0): checksum error at logical 123456789012 on

I used btrfs scrub start / to scrub the file system, which eventually came back and told me that two files were corrupt: my qcow2 image and an SQLite database used by Firefox.
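If you want to map a kernel message like that back to a file yourself, something along these lines may help. It is a sketch under assumptions: the helper name logical_addrs is made up, it only matches the "checksum error at logical N" wording shown above, and it relies on btrfs inspect-internal logical-resolve being available in your btrfs-progs.

```shell
#!/bin/sh
# logical_addrs: read kernel log text on stdin and print the unique logical
# addresses found in "checksum error at logical N" lines.
logical_addrs() {
    sed -n 's/.*checksum error at logical \([0-9][0-9]*\).*/\1/p' | sort -u
}

# Example usage (as root; "/" is the mounted btrfs file system):
#   dmesg | logical_addrs | while read -r addr; do
#       btrfs inspect-internal logical-resolve "$addr" /
#   done
#
# Running the scrub in the foreground, so it prints a summary when done:
#   btrfs scrub start -B /
```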

I deleted the two files, copied the qcow2 from a backup, re-scrubbed the file system, and also ran a SMART test on the drive to be safe (smartctl -t short /dev/nvme0n1). With everything clean and happy, I restarted my VM and re-ran my updates.

When I tried to shrink the image, qemu-img gave me the same error.

After searching online for a bit, this Stack Exchange question pointed me somewhere helpful:

https://unix.stackexchange.com/questions/394973/why-would-i-want-to-disable-copy-on-write-while-creating-qemu-images

I ended up taking the chattr approach, which worked for me.

I turned off copy-on-write, recursively, for my directory of QCOW2 images:

# chattr -R +C /home/user/VMs/

You can use lsattr to verify that the “C” attribute is set.
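A small check like the following can confirm the flag from a script. It is a sketch: has_nocow_flag is a made-up helper that only inspects the flags column of a single lsattr output line. One caveat from the chattr man page is worth knowing: on btrfs, +C is only reliable for new or empty files, so an image that already held data before the flag was set may need to be copied fresh into the flagged directory.

```shell
#!/bin/sh
# has_nocow_flag LINE: succeed if the flags column of one lsattr output line
# ("<flags> <path>") contains C, the No_COW attribute.
has_nocow_flag() {
    flags="${1%% *}"    # keep only the first field; the path is ignored
    case "$flags" in
        *C*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example usage:
#   has_nocow_flag "$(lsattr /home/user/VMs/vm.qcow2)" && echo "NOCOW is set"
#
# Re-creating an existing image so it inherits +C from its directory
# (hypothetical file names; run with the VM shut down):
#   cp --reflink=never vm.qcow2 vm.qcow2.new && mv vm.qcow2.new vm.qcow2
```

Looking only at the first field matters because a path can contain a capital C of its own.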

Since then, no more corruption issues with the qcow2 file.

Circling back to the stupid part…

  • Yes, I know that I could use snapshots in btrfs against a raw image to achieve the same effect.
  • Yes, I know I could also disable copy-on-write in the qcow2 file itself (rather than the whole folder).
  • Yes, I know that there’s probably a (significant) performance hit to using qcow2 on top of btrfs.
  • Yes, I know btrfs has other/better ways to handle virtual machine volumes.
  • Do I care? Not especially.
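For the curious, two of those alternatives look roughly like this. Treat it as a sketch with assumptions: the helper name, the QEMU_IMG override, the paths, and the size are invented for illustration, and it relies on the nocow creation option that qemu-img documents for btrfs.

```shell
#!/bin/sh
# QEMU_IMG is a hypothetical override so the helper can be dry-run.
QEMU_IMG="${QEMU_IMG:-qemu-img}"

# create_nocow_qcow2 PATH SIZE: create a new qcow2 with btrfs copy-on-write
# turned off for that one file (only meaningful at creation time, on btrfs).
create_nocow_qcow2() {
    "$QEMU_IMG" create -f qcow2 -o nocow=on "$1" "$2"
}

# Example: create_nocow_qcow2 /home/user/VMs/new.qcow2 20G
#
# The raw-image-plus-btrfs-snapshot route, assuming /home/user/VMs/base is a
# subvolume that holds a raw image named disk.raw:
#   btrfs subvolume snapshot /home/user/VMs/base /home/user/VMs/scratch
#   qemu-system-x86_64 -drive file=/home/user/VMs/scratch/disk.raw,format=raw
#   btrfs subvolume delete /home/user/VMs/scratch   # throw the changes away
```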

QCOW2s allow for portability between systems. They’re also plain files, so they back up and move around without any shenanigans.

So, don’t COW your COW. Or if you do, and you run into errors, remove one COW from the equation.