btrfs corruption
I experienced my first file corruption using btrfs recently. This was using Kubuntu 24.04 and the 6.8.0-39-generic kernel.
First off, I realize what I’m about to describe is a bit stupid.
For local virtual machines, I mostly use the qcow2 image format. This is QEMU’s copy-on-write file format. Btrfs is also a copy-on-write file system. This is where the stupid part comes in. Why do copy-on-write on top of copy-on-write? ¯\_(ツ)_/¯ Laziness? Habit? Ignorance?
I often use the -snapshot option in QEMU to make virtual machines immutable. Every so often, I will update the VM, fill the drive with zeros, then use qemu-img to shrink the QCOW2.
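For reference, here is a minimal sketch of what that shrink step can look like. The post does not say which qemu-img subcommand was used, so this assumes the common approach of rewriting the image with qemu-img convert, which skips runs of zeros so the new file comes out smaller. The function name shrink_qcow2 and the QEMU_IMG override variable are hypothetical.

```shell
# Sketch only: shrink_qcow2 and QEMU_IMG are hypothetical names, not from the post.
# Assumes free space inside the guest was zero-filled first and the VM is shut down.
shrink_qcow2() {
    local src="$1"
    local tmp="${src}.shrunk"
    local qemu_img="${QEMU_IMG:-qemu-img}"

    # convert rewrites the image and skips runs of zeros along the way
    "$qemu_img" convert -O qcow2 "$src" "$tmp" || { rm -f -- "$tmp"; return 1; }

    # replace the original only after the rewrite succeeded
    mv -- "$tmp" "$src"
}

# Usage:
#   shrink_qcow2 /home/user/VMs/some-vm.qcow2
```

Writing to a temporary file first means a failed convert leaves the original image untouched.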
The last time I did this, qemu-img failed with an error. Unfortunately, I didn’t capture the exact message. But it was something helpful like “error while writing at byte 1234”. I was also unable to reproduce this on another machine at the time of this writing.
The kernel log showed messages like:
[ 1234.123456] BTRFS warning (device dm-0): checksum error at logical 123456789012 on
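To pull lines like that out of a noisy kernel log, a small filter is enough. This is a sketch: the function name btrfs_csum_errors is hypothetical, and the pattern is based only on the message shown above, so other btrfs messages may be worded differently.

```shell
# Sketch only: btrfs_csum_errors is a hypothetical helper name.
# Reads kernel log text on stdin and prints only btrfs checksum-error lines.
btrfs_csum_errors() {
    grep -E 'BTRFS (warning|error).*checksum error'
}

# Usage (reading the kernel log may require root):
#   dmesg | btrfs_csum_errors
#   journalctl -k | btrfs_csum_errors
```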
I used btrfs scrub start / to scrub the file system, which eventually came back and told me that two files were corrupt: my qcow2 and a SQLite database used by Firefox.
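A scrub started this way runs in the background, so the result has to be checked separately. The sketch below assumes you would rather wait for it to finish: the -B flag keeps the scrub in the foreground, and btrfs scrub status prints the summary afterwards. The function name scrub_and_report is hypothetical. The affected file paths usually show up in the kernel log rather than in the scrub summary itself.

```shell
# Sketch only: scrub_and_report is a hypothetical helper name.
# Typically needs root. Defaults to the root file system, as in the post.
scrub_and_report() {
    local mnt="${1:-/}"
    local rc=0

    # -B runs the scrub in the foreground instead of returning immediately
    btrfs scrub start -B "$mnt" || rc=$?

    # print the summary, including error counts
    btrfs scrub status "$mnt"

    # pass along the scrub's own exit status
    return "$rc"
}

# Usage:
#   scrub_and_report /
```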
I deleted the two files, copied the qcow2 from a backup, re-scrubbed the file system, and also ran a SMART test on the drive to be safe (smartctl -t short /dev/nvme0n1). With everything clean and happy, I restarted my VM and re-ran my updates.
When I tried to shrink the image, qemu-img gave me the same error.
After searching online for a bit, this article seemed to point me somewhere helpful:
I ended up taking the chattr approach, which worked for me.
I turned off copy-on-write, recursively, for my directory of QCOW2 images:
# chattr -R +C /home/user/VMs/
You can use lsattr to verify that the “C” (No_COW) attribute is set.
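With more than a handful of images, scanning lsattr output by eye gets tedious. The sketch below filters that output down to entries that are still missing the flag. The function name missing_nocow is hypothetical, and it assumes the usual lsattr layout of an attribute field followed by a path. One caveat from the chattr man page: on btrfs the flag is meant for new or empty files, and files created inside a flagged directory inherit it, so an image that already held data may need to be copied back into the directory to get the full effect.

```shell
# Sketch only: missing_nocow is a hypothetical helper name.
# Reads `lsattr` output on stdin and prints paths whose attribute field lacks "C".
# Blank lines and the "dir:" headers printed by `lsattr -R` are skipped.
missing_nocow() {
    awk '$1 ~ /^[-a-zA-Z]+$/ && NF >= 2 && $1 !~ /C/ {
        sub(/^[^ ]+ +/, "")   # drop the attribute field, keep the path
        print
    }'
}

# Usage:
#   lsattr -R /home/user/VMs/ | missing_nocow
```

No output means every listed entry carries the flag.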
Since then, no more corruption issues with the qcow2 file.
Circling back to the stupid part…
- Yes, I know that I could use snapshots in btrfs against a raw image to achieve the same effect.
- Yes, I know I could also disable copy-on-write in the qcow2 file itself (rather than the whole folder).
- Yes, I know that there’s probably a (significant) performance hit to using qcow2 on top of btrfs.
- Yes, I know btrfs has other/better ways to handle virtual machine volumes.
- Do I care? Not especially.
QCOW2s allow for portability between systems. They’re also plain files, so they back up and move around without any shenanigans.
So, don’t COW your COW. Or if you do, and you run into errors, remove one COW from the equation.