Btrfs data integrity
Perhaps the biggest advantage of Btrfs is its ability to detect data corruption. It does this by verifying a checksum each time a data or metadata block is read from disk. If the checksum does not match, the filesystem checks whether a good copy exists and repairs the block from it. If it cannot repair the block, an error is logged and the user is prevented from reading the file.
Storage media can degrade over time, eventually leading to something called bitrot: the data read from disk is no longer the data that was written. On traditional filesystems like ext4, NTFS and FAT32, this kind of damage would likely go unnoticed. Eventually the damage might affect an application's stability or cause visible corruption in image files.
Bitrot is the reason I switched from ext4/LVM to Btrfs, having found damage in several images in my photography archive.
It is worth noting that most HDD, SSD and NVMe based storage does have internal checksums, but due to buggy firmware implementations, in-transit corruption, dropped caches or filesystem bugs, bitrot is a very real problem even on modern hardware.
While Btrfs checks for corruption on every read, corruption in rarely read data can go unnoticed for a long time.
btrfs scrub reads all filesystem data and metadata and verifies the checksums. If the data or metadata is stored in a redundant profile such as DUP or RAID1, all copies are verified and any errors are repaired from a healthy copy. Uncorrectable errors are logged in the kernel log.
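Errors found by scrub (and by normal reads) are also counted in per-device statistics that persist across remounts. A minimal sketch of checking them in a POSIX shell; the mountpoint is the example used below, and the has_errors helper is hypothetical, not part of btrfs-progs:

```shell
#!/bin/sh
# Sketch: inspect the persistent per-device error counters kept by Btrfs.
# `btrfs device stats` prints lines like "[/dev/sdb2].corruption_errs   0".

has_errors() {
    # Succeed (exit 0) when any counter in the stats output is non-zero.
    awk '$2 != 0 { found = 1 } END { exit !found }'
}

# Guarded so the sketch is harmless on systems without btrfs-progs.
if command -v btrfs >/dev/null 2>&1; then
    if btrfs device stats /media/userData/ | has_errors; then
        echo "device errors recorded; check the kernel log"
    else
        echo "no device errors recorded"
    fi
fi
```

A non-zero corruption_errs counter is a strong hint that a scrub (and a look at your backups) is overdue.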
It is recommended to run scrubs regularly so that problems are detected as early as possible. The btrfsmaintenance scripts, available in most popular distributions, include systemd timers to run scrub on a regular basis. Depending on your backup scheme, running scrubs once a month should suffice.
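If you prefer not to pull in btrfsmaintenance, a monthly scrub can be wired up with a plain systemd timer. A minimal sketch under assumptions: the unit names and the mountpoint below are examples, not part of any package:

```ini
# /etc/systemd/system/btrfs-scrub-media.service  (hypothetical unit name)
[Unit]
Description=Btrfs scrub of /media/userData

[Service]
Type=oneshot
# -B keeps scrub in the foreground so systemd knows when it finishes.
ExecStart=/usr/bin/btrfs scrub start -B /media/userData

# /etc/systemd/system/btrfs-scrub-media.timer
[Unit]
Description=Monthly Btrfs scrub of /media/userData

[Timer]
OnCalendar=monthly
Persistent=true

[Install]
WantedBy=timers.target
```

Enable it with systemctl enable --now btrfs-scrub-media.timer; Persistent=true makes systemd catch up on a run that was missed while the machine was off.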
Scrub is not a filesystem checker (fsck) and does not verify or repair structural damage in the filesystem. It only checks checksums of data and metadata blocks; it does not ensure that the filesystem trees are consistent. There is some validation done by the kernel when reading data, but it is not extensive and does not replace a full filesystem check.
Running Btrfs scrub
Scrub should be run on a mounted filesystem. It runs in the background and uses idle I/O priority class so it should not severely affect normal usage.
Start scrub with
btrfs scrub start <mountpoint>.
# btrfs scrub start /media/userData/
scrub started on /media/userData/, fsid fe0a1142-51ab-4181-b635-adbf9f4ea6e6 (pid=38087)
The progress of scrub can be viewed with
btrfs scrub status <mountpoint>.
# btrfs scrub status /media/userData/
UUID:             fe0a1142-51ab-4181-b635-adbf9f4ea6e6
Scrub started:    Sat Feb 19 10:48:15 2022
Status:           running
Duration:         0:04:05
Time left:        15:48:21
ETA:              Sun Feb 20 02:40:43 2022
Total to scrub:   17.48TiB
Bytes scrubbed:   76.73GiB  (0.43%)
Rate:             320.69MiB/s
Error summary:    no errors found
Cancelling a running scrub is done with
btrfs scrub cancel <mountpoint>.
# btrfs scrub cancel /media/userData/
# btrfs scrub status /media/userData/
UUID:             fe0a1142-51ab-4181-b635-adbf9f4ea6e6
Scrub started:    Sat Feb 19 10:48:15 2022
Status:           aborted
Duration:         0:06:07
Total to scrub:   17.48TiB
Rate:             321.81MiB/s
Error summary:    no errors found
The btrfs scrub manual page with detailed options is available at https://btrfs.readthedocs.io/en/latest/btrfs-scrub.html
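Since scrub runs unattended, it helps to check its outcome from a script. A small sketch; the scrub_clean helper is hypothetical, not part of btrfs-progs, and simply scans the status output for the error summary line:

```shell
#!/bin/sh
# Sketch: report whether a scrub was clean, based on the "Error summary"
# line printed by `btrfs scrub status <mountpoint>`.

scrub_clean() {
    # A clean scrub reports "Error summary:    no errors found".
    printf '%s\n' "$1" | grep -q 'Error summary:.*no errors found'
}

# Demo with captured output; in practice use "$(btrfs scrub status /mnt)".
status='Error summary:    no errors found'
if scrub_clean "$status"; then
    echo "scrub clean"
else
    echo "scrub found errors"
fi
```

Such a check can be dropped into a timer unit or cron job to send mail only when a scrub actually found something.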
The striped redundancy profiles RAID5 and RAID6 present a challenge. Normally
btrfs scrub checks each device in parallel, but with striped profiles, scrub has to read from all members of the stripe to verify each data block. A parallel scrub therefore issues several concurrent reads to every device, which is really bad for performance.
Instead, a better solution is to run scrub separately on each disk.
1) First identify which disks belong to the filesystem
# btrfs fi sh /media/userData/
Label: '6TB'  uuid: fe0a1142-51ab-4181-b635-adbf9f4ea6e6
	Total devices 2 FS bytes used 17.45TiB
	devid    3 size 9.09TiB used 8.74TiB path /dev/sdb2
	devid    4 size 9.09TiB used 8.74TiB path /dev/sdd2
2) Then start scrub on only one of the devices using
btrfs scrub start <device>.
# btrfs scrub start /dev/sdb2
scrub started on /dev/sdb2, fsid fe0a1142-51ab-4181-b635-adbf9f4ea6e6 (pid=42675)
3) Monitor progress with
btrfs scrub status <mountpoint> and, when the scrub has finished, start a new scrub on the second device.
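The steps above can be sketched as a script. This is a sketch under assumptions: the mountpoint is the example from above, and the member_devices helper is hypothetical, parsing the "path /dev/..." column of btrfs filesystem show output:

```shell
#!/bin/sh
# Sketch: scrub the member devices of a striped (RAID5/6) filesystem
# one at a time instead of all in parallel.

member_devices() {
    # Extract the "path /dev/..." column from `btrfs filesystem show` output.
    awk '/ path /{ print $NF }'
}

MNT=/media/userData
# Guarded so the sketch is harmless on systems without btrfs-progs.
if command -v btrfs >/dev/null 2>&1; then
    for dev in $(btrfs filesystem show "$MNT" | member_devices); do
        echo "scrubbing $dev ..."
        # -B runs scrub in the foreground, so the next device only
        # starts after the current one has finished.
        btrfs scrub start -B "$dev"
    done
fi
```

The -B flag is what makes the loop sequential: without it, btrfs scrub start returns immediately and all devices would be scrubbed at once again.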