Btrfs/Checksum Algorithms

From Forza's ramblings

Btrfs computes checksums for all data and metadata stored on disk. This allows Btrfs to detect corruption when it happens. If a redundant profile such as DUP or RAID1 is used, Btrfs can automatically repair the error from a good copy.

The default is crc32c, which is very fast. One problem with small checksums (short digest sizes) is that the likelihood of a collision is comparatively high. A collision means that two different sets of data compute to the same checksum. Btrfs offers several more modern checksums that provide stronger protection against collisions or malicious alterations.
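To put rough numbers on the effect of digest size, here is a minimal sketch. It assumes an idealised hash whose output is uniformly distributed, which is a simplification (a real CRC behaves differently for some error patterns). The function names are made up for this illustration.

```python
import math


def undetected_probability(bits: int) -> float:
    """Chance that a randomly corrupted block still matches its stored
    checksum, assuming an ideal hash with uniformly distributed output."""
    return 2.0 ** -bits


def birthday_collision_probability(blocks: int, bits: int) -> float:
    """Approximate chance that at least two of `blocks` different blocks
    share a checksum (birthday bound, ideal hash assumed)."""
    exponent = -blocks * (blocks - 1) / 2.0 ** (bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    for bits in (32, 64, 256):
        print(
            f"{bits:3d} bits: "
            f"undetected={undetected_probability(bits):.3e} "
            f"any-collision(1M blocks)={birthday_collision_probability(1_000_000, bits):.3e}"
        )
```

With a 32-bit digest, roughly 77,000 different blocks are enough for an even chance that two of them share a checksum. With 64 bits or more the same chance is negligible at that scale.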

Algorithm  Digest size  Description
CRC32C     32 bits      fast, weak, default, CPU support, backward compatible
XXHASH     64 bits      fast, weak, good on modern CPUs
SHA256     256 bits     slow, strong, FIPS certified, CPU support
BLAKE2B    256 bits     slow, strong, good on 64-bit architectures
Only crc32c is supported before Linux kernel 5.5.
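One way to check what the running kernel offers is to read the list that Btrfs exports through sysfs. The sketch below assumes the file /sys/fs/btrfs/features/supported_checksums, which should exist on kernel 5.5 or later with the btrfs module loaded. The function name is hypothetical, and an empty list is returned when the file is missing.

```python
from pathlib import Path

# Assumed sysfs location; only present with the btrfs module loaded on
# kernel 5.5 or later.
SUPPORTED_CHECKSUMS = "/sys/fs/btrfs/features/supported_checksums"


def supported_checksums(path: str = SUPPORTED_CHECKSUMS) -> list[str]:
    """Return the checksum algorithm names listed by the kernel,
    or an empty list if the sysfs file cannot be read."""
    try:
        # The names are whitespace-separated; split() also drops the newline.
        return Path(path).read_text().split()
    except OSError:
        return []


if __name__ == "__main__":
    names = supported_checksums()
    print(names if names else "no list found (older kernel or btrfs not loaded?)")
```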

mkfs.btrfs

You have to decide what checksum to use when you create the filesystem.

mkfs.btrfs --csum xxhash <device>

See the mkfs.btrfs man page for detailed usage.
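After the filesystem has been created and mounted, the algorithm in use can be read back from sysfs. This is a minimal sketch that assumes a file /sys/fs/btrfs/<UUID>/checksum whose content starts with the algorithm name, followed by the driver in parentheses. The function name and the UUID in the usage example are placeholders.

```python
from pathlib import Path
from typing import Optional


def filesystem_checksum(uuid: str, sysfs_root: str = "/sys/fs/btrfs") -> Optional[str]:
    """Return the checksum algorithm of the mounted filesystem `uuid`,
    or None if the sysfs entry cannot be read."""
    try:
        text = (Path(sysfs_root) / uuid / "checksum").read_text()
    except OSError:
        return None
    # Assumed format: "<algorithm> (<driver>)"; keep only the first word.
    fields = text.split()
    return fields[0] if fields else None


if __name__ == "__main__":
    # Placeholder UUID; substitute the UUID of a mounted Btrfs filesystem.
    print(filesystem_checksum("00000000-0000-0000-0000-000000000000"))
```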

SMHasher

Generally, stronger algorithms are slower than weak ones. You can use SMHasher to benchmark the algorithms on your computer. For example, on AMD Ryzen CPUs, xxhash64 is faster than crc32c.

Example on an AMD Athlon 3000G (2 cores, 4 threads, 3 GHz):

# ./SMHasher --test=Speed <hash>
xxhash64:     11397 MiB/sec
crc32_pclmul:  6661 MiB/sec

Note that in practice the difference is generally smaller because of how checksumming integrates with the rest of the filesystem. You should benchmark your specific workload to determine the impact. Some CPUs and systems have hardware acceleration support.
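If SMHasher is not at hand, a very rough user-space comparison can be sketched with Python's standard library. The limits are significant: these are not the kernel implementations, zlib.crc32 is plain CRC-32 rather than crc32c, and xxhash is left out because the standard library does not provide it. The function names, block size and round count are arbitrary choices for this sketch, and the results only hint at relative cost on a given machine.

```python
import hashlib
import os
import time
import zlib

# Stand-ins from the standard library, not the kernel's implementations.
CANDIDATES = {
    "crc32 (zlib, not crc32c)": zlib.crc32,
    "sha256": lambda data: hashlib.sha256(data).digest(),
    "blake2b-256": lambda data: hashlib.blake2b(data, digest_size=32).digest(),
}


def throughput_mib_per_s(func, data: bytes, rounds: int) -> float:
    """Hash `data` `rounds` times and return the throughput in MiB/s."""
    start = time.perf_counter()
    for _ in range(rounds):
        func(data)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return len(data) * rounds / (1024 * 1024) / elapsed


def run_benchmark(block_size: int = 1024 * 1024, rounds: int = 50) -> dict:
    """Time every candidate on the same random buffer."""
    data = os.urandom(block_size)
    return {name: throughput_mib_per_s(func, data, rounds)
            for name, func in CANDIDATES.items()}


if __name__ == "__main__":
    for name, speed in run_benchmark().items():
        print(f"{name:26s} {speed:10.0f} MiB/sec")
```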