Btrfs/Defrag

From Forza's ramblings

Defragment[edit | edit source]

The copy-on-write nature of Btrfs can lead to more fragmentation in files compared to ext4. This is because for each new write, Btrfs has to write the data to a new location. Especially on spinning HDDs, this can lead to extra seeks and slowness.

btrfs filesystem defragment is used to defragment files, but it is also used to apply compression to them.

btrfs fi defrag --help[edit | edit source]

# btrfs filesystem defragment --help
usage: btrfs filesystem defragment [options] <file>|<dir> [<file>|<dir>...]

    Defragment a file or a directory

    -r                  defragment files recursively
    -c[zlib,lzo,zstd]   compress the file while defragmenting
    -f                  flush data to disk immediately after defragmenting
    -s start            defragment only from byte onward
    -l len              defragment only up to len bytes
    -t size             target extent size hint (default: 32M)
    -v                  deprecated, alias for global -v option
    
    Global options:
    -v|--verbose       increase output verbosity
    
    Warning: most Linux kernels will break up the ref-links of COW data
    (e.g., files copied with 'cp --reflink', snapshots) which may cause
    considerable increase of space usage. See btrfs-filesystem(8) for
    more information.

The target extent size is an important option. Btrfs tries to merge smaller extents into extents of this size. It is only an advisory number and Btrfs may not be able to reach this target, even if you run defragment multiple times. Free space fragmentation and other factors affect this. Reasonable values depend highly on your use-case, but could be in the range of a few hundred KiB up to a maximum of 4GiB.

On files that are mostly read from, such as media files, it might be worth using a 4GiB target extent size. On files that are frequently written to, such as databases or virtual machine images, a smaller target extent size is probably better. This is because extents are immutable (cannot be changed). If you have a 1MiB file that consists of one extent, and then write another 512KiB to it, you end up with two extents: 1MiB+512KiB. The first extent is only freed once all of its data has been rewritten into new extents. You can see this in effect on the qBit_root.img VM image in the example further down. It is using 5.5GiB of disk space, while only having 4.1GiB of actual data.
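The overwrite example above can be sketched with some shell arithmetic. This is only an illustration of the accounting, not a real Btrfs query:

```shell
# A 1MiB file written as a single extent, then 512KiB of it overwritten.
# Extents are immutable, so the original extent stays fully allocated
# until every block that references it has been rewritten.
old_extent=$((1024 * 1024))        # original 1MiB extent, still pinned on disk
new_extent=$((512 * 1024))         # new 512KiB extent holding the overwrite
disk_usage=$(( (old_extent + new_extent) / 1024 ))
file_size=$(( old_extent / 1024 )) # logical file size is unchanged
echo "disk usage: ${disk_usage} KiB, file size: ${file_size} KiB"
```

The file still reads as 1MiB, but occupies 1.5MiB on disk until the old extent is fully superseded.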

Defragmenting files[edit | edit source]

Compsize (also called btrfs-compsize) is a great tool to see how fragmented files are. The standard e2fsprogs tool filefrag is not always accurate on Btrfs.

Here I have a virtual machine disk image that is quite fragmented, so let's defragment it.

/media/vm/libvirt/images # compsize  qBit_root.img
Processed 1 file, 34524 regular extents (42507 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       99%      5.5G         5.5G         4.1G
none       100%      5.5G         5.5G         4.1G
zstd         3%       24K         768K         768K
/media/vm/libvirt/images # btrfs fi defrag -v -t4G qBit_root.img
WARNING: target extent size 4294967296 too big, trimmed to 4294967295
qBit_root.img
/media/vm/libvirt/images # compsize qBit_root.img
Processed 1 file, 616 regular extents (1447 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL      100%      4.3G         4.3G         4.1G       
none       100%      4.3G         4.3G         4.1G

We see how the disk usage changed from 5.5GiB to 4.3GiB and that the number of fragments was reduced from 34524 extents to only 616.
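A rough back-of-the-envelope check with the compsize numbers above (converted to KiB, so the divisions are approximate) shows how much the average extent size grew:

```shell
# Average extent size before and after defragmentation, in KiB.
# 5.5GiB = 5767168 KiB across 34524 extents; 4.3GiB ≈ 4508877 KiB across 616.
before=$((5767168 / 34524))
after=$((4508877 / 616))
echo "average extent: ${before} KiB before, ${after} KiB after"
```

The average extent goes from roughly 167KiB to roughly 7MiB, which is why sequential reads of the image get noticeably cheaper on a spinning disk.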

Compressing files[edit | edit source]

It is possible to use lzo, zlib or zstd compression when defragmenting files. There is more information on how to enable compression at Btrfs/Compression. If the filesystem is mounted with -o compress or -o compress-force, defragmenting files will also re-compress them.

/media/vm/libvirt/images # btrfs fi defrag -v -czstd qBit_root.img
qBit_root.img
/media/vm/libvirt/images # compsize  qBit_root.img
Processed 1 file, 9460 regular extents (9471 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       81%      3.3G         4.1G         4.1G       
none       100%      3.0G         3.0G         3.0G       
zstd        29%      317M         1.0G         1.0G     

Here we saved about 1GiB of disk space using compression. The number of fragments increases because the maximum extent size with compression is 128KiB. This limitation exists because Btrfs has to decompress a whole extent in order to read a single block of data from it. Larger extents would mean excessive overhead.
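The 128KiB cap alone accounts for most of the extent count. Assuming the cap applies to the 1.0GiB of (uncompressed) data that ended up zstd-compressed in the output above, the minimum possible number of extents for that data is:

```shell
# Compressed extents cover at most 128KiB of data each, so 1.0GiB of
# compressed file data needs at least 8192 extents, before counting
# the extents holding the incompressible 3.0GiB.
data_kib=$((1024 * 1024))   # 1.0GiB expressed in KiB
max_extent_kib=128
min_extents=$((data_kib / max_extent_kib))
echo "${min_extents}"
```

The observed 9460 extents are close to this floor, so the fragmentation here is inherent to compression rather than a sign that defragmentation failed.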

Tip! -c uses default compression level 3, unless a different level is set using the compress or compress-force mount options

Autodefrag[edit | edit source]

Btrfs has a mount option that enables automatic defragmenting of files. When it is enabled, small random writes into files are detected and queued up for defragmentation by a background process. The benefit and performance impact of autodefrag depend highly on the workload. One use case driving the development of the autodefrag feature was virtual machine images, as they can become very fragmented.

You can enable autodefrag with mount -o autodefrag or as an option in /etc/fstab.

File: /etc/fstab
UUID=fe0a1142-51ab-4181-b635-adbf9f4ea6e6    /media/vm    btrfs    noatime,autodefrag,subvol=volume/vm    0 0

Autodefrag can also be enabled and disabled on-the-fly on existing mountpoints using mount -o remount,autodefrag /mnt/btrfs or mount -o remount,noautodefrag /mnt/btrfs.

Defragmenting the subvolume and extent trees[edit | edit source]

A lesser known feature of btrfs filesystem defrag is that it can also defragment the subvolume tree as well as the extent tree. These hold the metadata that references where extents are stored on disk. Defrag groups these metadata blocks together, which speeds up access on subvolumes and directories with lots of files or extents. This can be especially beneficial on spinning HDDs as it reduces the number and distance of seeks.

To defragment these metadata trees, you simply point btrfs to the subvolume root.

# btrfs fi defrag -v /media/vm
WARNING: directory specified but recursive mode not requested: /media/vm
WARNING: a directory passed to the defrag ioctl will not process the files
recursively but will defragment the subvolume tree and the extent tree.
If this is not intended, please use option -r .
/media/vm 

You can recursively defragment all subvolumes using find. Read-only snapshots cannot be defragmented. Every subvolume root has inode number 256, so we use -inum 256 together with -type d to limit the search to directories, and pass the results to btrfs using -exec:

# find /mnt/btrfs/ -type d -inum 256 -exec btrfs fi defrag {} \;

Defragmenting free space[edit | edit source]

While btrfs filesystem defrag compacts files into larger contiguous extents, btrfs balance is used to compact free space into larger contiguous areas. This can improve write performance, especially for large writes on spinning hard drives.

For normal usage, you should only ever balance data chunks, not metadata chunks. Use the -dusage= option to limit how full, in percent, data chunks should be in order to be considered for compacting.

# btrfs balance start -dusage=20 /
Done, had to relocate 1 out of 104 chunks

Regular balancing is important to keep your filesystem healthy and to avoid ENOSPC (no space left on device) errors caused by a lack of unallocated space. See the Btrfs/Balance page for more information about the Btrfs allocator and how balancing works.
