Btrfs/Balance

From Forza's ramblings

Balance

Btrfs Balance is both a regular maintenance tool and a tool for managing the filesystem. It is important to learn why and when to use Balance in order to keep the filesystem healthy.

Btrfs uses a two-stage allocator. The first stage allocates large regions of space known as chunks for specific types of data, then the second stage allocates blocks like a regular (old-fashioned) filesystem within these chunks. Btrfs combines chunks into three types of block groups:

Type         Description
DATA         Stores normal user file data
METADATA     Stores internal metadata. Small files can also be stored inline
SYSTEM       Stores the mapping between physical devices and the logical space representing the filesystem
UNALLOCATED  Any unallocated space
Only the type of data that the chunk is allocated for can be stored in that block group.

With some usage patterns, the ratio between the various chunk types becomes skewed. This in turn can lead to ENOSPC (no space left on device) errors if left unchecked. Btrfs balance is a tool to re-arrange the layout of chunks and free up unallocated disk space.

How to see actual disk usage (don't trust 'df')

In most cases the normal df tool is used to see the available disk space of a filesystem:

# df -h /
 Filesystem  Size  Used  Avail  Use% Mounted on
 /dev/sdb1   32G   2.2G  29G    8%   /

This is fine for most filesystems, but not for Btrfs because of its two-stage allocator. In order to see how the space is actually used, you need to use btrfs filesystem usage, which shows how each type of block group is allocated.

# btrfs fi us /
Overall:
    Device size:                  32.00GiB
    Device allocated:              4.52GiB
    Device unallocated:           27.48GiB
    Device missing:                  0.00B
    Used:                          2.17GiB
    Free (estimated):             28.08GiB      (min: 14.34GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:               16.03MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:2.01GiB, Used:1.41GiB (70.04%)
   /dev/sdb1       2.01GiB

Metadata,DUP: Size:1.25GiB, Used:392.84MiB (30.69%)
   /dev/sdb1       2.50GiB

System,DUP: Size:8.00MiB, Used:16.00KiB (0.20%)
   /dev/sdb1      16.00MiB

Unallocated:
   /dev/sdb1      27.48GiB

As you can see, only 27.48GiB is unallocated while df reports 29G available. This is because df also counts the free space inside the already allocated Data chunks: it reports the Free (estimated) value of 28.08GiB, which df -h rounds up to 29G. This does not take into account that we will most likely need further Metadata chunks as the filesystem fills up.
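The Free (estimated) and min figures in the sample output can be reproduced with a little arithmetic. The sketch below assumes that the unallocated space is divided by a profile ratio (1.00 for single Data, 2.00 for DUP Metadata) and that the free space inside the existing Data chunks is added on top. The function name estimate_free is made up for this illustration and is not part of btrfs-progs.

```shell
#!/bin/sh
# estimate_free UNALLOCATED DATA_SIZE DATA_USED RATIO   (sizes in GiB)
# Assumption: the Data ratio is 1.00, as in the sample output, so the free
# space inside Data chunks is taken as-is and only the unallocated space
# is divided by RATIO.
estimate_free() {
    awk -v unalloc="$1" -v size="$2" -v used="$3" -v ratio="$4" \
        'BEGIN { printf "%.2f\n", unalloc / ratio + (size - used) }'
}

# Values taken from the "btrfs fi us /" output above.
estimate_free 27.48 2.01 1.41 1.00   # Free (estimated): 28.08
estimate_free 27.48 2.01 1.41 2.00   # min: 14.34
```

The min value shows how much less usable space there is if the remaining unallocated space ends up in chunks with a ratio of 2.00.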

How much metadata is needed varies greatly depending on how you use the filesystem. Lots of snapshots, fragmentation and small files use more metadata space than a single large file.

Normally Btrfs manages the usage in Data and Metadata chunks without the need for user intervention. But sometimes you can still end up with so little unallocated space that Btrfs cannot allocate more Metadata chunks. This would force your filesystem into read-only mode due to an ENOSPC error. See https://wiki.tnonline.net/w/Btrfs/ENOSPC. To avoid this you can run 'btrfs balance' regularly to compact the allocated block groups and free up unallocated space.

It is a good idea to monitor your disk usage with btrfs filesystem usage (short form: btrfs fi us) and run balance as needed.
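Monitoring can be scripted. The following is a minimal sketch, assuming the -b (raw bytes) option of btrfs filesystem usage and the "Device unallocated:" line shown in the output above. The function names and the 5GiB default threshold are hypothetical choices for illustration, not recommendations from this page.

```shell
#!/bin/sh
# Read `btrfs filesystem usage -b` output on stdin and print the
# "Device unallocated" value in bytes.
unallocated_bytes() {
    awk '/Device unallocated:/ { print $3; exit }'
}

# low_on_unallocated BYTES [THRESHOLD_GIB]
# Returns success (0) when BYTES is below the threshold.
# The default of 5 GiB is an arbitrary example value.
low_on_unallocated() {
    awk -v b="$1" -v t="${2:-5}" \
        'BEGIN { exit !(b < t * 1024 * 1024 * 1024) }'
}

# Example (needs root and a mounted Btrfs filesystem):
#   bytes=$(btrfs filesystem usage -b / | unallocated_bytes)
#   if low_on_unallocated "$bytes" 5; then
#       echo "Unallocated space is low, consider running a balance"
#   fi
```

Parsing the raw byte value avoids having to deal with the MiB/GiB suffixes of the human-readable output.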

WARNING

WARNING: Do not run balance on Metadata chunks as this can increase the risk of ENOSPC errors. Only run Metadata balance when converting between RAID profiles or when changing the number of devices in the filesystem.

It is good to have plenty of free space inside Metadata chunks. The filesystem uses the metadata space in its normal operations. Without free metadata space, the filesystem can end up in ENOSPC, which can even prevent you from deleting files.

If you have no unallocated space available when the filesystem needs to allocate more metadata chunks, the filesystem will turn read-only and will require manual intervention to solve.
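To keep an eye on how full the Metadata chunks are, the per-type lines of btrfs filesystem usage can be filtered. This is a small sketch that assumes the line format shown in the sample output earlier (for example "Metadata,DUP: Size:1.25GiB, Used:392.84MiB (30.69%)"). The function name chunk_usage is hypothetical.

```shell
#!/bin/sh
# Read `btrfs filesystem usage` output on stdin and print one line per
# block group type: "<type>,<profile> <percent used>".
chunk_usage() {
    awk '/^(Data|Metadata|System),/ {
        pct = $NF                # last field, e.g. "(30.69%)"
        gsub(/[()]/, "", pct)    # strip the parentheses
        sub(/:$/, "", $1)        # strip the trailing colon from the type
        print $1, pct
    }'
}

# Example (needs root and a mounted Btrfs filesystem):
#   btrfs filesystem usage / | chunk_usage
```

For the sample output earlier on this page, the function would print "Metadata,DUP 30.69%" for the Metadata line, meaning there is still plenty of free space inside the allocated Metadata chunks.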

Btrfs balance

Usage

# btrfs balance start --help
usage: btrfs balance start [options] <path>

    Balance chunks across the devices

    Balance and/or convert (change allocation profile of) chunks that
    passed all filters in a comma-separated list of filters for a
    particular chunk type.  If filter list is not given balance all
    chunks of that type.  In case none of the -d, -m or -s options is
    given balance all chunks in a filesystem. This is potentially
    long operation and the user is warned before this start, with
    a delay to stop it.

    -d[filters]    act on data chunks
    -m[filters]    act on metadata chunks
    -s[filters]    act on system chunks (only under -f)
    -f             force a reduction of metadata integrity
    --full-balance do not print warning and do not delay start
    --background|--bg
                   run the balance as a background process
    --enqueue      wait if there's another exclusive operation running,
                   otherwise continue
    -v|--verbose   deprecated, alias for global -v option

    Global options:
    -v|--verbose       increase output verbosity
    -q|--quiet         print only errors

Full man page of btrfs-balance is available at https://btrfs.readthedocs.io/en/latest/btrfs-balance.html

Running Balance

Running btrfs balance start without any filters would re-write every Data and Metadata chunk on the disk. Usually, this is not what we want. Instead, use the usage filter to limit which chunks are balanced.

Using -dusage=5 we limit balance to compacting Data chunks that are less than 5% full. This is a good start, and we can increase it to 10-15% or more if needed. A small (less than 100GiB) filesystem may need a higher number.

# btrfs balance start -dusage=5 /
Done, had to relocate 1 out of 68 chunks

Before balance:

# btrfs fi us -T /
Overall:
    Device size:                 229.47GiB
    Device allocated:             74.06GiB
    Device unallocated:          155.41GiB
    Device missing:                  0.00B
    Used:                         57.10GiB
    Free (estimated):            162.65GiB      (min: 84.94GiB)
    Free (statfs, df):           162.65GiB
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              233.92MiB      (used: 0.00B)
    Multiple profiles:                  no
 
             Data     Metadata System
Id Path      single   DUP      DUP      Unallocated
-- --------- -------- -------- -------- -----------
 1 /dev/sda3 60.00GiB 14.00GiB 64.00MiB   159.41GiB
-- --------- -------- -------- -------- -----------
   Total     60.00GiB  7.00GiB 32.00MiB   159.41GiB
   Used      52.76GiB  2.17GiB 16.00KiB

After balance:

# btrfs fi us -T /
Overall:
    Device size:                 229.47GiB
    Device allocated:             73.06GiB
    Device unallocated:          156.41GiB
    Device missing:                  0.00B
    Used:                         57.01GiB
    Free (estimated):            162.72GiB      (min: 84.52GiB)
    Free (statfs, df):           162.72GiB
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              233.92MiB      (used: 0.00B)
    Multiple profiles:                  no
 
             Data     Metadata System
Id Path      single   DUP      DUP      Unallocated
-- --------- -------- -------- -------- -----------
 1 /dev/sda3 59.00GiB 14.00GiB 64.00MiB   160.41GiB
-- --------- -------- -------- -------- -----------
   Total     59.00GiB  7.00GiB 32.00MiB   160.41GiB
   Used      52.68GiB  2.16GiB 16.00KiB

We can see we freed up 1GiB of Unallocated disk space by compacting the Data chunks. We now have 59 Data chunks to hold 52.68GiB of data. Before we needed 60 Data chunks.
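The approach of starting at a low usage value and increasing it when needed can be sketched as a small loop. The function name balance_steps, the DRY_RUN switch and the default steps of 5, 10 and 15 are assumptions made for this example. Only the btrfs balance start -dusage=N command itself comes from the text above.

```shell
#!/bin/sh
# balance_steps MOUNTPOINT [USAGE...]
# Runs a Data balance once per usage threshold, in the order given.
# With DRY_RUN=1 the commands are only printed, not executed.
balance_steps() {
    mnt=$1
    shift
    # Default thresholds if none are given (example values).
    [ $# -gt 0 ] || set -- 5 10 15
    for pct in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "btrfs balance start -dusage=$pct $mnt"
        else
            btrfs balance start -dusage="$pct" "$mnt" || return 1
        fi
    done
}

# Example: show what would be run on / without touching the filesystem.
#   DRY_RUN=1 balance_steps /
```

Running the low thresholds first means the nearly empty chunks, which are cheapest to relocate, are handled before the fuller ones.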

Scheduling Balance

It may be a good idea to schedule a balance job once a week. You can use cron (as in the example below) or systemd timers to do the same.

Example crontab that runs balance at 3am every Sunday:

/etc/cron.d/btrfs-balance
# For details see man 5 crontab
# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  | 
# *  *  *  *  * user-name  command to be executed
  0  3  *  *  0 root       btrfs balance start -dusage=5 /mnt/some/mountpoint >/dev/null 2>&1
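For the systemd timer alternative, a rough equivalent of the cron job could look like the sketch below, which writes a oneshot service and a weekly timer (Sunday, 3am) into a directory of your choice. The unit names, the function name write_balance_units and the /usr/bin/btrfs path are assumptions. Check where the btrfs binary lives on your distribution before using it.

```shell
#!/bin/sh
# write_balance_units DIR MOUNTPOINT
# Writes btrfs-balance.service and btrfs-balance.timer into DIR.
write_balance_units() {
    dir=$1
    mnt=$2
    cat > "$dir/btrfs-balance.service" <<EOF
[Unit]
Description=Btrfs data balance on $mnt

[Service]
Type=oneshot
# Assumed binary path; adjust if btrfs is installed elsewhere.
ExecStart=/usr/bin/btrfs balance start -dusage=5 $mnt
EOF
    cat > "$dir/btrfs-balance.timer" <<EOF
[Unit]
Description=Weekly Btrfs data balance

[Timer]
OnCalendar=Sun *-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Example (as root):
#   write_balance_units /etc/systemd/system /mnt/some/mountpoint
#   systemctl daemon-reload
#   systemctl enable --now btrfs-balance.timer
```

Persistent=true makes systemd run a missed job after the next boot, which a plain cron entry does not do.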