Btrfs/Mount Options

From Forza's ramblings

Mounting a Btrfs filesystem

The 26m Radio Telescope at Mount Pleasant Radio Observatory, Tasmania, Australia

Btrfs has several mount options that control how the filesystem behaves. Some are used in recovery situations and some are used for performance or quality tradeoffs.

Except for subvol and subvolid, all Btrfs mount options affect the whole filesystem rather than individual subvolume mounts. This means, for example, that if you specify compress=zstd on the first mount, all subsequent mounts of that filesystem, including mounts of other subvolumes, inherit the compression option.

Generic Linux VFS mount options like atime/noatime, auto/noauto, dev/nodev, and others, do apply per mount point. See the chapter about filesystem independent mount options in the mount man page.

The basic mount command is mount -o <options> <device> <path>

# mount -o compress=zstd,subvol=@homes /dev/sdc1 /home
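Because Btrfs-specific options are shared by the whole filesystem while VFS options are set per mount point, it can be useful to look at what the kernel reports for each mount. The following is a minimal sketch, assuming the mounts table is readable at /proc/self/mounts; the helper name list_btrfs_mounts is made up for this example.

```shell
# list_btrfs_mounts [FILE]
# Print "<mount point> <options>" for every Btrfs entry in a mounts table.
# FILE defaults to /proc/self/mounts. The function name is illustrative only.
list_btrfs_mounts() {
    awk '$3 == "btrfs" { print $2, $4 }' "${1:-/proc/self/mounts}"
}

# Example (on a system with mounted Btrfs filesystems):
#   list_btrfs_mounts
```

The same information is available from findmnt -t btrfs, which may be more convenient for interactive use.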

Btrfs mount options

The full documentation of the various Btrfs mount options is available in the official Btrfs administration docs.

Btrfs Mount Options
Option | Description | Default Value
acl, noacl | Enable/disable POSIX Access Control Lists (ACLs) | on
autodefrag, noautodefrag | Enable automatic file defragmentation. Small random writes into files are detected and queued up for defragmentation. | off
atime, strictatime, relatime, noatime | noatime is a useful default because atime updates increase metadata writes. Atime updates are especially costly performance-wise when you have many snapshots. | relatime
barrier, nobarrier | Ensure I/O write operations are stored permanently | on
commit=<seconds> | Set interval of periodic transaction commit | 30
compress, compress=<type[:level]>, compress-force, compress-force=<type[:level]> | Control file data compression. Available algorithms are zlib, lzo and zstd. | off
datacow, nodatacow | Enable/disable data copy-on-write for newly created files. nodatacow also implies nodatasum. | on
datasum, nodatasum | Enable/disable data checksumming for newly created files | on
device=<devicepath> | Specify path to a device scanned for Btrfs filesystem | -
discard, discard=sync, discard=async, nodiscard | Enable discarding of freed file blocks | async (since 6.2)
flushoncommit, noflushoncommit | Force data dirtied by write in prior transaction to commit | off
max_inline=<bytes> | Specify maximum space that can be inlined in metadata b-tree leaf | min(2048, page size)
metadata_ratio=<value> | Specify metadata chunk allocation frequency | 0 (internal logic)
skip_balance | Skip automatic resume of an interrupted balance operation | off
space_cache, space_cache=<version>, nospace_cache | Options to control the free space cache | Default depends on mkfs time, space_cache=v2 since btrfs-progs 5.15.
ssd, ssd_spread, nossd, nossd_spread | Options to control SSD allocation schemes | SSD autodetected
subvol=<path> | Mount subvolume from path rather than the toplevel subvolume | '/'
subvolid=<subvolid> | Mount subvolume specified by subvolid number | 5
NOTE: If both subvolid and subvol are specified, they must point at the same subvolume, otherwise the mount will fail.
thread_pool=<number> | Number of worker threads to start | Number of CPU threads +2, but not more than 8
user_subvol_rm_allowed | Allow subvolumes to be deleted by their respective owner | off
Recovery Btrfs Mount Options
Option | Description | Default Value
clear_cache | Force clearing and rebuilding of free space cache | -
degraded | Is used to mount a filesystem that has a missing device, within RAID profile constraints | off
nologreplay | Disable tree log replay on mount | off
norecovery | Skip data recovery at mount time | off
rescan_uuid_tree | Force check and rebuild procedure of the UUID tree | off
rescue=<option,option,...> | Modes allowing mount with damaged filesystem structures:
  • usebackuproot (since: 5.9, replaces standalone option usebackuproot)
  • nologreplay (since: 5.9, replaces standalone option nologreplay)
  • ignorebadroots, ibadroots (since: 5.11)
  • ignoredatacsums, idatacsums (since: 5.11)
  • all (since: 5.9)
treelog, notreelog | Enable tree logging used for fsync and O_SYNC writes | on
usebackuproot | Enable autorecovery attempts if a bad tree root is found | off
WARNING! These are advanced options and can cause further data loss unless used carefully. Ask for help before attempting recovery.
Debugging and Developer Btrfs Mount Options
Option | Description | Default Value
enospc_debug, noenospc_debug | Enable verbose output for some ENOSPC conditions | off
check_int, check_int_data, check_int_print_mask=<value> | Debugging options for integrity checking | off
fatal_errors=<action> | Action to take on encountering a fatal error | bug
fragment=<type> | Debugging helper to intentionally fragment block groups | off
WARNING! These are developer options that should not be used on regular filesystems.
Deprecated Btrfs Mount Options
Option | Description | Default Value
recovery | Enable data recovery at mount time | off (deprecated since kernel 4.5)
Removed Btrfs Mount Options
Option | Description | Default Value
inode_cache, noinode_cache | Enable/disable inode cache | off (removed since kernel 5.11)

fstab

/etc/fstab is the standard configuration file for mounting filesystems on Linux. It is used to define what filesystems get mounted where, and also any mount options that are needed. One exception is perhaps portable USB drives as many desktop environments handle those automatically.

The format for fstab is one line per mount point. Each line consists of the following six fields, in order:

# device-spec     mount-point     fs-type     options     dump pass
Column | Description
device-spec | Specifies the device to be mounted. Can be </dev/path>, UUID=<fs uuid>, or LABEL=<fs label>
mount-point | The directory where the device will be mounted.
fs-type | Specifies the file system type, for example btrfs, xfs or ext4.
options | Defines mount options. If no specific options are needed, 'defaults' can be used as a placeholder.
dump | Used by the dump program. Not used by Btrfs and should be 0.
pass | Specifies the order in which filesystem checks are performed during boot. Use 0 with Btrfs (see Btrfs/scrub).

0 = do not check
1 = check first during boot (intended for the root filesystem)
2 = check after the filesystems marked 1
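The six-field layout and the advice to use 0 in the pass field for Btrfs can be checked mechanically. This is a rough sketch rather than a full fstab validator: the function name check_btrfs_fstab is hypothetical, and it treats entries that leave out the trailing fields as a mismatch with the layout described above, even though mount itself may accept them.

```shell
# check_btrfs_fstab [FILE]
# Report Btrfs entries that do not have exactly six fields, or whose
# pass field (6th column) is not 0. Returns non-zero if any are found.
# FILE defaults to /etc/fstab. The function name is illustrative only.
check_btrfs_fstab() {
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        $3 != "btrfs"  { next }    # only look at Btrfs entries
        NF != 6 { printf "line %d: expected 6 fields, found %d\n", NR, NF; bad = 1; next }
        $6 != 0 { printf "line %d: pass is %s, expected 0\n", NR, $6; bad = 1 }
        END { exit bad }
    ' "${1:-/etc/fstab}"
}

# Example:
#   check_btrfs_fstab /etc/fstab && echo "Btrfs entries look fine"
```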

fstab example

This is fstab on one of my machines. The disk is an SSD so I use compress-force=zstd:2, as this reduces the amount of data written to the disk, which increases its life span.

Notice the use of the subvol mount option and how it is used to mount different subvolumes.

/etc/fstab
# root filesystem
UUID=446d32cb-a6da-45f0-9246-1483ad3420e0   /               btrfs   compress-force=zstd:2,noatime,subvol=volume/root            0 0

# /home subvol
UUID=446d32cb-a6da-45f0-9246-1483ad3420e0   /home           btrfs   compress-force=zstd:2,noatime,subvol=volume/home            0 0

# A subvol for /var/tmp that is not being backed up.
UUID=446d32cb-a6da-45f0-9246-1483ad3420e0   /var/tmp        btrfs   compress-force=zstd:2,noatime,noexec,subvol=volume/var_tmp  0 0

# Btrfs filesystem toplevel
UUID=446d32cb-a6da-45f0-9246-1483ad3420e0   /mnt/rootvol/   btrfs   compress-force=zstd:2,noatime,subvolid=5                    0 0

# Use a ramdisk for temp files
tmpfs                                       /tmp            tmpfs   rw,nosuid,noexec,nodev,size=4G,mode=1777                    0 0

/mnt/rootvol is where the filesystem top level volume is mounted. From here it is possible to access all snapshots and subvolumes directly, as shown in the tree list below.

The structure I use is a flat structure (all subvols are in one directory) that separates subvolumes and snapshots. It makes it easy to manage with backup software such as btrbk.

# tree -d -L 2 /mnt/rootvol/
/mnt/rootvol/
├── snapshots
│   ├── home.20240101T1801
│   ├── home.20240101T1901
...
│   ├── root.20240101T1801
│   ├── root.20240101T1901
│   ├── www.20240101T1801
│   └── www.20240101T1901
└── volume
    ├── boinc
    ├── home
    ├── home_root
    ├── jabber
    ├── mail
    ├── mysql
    ├── preview
    ├── redis
    ├── repos
    ├── root
    ├── src
    ├── unifi
    ├── var_cache
    ├── var_db
    ├── var_log
    ├── var_tmp
    └── www
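A flat layout like the one in the listing above could be built by hand along the following lines. This is only a sketch: it assumes the top level is mounted at /mnt/rootvol as in the fstab example, the timestamp format is inferred from the snapshot names in the listing, and the helper snapshot_dest is a made-up name. It does not reproduce what btrbk does.

```shell
# snapshot_dest NAME [TIMESTAMP]
# Print the path for a new snapshot of subvolume NAME in the flat layout,
# for example /mnt/rootvol/snapshots/home.20240101T1801.
# TIMESTAMP defaults to the current local time. Illustrative helper only.
snapshot_dest() {
    printf '/mnt/rootvol/snapshots/%s.%s\n' "$1" "${2:-$(date +%Y%m%dT%H%M)}"
}

# With the top level mounted at /mnt/rootvol (run as root):
#   btrfs subvolume create /mnt/rootvol/volume/home
#   btrfs subvolume snapshot -r /mnt/rootvol/volume/home "$(snapshot_dest home)"
```

Keeping volume and snapshots as sibling directories means a snapshot never ends up nested inside the subvolume it was taken from.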

Many distributions prefix subvolume names with @ to make it easier to distinguish them from normal directories. The symbol @ doesn't have any inherent meaning in Btrfs and Btrfs does not impose any specific restrictions on a naming scheme, which makes it easy to adapt to your own specific needs.

This is the fstab of a Debian machine running inside a QEMU virtual machine.

/etc/fstab
UUID=32234b01-c599-4eaf-a6b2-fafd35034062       /               btrfs   noatime,discard=async,subvol=@rootfs 0 0
UUID=32234b01-c599-4eaf-a6b2-fafd35034062       /home           btrfs   noatime,discard=async,subvol=@home   0 0
UUID=32234b01-c599-4eaf-a6b2-fafd35034062       /mnt/rootvol    btrfs   noatime,discard=async                0 0
LABEL=swap          none                        swap            sw,discard                                   0 0
apt_cache           /var/cache/apt/archives     virtiofs        noatime,noexec,nodev,nosuid                  0 0

Take a moment to read up on flat vs nested subvolume layouts over at Btrfs/Getting_Started#Subvolumes

Special Considerations

Multi device filesystems

Btrfs automatically assembles all the required devices with the same FS UUID so it is not necessary to specify each device during mount or in fstab. You can use any of the devices as mount device.

Create a RAID1 mirrored filesystem on sdb1 and sdc1
mkfs.btrfs -L my-volume -d raid1 /dev/sdb1 /dev/sdc1
Mount it using either sdb1 or sdc1
mount /dev/sdc1 /mnt/btrfs
# or
mount /dev/sdb1 /mnt/btrfs

There are some corner cases, for example when the initramfs does not scan for devices before attempting to mount the root filesystem. In such cases it may help to use the device mount option on the kernel command line to force Btrfs to consider the listed devices.
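For a root filesystem, mount options are usually passed on the kernel command line through the rootflags= parameter. The sketch below builds such a parameter from a list of member devices. The function name rootflags_for is hypothetical, and the device names are the ones from the example above, not a recommendation.

```shell
# rootflags_for DEVICE...
# Print a rootflags= kernel parameter that names every member device
# through the Btrfs "device" mount option. Illustrative helper only.
rootflags_for() {
    flags=
    for dev in "$@"; do
        flags="${flags:+$flags,}device=$dev"
    done
    printf 'rootflags=%s\n' "$flags"
}

rootflags_for /dev/sdb1 /dev/sdc1
# prints: rootflags=device=/dev/sdb1,device=/dev/sdc1
```

On a running system, btrfs device scan serves a similar purpose by registering the member devices with the kernel before the mount.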

Cloned filesystems - duplicate UUID

Btrfs considers all devices with the same FS UUID as part of the same filesystem. It maintains an internal list of member device UUIDs. While this is an integral part of Btrfs multi-device support, it needs special consideration if you want to clone a device using tools like dd or ddrescue.

When cloning a device, identical device UUIDs will confuse Btrfs. If the filesystem is already mounted, Btrfs will issue a warning in the kernel log (dmesg) when duplicate UUIDs are detected. If you try to mount the filesystem, the cloned device might get used instead of the original one - leading to data corruption.

It is better to use btrfs replace, which handles UUIDs correctly.
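Before mounting, it can be worth checking whether two block devices carry the same device UUID. The sketch below assumes that blkid reports the Btrfs device UUID as UUID_SUB (the filesystem UUID, which legitimately repeats across members of a multi-device filesystem, is reported as UUID). The function name find_duplicate_ids is made up for this example.

```shell
# find_duplicate_ids
# Read "<device> <id>" pairs on stdin and print every id that is shared
# by more than one device, followed by those devices.
# Illustrative helper only.
find_duplicate_ids() {
    awk '
        NF >= 2 { count[$2]++; devs[$2] = devs[$2] " " $1 }
        END { for (id in count) if (count[id] > 1) print id ":" devs[id] }
    '
}

# Example (run as root; adjust the device list to your system):
#   for d in /dev/sdb1 /dev/sdc1 /dev/sdd1; do
#       printf '%s %s\n' "$d" "$(blkid -s UUID_SUB -o value "$d")"
#   done | find_duplicate_ids
```

Any output means at least two devices look identical to Btrfs and should not be present at the same time when the filesystem is mounted.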

Read only mounts

The read-only (ro) mount option is a VFS mount option. This means it only prevents writes and changes from user space through that specific mount point. The Btrfs filesystem will still make changes during mount, such as log replay and completing delayed transactions.

Since Btrfs allows multiple mount points, another mount point, even one using the same subvolume, can be mounted read-write.
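Since ro applies per mount point, the state has to be checked for each mount point rather than for the filesystem as a whole. The following sketch does that against a mounts table; it assumes /proc/self/mounts as the default source, and the function name mount_is_readonly is hypothetical.

```shell
# mount_is_readonly MOUNTPOINT [FILE]
# Succeed if MOUNTPOINT is listed with the "ro" option in the mounts table.
# FILE defaults to /proc/self/mounts. If a mount point is listed more than
# once, the last entry wins. Illustrative helper only.
mount_is_readonly() {
    awk -v mp="$1" '
        $2 == mp {
            found = 1; ro = 0
            n = split($4, opt, ",")
            for (i = 1; i <= n; i++) if (opt[i] == "ro") ro = 1
        }
        END { exit !(found && ro) }
    ' "${2:-/proc/self/mounts}"
}

# Example:
#   mount_is_readonly /mnt/btrfs && echo "read-only through this mount point"
#
# To also avoid log replay at mount time, the recovery options listed
# earlier can be combined with ro (run as root):
#   mount -o ro,rescue=nologreplay /dev/sdc1 /mnt/btrfs
```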

Access time updates - atime

The access time, or atime, is a timestamp of when a file was last accessed. On ext2/3/4, updating it is a relatively cheap operation, but with Btrfs it is very expensive because of its copy-on-write (CoW) nature. When atime is updated, Btrfs has to make a new metadata extent to store the new timestamp. This problem is compounded if there are many snapshots, as all references have to be updated as well.

Before kernel 2.6.30, Linux updated this timestamp on each and every access. Now it defaults to relatime, which updates the atime only if the previous atime is earlier than or equal to the last modification or change time, or if it is more than a day old. This will still have a negative impact when using snapshots.
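The relatime rule can be written down as a small decision function. This sketch follows the description in the mount man page: the atime is updated when it is not newer than the modification or change time, or when it is at least a day old. The function name relatime_needs_update is made up, and all timestamps are plain seconds since the epoch.

```shell
# relatime_needs_update ATIME MTIME CTIME NOW
# Succeed if a read at time NOW would update the access time under relatime.
# All arguments are seconds since the epoch. Illustrative helper only.
relatime_needs_update() {
    [ "$2" -ge "$1" ] && return 0               # modified since the recorded access
    [ "$3" -ge "$1" ] && return 0               # status changed since the recorded access
    [ $(( $4 - $1 )) -ge 86400 ] && return 0    # recorded atime is at least 24 hours old
    return 1
}

# Example: a file read again one hour after its last access, not modified since.
if relatime_needs_update 2000 1000 1000 5600; then
    echo "atime would be updated"
else
    echo "atime stays as it is"
fi
```

Even under relatime, the first read after every modification still triggers a metadata write, which is why noatime remains attractive on Btrfs.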

Most people do not need access time updates, and would benefit from the extra performance of using the noatime mount option.

flushoncommit, commit

In a copy-on-write (CoW) filesystem like Btrfs, when a file is modified, the changes are not written directly to the existing data blocks. Instead, a new set of blocks (extents) is created to store the modified data, leaving the original data intact. This process ensures that the original data remains unchanged until the new data is successfully written. Only after the new data is written is the file's metadata updated to point to the new blocks.

Now, let's consider the scenario where a partial write occurs, such as an application saving modifications to a document, and a crash happens during this process. If a crash occurs before the new data blocks are completely written, the filesystem remains in a consistent state. This is because the original data blocks are still intact, and the file metadata still points to them. The partially written new data blocks are not yet linked to the file.

Many operations in Btrfs are atomic transactions, such as when the metadata is updated to point to the new data or when a snapshot is made. Depending on the load on the filesystem, a transaction can last quite a long time. If a crash happens, Btrfs will roll back to the previous commit.

The flushoncommit option forces data dirtied by a write in a prior transaction to commit as part of the current commit. This ensures that writes don't span multiple commit periods (see the commit mount option), reducing the amount of data lost during a rollback after a crash. In very extreme cases, such transactions can otherwise span minutes or hours. flushoncommit can negatively affect performance, but it reduces the time writes remain uncommitted.

The commit=<seconds> option sets the interval of the periodic transaction commit, at which data is synchronized to permanent storage. The value is specified in seconds and determines how frequently Btrfs commits changes to disk. Higher interval values lead to a larger amount of unwritten data, which could have consequences in the event of a system crash, as uncommitted data will be lost after a crash.

Note that this is the periodic commit interval. Btrfs can still make commits more frequently, as needed.
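To see which periodic interval a given option string implies, the commit value can be pulled out of it, falling back to the default of 30 seconds from the options table. This is a small sketch; the function name commit_interval is hypothetical.

```shell
# commit_interval OPTIONS
# Print the commit interval in seconds found in a comma-separated option
# string, or 30 (the default from the options table) if none is set.
# Illustrative helper only.
commit_interval() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F= '
        $1 == "commit" { v = $2 }
        END { print (v == "" ? 30 : v) }
    '
}

commit_interval "noatime,commit=120,flushoncommit"
# prints: 120

# Both options can be tried on a mounted filesystem with a remount (run as root):
#   mount -o remount,commit=15,flushoncommit /home
```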