Btrfs/Mount Options
Mounting a Btrfs filesystem
Btrfs has several mount options that control how the filesystem behaves. Some are used in recovery situations and some are used for performance or quality tradeoffs.
Except for subvol and subvolid, all mount options affect the whole filesystem, not the individual subvolume mounts. This means, for example, that if compress=zstd is specified on the first mount, all subsequent mounts, including those of other subvolumes, will inherit the compression option.
Generic Linux VFS mount options like atime/noatime, auto/noauto, dev/nodev, and others do apply per mount point. See the chapter about filesystem-independent mount options in the mount man page.
The basic mount command is mount -o <options> <device> <path>, for example:
# mount -o compress=zstd,subvol=@homes /dev/sdc1 /home
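One way to see the split between per-mount and filesystem-wide options on a running system is to read /proc/self/mountinfo, which lists them in separate fields. The following is a minimal sketch, not an official tool: the function name btrfs_mount_opts is made up for illustration, and it assumes the field layout described in the proc(5) man page.

```shell
# Print the per-mount (VFS) options and the filesystem-wide options for
# every Btrfs mount. Reads mountinfo-formatted lines from stdin.
# btrfs_mount_opts is a hypothetical helper name.
btrfs_mount_opts() {
    awk '{
        # Fields 1-6 are fixed; optional fields follow and end at a lone "-".
        sep = 0
        for (i = 7; i <= NF; i++) if ($i == "-") { sep = i; break }
        # After the separator: filesystem type, source, filesystem options.
        if (sep && $(sep + 1) == "btrfs")
            printf "%s vfs=%s fs=%s\n", $5, $6, $(sep + 3)
    }'
}

# Usage on a live system:
#   btrfs_mount_opts < /proc/self/mountinfo
```

Two subvolume mounts of the same filesystem should show the same filesystem-wide options apart from subvol and subvolid, while the vfs= part can differ per mount point.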
Btrfs mount options
The full documentation of the various Btrfs mount options is available in the official Btrfs administration docs.
Option | Description | Default Value |
---|---|---|
acl, noacl | Enable/disable POSIX Access Control Lists (ACLs) | on |
autodefrag, noautodefrag | Enable automatic file defragmentation. Small random writes into files are detected and queued up for defragmentation. | off |
atime, strictatime, relatime, noatime | Control access time updates. noatime is a useful default because atime updates increase metadata writes. Atime updates are especially costly performance-wise when you have many snapshots. | relatime |
barrier, nobarrier | Ensure IO write operations are stored permanently | on |
commit=<seconds> | Set interval of periodic transaction commit | 30 |
compress, compress=<type[:level]>, compress-force, compress-force=<type[:level]> | Control file data compression. Available algorithms are zlib, lzo and zstd. | off |
datacow, nodatacow | Enable/disable data copy-on-write for newly created files. nodatacow also implies nodatasum. | on |
datasum, nodatasum | Enable/disable data checksumming for newly created files | on |
device=<devicepath> | Specify path to a device scanned for Btrfs filesystem | - |
discard, discard=sync, discard=async, nodiscard | Enable discarding of freed file blocks | async (since 6.2) |
flushoncommit, noflushoncommit | Force data dirtied by write in prior transaction to commit | off |
max_inline=<bytes> | Specify maximum space that can be inlined in metadata b-tree leaf | min(2048, page size) |
metadata_ratio=<value> | Specify metadata chunk allocation frequency | 0 (internal logic) |
skip_balance | Skip automatic resume of an interrupted balance operation | off |
space_cache, space_cache=<version>, nospace_cache | Options to control the free space cache | Default depends on mkfs time, space_cache=v2 since btrfs-progs 5.15. |
ssd, ssd_spread, nossd, nossd_spread | Options to control SSD allocation schemes | SSD autodetected |
subvol=<path> | Mount subvolume from path rather than the toplevel subvolume | '/' |
subvolid=<subvolid> | Mount subvolume specified by subvolid number | 5 |
NOTE: If both subvolid and subvol are specified, they must point at the same subvolume, otherwise the mount will fail. | ||
thread_pool=<number> | Number of worker threads to start | Number of CPU threads +2, but not more than 8 |
user_subvol_rm_allowed | Allow subvolumes to be deleted by their respective owner | off |
Option | Description | Default Value |
---|---|---|
clear_cache | Force clearing and rebuilding of free space cache | - |
degraded | Mount a filesystem that has a missing device, within RAID profile constraints | off |
nologreplay | Disable tree log replay on mount | off |
norecovery | Skip data recovery at mount time | off |
rescan_uuid_tree | Force check and rebuild procedure of the UUID tree | off |
rescue=<option,option,...> | Modes allowing mount with damaged filesystem structures | - |
treelog, notreelog | Enable tree logging used for fsync and O_SYNC writes | on |
usebackuproot | Enable autorecovery attempts if a bad tree root is found | off |
WARNING! These are advanced options and can cause further data loss unless used carefully. Ask for help before attempting recovery. |
Option | Description | Default Value |
---|---|---|
enospc_debug, noenospc_debug | Enable verbose output for some ENOSPC conditions | off |
check_int, check_int_data, check_int_print_mask=<value> | Debugging options for integrity checking | off |
fatal_errors=<action> | Action to take on encountering a fatal error | bug |
fragment=<type> | Debugging helper to intentionally fragment block groups | off |
WARNING! These are developer options that should not be used on regular filesystems. |
Option | Description | Default Value |
---|---|---|
recovery | Enable data recovery at mount time | off (deprecated since kernel 4.5) |
Option | Description | Default Value |
---|---|---|
inode_cache, noinode_cache | Enable/disable inode cache | off (removed since kernel 5.11) |
fstab
/etc/fstab is the standard configuration file for mounting filesystems on Linux. It defines which filesystems get mounted where, along with any mount options that are needed. Portable USB drives are a common exception, as many desktop environments mount those automatically.
The format for fstab is one line per mount point. Each line consists of the following six fields, in order:
# device-spec mount-point fs-type options dump pass
Column | Description |
---|---|
device-spec | Specifies the device to be mounted. Can be </dev/path>, UUID=<fs uuid>, or LABEL=<fs label> |
mount-point | The directory where the device will be mounted. |
fs-type | Specifies the file system type, for example btrfs, xfs or ext4. |
options | Defines mount options. If no specific options are needed, 'defaults' can be used as a placeholder. |
dump | Used by the dump program. Not used by Btrfs and should be 0. |
pass | Specifies the order in which filesystem checks are performed during boot; 0 means do not check. Use 0 with Btrfs (see Btrfs/scrub). |
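The recommendation to use 0 in the pass field for Btrfs can be checked mechanically. The sketch below is only an illustration: check_btrfs_fstab is a hypothetical helper name, and it assumes whitespace-separated fields as in the table above.

```shell
# Report Btrfs entries whose pass field (field 6) is not 0.
# Reads fstab-formatted lines from stdin and exits non-zero if any are found.
# check_btrfs_fstab is a hypothetical helper name.
check_btrfs_fstab() {
    awk '
        /^[[:space:]]*(#|$)/ { next }    # skip comments and blank lines
        $3 == "btrfs" && NF >= 6 && $6 != 0 {
            printf "%s: pass is %s, expected 0\n", $2, $6
            bad = 1
        }
        END { exit bad + 0 }
    '
}

# Usage on a live system:
#   check_btrfs_fstab < /etc/fstab
```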
fstab example
This is the fstab on one of my machines. The disk is an SSD, so I use compress-force=zstd:2, as this reduces the amount of data written to the disk, increasing its life span.
Notice the use of the subvol mount option and how it is used to mount different subvolumes.
/etc/fstab
# root filesystem
UUID=446d32cb-a6da-45f0-9246-1483ad3420e0 / btrfs compress-force=zstd:2,noatime,subvol=volume/root 0 0
# /home subvol
UUID=446d32cb-a6da-45f0-9246-1483ad3420e0 /home btrfs compress-force=zstd:2,noatime,subvol=volume/home 0 0
# A subvol for /var/tmp that is not being backed up.
UUID=446d32cb-a6da-45f0-9246-1483ad3420e0 /var/tmp btrfs compress-force=zstd:2,noatime,noexec,subvol=volume/var_tmp 0 0
# Btrfs filesystem toplevel
UUID=446d32cb-a6da-45f0-9246-1483ad3420e0 /mnt/rootvol/ btrfs compress-force=zstd:2,noatime,subvolid=5 0 0
# Use a ramdisk for temp files
tmpfs /tmp tmpfs rw,nosuid,noexec,nodev,size=4G,mode=1777 0 0
/mnt/rootvol is where the filesystem's top-level volume is mounted. From here it is possible to access all snapshots and subvolumes directly, as shown in the tree list below.
The structure I use is a flat structure (all subvols are in one directory) that separates subvolumes and snapshots. It makes it easy to manage with backup software such as btrbk.
# tree -d -L 2 /mnt/rootvol/
/mnt/rootvol/
├── snapshots
│   ├── home.20240101T1801
│   ├── home.20240101T1901
...
│   ├── root.20240101T1801
│   ├── root.20240101T1901
│   ├── www.20240101T1801
│   └── www.20240101T1901
└── volume
    ├── boinc
    ├── home
    ├── home_root
    ├── jabber
    ├── mail
    ├── mysql
    ├── preview
    ├── redis
    ├── repos
    ├── root
    ├── src
    ├── unifi
    ├── var_cache
    ├── var_db
    ├── var_log
    ├── var_tmp
    └── www
Many distributions prefix subvolume names with @ to make it easier to distinguish them from normal directories. The symbol @ doesn't have any inherent meaning in Btrfs, and Btrfs does not impose any specific restrictions on a naming scheme, which makes it easy to adapt to your own specific needs.
This is the fstab of a Debian machine running inside a QEMU virtual machine.
/etc/fstab
UUID=32234b01-c599-4eaf-a6b2-fafd35034062 / btrfs noatime,discard=async,subvol=@rootfs 0 0
UUID=32234b01-c599-4eaf-a6b2-fafd35034062 /home btrfs noatime,discard=async,subvol=@home 0 0
UUID=32234b01-c599-4eaf-a6b2-fafd35034062 /mnt/rootvol btrfs noatime,discard=async 0 0
LABEL=swap none swap sw,discard 0 0
apt_cache /var/cache/apt/archives virtiofs noatime,noexec,nodev,nosuid 0 0
Take a moment to read up on flat vs nested subvolume layouts over at Btrfs/Getting_Started#Subvolumes
Special Considerations
Multi device filesystems
Btrfs automatically assembles all the required devices with the same FS UUID so it is not necessary to specify each device during mount or in fstab. You can use any of the devices as mount device.
Create a RAID1 mirrored filesystem on sdb1 and sdc1
mkfs.btrfs -L my-volume -d raid1 /dev/sdb1 /dev/sdc1
Mount it using either sdb1 or sdc1
mount /dev/sdc1 /mnt/btrfs
# or
mount /dev/sdb1 /mnt/btrfs
There are some corner cases, for example when the initramfs does not scan for devices before attempting to mount the root filesystem. In such cases it may help to pass the device mount option on the kernel command line to force Btrfs to consider the listed devices.
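On the kernel command line, mount options for the root filesystem are passed through the rootflags= parameter, so each member device becomes one device= entry in that list. As a hedged sketch, the hypothetical helper below (btrfs_rootflags is not part of any Btrfs tooling) builds such a string from a list of device paths:

```shell
# Build a rootflags= kernel parameter that names every member device.
# btrfs_rootflags is a hypothetical helper; the device paths are examples.
btrfs_rootflags() {
    flags=""
    for dev in "$@"; do
        # Append device=<path>, comma-separated after the first entry.
        flags="${flags:+$flags,}device=$dev"
    done
    printf 'rootflags=%s\n' "$flags"
}

# Usage:
#   btrfs_rootflags /dev/sdb1 /dev/sdc1
```

The resulting string would be appended to the kernel command line in the boot loader configuration, next to the usual root= parameter.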
Cloned filesystems - duplicate UUID
Btrfs considers all devices with the same FS UUID as part of the same filesystem. It maintains an internal list of the member devices' UUIDs. While this is an integral part of Btrfs multi-device support, it needs special consideration if you want to clone a device using tools like dd or ddrescue.
When cloning a device, identical device UUIDs will confuse Btrfs. If the filesystem is already mounted, Btrfs will issue a warning in the kernel log (dmesg) when duplicate UUIDs are detected. If you try to mount the filesystem, the cloned device might get used instead of the original one - leading to data corruption.
It is better to use btrfs device replace, which will handle UUIDs correctly.
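Before mounting, it can be useful to check whether any two block devices report the same device UUID. The sketch below assumes that blkid reports the Btrfs device UUID as UUID_SUB and prints lines of the form /dev/sdb1: UUID_SUB="..."; find_cloned_btrfs_devices is a hypothetical helper name.

```shell
# List device UUIDs that appear on more than one block device.
# Reads lines like: /dev/sdb1: UUID_SUB="1234-..." from stdin.
# find_cloned_btrfs_devices is a hypothetical helper name.
find_cloned_btrfs_devices() {
    awk -F'"' '
        {
            dev = $1
            sub(/:.*/, "", dev)          # keep only the device path
            seen[$2] = (seen[$2] ? seen[$2] " " : "") dev
            count[$2]++
        }
        END {
            for (u in count)
                if (count[u] > 1) print u ": " seen[u]
        }
    '
}

# Usage on a live system (assumed blkid output format):
#   blkid -t TYPE=btrfs -s UUID_SUB | find_cloned_btrfs_devices
```

Any line printed names a device UUID shared by several devices, which is the situation to resolve before attempting a mount.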
Read only mounts
The read-only ro mount option is a VFS mount option. This means it only prevents writes and changes from user space through that specific mount point. The Btrfs filesystem will still make changes such as log replay and delayed transactions during mount.
Since Btrfs allows multiple mount points, another mount point, even one using the same subvolume, can be mounted read-write.
Access time updates - atime
The access time, or atime, is a timestamp of when a file was last accessed. On ext2/3/4, updating it is a relatively cheap operation, but with Btrfs it is very expensive because of the filesystem's copy-on-write (CoW) nature. When atime is updated, Btrfs has to make a new metadata extent to store the new timestamp. This problem is compounded if there are many snapshots, as all references have to be updated as well.
Until relatively recently, Linux updated this timestamp on each and every access. Now it uses relatime, which updates the atime only if the previous atime is not newer than the last modification or change time, or if it is more than 24 hours old. This will still have a negative impact when using snapshots.
Most people do not need access time updates, and would benefit from the extra performance of using the noatime mount option.
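To make the relatime behaviour concrete, the rule can be sketched as a small shell function. This follows the description in the mount man page (update when the stored atime is not newer than the modification or change time, or is at least 24 hours old); relatime_needs_update is a hypothetical name and the function only models the decision, it does not touch any file.

```shell
# Return success (0) if relatime would update the access time.
# Arguments are Unix timestamps in seconds: atime mtime ctime now.
# relatime_needs_update is a hypothetical helper name.
relatime_needs_update() {
    atime=$1 mtime=$2 ctime=$3 now=$4
    [ "$mtime" -ge "$atime" ] && return 0          # modified since last access
    [ "$ctime" -ge "$atime" ] && return 0          # inode changed since last access
    [ $((now - atime)) -ge 86400 ] && return 0     # atime is at least 24 hours old
    return 1
}

# Usage:
#   relatime_needs_update "$atime" "$mtime" "$ctime" "$(date +%s)" && echo "atime would be written"
```

Even under this rule, a file that is read once a day still triggers one metadata update per day, which is why noatime remains the cheaper choice on filesystems with many snapshots.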
flushoncommit, commit
In a copy-on-write (CoW) filesystem like Btrfs, when a file is modified, the changes are not written directly to the existing data blocks. Instead, a new set of blocks (extents) is created to store the modified data, leaving the original data intact. This process ensures that the original data remains unchanged until the new data is successfully written. Only after the new data is written is the file's metadata updated to point to the new blocks.
Now, let's consider the scenario where a partial write occurs, such as an application saving modifications to a document, and a crash happens during this process. If a crash occurs before the new data blocks are completely written, the filesystem remains in a consistent state. This is because the original data blocks are still intact, and the file metadata still points to them. The partially written new data blocks are not yet linked to the file.
Many operations in Btrfs are atomic transactions, such as when the metadata is updated to point to the new data or when a snapshot is made. Depending on the load on the filesystem, a transaction can last quite a long time. If a crash happens, Btrfs will roll back to the previous commit.
The flushoncommit option forces data dirtied by a write in a prior transaction to commit as part of the current commit. This ensures that writes don't span multiple commit periods (see the commit mount option), reducing the amount of data lost during a rollback after a crash. In very extreme cases, such transactions can otherwise span minutes or hours. flushoncommit can negatively affect performance, but it reduces the time writes remain uncommitted.
The commit=<seconds> option sets the interval of the periodic transaction commit, when data is synchronized to permanent storage. The value is specified in seconds and determines how frequently Btrfs commits changes to disk. Higher interval values lead to a larger amount of unwritten data, which could have consequences in the event of a system crash, as uncommitted data will be lost after a crash.
Note that this is the periodic commit interval. Btrfs can still make commits more frequently, as needed.
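To find out which commit interval a mounted filesystem uses, the option string shown by tools such as findmnt or /proc/mounts can be inspected. The sketch below assumes that commit= only appears in that string when it differs from the default of 30 seconds; btrfs_commit_interval is a hypothetical helper name.

```shell
# Print the commit interval from a comma-separated mount option string,
# falling back to the documented default of 30 seconds.
# btrfs_commit_interval is a hypothetical helper name.
btrfs_commit_interval() {
    case ",$1," in
        *,commit=*)
            v=${1##*commit=}             # drop everything up to "commit="
            printf '%s\n' "${v%%,*}"     # drop any options that follow
            ;;
        *)
            echo 30
            ;;
    esac
}

# Usage on a live system:
#   btrfs_commit_interval "$(findmnt -no OPTIONS /home)"
```

The leading comma in the pattern keeps flushoncommit from being mistaken for the commit option.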