Btrfs
Btrfs is a modern filesystem for Linux aimed at implementing advanced features while also focusing on fault tolerance, repair and easy administration. Btrfs can be used as a generic filesystem in most situations.
Originally developed in 2007, Btrfs has evolved steadily and continues to see heavy active development. Its on-disk format has been considered stable since 2013.
Btrfs combines many features traditionally found in md-raid and LVM, as well as introducing new concepts such as subvolumes and reflinks. This makes it difficult to compare Btrfs with traditional Linux filesystems like ext4.
One very significant feature is that Btrfs keeps checksums for all data, not only metadata. This means it can reliably detect (and, depending on the chosen profile, automatically repair) corruption that would go unnoticed in other filesystems or storage setups.
It is important to understand that Btrfs is quite different from a traditional Linux filesystem because it bridges traditionally distinct storage layers: multiple device management (md RAID), volume management (LVM), data integrity verification (dm-integrity) and self-healing. This adds a great deal of flexibility that is very difficult to achieve across these boundaries with separate tools.
Btrfs is currently the only Linux filesystem with native support for Zoned storage. This is useful on host-managed SMR HDDs and on NVMe drives that support Zoned Namespaces (ZNS). Read more about Zoned storage at https://zonedstorage.io/
Features
Copy-on-Write
Btrfs uses a technique called Copy-on-Write (CoW) for all writes to the filesystem. CoW means that a write always goes to a new block on the disk instead of overwriting an existing data block. Once the new block is written to disk, the metadata is updated to point to it. This ensures data integrity in case of a failed write: you have either the original data or the new data. If a write fails in a traditional filesystem, the contents of a data block may instead be incomplete or wrong.
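The write-then-switch ordering can be illustrated outside Btrfs with an ordinary file update. The sketch below is only an analogy, not how Btrfs is implemented: the new content is written to a separate temporary file, and a rename plays the role of the metadata pointer update. The helper name `cow_style_update` and the file names are made up for this example.

```shell
#!/bin/sh
# Analogy only: update a file without ever overwriting its old content in place.
# cow_style_update is a hypothetical helper name, not part of any Btrfs tool.
cow_style_update() {
    target=$1
    new_content=$2
    # 1. Write the new data to a fresh location next to the target.
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    printf '%s\n' "$new_content" > "$tmp" || { rm -f "$tmp"; return 1; }
    # 2. Switch the "pointer": the rename replaces the name atomically, so a
    #    reader sees either the complete old file or the complete new file.
    mv -f "$tmp" "$target"
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'old data\n' > "$workdir/file.txt"
cow_style_update "$workdir/file.txt" 'new data'
cat "$workdir/file.txt"
```

If the script is interrupted before the rename, `file.txt` still holds the old content, which mirrors the "original data or new data" guarantee described above.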
As of Linux kernel 5.0, Btrfs has the following features:
Data checksum and integrity
- Checksums on all data and metadata (crc32c, xxhash64, sha256 or blake2b)
- Self-healing in some configurations due to the nature of copy-on-write
- Tree-checker, post-read and pre-write metadata verification
- Online data scrubbing for finding errors and automatically fixing them for files with redundant copies
- Offline filesystem check
- Transparent compression via zlib, LZO and ZSTD, configurable per file or volume
- Data deduplication using userspace tools
- Online defragmentation as well as autodefrag mount option
- In-place conversion from ext3/4 to Btrfs (with rollback)
- Swap files
- Block discard (trim support)
- File cloning (reflink, copy-on-write)
- Quotas, subvolume-aware
Volume management
Volume management in Btrfs is the ability to combine and manage several disks as one filesystem.
- Data and metadata profiles: SINGLE, DUP, RAID 0, RAID 1, RAID 1c3, RAID 1c4 and RAID 10
- Subvolumes (one or more separately mountable filesystem roots within each volume)
- Online volume growth and shrinking
- Online block device addition and removal
- Online balancing (moving blocks to balance load and use space more efficiently)
- Online conversion between data profiles (convert between different RAID levels or RAID<->SINGLE/DUP)
- Snapshots, writable and read-only
- Incremental backup
- Send/receive (saving diffs between snapshots to a binary stream)
- Seed devices. Create a (read-only) filesystem that acts as a template to seed other Btrfs filesystems. Using copy on write, all modifications are stored on different devices and the original is unchanged.
- Zoned device support (SMR/ZBC/ZNS friendly allocation)
The following profiles are supported:
|Profile||Description||Minimum disks||Space efficiency||Redundancy|
|SINGLE||For single disks or for spanned volumes (a.k.a. Just a Bunch Of Drives, JBOD)||1 disk or more||100%||None|
|MIXED||Combines metadata and data chunks into one. Useful for very small devices. Can be used on multiple devices.||1 disk or more||100%||None|
|DUP*||DUP means duplicate. This ensures two copies exist on the same disk. Can be used on one or several drives like SINGLE mode, but does not protect against disk failures.||1 disk or more||50%||Some (*)|
|RAID0||Similar to SINGLE, but with data allocated in parallel stripes on all drives. Can increase performance in some workloads.||2 disks or more||100%||None|
|RAID1||Like DUP, but stores each of the 2 copies on separate disks.||2 disks or more||50%||1 disk failure|
|RAID1c3||Stores 3 copies on separate disks.||3 disks or more||33.3%||2 disk failures|
|RAID1c4||Stores 4 copies on separate disks.||4 disks or more||25%||3 disk failures|
|RAID10||A combination of RAID1+RAID0 modes for increased performance and redundancy.||4 disks or more||50%||1 disk failure|
|RAID5**||A striped mode with 1 disk as redundancy. Can increase performance in some workloads.||3 disks or more||(N-1)/N||1 disk failure|
|RAID6**||A striped mode with 2 disks as redundancy. Can increase performance in some workloads.||4 disks or more||(N-2)/N||2 disk failures|
|* DUP mode protects against data or metadata corruption, but not disk failures.|
|** RAID 5/6 modes are not yet stable or suitable for production use.|
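The space-efficiency column can be turned into a rough capacity estimate. The following sketch assumes all disks are the same size and ignores metadata overhead. `usable_gib` is a hypothetical helper, not a btrfs command, and RAID6 is computed as (N-2)/N because two disks' worth of space holds parity.

```shell
#!/bin/sh
# usable_gib PROFILE DISKS DISK_SIZE_GIB
# Rough usable capacity for equally sized disks (integer GiB, rounded down).
# Hypothetical helper for illustration; real numbers come from
# `btrfs filesystem usage`.
usable_gib() {
    profile=$1
    disks=$2
    size=$3
    raw=$((disks * size))
    case $profile in
        single|raid0)     echo "$raw" ;;
        dup|raid1|raid10) echo $((raw / 2)) ;;
        raid1c3)          echo $((raw / 3)) ;;
        raid1c4)          echo $((raw / 4)) ;;
        raid5)            echo $(( (disks - 1) * size )) ;;
        raid6)            echo $(( (disks - 2) * size )) ;;
        *) echo "unknown profile: $profile" >&2; return 1 ;;
    esac
}

usable_gib raid1 2 1000     # prints 1000
usable_gib raid1c3 3 1000   # prints 1000
usable_gib raid6 6 1000     # prints 4000
```

With disks of different sizes the result depends on how chunks can be paired across devices, so this estimate only holds for the equal-size case.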
It is possible to use different profiles for metadata chunks and normal data chunks. For example, the dup profile can be used for metadata and the single profile for data chunks on a single disk.
Subvolumes
A subvolume is a part of the filesystem with its own independent file and directory hierarchy. Subvolumes can be mounted as normal filesystems and they can be renamed or moved like normal directories. Nesting subvolumes inside each other is also possible.
A subvolume in Btrfs can be accessed in two ways:
- like any other directory that is accessible to the user
- as a separately mounted filesystem
When a Btrfs filesystem is created with mkfs.btrfs, an initial subvolume is created, often referred to as the top-level or root volume. It is common to create /home and other mountpoints as subvolumes rather than dividing the physical disk into partitions.
A comparison of traditional disk partitions with Btrfs subvolumes:
- Subvolumes can share file extents (file data) between each other.
- Partitions are block-level separations and cannot share data.
- All subvolumes share the same available space as the whole filesystem.
- Subvolumes can be snapshotted, renamed, deleted or made read-only.
Snapshots
A snapshot is a subvolume that is a clone (reflink) of another subvolume. Snapshots can be created as read-write or read-only. File modifications in a snapshot do not affect the files in the original subvolume.
- Snapshots only store differences, so initially they take no additional disk space.
- Snapshots can be used to store several revisions of the subvolume.
- Snapshots do not have an incremental relationship. They do not depend on keeping the previous snapshots to remain valid.
- Efficient incremental backups are possible using btrfs send and btrfs receive. Snapshots can be sent to another Btrfs filesystem or to a different backup location over the network. When using incremental snapshots, only the differences between snapshots are sent, greatly reducing the space and time needed to make the backup.
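An incremental backup cycle along these lines might look like the sketch below. The paths (`/mnt/data`, `/mnt/snapshots`, `/backup`) and the snapshot names are placeholders. Because running the commands requires root and mounted Btrfs filesystems, the script only prints them unless `APPLY=1` is set (the `run` helper and the `APPLY` variable are inventions of this example).

```shell
#!/bin/sh
# Sketch of an incremental snapshot backup. All paths are placeholders.
# By default the commands are only printed; set APPLY=1 to execute them.
run() {
    if [ "${APPLY:-0}" = 1 ]; then
        sh -c "$1"
    else
        printf '%s\n' "$1"
    fi
}

SRC=/mnt/data            # subvolume to back up (assumed)
SNAPDIR=/mnt/snapshots   # directory on the same filesystem as SRC (assumed)
DEST=/backup             # mount point of another Btrfs filesystem (assumed)

# First run: take a read-only snapshot and send it in full.
run "btrfs subvolume snapshot -r $SRC $SNAPDIR/day1"
run "btrfs send $SNAPDIR/day1 | btrfs receive $DEST"

# Later run: take a new read-only snapshot and send only the changes
# relative to day1, which must still exist on both sides.
run "btrfs subvolume snapshot -r $SRC $SNAPDIR/day2"
run "btrfs send -p $SNAPDIR/day1 $SNAPDIR/day2 | btrfs receive $DEST"
```

The snapshots are created read-only (`-r`) because `btrfs send` only accepts read-only snapshots as input.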
Cloning and Deduplication
A great feature of Btrfs is the concept of cloning files in an atomic way. This is usually called a reflink.
This allows the user to make an instant copy of a file, similar to a hard link. When either the original file or the copy is modified, CoW ensures that the files remain distinct from each other.
File cloning (reflink, copy-on-write) via cp:
cp --reflink <source file> <destination file>
Reflink copies of files and directories are a useful way to make instant copies before making changes, such as a MediaWiki or WordPress upgrade.
Deduplication means taking two or more files and joining their equal parts as reflinked copies. If one of the files is changed, CoW ensures that the files remain distinct from each other. Deduplication can save a lot of disk space. See the deduplication page for more in-depth usage.
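The independence of a reflinked copy can be checked with a short experiment. This sketch uses `--reflink=auto` from GNU cp, which clones when the filesystem supports it and falls back to an ordinary copy otherwise, so it also runs on filesystems other than Btrfs. The file names and temporary directory are arbitrary.

```shell
#!/bin/sh
# Show that modifying a reflink copy leaves the original untouched.
# --reflink=auto (GNU cp) clones when possible and otherwise copies normally.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'version 1\n' > "$workdir/original.txt"

# Instant copy: on Btrfs both files initially share the same extents.
cp --reflink=auto "$workdir/original.txt" "$workdir/copy.txt"

# Modify only the copy; CoW gives the copy its own blocks for the new data.
printf 'version 2\n' >> "$workdir/copy.txt"

cat "$workdir/original.txt"    # still contains only "version 1"
wc -l < "$workdir/copy.txt"    # the copy now has 2 lines
```

Using plain `--reflink` instead makes cp fail rather than fall back, which is useful when a silent full copy would be unwanted.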
Data Allocation
Btrfs allocates all data in chunks, also referred to as block groups. There are three different types of chunks: SYSTEM, METADATA and DATA.
|DATA||Stores normal user file data|
|METADATA||Stores internal metadata. Small files can also be stored inline|
|SYSTEM||Stores mapping between physical devices and the logical space representing the filesystem|
|UNALLOCATED||Any unallocated space|
It is possible to use different profiles for DATA and METADATA in order to maximize space usage or resiliency against corruption. For example, it is common to use the SINGLE profile for DATA and the DUP profile for METADATA on single disk filesystems.
Each block group is allocated from the unallocated space as needed. DATA and METADATA block groups are normally allocated 1GiB at a time, multiplied by the number of copies the PROFILE requires. For a RAID1 filesystem, 2x1GiB block groups will be allocated each time.
Because of the dynamic way Btrfs allocates block groups, it is somewhat difficult to calculate available disk space. You have to account for the fact that METADATA is dynamic and that you can have different PROFILES.
Example of a single disk filesystem using the DUP and SINGLE profiles. You can see how the DUP profile doubles the space allocated for METADATA from 6GiB to 12GiB:
# btrfs filesystem usage /mnt
Overall:
    Device size:                 233.47GiB
    Device allocated:            108.06GiB
    Device unallocated:          125.41GiB
    Device missing:                  0.00B
    Used:                         71.30GiB
    Free (estimated):            153.02GiB      (min: 90.32GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              195.05MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:96.00GiB, Used:68.38GiB (71.23%)
   /dev/sda3      96.00GiB

Metadata,DUP: Size:6.00GiB, Used:1.46GiB (24.29%)
   /dev/sda3      12.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB (0.05%)
   /dev/sda3      64.00MiB

Unallocated:
   /dev/sda3     125.41GiB
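The figures in this output are related by simple arithmetic, which the sketch below reproduces with awk. The formulas are inferred from the numbers shown above rather than taken from the btrfs-progs source, so treat them as an approximation. The function name `usage_summary` is made up for this example.

```shell
#!/bin/sh
# Recompute the summary lines from the per-type lines of the example output.
# Values are copied from the output above (GiB); formulas are an approximation.
usage_summary() {
    awk 'BEGIN {
        data_size = 96.00; data_used = 68.38; data_ratio = 1
        meta_size = 6.00;  meta_ratio = 2
        sys_size  = 32 / 1024; sys_ratio = 2    # 32 MiB expressed in GiB
        device_size = 233.47

        # Raw space allocated = logical size of each type times its ratio.
        allocated = data_size * data_ratio + meta_size * meta_ratio + sys_size * sys_ratio
        unallocated = device_size - allocated

        # Estimated free space = unused room in DATA block groups plus the
        # unallocated space divided by the data ratio.
        free_est = (data_size - data_used) + unallocated / data_ratio

        printf "allocated=%.2f unallocated=%.2f free_est=%.2f\n",
               allocated, unallocated, free_est
    }'
}

usage_summary
```

This prints `allocated=108.06 unallocated=125.41 free_est=153.03`. The last value differs from the reported 153.02GiB only because the inputs here are already rounded to two decimals.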
- Btrfs glossary