Btrfs
Btrfs is a modern filesystem for Linux aimed at implementing advanced features while also focusing on fault tolerance, repair and easy administration. Btrfs can be used as a generic filesystem in most situations.
Originally developed in 2007, Btrfs has evolved steadily and continues to see heavy active development. Its on-disk format has been considered stable since 2013.
Btrfs combines many features traditionally found in md and LVM, and introduces new concepts such as subvolumes. This makes it difficult to compare with traditional Linux filesystems like ext4.
One important benefit is that Btrfs keeps checksums for all data, not only metadata. This means it can reliably detect (and, depending on the chosen profile, automatically repair) corruption that would go unnoticed in other filesystems.
It is important to understand that Btrfs is rather different from traditional Linux filesystems such as ext4 and XFS. Btrfs bridges traditionally distinct storage layers: multiple device management (md RAID), volume management (LVM), data integrity verification (dm-integrity) and self-healing. This adds a great deal of flexibility that is very difficult to achieve with separate tools.
Features
Copy-on-Write
Unlike traditional filesystems such as ext4, FAT32 and NTFS, Btrfs uses a technique called Copy-on-Write (CoW) for all writes to the filesystem. CoW means that a write goes to a new block on the disk rather than overwriting an existing data block. Once the new block has been written to disk, the metadata is updated to point to it. This ensures data integrity in case of a failed write: you have either the original data or the new data. If a write fails in a traditional filesystem, the contents of a data block may be incomplete or wrong.
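The update order described above can be imitated in userspace. The sketch below is only an analogy and not how Btrfs is implemented: new contents are written to a separate temporary file, and the name is switched over with a rename once the write has succeeded. The function name cow_update is made up for this illustration.

```shell
# Analogy only: "write elsewhere, then switch the pointer", using a rename.
# cow_update FILE   (the new contents are read from stdin)
cow_update() {
    target=$1
    # New data goes to a fresh file next to the target; the original is untouched.
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if cat > "$tmp"; then
        # Renaming within one directory replaces the name atomically,
        # which plays the role of the metadata update.
        mv -f "$tmp" "$target"
    else
        # The write failed: drop the partial copy, the original is still intact.
        rm -f "$tmp"
        return 1
    fi
}
```

For example, `echo "new contents" | cow_update notes.txt` leaves notes.txt holding either its old or its new contents, never a half-written mix.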
As of Linux Kernel 5.0 Btrfs has the following features:
Data checksum and integrity
- Checksums on all data and metadata (crc32c, xxhash, sha256 or blake2)
- Self-healing in some configurations due to the nature of copy-on-write
- Online data scrubbing for finding errors and automatically fixing them for files with redundant copies
- Transparent compression via zlib, LZO and ZSTD, configurable per file or volume
- Out-of-band data deduplication (requires userspace tools)
- Online defragmentation as well as autodefrag mount option
- In-place conversion from ext3/4 to Btrfs (with rollback)
- Swap files
- Block discard (A.K.A. trim support)
- Offline filesystem check
- File cloning (reflink, copy-on-write)
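As an illustration of online scrubbing from the list above, the following sketch starts a scrub and then prints its summary. It assumes a Btrfs filesystem mounted at the path passed in and root privileges; scrub_now is a hypothetical helper name.

```shell
# scrub_now MOUNTPOINT
# Assumes MOUNTPOINT is a mounted Btrfs filesystem; run as root.
scrub_now() {
    mnt=$1
    # -B keeps the scrub in the foreground instead of detaching it.
    btrfs scrub start -B "$mnt"
    rc=$?
    # Print the summary (data checked, errors found or corrected) either way.
    btrfs scrub status "$mnt"
    return "$rc"
}
```

A call such as `scrub_now /mnt` reads all data and metadata and verifies the checksums. Repair is only possible where a redundant copy exists.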
Volume management
Volume management in Btrfs is the ability to combine and manage several disks as one filesystem.
- Data and metadata profiles: SINGLE, DUP, RAID 0, RAID 1, RAID1c3, RAID1c4 and RAID 10
- Subvolumes (one or more separately mountable filesystem roots within each volume)
- Online volume growth and shrinking
- Online block device addition and removal
- Online balancing (movement of objects between block devices to balance load)
- Online conversion between data profiles (convert between different RAID levels or RAID<->SINGLE/DUP)
- Snapshots, writable and read-only
- Incremental backup
- Send/receive (saving diffs between snapshots to a binary stream)
- Union mounting of read-only storage, known as file system seeding (read-only storage used as a copy-on-write backing for a writable Btrfs)
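Online device addition and profile conversion from the list above could be combined as in the sketch below. It assumes a mounted single-device filesystem and a spare block device; add_mirror and the device names in the usage note are placeholders for this example.

```shell
# add_mirror NEW_DEVICE MOUNTPOINT
# Turns a mounted single-device filesystem into a two-device RAID1, online.
add_mirror() {
    new_dev=$1
    mnt=$2
    # Add the second block device to the mounted filesystem.
    btrfs device add "$new_dev" "$mnt" || return 1
    # Rewrite existing block groups using the RAID1 profile for data and metadata.
    btrfs balance start -dconvert=raid1 -mconvert=raid1 "$mnt"
}
```

Usage would look like `add_mirror /dev/sdY /mnt`. The balance can take a long time on a full filesystem, since existing data has to be rewritten.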
The following profiles are supported:
|Profile||Description||Minimum disks||Usable space|
|SINGLE||For single disks or for spanned volumes (A.K.A. Just a Bunch Of Drives - JBOD).||1 disk or more||100%|
|DUP||DUP means duplicate. This ensures two copies exist on the same disk. Can be used on one or several drives like SINGLE mode but does not protect against disk failures.||1 disk or more||50%|
|RAID0||Similar to SINGLE, but with data allocated in parallel stripes on all drives. Can increase performance in some workloads.||2 disks or more||100%|
|RAID1||Like DUP, but stores 2 copies on separate disks.||2 disks or more||50%|
|RAID1c3||Stores 3 copies on separate disks.||3 disks or more||33.3%|
|RAID1c4||Stores 4 copies on separate disks.||4 disks or more||25%|
|RAID10||A combination of RAID1+RAID0 modes for increased performance in some workloads.||4 disks or more||50%|
|RAID5*||Adds 1 disk as redundancy.||3 disks or more||(N-1)/N|
|RAID6*||Adds 2 disks as redundancy.||4 disks or more||(N-2)/N|
|* Note that RAID 5/6 modes are not yet stable.|
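The usable-space column can be turned into a rough calculator. The sketch below assumes equally sized devices and ignores metadata overhead; RAID5 and RAID6 are modelled as giving up one and two devices' worth of space respectively. The function name usable_gib is hypothetical.

```shell
# usable_gib PROFILE DISKS SIZE_GIB
# Rough usable capacity for DISKS equally sized devices of SIZE_GIB each.
usable_gib() {
    profile=$1
    n=$2
    size=$3
    raw=$(( n * size ))
    case $profile in
        single|raid0)     echo "$raw" ;;
        dup|raid1|raid10) echo $(( raw / 2 )) ;;
        raid1c3)          echo $(( raw / 3 )) ;;
        raid1c4)          echo $(( raw / 4 )) ;;
        raid5)            echo $(( (n - 1) * size )) ;;
        raid6)            echo $(( (n - 2) * size )) ;;
        *)                echo "unknown profile: $profile" >&2; return 1 ;;
    esac
}

usable_gib raid1 2 1000   # prints 1000
usable_gib raid6 4 1000   # prints 2000
```

With devices of different sizes the real figure depends on how block groups can be spread across them, so this is only a first estimate.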
Subvolumes
A subvolume is a part of the filesystem with its own independent file/directory hierarchy. Subvolumes can be mounted as normal filesystems and they can be renamed or moved. Nesting subvolumes inside each other is also possible.
A subvolume in btrfs can be accessed in two ways:
- like any other directory that is accessible to the user
- as a separately mounted filesystem
When a Btrfs filesystem is created with mkfs.btrfs, an initial subvolume is created, often referred to as the top-level or root volume. It is common to create /home and other mountpoints as subvolumes rather than dividing the physical disk into partitions.
A comparison between traditional disk partitions and Btrfs subvolumes:
- Subvolumes can share file extents (file data) between each other.
- Partitions are block-level separations and cannot share data.
- All subvolumes share the same space as the whole filesystem.
- Subvolumes can be snapshotted, renamed, deleted or made read-only.
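Both ways of accessing a subvolume can be sketched with a few commands. The example assumes the top-level subvolume is already mounted and that the commands run as root; the function name and the paths in the usage note are placeholders.

```shell
# make_subvolume_mount DEVICE TOPLEVEL_MOUNT NAME TARGET
make_subvolume_mount() {
    dev=$1     # block device holding the filesystem
    top=$2     # where the top-level subvolume is mounted
    name=$3    # name of the new subvolume
    target=$4  # where the subvolume should be mounted on its own
    # Inside the mounted filesystem the new subvolume looks like a directory.
    btrfs subvolume create "$top/$name" || return 1
    btrfs subvolume list "$top"
    # Mount only that subvolume; the subvol= path is relative to the top level.
    mount -o subvol="$name" "$dev" "$target"
}
```

For example, `make_subvolume_mount /dev/sdX /mnt home /home` would create /mnt/home as a subvolume and mount it at /home.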
Snapshots
A snapshot is a subvolume that is a clone (A.K.A reflink) of another subvolume. By default, snapshots are created read-write. File modifications in a snapshot do not affect the files in the original subvolume.
Read-only snapshots can be used to store incremental revisions of the filesystem. Btrfs send|receive can be used to send a snapshot to another Btrfs filesystem or to a backup location.
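A snapshot-based backup could look roughly like the sketch below. It assumes the destination is a mounted Btrfs filesystem and, for the incremental case, that the parent snapshot has already been received there. The function name and the directory layout in the usage note are hypothetical.

```shell
# snapshot_and_send SUBVOL SNAPDIR NAME DEST [PARENT]
snapshot_and_send() {
    subvol=$1
    snapdir=$2
    name=$3
    dest=$4
    parent=${5:-}
    # send needs a read-only snapshot, hence -r.
    btrfs subvolume snapshot -r "$subvol" "$snapdir/$name" || return 1
    if [ -n "$parent" ]; then
        # Incremental: only the differences relative to PARENT are streamed.
        btrfs send -p "$snapdir/$parent" "$snapdir/$name" | btrfs receive "$dest"
    else
        # Full: the whole snapshot is streamed.
        btrfs send "$snapdir/$name" | btrfs receive "$dest"
    fi
}
```

A first run such as `snapshot_and_send /mnt/home /mnt/.snapshots day1 /backup` sends everything, and a later `snapshot_and_send /mnt/home /mnt/.snapshots day2 /backup day1` sends only what changed since day1.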
Cloning and Deduplication
A rather unique feature of Btrfs is the concept of cloning files in an atomic way. This is usually called a reflink.
This allows the user to make an instant copy of a file, similar to a hard link. When the original file or the copy is modified, CoW ensures that the files remain independent of each other.
File cloning (reflink, copy-on-write) via cp:
cp --reflink <source file> <destination file>
Tip: Put an alias for cp in your .bash_profile or /etc/profile.d/ so that it always makes reflinks. See Blog/Bash Aliases.
Deduplication means taking two or more files and joining their equal parts as reflinked copies. If one of the files is changed, CoW makes sure that the files remain independent of each other. Deduplication can save a lot of disk space. See the deduplication page for more in-depth usage.
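The behaviour described above can be tried on any filesystem with GNU cp. The sketch below uses --reflink=auto, which clones where the filesystem supports it and falls back to an ordinary copy otherwise; the temporary file names are created only for this demonstration.

```shell
# An alias along the lines of the tip might be:
#   alias cp='cp --reflink=auto'

src=$(mktemp)
dst="$src.clone"
echo "example data" > "$src"

# --reflink=auto shares extents where supported (such as on Btrfs) and
# otherwise performs a normal copy; --reflink=always would fail instead.
cp --reflink=auto "$src" "$dst"

# Modifying the copy leaves the original unchanged.
echo "extra line" >> "$dst"
if cmp -s "$src" "$dst"; then result=same; else result=differ; fi
echo "original and modified copy: $result"

rm -f "$src" "$dst"
```

Using auto in an alias is the more forgiving choice, because the same command keeps working on filesystems without reflink support.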
Data Allocation
Btrfs allocates all data in block groups. There are different types: SYSTEM, METADATA and DATA.
|Type||Description|
|DATA||Stores normal user file data|
|METADATA||Stores internal metadata. Small files can also be stored inline|
|SYSTEM||Stores mapping between physical devices and the logical space representing the filesystem|
|UNALLOCATED||Any unallocated space|
It is possible to use different profiles for DATA and METADATA in order to maximize space usage or resiliency against corruption. For example, it is common to use the SINGLE profile for DATA and the DUP profile for METADATA on single disk filesystems.
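Choosing the profiles at creation time might look like the sketch below. It assumes an empty partition that may be overwritten; make_single_disk_fs is a hypothetical wrapper and the device name in the usage note is a placeholder.

```shell
# make_single_disk_fs DEVICE
# WARNING: creating a filesystem destroys any existing data on DEVICE.
make_single_disk_fs() {
    # -d selects the DATA profile, -m the METADATA profile.
    mkfs.btrfs -d single -m dup "$1"
}
```

Running `make_single_disk_fs /dev/sdX1` as root would create a filesystem whose metadata is stored twice while file data is stored once.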
Each block group is allocated from the unallocated space as needed. DATA and METADATA block groups are allocated 1GiB at a time, multiplied by what PROFILE is used.
Because of the dynamic way Btrfs allocates block groups, it is somewhat difficult to calculate available disk space. You have to account for the fact that METADATA is dynamic and that you can have different PROFILES.
Example of a single disk filesystem using DUP and SINGLE profiles. You can see how the METADATA DUP profile doubles the allocated space from 6GiB to 12GiB:
# btrfs filesystem usage /mnt
Overall:
    Device size:                 233.47GiB
    Device allocated:            108.06GiB
    Device unallocated:          125.41GiB
    Device missing:                  0.00B
    Used:                         71.30GiB
    Free (estimated):            153.02GiB      (min: 90.32GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              195.05MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:96.00GiB, Used:68.38GiB (71.23%)
   /dev/sda3      96.00GiB

Metadata,DUP: Size:6.00GiB, Used:1.46GiB (24.29%)
   /dev/sda3      12.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB (0.05%)
   /dev/sda3      64.00MiB

Unallocated:
   /dev/sda3     125.41GiB
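The figures in this output can be cross-checked with a little arithmetic. In the sketch below the formula for the free-space estimate is an assumption inferred from the numbers shown, and the result differs slightly from the tool's own value because the inputs are already rounded to two decimals.

```shell
# Figures copied from the example output (GiB unless noted).
data_size=96.00
data_used=68.38
meta_raw=12.00        # 6.00GiB of METADATA stored twice (DUP)
system_raw_mib=64.00  # 32.00MiB of SYSTEM stored twice (DUP)
unallocated=125.41
data_ratio=1.00

# "Device allocated" is the raw space taken by all block groups.
allocated=$(awk -v d="$data_size" -v m="$meta_raw" -v s="$system_raw_mib" \
    'BEGIN { printf "%.2f", d + m + s / 1024 }')
echo "Device allocated: ${allocated}GiB"   # prints 108.06GiB

# Assumed estimate: room left in DATA block groups plus the unallocated
# space divided by the data ratio.
free_est=$(awk -v d="$data_size" -v u="$data_used" \
    -v n="$unallocated" -v r="$data_ratio" \
    'BEGIN { printf "%.1f", (d - u) + n / r }')
echo "Free (estimated): about ${free_est}GiB"
```

The same estimate with a ratio of 2.00 applied to the unallocated space lands near the "min" value, which is the pessimistic case where all remaining space would be allocated with a DUP-style profile.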
- Btrfs glossary