From Forza's ramblings

Btrfs


Btrfs is a modern filesystem for Linux aimed at implementing advanced features while also focusing on fault tolerance, repair and easy administration. Btrfs can be used as a generic filesystem in most situations.

Originally developed in 2007, Btrfs has evolved steadily and is still under heavy active development. Its on-disk format has been considered stable since 2013.

Btrfs combines many features traditionally found in md and LVM, and also introduces new concepts such as subvolumes. This makes it difficult to compare with traditional Linux filesystems like ext4.

One important benefit is that Btrfs keeps checksums for all data, not only metadata. This means it can reliably detect (and, depending on the chosen profile, automatically repair) corruption that would go unnoticed on other filesystems.

It is important to understand that Btrfs is rather different from traditional Linux filesystems such as ext4 and XFS. Btrfs bridges traditionally distinct storage layers: multiple-device management (md RAID), volume management (LVM), data integrity verification (dm-integrity) and self-healing. This gives a great deal of flexibility that is very difficult to achieve with separate tools.

Features

Copy-on-Write

Unlike traditional filesystems such as ext4, FAT32 and NTFS, Btrfs uses a technique called Copy-on-Write (CoW) for all writes to the filesystem. With CoW, modified data is written to a new block on the disk rather than overwriting the existing block. Once the new block is on disk, the metadata is updated to point to it. This protects data integrity if a write fails: you have either the original data or the new data. If a write fails on a traditional filesystem, the contents of a data block may be incomplete or wrong.

As of Linux kernel 5.5, Btrfs has the following features:

Data checksum and integrity

  • Checksums on all data and metadata (crc32c, xxhash, sha256 or blake2)
  • Self-healing in configurations that keep redundant copies (DUP, RAID1 and similar), using the checksums to pick a good copy
  • Online data scrubbing for finding errors and automatically fixing them for files with redundant copies
  • Transparent compression via zlib, LZO and ZSTD, configurable per file or volume
  • Out-of-band data deduplication (requires userspace tools)
  • Online defragmentation as well as autodefrag mount option
  • In-place conversion from ext3/4 to Btrfs (with rollback)
  • Swap files
  • Block discard (A.K.A. trim support)
  • Offline filesystem check
  • File cloning (reflink, copy-on-write)
  • Quotas
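Several of the features above are driven from the `btrfs` command-line tool. The sketch below wraps two of them, a foreground scrub and per-path zstd compression, in shell functions. The function names, the `run` helper and its `DRY_RUN` switch are illustrative additions, not part of btrfs-progs; `/mnt` and the other paths are placeholders.

```bash
#!/usr/bin/env bash
# Illustrative helpers only: run() and DRY_RUN are not part of btrfs-progs.
# With DRY_RUN=1 the commands are printed instead of executed.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Read all data and metadata on the filesystem, verify checksums and repair
# from a good copy where the profile keeps one. -B waits until it finishes.
scrub_fs() {
  run btrfs scrub start -B "$1"
}

# Mark a file or directory so that data written to it from now on is
# compressed with zstd.
compress_path() {
  run btrfs property set "$1" compression zstd
}
```

Compression for a whole filesystem is more commonly set at mount time, for example `mount -o compress=zstd /dev/sdX /mnt`. In both cases only data written afterwards is compressed.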

Volume management

Volume management in Btrfs is the ability to combine and manage several disks as one filesystem.

  • Data and metadata profiles: SINGLE, DUP, RAID0, RAID1, RAID1c3, RAID1c4 and RAID10
  • Subvolumes (one or more separately mountable filesystem roots within each volume)
  • Online volume growth and shrinking
  • Online block device addition and removal
  • Online balancing (movement of objects between block devices to balance load)
  • Online conversion between data profiles (convert between different RAID levels or RAID<->SINGLE/DUP)
  • Snapshots, writable and read-only
  • Incremental backup
  • Send/receive (saving diffs between snapshots to a binary stream)
  • Union mounting of read-only storage, known as file system seeding (read-only storage used as a copy-on-write backing for a writable Btrfs)
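As an example of online device addition and profile conversion, this sketch adds a second disk to a mounted filesystem and then converts both data and metadata to RAID1 with a balance. `/dev/sdc`, `/mnt`, the function name and the `DRY_RUN` helper are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Illustrative helper: prints the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Turn a mounted single-disk filesystem into a two-disk RAID1, online.
add_disk_as_raid1() {
  local newdev="$1" mnt="$2"
  run btrfs device add "$newdev" "$mnt"   # filesystem stays mounted
  # Rewrite the existing block groups with the RAID1 profile.
  run btrfs balance start -dconvert=raid1 -mconvert=raid1 "$mnt"
}
```

The balance rewrites every existing block group, so it can take a long time on a full filesystem; `btrfs balance status /mnt` shows its progress.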

The following profiles are supported:

Profile | Description | Disks | Space efficiency
SINGLE | For single disks or for spanned volumes (A.K.A. Just a Bunch Of Disks, JBOD). | 1 or more | 100%
DUP | Duplicate: stores two copies on the same disk. Can be used on one or several drives like SINGLE, but does not protect against disk failure. | 1 or more | 50%
RAID0 | Like SINGLE, but data is allocated in parallel stripes across all drives. Can increase performance in some workloads. | 2 or more | 100%
RAID1 | Like DUP, but stores the 2 copies on separate disks. | 2 or more | 50%
RAID1c3 | Stores 3 copies on separate disks. | 3 or more | 33.3%
RAID1c4 | Stores 4 copies on separate disks. | 4 or more | 25%
RAID10 | A combination of RAID1 and RAID0 for increased performance in some workloads. | 4 or more | 50%
RAID5* | Uses the capacity of 1 disk for parity. | 3 or more | (N-1)/N
RAID6* | Uses the capacity of 2 disks for parity. | 4 or more | (N-2)/N
N is the number of disks. * RAID5/6 modes are not yet considered stable.
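The space-efficiency column can be turned into a quick estimate. The hypothetical function below assumes N disks of equal size and ignores metadata and reserved space; it returns the usable capacity in whole GiB (rounded down) for each profile in the table and rejects disk counts below the table's minimum.

```bash
#!/usr/bin/env bash
# usable_gib PROFILE DISKS SIZE_GIB
# Rough usable capacity of DISKS equally sized disks under one profile.
# Hypothetical helper; ignores metadata, reserves and unequal disk sizes.
usable_gib() {
  local profile="${1,,}" disks="$2" size_gib="$3" min
  case "$profile" in
    single|dup) min=1 ;;
    raid0|raid1) min=2 ;;
    raid1c3|raid5) min=3 ;;
    raid1c4|raid10|raid6) min=4 ;;
    *) echo "unknown profile: $1" >&2; return 1 ;;
  esac
  if (( disks < min )); then
    echo "$1 needs at least $min disks" >&2
    return 1
  fi
  case "$profile" in
    single|raid0)     echo $(( disks * size_gib )) ;;      # no redundancy
    dup|raid1|raid10) echo $(( disks * size_gib / 2 )) ;;  # 2 copies
    raid1c3)          echo $(( disks * size_gib / 3 )) ;;  # 3 copies
    raid1c4)          echo $(( disks * size_gib / 4 )) ;;  # 4 copies
    raid5)            echo $(( (disks - 1) * size_gib )) ;; # 1 disk's worth of parity
    raid6)            echo $(( (disks - 2) * size_gib )) ;; # 2 disks' worth of parity
  esac
}

usable_gib raid1 2 1000   # prints 1000
usable_gib raid6 6 4000   # prints 16000
```

With disks of different sizes the real figure can be lower, because each block group needs its copies or stripes on separate devices.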

Subvolumes

A subvolume is a part of the filesystem with its own independent file and directory hierarchy. Subvolumes can be mounted like normal filesystems, and they can be renamed or moved. Subvolumes can also be nested inside each other.

A subvolume in Btrfs can be accessed in two ways:

  • like any other directory that is accessible to the user
  • as a separately mounted filesystem

When a Btrfs filesystem is created with mkfs.btrfs, an initial subvolume is created, often referred to as the top-level[1] or root subvolume. It is common to create /home and other mountpoints as subvolumes rather than dividing the physical disk into partitions.
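A minimal sketch of the /home-as-subvolume layout, assuming the filesystem lives on the placeholder device `/dev/sdX`: mount the top-level subvolume (ID 5), create a `home` subvolume inside it, then mount that subvolume on its own with the `subvol=` option. The function name and the `DRY_RUN` helper are illustrative.

```bash
#!/usr/bin/env bash
# Illustrative helper: prints the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

setup_home_subvolume() {
  local dev="$1"
  run mount -o subvolid=5 "$dev" /mnt       # the top-level subvolume
  run btrfs subvolume create /mnt/home      # new subvolume named "home"
  run btrfs subvolume list /mnt             # confirm it exists
  run mount -o subvol=home "$dev" /home     # mount it as a separate filesystem
}
```

A matching /etc/fstab entry would use the same `subvol=home` mount option.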

Compared with traditional disk partitions:

  • Subvolumes can share file extents (file data) with each other.
  • Partitions are block-level separations and cannot share data.
  • All subvolumes share the space of the whole filesystem.
  • Subvolumes can be snapshotted, renamed, deleted or made read-only.

Snapshots

A snapshot is a subvolume created as a copy-on-write clone of another subvolume, initially sharing all of its data (similar to a reflink). By default, snapshots are created read-write. Modifying files in a snapshot does not affect the files in the original subvolume.

Read-only snapshots can be used to store incremental revisions of the filesystem. btrfs send and btrfs receive can be used to transfer a read-only snapshot to another Btrfs filesystem or to a backup location.
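The sketch below shows a typical read-only snapshot plus send/receive cycle. When a parent snapshot is given, `btrfs send -p` transmits only the differences, which requires that parent to already exist on the receiving side. The paths, the function name and the `DRY_RUN` switch are placeholders chosen for the example.

```bash
#!/usr/bin/env bash
# backup_snapshot SRC SNAP DEST [PARENT]
# Take a read-only snapshot of SRC at SNAP and send it to the Btrfs filesystem
# mounted at DEST. If PARENT (an older snapshot present on both sides) is
# given, only the differences are sent. DRY_RUN=1 prints the commands instead.
backup_snapshot() {
  local src="$1" snap="$2" dest="$3" parent="${4:-}"
  local -a send=(btrfs send) rc
  [ -n "$parent" ] && send+=(-p "$parent")
  send+=("$snap")

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "btrfs subvolume snapshot -r $src $snap"
    echo "${send[*]} | btrfs receive $dest"
    return 0
  fi

  btrfs subvolume snapshot -r "$src" "$snap" || return  # send needs read-only
  "${send[@]}" | btrfs receive "$dest"
  rc=("${PIPESTATUS[@]}")
  (( rc[0] == 0 && rc[1] == 0 ))   # fail if either side of the pipe failed
}
```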

Cloning and Deduplication

A notable feature of Btrfs is the ability to clone files atomically. This is usually called a reflink.

This lets the user make an instant copy of a file that, like a hard link, initially uses no additional space for the data. When either the original or the copy is modified, CoW ensures that the two files stay independent of each other.

File cloning (reflink, copy-on-write) via cp:

cp --reflink <source file> <destination file>

Tip: Put an alias for cp in your .bash_profile or in /etc/profile.d/ so that it always makes reflinks. See Blog/Bash Aliases.
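The following demonstrates the behaviour described above: after an in-place write to a reflinked copy, the original is unchanged. It uses `--reflink=auto`, which clones on Btrfs and silently falls back to an ordinary copy elsewhere; the alias from the tip could likewise be `alias cp='cp --reflink=auto'` (GNU coreutils 9.0 and later already default to this mode). The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Show that a reflinked copy stays independent of its source.
# Requires GNU cp and dd; runs on any filesystem thanks to --reflink=auto.
demo_reflink() {
  local dir
  dir=$(mktemp -d) || return
  printf 'original\n' > "$dir/a.txt"
  cp --reflink=auto "$dir/a.txt" "$dir/b.txt"   # on Btrfs, b.txt shares a.txt's extents
  # Overwrite the first byte of the copy in place (no truncation). On Btrfs,
  # CoW writes the change to a new block and a.txt keeps the old data.
  printf 'X' | dd of="$dir/b.txt" bs=1 seek=0 conv=notrunc status=none
  cat "$dir/a.txt" "$dir/b.txt"
  rm -rf "$dir"
}

demo_reflink   # prints "original" then "Xriginal"
```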

Deduplication means taking two or more files and sharing their identical parts as reflinked extents. If one of the files is later changed, CoW keeps the files independent of each other. Deduplication can save considerable disk space. See the deduplication page for more in-depth usage.
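Deduplication starts by finding identical data. As a rough illustration of that first step only, this hypothetical function groups whole files by SHA-256 hash; it does not share any extents itself. A dedicated tool such as duperemove (for example `duperemove -dr /path`) finds duplicate extents and asks the kernel to share them, and the kernel verifies that the contents really match before doing so.

```bash
#!/usr/bin/env bash
# List groups of byte-identical files under a directory, one path per line,
# with a blank line between groups. Whole-file candidates only; nothing is
# deduplicated. Needs GNU find, xargs, sha256sum and uniq; file names that
# contain newlines or backslashes are not handled.
find_duplicate_files() {
  find "$1" -type f -print0 |
    xargs -0 -r sha256sum |              # "<64 hex chars>  <path>"
    sort |                               # identical hashes become adjacent
    uniq -w64 --all-repeated=separate |  # keep only hashes seen more than once
    cut -c67-                            # drop the hash and the two spaces
}
```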

Data Allocation

Btrfs allocates all space in block groups. There are three types: DATA, METADATA and SYSTEM.

Type | Description
DATA | Stores normal user file data
METADATA | Stores internal metadata. Small files can also be stored inline
SYSTEM | Stores the mapping between physical devices and the logical space representing the filesystem
UNALLOCATED | Any unallocated space

It is possible to use different profiles for DATA and METADATA in order to maximize space usage or resilience against corruption. For example, it is common to use SINGLE for DATA and DUP for METADATA on single-disk filesystems.
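For the single-disk case just described, the profiles are chosen at mkfs time with `-d` (data) and `-m` (metadata). This sketch assumes a placeholder device `/dev/sdX` that may be wiped (mkfs.btrfs destroys existing data) and uses an illustrative `DRY_RUN` helper so the commands can be printed instead of executed.

```bash
#!/usr/bin/env bash
# Illustrative helper: prints the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

mkfs_single_dup() {
  local dev="$1" mnt="$2"
  run mkfs.btrfs -d single -m dup "$dev"   # DATA: SINGLE, METADATA: DUP
  run mount "$dev" "$mnt"
  run btrfs filesystem usage "$mnt"        # per-profile allocation and free space
}
```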

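Each block group is allocated from the unallocated space as needed. DATA and METADATA block groups are typically allocated 1GiB at a time, and the raw device space consumed is that size multiplied by the number of copies the profile keeps (for example, 2GiB for a DUP block group).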

Because of the dynamic way Btrfs allocates block groups, it is somewhat difficult to calculate available disk space. You have to account for metadata growing dynamically and for DATA and METADATA possibly using different profiles.

Example of a single-disk filesystem using SINGLE for data and DUP for metadata. Note how the DUP profile doubles the 6GiB of metadata to 12GiB of allocated device space:

# btrfs filesystem usage /mnt
   Device size:                 233.47GiB
   Device allocated:            108.06GiB
   Device unallocated:          125.41GiB
   Device missing:                  0.00B
   Used:                         71.30GiB
   Free (estimated):            153.02GiB      (min: 90.32GiB)
   Data ratio:                       1.00
   Metadata ratio:                   2.00
   Global reserve:              195.05MiB      (used: 0.00B)
   Multiple profiles:                  no

Data,single: Size:96.00GiB, Used:68.38GiB (71.23%)
  /dev/sda3      96.00GiB

Metadata,DUP: Size:6.00GiB, Used:1.46GiB (24.29%)
  /dev/sda3      12.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB (0.05%)
  /dev/sda3      64.00MiB

Unallocated:
  /dev/sda3     125.41GiB
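The two free-space figures appear to follow from the other numbers in the output: the unused part of the existing data block groups (96.00 - 68.38 = 27.62GiB) plus the unallocated space divided by the data ratio gives the estimate, and dividing the unallocated space by the highest ratio in use (2.00, from DUP metadata) gives the minimum. The awk-based function below is a sketch of that arithmetic, not the btrfs-progs code, and skips special cases such as mixed block groups; its results agree with 153.02 and 90.32GiB to within the rounding of the displayed values.

```bash
#!/usr/bin/env bash
# free_estimate DATA_SIZE DATA_USED UNALLOCATED DATA_RATIO MAX_RATIO  (all GiB)
# Prints "estimated minimum". Sketch of the arithmetic only.
free_estimate() {
  awk -v size="$1" -v used="$2" -v unalloc="$3" -v dratio="$4" -v maxratio="$5" 'BEGIN {
    slack = size - used                # free space inside existing data block groups
    est = slack + unalloc / dratio     # unallocated space filled with data
    min = slack + unalloc / maxratio   # same, at the most space-hungry ratio in use
    printf "%.2f %.2f\n", est, min
  }'
}

# Values from the example: Data,single 96.00/68.38, 125.41 unallocated,
# data ratio 1.00, highest ratio 2.00 (metadata DUP).
free_estimate 96.00 68.38 125.41 1.00 2.00
```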

  1. Btrfs glossary