Btrfs/Profiles

Btrfs profiles (RAID modes)

[Image: An early version of a RAID array.]

Btrfs offers various allocation profiles that determine the layout of data across the disks in a filesystem. These profiles, often referred to as RAID modes, provide different levels of data redundancy, space efficiency and performance.

Profiles Overview

On a single-disk filesystem, the default profile is SINGLE for data chunks and DUP for metadata chunks. With btrfs-progs v5.14 and earlier, the metadata default varied between HDDs and SSDs/NVMe, but from btrfs-progs v5.15 onwards the default metadata profile is always DUP, to enhance filesystem resilience.

On multi-device filesystems, the default profile is SINGLE for data chunks and RAID1 for metadata chunks.
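
For reference, these defaults are equivalent to passing the profiles explicitly to mkfs.btrfs. A minimal sketch, where /dev/sdX and /dev/sdY are placeholder devices:

# mkfs.btrfs -d single -m dup /dev/sdX
# mkfs.btrfs -d single -m raid1 /dev/sdX /dev/sdY

The first command matches the single-disk defaults on btrfs-progs v5.15 and later; the second matches the multi-device defaults.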

Currently supported profiles

Profile   Min. devices   Space efficiency   Resiliency        Description
-------   ------------   ----------------   ---------------   -----------
SINGLE    1 or more      100%               None              For single disks or for spanned volumes (a.k.a. Just a Bunch Of Drives, JBOD).
MIXED*    1 or more      100%               None              Combines metadata and data chunks into the same block groups. Useful for very small devices. Can be used on multiple devices.
DUP*      1 or more      50%                Some (*)          DUP means duplicate: two copies are kept on the same disk. Can be used on one or several drives like SINGLE, but does not protect against disk failures.
RAID0     2 or more      100%               None              Similar to SINGLE, but with data allocated in parallel stripes across all drives. Can increase performance in some workloads.
RAID1     2 or more      50%                1 disk failure    Like DUP, but stores each of the 2 copies on separate disks.
RAID1c3   3 or more      33.3%              2 disk failures   Stores 3 copies on separate disks.
RAID1c4   4 or more      25%                3 disk failures   Stores 4 copies on separate disks.
RAID10    4 or more      50%                1 disk failure    A combination of the RAID1 and RAID0 modes for increased performance and redundancy.
RAID5*    3 or more      (N-1)/N            1 disk failure    A striped mode with 1 disk of redundancy. Can increase performance in some workloads.
RAID6*    4 or more      (N-2)/N            2 disk failures   A striped mode with 2 disks of redundancy. Can increase performance in some workloads.

* The MIXED profile combines data and metadata in the same block groups. It can only be set when creating the filesystem with mkfs.btrfs and cannot be changed afterwards.
* DUP protects against data or metadata corruption, but not against disk failures.
* The RAID5/6 modes are not yet stable or suitable for production use. Do not use them for metadata.

Choosing a profile

[Image: Hard disk drive (HDD) with its casing removed.]

Selecting the appropriate profile depends on the specific use case. Always use a redundant profile such as DUP or RAID1 for metadata, even if you use SINGLE or RAID0 for data: this protects the filesystem from many types of otherwise irreparable damage if corruption occurs on the storage media.

Users often opt for a higher redundancy level for metadata than for data to ensure filesystem integrity during device failures. For example, RAID1c3 metadata can be combined with a RAID1 data profile (this requires at least three devices). This protects the filesystem if additional corruption happens before a damaged device is replaced.
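
As a sketch of that example, where the device paths are placeholders:

# mkfs.btrfs -d raid1 -m raid1c3 /dev/sdX /dev/sdY /dev/sdZ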

Despite recent improvements, the RAID5 and RAID6 profiles are advised against until they are officially declared stable.

WARNING! Never use the RAID5/6 profiles for metadata, as this can lead to severe issues.

Striped profiles such as RAID0 and RAID10 have higher sequential read and write performance than other profiles. They may benefit other workloads too, but not always; it is best to benchmark the specific use case.

For small devices under 16GiB, consider the MIXED profile to prevent ENOSPC issues. Note that the MIXED profile does not offer any redundancy.
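
A minimal sketch of creating such a filesystem, where the device path is a placeholder:

# mkfs.btrfs --mixed /dev/sdX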

RAID1 on Btrfs means two copies on different devices. This means it can only reliably survive one faulty or missing device, even if the filesystem consists of many devices. The resiliency column in the table above shows how many devices in total can be lost without fatal filesystem errors.

NOTE: It is not possible to recover from a missing or faulty device if there are no redundant copies available.
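
When redundant copies do exist, a failed or failing device can be swapped out online with btrfs replace. A sketch, assuming the device with ID 3 has failed, /dev/sdNEW is the replacement, and the filesystem is mounted at /mnt (all placeholders):

# btrfs replace start 3 /dev/sdNEW /mnt
# btrfs replace status /mnt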

Choosing profile at mkfs time

It is possible to choose data and metadata profiles with mkfs.btrfs.

# mkfs.btrfs --help
Usage: mkfs.btrfs [options] dev [ dev ... ]
Options:
  allocation profiles:
        -d|--data PROFILE           data profile, raid0, raid1, raid1c3, raid1c4, raid5, raid6, raid10, dup or single
        -m|--metadata PROFILE       metadata profile, values like for data profile
        -M|--mixed                  mix metadata and data together

Here we create a six-device Btrfs filesystem with a RAID10 data profile and a RAID1c3 metadata profile:

# mkfs.btrfs -mraid1c3 -draid10 disk1 disk2 disk3 disk4 disk5 disk6 -L my-btrfs
btrfs-progs v5.16.2 
See http://btrfs.wiki.kernel.org for more information.

NOTE: several default settings have changed in version 5.15, please make sure
      this does not affect your deployments:
      - DUP for metadata (-m dup)
      - enabled no-holes (-O no-holes)
      - enabled free-space-tree (-R free-space-tree)

Label:              my-btrfs
UUID:               ebc53cec-8ec1-42c6-8e30-9ca0cea7c2a9
Node size:          16384
Sector size:        4096
Filesystem size:    60.00GiB
Block group profiles:
  Data:             RAID10            3.00GiB
  Metadata:         RAID1C3           1.00GiB
  System:           RAID1C3           8.00MiB
SSD detected:       no
Zoned device:       no
Incompat features:  extref, skinny-metadata, no-holes, raid1c34
Runtime features:   free-space-tree
Checksum:           crc32c
Number of devices:  6
Devices:
   ID        SIZE  PATH
    1    10.00GiB  disk1
    2    10.00GiB  disk2
    3    10.00GiB  disk3
    4    10.00GiB  disk4
    5    10.00GiB  disk5
    6    10.00GiB  disk6

Changing profile on an existing filesystem

It is possible to change both the data and metadata profiles on an existing Btrfs filesystem, while it is mounted, using btrfs balance convert filters.

Using btrfs filesystem usage we can see which profiles are in use and how much space they utilise.

# btrfs filesystem usage -T /media/my-btrfs/
Overall:
    Device size:                  60.00GiB
    Device allocated:              9.02GiB
    Device unallocated:           50.98GiB
    Device missing:                  0.00B
    Used:                        816.00KiB
    Free (estimated):             28.49GiB      (min: 19.99GiB)
    Free (statfs, df):            28.49GiB
    Data ratio:                       2.00
    Metadata ratio:                   3.00
    Global reserve:                3.25MiB      (used: 0.00B)
    Multiple profiles:                  no

              Data    Metadata  System              
Id Path       RAID10  RAID1C3   RAID1C3  Unallocated
-- ---------- ------- --------- -------- -----------
 1 /dev/loop0 1.00GiB         -        -     9.00GiB
 2 /dev/loop1 1.00GiB         -        -     9.00GiB
 3 /dev/loop2 1.00GiB         -        -     9.00GiB
 4 /dev/loop3 1.00GiB   1.00GiB  8.00MiB     7.99GiB
 5 /dev/loop4 1.00GiB   1.00GiB  8.00MiB     7.99GiB
 6 /dev/loop5 1.00GiB   1.00GiB  8.00MiB     7.99GiB
-- ---------- ------- --------- -------- -----------
   Total      3.00GiB   1.00GiB  8.00MiB    50.98GiB
   Used         0.00B 256.00KiB 16.00KiB        

To change the data profile to RAID1, we simply issue btrfs balance start with the -dconvert filter.

# btrfs balance start -dconvert=raid1 /media/my-btrfs/
Done, had to relocate 1 out of 3 chunks

Converting profiles can take a very long time since all data on disk has to be re-written. It is possible to monitor an ongoing balance using btrfs balance status:

# btrfs balance status /media/my-btrfs
Balance on '/media/my-btrfs/' is running
0 out of about 5 chunks balanced (1 considered), 100% left
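
A long-running balance can also be paused, resumed or cancelled with the corresponding btrfs balance subcommands:

# btrfs balance pause /media/my-btrfs/
# btrfs balance resume /media/my-btrfs/
# btrfs balance cancel /media/my-btrfs/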

When the balance has finished, we can see that the allocation has changed:

# btrfs filesystem usage -T /media/my-btrfs/
Overall:
    Device size:                  60.00GiB
    Device allocated:              7.02GiB
    Device unallocated:           52.98GiB
    Device missing:                  0.00B
    Used:                        432.00KiB
    Free (estimated):             28.49GiB      (min: 19.66GiB)
    Free (statfs, df):            27.99GiB
    Data ratio:                       2.00
    Metadata ratio:                   3.00
    Global reserve:                3.25MiB      (used: 0.00B)
    Multiple profiles:                  no

              Data    Metadata  System              
Id Path       RAID1   RAID1C3   RAID1C3  Unallocated
-- ---------- ------- --------- -------- -----------
 1 /dev/loop0 2.00GiB         -        -     8.00GiB
 2 /dev/loop1 1.00GiB         -        -     9.00GiB
 3 /dev/loop2 1.00GiB         -        -     9.00GiB
 4 /dev/loop3       -   1.00GiB  8.00MiB     8.99GiB
 5 /dev/loop4       -   1.00GiB  8.00MiB     8.99GiB
 6 /dev/loop5       -   1.00GiB  8.00MiB     8.99GiB
-- ---------- ------- --------- -------- -----------
   Total      2.00GiB   1.00GiB  8.00MiB    52.98GiB
   Used         0.00B 128.00KiB 16.00KiB            

It is possible to convert between any combination of profiles, with the exception of the mixed profile.
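
The metadata profile can be converted in the same way with the -mconvert filter. The soft modifier skips chunks that already have the target profile, which is useful when resuming an interrupted conversion. A sketch using the filesystem from above:

# btrfs balance start -mconvert=raid1c4 /media/my-btrfs/
# btrfs balance start -dconvert=raid1,soft /media/my-btrfs/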

Size restrictions with multiple devices

[Image: The Btrfs space calculator showing three disks in a RAID1 profile.]

All profiles can be used on a multi-device filesystem, and Btrfs allows mixing disks of varying sizes, even in RAID profiles.

However, depending on the profile used, not all space may be available for data. For example, if you have a RAID1 filesystem with two 3TiB drives and add a third 8TiB drive, the total usable space will be 6TiB, while 2TiB will remain unusable.
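
For RAID1 (two copies on separate devices), a rough rule of thumb is that the usable space is limited both by the total raw capacity and by how much of the largest device can be mirrored onto the remaining ones. For the example above:

usable = min(total / 2, total - largest)
       = min(14TiB / 2, 14TiB - 8TiB)
       = 6TiB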

Use the excellent btrfs disk usage calculator to evaluate space efficiency with different sized disks.

RAID or backups

It's important to distinguish RAID from backups. While RAID with redundancy protects against hardware failures, backups protect against data loss from accidental deletion, software errors, or malicious attacks. Both have their place in data protection strategies.

To ensure data safety, the rule of thumb is to ALWAYS HAVE BACKUPS.

Generally speaking, RAID protects against downtime (it reduces the time to get the computer back online) if a device fails, while backups protect against data loss.

Btrfs send/receive is an efficient method for creating backups of subvolumes.
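
A minimal sketch of this approach, assuming a subvolume named data and a second Btrfs filesystem mounted at /media/backup (both placeholders). btrfs send requires a read-only snapshot:

# btrfs subvolume snapshot -r /media/my-btrfs/data /media/my-btrfs/data.snap
# btrfs send /media/my-btrfs/data.snap | btrfs receive /media/backup/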

Self-Healing with Redundant Profiles

Btrfs is designed with advanced features that contribute to the filesystem's integrity and resilience. One such feature is its self-healing capability when using redundant profiles such as DUP and RAID1/1c3/1c4/10/5/6. This self-healing mechanism is an automatic process that takes place during normal filesystem operations.

How Self-Healing Works

When Btrfs is configured with redundant profiles, multiple copies of data and metadata are stored across different devices. If a read error occurs, Btrfs detects it through its checksum verification process. The filesystem will then attempt to read from another copy of the corrupted data. If a valid copy is found, Btrfs automatically replaces the corrupted block with the good one, effectively healing the data on-the-fly.

While Btrfs can continue to operate on a device with damaged sectors by using the redundancy provided by profiles like DUP and RAID, this is not a long-term solution. Continuously using a failing disk may lead to further degradation and eventual data loss that cannot be recovered by self-healing. Therefore, while Btrfs can mitigate the impact of disk errors to an extent, it's advisable to replace faulty hardware as soon as possible to maintain the integrity and reliability of the filesystem.
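
The per-device error counters can be checked with btrfs device stats, which reports read, write, flush, corruption and generation errors and can help in deciding when a device should be replaced:

# btrfs device stats /media/my-btrfs/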

Regular Maintenance with Btrfs Scrub

Btrfs includes a maintenance utility called scrub that is designed to proactively detect and rectify silent data corruption, also known as bit rot. The scrub process reads all data on the filesystem and verifies it against the stored checksums. If it finds any discrepancies, it attempts to automatically repair the data using a valid copy from a redundant profile.
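
Starting a scrub and checking on its progress, using the filesystem from the earlier examples:

# btrfs scrub start /media/my-btrfs/
# btrfs scrub status /media/my-btrfs/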

For more detailed options and information about this command, refer to the btrfs-scrub manual page or the Btrfs/Scrub wiki page.