Btrfs/Profiles


Btrfs profiles (RAID modes)

Btrfs supports various allocation profiles, which determine how data is written to the disks in the filesystem. With the exception of the MIXED profile, it is possible to use different profiles for metadata chunks and normal data chunks.

On a single disk filesystem the default is SINGLE profile for data and DUP for metadata chunks.

With btrfs-progs 5.14 and earlier, the defaults differed between flash-based media (NVMe and SSD) and normal rotational HDDs: DUP metadata on HDDs and SINGLE metadata on NVMe and SSD. Since version 5.15 the default is DUP metadata on all device types, because this increases the reliability and resiliency of the filesystem.

Currently supported profiles

| Profile | Description | Disks | Space efficiency | Resiliency |
|---------|-------------|-------|------------------|------------|
| SINGLE | For single disks or for spanned volumes (a.k.a. Just a Bunch Of Drives, JBOD). | 1 disk or more | 100% | None |
| MIXED* | Combines metadata and data chunks into one. Useful for very small devices. Can be used on multiple devices. | 1 disk or more | 100% | None |
| DUP* | DUP means duplicate. This ensures two copies exist on the same disk. Can be used on one or several drives like SINGLE mode, but does not protect against disk failures. | 1 disk or more | 50% | Some (*) |
| RAID0 | Similar to SINGLE, but with data allocated in parallel stripes on all drives. Can increase performance in some workloads. | 2 disks or more | 100% | None |
| RAID1 | Like DUP, but stores each of the 2 copies on separate disks. | 2 disks or more | 50% | 1 disk failure |
| RAID1c3 | Stores 3 copies on separate disks. | 3 disks or more | 33.3% | 2 disk failures |
| RAID1c4 | Stores 4 copies on separate disks. | 4 disks or more | 25% | 3 disk failures |
| RAID10 | A combination of RAID1+RAID0 modes for increased performance and redundancy. | 4 disks or more | 50% | 1 disk failure |
| RAID5* | A striped mode with 1 disk as redundancy. Can increase performance in some workloads. | 3 disks or more | (N-1)/N | 1 disk failure |
| RAID6* | A striped mode with 2 disks as redundancy. Can increase performance in some workloads. | 4 disks or more | (N-2)/N | 2 disk failures |

* Mixed mode combines data and metadata in the same block groups. It can only be set when creating the filesystem with mkfs.btrfs and cannot be changed afterwards.
* DUP mode protects against data or metadata corruption, but not disk failures.
* RAID 5/6 modes are not yet stable or suitable for production use.
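
The space efficiency figures in the table can be reproduced with a little arithmetic. The following is a minimal Python sketch, not part of btrfs-progs: the `PROFILES` table and the `space_efficiency` function are names made up for this illustration, the values are transcribed from the table above, and equally sized disks are assumed.

```python
from fractions import Fraction

# (copies, parity_devices) per profile, transcribed from the table above.
# Illustrative names only; this is not a btrfs-progs API.
PROFILES = {
    "single": (1, 0),
    "dup": (2, 0),
    "raid0": (1, 0),
    "raid1": (2, 0),
    "raid1c3": (3, 0),
    "raid1c4": (4, 0),
    "raid10": (2, 0),
    "raid5": (1, 1),
    "raid6": (1, 2),
}


def space_efficiency(profile, num_disks):
    """Return the usable fraction of raw space, assuming equal-sized disks."""
    copies, parity = PROFILES[profile]
    if parity:
        # Striped parity profiles: (N - parity) / N.
        return Fraction(num_disks - parity, num_disks)
    # Replicated profiles: one usable unit for every `copies` units written.
    return Fraction(1, copies)


for name, disks in [("raid1", 2), ("raid1c3", 3), ("raid5", 3), ("raid6", 6)]:
    print(f"{name} on {disks} disks: {float(space_efficiency(name, disks)):.1%}")
```

Note that the parity profiles become more space efficient as disks are added, while the replicated profiles stay at a fixed ratio regardless of the number of disks.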

Choosing profile at mkfs time

It is possible to choose data and metadata profiles with mkfs.btrfs.

# mkfs.btrfs --help
Usage: mkfs.btrfs [options] dev [ dev ... ]
Options:
  allocation profiles:
        -d|--data PROFILE           data profile, raid0, raid1, raid1c3, raid1c4, raid5, raid6, raid10, dup or single
        -m|--metadata PROFILE       metadata profile, values like for data profile
        -M|--mixed                  mix metadata and data together

Here we create a 6 device Btrfs filesystem with RAID10 data profile and RAID1c3 metadata profile:

# mkfs.btrfs -mraid1c3 -draid10 disk1 disk2 disk3 disk4 disk5 disk6 -L my-btrfs
btrfs-progs v5.16.2 
See http://btrfs.wiki.kernel.org for more information.

NOTE: several default settings have changed in version 5.15, please make sure
      this does not affect your deployments:
      - DUP for metadata (-m dup)
      - enabled no-holes (-O no-holes)
      - enabled free-space-tree (-R free-space-tree)

Label:              my-btrfs
UUID:               ebc53cec-8ec1-42c6-8e30-9ca0cea7c2a9
Node size:          16384
Sector size:        4096
Filesystem size:    60.00GiB
Block group profiles:
  Data:             RAID10            3.00GiB
  Metadata:         RAID1C3           1.00GiB
  System:           RAID1C3           8.00MiB
SSD detected:       no
Zoned device:       no
Incompat features:  extref, skinny-metadata, no-holes, raid1c34
Runtime features:   free-space-tree
Checksum:           crc32c
Number of devices:  6
Devices:
   ID        SIZE  PATH
    1    10.00GiB  disk1
    2    10.00GiB  disk2
    3    10.00GiB  disk3
    4    10.00GiB  disk4
    5    10.00GiB  disk5
    6    10.00GiB  disk6

Changing profile on an existing filesystem

It is possible to change both the data and metadata profiles on an existing Btrfs filesystem while it is mounted, using the convert filters of btrfs balance.

Using btrfs filesystem usage we can see what profiles are used and how much utilisation they have.

# btrfs filesystem usage -T /media/my-btrfs/
Overall:
    Device size:                  60.00GiB
    Device allocated:              9.02GiB
    Device unallocated:           50.98GiB
    Device missing:                  0.00B
    Used:                        816.00KiB
    Free (estimated):             28.49GiB      (min: 19.99GiB)
    Free (statfs, df):            28.49GiB
    Data ratio:                       2.00
    Metadata ratio:                   3.00
    Global reserve:                3.25MiB      (used: 0.00B)
    Multiple profiles:                  no

              Data    Metadata  System              
Id Path       RAID10  RAID1C3   RAID1C3  Unallocated
-- ---------- ------- --------- -------- -----------
 1 /dev/loop0 1.00GiB         -        -     9.00GiB
 2 /dev/loop1 1.00GiB         -        -     9.00GiB
 3 /dev/loop2 1.00GiB         -        -     9.00GiB
 4 /dev/loop3 1.00GiB   1.00GiB  8.00MiB     7.99GiB
 5 /dev/loop4 1.00GiB   1.00GiB  8.00MiB     7.99GiB
 6 /dev/loop5 1.00GiB   1.00GiB  8.00MiB     7.99GiB
-- ---------- ------- --------- -------- -----------
   Total      3.00GiB   1.00GiB  8.00MiB    50.98GiB
   Used         0.00B 256.00KiB 16.00KiB        

To change the data profile to RAID1, we issue btrfs balance start with the -dconvert filter.

# btrfs balance start -dconvert=raid1 --background /media/my-btrfs/
Done, had to relocate 1 out of 3 chunks

Converting profiles can take a very long time since all data on disk has to be re-written. It is possible to monitor an ongoing balance using btrfs balance status:

# btrfs balance status /media/my-btrfs
Balance on '/media/my-btrfs/' is running
0 out of about 5 chunks balanced (1 considered), 100% left
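
For scripted monitoring, the status line can be turned into numbers. This is a rough Python sketch whose regular expression is modelled only on the sample output shown above; other btrfs-progs versions may word the line differently, and `parse_balance_status` is a hypothetical helper name.

```python
import re

# Pattern modelled on the sample status line above (assumption: other
# btrfs-progs versions may format this line differently).
STATUS_RE = re.compile(
    r"(?P<done>\d+) out of about (?P<total>\d+) chunks balanced "
    r"\((?P<considered>\d+) considered\),\s+(?P<left>\d+)% left"
)


def parse_balance_status(text):
    """Return the progress counters as a dict of ints, or None if no match."""
    match = STATUS_RE.search(text)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


sample = (
    "Balance on '/media/my-btrfs/' is running\n"
    "0 out of about 5 chunks balanced (1 considered), 100% left\n"
)
print(parse_balance_status(sample))
```

In practice the text would come from running btrfs balance status as root, for example via `subprocess.run`, which is left out here so the sketch runs anywhere.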

When balance is finished we can see that the allocation has changed:

# btrfs filesystem usage -T /media/my-btrfs/
Overall:
    Device size:                  60.00GiB
    Device allocated:              7.02GiB
    Device unallocated:           52.98GiB
    Device missing:                  0.00B
    Used:                        432.00KiB
    Free (estimated):             28.49GiB      (min: 19.66GiB)
    Free (statfs, df):            27.99GiB
    Data ratio:                       2.00
    Metadata ratio:                   3.00
    Global reserve:                3.25MiB      (used: 0.00B)
    Multiple profiles:                  no

              Data    Metadata  System              
Id Path       RAID1   RAID1C3   RAID1C3  Unallocated
-- ---------- ------- --------- -------- -----------
 1 /dev/loop0 2.00GiB         -        -     8.00GiB
 2 /dev/loop1 1.00GiB         -        -     9.00GiB
 3 /dev/loop2 1.00GiB         -        -     9.00GiB
 4 /dev/loop3       -   1.00GiB  8.00MiB     8.99GiB
 5 /dev/loop4       -   1.00GiB  8.00MiB     8.99GiB
 6 /dev/loop5       -   1.00GiB  8.00MiB     8.99GiB
-- ---------- ------- --------- -------- -----------
   Total      2.00GiB   1.00GiB  8.00MiB    52.98GiB
   Used         0.00B 128.00KiB 16.00KiB            

It is possible to convert between any profile combinations, with the exception of the mixed profile.
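
Data and metadata can be converted in one balance run by combining the -dconvert filter shown earlier with its metadata counterpart, -mconvert. The Python sketch below only builds and prints such a command line; `balance_convert_argv` is a hypothetical helper, and the list of profile names is copied from the mkfs.btrfs help text above.

```python
# Profile names as listed in the mkfs.btrfs help text above.
VALID_PROFILES = {
    "raid0", "raid1", "raid1c3", "raid1c4",
    "raid5", "raid6", "raid10", "dup", "single",
}


def balance_convert_argv(mountpoint, data=None, metadata=None):
    """Build the argument list for a profile conversion (hypothetical helper)."""
    if data is None and metadata is None:
        raise ValueError("specify a data and/or metadata target profile")
    argv = ["btrfs", "balance", "start"]
    for flag, profile in (("-dconvert", data), ("-mconvert", metadata)):
        if profile is None:
            continue
        if profile not in VALID_PROFILES:
            raise ValueError(f"unknown profile: {profile}")
        argv.append(f"{flag}={profile}")
    argv.append(mountpoint)
    return argv


# Only prints the command; actually running it requires root privileges.
print(" ".join(balance_convert_argv("/media/my-btrfs/", data="raid1", metadata="raid1c3")))
```

Validating the profile name before calling btrfs catches typos early, but it does not check whether the filesystem has enough devices for the target profile.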

Choosing a profile

There are many different use cases for the different profiles, and it is difficult to give in-depth suggestions on which ones to use.

Generally, always use a redundant profile such as DUP or RAID1 for metadata, even if you use SINGLE or RAID0 for data. This protects the filesystem from many forms of irreparable damage if the metadata becomes corrupted.

Many people choose one level of redundancy higher for metadata than for data, in order to better protect the filesystem if a device fails and some additional corruption occurs. One example is RAID1c3 metadata with a RAID1 data profile (requires at least three devices).

Striped profiles such as RAID0 and RAID10 have higher sequential read and write performance than the other profiles. They may benefit other workloads too, but not always. It is best to benchmark the specific use case.

On small devices (below 16GiB) it can be worth considering the MIXED profile, as this avoids the complications caused by separate data and metadata block groups, which can lead to an ENOSPC (out of space) situation.

It is not possible to recover from a missing or damaged device if no redundant copies are available. With RAID1, only one device can be missing, even if the filesystem consists of many devices. The resiliency column in the table above shows how many devices in total can be lost without fatal filesystem errors.
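
The interaction between the data and metadata profiles can be illustrated with a small lookup. This is a simplified Python sketch built on the resiliency column of the table above. `TOLERATED_FAILURES`, `survives` and `filesystem_tolerance` are made-up names, and the model assumes that chunks of a profile become unreadable as soon as more devices are lost than that profile tolerates.

```python
# Device failures each profile tolerates, transcribed from the table above.
TOLERATED_FAILURES = {
    "single": 0, "dup": 0, "raid0": 0,
    "raid1": 1, "raid1c3": 2, "raid1c4": 3,
    "raid10": 1, "raid5": 1, "raid6": 2,
}


def survives(profile, failed_devices):
    """True if chunks of this profile stay readable after the failures."""
    return failed_devices <= TOLERATED_FAILURES[profile]


def filesystem_tolerance(data_profile, metadata_profile):
    """Number of failures after which both data and metadata stay readable."""
    return min(TOLERATED_FAILURES[data_profile],
               TOLERATED_FAILURES[metadata_profile])


# RAID1 data with RAID1c3 metadata, as in the example above.
print(filesystem_tolerance("raid1", "raid1c3"))
print(survives("raid1c3", 2), survives("raid1", 2))
```

With RAID1 data and RAID1c3 metadata, a second device failure still leaves the metadata intact in this model, but some data would be lost, so the combination as a whole tolerates only one failure without data loss.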

Size restrictions with multiple devices

It is possible to use all profiles on a multi-device filesystem. Btrfs also allows for the use of different sized disks, even in RAID profiles.

Depending on the profile used, not all of the added space may be available for data. For example, if you have a RAID1 filesystem with two 3TiB drives and you add a third 8TiB drive, the total usable space will be 6TiB, while 2TiB on the largest drive will remain unused.
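
The 6TiB figure can be checked with a small simulation. The Python sketch below is a simplified model, not the real allocator: it assumes each chunk is one whole unit in size and that every copy is placed on the devices with the most free space. `simulate_mirrored_allocation` is a hypothetical name used only for this illustration.

```python
def simulate_mirrored_allocation(disk_sizes, copies=2):
    """Return (usable, unallocatable) in the same whole units as disk_sizes.

    Simplified model: each chunk is one unit, and each of its `copies`
    goes to a different device, picking those with the most free space.
    """
    free = list(disk_sizes)
    usable = 0
    while True:
        # Device indices ordered by remaining free space, largest first.
        order = sorted(range(len(free)), key=lambda i: free[i], reverse=True)
        targets = order[:copies]
        # Stop when fewer than `copies` devices still have room for a chunk.
        if len(targets) < copies or free[targets[-1]] < 1:
            break
        for i in targets:
            free[i] -= 1
        usable += 1
    return usable, sum(free)


# Two 3TiB drives plus one 8TiB drive in RAID1 (two copies), sizes in TiB.
usable, unused = simulate_mirrored_allocation([3, 3, 8], copies=2)
print(f"usable: {usable}TiB, unused: {unused}TiB")
```

Passing `copies=3` or `copies=4` gives a rough estimate for RAID1c3 and RAID1c4 under the same assumptions.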

Use the excellent btrfs disk usage calculator to see how efficient the disk space usage would be.

The Btrfs space calculator showing three disks in a RAID1 profile.

RAID or backups

A common mistake is to confuse RAID with backups. Both have their natural uses, and it is important to carefully consider each option.

If you are unsure, the safe suggestion is to ALWAYS HAVE BACKUPS.