Btrfs/Replacing a disk

From Forza's ramblings


Replacing a disk in a btrfs filesystem

[Image: an HDD whose read/write head has crashed onto the platter]

btrfs replace replaces an existing disk by copying all data, block by block, from the old disk to the new one. This method is much easier and faster than recreating the filesystem and migrating the data with btrfs send|receive.

Prepare the new disk

Before you start, you should prepare the new disk. Even though Btrfs supports raw disks, it is recommended to put a partition table on the disk to avoid confusion with other filesystems and tools. You can use fdisk or cfdisk from the util-linux package, or GNU parted, to create a partition table and a partition. I recommend creating a GUID Partition Table (GPT) instead of the old DOS/MBR-style partition table.

Replacing with an equal-sized or larger disk

The most common case is when you want to replace a disk with a new disk of equal or larger size.

First you need to check which devid the old disk has:

# btrfs filesystem show /mnt/my-vault/
Label: 'my-vault'  uuid: df68a30d-d26e-4b9c-9606-a130e66ce63d
        Total devices 1 FS bytes used 658.88GiB
        devid    1 size 931.51GiB used 667.02GiB path /dev/sdc1 
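If you want to script this lookup, the devid for a given device path can be extracted from the show output with awk. A sketch; the mount point and device path are the ones from the example above:

```shell
# Print the devid of the line whose last field matches the device path.
# /mnt/my-vault and /dev/sdc1 are the names used in the example above.
btrfs filesystem show /mnt/my-vault \
  | awk -v dev=/dev/sdc1 '$1 == "devid" && $NF == dev { print $2 }'
```

The same one-liner works on multi-device filesystems, since every devid line ends with its device path.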

Now you can start the replacing process.

# btrfs replace start <id> <new-disk> <mount-point>
# btrfs replace start 1   /dev/sdd   /mnt/my-vault/

This will copy all data from the old device (/dev/sdc1) to the new disk (/dev/sdd). When it completes (see status monitoring) you can physically remove the old disk from your system.

If the new disk is larger than the old one, you also need to resize the filesystem to take advantage of the new size:

# btrfs filesystem resize 1:max /mnt/my-vault

Replacing a disk in a RAID array

The process of replacing a disk in a multi-disk filesystem works the same way.

Consider this 4-disk RAID-1 filesystem:

# btrfs filesystem show /mnt/raid1/
Label: none  uuid: 3d7a895a-445e-400e-acbf-fb952e532fed
        Total devices 4 FS bytes used 3.88GiB
        devid    1 size 8.00GiB used 3.00GiB path /dev/nvme0n1
        devid    2 size 8.00GiB used 3.00GiB path /dev/nvme0n2
        devid    3 size 8.00GiB used 2.26GiB path /dev/nvme0n3
        devid    4 size 8.00GiB used 2.26GiB path /dev/nvme0n4

To replace one of the disks, simply use the same command as before:

# btrfs replace start <id> <new-disk> <mount-point>
# btrfs replace start 1 /dev/nvme0n5 /mnt/raid1/

Once replacing is completed (see status monitoring) you can use btrfs filesystem show to see the new layout.

# btrfs filesystem show /mnt/raid1/
Label: none  uuid: 3d7a895a-445e-400e-acbf-fb952e532fed
        Total devices 4 FS bytes used 3.88GiB
        devid    1 size 8.00GiB used 3.00GiB path /dev/nvme0n5
        devid    2 size 8.00GiB used 3.00GiB path /dev/nvme0n2
        devid    3 size 8.00GiB used 2.26GiB path /dev/nvme0n3
        devid    4 size 8.00GiB used 2.26GiB path /dev/nvme0n4

Replacing with a smaller disk

btrfs replace can only replace a disk with one of equal or larger size. If your new disk is smaller than the disk you intend to replace, you need to shrink the filesystem before you can attempt the replacement.

First check which devid and size the old disk has:

# btrfs filesystem show /mnt/my-vault/
Label: 'my-vault'  uuid: df68a30d-d26e-4b9c-9606-a130e66ce63d
        Total devices 1 FS bytes used 658.88GiB
        devid    1 size 931.51GiB used 667.02GiB path /dev/sdc1

For example, if the new disk is only 800GiB, we need to resize the filesystem on /dev/sdc1 to less than that:

# btrfs filesystem resize <id>:<size> <mount-point>
# btrfs filesystem resize  1:799GiB   /mnt/my-vault

Now you can start the replacing process.

# btrfs replace start <id> <new-disk> <mount-point>
# btrfs replace start  1   /dev/sdd   /mnt/my-vault/

This will copy all data from the old device (/dev/sdc1) to the new disk (/dev/sdd). Once the replace is completed (see status monitoring) you can physically remove the old disk from your system. Finish by making sure the filesystem uses all space on the new disk:

# btrfs filesystem resize <id>:<size> <mount-point>
# btrfs filesystem resize 1:max /mnt/my-vault

The special max keyword ensures btrfs uses all available space on the disk.

Replacing a failed disk

Replacing a failed disk in a RAID array can be done in two ways, depending on your situation. It is advised that you ask for help if you are unsure about how to repair your failing filesystem: https://wiki.tnonline.net/w/Category:Btrfs#Help

Disk is online but is having errors

If the disk is still online, you can use btrfs replace start with the -r option to avoid reading from the failing disk unless necessary.

# btrfs replace start --help
-r     only read from <srcdev> if no other zero-defect mirror exists
       (enable this if your drive has lots of read errors, the access
       would be very slow)
# btrfs filesystem show /mnt/raid10/
Label: none  uuid: 3d7a895a-445e-400e-acbf-fb952e532fed
       Total devices 4 FS bytes used 3.88GiB
       devid    2 size 8.00GiB used 2.28GiB path /dev/nvme0n2
       devid    3 size 8.00GiB used 2.28GiB path /dev/nvme0n3
       devid    4 size 8.00GiB used 2.28GiB path /dev/nvme0n4
       devid    5 size 8.00GiB used 2.28GiB path /dev/nvme0n5


# btrfs replace start -r <id> <new-disk> <mount-point>
# btrfs replace start -r 5 /dev/nvme0n5 /mnt/raid10/

Replace continues in the background. See the status monitoring chapter.

# btrfs filesystem show /mnt/raid10/
Label: none  uuid: 3d7a895a-445e-400e-acbf-fb952e532fed
       Total devices 4 FS bytes used 3.88GiB
       devid    2 size 8.00GiB used 2.28GiB path /dev/nvme0n2
       devid    3 size 8.00GiB used 2.28GiB path /dev/nvme0n3
       devid    4 size 8.00GiB used 2.28GiB path /dev/nvme0n4
       devid    5 size 8.00GiB used 2.28GiB path /dev/nvme0n5

Disk is dead or removed from the system

If the filesystem is not mounted, you need to mount your disk with -o degraded, as the kernel won't mount a filesystem if some disks are missing.

# mount /dev/nvme0n2 /mnt/raid10/
mount: /mnt/raid10: wrong fs type, bad option, bad superblock on /dev/nvme0n2, missing codepage or helper program, or other error.

We can see in dmesg why the mount command failed.

# dmesg
[   31.695616] BTRFS error (device nvme0n2): devid 1 uuid b5d75e11-3262-48ef-8224-754290ebe0cd is missing
[   31.695620] BTRFS error (device nvme0n2): failed to read the system array: -2
[   31.696105] BTRFS error (device nvme0n2): open_ctree failed
# mount /dev/nvme0n2 /mnt/raid10/ -o degraded
# btrfs filesystem show /mnt/raid10/
 Label: none  uuid: 3d7a895a-445e-400e-acbf-fb952e532fed
       Total devices 4 FS bytes used 3.88GiB
       devid    2 size 8.00GiB used 2.28GiB path /dev/nvme0n2
       devid    3 size 8.00GiB used 2.28GiB path /dev/nvme0n3
       devid    4 size 8.00GiB used 2.28GiB path /dev/nvme0n4
       *** Some devices missing

First you need to find out the devid of the missing disk.

# btrfs device usage /mnt/raid10
/dev/nvme0n2, ID: 2
  Device size:             8.00GiB
  Device slack:              0.00B
  Data,RAID10:             4.00GiB
  Metadata,RAID10:       256.00MiB
  System,RAID10:          32.00MiB
  Unallocated:             3.72GiB

/dev/nvme0n3, ID: 3
  Device size:             8.00GiB
  Device slack:              0.00B
  Data,RAID10:             4.00GiB
  Metadata,RAID10:       256.00MiB
  System,RAID10:          32.00MiB
  Unallocated:             3.72GiB

missing, ID: 4
  Device size:               0.00B
  Device slack:              0.00B
  Data,RAID10:             4.00GiB
  Metadata,RAID10:       256.00MiB
  System,RAID10:          32.00MiB
  Unallocated:             3.72GiB

/dev/nvme0n1, ID: 5
  Device size:             8.00GiB
  Device slack:              0.00B
  Data,RAID10:             4.00GiB
  Metadata,RAID10:       256.00MiB
  System,RAID10:          32.00MiB
  Unallocated:             3.72GiB

Then we can replace the missing device (devid 4):

# btrfs replace start 4 /dev/nvme0n4 /mnt/raid10/

Replace continues in the background. See the status monitoring chapter.

# btrfs filesystem show /mnt/raid10/
Label: none  uuid: 3d7a895a-445e-400e-acbf-fb952e532fed
       Total devices 4 FS bytes used 3.88GiB
       devid    2 size 8.00GiB used 2.41GiB path /dev/nvme0n2
       devid    3 size 8.00GiB used 3.16GiB path /dev/nvme0n3
       devid    4 size 8.00GiB used 2.16GiB path /dev/nvme0n4
       devid    5 size 8.00GiB used 3.22GiB path /dev/nvme0n1

IMPORTANT: Writes that happened while the filesystem was mounted degraded may have created chunks with the wrong (e.g. single) profile, so we need to run a balance that converts those chunks back to the correct RAID profile.

In this example we are using the RAID10 profile. The soft filter limits the balance to chunks that do not already have the target profile:

# btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft /mnt/raid10/
Done, had to relocate 4 out of 8 chunks

Status monitoring

On a large disk, a replacement can take several hours. It is possible to monitor the status using btrfs replace status <mount-point>.

# btrfs replace status /mnt/my-vault
Started on 24.Jul 11:02:41, finished on 24.Jul 11:41:51, 0 write errs, 0 uncorr. read errs
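If a script needs to block until the replace finishes, a simple poll using the one-shot -1 flag could look like this. A sketch; the mount point is the one from the example above:

```shell
# Poll every 30 seconds until the status line reports "finished".
# Note: without -1, `btrfs replace status` itself keeps printing until
# the replace is done, so this loop is mainly useful for scripting.
while ! btrfs replace status -1 /mnt/my-vault | grep -q finished; do
    sleep 30
done
```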

Reference

The btrfs-replace reference manual can be found at https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-replace