Btrfs/Replacing a disk
Replacing a disk in a btrfs filesystem
btrfs replace replaces an existing disk by copying all data from the old disk to the new one. It is similar to a dd clone, but works at the filesystem level. btrfs replace is the preferred method of replacing a disk in a btrfs filesystem, especially when there is a damaged or missing device. While btrfs device add + btrfs device remove also works, it is a much slower method and can cause issues if there are read/write errors. btrfs replace is not only faster, it also handles failures and errors better.
Btrfs replace should be used on a mounted filesystem. If you have a missing disk in a filesystem with a redundant RAID profile, you can mount the filesystem using the degraded mount option.

# mount -o degraded /dev/sdb1 /mnt/btrfs

Run systemctl daemon-reload before mounting the filesystem in degraded mode.

Prepare the new disk
Before you start replacing the old disk, you should prepare the new disk.
Even though Btrfs supports raw disks, it is recommended that you have a partition table on your disk to avoid confusion with other filesystems and tools. You can use fdisk or cfdisk from the util-linux package, or GNU parted, to create a partition table and a partition to hold your new btrfs filesystem. Use a GUID Partition Table (GPT) instead of the old DOS MBR style partition table. GPT supports disks larger than 2TiB and has a backup copy.
If you have an NVMe or SSD disk, it is good practice to empty it using blkdiscard. Discard tells the drive's firmware that the disk is empty, which improves its performance and wear. Do this before you create any partition tables, as it erases everything on the disk.
Here is a basic example of how to use GNU parted to create a GPT partition table and one partition that fills the whole device /dev/nvme0n4.
1) First we issue blkdiscard to clear the new disk.
# blkdiscard /dev/nvme0n4 -v
/dev/nvme0n4: Discarded 10737418240 bytes from the offset 0
2) Then we create a new partition table using parted.
# parted /dev/nvme0n4
GNU Parted 3.4
Using /dev/nvme0n4
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt
(parted) mkpart primary btrfs 4MiB 100%
(parted) print
Model: ORCL-VBOX-NVME-VER12 (nvme)
Disk /dev/nvme0n4: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name     Flags
 1      4194kB  10.7GB  10.7GB  btrfs        primary

(parted) quit
Information: You may need to update /etc/fstab.
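The interactive session above can also be scripted non-interactively with parted -s. Below is a minimal sketch under the same assumptions as the example (device /dev/nvme0n4, 4MiB-aligned first partition). Because these commands are destructive, the sketch only prints them unless you set DRY_RUN=0.

```shell
# Sketch: prepare a new disk for btrfs (discard + GPT + one partition).
# WARNING: destructive when DRY_RUN=0. The default DRY_RUN=1 only prints
# the commands so you can review them first.
DEV="${1:-/dev/nvme0n4}"     # assumed device path, taken from the example
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run blkdiscard -v "$DEV"                             # tell the firmware the disk is empty
run parted -s "$DEV" mklabel gpt                     # GPT partition table
run parted -s "$DEV" mkpart primary btrfs 4MiB 100%  # one partition, whole disk
```

Only run it with DRY_RUN=0 once you are certain the device path is the new, empty disk.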
Replacing with equal sized or a larger disk
The most common case is when you want to replace a disk with a new disk of equal or larger size.
First you need to check what devid the old disk has:
# btrfs filesystem show
Label: 'my-vault'  uuid: df68a30d-d26e-4b9c-9606-a130e66ce63d
        Total devices 1 FS bytes used 658.88GiB
        devid    1 size 931.51GiB used 667.02GiB path /dev/sdc1
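If you want to script this lookup, the devid for a given device path can be pulled out of the btrfs filesystem show output with awk. A minimal sketch; to keep it self-contained it parses a copy of the sample devid line above rather than live command output:

```shell
# Extract the devid for a device path from `btrfs filesystem show` output.
# On a live system you would pipe the real command output in; here we use
# the sample line from above so the sketch is self-contained.
show_output="    devid    1 size 931.51GiB used 667.02GiB path /dev/sdc1"

devid_for() {  # $1 = device path; stdin = `btrfs filesystem show` output
    awk -v dev="$1" '$1 == "devid" && $NF == dev { print $2 }'
}

echo "$show_output" | devid_for /dev/sdc1    # prints: 1
```

On a real system: btrfs filesystem show | devid_for /dev/sdc1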
Now you can start the replacing process.
# btrfs replace start <id> <new-disk> <mount-point>
# btrfs replace start 1 /dev/sdd1 /mnt/my-vault/
This will move all data from the old disk /dev/sdc1 to the new disk /dev/sdd1.
Replace continues in the background. When it is complete, you can use the old disk for other purposes or remove it from the system. See the status monitoring chapter for checking the progress.
If the new disk is larger than the old, you need to resize the filesystem to take advantage of the new size:
# btrfs filesystem resize 1:max /mnt/my-vault
The special max keyword ensures that Btrfs uses all available space on the disk.
Replacing a disk in a RAID array
The process of replacing a disk in a multi-disk filesystem works the same way.
Consider this 4-disk RAID-1 filesystem:
# btrfs filesystem show /mnt/raid1/
Label: 'vault'  uuid: da2028f1-377d-4100-849b-29e39b869137
        Total devices 4 FS bytes used 6.19GiB
        devid    1 size 8.00GiB used 3.25GiB path /dev/nvme0n1
        devid    2 size 8.00GiB used 3.25GiB path /dev/nvme0n2
        devid    3 size 8.00GiB used 3.26GiB path /dev/nvme0n3
        devid    4 size 8.00GiB used 3.26GiB path /dev/nvme0n4
To replace /dev/nvme0n1 use the same command as before:
# btrfs replace start <id> <new-disk> <mount-point>
# btrfs replace start 1 /dev/nvme0n5 /mnt/raid1/
Once replacing is completed (see status monitoring), you can use btrfs filesystem show to see the new layout.
# btrfs filesystem show /mnt/raid1/
Label: 'vault'  uuid: da2028f1-377d-4100-849b-29e39b869137
        Total devices 4 FS bytes used 6.19GiB
        devid    1 size 8.00GiB used 3.25GiB path /dev/nvme0n5
        devid    2 size 8.00GiB used 3.25GiB path /dev/nvme0n2
        devid    3 size 8.00GiB used 3.26GiB path /dev/nvme0n3
        devid    4 size 8.00GiB used 3.26GiB path /dev/nvme0n4
If the new disk is larger than the old, you need to resize the filesystem to take advantage of the new size:
# btrfs filesystem resize 1:max /mnt/raid1
The special max keyword ensures that Btrfs uses all available space on the disk.
Replacing with a smaller disk
btrfs replace can only replace a disk with one of equal or larger size. If your new disk is smaller than the disk you intend to replace, you need to shrink the filesystem before you can attempt a replacement.
You need to determine what devid and size the old disk has:
# btrfs filesystem show
Label: 'my-vault'  uuid: df68a30d-d26e-4b9c-9606-a130e66ce63d
        Total devices 1 FS bytes used 658.88GiB
        devid    1 size 931.51GiB used 667.02GiB path /dev/sdc1
For example, if the new disk is only 800GiB, we need to resize /dev/sdc1 to less than that:
# btrfs filesystem resize <id>:<size> <mount-point>
# btrfs filesystem resize 1:799GiB /mnt/my-vault
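A shrink target like the 799GiB above can be computed from the new disk's capacity with a small safety margin. A sketch of the arithmetic; the byte count is hard-coded to a hypothetical 800GiB value here, where on a live system you would query it with blockdev --getsize64:

```shell
# Compute a shrink target one GiB below the new disk's capacity.
# On a real system: new_bytes=$(blockdev --getsize64 /dev/sdd1)
new_bytes=$((800 * 1024 * 1024 * 1024))   # hypothetical 800GiB disk

gib=$((1024 * 1024 * 1024))
target_gib=$(( new_bytes / gib - 1 ))     # leave a 1GiB margin below capacity

echo "btrfs filesystem resize 1:${target_gib}GiB /mnt/my-vault"
# prints: btrfs filesystem resize 1:799GiB /mnt/my-vault
```

The margin is a conservative choice so the shrunken filesystem is guaranteed to fit on the new disk even if its usable size is slightly below the nominal capacity.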
Now you can start the replacing process.
# btrfs replace start <id> <new-disk> <mount-point>
# btrfs replace start 1 /dev/sdd1 /mnt/my-vault/
This will move all data from the old disk /dev/sdc1 to the new disk /dev/sdd1. When it is complete you can physically remove the old disk from your system.
Once replace is completed (see status monitoring), you should make sure the filesystem uses all space on the new disk.
# btrfs filesystem resize <id>:<size> <mount-point>
# btrfs filesystem resize 1:max /mnt/my-vault
The special max keyword ensures btrfs uses all available space on the disk.
Replacing a failed disk
Replacing a failed disk in a RAID array can be done in two ways depending on your situation. It is advised that you ask for help before attempting to repair a failing filesystem: https://wiki.tnonline.net/w/Category:Btrfs#Help
Disk is online but is having errors
If a disk is having read errors you can use the same process described in chapter 3. btrfs replace has a special option to avoid reading from the failing disk when possible. Reading from disks with bad blocks can be very slow, so this option helps a lot.
# btrfs replace start --help
-r     only read from <srcdev> if no other zero-defect mirror exists
       (enable this if your drive has lots of read errors, the access
       would be very slow)
Example:
# btrfs replace start -r <id> <new-disk> <mount-point>
# btrfs replace start -r 5 /dev/nvme0n5 /mnt/raid10/
Disk is dead or removed from the system
If the filesystem is not mounted, you need to mount it with -o degraded, as the kernel won't mount a filesystem if some disks are missing.
# mount /dev/nvme0n1 /mnt/my-vault/
mount: /mnt/my-vault: wrong fs type, bad option, bad superblock on /dev/nvme0n1, missing codepage or helper program, or other error.
Using dmesg we can see in the kernel log why the mount command failed.
# dmesg
[   39.537920] BTRFS error (device nvme0n1): devid 4 uuid a14e2826-db7b-41cc-a4b9-6f0d599a0c24 is missing
[   39.537925] BTRFS error (device nvme0n1): failed to read the system array: -2
[   39.538465] BTRFS error (device nvme0n1): open_ctree failed
# mount /dev/nvme0n2 /mnt/my-vault/ -o degraded
# btrfs filesystem show /mnt/my-vault
Label: 'vault'  uuid: 7714d5de-5407-4fbe-b356-82bd086f6ded
        Total devices 4 FS bytes used 5.62GiB
        devid    1 size 8.00GiB used 3.20GiB path /dev/nvme0n1
        devid    2 size 8.00GiB used 3.20GiB path /dev/nvme0n2
        devid    3 size 8.00GiB used 3.20GiB path /dev/nvme0n3
        *** Some devices missing
First you need to find out the device ID of the missing disk.
# btrfs device usage /mnt/my-vault/
/dev/nvme0n1, ID: 1
   Device size:             8.00GiB
   Device slack:              0.00B
   Data,RAID10/4:           3.00GiB
   Metadata,RAID10/4:     192.00MiB
   System,RAID10/4:         8.00MiB
   Unallocated:             4.80GiB

/dev/nvme0n2, ID: 2
   Device size:             8.00GiB
   Device slack:              0.00B
   Data,RAID10/4:           3.00GiB
   Metadata,RAID10/4:     192.00MiB
   System,RAID10/4:         8.00MiB
   Unallocated:             4.80GiB

/dev/nvme0n3, ID: 3
   Device size:             8.00GiB
   Device slack:              0.00B
   Data,RAID10/4:           3.00GiB
   Metadata,RAID10/4:     192.00MiB
   System,RAID10/4:         8.00MiB
   Unallocated:             4.80GiB

missing, ID: 4
   Device size:               0.00B
   Device slack:              0.00B
   Data,RAID10/4:           3.00GiB
   Metadata,RAID10/4:     192.00MiB
   System,RAID10/4:         8.00MiB
   Unallocated:             4.80GiB
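When scripting a recovery, the devid of the missing device can be extracted from the btrfs device usage output, since the missing entry starts with the literal word "missing". A sketch, parsing a copy of the sample line above to stay self-contained:

```shell
# Find the devid of a missing device in `btrfs device usage` output.
# Self-contained: parses a copy of the sample line from above instead of
# running btrfs against a live filesystem.
usage_output="missing, ID: 4"

missing_id() {   # stdin = `btrfs device usage <mount>` output
    awk -F': ' '/^missing,/ { print $2 }'
}

echo "$usage_output" | missing_id    # prints: 4
```

On a real system: btrfs device usage /mnt/my-vault/ | missing_id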
Now we can replace the missing device with a new disk.
# btrfs replace start 4 /dev/nvme0n5 /mnt/my-vault
Replace continues in the background. See the status monitoring chapter on how to monitor the progress.
# btrfs filesystem show /mnt/my-vault/
Label: 'vault'  uuid: 7714d5de-5407-4fbe-b356-82bd086f6ded
        Total devices 4 FS bytes used 5.62GiB
        devid    1 size 8.00GiB used 4.48GiB path /dev/nvme0n1
        devid    2 size 8.00GiB used 5.45GiB path /dev/nvme0n2
        devid    3 size 8.00GiB used 4.45GiB path /dev/nvme0n3
        devid    4 size 8.00GiB used 3.20GiB path /dev/sde1
Restoring redundancy after a replaced disk
IMPORTANT: Because btrfs cannot write any data to a missing device, it writes new data to single-profile chunks instead. To restore full redundancy you should run btrfs balance to convert these chunks back to the correct RAID profile.
Use btrfs filesystem usage -T to see how chunks are allocated.
# btrfs fi usage -T /mnt/my-vault/
Overall:
    Device size:                  32.00GiB
    Device allocated:             17.56GiB
    Device unallocated:           14.44GiB
    Device missing:                  0.00B
    Used:                         11.25GiB
    Free (estimated):             12.26GiB      (min: 10.46GiB)
    Free (statfs, df):            15.40GiB
    Data ratio:                       1.60
    Metadata ratio:                   1.33
    Global reserve:               17.92MiB      (used: 0.00B)
    Multiple profiles:                 yes      (data, metadata, system)

                Data     Data    Metadata  Metadata  System   System
Id Path         single   RAID10  single    RAID10    single   RAID10   Unallocated
-- ------------ -------- ------- --------- --------- -------- -------- -----------
 1 /dev/nvme0n1  1.00GiB 3.00GiB 256.00MiB 192.00MiB 32.00MiB  8.00MiB     3.52GiB
 2 /dev/nvme0n2  2.00GiB 3.00GiB 256.00MiB 192.00MiB        -  8.00MiB     2.55GiB
 3 /dev/nvme0n3  1.00GiB 3.00GiB 256.00MiB 192.00MiB        -  8.00MiB     3.55GiB
 4 /dev/sde1           - 3.00GiB         - 192.00MiB        -  8.00MiB     6.80GiB
-- ------------ -------- ------- --------- --------- -------- -------- -----------
   Total         4.00GiB 6.00GiB 768.00MiB 384.00MiB 32.00MiB 16.00MiB    16.44GiB
   Used         64.00KiB 5.41GiB  16.00KiB 222.62MiB    0.00B 16.00KiB
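Whether a convert balance is needed at all can be checked from a script by looking at the "Multiple profiles" line in the usage output. A sketch, run against a copy of the sample line above to stay self-contained:

```shell
# Check the "Multiple profiles" line of `btrfs filesystem usage` output.
# Self-contained: uses a copy of the sample line from above; on a live
# system you would pipe in the real command output.
usage_line="    Multiple profiles:                 yes      (data, metadata, system)"

needs_convert() {   # stdin = `btrfs filesystem usage <mount>` output
    if grep -q 'Multiple profiles: *yes' ; then
        echo "balance convert needed"
    else
        echo "profiles consistent"
    fi
}

echo "$usage_line" | needs_convert    # prints: balance convert needed
```

On a real system: btrfs filesystem usage /mnt/my-vault/ | needs_convert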
Use the convert and soft keywords to convert the single chunks to the correct profile:
# btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft /mnt/my-vault/
Done, had to relocate 4 out of 8 chunks
You have now restored full redundancy:
# btrfs fi usage /mnt/my-vault/ -T
Overall:
    Device size:                  32.00GiB
    Device allocated:             28.00GiB
    Device unallocated:            4.00GiB
    Device missing:                  0.00B
    Used:                         11.25GiB
    Free (estimated):              9.45GiB      (min: 9.45GiB)
    Free (statfs, df):             9.45GiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:               17.92MiB      (used: 0.00B)
    Multiple profiles:                  no

                Data     Metadata  System
Id Path         RAID10   RAID10    RAID10   Unallocated
-- ------------ -------- --------- -------- -----------
 1 /dev/nvme0n1  6.43GiB 544.00MiB 40.00MiB     1.00GiB
 2 /dev/nvme0n2  6.43GiB 544.00MiB 40.00MiB     1.00GiB
 3 /dev/nvme0n3  6.43GiB 544.00MiB 40.00MiB     1.00GiB
 4 /dev/sde1     6.43GiB 544.00MiB 40.00MiB     3.00GiB
-- ------------ -------- --------- -------- -----------
   Total        12.86GiB   1.06GiB 80.00MiB     6.00GiB
   Used          5.41GiB 222.64MiB 16.00KiB
Status monitoring
A disk replacement can take several hours. Luckily it is possible to monitor the status using btrfs replace status <mount-point>.
# btrfs replace status /mnt/my-vault
Started on 24.Jul 11:02:41, finished on 24.Jul 11:41:51, 0 write errs, 0 uncorr. read errs
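The status line can also be checked from a script, for instance to verify that the replace finished without errors before proceeding. A sketch using shell pattern matching on a copy of the sample status line above:

```shell
# Check a `btrfs replace status` line for completion and error counts.
# Self-contained: uses a copy of the sample status line from above.
status="Started on 24.Jul 11:02:41, finished on 24.Jul 11:41:51, 0 write errs, 0 uncorr. read errs"

check_status() {   # $1 = one line of `btrfs replace status` output
    case "$1" in
        *finished*"0 write errs"*"0 uncorr. read errs"*)
            echo "replace finished cleanly" ;;
        *)
            echo "replace still running or had errors" ;;
    esac
}

check_status "$status"    # prints: replace finished cleanly
```

On a real system you would feed it the output of btrfs replace status for the mount point in question.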
You can also see status messages in the kernel log:
# dmesg -H
[Aug 7 14:52] BTRFS info (device nvme0n1): dev_replace from /dev/nvme0n1 (devid 1) to /dev/nvme0n5 started
[ +14.731116] BTRFS info (device nvme0n1): dev_replace from /dev/nvme0n1 (devid 1) to /dev/nvme0n5 finished
Reference
The btrfs-replace reference manual can be found at https://btrfs.readthedocs.io/en/latest/btrfs-replace.html