Btrfs/Allocator Hints
Allocator Hints for Btrfs
Allocator hints were introduced in a series of patches for Btrfs that let users configure the chunk allocator to prioritise specific devices for metadata or data allocation. The idea is to optimise mixed-device setups, such as SSDs and HDDs in a single filesystem, by using the faster disks for latency-sensitive metadata while storing the bulk data on slower devices.
Various versions of the patches have been submitted to the Linux Btrfs mailing list over the years, for example the preferred_metadata patches in 2020 and the allocation hint patches in 2022 by Goffredo Baroncelli. Since then, the user kakra has collected and maintained these patches, keeping them working with newer kernels while adding fixes and enhancements. Kakra's work is available on GitHub: Allocator Hint Patch Pull Request.
Benefits of Allocator Hints
The allocator hints patch provides several advantages:
- Performance Improvements: By allocating metadata to SSD or NVMe devices, users can greatly improve filesystem responsiveness, as metadata operations are typically small and random.
- Efficient Storage Usage: Allocating data to HDDs ensures that their large capacity is utilised effectively.
- Compatibility: Filesystems remain compatible with non-patched kernels, though allocation preferences will not be respected without the patch.
This feature is ideal for setups combining NVMe and SSDs or SSDs with HDDs, offering a way to balance performance and cost-efficiency.
How to Enable Allocator Hints
To use allocator hints, you need a patched kernel. Follow these steps:
Applying the Patch to Kernel Sources
Compiling your own kernel is not very difficult, but each Linux distribution has its own method for building kernels. The steps below are generic; look up how it is done for your specific distribution.
- Download and unpack the kernel sources for your desired version.
- Download the patch from https://github.com/kakra/linux/pull/36.
- cd into the kernel directory.
- Apply the patch:
# patch -p1 < ../btrfs_allocator_hints-6.12_v1.patch
patching file include/uapi/linux/btrfs_tree.h
patching file fs/btrfs/sysfs.c
patching file fs/btrfs/sysfs.c
patching file fs/btrfs/volumes.c
patching file fs/btrfs/volumes.h
patching file fs/btrfs/volumes.c
patching file fs/btrfs/volumes.h
patching file fs/btrfs/volumes.c
patching file include/uapi/linux/btrfs_tree.h
- Rebuild and install the kernel.
- Reboot into the patched kernel.
Once running the patched kernel, use the allocator hints as described below.
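Before applying the patch as in the steps above, it can help to confirm that it applies cleanly to your kernel version. The following is a minimal sketch, assuming a patch tool that supports --dry-run (GNU patch does); apply_checked is a hypothetical helper name, not part of any tool.

```sh
#!/bin/sh
# Sketch: only apply a patch if a dry run succeeds first.
# apply_checked is a hypothetical helper name used for illustration.
# Usage: apply_checked <kernel-source-dir> <patch-file>
apply_checked() (
    src_dir=$1
    patch_file=$2
    # Make the patch path absolute, because we change directory below.
    case $patch_file in
        /*) ;;
        *) patch_file=$PWD/$patch_file ;;
    esac
    cd "$src_dir" || exit 1
    # --dry-run reports what would happen without touching any file.
    if patch -p1 --dry-run < "$patch_file" >/dev/null; then
        patch -p1 < "$patch_file"
    else
        echo "patch does not apply cleanly to $src_dir; nothing was changed" >&2
        exit 1
    fi
)
```

For example, with the sources unpacked into a directory called linux-6.12 (an assumed name), you could run apply_checked linux-6.12 btrfs_allocator_hints-6.12_v1.patch. A failed dry run usually means the patch version does not match the kernel version.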
Configuring Allocator Hints
Every Btrfs filesystem has a unique UUID, and each device in the filesystem is identified by a device ID number.
Setting an allocation hint is done by writing the hint type for each device to its type file, /sys/fs/btrfs/<uuid>/devinfo/<id>/type.
There are 6 different hints (types 0-5) that can be set.
Type | Description | Recommended Use |
---|---|---|
0 | Prefer writing data to this device. Btrfs will prioritise allocating data chunks from this device before considering others. | Recommended for HDDs. This is the default setting. |
1 | Prefer writing metadata to this device. Btrfs will prioritise allocating metadata chunks from this device before considering others. | Recommended for SSDs. |
2 | Write metadata only to this device. | Not recommended; can lead to early no-space situations. |
3 | Write data only to this device. | Not recommended; can lead to early no-space situations. |
4 | Avoid allocating new chunks to this device. Useful if planning to remove the device from the filesystem in the future. | Use for devices you plan to decommission or remove. |
5 | Prevent allocating new chunks to this device. Useful if you plan on removing multiple devices from the pool in parallel. | Use for devices you plan to decommission or remove. |

Note: Types 0 and 1 set a preference, meaning Btrfs will prioritise these devices but can still allocate chunks to others if needed. Types 2 and 3 enforce an exclusive allocation, restricting data or metadata entirely to the specified device, which can lead to early no-space (ENOSPC) errors.
Identify your device IDs by running btrfs filesystem show:
# btrfs filesystem show /media/backup
Label: '3t-backup'  uuid: aa358efb-ce43-498c-9997-0d35ba13261f
    Total devices 3 FS bytes used 1.68TiB
    devid    1 size 2.72TiB used 1.97TiB path /dev/mapper/3t_backup
    devid    2 size 50.00GiB used 38.03GiB path /dev/mapper/vg_800g-3TB_meta1
    devid    3 size 50.00GiB used 38.03GiB path /dev/mapper/vg_800g-3TB_meta2
Note the ID of each device. In this example, ID 1 is an HDD and IDs 2 and 3 are SSDs.
To set data preference to the HDD and metadata preference to the SSDs, simply write 0 and 1 to the corresponding sysfs file:
echo 0 > /sys/fs/btrfs/aa358efb-ce43-498c-9997-0d35ba13261f/devinfo/1/type
echo 1 > /sys/fs/btrfs/aa358efb-ce43-498c-9997-0d35ba13261f/devinfo/2/type
echo 1 > /sys/fs/btrfs/aa358efb-ce43-498c-9997-0d35ba13261f/devinfo/3/type
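Because a mistyped value or device ID is easy to miss, the write can be wrapped in a small check. This is only a sketch: set_hint and the BTRFS_SYSFS variable are hypothetical names introduced here, and the accepted range 0-5 is taken from the table above.

```sh
#!/bin/sh
# Sketch: validate a hint value before writing it to the per-device type file.
# set_hint and BTRFS_SYSFS are hypothetical names used only for illustration.
BTRFS_SYSFS=${BTRFS_SYSFS:-/sys/fs/btrfs}

# Usage: set_hint <filesystem-uuid> <devid> <type 0-5>
set_hint() (
    uuid=$1
    devid=$2
    hint=$3
    # Accept only the six hint types listed in the table.
    case $hint in
        [0-5]) ;;
        *) echo "invalid hint '$hint': expected 0-5" >&2; exit 2 ;;
    esac
    type_file=$BTRFS_SYSFS/$uuid/devinfo/$devid/type
    # The file is missing on unpatched kernels or for a wrong UUID/devid.
    if [ ! -w "$type_file" ]; then
        echo "cannot write $type_file" >&2
        exit 1
    fi
    echo "$hint" > "$type_file"
)
```

With the filesystem from the example, set_hint aa358efb-ce43-498c-9997-0d35ba13261f 2 1 would mark device 2 as preferring metadata. It needs to run as root, like the plain echo commands.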
Setting or changing the allocation hint only affects the allocation of new chunks. Existing chunks have to be balanced to take advantage of the new hints.
Use grep to list the current configuration for all devices:
# grep . /sys/fs/btrfs/*/devinfo/*/type
/sys/fs/btrfs/aa358efb-ce43-498c-9997-0d35ba13261f/devinfo/1/type:0x00000000
/sys/fs/btrfs/aa358efb-ce43-498c-9997-0d35ba13261f/devinfo/2/type:0x00000001
/sys/fs/btrfs/aa358efb-ce43-498c-9997-0d35ba13261f/devinfo/3/type:0x00000001
/sys/fs/btrfs/c08bb98b-3b98-4dbb-a7c0-5540c2af781b/devinfo/1/type:0x00000000
/sys/fs/btrfs/c3c00bf0-73a6-4aca-91bb-b5e32e76a08c/devinfo/1/type:0x00000000
/sys/fs/btrfs/c3c00bf0-73a6-4aca-91bb-b5e32e76a08c/devinfo/2/type:0x00000001
/sys/fs/btrfs/c3c00bf0-73a6-4aca-91bb-b5e32e76a08c/devinfo/3/type:0x00000001
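The hexadecimal values are easier to read when translated back into the hint types from the table. The sketch below assumes the type file contains a single value in the 0x0000000N form shown above; hint_name, list_hints and BTRFS_SYSFS are hypothetical names, and the labels are shorthand for the table's descriptions.

```sh
#!/bin/sh
# Sketch: translate the value of a type file into a readable hint name.
# hint_name, list_hints and BTRFS_SYSFS are hypothetical names.

# Usage: hint_name <value>   (for example: hint_name 0x00000001)
hint_name() {
    # Arithmetic expansion converts the hexadecimal string to a number.
    case $(( $1 )) in
        0) echo "prefer data" ;;
        1) echo "prefer metadata" ;;
        2) echo "metadata only" ;;
        3) echo "data only" ;;
        4) echo "avoid allocation" ;;
        5) echo "prevent allocation" ;;
        *) echo "unknown" ;;
    esac
}

# Print one line per device of every mounted Btrfs filesystem.
list_hints() {
    for f in "${BTRFS_SYSFS:-/sys/fs/btrfs}"/*/devinfo/*/type; do
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$f" "$(hint_name "$(cat "$f")")"
    done
}
```

Calling list_hints on a patched kernel prints each type file path followed by its hint name; on an unpatched kernel the type files do not exist and nothing is printed.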
Balancing After Changing Preferences
After setting the hints, run a balance to apply the changes. If you added an SSD or NVMe device with metadata preference, you need to run a balance on the metadata chunks so that they are moved to the new device.
btrfs balance start -musage=100 /path/to/btrfs
You can see the distribution of data and metadata using btrfs filesystem usage -T:
# btrfs fi usage -T /media/backup/
Overall:
    Device size:                   2.82TiB
    Device allocated:              2.04TiB
    Device unallocated:          801.63GiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                          1.71TiB
    Free (estimated):              1.09TiB      (min: 718.16GiB)
    Free (statfs, df):             1.09TiB
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

                                 Data    Metadata System
Id Path                          single  RAID1    RAID1     Unallocated Total    Slack
-- ----------------------------- ------- -------- --------- ----------- -------- -----
 1 /dev/mapper/3t_backup         1.97TiB        -         -   777.70GiB  2.72TiB     -
 2 /dev/mapper/vg_800g-3TB_meta1       - 38.00GiB  32.00MiB    11.97GiB 50.00GiB     -
 3 /dev/mapper/vg_800g-3TB_meta2       - 38.00GiB  32.00MiB    11.97GiB 50.00GiB     -
-- ----------------------------- ------- -------- --------- ----------- -------- -----
   Total                         1.97TiB 38.00GiB  32.00MiB   801.63GiB  2.82TiB 0.00B
   Used                          1.66TiB 25.63GiB 384.00KiB
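To check the result of a balance at a glance, the per-device rows of this table can be reduced to the data and metadata columns. The following sketch assumes the column layout of the example above, with exactly one data profile in the third column and one metadata profile in the fourth; filesystems with multiple profiles have more columns and would need a different parser. placement_summary is a hypothetical name.

```sh
#!/bin/sh
# Sketch: summarise where data and metadata are allocated, reading the
# output of "btrfs filesystem usage -T" from standard input.
# Assumes the layout: Id Path Data Metadata System Unallocated Total Slack.
placement_summary() {
    # Device rows start with a numeric ID followed by a device path.
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^\// {
        printf "devid %s (%s): data=%s metadata=%s\n", $1, $2, $3, $4
    }'
}
```

It could be used as btrfs filesystem usage -T /media/backup | placement_summary. After a successful metadata balance, the HDD row should show "-" for metadata and the SSD rows should show "-" for data.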
Important Considerations
One of the reasons why these patches are not included in the kernel is that the free space calculations do not work properly. It is therefore important to monitor the allocation of data and metadata using btrfs device usage rather than relying on df.
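Such monitoring can be scripted. The sketch below reads the output of btrfs device usage -b (raw byte counts) from standard input and reports devices whose unallocated space is below a threshold. It assumes that each device section starts with a line of the form "<path>, ID: <n>" and contains an "Unallocated:" line; low_unallocated is a hypothetical name, and the threshold is an arbitrary example value.

```sh
#!/bin/sh
# Sketch: warn about devices that are running out of unallocated space.
# Reads "btrfs device usage -b <mountpoint>" output from standard input.
# Usage: low_unallocated <minimum-bytes>
low_unallocated() {
    awk -v min="$1" '
        # Remember the header line of the current device section.
        /, ID: [0-9]+/ { dev = $0 }
        # Compare the raw byte count against the threshold.
        $1 == "Unallocated:" && $2 + 0 < min + 0 {
            printf "%s has only %s bytes unallocated\n", dev, $2
        }
    '
}
```

For example, btrfs device usage -b /media/backup | low_unallocated 5368709120 would list every device with less than 5GiB unallocated. On a small metadata device this is an early sign that new metadata chunks may soon have nowhere to go.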
Avoid using types 2 (metadata only) or 3 (data only) unless absolutely necessary, as they can lead to early no-space (ENOSPC) errors. If you do use them, monitor the allocation especially closely.
Example Use Case
Consider a mixed pool with one 512GB SSD and two 4TB HDDs:
- Set the SSD to prefer metadata (echo 1).
- Set the HDDs to prefer data (echo 0).
- Run a balance to optimise allocation.
This configuration ensures fast metadata access while maximising the storage capacity of the HDDs.
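The steps for this example pool can be written out as a short sequence. The sketch below prints the commands instead of running them, so they can be reviewed first. Every value in it is a placeholder: the UUID, the assumption that the SSD is device 1 and the HDDs are devices 2 and 3, and the mount point /mnt/pool are all hypothetical, as is the name print_plan.

```sh
#!/bin/sh
# Sketch for the example pool. All values below are placeholders; replace
# them with the UUID, device IDs and mount point of your own filesystem.
FS_UUID=${FS_UUID:-00000000-0000-0000-0000-000000000000}
SSD_IDS=${SSD_IDS:-1}          # assumed devid of the 512GB SSD
HDD_IDS=${HDD_IDS:-"2 3"}      # assumed devids of the two 4TB HDDs
MOUNT_POINT=${MOUNT_POINT:-/mnt/pool}

# Print the commands rather than executing them.
print_plan() {
    for id in $SSD_IDS; do
        echo "echo 1 > /sys/fs/btrfs/$FS_UUID/devinfo/$id/type"
    done
    for id in $HDD_IDS; do
        echo "echo 0 > /sys/fs/btrfs/$FS_UUID/devinfo/$id/type"
    done
    # Move existing metadata chunks according to the new hints.
    echo "btrfs balance start -musage=100 $MOUNT_POINT"
}
```

Running print_plan shows one echo line per device followed by the balance command. Once the values are correct, the printed lines can be run as root.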
Conclusion
Allocator hints provide a powerful way to optimise performance and storage in Btrfs. By leveraging small but fast devices for metadata and larger but slower devices for data, users can achieve a balance of speed and capacity. As with any advanced feature, careful planning and monitoring are essential to avoid pitfalls like no-space errors. With proper use, allocator hints can significantly enhance the performance and flexibility of your Btrfs setup.
There are other advanced use-cases with a mix of NVMe, SSD and HDD devices, combined with bcache or dm-cache. Some are mentioned on kakra's GitHub page. There is also an interesting discussion on implementation details and requirements on https://github.com/btrfs/btrfs-todo/issues/19.
In the earlier example, /dev/mapper/3t_backup is actually a dm-cache setup of an HDD with an SSD cache. You can read more about dm-cache on Blog/dm-cache: Linux Accelerated Storage and Linux/dm-cache.